CISD 42

Each database created in Hive is stored as A file A directory An HDFS block A jar file

A directory
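
For example (a minimal sketch; the database name "sales" is hypothetical and the path assumes the default hive.metastore.warehouse.dir):

CREATE DATABASE sales;
-- Hive creates the directory /user/hive/warehouse/sales.db in HDFS;
-- tables in this database become subdirectories under it.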

How to change the column data type in Hive? ALTER and CHANGE ALTER CHANGE

ALTER and CHANGE
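
A minimal sketch (the table and column names are hypothetical); CHANGE is a clause of the ALTER TABLE statement, which is why both keywords are involved:

ALTER TABLE orders CHANGE COLUMN price price DECIMAL(10,2);
-- old name, new name, new type; repeating the name keeps it unchanged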

What is Hive used as? Hadoop query engine MapReduce wrapper Hadoop SQL interface All of the above

All of the above

Which of the following is the join in Hive? Join Full outer join Right outer join All of the above

All of the above

________ is the slave/worker node and holds the user data in the form of Data Blocks. a) DataNode b) NameNode c) Data block d) Replication

Answer: a Explanation: A DataNode stores data in the Hadoop file system (HDFS). A functional filesystem has more than one DataNode, with data replicated across them.

For ________ the HBase Master UI provides information about the HBase Master uptime. a) HBase b) Oozie c) Kafka d) All of the mentioned

Answer: a Explanation: HBase Master UI provides information about the number of live, dead and transitional servers, logs, ZooKeeper information, debug dumps, and thread stacks.

A ________ serves as the master and there is only one NameNode per cluster. a) Data Node b) NameNode c) Data block d) Replication

Answer: b Explanation: All the metadata related to HDFS including the information about data nodes, files stored on HDFS, and Replication, etc. are stored and maintained on the NameNode.

HDFS is implemented in _____________ programming language. a) C++ b) Java c) Scala d) None of the mentioned

Answer: b Explanation: HDFS is implemented in Java and any computer which can run Java can host a NameNode/DataNode on it.

During start up, the ___________ loads the file system state from the fsimage and the edits log file. a) DataNode b) NameNode c) ActionNode d) None of the mentioned

Answer: b Explanation: During start up, the NameNode loads the file system state from the fsimage and then applies the edits log file.

HDFS provides a command line interface called __________ used to interact with HDFS. a) "HDFS Shell" b) "FS Shell" c) "DFS Shell" d) None of the mentioned

Answer: b Explanation: The File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS).

For YARN, the ___________ Manager UI provides host and port information. a) Data Node b) NameNode c) Resource d) Replication

Answer: c Explanation: For YARN, the ResourceManager UI provides host and port information.

________ NameNode is used when the Primary NameNode goes down. a) Rack b) Data c) Secondary d) None of the mentioned

Answer: c Explanation: The Secondary NameNode is used to provide availability and reliability.

The need for data replication can arise in various scenarios like ____________ a) Replication Factor is changed b) DataNode goes down c) Data Blocks get corrupted d) All of the mentioned

Answer: d Explanation: Data is replicated across different DataNodes to ensure a high degree of fault-tolerance.

Which of the following scenario may not be a good fit for HDFS? a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file b) HDFS is suitable for storing data related to applications requiring low latency data access c) HDFS is suitable for storing archive data d) None of the mentioned

Answer: a Explanation: HDFS can be used for storing archive data since it is cheaper as HDFS allows storing the data on low cost commodity hardware while ensuring a high degree of fault-tolerance.

HDFS works in a __________ fashion. a) master-worker b) master-slave c) worker/slave d) all of the mentioned

Answer: a Explanation: The NameNode serves as the master and each DataNode serves as a worker/slave.

Point out the correct statement. a) The Hadoop framework publishes the job flow status to an internally running web server on the master nodes of the Hadoop cluster b) Each incoming file is broken into 32 MB by default c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance d) None of the mentioned

Answer: a Explanation: The web interface for the Hadoop Distributed File System (HDFS) shows information about the NameNode itself.

Point out the correct statement. a) DataNode is the slave/worker node and holds the user data in the form of Data Blocks b) Each incoming file is broken into 32 MB by default c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance d) None of the mentioned

Answer: a Explanation: There can be any number of DataNodes in a Hadoop Cluster.

Point out the wrong statement. a) Replication Factor can be configured at a cluster level (Default is set to 3) and also at a file level b) Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode c) User data is stored on the local file system of DataNodes d) DataNode is aware of the files to which the blocks stored on it belong to

Answer: d Explanation: It is the NameNode, not the DataNode, that is aware of the files to which the blocks belong.

Managed tables in Hive: Can load the data only from HDFS Can load the data only from local file system Are useful for enterprise wide data Are Managed by Hive for their data and metadata

Are Managed by Hive for their data and metadata

The default delimiter in hive to separate the element in STRUCT is A - '\001' B - '\002' C - '\003' D - '\004'

B - '\002'

To see the partitions present in a Hive table the command used is A - Describe B - show C - describe extended D - show extended

B - show
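
For example (a sketch with a hypothetical table name):

SHOW PARTITIONS orders;
-- lists one line per partition, e.g. dt=2018-01-01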

By default when a database is dropped in Hive A - the tables are also deleted B - the directory is deleted if there are no tables C - the hdfs blocks are formatted D - Only the comments associated with database is deleted

B - the directory is deleted if there are no tables

The export and import of data between sqoop and a relational system happens through which of the following programs? A. Sqoop client program B. Mapreduce job submitted by the sqoop command C. Database stored procedure D. Hdfs file management program

B. Mapreduce job submitted by the sqoop command

HiveServer2 introduced in Hive 0.11 has a new CLI called BeeLine SqlLine HiveLine CLilLine

BeeLine

On dropping a managed table The schema gets dropped without dropping the data The data gets dropped without dropping the schema An error is thrown Both the schema and the data is dropped

Both the schema and the data is dropped

________ text is appropriate for most non-binary data types. A. Character B. Binary C. Delimited D. None of the mentioned

C. Delimited

The fields parsed by ____________ are backed by an internal buffer. A. LargeObjectLoader B. ProcessingException C. RecordParser D. None of the Mentioned

C. RecordParser

The following tool imports a set of tables from an RDBMS to HDFS A. export-all-tables B. import-all-tables C. import-tables D. none of the mentioned

B. import-all-tables The import-all-tables tool imports a set of tables from an RDBMS into HDFS; there is no import-tables tool.

Which of the following function will remove duplicates? COLLECT_LIST() COLLECT_SET()

COLLECT_SET()
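
A minimal sketch (the visits table and its columns are hypothetical):

-- collect_set() drops duplicate pages per user; collect_list() keeps them.
SELECT user_id, collect_set(page) AS pages
FROM visits
GROUP BY user_id;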

The partition of an indexed table is dropped. Then, Corresponding partition from all indexes are dropped. No indexes are dropped Indexes refresh themselves automatically Error is shown asking to first drop the indexes

Corresponding partition from all indexes are dropped.

By default the records from databases imported to HDFS by sqoop are A - Tab separated B - Concatenated columns C - space separated D - comma separated

D - comma separated. By default the fields of each imported record are separated by commas.

Which of the following queries displays the name of the database, the root location on the file system and comments if any? Describe extended Show Describe Show extended

Describe

Which of the following gives the details of the database or schema in a detailed manner? Describe extended Show Describe Show extended

Describe extended
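
For example, covering both of the previous two questions (hypothetical database name):

DESCRIBE DATABASE sales;           -- name, comment, and root location
DESCRIBE DATABASE EXTENDED sales;  -- additionally shows the dbproperties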

If an Index is dropped then The directory containing the index is deleted The underlying table is not dropped The underlying table is also dropped Error is thrown by hive

Error is thrown by hive

The reverse() function reverses a string passed to it in a Hive query. This is an example of Standard UDF Aggregate UDF Table Generating UDF None of the above

Standard UDF

The query "SHOW DATABASE LIKE 'h.*' gives the output with database name Containing h in their name Starting with h Ending with h Containing 'h.'

Starting with h

The thrift service component in hive is used for Moving hive data files between different servers Use multiple hive versions Submit hive queries from a remote client Installing hive

Submit hive queries from a remote client

By default when a database is dropped in Hive The tables are also deleted The directory is deleted if there are no tables The HDFS blocks are formatted None of the above

The directory is deleted if there are no tables

On dropping a external table The schema gets dropped without dropping the data The data gets dropped without dropping the schema An error is thrown Both the schema and the data is dropped

The schema gets dropped without dropping the data

For optimizing join of three tables, the largest sized tables should be placed as The first table in the join clause Second table in the join clause Third table in the join clause Does not matter

Third table in the join clause

Explode in Hive is used to convert complex data types into desired table formats. True False

True

Can we run UNIX shell commands from Hive? Yes No

Yes

Point out the correct statement. a) Hive is not a relational database, but a query engine that supports the parts of SQL specific to querying data b) Hive is a relational database with SQL support c) Pig is a relational database with SQL support d) All of the mentioned

a) Hive is a SQL-based data warehouse system for Hadoop that facilitates data summarization, ad hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems.

_________ is the base class for all implementations of InputFormat that use files as their data source. a) FileTextFormat b) FileInputFormat c) FileOutputFormat d) None of the mentioned

b Explanation: FileInputFormat provides an implementation for generating splits for the input files.

In _____________ the default job is similar, but not identical, to the Java equivalent. a) Mapreduce b) Streaming c) Orchestration d) All of the mentioned

b Explanation: MapReduce has a simple model of data processing.

______ is a framework for performing remote procedure calls and data serialization. a) Drill b) BigTop c) Avro d) Chukwa

c) Avro In the context of Hadoop, Avro can be used to pass data from one program or language to another.

_______ jobs are optimized for scalability but not latency. a) Mapreduce b) Drill c) Oozie d) Hive

d) Hive Hive Queries are translated to MapReduce jobs to exploit the scalability of MapReduce.

You have one column in a hive table named "my_ts" having datatype string and a sample value like "2018-02-24 17:22:35". How would you extract only the day from it, i.e. 24? extract(my_ts) not possible directly day(my_ts) get_date(my_ts)

day(my_ts)
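
A minimal sketch (the events table is hypothetical):

-- day() parses the timestamp string and returns the day of the month, here 24.
SELECT day(my_ts) FROM events;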

Which of the following data type is supported by Hive? map record string enum

enum

What will be the output of CONCAT_WS('|','hey','coder','how','are','you') hey|coder|how|are|you 'heycoderhowareyou' 'hey,coder,how,are,you' 'hey coder how are you'

hey|coder|how|are|you
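
For example:

-- The first argument is the separator placed between the remaining arguments.
SELECT CONCAT_WS('|', 'hey', 'coder', 'how', 'are', 'you');
-- returns hey|coder|how|are|you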

The 2 default TBLPROPERTIES added by hive when a hive table is created are hive_version and last_modified_by last_modified_by and last_modified_time last_modified_time and hive_version last_modified_by and table_location

last_modified_by and last_modified_time

Which of the following function will return the size of string? size() length()

length()

Hive uses _________ for logging. logj4 log4l log4i log4j

log4j

What is the output of regexp_replace("pqrser", "qr|er", "") ? ps pqr|ser qrer true

ps
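
For example:

-- Every substring matching the regex 'qr|er' is replaced with the empty string,
-- so 'pqrser' loses 'qr' and 'er' and becomes 'ps'.
SELECT regexp_replace('pqrser', 'qr|er', '');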

Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in ____________. a) Java b) C c) C# d) None of the mentioned

a) Java Hadoop Pipes is a SWIG-compatible C++ API to implement MapReduce applications (non JNI based).

_________ function is responsible for consolidating the results produced by each of the Map() functions/tasks. a) Reduce b) Map c) Reducer d) All of the mentioned

a) Reduce Reduce function collates the work and resolves the results.

What is the extension of hive query file? .txt .hive .sql .hql

.hql

Which of the following hint is used to optimize the join queries /* joinlast(table_name) */ /* joinfirst(table_name) */ /* streamtable(table_name) */ /* cacheable(table_name) */

/* streamtable(table_name) */
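
A sketch with hypothetical tables a, b and c; note that the hint is written inside a /*+ ... */ comment and names the table to stream rather than buffer:

SELECT /*+ STREAMTABLE(a) */ a.val, b.val, c.val
FROM a
JOIN b ON a.key = b.key
JOIN c ON c.key = b.key;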

When a Hive query joins 3 tables, how many MapReduce jobs will be started? 0 1 2 3

2

Which of the following is used to analyse data stored in Hadoop cluster using SQL like query Mahout Hive Pig All of the above

Hive

Calling a unix bash script inside a Hive Query is an example of Hive Pipeline Hive Caching Hive Forking Hive Streaming

Hive Streaming

Which of the following is true for Hive? Hive is the database of Hadoop Hive supports schema checking Hive doesn't allow row level updates Hive can replace an OLTP system

Hive doesn't allow row level updates

Hive can be accessed remotely by using programs written in C++, Ruby etc, over a single port. This is achieved by using HiveServer HiveMetaStore HiveWeb Hive Streaming

HiveServer

Which of the following is not a complex data type in Hive? Matrix Array Map STRUCT

Matrix

Are multiline comments supported in Hive? Yes No

No

Can the default "Hive Metastore" be used by multiple users (processes) at the same time? Yes No

No

When a partition is archived in Hive it Reduces space through compression Reduces the length of records Reduces the number of files stored Reduces the block size

Reduces the number of files stored

In which mode does HiveServer2 only accept valid Thrift calls? Remote HTTP Embedded Interactive

Remote

Each database created in hive is stored as A - a directory B - a file C - a hdfs block D - a jar file

A - a directory

If the database contains some tables then it can be forced to drop without dropping the tables by using the keyword A - RESTRICT B - OVERWRITE C - F DROP D - CASCADE

D - CASCADE
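
For example (hypothetical database name; without CASCADE the default RESTRICT behaviour makes the drop fail if tables exist):

DROP DATABASE sales CASCADE;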

Is it possible to change the default location of Managed Tables in Hive Yes No

Yes

Which of the following method will remove the spaces from both the ends of " bigdata " ? whitespace() substring() remove() trim()

trim()
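
For example:

SELECT trim(' bigdata ');  -- returns 'bigdata'
-- ltrim() and rtrim() strip spaces from only one end.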

How would you delete the data of hive table without deleting the table? drop <table_name>; remove <table_name>; disable <table_name>; truncate <table_name>;

truncate <table_name>;
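
For example (hypothetical table name):

-- Removes all rows (the underlying data files) but keeps the table definition.
TRUNCATE TABLE orders;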

To see the data type details of only a column (not the table) we should use the command A - DESCRIBE B - DESCRIBE EXTENDED C - DESCRIBE FORMATTED D - DESCRIBE COLUMN

A - DESCRIBE

In a table import the name of the mapreduce job A - Is named after the table name B - Can be customized C - Can be passed as a query parameter D - Is a random name decided by the system.

A - Is named after the table name. The name of the job is based on the name of the table being imported.

Which of the following is not a complex data type in Hive? A - Matrix B - Array C - Map D - STRUCT

A - Matrix

On dropping an external table A - The schema gets dropped without dropping the data B - The data gets dropped without dropping the schema C - An error is thrown D - Both the schema and the data is dropped

A - The schema gets dropped without dropping the data

The parameter used to identify the individual row in HBase while importing data to it using sqoop is A - --hbase-row-key B - --hbase-rowkey C - --hbase-rowid D - --hbase-row-id

A - --hbase-row-key The parameter --hbase-row-key is used in sqoop to identify each row in the HBase table.

The temporary location to which sqoop moves the data before loading into hive is specified by the parameter A - --target-dir B - --source-dir C - --hive-dir D - --sqoop-dir

A - --target-dir The --target-dir parameter specifies the directory used for temporarily staging the data before loading it into the hive table.

The partition of an indexed table is dropped. Then, A - Corresponding partition from all indexes are dropped. B - No indexes are dropped C - Indexes refresh themselves automatically D - Error is shown asking to first drop the indexes

A - Corresponding partition from all indexes are dropped.

A table contains 4 columns (C1,C2,C3,C4). With -update-key C2,C4, the sqoop generated query will be like A - Update table set C1 = 'newval', c3 = 'newval' where c2 = 'oldval' and c4 = 'oldval' B - Update table set C2 = 'newval', c4 = 'newval' where c2 = 'oldval' and c4 = 'oldval' C - Update table set C1 = 'newval', c2 = 'newval', c3 = 'newval', c4 = 'newval' where c2 = 'oldval' and c4 = 'oldval' D - None

A - Update table set C1 = 'newval', c3 = 'newval' where c2 = 'oldval' and c4 = 'oldval' Only the columns other than those in the -update-key parameter will appear in the SET clause.

The tables created in hive are stored as A - a subdirectory under the database directory B - a file under the database directory C - a hdfs block containing the database directory D - a .java file present in the database directory

A - a subdirectory under the database directory

Hive is A - schema on read B - schema on write C - schema on update D - all the above

A - schema on read

Which of the following are data types in Hive? ARRAY STRUCT MAP All the above

All the above

What is achieved by the command - sqoop job -exec myjob A - Sqoop job named myjob is saved to sqoop metastore B - Sqoop job named myjob starts running C - Sqoop job named myjob is scheduled D - Sqoop job named myjob gets created

B - Sqoop job named myjob starts running This is the command to execute a sqoop job already saved in the metastore.

_________ tool can list all the available database schemas. A. Sqoop-list-tables B. sqoop-list-databases C. sqoop-list-schema D. sqoop-list-columns

B. sqoop-list-databases

The results of a hive query can be stored as A - local file B - hdfs file C - both D - can not be stored

C - both

Which of the following will cast a column "a" having value 3.2 to 3? INT(a) a.to_int CAST(a as Float) CAST(a as INT)

CAST(a as INT)
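
For example:

-- Casting a decimal value to INT truncates the fractional part.
SELECT CAST(3.2 AS INT);  -- returns 3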

The difference between the MAP and STRUCT data type in Hive is A - MAP is Key-value pair but STRUCT is series of values B - There can not be more than one MAP data type column in a table but more than one STRUCT data type in a table is allowed. C - The Keys in MAP can not be integers but in STRUCT they can be. D - Only one pair of data types is allowed in the key-value pair of MAP while mixed types are allowed in STRUCT.

D - Only one pair of data types is allowed in the key-value pair of MAP while mixed types are allowed in STRUCT.

The drawback of managed tables in hive is A - they are always stored under default directory B - They cannot grow bigger than a fixed size of 100GB C - They can never be dropped D - They cannot be shared with other applications

D - They cannot be shared with other applications

How do we decide the order of columns in which data is loaded to the target table? A - By using the --order-by parameter B - By using a new mapreduce job after submitting the sqoop export command C - By using a database stored procedure D - By using the --columns parameter with comma separated column names in the required order.

D - By using the --columns parameter with comma separated column names in the required order. We can use the --columns parameter to specify the required columns in the required order.

In hive when the schema does not match the file content A - It cannot read the file B - It reads only the string data type C - it throws an error and stops reading the file D - It returns null values for mismatched fields.

D - It returns null values for mismatched fields.

Point out the correct statement Hive is not a relational database, but a query engine that supports the parts of SQL Hive is a relational database with SQL support Pig is a relational database with SQL support None of the above

Hive is not a relational database, but a query engine that supports the parts of SQL

If the schema of the table does not match with the data types present in the file containing the table then Hive Automatically drops the file Automatically corrects the data Reports Null values for mismatched data Does not allow any query to run on the table

Reports Null values for mismatched data

Which among the following commands is used to change the settings within a Hive session? RESET SET

SET
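
For example (the property shown is just one that is commonly overridden per session):

SET hive.exec.dynamic.partition=true;  -- override a setting for this session
SET hive.exec.dynamic.partition;       -- print the current value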

Which of the following is NOT a window function? DENSE_RANK() ROW_NUMBER() SPLIT() RANK()

SPLIT()

In Hive you can copy The schema without the data The data without the schema Both schema and its data Neither the schema nor its data

The schema without the data
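
A minimal sketch (hypothetical table names):

-- CREATE TABLE ... LIKE copies only the schema; the new table starts empty.
CREATE TABLE orders_backup LIKE orders;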

Which of the following method add a path or paths to the list of inputs? a) setInputPaths() b) addInputPath() c) setInput() d) none of the mentioned

b Explanation: FileInputFormat offers four static convenience methods for setting a JobConf's input paths.

An input _________ is a chunk of the input that is processed by a single map. a) textformat b) split c) datanode d) all of the mentioned

b Explanation: Each split is divided into records, and the map processes each record—a key-value pair—in turn.

Is it possible to overwrite Hadoop MapReduce configuration in Hive? Yes No

Yes

Point out the wrong statement. a) Elastic MapReduce (EMR) is Facebook's packaged Hadoop offering b) Amazon Web Service Elastic MapReduce (EMR) is Amazon's packaged Hadoop offering c) Scalding is a Scala API on top of Cascading that removes most Java boilerplate d) All of the mentioned

a) Elastic MapReduce (EMR) is Amazon's packaged Hadoop offering, not Facebook's. Rather than building Hadoop deployments manually on EC2 (Elastic Compute Cloud) clusters, users can spin up fully configured Hadoop installations using simple invocation commands, either through the AWS Web Console or through command-line tools.

Point out the correct statement. a) MapReduce tries to place the data and the compute as close as possible b) Map Task in MapReduce is performed using the Mapper() function c) Reduce Task in MapReduce is performed using the Map() function d) All of the mentioned

a) This feature of MapReduce is "Data Locality".

__________ maps input key/value pairs to a set of intermediate key/value pairs. a) Mapper b) Reducer c) Both Mapper and Reducer d) None of the mentioned

a) Mapper Maps are the individual tasks that transform input records into intermediate records.

___________ part of the MapReduce is responsible for processing one or more chunks of data and producing the output results. a) Maptask b) Mapper c) Task execution d) All of the mentioned

a) Maptask Map Task in MapReduce is performed using the Map() function.

Hive also supports custom extensions written in ____________. a) C# b) Java c) C d) C++

b) Java Hive also supports custom extensions written in Java, including user-defined functions (UDFs) and serializer-deserializers for reading and optionally writing custom formats.

______________ is another implementation of the MapRunnable interface that runs mappers concurrently in a configurable number of threads. a) MultithreadedRunner b) MultithreadedMap c) MultithreadedMapRunner d) SinglethreadedMapRunner

c Explanation: MultithreadedMapRunner is an implementation of the MapRunnable interface that runs mappers concurrently in a configurable number of threads.

_________ hides the limitations of Java behind a powerful and concise Clojure API for Cascading. a) Scalding b) HCatalog c) Cascalog d) All of the mentioned

c) Cascalog Cascalog also adds Logic Programming concepts inspired by Datalog. Hence the name "Cascalog" is a contraction of Cascading and Datalog.

_________ is the default Partitioner for partitioning key space. a) HashPar b) Partitioner c) HashPartitioner d) None of the mentioned

c) HashPartitioner The default partitioner in Hadoop is the HashPartitioner which has a method called getPartition to partition.

Point out the wrong statement. a) If V2 and V3 are the same, you only need to use setOutputValueClass() b) The overall effect of Streaming job is to perform a sort of the input c) A Streaming application can control the separator that is used when a key-value pair is turned into a series of bytes and sent to the map or reduce process over standard input d) None of the mentioned

d Explanation: If a combine function is used then it is the same form as the reduce function, except its output types are the intermediate key and value types (K2 and V2), so they can feed the reduce function.

Point out the correct statement. a) The reduce input must have the same types as the map output, although the reduce output types may be different again b) The map input key and value types (K1 and V1) are different from the map output types c) The partition function operates on the intermediate key d) All of the mentioned

d Explanation: In practice, the partition is determined solely by the key (the value is ignored).

Point out the wrong statement. a) A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner b) The MapReduce framework operates exclusively on <key, value> pairs c) Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce methods d) None of the mentioned

d) The MapReduce framework takes care of scheduling tasks, monitoring them and re-executes the failed tasks.

In Hive you can copy A - The schema without the data B - The data without the schema C - Both schema and its data D - Neither the schema nor its data

A - The schema without the data

What is achieved by using the --meta-connect parameter in a sqoop command? A - run metastore as a service accessible remotely B - run metastore as a service accessible locally C - connect to the metastore tables D - connect to the metadata of the external relational tables from which data has to be imported

A - run metastore as a service accessible remotely. With the --meta-connect parameter the metastore starts running as a service on the default port 16000. This metastore service then becomes accessible throughout the cluster.

The "strict" mode when querying a partitioned table is used to A - stop queries of partitioned tables without a where clause B - automatically add a where clause to the queries on a partitioned table C - Limit the result of a query on partitioned table to 100 D - Ignore any error in the name of the partitioned table

A - stop queries of partitioned tables without a where clause
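
A sketch (the orders table and its dt partition column are hypothetical):

SET hive.mapred.mode=strict;
-- SELECT * FROM orders;                     -- rejected: no partition filter
SELECT * FROM orders WHERE dt='2018-01-01';  -- allowed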

The tables created in hive are stored as A file under the database directory A subdirectory under the database directory A .java file present in the database directory A HDFS block containing the database directory

A subdirectory under the database directory

Which of the following are the key components of Hive Architecture? User Interface Metastore Driver All of the above

All of the above

Which of the following are the commonly used Hive services? Command Line Interface (cli) Hive Web Interface (hwi) HiveServer (hiveserver) All of the above

All of the above

What can Hive not offer? A - storing data in tables and columns B - Online transaction processing C - Handling date time data D - Partitioning stored data

B - Online transaction processing

The main advantage of creating table partition is A - Effective storage memory utilization B - faster query performance C - Less RAM required by namenode D - simpler query syntax

B - faster query performance

The 2 default TBLPROPERTIES added by hive when a hive table is created are A - hive_version and last_modified_by B - last_modified_by and last_modified_time C - last_modified_time and hive_version D - last_modified_by and table_location

B - last_modified_by and last_modified_time

The query "SHOW DATABASE LIKE 'h.*' gives the output with database name A - containing h in their name B - starting with h C - ending with h D - containing 'h.'

B - starting with h

The partitioning of a table in Hive creates more A - subdirectories under the database name B - subdirectories under the table name C - files under database name D - files under the table name

B - subdirectories under the table name

What does the --last-value parameter in sqoop incremental import signify? A - What is the number of rows successfully imported in append type import B - what is the date value to be used to select the rows for import in the last_update_date type import C - Both of the above D - The count of the number of rows that were successful in the current import.

C - Both of the above Sqoop uses the --last-value parameter in both the append mode and the last_update_date mode to import the incremental data from source.

Using the ALTER DATABASE command on a database you can change the A - database name B - database creation time C - dbproperties D - directory where the database is stored

C - dbproperties

The thrift service component in hive is used for A - moving hive data files between different servers B - use multiple hive versions C - submit hive queries from a remote client D - Installing hive

C - submit hive queries from a remote client

Which of the following is a disadvantage of using the --staging-table parameter? A - Data is stored twice and consumes more memory B - The overall export time is more than direct export to the final table C - User should ensure the structure of staging table and final tables are in sync D - All of the above

D - All of the above All the listed options are disadvantages while using the --staging-table option.

On dropping a managed table A - The schema gets dropped without dropping the data B - The data gets dropped without dropping the schema C - An error is thrown D - Both the schema and the data is dropped

D - Both the schema and the data is dropped

The drawback of managed tables in hive is They are always stored under default directory They cannot grow bigger than a fixed size of 100GB They can never be dropped They cannot be shared with other applications

They cannot be shared with other applications

Running a ___________ program involves running mapping tasks on many or all of the nodes in our cluster. a) MapReduce b) Map c) Reducer d) All of the mentioned

a) MapReduce In some applications, component tasks need to create and/or write to side-files, which differ from the actual job-output files.

The Pig Latin scripting language is not only a higher-level data flow language but also has operators similar to ____________. a) SQL b) JSON c) XML d) All of the mentioned

a) SQL Pig Latin, in essence, is designed to fill the gap between the declarative style of SQL and the low-level procedural style of MapReduce.

The number of maps is usually driven by the total size of ____________. a) inputs b) outputs c) tasks d) None of the mentioned

a) inputs Total size of inputs means the total number of blocks of the input files.

___________ is general-purpose computing model and runtime system for distributed data analytics. a) Mapreduce b) Drill c) Oozie d) None of the mentioned

a) MapReduce MapReduce provides a flexible and scalable foundation for analytics, from traditional reporting to leading-edge machine learning algorithms.

Which of the following is the only way of running mappers? a) MapReducer b) MapRunner c) MapRed d) All of the mentioned

b Explanation: MapRunner is the default implementation of the MapRunnable interface used to run the mappers.

___________ generates keys of type LongWritable and values of type Text. a) TextOutputFormat b) TextInputFormat c) OutputInputFormat d) None of the mentioned

b Explanation: TextInputFormat generates keys of type LongWritable (the byte offset of the line in the file) and values of type Text (the contents of the line).

________ is a utility which allows users to create and run jobs with any executables as the mapper and/or the reducer. a) Hadoop Strdata b) Hadoop Streaming c) Hadoop Stream d) None of the mentioned

b) Hadoop streaming is one of the most important utilities in the Apache Hadoop distribution.

________ is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets. a) Pig Latin b) Oozie c) Pig d) Hive

c) Pig Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs.

A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the JobTracker. a) MapReduce b) Mapper c) TaskTracker d) JobTracker

c) TaskTracker The TaskTracker receives the information necessary for the execution of a Task from the JobTracker, executes the Task, and sends the results back to the JobTracker.

An ___________ is responsible for creating the input splits, and dividing them into records. a) TextOutputFormat b) TextInputFormat c) OutputInputFormat d) InputFormat

d Explanation: As a MapReduce application writer, you don't need to deal with InputSplits directly, as they are created by an InputFormat.

________ is the most popular high-level Java API in Hadoop Ecosystem a) Scalding b) HCatalog c) Cascalog d) Cascading

d) Cascading Cascading hides many of the complexities of MapReduce programming behind more intuitive pipes and data flow abstractions.

Using the ALTER DATABASE command on a database you can change the Database name dbproperties Database creation time Directory where the database is stored

dbproperties
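
For example (hypothetical names):

ALTER DATABASE sales SET DBPROPERTIES ('edited-by' = 'alice');
-- per the answer above, the name, creation time, and directory
-- are not changeable with this command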

