BUS-S 524 Quiz
Combining data from weather sensors with individuals' images posted to social networks.
Which of the following options best highlights the variety of data available today?
DESCRIBE abc;
Which of the following can be used to display the schema of a Hive table named "abc"?
TRUNCATE
Which of the following commands deletes data from an HBase table but still keeps the table structure?
NoSQL
Which of the following is NOT a basic component of Hadoop?
Spark Core
Which of the following is NOT a library that resides on the Spark platform?
Calculate the total runs for each player and insert the top 5 players with highest total runs into a Hive table
Choose the best description of the following statement: CREATE TABLE myTable AS SELECT player_id, SUM(runs) sumruns FROM batting GROUP BY player_id ORDER BY sumruns DESC LIMIT 5;
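The query's logic (group by player, sum runs, sort descending, keep five) can be mirrored in plain Python as a rough sketch. The `batting` rows below are hypothetical sample data, and `top_run_scorers` is an illustrative name, not a Hive function. Hive would materialize the result as the new table `myTable`; here it is just a list.

```python
from collections import defaultdict


def top_run_scorers(batting, n=5):
    """Mimic: GROUP BY player_id, SUM(runs), ORDER BY sumruns DESC, LIMIT n."""
    totals = defaultdict(int)
    for player_id, runs in batting:
        totals[player_id] += runs  # SUM(runs) per player_id
    # ORDER BY sumruns DESC, then LIMIT n
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


# Hypothetical (player_id, runs) rows standing in for the batting table.
batting = [
    ("p1", 10), ("p2", 30), ("p1", 25), ("p3", 5),
    ("p4", 40), ("p5", 12), ("p6", 7), ("p2", 2),
]
my_table = top_run_scorers(batting)
print(my_table)  # [('p4', 40), ('p1', 35), ('p2', 32), ('p5', 12), ('p6', 7)]
```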
True
During execution, each Pig statement is checked and verified. If a statement is valid, it gets added to a logical plan. The steps in the logical plan do not execute until some special Pig command(s) is(are) used.
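The idea of a logical plan that is only executed on demand (in Pig, DUMP or STORE triggers execution) can be illustrated with a small Python sketch. `LazyRelation` is a hypothetical toy class, not part of Pig or any Pig client library. It only records operations until `dump()` is called.

```python
class LazyRelation:
    """Hypothetical toy model of a Pig relation: operations are recorded in a
    plan and only executed when dump() is called (like Pig's DUMP/STORE)."""

    def __init__(self, rows, plan=None):
        self.rows = rows
        self.plan = plan or []  # recorded steps of the "logical plan"

    def filter(self, predicate):
        # Record the step and return a new relation; nothing runs yet.
        return LazyRelation(self.rows, self.plan + [("filter", predicate)])

    def foreach(self, fn):
        return LazyRelation(self.rows, self.plan + [("foreach", fn)])

    def dump(self):
        # Only here is the recorded plan actually executed, step by step.
        result = list(self.rows)
        for op, fn in self.plan:
            if op == "filter":
                result = [r for r in result if fn(r)]
            else:
                result = [fn(r) for r in result]
        return result


calls = []


def is_adult(row):
    calls.append(row)  # lets us observe when the predicate really runs
    return row[1] >= 18


A = LazyRelation([("ann", 34), ("bob", 12), ("cy", 20)])
B = A.filter(is_adult)
C = B.foreach(lambda r: r[0])
print(len(calls))  # 0: no statement has executed yet
print(C.dump())    # ['ann', 'cy']
print(len(calls))  # 3
```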
Column Family
HBase is an example of a(n) ______________ NoSQL database.
All of the above
Hive's language supports which of the following?
<k, list(v)>
In Hadoop, the input of a Reduce function takes the form of which of the following? (k=key, v=value)
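The shuffle step that produces the `<k, list(v)>` input for a reducer can be simulated in a single Python process. This is a simplified sketch: real Hadoop partitions keys across many reducers on different nodes, while here `shuffle` and `reduce_sum` are hypothetical helper names that just sort and group locally.

```python
from itertools import groupby
from operator import itemgetter


def shuffle(pairs):
    """Sort (k, v) pairs by key and group them into (k, [v, ...])."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [(k, [v for _, v in grp]) for k, grp in groupby(ordered, key=itemgetter(0))]


def reduce_sum(key, values):
    # A reducer receives one key together with the list of all its values.
    return key, sum(values)


mapped = [("b", 1), ("a", 1), ("b", 1), ("c", 1), ("a", 1), ("b", 1)]
grouped = shuffle(mapped)
print(grouped)  # [('a', [1, 1]), ('b', [1, 1, 1]), ('c', [1])]
print([reduce_sum(k, vs) for k, vs in grouped])  # [('a', 2), ('b', 3), ('c', 1)]
```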
relation
In Pig, a _________ is a collection of rows/tuples.
NameNode
In a Hadoop Cluster, which node is tasked with the responsibility of maintaining metadata regarding block storage locations for files?
Document
MongoDB is an example of a(n) ______________ NoSQL database.
Hive Metastore
Where is the storage location for any metadata (e.g., table schema) created during the use of Hive?
LIMIT
Which of the following Pig keywords can be used to keep the first 10 records in a Pig relation?
False
The DESCRIBE operator can be used to display the content in a Pig bag.
BASE
The NoSQL database uses a(n) ______________ system for data processing.
Pig Latin; compiler
The two main components of Pig are _______.
Volume, Variety, Velocity
What are the three Vs that are commonly used to characterize big data?
Not only SQL
What does NoSQL stand for?
Blocks of the same file will be stored on the same datanode.
Which of the following statements about HDFS is false?
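HDFS normally spreads the blocks of one file over many DataNodes and replicates each block (three copies by default), while the NameNode records which DataNodes hold each block. The Python sketch below illustrates that idea with toy sizes and a simple round-robin placement rule. Both are assumptions: real HDFS uses a much larger default block size (128 MB in recent versions) and a rack-aware placement policy. `split_into_blocks` and `place_blocks` are hypothetical names.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of file_size units is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_blocks(num_blocks, datanodes, replication=3):
    """Toy placement: put each block's replicas on distinct DataNodes.

    Round-robin is a simplification; real HDFS uses a rack-aware policy.
    The returned dict plays the role of the NameNode's block-location metadata.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the replication factor")
    return {
        b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


blocks = split_into_blocks(file_size=300, block_size=128)  # toy units, e.g. MB
print(blocks)  # [128, 128, 44]
layout = place_blocks(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
print(layout)  # {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
```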
NoSQL databases follow the relational model to map and model data
Which of the following statements about NoSQL databases is NOT true?
Select two columns (i.e., age and salary) from an existing relation 'A'.
Which of the following statements correctly describes the Pig command below? B = FOREACH A GENERATE age, salary;
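As a rough Python analogue, `FOREACH ... GENERATE` projects selected fields from every tuple of a relation. The relation `A`, its schema, and the `foreach_generate` helper are hypothetical and exist only for illustration.

```python
# Hypothetical relation A: a list of tuples with a known field order.
A_schema = ("name", "age", "salary", "dept")
A = [
    ("ann", 34, 72000, "ops"),
    ("bob", 41, 65000, "dev"),
]


def foreach_generate(relation, schema, *fields):
    """Rough analogue of: B = FOREACH A GENERATE field1, field2, ...;"""
    idx = [schema.index(f) for f in fields]  # positions of the requested fields
    return [tuple(row[i] for i in idx) for row in relation]


B = foreach_generate(A, A_schema, "age", "salary")
print(B)  # [(34, 72000), (41, 65000)]
```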
Each MapReduce job must involve at least one reducer.
Which of the following statements is false?
Pig can handle only the data that has metadata defined
Which of the following statements is false?
I and II
Which of the following statements is/are true? I. DataNodes send heartbeats to the NameNode to report their status and update the metadata. II. When writing a file to HDFS, the client talks directly to the NameNode to find out which DataNode(s) the file can be written to. III. When writing a file to HDFS, the file is first transferred to the NameNode, which then transfers it to the DataNodes through heartbeats.
ii only
Which of the following statements is/are true about Spark? i. Spark resides on Hadoop and is designed based on HDFS. ii. Spark performs in-memory computation and stores all intermediate outputs in RAM. iii. Scala is a distributed machine learning framework that sits above Spark.
i and iii
Which of the following statements is/are true about Apache Pig's philosophy? i. Pig can process both structured and unstructured data. ii. Pig is intended to work exclusively with the MapReduce framework. iii. Pig has an optimizer that can figure out how to do the work quickly (e.g., by rearranging some operations to give better performance, by combining MapReduce jobs together, etc.).
i and ii
Which of the following statements is/are true? i. The shuffle phase is responsible for sorting the key/value pairs and generating a list of values for each unique key. ii. The map function applies the same operation to every element in an array. iii. The map function can be processed in parallel on multiple nodes, but the reduce function can only be processed on one node.
MongoDB supports cross-collection ad-hoc queries.
Which of the following statements about MongoDB is not true?
Hive is designed to handle a wide variety of data types, including structured, unstructured, and semi-structured data.
Which of the following statements is false?
None of the above
Which of the following statements is true?
I only
Hadoop is considered a scale-out (horizontal) computing technology for which of the following reasons? I. Data is stored across many commodity hardware nodes. II. All the data is stored in memory for fast access. III. All the data is transferred to one node for processing.
The schema gets dropped without dropping the data.
What happens when one drops an external table?
DataNode
Which of the following is not a master node on a Hadoop cluster?
MapReduce
What is the "analytics/processing engine" of a Hadoop cluster?
This task will be assigned to a different worker node to run again.
When a MapReduce job is distributed to multiple workers, what happens if a task does not report back within a reasonable time on its first run?
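The Hadoop framework handles task reassignment itself. The Python sketch below only illustrates the idea of handing a task that did not report back to a different worker. `run_task` and the `timed_out` callback are hypothetical stand-ins for the framework's own progress tracking, not real Hadoop APIs.

```python
def run_task(task, workers, timed_out, max_attempts=3):
    """Toy scheduler: if a worker does not report back in time, hand the
    same task to a different worker.

    `timed_out(worker)` is a hypothetical stand-in for a missed progress
    report; in Hadoop the framework tracks this on its own.
    """
    attempts = []
    for worker in workers[:max_attempts]:
        attempts.append(worker)
        if timed_out(worker):
            continue  # give up on this worker and reassign to the next one
        return task(), attempts
    raise RuntimeError(f"task failed after attempts on {attempts}")


slow_nodes = {"worker1"}
result, attempts = run_task(
    lambda: sum(range(10)),
    ["worker1", "worker2", "worker3"],
    timed_out=lambda w: w in slow_nodes,
)
print(result, attempts)  # 45 ['worker1', 'worker2']
```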
Hive
Which of the following abstractions that we discussed is referred to as the "data warehouse" component of Hadoop?
PUT
Which of the following keywords can be used to update an existing row in an HBase table?
f(x) = 1
Which of the following map functions is used in the word count program?
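With the map step f(x) = 1, every word is emitted with the constant value 1, and summing those 1s per word yields the count. The following single-process Python sketch assumes that lines are lowercased and split on whitespace. Both choices, and the helper names `map_words` and `word_count`, are illustrative and not part of any Hadoop API.

```python
from collections import defaultdict


def map_words(line):
    """Word-count map step: emit (word, 1) for every word, i.e. f(x) = 1."""
    return [(word, 1) for word in line.lower().split()]


def word_count(lines):
    counts = defaultdict(int)
    for line in lines:
        for word, one in map_words(line):  # map phase
            counts[word] += one            # stands in for shuffle + reduce (sum)
    return dict(counts)


text = ["the cat sat", "The dog sat"]
print(word_count(text))  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```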