Big Data Test 1
A ________ serves as the master and there is only one NameNode per cluster.
NameNode
Which of the following stores metadata? 1. NameNode 2. DataNode 3. Secondary DataNode 4. All of the above
NameNode
What are the two major types of nodes in HDFS?
NameNode and DataNode
The ____________ is the ultimate authority that arbitrates resources among all the applications in the system.
ResourceManager
Volume
Data is huge; it needs a clustered or distributed system
Velocity
Data is often streaming and constantly updated
6 Vs of Big Data
Volume, Velocity, Variety, Veracity, Value, Valence
What are the two basic layers comprising the Hadoop Architecture?
MapReduce and HDFS
What is the command to upload a file into HDFS?
hdfs dfs -put FILENAME HDFS_PATH
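For example (the file name and HDFS path below are hypothetical, chosen only for illustration):
hdfs dfs -put sales.csv /user/hadoop/data/
This copies the local file sales.csv into the HDFS directory /user/hadoop/data/.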
Pig mainly operates in how many modes?
2 (local mode and MapReduce mode)
How many blocks does HDFS store for a 10 GB file, given the default block size of 64 MB and 3 replications?
480
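Worked arithmetic behind the answer above: 10 GB = 10,240 MB; 10,240 MB / 64 MB per block = 160 blocks; with 3 replications, 160 × 3 = 480 stored block replicas.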
What is the organizing data structure for map/reduce programs?
A list of identification keys and some value associated with that identifier
The __________ is a framework-specific entity that negotiates resources from the ResourceManager
ApplicationMaster
In the total word count example, the algorithm is: 1. Cascading Map/Reduce and re-using the word count result 2. No cascading is allowed in Map/Reduce 3. Cascading Map/Reduce and re-using the daily word count result
Cascading Map/Reduce and re-using the word count result
What are Hadoop's advantages over a traditional distributed computing platform?
Cost, Flexibility, Scalability
Value
Data can add value to an organization
Point out the wrong statement: 1. DataNode is aware of the files to which the blocks stored on it belong 2. Replication Factor can be configured at a cluster level (default is set to 3) and also at a file level 3. Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode 4. User data is stored on the local file system of DataNodes
DataNode is aware of the files to which the blocks stored on it belong
Point out the correct statement: 1. DataNode is the slave/worker node and holds the user data in the form of Data Blocks 2. Each incoming file is broken into 32 MB by default 3. Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance 4. none of the above
DataNode is the slave/worker node and holds the user data in the form of Data Blocks
In the map/reduce framework, which of these logistics does Map/Reduce do with the map function? 1. Gather data distributed across a cluster to the user's computer and run map 2. Distribute map to cluster nodes, run map on the data partitions at the same time 3. Distribute map to cluster nodes, run map at one node, wait for it to finish, then run map at the next node, etc.
Distribute map to cluster nodes, run map on the data partitions at the same time
Pig starts to compile the logical plan into a physical plan and executes it when using the ______ command.
DUMP
HDFS provides a command line interface called __________ used to interact with HDFS.
FS Shell
The client reading data from the HDFS filesystem in Hadoop does which of the following? 1. Gets only the block locations from the NameNode 2. Gets the data from the NameNode 3. Gets both the data and block locations from the NameNode 4. Gets the block locations from the DataNode
Gets only the block locations from the NameNode
You can run Pig in interactive mode using the ______ shell. 1. Grunt 2. FS 3. HDFS 4. None of the mentioned
Grunt
Which of the following features overcomes the limitation of a single NameNode namespace? 1. HDFS Federation 2. HDFS NameNode High Availability 3. Speculative execution
HDFS Federation
Which of the following scenarios may not be a good fit for HDFS? 1. HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file 2. HDFS is suitable for storing data related to applications requiring low latency data access 3. HDFS is suitable for storing data related to applications requiring low latency data access 4. none of the above
HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
What does HDFS stand for?
Hadoop Distributed File System
The NameNode knows that a DataNode is active using a mechanism known as ________.
Heartbeats
What is not part of the basic Hadoop Stack 'Zoo'? 1. Horse 2. Pig 3. Hive 4. Elephant
Horse
What is NOT considered to be part of the basic Apache Hadoop modules? 1. HDFS 2. Impala 3. MapReduce 4. YARN
Impala
When you design a map/reduce algorithm, you are in charge of designing: 1. Input Key/Value 2. Mapper function 3. Reducer function 4. the logistics including shuffles and groupings
Input Key/Value AND Mapper function AND Reducer function
When the user initiates a Map/Reduce task, the master node will create a ___________.
JobTracker
___________ schedules, monitors, and re-executes failed tasks
JobTracker
Which of the following functions is used to read data in Pig? 1. Write 2. Read 3. Load 4. None of the mentioned
LOAD
Which of these kinds of data motivated the Map/Reduce framework? 1. Large number of customer internet transactions that are often retrieved by a billing id. 2. Large number of internet documents that need to be indexed for searching by words. 3. Large number of patient records that are updated immediately after each patient visit.
Large number of internet documents that need to be indexed for searching by words.
The basic MapReduce program includes ________.
Map() and Reduce()
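As a minimal sketch of this structure (Python is used here purely for illustration and is not the Hadoop Java API; the function and variable names are invented for this example), word count can be written as a map function that emits (word, 1) pairs and a reduce function that sums the values grouped under each key:

from collections import defaultdict

def map_fn(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    # Reduce: sum all counts grouped under the same word.
    return (word, sum(counts))

# Simulate the shuffle/group step the framework performs between map and reduce.
lines = ["big data is big", "data is data"]
grouped = defaultdict(list)
for line in lines:
    for word, count in map_fn(line):
        grouped[word].append(count)

print([reduce_fn(word, counts) for word, counts in grouped.items()])
# -> [('big', 2), ('data', 3), ('is', 2)]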
What is YARN used as an alternative to in Hadoop 2.0 and later versions of Hadoop?
MapReduce
Which of the following statements is WRONG? 1. The Pig development cycle is shorter than MapReduce's 2. Pig is higher-level programming than MapReduce 3. Pig is much slower than MapReduce 4. Performing a data set join is much easier in Pig than in MapReduce
Pig is much slower than MapReduce
Streaming Map/Reduce allows mappers and reducers to be written in which languages: 1. R 2. Java 3. Unix shell commands 4. Ruby 5. Python
R, Unix shell commands, Ruby, Python
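A sketch of how such a streaming job is typically launched (the streaming jar path and the mapper.py/reducer.py script names are assumptions that vary by installation):
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -input /user/hadoop/input \
  -output /user/hadoop/output \
  -mapper mapper.py \
  -reducer reducer.py \
  -file mapper.py -file reducer.py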
Map/Reduce performs a 'shuffle' and grouping. That means it... 1. Shuffles <key,value> pairs into random bins and then within a bin it groups keys. 2. Shuffles <key,value> pairs into different partitions according to the key value, and then aggregates all pairs in 1 partition into 1 group. 3. Shuffles <key,value> pairs into different partitions according to the key value, and sorts within the partitions by key.
Shuffles <key,value> pairs into different partitions according to the key value, and sorts within the partitions by key.
Pig: Point out the correct statement 1. The Pig interpreter builds a logical plan for every relational operation. 2. Invoke the Grunt shell using the "enter" command 3. Pig does not support jar files 4. All of the mentioned
The Pig interpreter builds a logical plan for every relational operation.
You can run Map/Reduce in which of the following ways? 1. Web REST API 2. Command line 3. Streaming 4. API calling
Streaming AND API calling
Valence
How interconnected the data is
Point out the wrong statement: 1. To run Pig in local mode, you need access to a single machine 2. The DISPLAY operator will display the results to your terminal screen 3. To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation 4. all of the mentioned
The DISPLAY operator will display the results to your terminal screen
Apache Hadoop YARN stands for: 1. Yet Another Resource Negotiator 2. Yet Another Resource Network 3. Yet Another Reserve Negotiator
Yet Another Resource Negotiator
Point out the correct statement : 1. You can run Pig in either mode using the "pig" command 2.You can run Pig in batch mode using the Grunt shell 3. You can run Pig in interactive mode using the FS shell 4. none of the above
You can run Pig in either mode using the "pig" command
The need for data replication can arise in various scenarios like: 1. Data Blocks get corrupted 2. DataNode goes down 3. DataNode goes down 4. All of the above
All of the above
Which of the following is true about metadata? 1. Metadata contains information like the number of blocks, their locations, and replicas 2. FsImage & EditLogs are metadata files 3. Metadata shows the structure of HDFS directories/files 4. all of the above
all of the above
Pig Latin statements are generally organized in one of the following ways: 1. A LOAD statement to read data from the file system 2. A series of "transformation" statements to process the data 3. A DUMP statement to view results or a STORE statement to save the results 4. all of the mentioned
all of the mentioned
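A minimal Pig Latin sketch of that LOAD, transform, DUMP pattern (the file name and field name are hypothetical):
words = LOAD 'input.txt' AS (word:chararray);
grouped = GROUP words BY word;
counts = FOREACH grouped GENERATE group, COUNT(words);
DUMP counts;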
_______ is the slave/worker node and holds the user data in the form of Data Blocks.
DataNode
In the word count by day example, what is the key?
date-word
In the MapReduce framework, Hadoop does NOT handle: 1. producing the intermediate results 2. communicating intermediate results to the reducers 3. parallel execution 4. designing the key/value pair
designing the key/value pair
Which of the following is used to adjust the size of an HDFS block?
dfs.blocksize AND dfs.block.size
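For example, the block size can be overridden for a single file at write time (a sketch; the byte value shown is 128 MB and the file and path are hypothetical):
hdfs dfs -D dfs.blocksize=134217728 -put largefile.dat /user/hadoop/data/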
Which utility is used for checking the health of an HDFS file system? 1. fsck 2. fiche 3. fiche 4. fcks
fsck
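For example (the path is hypothetical; fsck can be run against any HDFS path):
hdfs fsck /user/hadoop/data -files -blocks -locations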
The number of blocks 1. is limited by the name node memory 2. is limited by the name node cpu 3. is limited by the bandwidth 4. affects the number of map tasks
is limited by the NameNode memory AND is limited by the Bandwidth AND affects the number of Map tasks
HDFS is implemented in the _____________ programming language.
Java
Variety
Data comes in many different forms: pictures, videos, text
HDFS works in a __________ fashion.
master-worker
You can run Pig in batch mode using __________. 1. Pig scripts 2. Pig shell command 3. Pig options 4. All of the mentioned
Pig scripts
The __________ is responsible for allocating resources to the various running applications subject to familiar constraints of capacities, queues etc.
scheduler
In the MapReduce framework, Hadoop will do: 1. shuffle 2. sort 3. group 4. extract
shuffle and group
In the word count example, what is the key in both Map and Reduce?
the word itself
Veracity
Truth of the data; be aware of bias
In the Join operation example, the key is ...
word