HDFS
6. Which of the following scenarios may not be a good fit for HDFS? a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file b) HDFS is suitable for storing data related to applications requiring low latency data access c) None of the mentioned
Answer: a Explanation: HDFS follows a write-once, read-many model and does not support multiple or simultaneous writers to the same file. It is, however, well suited to storing archive data, since it runs on low-cost commodity hardware while ensuring a high degree of fault-tolerance.
4. The JobTracker pushes work out to available _______ nodes in the cluster, striving to keep the work as close to the data as possible. a) DataNodes b) TaskTracker c) ActionNodes d) All of the mentioned
Answer: b Explanation: Each TaskTracker sends a heartbeat to the JobTracker every few seconds so the JobTracker knows whether the node is alive.
1. ________ is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks. a) Hive b) MapReduce c) Pig d) Lucene
Answer: b Explanation: MapReduce is the heart of Hadoop: it processes large volumes of data in parallel by dividing the work into a set of independent map and reduce tasks.
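For illustration, here is a minimal word-count mapper in the classic org.apache.hadoop.mapred API. It is a sketch only; the class name and the tokenization choice are assumptions, not part of the question above.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Minimal sketch: the map side of word count. Each input line is split into
// tokens and every token is emitted as an independent (word, 1) pair, which is
// what lets the framework divide the work into independent tasks.
public class WordCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    StringTokenizer tokens = new StringTokenizer(value.toString());
    while (tokens.hasMoreTokens()) {
      word.set(tokens.nextToken());
      output.collect(word, ONE); // emit (word, 1) for each token
    }
  }
}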
6. Interface ____________ reduces a set of intermediate values which share a key to a smaller set of values. a) Mapper b) Reducer c) Writable d) Readable
Answer: b Explanation: Reducer implementations can access the JobConf for the job.
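A matching word-count reducer, again a sketch in the old mapred API: it receives all intermediate values that share a key and reduces them to a smaller set (here, one total per word).

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Minimal sketch: reduces the set of intermediate counts sharing a key to a
// single total per word, writing the result via the OutputCollector.
public class WordCountReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterator<IntWritable> values,
                     OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get(); // combine all values that share this key
    }
    output.collect(key, new IntWritable(sum));
  }
}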
9. HDFS provides a command line interface called __________ used to interact with HDFS. a) "HDFS Shell" b) "FS Shell" c) "DFS Shell" d) None of the mentioned
Answer: b Explanation: The File System (FS) shell includes various shell-like commands (for example, hadoop fs -ls / and hadoop fs -cat /file) that directly interact with the Hadoop Distributed File System (HDFS).
5. Point out the wrong statement: a) The framework calls the reduce method for each <key, (list of values)> pair in the grouped inputs b) The output of the Reducer is re-sorted c) The reduce method reduces values for a given key d) None of the mentioned
Answer: b Explanation: The output of the Reducer is not re-sorted.
6. InputFormat class calls the ________ function and computes splits for each file and then sends them to the jobtracker. a) puts b) gets c) getSplits d) all of the mentioned
Answer: c Explanation: getSplits() computes the splits and records their storage locations, which the JobTracker then uses to schedule map tasks on the tasktrackers as close to the data as possible.
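As a sketch of that call in the old mapred API, a client can ask an InputFormat for its splits directly; the class name SplitLister and the input path /input are assumptions for illustration.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

// Minimal sketch: obtaining splits the way the job client does before the
// split list is handed to the JobTracker for scheduling.
public class SplitLister {
  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(SplitLister.class);
    FileInputFormat.setInputPaths(job, new Path("/input")); // assumed input path
    TextInputFormat format = new TextInputFormat();
    format.configure(job);
    for (InputSplit split : format.getSplits(job, 1)) {
      // each split reports its length and the hosts holding its data
      System.out.println(split.getLength() + " bytes on "
          + String.join(",", split.getLocations()));
    }
  }
}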
4. ________ NameNode is used when the Primary NameNode goes down. a) Rack b) Data c) Secondary d) None of the mentioned
Answer: c Explanation: Despite its name, the Secondary NameNode is not a hot standby: it periodically checkpoints the namespace by merging the edit log into the fsimage, and that checkpoint can be used to recover when the primary NameNode goes down.
8. The output of the reduce task is typically written to the FileSystem via: a) OutputCollector b) InputCollector c) OutputCollect d) All of the mentioned
Answer: a Explanation: In the reduce phase, the reduce(Object, Iterator, OutputCollector, Reporter) method is called for each <key, (list of values)> pair in the grouped inputs.
7. A Reducer takes as input the grouped output of a: a) Mapper b) Reducer c) Writable d) Readable
Answer: a Explanation: In the shuffle phase, the framework fetches, for each Reducer, the relevant partition of the output of all the Mappers via HTTP.
3. HDFS works in a __________ fashion. a) master-worker b) master-slave c) worker/slave d) all of the mentioned
Answer: a Explanation: The NameNode serves as the master and each DataNode serves as a worker/slave.
2. Point out the correct statement: a) DataNode is the slave/worker node and holds the user data in the form of Data Blocks b) Each incoming file is broken into 32 MB by default c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance d) None of the mentioned
Answer: a Explanation: There can be any number of DataNodes in a Hadoop Cluster.
4. _____________ is used to read data from byte buffers. a) write() b) read() c) readwrite() d) all of the mentioned
Answer: b Explanation: read() is used to read data from byte buffers; readFully() can also be used instead of read().
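A small sketch of the difference on an HDFS input stream (the path /data/sample.txt is an assumption): read() may return fewer bytes than requested, while readFully() either fills the buffer or throws an EOFException.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch: positioned read() vs readFully() on an HDFS input stream.
public class ReadDemo {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    byte[] buf = new byte[64];
    try (FSDataInputStream in = fs.open(new Path("/data/sample.txt"))) {
      int n = in.read(0L, buf, 0, buf.length); // may read fewer than 64 bytes
      in.readFully(0L, buf);                   // fills buf or throws EOFException
      System.out.println("read() returned " + n + " bytes");
    }
  }
}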
2. Point out the correct statement: a) The framework groups Reducer inputs by keys b) The shuffle and sort phases occur simultaneously, i.e., while outputs are being fetched they are merged c) Since JobConf.setOutputKeyComparatorClass(Class) can be used to control how intermediate keys are sorted, these can be used in conjunction to simulate secondary sort on values d) All of the mentioned
Answer: d Explanation: If equivalence rules for grouping the intermediate keys are required to be different from those for grouping keys before reduction, then one may specify a Comparator via JobConf.setOutputValueGroupingComparator(Class).
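A hedged sketch of such a grouping comparator: keys are assumed to be Text values of the form natural#secondary (the format and class name are illustrative, not from the question), and grouping compares only the natural part while the output key comparator sorts on the full composite key.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Hypothetical grouping comparator for a secondary sort: reduce input is
// grouped by the natural key (the part before '#'), so all values for one
// natural key arrive in a single reduce call, ordered by the full key.
public class NaturalKeyGroupingComparator extends WritableComparator {
  protected NaturalKeyGroupingComparator() {
    super(Text.class, true); // create Text instances for comparison
  }

  @Override
  @SuppressWarnings("rawtypes")
  public int compare(WritableComparable a, WritableComparable b) {
    String ka = a.toString().split("#")[0];
    String kb = b.toString().split("#")[0];
    return ka.compareTo(kb);
  }
}

It would be registered via job.setOutputValueGroupingComparator(NaturalKeyGroupingComparator.class), while JobConf.setOutputKeyComparatorClass(Class) continues to control the sort on the full composite key.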
10. Which of the following parameters is used to collect keys and combined values? a) key b) values c) reporter d) output
Answer: d Explanation: The output parameter collects keys and combined values, while the reporter parameter provides a facility to report progress.
5. Point out the wrong statement: a) Replication Factor can be configured at a cluster level (Default is set to 3) and also at a file level b) Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode c) User data is stored on the local file system of DataNodes d) DataNode is aware of the files to which the blocks stored on it belong
Answer: d Explanation: It is the NameNode, not the DataNode, that is aware of the files to which the blocks stored on a DataNode belong.
5. Point out the wrong statement: a) The map function in Hadoop MapReduce has the following general form: map: (K1, V1) → list(K2, V2) b) The reduce function in Hadoop MapReduce has the following general form: reduce: (K2, list(V2)) → list(K3, V3) c) MapReduce has a complex model of data processing: inputs and outputs for the map and reduce functions are key-value pairs d) None of the mentioned
Answer: c Explanation: MapReduce has a relatively simple model of data processing; inputs and outputs for the map and reduce functions are key-value pairs.
7. The need for data replication can arise in various scenarios like: a) Replication Factor is changed b) DataNode goes down c) Data Blocks get corrupted d) All of the mentioned
Answer: d Explanation: Data is replicated across different DataNodes to ensure a high degree of fault-tolerance.
8. ________ is the slave/worker node and holds the user data in the form of Data Blocks. a) DataNode b) NameNode c) Data block d) Replication
Answer: a Explanation: A DataNode stores data in the Hadoop File System. A functional filesystem has more than one DataNode, with data replicated across them.
2. Point out the correct statement: a) Data locality means movement of algorithm to the data instead of data to algorithm b) When the processing is done on the data, the algorithm is moved across the Action Nodes rather than data to the algorithm c) Moving Computation is more expensive than Moving Data d) None of the mentioned
Answer: a Explanation: Hadoop exploits data locality: the computation is moved to the node where the data resides, since moving computation is cheaper than moving large volumes of data.
1. In order to read any file in HDFS, an instance of __________ is required. a) filesystem b) datastream c) outstream d) inputstream
Answer: a Explanation: An instance of FileSystem is required; its open() method returns an FSDataInputStream, which is used to read data from the file.
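A minimal sketch of that read path (the class name HdfsCat is an assumption; the file path comes from the command line): obtain a FileSystem instance, open the file, and copy its bytes to stdout.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Minimal sketch: reading an HDFS file requires a FileSystem instance;
// FileSystem.open() returns the FSDataInputStream used to read the bytes.
public class HdfsCat {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataInputStream in = null;
    try {
      in = fs.open(new Path(args[0]));                // open the requested file
      IOUtils.copyBytes(in, System.out, 4096, false); // stream it to stdout
    } finally {
      IOUtils.closeStream(in);
    }
  }
}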
1. A ________ serves as the master and there is only one NameNode per cluster. a) Data Node b) NameNode c) Data block d) Replication
Answer: b Explanation: All the metadata related to HDFS including the information about data nodes, files stored on HDFS, and Replication, etc. are stored and maintained on the NameNode.
10. HDFS is implemented in _____________ programming language. a) C++ b) Java c) Scala d) None of the mentioned
Answer: b Explanation: HDFS is implemented in Java and any computer which can run Java can host a NameNode/DataNode on it.
9. Applications can use the _________ provided to report progress or just indicate that they are alive. a) Collector b) Reporter c) Dashboard d) None of the mentioned
Answer: b Explanation: In scenarios where the application takes a significant amount of time to process individual key/value pairs, this is crucial, since otherwise the framework might assume that the task has timed out and kill it.
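A sketch of that pattern in the old mapred API (the class name and the per-record work are assumptions): a mapper that is slow per record calls Reporter.progress() so the framework knows the task is still alive.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Minimal sketch: reporting progress from a long-running map() call so the
// framework does not assume the task has timed out and kill it.
public class SlowMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, NullWritable> {
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, NullWritable> output, Reporter reporter)
      throws IOException {
    // ... imagine expensive per-record processing of 'value' here ...
    reporter.progress(); // tells the framework this task is still alive
    output.collect(value, NullWritable.get());
  }
}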
3. The daemons associated with the MapReduce phase are ________ and task-trackers. a) job-tracker b) map-tracker c) reduce-tracker d) all of the mentioned
Answer: a Explanation: MapReduce jobs are submitted to the job-tracker.