Hadoop Midterm Practice Exam

Which daemon distributes individual tasks to machines?

JobTracker

How are keys and values presented and passed to the reducers during a standard sort and shuffle phase of MapReduce?

Keys are presented to a reducer in sorted order; values for a given key are not sorted
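The ordering rule in this answer can be illustrated with a small plain-Python simulation. This is only a sketch and does not use Hadoop: the function name `shuffle_and_sort` and the sample pairs are made up for illustration. Here the values keep their arrival order, whereas in a real job the order of values for a key is simply not guaranteed.

```python
from collections import defaultdict

def shuffle_and_sort(map_outputs):
    """Group (key, value) pairs by key and return the groups in sorted key order.

    Values inside a group are left in arrival order; they are NOT sorted.
    (Hypothetical helper for illustration, not a Hadoop API.)
    """
    groups = defaultdict(list)
    for key, value in map_outputs:
        groups[key].append(value)
    # Keys are sorted before being handed to the reducer.
    return [(key, groups[key]) for key in sorted(groups)]

# Made-up map output from several mappers.
pairs = [("pear", 3), ("apple", 9), ("pear", 1), ("apple", 2)]
print(shuffle_and_sort(pairs))
```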

What is map-side join?

A map-side join is performed in the map phase and in memory, typically by loading the smaller dataset into memory on each mapper
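As a rough sketch of the idea, the following plain-Python code joins a large stream of records against a small table held in memory, with no reduce step involved. The function name `map_side_join` and the sample data are hypothetical, and an inner join is assumed.

```python
def map_side_join(large_records, small_table):
    """Join each (key, value) record against an in-memory lookup table.

    Assumes small_table fits in memory; yields (key, value, matched_value).
    (Hypothetical helper for illustration, not a Hadoop API.)
    """
    lookup = dict(small_table)  # the small dataset is loaded into memory
    for key, value in large_records:
        if key in lookup:  # inner join: records without a match are dropped
            yield key, value, lookup[key]

# Made-up data: a small department table and a larger employee stream.
departments = [(10, "Sales"), (20, "Engineering")]
employees = [(20, "alice"), (10, "bob"), (30, "carol")]
print(list(map_side_join(employees, departments)))
```

Because the whole lookup table lives in memory on every mapper, a table that is too large is what leads to out-of-memory errors.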

Which daemon is responsible for instantiating and monitoring individual map and reduce tasks?

TaskTracker

What is the default input format?

TextInputFormat with byte offset as a key and entire line as a value
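To make the key and value concrete, here is a minimal plain-Python sketch that produces the same kind of records from a byte string. It does not use Hadoop; the function name `text_input_records` is an assumption for illustration, and UTF-8 text is assumed.

```python
def text_input_records(data: bytes):
    """Yield (byte_offset, line) pairs, one per line of the input.

    The key is the offset of the line's first byte; the value is the
    line without its terminator. (Hypothetical helper, not a Hadoop API.)
    """
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw_line)  # advance by the full line, terminator included

for key, value in text_input_records(b"hello\nworld\n"):
    print(key, value)
```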

What applies to distributed cache?

The transfer happens behind the scenes before any task is executed. The distributed cache is read-only. Files in the distributed cache are automatically deleted from slave nodes when the job finishes.

Hadoop starts transferring the data as soon as a Mapper finishes its task; it does not wait until the last Map task has finished (T/F)?

True

The intermediate data is held on the data node's local disk (T/F)?

True

What is the default size of a block in HDFS?

64 MB or 128 MB, depending on the Hadoop version

How can you disable the reduce step?

A developer can set the number of reducers to zero, which completely disables the reduce step

What is Writable?

A Java interface that key and value types must implement for MapReduce processing

Which statements are correct for the pseudo-distributed mode of Hadoop?

It is a single-machine cluster. All daemons run on the same machine.

A MapReduce job processes millions of input records and generates the same number of key-value pairs (in the millions). The data is not uniformly distributed, so the job creates a significant amount of intermediate data that must be transferred between mappers and reducers, which is a potential bottleneck. A custom implementation of which interface is most likely to reduce the amount of intermediate data transferred across the network?

Combiner

What is data localization?

Hadoop will start the Map task on the node where the data block is stored in HDFS

If a Mapper runs slow relative to others, then:

No reducer can start until the last Mapper has finished. If a mapper is running slowly, Hadoop will start another instance of that mapper on another machine. Hadoop will kill the slow mapper if it is still running when the new one finishes. The result of whichever mapper finishes first will be used.

What are the features of the Hadoop framework?

Nodes talk to each other as little as possible. Computation happens where the data is stored. Data is replicated multiple times on the system.

What are the common problems with map-side joins?

Out of memory exceptions on slave nodes

Suppose that your job's input is a (huge) set of word tokens and their numbers of occurrences (word counts), and that you want to sort them by number of occurrences. Which of the following classes will help you get a globally sorted file?

Partitioner
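One way to see why a partitioner gives a global sort: if each reducer receives a contiguous range of counts, then the sorted reducer outputs, read in reducer order, form one sorted sequence. The plain-Python sketch below simulates this with hand-picked split points. The names `range_partition` and `globally_sorted` and the boundary values are assumptions for illustration; a real job would normally choose split points by sampling the input.

```python
from bisect import bisect_right

def range_partition(count, boundaries):
    """Return the reducer index for a count, given ascending split points."""
    return bisect_right(boundaries, count)

def globally_sorted(records, boundaries):
    """Route (word, count) records to reducers by count range, then sort each.

    Returns one sorted list of (count, word) per reducer.
    (Hypothetical helpers for illustration, not a Hadoop API.)
    """
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for word, count in records:
        partitions[range_partition(count, boundaries)].append((count, word))
    # Each reducer sorts only its own partition.
    return [sorted(partition) for partition in partitions]

# Made-up word counts and split points for three reducers.
records = [("a", 5), ("b", 50), ("c", 500), ("d", 7)]
outputs = globally_sorted(records, boundaries=[10, 100])
print(outputs)
# Concatenating the reducer outputs in order yields a globally sorted list.
print([pair for part in outputs for pair in part])
```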

What is the function of combiner?

It runs locally on a single mapper's output. Using a combiner can reduce network traffic. Generally, the combiner and reducer code is the same.
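The effect on network traffic can be shown with a small word-count simulation in plain Python. This is a sketch only: `map_words` and `combine` are hypothetical helpers, not Hadoop classes, and the input line is made up.

```python
from collections import Counter

def map_words(line):
    """Emit one (word, 1) pair per word, as a word-count mapper would."""
    return [(word, 1) for word in line.split()]

def combine(pairs):
    """Sum the counts per key on a single mapper's output.

    This is the same summing logic a word-count reducer would apply.
    (Hypothetical helper for illustration, not a Hadoop API.)
    """
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return sorted(totals.items())

mapper_output = map_words("to be or not to be")
combined = combine(mapper_output)
# Fewer records leave the node after combining.
print(len(mapper_output), len(combined))
print(combined)
```

Summing works here because addition is associative and commutative; an operation such as a plain average could not reuse the reducer code unchanged.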

Which daemon is responsible for the housekeeping of the name node?

Secondary name node

What is the role of the namenode?

Splits big files into smaller blocks and sends them to different data nodes. Manages the HDFS file system and supplies the addresses of the data on the different data nodes.
