Hadoop Quiz

Point out the correct statement: a) MapReduce tries to place the data and the compute as close as possible b) Map Task in MapReduce is performed using the Mapper() function c) Reduce Task in MapReduce is performed using the Map() function d) All of the mentioned

Answer: a) MapReduce tries to place the data and the compute as close as possible. Explanation: This feature of MapReduce is "data locality".

________ part of the MapReduce is responsible for processing one or more chunks of data and producing the output results. a) Maptask b) Mapper c) Task execution d) All of the mentioned

Answer: a) Maptask. Explanation: Map Task in MapReduce is performed using the Map() function.

The number of maps is usually driven by the total size of: a) inputs b) outputs c) tasks d) None of the mentioned

Answer: a) inputs. Explanation: The total size of the inputs means the total number of blocks of the input files.

________ is the primary interface for a user to describe a MapReduce job to the Hadoop framework for execution. a) Map Parameters b) JobConf c) MemoryConf d) None of the mentioned

Answer: b) JobConf. Explanation: JobConf represents a MapReduce job configuration.
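As a minimal sketch of describing a job through JobConf (the classic org.apache.hadoop.mapred API), the job below wires in the TokenMapper and SumReducer classes sketched under later questions; all class names and paths are illustrative, not prescribed by the quiz:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Describes a word-count job to the Hadoop framework via JobConf.
public class WordCount {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);          // reduce output key type
        conf.setOutputValueClass(IntWritable.class); // reduce output value type
        conf.setMapperClass(TokenMapper.class);      // sketched below
        conf.setReducerClass(SumReducer.class);      // sketched below
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf); // submit the job and wait for completion
    }
}
```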

________ is the default Partitioner for partitioning key space. a) HashPar b) Partitioner c) HashPartitioner d) None of the mentioned

Answer: c) HashPartitioner. Explanation: The default partitioner in Hadoop is the HashPartitioner, which uses its getPartition method to assign each key to a partition.
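The stock HashPartitioner reduces to a hash-and-modulo; this sketch mirrors the logic of the library class rather than quoting its source:

```java
// Mirrors the logic of Hadoop's default HashPartitioner: hash the key,
// then mask off the sign bit so the partition index is non-negative.
public class HashPartitionerSketch<K, V> {
    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```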

According to analysts, for what can traditional IT systems provide a foundation when they're integrated with big data technologies like Hadoop? a) Big data management and data mining b) Data warehousing and business intelligence c) Management of Hadoop clusters d) Collecting and storing unstructured data

Answer: b) Data warehousing and business intelligence. Explanation: Data warehousing integrated with Hadoop gives a better understanding of the data.

Facebook Tackles Big Data With _______ based on Hadoop. a) 'Project Prism' b) 'Prism' c) 'Project Big' d) 'Project Data'

Answer: a) 'Project Prism'. Explanation: Prism automatically replicates and moves data wherever it's needed across a vast network of computing facilities.

Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in: a) Java b) C c) C# d) None of the mentioned

Answer: a) Java. Explanation: Utilities such as Hadoop Streaming allow MapReduce applications to be written in other languages.

Point out the correct statement: a) Applications can use the Reporter to report progress b) The Hadoop MapReduce framework spawns one map task for each InputSplit generated by the InputFormat for the job c) The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format d) All of the mentioned

Answer: d) All of the mentioned.

IBM and ________ have announced a major initiative to use Hadoop to support university courses in distributed computer programming. a) Google Latitude b) Android (operating system) c) Google Variations d) Google

Answer: d) Google. Explanation: In 2007, Google and IBM announced a university initiative to address Internet-scale computing challenges.

As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including: a) Improved data storage and information retrieval b) Improved extract, transform and load features for data integration c) Improved data warehousing functionality d) Improved security, workload management and SQL support

Answer: d) Improved security, workload management and SQL support. Explanation: Adding security to Hadoop is challenging because not all of its interactions follow the classic client-server pattern.

________ has the world's largest Hadoop cluster. a) Apple b) Datamatics c) Facebook d) None of the mentioned

Answer: c) Facebook. Explanation: Facebook has many Hadoop clusters; the largest among them is the one used for data warehousing.

What was Hadoop named after? a) Creator Doug Cutting's favorite circus act b) Cutting's high school rock band c) The toy elephant of Cutting's son d) A sound Cutting's laptop made during Hadoop's development

Answer: c) The toy elephant of Cutting's son.

Sun also has the Hadoop Live CD ________ project, which allows running a fully functional Hadoop cluster using a live CD. a) OpenOffice.org b) OpenSolaris c) GNU d) Linux

Answer: b) OpenSolaris. Explanation: The OpenSolaris Hadoop Live CD project built a bootable CD-ROM image.

All of the following accurately describe Hadoop, EXCEPT: a) Open source b) Real-time c) Java-based d) Distributed computing approach

Answer: b) Real-time. Explanation: Apache Hadoop is an open-source software framework for distributed storage and distributed processing of big data on clusters of commodity hardware.

Which of the following phases occur simultaneously? a) Shuffle and Sort b) Reduce and Sort c) Shuffle and Map d) All of the mentioned

Answer: a) Shuffle and Sort. Explanation: The shuffle and sort phases occur simultaneously: while map outputs are being fetched, they are merged.

________ maps input key/value pairs to a set of intermediate key/value pairs. a) Mapper b) Reducer c) Both Mapper and Reducer d) None of the mentioned

Answer: a) Mapper. Explanation: Maps are the individual tasks that transform input records into intermediate records.
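A minimal word-count Mapper against the classic mapred API shows the input-to-intermediate transformation; TokenMapper is a hypothetical name (it is the mapper wired into the JobConf sketch above):

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Transforms each input line into intermediate (word, 1) pairs.
public class TokenMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            output.collect(word, ONE); // emit an intermediate key/value pair
        }
    }
}
```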

________ is a utility which allows users to create and run jobs with any executables as the mapper and/or the reducer. a) Hadoop Strdata b) Hadoop Streaming c) Hadoop Stream d) None of the mentioned

Answer: b) Hadoop Streaming. Explanation: Hadoop Streaming is one of the most important utilities in the Apache Hadoop distribution.
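A classic streaming invocation along the lines of the Hadoop documentation, with ordinary executables as mapper and reducer; the jar and directory paths are illustrative and vary by installation:

```
hadoop jar hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /usr/bin/wc
```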

The output of the _______ is not sorted in the MapReduce framework for Hadoop. a) Mapper b) Cascader c) Scalding d) None of the mentioned

Answer: d) None of the mentioned. Explanation: The output of the reduce task is typically written to the FileSystem; the output of the Reducer is not sorted.

What license is Hadoop distributed under? a) Apache License 2.0 b) Mozilla Public License c) Shareware d) Commercial

Answer: a) Apache License 2.0. Explanation: Hadoop is open source, released under the Apache License 2.0.

The Hadoop list includes the HBase database, the Apache Mahout ________ system, and matrix operations. a) Machine learning b) Pattern recognition c) Statistical classification d) Artificial intelligence

Answer: a) Machine learning. Explanation: The Apache Mahout project's goal is to build a scalable machine learning tool.

Input to the _______ is the sorted output of the mappers. a) Reducer b) Mapper c) Shuffle d) All of the mentioned

Answer: a) Reducer.

Point out the correct statement: a) Hadoop does need specialized hardware to process the data b) Hadoop 2.0 allows live stream processing of real-time data c) In the Hadoop programming framework output files are divided into lines or records d) None of the mentioned

Answer: b) Hadoop 2.0 allows live stream processing of real-time data. Explanation: Hadoop batch-processes data distributed over a number of computers ranging in the hundreds and thousands.

Which of the following platforms does Hadoop run on? a) Bare metal b) Debian c) Cross-platform d) Unix-like

Answer: c) Cross-platform. Explanation: Hadoop has support for cross-platform operating systems.

Mapper and Reducer implementations can use the ________ to report progress or just indicate that they are alive. a) Partitioner b) OutputCollector c) Reporter d) All of the mentioned

Answer: c) Reporter. Explanation: Reporter is a facility for MapReduce applications to report progress, set application-level status messages, and update Counters.
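As a sketch, the SumReducer referenced in the JobConf example above uses the Reporter to stay live during a long merge and to bump a counter; the class and counter names are hypothetical:

```java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Sums the values for each key and reports progress along the way.
public class SumReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
            reporter.progress(); // just indicate that the task is alive
        }
        reporter.incrCounter("QuizApp", "KeysReduced", 1); // custom counter
        output.collect(key, new IntWritable(sum));
    }
}
```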

A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the JobTracker. a) MapReduce b) Mapper c) TaskTracker d) JobTracker

Answer: c) TaskTracker. Explanation: A TaskTracker receives the information necessary for execution of a Task from the JobTracker, executes the Task, and sends the results back to the JobTracker.

Which of the following genres does Hadoop produce? a) Distributed file system b) JAX-RS c) Java Message Service d) Relational Database Management System

Answer: a) Distributed file system. Explanation: The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications.

Above the file systems comes the ________ engine, which consists of one Job Tracker, to which client applications submit MapReduce jobs. a) MapReduce b) Google c) Functional programming d) Facebook

Answer: a) MapReduce. Explanation: The MapReduce engine is used to distribute work around the cluster.

________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data. a) MapReduce b) Mahout c) Oozie d) All of the mentioned

Answer: a) MapReduce. Explanation: MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm.

Hadoop is a framework that works with a variety of related tools. Common cohorts include: a) MapReduce, Hive and HBase b) MapReduce, MySQL and Google Apps c) MapReduce, Hummer and Iguana d) MapReduce, Heron and Trumpet

Answer: a) MapReduce, Hive and HBase. Explanation: To use Hive with HBase you'll typically want to launch two clusters, one to run HBase and the other to run Hive.

Point out the wrong statement: a) Hadoop's processing capabilities are huge and its real advantage lies in the ability to process terabytes and petabytes of data b) Hadoop uses a programming model called "MapReduce"; all programs should conform to this model in order to work on the Hadoop platform c) The programming model, MapReduce, used by Hadoop is difficult to write and test d) All of the mentioned

Answer: c) The programming model, MapReduce, used by Hadoop is difficult to write and test. Explanation: The programming model, MapReduce, used by Hadoop is simple to write and test.

The right number of reduces seems to be: a) 0.90 b) 0.80 c) 0.36 d) 0.95

Answer: d) 0.95. Explanation: The Hadoop documentation suggests setting the number of reduces to 0.95 or 1.75 multiplied by the number of nodes times the maximum number of reduce slots per node.
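A sketch of applying that rule of thumb when configuring a job; the node and slot counts are hypothetical:

```java
import org.apache.hadoop.mapred.JobConf;

// Applies the 0.95 rule of thumb for the number of reduce tasks.
public class ReduceCount {
    public static void main(String[] args) {
        JobConf conf = new JobConf(ReduceCount.class);
        int nodes = 10;                // hypothetical cluster size
        int maxReduceSlotsPerNode = 2; // hypothetical per-node maximum
        conf.setNumReduceTasks((int) (0.95 * nodes * maxReduceSlotsPerNode));
    }
}
```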

________ function is responsible for consolidating the results produced by each of the Map() functions/tasks. a) Reduce b) Map c) Reducer d) All of the mentioned

Answer: a) Reduce. Explanation: The Reduce function collates the work and resolves the results.

________ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer. a) Partitioner b) OutputCollector c) Reporter d) All of the mentioned

Answer: b) OutputCollector. Explanation: Hadoop MapReduce comes bundled with a library of generally useful mappers, reducers, and partitioners.
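The generalization is small: OutputCollector's entire contract is one collect method. The sketch below restates the shape of the org.apache.hadoop.mapred interface:

```java
import java.io.IOException;

// Sketch of the OutputCollector contract: both map and reduce emit
// their (key, value) output through this single method.
public interface OutputCollector<K, V> {
    void collect(K key, V value) throws IOException;
}
```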

Point out the wrong statement: a) A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner b) The MapReduce framework operates exclusively on <key, value> pairs c) Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce methods d) None of the mentioned

Answer: d) None of the mentioned. Explanation: The MapReduce framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks.

Mapper implementations are passed the JobConf for the job via the ________ method. a) JobConfigure.configure b) JobConfigurable.configure c) JobConfigurable.configureable d) None of the mentioned

Answer: b) JobConfigurable.configure.
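Since MapReduceBase implements JobConfigurable, a Mapper or Reducer typically receives its JobConf by overriding configure(); the class and property names below are hypothetical:

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

// Receives the job's JobConf through JobConfigurable.configure.
public class ConfigurableBase extends MapReduceBase {
    private boolean caseSensitive;

    @Override
    public void configure(JobConf job) {
        // "quiz.case.sensitive" is a made-up property for illustration.
        caseSensitive = job.getBoolean("quiz.case.sensitive", true);
    }
}
```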

Point out the correct statement: a) Hadoop is an ideal environment for extracting and transforming small volumes of data b) Hadoop stores data in HDFS and supports data compression/decompression c) The Giraph framework is less useful than a MapReduce job to solve graph and machine learning problems d) None of the mentioned

Answer: b) Hadoop stores data in HDFS and supports data compression/decompression. Explanation: Data compression can be achieved using compression algorithms like bzip2, gzip, LZO, etc. Different algorithms can be used in different scenarios based on their capabilities.
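A sketch of enabling compressed job output with the classic API, using the gzip codec mentioned in the explanation; the class name is a placeholder:

```java
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

// Enables gzip compression for the job's final output files.
public class CompressedOutput {
    public static void main(String[] args) {
        JobConf conf = new JobConf(CompressedOutput.class);
        FileOutputFormat.setCompressOutput(conf, true);
        FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);
    }
}
```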

Hadoop achieves reliability by replicating the data across multiple hosts, and hence does not require ________ storage on hosts. a) RAID b) Standard RAID levels c) ZFS d) Operating system

Answer: a) RAID. Explanation: With the default replication value, 3, data is stored on three nodes: two on the same rack, and one on a different rack.
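Replication can be read or set per file through the FileSystem API (cluster-wide it is the dfs.replication setting); the path below is a hypothetical example:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Reads and sets the replication factor for a single HDFS file.
public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/data/example.txt"); // hypothetical path
        fs.setReplication(p, (short) 3);        // three copies across hosts
        System.out.println(fs.getFileStatus(p).getReplication());
    }
}
```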

Point out the wrong statement: a) Reducer has 2 primary phases b) Increasing the number of reduces increases the framework overhead, but increases load balancing and lowers the cost of failures c) It is legal to set the number of reduce-tasks to zero if no reduction is desired d) The framework groups Reducer inputs by keys (since different mappers may have output the same key) in the sort stage

Answer: a) Reducer has 2 primary phases. Explanation: Reducer has 3 primary phases: shuffle, sort, and reduce.

What was Hadoop written in? a) Java (software platform) b) Perl c) Java (programming language) d) Lua (programming language)

Answer: c) Java (programming language). Explanation: The Hadoop framework itself is mostly written in the Java programming language, with some native code in C and command-line utilities written as shell scripts.

