Big Data Test 1


A ________ serves as the master and there is only one NameNode per cluster.

NameNode

Which of the following stores metadata? 1. NameNode 2. DataNode 3. Secondary DataNode 4. All of the above

NameNode

What are the two major types of nodes in HDFS?

NameNode and DataNode

The ____________ is the ultimate authority that arbitrates resources among all the applications in the system.

ResourceManager

Volume

Data is huge and needs a clustered or distributed system

Velocity

Data is often streaming and constantly updated

6 Vs of Big Data

Volume, Velocity, Variety, Veracity, Value, Valence

What are the two basic layers comprising the Hadoop Architecture?

MapReduce and HDFS

What is the command to upload a file into HDFS?

hdfs dfs -put FILENAME HDFS_PATH

Pig operates in mainly how many modes?

How many blocks are stored in HDFS for a 10 GB file, given the default block size of 64 MB and a replication factor of 3?

480
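The arithmetic behind this answer can be sketched as below. This is only an illustration: the function name `hdfs_block_count` and its parameters are hypothetical, and the 64 MB block size is taken from the question as stated.

```python
def hdfs_block_count(file_size_mb: int, block_size_mb: int = 64, replication: int = 3) -> int:
    """Return the total number of block replicas stored for one file."""
    # A file is split into ceil(size / block_size) logical blocks;
    # integer arithmetic avoids floating-point rounding.
    logical_blocks = (file_size_mb + block_size_mb - 1) // block_size_mb
    # Every logical block is stored `replication` times across the cluster.
    return logical_blocks * replication


file_size_mb = 10 * 1024  # 10 GB expressed in MB
print(hdfs_block_count(file_size_mb))  # 160 logical blocks x 3 replicas = 480
```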

What is the organizing data structure for map/reduce programs?

A list of identification keys and some value associated with that identifier

The __________ is a framework-specific entity that negotiates resources from the ResourceManager

ApplicationMaster

In counting the total word count example, the algorithm is: 1. Cascading Map/Reduce and re-using wordcount result 2. No cascading allows in Map/Reduce 3. Cascading Map/Reduce and re-using word daily count result

Cascading Map/Reduce and re-using wordcount result
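A minimal in-memory sketch of the cascading idea, assuming the first job is an ordinary word count and the second job re-keys its output onto a single key. The helper `run_map_reduce` and the key name `"total"` are hypothetical stand-ins, not Hadoop APIs.

```python
from collections import defaultdict


def run_map_reduce(records, mapper, reducer):
    """Tiny in-memory stand-in for one Map/Reduce job (not Hadoop itself)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce each key group; keys are processed in sorted order.
    return [reducer(key, values) for key, values in sorted(groups.items())]


def wordcount_map(line):
    # Stage 1 mapper: emit <word, 1> for every word in the line.
    for word in line.split():
        yield word, 1


def sum_reduce(key, values):
    # Shared reducer: add up all values for a key.
    return key, sum(values)


def total_map(pair):
    # Stage 2 mapper: re-key each <word, count> onto one key.
    _word, count = pair
    yield "total", count


lines = ["a b a", "b c"]
word_counts = run_map_reduce(lines, wordcount_map, sum_reduce)
total = run_map_reduce(word_counts, total_map, sum_reduce)
print(word_counts)  # [('a', 2), ('b', 2), ('c', 1)]
print(total)        # [('total', 5)]
```

The second job never sees the raw text; it only consumes the word count result, which is what makes the jobs cascade.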

What are Hadoop advantages over a traditional distributed computing platform?

Cost, Flexibility, Scalability

Value

Data can add value to an organization

Point out the wrong statement : 1. DataNode is aware of the files to which the blocks stored on it belong to 2. Replication Factor can be configured at a cluster level (Default is set to 3) and also at a file level 3. Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode 4. User data is stored on the local file system of DataNodes

DataNode is aware of the files to which the blocks stored on it belong to

Point out the correct statement : 1. DataNode is the slave/worker node and holds the user data in the form of Data Blocks 2. Each incoming file is broken into 32 MB by default 3. Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance 4. none of the above

DataNode is the slave/worker node and holds the user data in the form of Data Blocks

In the map/reduce framework, which of these logistics does Map/Reduce do with the map function? 1. Gather data distributed across a cluster to the user's computer and run map 2. Distribute map to cluster nodes, run map on the data partitions at the same time 3. Distribute map to cluster nodes, run map at one node, wait for it to finish, then run map at the next node, etc.

Distribute map to cluster nodes, run map on the data partitions at the same time

Pig compiles the logical plan into a physical plan and executes it when you use the ______ command.

DUMP

HDFS provides a command line interface called __________ used to interact with HDFS.

FS Shell

When a client reads data from the HDFS filesystem in Hadoop, which of the following does it do? 1. Gets only the block locations from the NameNode 2. Gets the data from the NameNode 3. Gets both the data and block locations from the NameNode 4. Gets the block locations from the DataNode

Gets only the block locations from the NameNode (the client then reads the data directly from the DataNodes)

You can run Pig in interactive mode using the ______ shell. 1. Grunt 2. FS 3. HDFS 4. None of the mentioned

Grunt

Which of the following feature overcomes the limited single namenode namespace? HDFS Federation HDFS NameNode High Availability Speculative execution

HDFS Federation

Which of the following scenarios may not be a good fit for HDFS? 1. HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file 2. HDFS is suitable for storing data related to applications requiring low latency data access 3. HDFS is suitable for storing data related to applications requiring low latency data access 4. None of the above

HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file

What does HDFS stand for?

Hadoop Distributed File System

The NameNode knows that a DataNode is active using a mechanism known as ________.

Heartbeats

What is not part of the basic Hadoop Stack 'Zoo'? 1. Horse 2. Pig 3. Hive 4. Elephant

Horse

What is NOT considered to be part of the Apache Basic Hadoop Modules? 1. HDFS 2. Impala 3. MapReduce 4. YARN

Impala

When you design a map/reduce algorithm, you are in charge of designing: 1. Input Key/Value 2. Mapper function 3. Reducer function 4. the logistics including shuffles and groupings

Input Key/Value AND Mapper function AND Reducer function

When the user initiates a Map/Reduce task, the master node will create a ________.

JobTracker

___________ schedules, monitors, and re-executes failed tasks

JobTracker

Which of the following functions is used to read data in Pig? 1. WRITE 2. READ 3. LOAD 4. None of the mentioned

LOAD

Which of these kinds of data motivated the Map/Reduce framework? 1. Large number of customer internet transactions that are often retrieved by a billing id. 2. Large number of internet documents that need to be indexed for searching by words. 3. Large number of patient records that are updated immediately after each patient visit.

Large number of internet documents that need to be indexed for searching by words.

The basic MapReduce program includes

Map() and Reduce()

What is YARN used as an alternative to in Hadoop 2.0 and higher versions of Hadoop?

MapReduce

Which of the following statements is WRONG? 1. Pig development cycle is shorter than MapReduce 2. Pig is higher level programming than MapReduce 3. Pig is much slower than MapReduce 4. Performing a data set join is much easier in Pig than MapReduce

Pig is much slower than MapReduce

Streaming map/reduce allows mappers and reducers to be written in what languages: 1. R 2. Java 3. Unix shell commands 4. Ruby 5. Python

R, Unix shell commands, Ruby, Python
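As a rough illustration of why scripting languages work here: Hadoop Streaming passes records to the mapper and reducer as lines of text, conventionally `key<TAB>value`, over standard input and output. The sketch below keeps that text convention but takes plain Python iterables instead of `sys.stdin` so it can run on its own; the function names `mapper` and `reducer` and the sample input are assumptions for the example.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' text line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum counts per word; assumes lines arrive sorted so equal keys are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    sample = ["big data big", "data lake"]  # hypothetical input lines
    shuffled = sorted(mapper(sample))       # stands in for the framework's sort
    for out_line in reducer(shuffled):
        print(out_line)
```

In an actual streaming job each function would live in its own script that reads `sys.stdin` and prints to standard output, with the framework doing the sorting between them.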

Map/Reduce performs a 'shuffle' and grouping. That means it... 1. Shuffles <key,value> pairs into random bins and then within a bin it groups keys. 2. Shuffles <key,value> pairs into different partitions according to the key value, and then aggregates all pairs in 1 partition into 1 group. 3. Shuffles <key,value> pairs into different partitions according to the key value, and sorts within the partitions by key.

Shuffles <key,value> pairs into different partitions according to the key value, and sorts within the partitions by key.
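The partition-then-sort behaviour can be sketched in a few lines. This is a simplified single-process illustration, not Hadoop's actual partitioner: `partition_for`, `shuffle`, and the use of a CRC32 hash are assumptions made for the example.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition from the key alone, so equal keys always land together."""
    # CRC32 is used here only because it is deterministic across runs.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle(pairs, num_partitions):
    """Route <key, value> pairs to partitions by key, then sort each partition by key."""
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return {p: sorted(items, key=lambda kv: kv[0]) for p, items in partitions.items()}


map_output = [("pig", 1), ("hive", 1), ("pig", 1), ("yarn", 1), ("hive", 1)]
partitions = shuffle(map_output, num_partitions=2)
for partition_id, items in sorted(partitions.items()):
    print(partition_id, items)
```

Because the partition depends only on the key, every pair for a given key reaches the same reducer, and the sort places those pairs next to each other.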

Pig: Point out the correct statement 1. The Pig interpreter builds a logical plan for every relational operation. 2. Invoke the Grunt shell using the "enter" command 3. Pig does not support jar files 4. All of the mentioned

The Pig interpreter builds a logical plan for every relational operation.

You can run Map/Reduce in the following ways: 1. Web REST API 2. Command line 3. Streaming 4. API calling

Streaming AND API calling

Valence

Data is interconnected; valence describes how densely data items are connected to one another

Point out the wrong statement : 1. To run Pig in local mode, you need access to a single machine 2. The DISPLAY operator will display the results to your terminal screen 3. To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation 4. all of the mentioned

The DISPLAY operator will display the results to your terminal screen

Apache Hadoop YARN stands for : 1. Yet Another Resource Negotiator 2. Yet Another Resource Network 3. Yet Another Reserve Negotiator

Yet Another Resource Negotiator

Point out the correct statement: 1. You can run Pig in either mode using the "pig" command 2. You can run Pig in batch mode using the Grunt shell 3. You can run Pig in interactive mode using the FS shell 4. None of the above

You can run Pig in either mode using the "pig" command

The need for data replication can arise in various scenarios like: 1. Data Blocks get corrupted 2. DataNode goes down 3. DataNode goes down 4. All of the above

All of the above

Which of the following is true about metadata? 1. Metadata contains information like the number of blocks, their locations, and replicas 2. FsImage and EditLogs are metadata files 3. Metadata shows the structure of HDFS directories/files 4. All of the above

all of the above

Pig Latin statements are generally organized in one of the following ways : 1. A LOAD statement to read data from the file system 2. A series of "transformation" statements to process the data 3. A DUMP statement to view results or a STORE statement to save the results 4. all of the mentioned

all of the mentioned

_______ is the slave/worker node and holds the user data in the form of Data Blocks.

DataNode

In the word count by day example, what is the key?

date-word
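A small sketch of what a composite date-word key looks like in practice. The record layout (a date string paired with a line of text), the sample dates, and the function names are all hypothetical; a tuple is used for the key here, though a joined string such as `"2024-01-01-pig"` would serve the same purpose.

```python
from collections import Counter


def map_daily(record):
    """Emit <(date, word), 1> so counts are kept separately per day."""
    date, text = record
    for word in text.split():
        yield (date, word), 1


def reduce_counts(pairs):
    """Sum the values for each composite key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


records = [("2024-01-01", "pig hive pig"), ("2024-01-02", "pig")]
pairs = [kv for record in records for kv in map_daily(record)]
daily_counts = reduce_counts(pairs)
print(daily_counts[("2024-01-01", "pig")])  # 2
print(daily_counts[("2024-01-02", "pig")])  # 1
```

Keying on the word alone would merge both days into one count, which is why the date has to be part of the key.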

In the MapReduce framework, Hadoop does NOT handle: 1. producing the intermediate results 2. communicating intermediate results to the reducers 3. parallel execution 4. designing the key/value pair

designing the key/value pair

Which of the following is used to adjust the size of HDFS block?

dfs.blocksize AND dfs.block.size

Which utility is used for checking the health of a HDFS file system? fsck fiche fiche fcks

fsck

The number of blocks 1. is limited by the name node memory 2. is limited by the name node cpu 3. is limited by the bandwidth 4. affects the number of map tasks

is limited by the NameNode memory AND is limited by the Bandwidth AND affects the number of Map tasks

HDFS is implemented in _____________ programming language.

Java

Variety

many different forms, pictures, videos, text

HDFS works in a __________ fashion.

master-worker

You can run Pig in batch using __________ Pig scripts Pig shell command Pig options All of the mentioned

pig scripts

The __________ is responsible for allocating resources to the various running applications subject to familiar constraints of capacities, queues etc.
