Big Data Exam 1 (Q1-Q4)

What class performs the summary operations against intermediate data on the mapper node?

Combiner
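
As a rough illustration of what a combiner does, the sketch below sums word counts locally on one mapper's output before anything would be sent across the network. It is plain Python, not the Hadoop API; the function name `combine` and the sample pairs are assumptions made for this example.

```python
from collections import defaultdict

def combine(mapper_output):
    """Sum the counts for each key locally, on the mapper side."""
    totals = defaultdict(int)
    for key, count in mapper_output:
        totals[key] += count
    return sorted(totals.items())

# Intermediate (word, 1) pairs emitted by one hypothetical mapper.
pairs = [("hdfs", 1), ("yarn", 1), ("hdfs", 1), ("hdfs", 1)]
combined = combine(pairs)
print(combined)  # fewer pairs leave the mapper node than were emitted
```

Four intermediate pairs shrink to two, which is the point of running the summary step on the mapper node.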

In a Hadoop "stack", which node periodically replicates and stores data from the name node in case it fails?

Secondary name node

Data loading in HDFS

Sqoop

______________ refers to both how fast data is being produced and how fast the data must be processed (i.e., captured, stored, and analyzed) to meet the demand or need. RFID tags, automated sensors, GPS devices, and smart meters are driving an increasing need to deal with torrents of data in near real time.

Velocity

Cluster Management in HDFS

YARN

HDFS is

a distributed file system that runs on commodity hardware

A slave node in Hadoop is:

a node where data is stored and processed

Using data to understand customers/clients and business operations to sustain and foster growth and profitability is:

an increasingly challenging task for today's enterprises

Which Big Data approach promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources?

grid computing

Correct format for submitting MapReduce jobs

hadoop jar MRjar.jar MRDriver inputdir outputdir
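
The argument order can be sketched as a small Python helper that only builds the command line and does not run it. Note that `jar` is a subcommand of `hadoop`, so it takes no leading dash. The helper name `build_submit_command` is an assumption for this example; the jar, driver, and directory names are the placeholders from the card.

```python
import shlex

def build_submit_command(jar_path, driver_class, input_dir, output_dir):
    """Return the argument list: hadoop jar <jar> <driver class> <input> <output>."""
    return ["hadoop", "jar", jar_path, driver_class, input_dir, output_dir]

command = build_submit_command("MRjar.jar", "MRDriver", "inputdir", "outputdir")
print(shlex.join(command))
```

On a machine with Hadoop installed, the list could be handed to `subprocess.run(command, check=True)`.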

Command to list files in hdfs

hdfs dfs -ls

Command to display a list of blocks that make up each file:

hdfs fsck / -files -blocks

Config file for HDFS is stored in

hdfs-site.xml

The number of map tasks equals the number of input file splits, and may even be increased by the user by reducing the input split size. This leads to

improved resource utilization
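
A back-of-the-envelope sketch of the relationship: with one map task per input split, a smaller split size yields more map tasks for the same file. This is a simplified estimate for a single file, and the framework's real split calculation has extra rules that it ignores. The function name and the file size are assumptions for this example.

```python
MB = 1024 * 1024

def estimate_map_tasks(file_size_bytes, split_size_bytes):
    """Rough estimate: one map task per input split (ceiling division)."""
    return (file_size_bytes + split_size_bytes - 1) // split_size_bytes

file_size = 1024 * MB  # a hypothetical 1 GB input file
print(estimate_map_tasks(file_size, 128 * MB))  # 8
print(estimate_map_tasks(file_size, 64 * MB))   # 16
```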

What are the details about data stored in the name node?

location and size of blocks, size of files, permissions and ownership
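
One way to picture this metadata is as a small in-memory record per file. The sketch below is a toy model, not the name node's actual data structures; every class, field, and host name in it is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class BlockInfo:
    block_id: str
    size_bytes: int
    datanodes: list  # hosts that hold a replica of this block

@dataclass
class FileEntry:
    path: str
    owner: str
    permissions: str
    blocks: list = field(default_factory=list)

    @property
    def size_bytes(self):
        # File size is the sum of its block sizes.
        return sum(block.size_bytes for block in self.blocks)

def locate(entry):
    """Answer the question a client asks: where does each block live?"""
    return {block.block_id: block.datanodes for block in entry.blocks}

entry = FileEntry(
    path="/data/log.txt", owner="alice", permissions="rw-r--r--",
    blocks=[BlockInfo("blk_1", 128, ["dn1", "dn2", "dn3"]),
            BlockInfo("blk_2", 44, ["dn2", "dn3", "dn4"])],
)
print(entry.size_bytes, locate(entry))
```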

Clients communicate with HDFS through the

name node only

Functions of reducer in MapReduce

reduce, shuffle and sort
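
The three phases can be sketched in plain Python: bring together the pairs that share a key (shuffle), order them by key (sort), then collapse each group of values (reduce). This is not the Hadoop API; the function names and sample data are assumptions for this example.

```python
from itertools import groupby
from operator import itemgetter

def shuffle_and_sort(intermediate_pairs):
    """Sort pairs by key, then group the values that share a key."""
    ordered = sorted(intermediate_pairs, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]

def reduce_counts(key, values):
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)

# Pairs as they might arrive from several mappers.
intermediate = [("yarn", 1), ("hdfs", 2), ("yarn", 3), ("hdfs", 1)]
grouped = shuffle_and_sort(intermediate)
result = [reduce_counts(key, values) for key, values in grouped]
print(result)
```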

The analytics layer of the Big Data stack is experiencing what type of development currently?

significant development

In HDFS blocks are stored

spread over multiple data nodes

Traditional data warehouses have not been able to keep up with

the variety and complexity of data

Data flows can be highly inconsistent, with periodic peaks, making data loads hard to manage. What is this feature of Big Data called?

variability

A newly popular unit of data in the Big Data era is the petabyte (PB), which is:

10^15 bytes
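
A quick arithmetic check of the unit, with the binary pebibyte shown only for comparison (it is not part of the card's answer):

```python
PETABYTE = 10 ** 15   # decimal (SI) petabyte, the card's answer
TERABYTE = 10 ** 12
PEBIBYTE = 2 ** 50    # binary unit, for comparison only

print(PETABYTE // TERABYTE)  # 1000 terabytes in one petabyte
print(PEBIBYTE > PETABYTE)   # True: the binary unit is slightly larger
```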

Default block size for HDFS replication is

128 MB

In MapReduce how many main functions are involved?

2

Default config for mapper and reducer:

2 mappers and 1 reducer

Default replication factor in HDFS is

3
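
Combining the default replication factor of 3 with the 128 MB default block size from the earlier card gives a rough storage estimate. The sketch assumes a block occupies only as much space as the data it holds; the function name and the 300 MB file are assumptions for this example.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # default block size from the card above
REPLICATION = 3        # default replication factor

def storage_footprint(file_size_bytes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Estimate the block count, replica count, and raw bytes used for one file."""
    blocks = (file_size_bytes + block_size - 1) // block_size
    return {
        "blocks": blocks,
        "block_replicas": blocks * replication,
        "raw_bytes": file_size_bytes * replication,
    }

footprint = storage_footprint(300 * MB)
print(footprint["blocks"], footprint["block_replicas"], footprint["raw_bytes"] // MB)
```

A 300 MB file splits into 3 blocks (128 + 128 + 44 MB), stored as 9 block replicas that take about 900 MB of raw disk.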

Which of the following options are used to pass parameters to a MapReduce program during runtime?

A + B

Choose the non-relational database that is a part of the Hadoop ecosystem A. Hive B. HBase C. NoSQL D. ZooKeeper

B. HBase

Which statement is not true? A. Hadoop consists of multiple products B. HDFS is a file system, not an RDBMS C. HDFS and network file systems are related D. Hive resembles SQL but is not standard SQL

C

Which tool simplifies Java-based MapReduce processing?

Pig Latin

T/F: Block size and replication factor cannot be configured once HDFS is set up

False
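
Both settings live in hdfs-site.xml as the properties `dfs.blocksize` and `dfs.replication`. The sketch below reads them from a sample configuration fragment with Python's standard XML parser. The values in the fragment (256 MB blocks, replication factor 2) are made up for illustration, and the helper name `read_properties` is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml content that overrides both defaults.
HDFS_SITE_XML = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>268435456</value>
  </property>
</configuration>"""

def read_properties(xml_text):
    """Return a {name: value} dict for every <property> element."""
    root = ET.fromstring(xml_text)
    return {prop.findtext("name"): prop.findtext("value")
            for prop in root.findall("property")}

settings = read_properties(HDFS_SITE_XML)
print(settings["dfs.replication"], int(settings["dfs.blocksize"]) // (1024 * 1024))
```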

The requirement is to upload the data to HDFS as soon as it reaches the company. Which tool in the Hadoop ecosystem is meant to satisfy this requirement?

Flume

The final output of the reduce tasks is stored in:

HDFS (because of its smaller size)

How does Hadoop work?

It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on multiple computers.
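
The idea can be sketched on a single machine, with worker threads standing in for the separate computers: split the data into parts, process every part at the same time, then merge the partial results. The function names and sample numbers are assumptions for this example, not Hadoop code.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_parts(records, part_size):
    """Break the full data set into fixed-size parts."""
    return [records[i:i + part_size] for i in range(0, len(records), part_size)]

def process_part(part):
    """Work done independently on each part: here, a partial sum."""
    return sum(part)

def process_in_parallel(records, part_size):
    parts = split_into_parts(records, part_size)
    # Each thread stands in for one computer working on its own part.
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partial_results = list(pool.map(process_part, parts))
    return sum(partial_results)  # merge the partial results

readings = list(range(1, 11))  # hypothetical data: 1..10
total = process_in_parallel(readings, part_size=3)
print(total)
```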

Which InputFormat is best suited to running a second MapReduce job that takes the output key-value pairs from the first as its input?

KeyValueTextInputFormat
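
The parsing rule behind this answer can be imitated in a few lines of Python: each line is split into key and value at the first separator (a tab by default), and a line with no separator becomes a key with an empty value. This mimics the behavior only and is not the Hadoop class itself; the function name and sample lines are assumptions.

```python
def parse_key_value_line(line, separator="\t"):
    """Split at the first separator; with no separator, the whole line is the key."""
    key, _, value = line.partition(separator)
    return key, value

# Lines as a first job might have written them: key, tab, value.
first_job_output = ["hdfs\t3", "yarn\t4"]
records = [parse_key_value_line(line) for line in first_job_output]
print(records)
```

Because the keys come back intact, the second job's mapper does not need to re-parse each line itself.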

Which take key-value pairs as input and also produce them as output?

Mappers and reducers

___________ node in a Hadoop cluster provides client information on where in the cluster particular data is stored.

Name node

True statement

The number of input splits is equal to the number of map tasks

The term "Big Data" is relative, as it depends on the size of the organization using it.

True

