ITSS 4354 Exam 1

What class would be used to perform the summary operations against intermediate data on the Mapper node?

Combiner
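
As a rough illustration of what a combiner does, the sketch below simulates local aggregation of one mapper's intermediate (word, 1) pairs before they would be sent on to the reducers. This is a plain-Python simulation, not Hadoop API code; the function names `map_words` and `combine` and the sample line are hypothetical.

```python
from collections import defaultdict

def map_words(line):
    """Emit an intermediate (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]

def combine(pairs):
    """Sum counts per key locally, as a combiner would on the mapper node."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

intermediate = map_words("big data big cluster big")
combined = combine(intermediate)
print(len(intermediate), "pairs before combining")
print(len(combined), "pairs after combining:", combined)
```

Fewer pairs leave the mapper node after combining, which is the point of running the summary step there.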

In a Hadoop "stack," which node periodically replicates and stores data from the Name Node in case it fails?

Secondary Name Node

______ refers to both how fast data is being produced and how fast the data must be processed (i.e., captured, stored, and analyzed) to meet the need or demand. RFID tags, automated sensors, GPS devices, and smart meters are driving an increasing need to deal with torrents of data in near-real time.

Velocity

Choose the Cluster Management tool in Hadoop

YARN

In a Hadoop "stack," what is a slave node?

a node where data is stored and processed

What details about the data are stored in the Name Node?

All of the above

Using data to understand customers/clients and business operations to sustain and foster growth and profitability is:

an increasingly challenging task for today's enterprises

In HDFS, how are blocks stored?

Blocks are spread over multiple data nodes

Which among the following take (key-value) pairs as input and produce output as (key-value)?

both mappers and reducers
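
To make the key-value contract concrete, here is a minimal plain-Python sketch in which both the mapper and the reducer accept a (key, value) input and emit (key, value) output. It is a simulation under stated assumptions, not Hadoop API code: the `mapper` and `reducer` functions, the "year,temperature" record layout, and the sample records are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def mapper(key, value):
    """Input: (line offset, line text). Output: (year, temperature) pairs."""
    year, temp = value.split(",")
    yield year, int(temp)

def reducer(key, values):
    """Input: (year, iterable of temperatures). Output: (year, max temperature)."""
    yield key, max(values)

# Hypothetical input records: (byte offset of the line, line text).
records = [(0, "1949,111"), (9, "1950,22"), (17, "1949,78"), (25, "1950,-11")]

# Map phase: every input pair yields zero or more intermediate pairs.
intermediate = [pair for k, v in records for pair in mapper(k, v)]

# Shuffle/sort: group intermediate values by key before reducing.
intermediate.sort(key=itemgetter(0))
results = []
for year, group in groupby(intermediate, key=itemgetter(0)):
    results.extend(reducer(year, (temp for _, temp in group)))

print(results)
```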

The final output of the reduce tasks is stored on ______.

HDFS (because of its smaller size)

Which Big Data approach promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources?

grid computing

Choose the correct format for submitting MapReduce Job.

hadoop jar MRJar.jar MRDriver inputdir outputdir

What is the command to list files in HDFS?

hdfs dfs -ls

The configuration for HDFS is stored in

hdfs-site.xml

The analytics layer of the Big Data stack is experiencing what type of development currently?

significant development

Traditional data warehouses have not been able to keep up with:

the variety and complexity of data.

Which command displays the list of blocks that make up each file?

hdfs fsck / -files -blocks

A newly popular unit of data in the Big Data era is the petabyte (PB), which is:

10^15 bytes.
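
The arithmetic behind that figure can be checked in a few lines. This sketch assumes decimal (SI) units, where each step up is a factor of 1,000; the `UNITS` table is only an illustrative helper.

```python
# Decimal (SI) storage units expressed in bytes.
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}

petabyte = UNITS["PB"]
print(petabyte, "bytes per PB")
print(petabyte // UNITS["TB"], "TB per PB")
print(petabyte // UNITS["GB"], "GB per PB")
```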

The default HDFS block size in Hadoop 2.0 and above

128 MB

How many main functions are involved in MapReduce?

2 (map and reduce)

Choose the default configuration for Mapper and Reducer.

2 mappers and 1 reducer

What is the default replication factor in HDFS?

3
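
A small worked example shows how the default block size and replication factor combine. This is a sketch under stated assumptions: the 1,000 MB file size and the `hdfs_footprint` helper are hypothetical, and the defaults are the 128 MB block size and replication factor of 3 given in these cards.

```python
import math

BLOCK_SIZE_MB = 128      # default HDFS block size in Hadoop 2.0 and above
REPLICATION_FACTOR = 3   # default HDFS replication factor

def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                   replication=REPLICATION_FACTOR):
    """Return (number of blocks, number of block replicas, raw MB stored)."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    # A final partial block only occupies the space it actually needs.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb

print(hdfs_footprint(1000))
```

Under these assumptions the file is cut into 8 blocks, stored as 24 block replicas, and takes roughly 3,000 MB of raw cluster storage.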

Which of the following options is/are used to pass parameters to a MapReduce program during runtime?

Both A and B

Choose the non-relational database that is part of the Hadoop ecosystem.

HBase

Which tool simplifies Java-based MapReduce processing?

Pig (scripts are written in Pig Latin)

Functions of Reducer in MapReduce.

All of the above

HDFS is

A distributed file system that runs on commodity hardware

Block size and replication factor cannot be configured once HDFS has been set up.

False

The requirement is to upload the data to HDFS as soon as it reaches the company. Which tool in the Hadoop ecosystem is meant to satisfy this requirement?

Flume

Which statement is not true?

HDFS and network file system are related

The number of map tasks equals the number of input file splits, and may even be increased by the user by reducing the input split size. This leads to ______.

Improved resource utilization
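
The relationship between split size and the number of map tasks can be sketched as below. This is a simplification: it assumes a single input file and approximates the split count as the file size divided by the split size, rounded up, which ignores details of how Hadoop actually computes splits. The function name and the 1 GB file size are hypothetical.

```python
import math

def estimated_map_tasks(file_size_mb, split_size_mb):
    """One map task per input split; simplified to ceil(size / split size)."""
    return math.ceil(file_size_mb / split_size_mb)

file_size_mb = 1024  # hypothetical 1 GB input file
for split_size_mb in (128, 64, 32):
    tasks = estimated_map_tasks(file_size_mb, split_size_mb)
    print(split_size_mb, "MB splits ->", tasks, "map tasks")
```

Halving the split size doubles the number of map tasks, so more of the cluster can work on the job at once.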

How does Hadoop work?

It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on multiple computers.

A MapReduce job runs and stores its output files using TextOutputFormat. Now you need to run a second MapReduce job that takes the output key-value pairs from the first job as its input key-value pairs. Which InputFormat is best suited for the second MapReduce job?

KeyValueTextInputFormat
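
The reason this format fits: TextOutputFormat writes each record as a text line with the key and value separated by a tab by default, and KeyValueTextInputFormat splits each line back into a key and a value at the first separator. The plain-Python sketch below mimics that parsing step only; it is not Hadoop code, and the `parse_key_value_line` helper and the sample lines are hypothetical.

```python
def parse_key_value_line(line, separator="\t"):
    """Split a text line into (key, value) at the first separator.

    If the separator is absent, the whole line is the key and the value is empty.
    """
    key, _, value = line.partition(separator)
    return key, value

# Hypothetical lines as the first job might have written them.
first_job_output = ["hadoop\t3", "hdfs\t5", "yarn"]
pairs = [parse_key_value_line(line) for line in first_job_output]
print(pairs)
```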

Which node in a Hadoop cluster provides clients with information on where in the cluster particular data is stored?

Name Node

Which node do clients initially communicate with?

Name Node only

Which statement is true?

The number of input splits is equal to the number of map tasks.

The term "Big Data" is relative as it depends on the size of the using organization.

True

Data flows can be highly inconsistent, with periodic peaks, making data loads hard to manage. What is this feature of Big Data called?

Variability

