ITSS 4354 Exam 1
What class would be used to perform the summary operations against intermediate data on the Mapper node?
Combiner
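As a rough illustration of what a combiner does, the sketch below simulates a word count in plain Python. It does not use the Hadoop API, and the function names are hypothetical. The mapper's output is summed locally before anything is sent to the reducers, so fewer records cross the network.

```python
from collections import Counter

def map_words(line):
    """Mapper: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]

def combine(pairs):
    """Combiner: sum the counts per key locally, on the mapper node."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

# Hypothetical input held by a single mapper.
mapper_output = map_words("big data big cluster big")
combined = combine(mapper_output)

print(len(mapper_output), "records before combining")
print(len(combined), "records after combining:", combined)
```

Here five intermediate records shrink to three before the shuffle, which is the saving the combiner is meant to provide.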
In a Hadoop "stack," what node periodically replicates and stores data from the Name Node should it fail?
Secondary Name Node
______ refers to both how fast data is being produced and how fast the data must be processed (i.e., captured, stored, and analyzed) to meet the need or demand. RFID tags, automated sensors, GPS devices, and smart meters are driving an increasing need to deal with torrents of data in near-real time.
Velocity
Choose the Cluster Management tool in Hadoop
YARN
In a Hadoop "stack," what is a slave node?
a node where data is stored and processed
What details about the data are stored in the Name Node?
all of the above
Using data to understand customers/clients and business operations to sustain and foster growth and profitability is:
an increasingly challenging task for today's enterprises
In HDFS, how are blocks stored?
blocks are spread over multiple data nodes
Which of the following take (key, value) pairs as input and produce (key, value) pairs as output?
both mappers and reducers
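The key-value contract on both sides can be sketched in plain Python. This is a simplified simulation, not Hadoop code; `mapper`, `reducer`, and `run_job` are hypothetical names, and the shuffle is reduced to an in-memory grouping step.

```python
from collections import defaultdict

def mapper(key, value):
    """Input: (line offset, line text). Output: (word, 1) pairs."""
    for word in value.split():
        yield word, 1

def reducer(key, values):
    """Input: (word, list of counts). Output: (word, total)."""
    yield key, sum(values)

def run_job(records):
    # Shuffle: group every mapper output value under its key.
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Reduce: each key is processed together with all of its grouped values.
    results = []
    for key in sorted(groups):
        results.extend(reducer(key, groups[key]))
    return results

result = run_job([(0, "map reduce map"), (15, "reduce map")])
print(result)
```

Both functions consume and emit pairs; only the types of the keys and values change between the two stages.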
The final output of the reduce tasks is stored on ______
HDFS, because the final output is smaller in size
Which Big Data approach promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources?
grid computing
Choose the correct format for submitting MapReduce Job.
hadoop jar MRJar.jar MRDriver inputdir outputdir
What is the command to list files in HDFS?
hdfs dfs -ls
The configuration for HDFS is stored in
hdfs-site.xml
The analytics layer of the Big Data stack is experiencing what type of development currently?
significant development
Traditional data warehouses have not been able to keep up with:
the variety and complexity of data.
Which command displays the list of blocks that make up each file?
hdfs fsck / -files -blocks
A newly popular unit of data in the Big Data era is the petabyte (PB), which is:
10^15 bytes.
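A quick arithmetic check, as a minimal sketch: the petabyte is a decimal unit, and it is easy to confuse with the binary pebibyte (2^50 bytes), which is about 12.6% larger.

```python
PETABYTE = 10 ** 15   # decimal (SI) petabyte, in bytes
TERABYTE = 10 ** 12   # decimal (SI) terabyte, in bytes
PEBIBYTE = 2 ** 50    # binary counterpart, often confused with the petabyte

terabytes_per_petabyte = PETABYTE // TERABYTE
pebibyte_ratio = PEBIBYTE / PETABYTE

print(terabytes_per_petabyte, "TB in one PB")
print(f"1 PiB is {pebibyte_ratio:.3f} PB")
```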
What is the default HDFS block size in Hadoop 2.0 and above?
128 MB
How many main functions are involved in MapReduce?
2
Choose the default configuration for Mapper and Reducer.
2 mappers and 1 reducer
What is the default replication factor in HDFS?
3
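To see how the 128 MB default block size and the replication factor of 3 interact, the sketch below works through a hypothetical 1000 MB file. The helper names are made up for illustration. It assumes the last block holds only the leftover data rather than being padded to a full block.

```python
import math

BLOCK_SIZE_MB = 128      # default block size in Hadoop 2.0 and above
REPLICATION_FACTOR = 3   # default HDFS replication factor

def block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of HDFS blocks needed to hold the file."""
    return math.ceil(file_size_mb / block_size_mb)

def raw_storage_mb(file_size_mb, replication=REPLICATION_FACTOR):
    """Cluster-wide storage consumed once every block is replicated."""
    return file_size_mb * replication

file_size_mb = 1000  # hypothetical file size
blocks = block_count(file_size_mb)
replicas = blocks * REPLICATION_FACTOR
storage = raw_storage_mb(file_size_mb)

print(blocks, "blocks,", replicas, "block replicas,", storage, "MB of raw storage")
```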
Which of the following options is/are used to pass parameters to a MapReduce program during runtime?
Both A and B
Choose the non-relational database that is part of the Hadoop ecosystem.
HBase
Which tool simplifies Java-based MapReduce processing?
Pig Latin
Functions of Reducer in MapReduce.
All of the above
HDFS is
A distributed file system built on commodity hardware
Block size and replication factor cannot be configured once we set up the HDFS
False
The requirement is to upload the data to HDFS as soon as it reaches the company. Which tool in the Hadoop ecosystem is meant to satisfy this requirement?
Flume
Which statement is not true?
HDFS and network file system are related
The number of map tasks equals the number of input file splits, and may even be increased by the user by reducing the input split size. This leads to ______.
Improved resource utilization
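The relationship between split size and map task count can be sketched with simple arithmetic. This is a simplified model with a hypothetical helper name: it assumes a single input file whose splits are determined purely by size.

```python
import math

def map_task_count(input_size_mb, split_size_mb):
    """One map task is launched per input split."""
    return math.ceil(input_size_mb / split_size_mb)

input_size_mb = 1024  # hypothetical 1 GB input file
default_tasks = map_task_count(input_size_mb, 128)
smaller_split_tasks = map_task_count(input_size_mb, 64)

print(default_tasks, "map tasks with 128 MB splits")
print(smaller_split_tasks, "map tasks with 64 MB splits")
```

Halving the split size doubles the number of map tasks, which lets more of the cluster work on the job at once.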
How does Hadoop work?
It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on multiple computers.
A MapReduce job runs and stores its output files using TextOutputFormat. You now need to run a second MapReduce job that takes the output key-value pairs from the first job as its input key-value pairs. Which InputFormat is best suited for the second MapReduce job?
KeyValueTextInputFormat
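The reason this pairing works: TextOutputFormat writes each record as a key and a value separated by a tab, and KeyValueTextInputFormat splits each line at the first separator (a tab by default). The plain-Python sketch below mimics that round trip. It is not the Hadoop API, and the function names are hypothetical.

```python
def write_text_output(pairs):
    """Mimic TextOutputFormat: one 'key<TAB>value' line per pair."""
    return [f"{key}\t{value}" for key, value in pairs]

def read_key_value_text(lines, separator="\t"):
    """Mimic KeyValueTextInputFormat: split each line at the first separator."""
    records = []
    for line in lines:
        key, found, value = line.partition(separator)
        # Without a separator, the whole line is the key and the value is empty.
        records.append((key, value if found else ""))
    return records

first_job_output = write_text_output([("hadoop", 3), ("hdfs", 5)])
second_job_input = read_key_value_text(first_job_output)

print(second_job_input)
```

Note that the values come back as text, so the second job has to parse them if it needs numbers.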
Which node in a Hadoop cluster provides clients with information on where in the cluster particular data is stored?
Name Node
Which node do clients initially communicate with?
Name Node only
Which statement is true?
The number of input splits is equal to the number of map tasks.
The term "Big Data" is relative as it depends on the size of the using organization.
True
Data flows can be highly inconsistent, with periodic peaks, making data loads hard to manage. What is this feature of Big Data called?
Variability