ITSS 4354 - Exam 1
A newly popular unit of data in the Big Data era is the petabyte (PB), which is: A. 10^9 bytes. B. 10^12 bytes. C. 10^15 bytes. D. 10^18 bytes
C. 10^15 bytes.
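As a quick sanity check on the decimal (SI) units behind this question, a minimal Python sketch (the `UNITS` table is illustrative):

```python
# Decimal (SI) data units, in bytes.
UNITS = {
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,  # petabyte: option C
}

# A petabyte is a thousand terabytes.
print(UNITS["PB"] // UNITS["TB"])  # 1000
```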
Choose the default configuration for Mapper and Reducer. A. 1 mapper and 1 reducer B. 4 mappers and 2 reducers C. 2 mappers and 1 reducer D. 2 mappers and 2 reducers
C. 2 mappers and 1 reducer
Which statement is not true? A. Hadoop consists of multiple products B. HDFS is a file system, not an RDBMS C. HDFS and network file system are related D. Hive resembles SQL but is not standard SQL
C. HDFS and network file system are related
A MapReduce job runs and stores its output files using TextOutputFormat. Now you need to run a second MapReduce job that takes the output key-value pairs of the first job as its input key-value pairs. Which InputFormat is best suited for the second MapReduce job? A. TextInputFormat B. NLineInputFormat C. KeyValueTextInputFormat D. SequenceFileInputFormat
C. KeyValueTextInputFormat
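KeyValueTextInputFormat splits each line at the first separator (a tab by default) into a key and a value, which matches the tab-separated `key<TAB>value` lines that TextOutputFormat writes. A minimal Python sketch of that parsing rule (the function name is illustrative, not Hadoop API):

```python
def parse_key_value(line, separator="\t"):
    """Split a line into (key, value) at the first separator, the way
    KeyValueTextInputFormat does. If the separator is absent, the whole
    line becomes the key and the value is empty."""
    key, _, value = line.partition(separator)
    return key, value

print(parse_key_value("apple\t3"))     # ('apple', '3')
print(parse_key_value("no-tab-here"))  # ('no-tab-here', '')
```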
Which node does a client initially communicate with? A. Data node B. Data node and name node simultaneously C. Name node only D. All nodes have client data
C. Name node only
---------------- refers to both how fast data is being produced and how fast the data must be processed (i.e., captured, stored, and analyzed) to meet the need or demand. RFID tags, automated sensors, GPS devices, and smart meters are driving an increasing need to deal with torrents of data in near-real time. A. Volume B. Variety C. Velocity D. Veracity
C. Velocity
Choose the cluster management tool in Hadoop. A. Job Tracker B. MapReduce C. YARN D. Workflow manager
C. YARN
Using data to understand customers/clients and business operations to sustain and foster growth and profitability is: A. easier with the advent of BI and Big Data. B. essentially the same now as it has always been. C. an increasingly challenging task for today's enterprises. D. now completely automated with no human intervention required.
C. an increasingly challenging task for today's enterprises.
Which among the following take (key-value) pairs as input and produce output as (key-value)? A. Reducers B. All of the above C. both mappers and reducers D. Mappers
C. both mappers and reducers
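To illustrate why both answers apply: a mapper turns input key-value pairs into intermediate key-value pairs, and a reducer turns grouped intermediate pairs into output key-value pairs. A minimal pure-Python word-count sketch of that contract (illustrative only, not Hadoop API):

```python
from collections import defaultdict

def mapper(offset, line):
    # Input pair: (byte offset, line). Output pairs: (word, 1).
    for word in line.split():
        yield word, 1

def reducer(word, counts):
    # Input pair: (word, [1, 1, ...]). Output pair: (word, total).
    yield word, sum(counts)

lines = {0: "big data big", 13: "data"}
grouped = defaultdict(list)
for offset, line in lines.items():
    for word, one in mapper(offset, line):
        grouped[word].append(one)

result = dict(kv for word, counts in grouped.items()
              for kv in reducer(word, counts))
print(result)  # {'big': 2, 'data': 2}
```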
Which Big Data approach promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources? A. in-memory analytics B. in-database analytics C. grid computing D. appliances
C. grid computing
Choose the correct format for submitting a MapReduce job. A. mapreduce jar MRJar.jar MRDriver inputdir outputdir B. hadoop jar MRJar.jar MRDriver inputdir outputdir C. hadoop -jar MRJar.jar MRDriver inputdir outputdir D. mapreduce -jar MRJar.jar MRDriver inputdir outputdir
B. hadoop jar MRJar.jar MRDriver inputdir outputdir
What is the command to list files in HDFS? A. hadoop ls -l B. hadoop dfs ls -l C. hdfs dfs -ls D. hadoop ls
C. hdfs dfs -ls
In a Hadoop "stack," what node periodically replicates and stores data from the Name Node should it fail? A. backup node B. slave node C. secondary node D. substitute node
C. secondary node
The default HDFS block size in Hadoop 2.0 and above is: A. 64 MB B. 128 KB C. 128 MB D. 64 KB
C. 128 MB
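With the 128 MB default, a file is cut into ceil(size / block size) blocks; for example, a 300 MB file occupies three blocks (128 + 128 + 44 MB). A short sketch (function name is illustrative):

```python
import math

BLOCK_SIZE_MB = 128  # HDFS default in Hadoop 2.x and above

def num_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    # The last block may be partially filled; it still counts as one block.
    return math.ceil(file_size_mb / block_size_mb)

print(num_blocks(300))  # 3
print(num_blocks(128))  # 1
```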
How many main functions are involved in MapReduce? A. 4 B. 2 C. 3 D. 7
B. 2
What is the default replication factor in HDFS? A. 2 B. 1 C. 3 D. 5
C. 3
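Because every block is stored three times under the default factor, the raw disk consumed is the logical file size times the replication factor. A sketch (function name is illustrative):

```python
DEFAULT_REPLICATION = 3  # HDFS default replication factor

def raw_storage_gb(logical_gb, replication=DEFAULT_REPLICATION):
    # Each block is copied `replication` times across data nodes.
    return logical_gb * replication

print(raw_storage_gb(10))  # 30: a 10 GB file consumes 30 GB of raw disk
```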
What class would be used to perform the summary operations against intermediate data on the Mapper node? A. Combiner B. Partitioner C. Name Node D. Distributed File system Cache
A. Combiner
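A combiner runs on the mapper node and pre-aggregates intermediate pairs before they cross the network to the reducers; for word count it is typically the same logic as the reducer. A pure-Python sketch of the idea (illustrative, not Hadoop API):

```python
from collections import defaultdict

def combine(pairs):
    """Locally sum (word, count) pairs on the mapper node so that
    fewer pairs are shuffled to the reducers."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())

mapper_output = [("big", 1), ("data", 1), ("big", 1), ("big", 1)]
print(combine(mapper_output))  # [('big', 3), ('data', 1)]
```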
The number of map tasks equals the number of input file splits, and may even be increased by the user by reducing the input split size. This leads to ______. A. Improved resource utilization B. Higher number of reduce tasks C. Lesser number of reduce tasks D. Lesser communication cost
A. Improved resource utilization
Which tool simplifies Java-based MapReduce processing? A. Pig Latin B. Ambari C. MapReduce D. Hive
A. Pig Latin
Which of the following options is/are used to pass parameters to a MapReduce program at runtime? A. Both C and D B. Both A and B C. Only D D. Only C
B. Both A and B
Choose the non-relational database that is part of the Hadoop ecosystem. A. Hive B. HBase C. NoSQL D. ZooKeeper
B. HBase
The ----------- node in a Hadoop cluster provides clients with information on where in the cluster particular data is stored. A. Master Node B. Name Node C. Data Node D. Slave Node
B. Name Node
Which tool is useful for loading data into HDFS? A. MapReduce B. Sqoop C. Oozie D. YARN
B. Sqoop
In HDFS, blocks are stored: A. contiguously in one data node B. blocks are spread over multiple data nodes C. blocks are spread over multiple name nodes D. all blocks are stored as per client request
B. blocks are spread over multiple data nodes
The configuration file for HDFS is stored in: A. hadoop.xml B. hdfs-site.xml C. hdfs.xml D. core.xml
B. hdfs-site.xml
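For reference, a minimal hdfs-site.xml fragment that sets the replication factor and block size discussed above (the values shown are illustrative; `dfs.replication` and `dfs.blocksize` are standard HDFS property names):

```xml
<!-- hdfs-site.xml: illustrative values, not required cluster settings -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value> <!-- 128 MB in bytes -->
  </property>
</configuration>
```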
Traditional data warehouses have not been able to keep up with: A. the evolution of the SQL language. B. the variety and complexity of data. C. expert systems that run on them. D. OLAP
B. the variety and complexity of data.
Which statement is true? A. The number of input splits is equal to the number of map tasks. B. Each reducer outputs a file containing the end results of the data it processed; each of these files is named from part-00000 to part-99999, which you cannot change. C. By default, the number of reducers for a job is 0.95 or 1.75 multiplied by (<no. of nodes> * <no. of maximum containers per node>). D. Set the number of reducers to zero.
A. The number of input splits is equal to the number of map tasks.
In a Hadoop "stack," what is a slave node? A. a node where data is stored and processed B. a node where bits of programs are stored C. a node responsible for holding all the source programs D. a node where metadata is stored and used to organize data processing
A. a node where data is stored and processed
The analytics layer of the Big Data stack is experiencing what type of development currently? A. significant development B. limited development C. no development/stagnant D. no development/reject as being non-important
A. significant development
Which command displays a list of the blocks that make up each file? A. %hdfs fsck / -files-blocks B. %hdfs -fsck / -files-block C. %hdfs fsck / -files -blocks D. %hdfs -fsck / -files -blocks
C. %hdfs fsck / -files -blocks
What are the functions of the Reducer in MapReduce? A. Shuffle B. Sort C. Reduce D. All of the above
D. All of the above
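The three reducer-side phases can be sketched end to end in plain Python: shuffle groups intermediate pairs by key, sort orders the keys, and reduce aggregates each group (illustrative only, not Hadoop API):

```python
from collections import defaultdict

intermediate = [("b", 1), ("a", 1), ("b", 1), ("c", 1), ("a", 1)]

# Shuffle: group intermediate values by key.
groups = defaultdict(list)
for key, value in intermediate:
    groups[key].append(value)

# Sort: reducers receive keys in sorted order.
sorted_keys = sorted(groups)

# Reduce: aggregate each group into a final pair.
output = [(key, sum(groups[key])) for key in sorted_keys]
print(output)  # [('a', 2), ('b', 2), ('c', 1)]
```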
HDFS is: A. a file system from a storage area network B. a file system from network-attached storage C. a file system made of flash storage devices D. a distributed file system made of commodity hardware
D. a distributed file system made of commodity hardware
The requirement is to upload data to HDFS as soon as it reaches the company. Which tool in the Hadoop ecosystem is meant to satisfy this requirement? A. Sqoop B. ZooKeeper C. Ambari D. Flume
D. Flume
How does Hadoop work? A. It integrates Big Data into a whole so large data elements can be processed as a whole on one computer. B. It integrates Big Data into a whole so large data elements can be processed as a whole on multiple computers. C. It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on one computer. D. It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on multiple computers.
D. It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on multiple computers.
What details about the data are stored in the Name Node? A. location of the blocks B. size of the files C. permissions and ownership D. all of the above
D. all of the above
The final output of the reduce tasks is stored on the ---------------------------- A. Name Node B. Data node C. local file system of the data node where the job ran D. due to the smaller size it is stored in HDFS
D. due to the smaller size it is stored in HDFS
Data flows can be highly inconsistent, with periodic peaks, making data loads hard to manage. What is this feature of Big Data called? A. volatility B. periodicity C. inconsistency D. variability
D. variability
Block size and replication factor cannot be reconfigured once HDFS is set up. True False
False
The term "Big Data" is relative, as it depends on the size of the organization using it. True False
True