ITSS 4354 - Exam 1

A newly popular unit of data in the Big Data era is the petabyte (PB), which is: A. 10^9 bytes. B. 10^12 bytes. C. 10^15 bytes. D. 10^18 bytes

C. 10^15 bytes.
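
As a quick check on the four options, here is a minimal sketch of the decimal (SI) byte units. The `UNITS` table and `to_bytes` helper are illustrative names, not part of any Hadoop API.

```python
# Decimal (SI) byte units; each step up is 1000 times the previous one.
UNITS = {
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,   # option A
    "TB": 10**12,  # option B
    "PB": 10**15,  # option C (petabyte)
    "EB": 10**18,  # option D
}


def to_bytes(amount, unit):
    """Convert an amount expressed in a decimal unit to bytes."""
    return amount * UNITS[unit]


print(to_bytes(1, "PB"))                       # 1000000000000000
print(to_bytes(1, "PB") // to_bytes(1, "TB"))  # 1000 terabytes per petabyte
```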

Choose the default configuration for Mapper and Reducer. A. 1 mapper and 1 reducer B. 4 mappers and 2 reducers C. 2 mappers and 1 reducer D. 2 mappers and 2 reducers

C. 2 mappers and 1 reducer

Which statement is not true? A. Hadoop consists of multiple products B. HDFS is a file system, not an RDBMS C. HDFS and network file system are related D. Hive resembles SQL but is not standard SQL

C. HDFS and network file system are related

A MapReduce job runs and stores its output files using TextOutputFormat. Now, you need to run a second MapReduce job that takes the output key-value pairs from the first job as its input key-value pairs. Which InputFormat is best suited for the second MapReduce job? A. TextInputFormat B. NLineInputFormat C. KeyValueTextInputFormat D. SequenceFileInputFormat

C. KeyValueTextInputFormat
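
TextOutputFormat writes each record as a key and a value separated by a tab, and KeyValueTextInputFormat splits each line at the first separator to recover the pair. The sketch below imitates that parsing in plain Python. The function name and sample lines are assumptions for illustration, and this is not Hadoop's actual implementation.

```python
def parse_key_value_line(line, separator="\t"):
    """Split a line at the first separator into (key, value).

    If the separator is absent, the whole line becomes the key and the
    value is empty, which mirrors the documented behaviour of
    KeyValueTextInputFormat.
    """
    key, _, value = line.partition(separator)
    return key, value


# Hypothetical lines as the first job might have written them.
first_job_output = ["hadoop\t3", "hdfs\t5", "yarn"]
pairs = [parse_key_value_line(line) for line in first_job_output]
print(pairs)  # [('hadoop', '3'), ('hdfs', '5'), ('yarn', '')]
```

With TextInputFormat, by contrast, the key would be the byte offset of the line and the value the entire line, so the second job would have to split the pair itself.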

Which node do clients initially communicate with? A. Data node B. data node and name node simultaneously C. Name node only D. all nodes have client data

C. Name node only

---------------- refers to both how fast data is being produced and how fast the data must be processed (i.e., captured, stored, and analyzed) to meet the need or demand. RFID tags, automated sensors, GPS devices, and smart meters are driving an increasing need to deal with torrents of data in near-real time. A. Volume B. Variety C. Velocity D. Veracity

C. Velocity

Choose the cluster management tool in Hadoop. A. Job tracker B. MapReduce C. YARN D. Workflow manager

C. YARN

Using data to understand customers/clients and business operations to sustain and foster growth and profitability is: A. easier with the advent of BI and Big Data. B. essentially the same now as it has always been. C. an increasingly challenging task for today's enterprises. D. now completely automated with no human intervention required.

C. an increasingly challenging task for today's enterprises.

Which among the following take (key-value) pairs as input and produce output as (key-value)? A. Reducers B. All of the above C. both mappers and reducers D. Mappers

C. both mappers and reducers
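
A minimal sketch of the two function shapes, assuming a word-count job: the mapper takes one (key, value) pair and emits (key, value) pairs, and the reducer takes a key with its grouped values and emits (key, value) pairs. The names `mapper` and `reducer` are illustrative plain-Python functions, not the Hadoop classes.

```python
def mapper(key, value):
    """Input pair: (line offset, line text). Output: a list of (word, 1) pairs."""
    return [(word.lower(), 1) for word in value.split()]


def reducer(key, values):
    """Input pair: (word, list of counts). Output: a list with one (word, total) pair."""
    return [(key, sum(values))]


print(mapper(0, "Big data big clusters"))
# [('big', 1), ('data', 1), ('big', 1), ('clusters', 1)]
print(reducer("big", [1, 1]))
# [('big', 2)]
```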

Which Big Data approach promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources? A. in-memory analytics B. in-database analytics C. grid computing D. appliances

C. grid computing

Choose the correct format for submitting a MapReduce job. A. mapreduce jar MRJar.jar MRDriver inputdir outputdir B. hadoop jar MRJar.jar MRDriver inputdir outputdir C. hadoop -jar MRJar.jar MRDriver inputdir outputdir D. mapreduce -jar MRJar.jar MRDriver inputdir outputdir

B. hadoop jar MRJar.jar MRDriver inputdir outputdir

What is the command to list files in HDFS? A. hadoop ls -l B. hadoop dfs ls -l C. hdfs dfs -ls D. hadoop ls

C. hdfs dfs -ls

In a Hadoop "stack," what node periodically replicates and stores data from the Name Node should it fail? A. backup node B. slave node C. secondary node D. substitute node

C. secondary node

The default block size for HDFS in Hadoop 2.0 and above is: A. 64 MB B. 128 KB C. 128 MB D. 64 KB

C. 128 MB

How many main functions are involved in MapReduce? A. 4 B. 2 C. 3 D. 7

B. 2

What is the default replication factor in HDFS? A. 2 B. 1 C. 3 D. 5

C. 3
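
The two defaults (128 MB blocks, replication factor 3) combine into a simple storage estimate. This is a rough sketch, assuming a hypothetical 1000 MB file. The helper name `hdfs_footprint` is made up for illustration.

```python
import math

BLOCK_SIZE_MB = 128      # default block size in Hadoop 2.0 and above
REPLICATION_FACTOR = 3   # default HDFS replication factor


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                   replication=REPLICATION_FACTOR):
    """Return (number of blocks, block replicas stored, raw storage in MB).

    The last block only occupies as much space as the data it holds,
    so raw storage is the file size times the replication factor.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    return blocks, blocks * replication, file_size_mb * replication


print(hdfs_footprint(1000))  # (8, 24, 3000)
```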

What class would be used to perform the summary operations against intermediate data on the Mapper node? A. Combiner B. Partitioner C. Name Node D. Distributed File system Cache

A. Combiner
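
To illustrate what a combiner does, the sketch below sums counts per key on the "mapper side" before anything would be sent to a reducer. It is a plain-Python imitation with made-up function names, not Hadoop's Combiner class.

```python
from collections import Counter


def map_words(line):
    """Emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Local aggregation of one mapper's output: sum the counts per key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return sorted(totals.items())


mapper_output = map_words("to be or not to be")
combined = combine(mapper_output)
print(len(mapper_output), len(combined))  # 6 4
print(combined)  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

Six intermediate pairs shrink to four, which is the point of a combiner: less data crosses the network during the shuffle.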

The number of map tasks equals the number of input file splits, and may even be increased by the user by reducing the input split size. This leads to ______. A. Improved resource utilization B. Higher number of reduce tasks C. Lesser number of reduce tasks D. Lesser communication cost

A. Improved resource utilization
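
The relationship between split size and map task count can be sketched with simple arithmetic. The example assumes a single hypothetical 1024 MB input file and ignores details such as per-file split boundaries. `num_map_tasks` is an illustrative helper, not a Hadoop call.

```python
import math


def num_map_tasks(input_size_mb, split_size_mb):
    """One map task per input split, for a single input file."""
    return math.ceil(input_size_mb / split_size_mb)


# Halving the split size doubles the number of map tasks.
for split_size in (128, 64, 32):
    print(split_size, num_map_tasks(1024, split_size))
# 128 8
# 64 16
# 32 32
```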

Which tool simplifies Java-based MapReduce processing? A. Pig Latin B. Ambari C. MapReduce D. Hive

A. Pig Latin

Which of the following options is/are used to pass parameters to a MapReduce program during runtime? A. Both C and D B. Both A and B C. Only D D. Only C

B. Both A and B

Choose the non-relational database that is part of the Hadoop ecosystem. A. Hive B. HBase C. NoSQL D. ZooKeeper

B. HBase

The ----------- Node in a Hadoop cluster provides clients with information on where in the cluster particular data is stored. A. Master Node B. Name Node C. Data Node D. Slave Node

B. Name Node

Which tool is useful for loading data into HDFS? A. MapReduce B. Sqoop C. Oozie D. YARN

B. Sqoop

In HDFS, blocks are stored: A. Continuously in data node B. blocks are spread over multiple data nodes C. blocks are spread over multiple name nodes D. all blocks are stored as per client request

B. blocks are spread over multiple data nodes
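
A toy picture of blocks being spread over data nodes. This sketch uses simple round-robin placement, which is an assumption made for illustration. Real HDFS uses a rack-aware placement policy, and the node names here are invented.

```python
def place_blocks(num_blocks, data_nodes, replication=3):
    """Assign each block's replicas to distinct nodes in round-robin order.

    Assumes replication <= len(data_nodes) so replicas land on distinct nodes.
    """
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            data_nodes[(block + r) % len(data_nodes)]
            for r in range(replication)
        ]
    return placement


nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical data node names
for block, replicas in place_blocks(3, nodes).items():
    print(block, replicas)
# 0 ['dn1', 'dn2', 'dn3']
# 1 ['dn2', 'dn3', 'dn4']
# 2 ['dn3', 'dn4', 'dn1']
```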

The configuration file for HDFS is: A. hadoop.xml B. hdfs-site.xml C. hdfs.xml D. core.xml

B. hdfs-site.xml

Traditional data warehouses have not been able to keep up with: A. the evolution of the SQL language. B. the variety and complexity of data. C. expert systems that run on them. D. OLAP

B. the variety and complexity of data.

Which statement is true? A. The number of input splits is equal to the number of map tasks. B. Each reducer outputs a file containing the end results of the data it processed. Each of these files is named from part-00000 to part-99999 which you cannot change. C. By default, the number of reducers for a job is 0.95 or 1.75 multiplied by (<no. of nodes> * <no. of maximum containers per node>). D. set the number of reducers to zero.

A. The number of input splits is equal to the number of map tasks.

In a Hadoop "stack," what is a slave node? A. a node where data is stored and processed B. a node where bits of programs are stored C. a node responsible for holding all the source programs D. a node where metadata is stored and used to organize data processing

A. a node where data is stored and processed

The analytics layer of the Big Data stack is experiencing what type of development currently? A. significant development B. limited development C. no development/stagnant D. no development/reject as being non-important

A. significant development

Which command displays a list of the blocks that make up each file? A. %hdfs fsck / -files-blocks B. %hdfs -fsck / -files-block C. %hdfs fsck / -files -blocks D. %hdfs -fsck / -files -blocks

C. %hdfs fsck / -files -blocks

Which of the following are functions of the Reducer in MapReduce? A. Shuffle B. Sort C. Reduce D. All of the above

D. All of the above
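
The three phases can be sketched end to end in plain Python. This is a simplified single-process imitation with illustrative function names and sample data. In Hadoop the shuffle and sort happen across the network and overlap in time.

```python
from collections import defaultdict


def shuffle(map_outputs):
    """Shuffle: gather the values emitted by all mappers under their key."""
    groups = defaultdict(list)
    for pairs in map_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def sort_groups(groups):
    """Sort: order the grouped records by key."""
    return sorted(groups.items())


def reduce_groups(sorted_groups):
    """Reduce: collapse each key's list of values into a single total."""
    return [(key, sum(values)) for key, values in sorted_groups]


# Hypothetical output of two mappers.
map_outputs = [
    [("hdfs", 1), ("yarn", 1)],
    [("hdfs", 1), ("hive", 1)],
]
result = reduce_groups(sort_groups(shuffle(map_outputs)))
print(result)  # [('hdfs', 2), ('hive', 1), ('yarn', 1)]
```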

HDFS is A. File system from Storage Area Network B. File system from Network Attached Storage C. file system made of Flash Storage Devices D. Distributed File system made of Commodity hardware

D. Distributed File system made of Commodity hardware

The requirement is to upload the data to HDFS as soon as it reaches the company. Which tool in the Hadoop ecosystem is meant to satisfy this requirement? A. SQOOP B. ZooKeeper C. Ambari D. Flume

D. Flume

How does Hadoop work? A. It integrates Big Data into a whole so large data elements can be processed as a whole on one computer. B. It integrates Big Data into a whole so large data elements can be processed as a whole on multiple computers. C. It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on one computer. D. It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on multiple computers.

D. It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on multiple computers.

What details about data are stored in the Name node? A. location of the blocks B. size of the files C. permissions and ownership D. all of the above

D. all of the above

The final output of the reduce tasks is stored on the ---------------------------- A. Name Node B. Data node C. local file system of data node where job ran D. due to the smaller size it is stored in HDFS

D. due to the smaller size it is stored in HDFS

Data flows can be highly inconsistent, with periodic peaks, making data loads hard to manage. What is this feature of Big Data called? A. volatility B. periodicity C. inconsistency D. variability

D. variability

Block size and replication factor cannot be configured once HDFS is set up. True False

False

The term "Big Data" is relative as it depends on the size of the using organization. True False

True

