Topics in Big Data EXAM

A client reading data from the HDFS filesystem in Hadoop -gets the data from the namenode -gets the block locations from the datanode -gets both the data and block locations from the namenode -gets only the block locations from the namenode

gets only the block locations from the namenode
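
A toy Python simulation makes the read path concrete. It is a sketch under simplifying assumptions. The class and function names are hypothetical and not the HDFS client API. The client also just reads from the first listed DataNode, whereas real clients prefer the closest one. The NameNode hands back only block locations, and the bytes come from DataNodes.

```python
# Toy model of an HDFS read: metadata from the NameNode, bytes from DataNodes.
# All names (NameNode, DataNode, read_file) are hypothetical illustrations.

class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}          # block_id -> bytes stored on this node

class NameNode:
    def __init__(self):
        self.block_map = {}       # path -> list of (block_id, [DataNode, ...])

    def get_block_locations(self, path):
        # Returns metadata only; the NameNode never serves file contents.
        return self.block_map[path]

def read_file(namenode, path):
    data = b""
    for block_id, locations in namenode.get_block_locations(path):
        # Fetch each block directly from a DataNode that holds a replica.
        for dn in locations:
            if block_id in dn.blocks:
                data += dn.blocks[block_id]
                break
        else:
            raise IOError(f"no live replica for {block_id}")
    return data

dn1, dn2 = DataNode("dn1"), DataNode("dn2")
dn1.blocks["blk_1"] = b"hello "
dn2.blocks["blk_2"] = b"world"
nn = NameNode()
nn.block_map["/data/greeting.txt"] = [("blk_1", [dn1]), ("blk_2", [dn2, dn1])]
print(read_file(nn, "/data/greeting.txt"))  # b'hello world'
```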

When you set up a Hadoop cluster, which command is used to verify whether all the Hadoop daemons are running on the machine? -top -ps -jps -fsck

jps

In which language is Hadoop written? -C++ -Python -C# -Java

Java

Which is the slave daemon of YARN? -NodeManager -Container -ApplicationMaster -ResourceManager

NodeManager

Users can control which keys (and hence records) go to which Reducer by implementing a custom? -All of the mentioned -Reporter -Partitioner -OutputSplit

Partitioner
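
In Hadoop, a custom partitioner is a Java class that extends Partitioner and overrides getPartition(key, value, numPartitions). The Python function below only mimics that contract as a hypothetical example. It routes words by their first letter, so every word that starts with the same letter reaches the same reducer.

```python
# Hypothetical custom partitioner mimicking the getPartition(key, value, numPartitions)
# contract: it must return an int in the range [0, num_partitions).

def first_letter_partition(key: str, value, num_partitions: int) -> int:
    # Words sharing a first letter always map to the same reducer index.
    if not key:
        return 0
    # Python's % with a positive modulus never returns a negative result.
    return (ord(key[0].lower()) - ord("a")) % num_partitions

records = [("apple", 1), ("avocado", 1), ("banana", 1), ("cherry", 1)]
routing = {k: first_letter_partition(k, v, 2) for k, v in records}
print(routing)  # {'apple': 0, 'avocado': 0, 'banana': 1, 'cherry': 0}
```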

Apache Spark has APIs in · Java · Scala · Python · All of the above

All of the above

In which of the following languages can you code in Hadoop? -Python -R -Java -C++

All of them

Which of the following is not a scheduling option available in YARN -FIFO scheduler -Fair scheduler -Balanced scheduler -Capacity scheduler

Balanced scheduler

Namenode stores filesystem metadata which is further divided into ____ -Editlog -work directory -None of the above -Fsimage

-Editlog -Fsimage
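
Conceptually, the fsimage is a snapshot of the namespace, and the edit log records the changes made since that snapshot. The current state is the snapshot with the edits replayed on top, and that replayed state is what a checkpoint saves. Below is a toy Python sketch. Its operation names are hypothetical and are not real edit-log opcodes.

```python
# Toy model: current namespace = fsimage snapshot + replayed edit log.
# Operation names ("mkdir", "create", "delete") are illustrative only.

def replay(fsimage: set, edit_log: list) -> set:
    namespace = set(fsimage)          # start from the last checkpoint (copy, not mutate)
    for op, path in edit_log:         # apply every logged change in order
        if op in ("mkdir", "create"):
            namespace.add(path)
        elif op == "delete":
            namespace.discard(path)
    return namespace

fsimage = {"/", "/user"}
edits = [("mkdir", "/user/alice"), ("create", "/user/alice/a.txt"),
         ("delete", "/user/alice/a.txt")]
new_fsimage = replay(fsimage, edits)  # a checkpoint would persist this and start a fresh log
print(sorted(new_fsimage))  # ['/', '/user', '/user/alice']
```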

Which of the following key features of HDFS ensure against data loss? -Fault tolerant -Scalable -Replication -Portable

Replication

________ is the slave/worker node and holds the user data in the form of Data Blocks. -NameNode -Data block -Replication -DataNode

DataNode

MapReduce is a programming model used in Hadoop for processing Big Data. It's also a processing technique for what? -Distributed computing -System with multiple components -Java -Python

Distributed computing

Point out the wrong statement regarding the driver class in a MapReduce implementation -The driver class is responsible for setting up MapReduce job to run in Hadoop -We specify job name, data type of input/output and names of the mapper and reducer classes in the driver class -Driver class is optional in MapReduce -We also need to set input and output directories for the MapReduce job

Driver class is optional in MapReduce

Select the statement that identifies all the data types associated with Big Data. · Semi-structured data is not associated with Big Data. · Unstructured data is not associated with Big Data. · Structured, semi-structured, and unstructured data are all associated with Big Data. · Only unstructured data is associated with Big Data.

Structured, semi-structured, and unstructured data are all associated with Big Data.

The datanode and namenode are respectively ___ -Master and worker nodes -None -Both are worker nodes -Worker and Master nodes

Worker and Master nodes

The hdfs command put is used to -Copy files from local file system to HDFS. -Copy files from HDFS to local filesystem. -Copy files or directories from HDFS to local filesystem. -Copy files or directories from local file system to HDFS.

-Copy files from local file system to HDFS. -Copy files or directories from local file system to HDFS.

When writing data to HDFS what is true if the replication factor is three? -Data is written to DataNodes on three separate racks (if Rack Aware). -Data is written to blocks on three different DataNodes. -None of the above -Data is written to DataNodes on two separate racks (if Rack Aware).

-Data is written to blocks on three different DataNodes. -Data is written to DataNodes on two separate racks (if Rack Aware).
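
The HDFS architecture guide describes the default placement policy for replication factor 3. The first replica goes on the writer's node (when the writer runs on a DataNode). The second goes on a node in a different rack. The third goes on a different node in the same rack as the second. That is why three replicas span exactly two racks. The simplified Python sketch below assumes the writer is a DataNode, and its topology and function name are hypothetical.

```python
import random

# Hypothetical cluster topology: rack name -> DataNodes on that rack.
TOPOLOGY = {
    "rack1": ["dn1", "dn2", "dn3"],
    "rack2": ["dn4", "dn5", "dn6"],
    "rack3": ["dn7", "dn8", "dn9"],
}
RACK_OF = {dn: rack for rack, nodes in TOPOLOGY.items() for dn in nodes}

def place_replicas(writer_node: str, rng=random) -> list:
    """Simplified default policy for replication factor 3 (writer is a DataNode)."""
    first = writer_node                                          # 1st: writer's own node
    remote_rack = rng.choice([r for r in TOPOLOGY if r != RACK_OF[first]])
    second = rng.choice(TOPOLOGY[remote_rack])                   # 2nd: node on another rack
    third = rng.choice([dn for dn in TOPOLOGY[remote_rack] if dn != second])  # 3rd: same rack as 2nd
    return [first, second, third]

replicas = place_replicas("dn2", random.Random(0))
print(replicas, {RACK_OF[dn] for dn in replicas})  # three distinct nodes on two racks
```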

Namenode keeps metadata in -HDFS -Memory -Both -None of the above

Memory (the fsimage and edit log are persisted on the NameNode's local disk, not in HDFS)

What are the components of a Hadoop 1 architecture (before 2014)? -HDFS and MapReduce -HDFS, MapReduce, and YARN -DataNode and NameNode -Jobtracker and Tasktracker

-HDFS and MapReduce

In a Hadoop cluster, what is true for an HDFS block that is no longer available due to disk corruption or machine failure? -It can be replicated from its alternative locations to other live machines. -The namenode allows new client request to keep trying to read it. -It is lost for ever -The Mapreduce job process runs ignoring the block and the data stored in it.

-It can be replicated from its alternative locations to other live machines.
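
When a DataNode stops sending heartbeats, the NameNode treats its blocks as under-replicated and schedules new copies from the surviving replicas. The toy Python sketch below shows only that bookkeeping. Its names are hypothetical, and it models no real networking or timeouts.

```python
# Toy re-replication bookkeeping: block -> set of DataNodes holding a replica.
TARGET_REPLICATION = 3

def handle_dead_node(block_map: dict, dead: str, live_nodes: list) -> dict:
    for block, holders in block_map.items():
        holders.discard(dead)                  # the replica on the failed node is gone
        if not holders:
            continue                           # all replicas lost: nothing to copy from
        # Copy from a surviving replica to live nodes that lack the block.
        for candidate in live_nodes:
            if len(holders) >= TARGET_REPLICATION:
                break
            if candidate not in holders:
                holders.add(candidate)         # stands in for a DataNode-to-DataNode copy
    return block_map

blocks = {"blk_1": {"dn1", "dn2", "dn3"}, "blk_2": {"dn2", "dn4", "dn5"}}
handle_dead_node(blocks, "dn2", live_nodes=["dn1", "dn3", "dn4", "dn5", "dn6"])
print({b: sorted(h) for b, h in blocks.items()})
```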

Which of the following is component of Hadoop? -YARN -HDFS -MapReduce -Spark

-YARN -HDFS -MapReduce

HDFS works in a __________ fashion. -worker-master fashion -master-slave fashion -master-worker fashion -slave-master fashion

-master-slave fashion -master-worker fashion

The default number of reducers for a MapReduce job is __________ -3 -1 -2 -None of the above

1

Which cluster managers does Spark support? · Standalone Cluster Manager · Mesos · YARN · All of the above

All of the above

Which of the following capabilities are quantifiable advantages of distributed processing? · You can add and remove execution nodes as and when required, significantly reducing infrastructure costs. · Since problem instructions are executed on separate execution nodes, memory and processing requirements are low even while processing large volumes of data. · Parallel processing can process Big Data in a fraction of the time compared to linear processing. · Parallel processing fixes and executes errors locally without impacting other nodes.

All of them

Which of the following statements about Hadoop are true? -Collection of computers working together at the same time to perform tasks -Hadoop allows for running applications on clusters -Processes massive amounts of data in distributed file systems that are linked together. -Set of open-source programs and procedures which can be used as the framework for Big Data operations

All of them

What is TRUE about transformation? · Transformations are the functions that are applied on an RDD · Filter and Map applies to each element of RDD that creates a new RDD · Transformations are not executed until an action is called · All of these

All of these
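
You can see the idea behind lazy transformations without Spark itself. The toy Python class below is hypothetical and is not PySpark's RDD API. It records map and filter steps, and each call returns a new object. Nothing is computed until the collect action runs.

```python
class LazyDataset:
    """Toy stand-in for an RDD: transformations build a plan, actions run it."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)          # immutable record of pending transformations

    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Action: only now is any element actually processed.
        out = []
        for x in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    x = fn(x)
                elif not fn(x):
                    keep = False
                    break
            if keep:
                out.append(x)
        return out

calls = []
def square(x):
    calls.append(x)                         # records when work really happens
    return x * x

ds = LazyDataset(range(5)).map(square).filter(lambda v: v % 2 == 0)
assert calls == []                          # no element processed yet
print(ds.collect())                         # [0, 4, 16]
```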

Which of the following physically stores the data? -Master Node -All of the above -Data Node -Name Node

Data Node

Which of the following is TRUE about Apache Spark? · Hadoop is way faster than Apache Spark · It provides high-level APIs only in Java · Apache Spark is a fast and general purpose open source cluster computing system · All of these

Apache Spark is a fast and general purpose open source cluster computing system

Which of the following is a framework-specific entity that negotiates resources from the ResourceManager? -ApplicationMaster -NodeManager -ResourceManager -All of the above

ApplicationMaster

The minimum amount of data that HDFS can read or write is called a _____________. -NameNode -Block -Datanode -None of the above

Block
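
HDFS splits files into fixed-size blocks. The default dfs.blocksize in Hadoop 2 and later is 128 MB, and the last block occupies only as much space as the remaining data. The quick Python calculation below illustrates this, and its function name is illustrative.

```python
MB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 128 * MB   # default dfs.blocksize in Hadoop 2.x and later

def block_sizes(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> list:
    # Full blocks first, then one partial block for any remainder.
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

sizes = block_sizes(500 * MB)
print(len(sizes), [s // MB for s in sizes])  # 4 [128, 128, 128, 116]
```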

Which type of data processing does Spark offer? · Batch-based processing of data streams · Interactive processing · None · Both

Both

Point out the wrong statement regarding combiner -Combiner can speed up the job execution -Combiner will be called as long as it is specified in the job configuration -the existing reducer can be used as combiner -Combiner should not affect the final result

Combiner will be called as long as it is specified in the job configuration
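
The framework may run a combiner zero, one, or several times on map output, so a combiner must not change the final result. For word count, the reducer's sum is associative and commutative, so the reducer can double as the combiner. The pure-Python check below shows that property and uses no Hadoop classes.

```python
from collections import defaultdict

def reduce_counts(pairs):
    # Sums values per key; usable as both combiner and reducer for word count.
    totals = defaultdict(int)
    for word, n in pairs:
        totals[word] += n
    return sorted(totals.items())

map_outputs = [
    [("big", 1), ("data", 1), ("big", 1)],      # output of map task 1
    [("data", 1), ("spark", 1)],                # output of map task 2
]

# Without a combiner: the reducer sees every raw (word, 1) pair.
no_combiner = reduce_counts(p for out in map_outputs for p in out)
# With a combiner: each map task pre-aggregates its own output first.
with_combiner = reduce_counts(p for out in map_outputs for p in reduce_counts(out))

print(no_combiner == with_combiner, no_combiner)
```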

In which mode do all daemons execute on separate nodes? -None of the above -Pseudo-distributed mode -Fully distributed mode -Local (Standalone) mode

Fully distributed mode

What are the main components of Hadoop framework? -Kafka -HDFS -YARN -MapReduce

-HDFS -YARN -MapReduce

What are the components of a Hadoop architecture? -DataNode and NameNode -HDFS, MapReduce, and YARN -Jobtracker and Tasktracker -HDFS and MapReduce

-HDFS, MapReduce, and YARN -HDFS and MapReduce

________ is a utility which allows users to create and run jobs with any executables as the mapper and/or the reducer. -None of the mentioned -Hadoop Stream -Hadoop Strdata -Hadoop Streaming

Hadoop Streaming
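
With Hadoop Streaming, the mapper and reducer are ordinary programs. They read lines on stdin and write tab-separated key/value lines to stdout, and the framework sorts mapper output by key before the reducer sees it. Below is a minimal word-count pair in Python. It is written as functions so the pipeline can be simulated locally. In a real job, each would be its own script that reads sys.stdin, passed to the streaming jar through its -mapper and -reducer options.

```python
from itertools import groupby

def mapper(lines):
    # Emit "word<TAB>1" for every word, as a streaming mapper would print to stdout.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    # Input arrives sorted by key, so equal words are adjacent.
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Local simulation of: cat input | mapper | sort | reducer
text = ["hadoop streaming runs any executable", "any language any executable"]
result = list(reducer(sorted(mapper(text))))
print("\n".join(result))
```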

Which one of the following is FALSE about Hadoop? -It is a distributed framework -MapReduce is the processing engine in Hadoop -Hadoop can work with commodity hardware -Hadoop was created by Google

Hadoop was created by Google

_________ is the default Partitioner for partitioning key space. -Partitioner -HashPartitioner -HashPar -None of the mentioned

HashPartitioner
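
Hadoop's HashPartitioner assigns a key to reducer (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The mask clears the sign bit, so negative hash codes still give an index from 0 to numReduceTasks - 1. The Python port below reproduces Java's String.hashCode so you can check the arithmetic. It is a sketch for plain string keys only. Hadoop key types such as Text define their own hashCode, so real assignments may differ.

```python
def java_string_hash(s: str) -> int:
    # Java's String.hashCode: s[0]*31^(n-1) + ... + s[n-1], in signed 32-bit arithmetic.
    # (Matches Java for characters in the Basic Multilingual Plane.)
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def hash_partition(key: str, num_reduce_tasks: int) -> int:
    # Same formula as HashPartitioner.getPartition.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks

print(java_string_hash("hello"))                      # 99162322
print({w: hash_partition(w, 3) for w in ["big", "data", "hadoop", "spark"]})
```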

The CapacityScheduler supports _____________ queues to allow for more predictable sharing of cluster resources. -Hierarchical -Networked -None of the above -Partition

Hierarchical
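
In the CapacityScheduler, each queue's capacity is a percentage of its parent queue, and the capacities of siblings under one parent sum to 100%. A leaf queue's share of the whole cluster is therefore the product of the percentages down the hierarchy. The sketch below uses a hypothetical queue tree. The real settings live in capacity-scheduler.xml as yarn.scheduler.capacity.<queue-path>.capacity.

```python
# Hypothetical queue hierarchy: name -> (capacity % of parent, children).
QUEUES = {
    "root": (100, {
        "engineering": (60, {
            "etl": (70, {}),
            "adhoc": (30, {}),
        }),
        "analytics": (40, {}),
    }),
}

def absolute_capacities(tree, parent_share=1.0, prefix=""):
    result = {}
    for name, (pct, children) in tree.items():
        path = f"{prefix}.{name}" if prefix else name
        share = parent_share * pct / 100          # fraction of the whole cluster
        if children:
            # Sibling capacities under one parent must add up to 100%.
            assert sum(c[0] for c in children.values()) == 100, path
            result.update(absolute_capacities(children, share, path))
        else:
            result[path] = share
    return result

for queue, share in absolute_capacities(QUEUES).items():
    print(f"{queue}: {share:.0%}")
```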

Point out the correct statement. -The right number of reduces seems to be 0.95 or 1.75 -Increasing the number of reduces increases the framework overhead -With 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish -All of the mentioned

Increasing the number of reduces increases the framework overhead

The Hadoop MapReduce framework spawns one map task for each __________ generated by the InputFormat for the job. -All of the mentioned -OutputSplit -InputSplit -InputSplitStream

InputSplit

The number of maps is usually driven by the total size of ____________ -None of the above -Inputs -Output -Task

Inputs
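
FileInputFormat computes the split size as max(minSize, min(maxSize, blockSize)). It keeps cutting full-size splits only while the remaining bytes exceed 1.1 × the split size, so a slightly oversized tail is folded into the last split. The Python sketch below recreates that arithmetic under a simplified model: one uncompressed file, default min/max sizes, and no data locality.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1   # the last split may be up to 10% larger than the split size

def split_size(block_size, min_size=1, max_size=2**63 - 1):
    return max(min_size, min(max_size, block_size))

def split_lengths(file_size, block_size=128 * MB):
    size = split_size(block_size)
    remaining, splits = file_size, []
    while remaining / size > SPLIT_SLOP:
        splits.append(size)
        remaining -= size
    if remaining:
        splits.append(remaining)   # tail (possibly up to 1.1x) becomes the final split
    return splits

# One map task per split, so these counts are the number of maps.
for mb in (130, 141, 1000):
    print(mb, "MB ->", len(split_lengths(mb * MB)), "map task(s)")
```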

Which of the following is not a feature of Spark? · Fault-tolerance · Supports in-memory computation · It is not cost efficient · Compatible with other file storage systems

It is not cost efficient

Users can bundle their MapReduce code in a _________ file and execute it using the hadoop jar command. -py -Jar -Java -xml

Jar

Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in ____________ -Python -C++ -Java -None of the above

Java

What is Spark's data loading mechanism? · Eager Loading · Both of these · Lazy loading · None of these

Lazy loading

Which of the following is a data processing engine for Hadoop Framework? -MapReduce -Spark -HDFS -YARN

MapReduce

Is YARN a replacement of Hadoop MapReduce? (Y/N)

No

What is FALSE about RDD? · RDD is immutable · RDD provides two kinds of operations: transformations & actions · Spark revolves around the concept of RDD · None of the above

None of the above

Which of the following capabilities are quantifiable advantages of parallel processing? · You can add and remove execution nodes as and when required, significantly reducing infrastructure costs. · Since problem instructions are executed on separate execution nodes, memory and processing requirements are low even while processing large volumes of data. · Parallel processing can process Big Data in a fraction of the time compared to linear processing. · Parallel processing fixes and executes errors locally without impacting other nodes.

Parallel processing can process Big Data in a fraction of the time compared to linear processing

Which of the following are not design goals of HDFS? -Fault detection and recovery -Prevent deletion of data -Provide high network bandwidth for data movement -Handle huge dataset

-Prevent deletion of data -Provide high network bandwidth for data movement

In MapReduce, the number of reducers can be changed by ____________ -The number of map tasks -Input size -The programmer setting the number of reducers -The number of nodes in a cluster

The programmer setting the number of reducers

Hive is a ____ -Query Language -Database -Data Flow Language -Programming Language

Query Language

The basic abstraction of Spark Streaming is · DataFrame · RDD · Shared variable · None of the above

RDD (Spark Streaming's DStream is a continuous sequence of RDDs)

All of the following accurately describe Hadoop, EXCEPT: -Open Source -Java Based -Real Time -Distributed Computing approach

Real Time

What does RDD stand for? · Redundant Distributed Database · Resilient Distributed Database · Resilient Distributed Dataset · None

Resilient Distributed Dataset

What is YARN? -None of the above -Storage layer -Batch processing engine -Resource Management Layer

Resource Management Layer

Which among the following is the ultimate authority that arbitrates resources among all the applications in the system? -Container -ApplicationMaster -ResourceManager -NodeManager

ResourceManager

In which language is Spark developed? · Java · Scala · Python · R

Scala

Which of the following phases occur simultaneously -Shuffle and Map -Both A and B -Shuffle and Sort -Reduce and Sort

Shuffle and Sort
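
Each map task writes its output sorted by key. During the shuffle, a reducer fetches its partition from every map and merges those already-sorted segments, which is why shuffling and sorting overlap. The merge step can be sketched with Python's heapq.merge. This is an analogy only, because Hadoop's own merger works on spill files on disk.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Sorted output segments fetched from three map tasks (same reducer partition).
segments = [
    [("apple", 1), ("kiwi", 1)],
    [("apple", 1), ("banana", 1)],
    [("banana", 1), ("kiwi", 1), ("plum", 1)],
]

# Merge the pre-sorted segments lazily, then group equal keys for reduce().
merged = heapq.merge(*segments, key=itemgetter(0))
grouped = {k: [v for _, v in grp] for k, grp in groupby(merged, key=itemgetter(0))}
print(grouped)  # {'apple': [1, 1], 'banana': [1, 1], 'kiwi': [1, 1], 'plum': [1]}
```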

Which statement best describes small data? · Small Data is available in limited quantities that humans can easily interpret with little or no digital processing. · Small data consists of batches of big data requiring large amounts of compute power. · Small Data is available in quantities that humans can easily interpret after digital processing. · Small data has little or no structure or is semi-structured. Examples of semi-structured data include social media posts that could be images accompanied by hashtags, while unstructured data could include medical records from millions of patients.

Small Data is available in limited quantities that humans can easily interpret with little or no digital processing.

What is the driver program of Spark? · SparkContext · Cluster Manager · Worker Node · All

SparkContext

Point out the wrong statement. -It is legal to set the number of reduce-tasks to zero if no reduction is desired. -The MapReduce framework does not sort the map-outputs before sending them to the reduce tasks -None of the above -The outputs of the map-tasks go directly to the local File System

The MapReduce framework does not sort the map-outputs before sending them to the reduce tasks

The total number of partitions is equal to -The number of reducers -The number of combiners -All of the above -The number of mappers

The number of reducers

Hadoop can be deployed on commodity servers, which provides low-cost processing as well as storage of unstructured, huge volume of data. -True -False

True

What is Writable in Hadoop? -None of these answers are correct -Writable is a Java interface that needs to be implemented for HDFS writes -Writable is a Java interface that needs to be implemented for streaming data to remote servers -Writable is a Java interface that needs to be implemented for MapReduce processing

Writable is a Java interface that needs to be implemented for MapReduce processing
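
Writable is Hadoop's serialization contract. A type implements write(DataOutput) and readFields(DataInput) so that keys and values can be shipped between map and reduce tasks. The Python class below is only an analogy, and its name is hypothetical. It writes big-endian 4-byte integers, the same layout Java's DataOutput.writeInt produces.

```python
import io
import struct
from typing import BinaryIO

class IntPairWritable:
    """Analogy of a Java Writable holding two ints (hypothetical class)."""

    def __init__(self, first=0, second=0):
        self.first, self.second = first, second

    def write(self, out: BinaryIO) -> None:
        # Like DataOutput.writeInt twice: 4 bytes each, big-endian, signed.
        out.write(struct.pack(">ii", self.first, self.second))

    def read_fields(self, inp: BinaryIO) -> None:
        # Like readFields(DataInput): repopulate this existing object in place.
        self.first, self.second = struct.unpack(">ii", inp.read(8))

buf = io.BytesIO()
IntPairWritable(7, -2).write(buf)
copy = IntPairWritable()
copy.read_fields(io.BytesIO(buf.getvalue()))
print(len(buf.getvalue()), copy.first, copy.second)  # 8 7 -2
```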

Which of the following is the architectural center of Hadoop that allows multiple data processing engines. -HDFS -YARN -Hive -Incubator

YARN

Which of the following manages the resources among all the applications running in a Hadoop cluster? -NameNode -DataNode -YARN -MapReduce

YARN

Apache Hadoop YARN stands for: -Yet Another Resource Network -None of the above -Yet Another Resource Negotiator -Yet Another Reserve Negotiator

Yet Another Resource Negotiator

In HDFS, files cannot be ___ -executed -none of the above -deleted -read

executed

Which of these statements describe big data? Check all that apply. · Data generated in huge volumes and can be structured, semi-structured, or unstructured. · Big Data arrives continuously at enormous speed from multiple sources. · Big Data is relatively consistent and is stored as JSON or XML forms. · Big Data is mostly located in storage within Enterprises and Data Centers.

· Data generated in huge volumes and can be structured, semi-structured, or unstructured. · Big Data arrives continuously at enormous speed from multiple sources. · Big Data is mostly located in storage within Enterprises and Data Centers.


Kaugnay na mga set ng pag-aaral

6500 Exam #2 Self Assessment Quiz's

View Set

ATI Pediatric Growth and Development

View Set

USMLE Step 2 CK Board Preparation: Diseases of the Musculoskeletal System

View Set