Hadoop Ecosystem
Point out the wrong statement: a) Elastic MapReduce (EMR) is Facebook's packaged Hadoop offering b) Amazon Web Service Elastic MapReduce (EMR) is Amazon's packaged Hadoop offering c) Scalding is a Scala API on top of Cascading that removes most Java boilerplate d) All of the mentioned
a) Elastic MapReduce (EMR) is Facebook's packaged Hadoop offering Rather than building Hadoop deployments manually on EC2 (Elastic Compute Cloud) clusters, users can spin up fully configured Hadoop installations using simple invocation commands, either through the AWS Web Console or through command-line tools
Point out the correct statement : a) Hive is not a relational database, but a query engine that supports the parts of SQL specific to querying data b) Hive is a relational database with SQL support c) Pig is a relational database with SQL support d) All of the mentioned
a) Hive is not a relational database, but a query engine that supports the parts of SQL specific to querying data Hive is a SQL-based data warehouse system for Hadoop that facilitates data summarization, ad hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems.
___________ is general-purpose computing model and runtime system for distributed data analytics. a) Mapreduce b) Drill c) Oozie d) None of the mentioned
a) Mapreduce Mapreduce provides a flexible and scalable foundation for analytics, from traditional reporting to leading-edge machine learning algorithms.
The Pig Latin scripting language is not only a higher-level data flow language but also has operators similar to: a) SQL b) JSON c) XML d) All of the mentioned
a) SQL Pig Latin, in essence, is designed to fill the gap between the declarative style of SQL and the low-level procedural style of MapReduce.
______ is a framework for performing remote procedure calls and data serialization. a) Drill b) BigTop c) Avro d) Chukwa
c) Avro In the context of Hadoop, Avro can be used to pass data from one program or language to another.
_________ hides the limitations of Java behind a powerful and concise Clojure API for Cascading. a) Scalding b) HCatalog c) Cascalog d) All of the mentioned
c) Cascalog Cascalog also adds Logic Programming concepts inspired by Datalog. Hence the name "Cascalog" is a contraction of Cascading and Datalog.
________ is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets. a) Pig Latin b) Oozie c) Pig d) Hive
c) Pig Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs.
Hive also support custom extensions written in : a) C# b) Java c) C d) C++
b) Java Hive also support custom extensions written in Java, including user-defined functions (UDFs) and serializer-deserializers for reading and optionally writing custom formats.
________ is the most popular high-level Java API in Hadoop Ecosystem a) Scalding b) HCatalog c) Cascalog d) Cascading
d) Cascading Cascading hides many of the complexities of MapReduce programming behind more intuitive pipes and data flow abstractions.
_______ jobs are optimized for scalability but not latency. a) Mapreduce b) Drill c) Oozie d) Hive
d) Hive Hive Queries are translated to MapReduce jobs to exploit the scalability of MapReduce.