Hadoop Basics


3. What license is Hadoop distributed under? a) Apache License 2.0 b) Mozilla Public License c) Shareware d) Commercial

Answer: a Explanation: Hadoop is Open Source, released under Apache 2 license.

3. According to analysts, for what can traditional IT systems provide a foundation when they're integrated with big data technologies like Hadoop? a) Big data management and data mining b) Data warehousing and business intelligence c) Management of Hadoop clusters d) Collecting and storing unstructured data

Answer: a Explanation: Integrating data warehousing with Hadoop gives a better understanding of the data.

2. Point out the correct statement: a) Hive is not a relational database, but a query engine that supports the parts of SQL specific to querying data b) Hive is a relational database with SQL support c) Pig is a relational database with SQL support d) All of the mentioned

Answer: a Explanation: Hive is a SQL-based data warehouse system for Hadoop that facilitates data summarization, ad hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems.

9. Above the file systems comes the ________ engine, which consists of one Job Tracker, to which client applications submit MapReduce jobs. a) MapReduce b) Google c) Functional programming d) Facebook

Answer: a Explanation: The MapReduce engine distributes work around the cluster.

8. __________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data. a) MapReduce b) Mahout c) Oozie d) All of the mentioned

Answer: a Explanation: MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm.
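The model the explanation describes can be sketched in plain Python. This is a toy, single-process illustration of map, shuffle, and reduce, not the Hadoop API; the function names and sample documents are invented for this sketch:

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(key, values):
    # Reduce: sum all counts emitted for one key
    return (key, sum(values))

def run_mapreduce(documents):
    # Shuffle: group intermediate pairs by key
    # (in Hadoop, the framework does this between map and reduce)
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_phase(doc):
            grouped[key].append(value)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = run_mapreduce(["big data big cluster", "data lake"])
print(counts)  # {'big': 2, 'data': 2, 'cluster': 1, 'lake': 1}
```

In Hadoop proper, the map and reduce functions run in parallel on many nodes, and the shuffle moves intermediate pairs across the network; the programming model itself is exactly this simple.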

7. ___________ is a general-purpose computing model and runtime system for distributed data analytics. a) MapReduce b) Drill c) Oozie d) None of the mentioned

Answer: a Explanation: MapReduce provides a flexible and scalable foundation for analytics, from traditional reporting to leading-edge machine learning algorithms.

8. The Pig Latin scripting language is not only a higher-level data flow language but also has operators similar to: a) SQL b) JSON c) XML d) All of the mentioned

Answer: a Explanation: Pig Latin, in essence, is designed to fill the gap between the declarative style of SQL and the low-level procedural style of MapReduce.

5. Which of the following genres does Hadoop produce? a) Distributed file system b) JAX-RS c) Java Message Service d) Relational Database Management System

Answer: a Explanation: The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications.

4. Hadoop is a framework that works with a variety of related tools. Common cohorts include: a) MapReduce, Hive and HBase b) MapReduce, MySQL and Google Apps c) MapReduce, Hummer and Iguana d) MapReduce, Heron and Trumpet

Answer: a Explanation: To use Hive with HBase you'll typically want to launch two clusters, one to run HBase and the other to run Hive.

7. All of the following accurately describe Hadoop, EXCEPT: a) Open source b) Real-time c) Java-based d) Distributed computing approach

Answer: b Explanation: Apache Hadoop is an open-source software framework for distributed storage and distributed processing of Big Data on clusters of commodity hardware.

2. Point out the correct statement: a) Hadoop is an ideal environment for extracting and transforming small volumes of data b) Hadoop stores data in HDFS and supports data compression/decompression c) The Giraph framework is less useful than a MapReduce job for solving graph and machine learning problems d) None of the mentioned

Answer: b Explanation: Data compression can be achieved using compression algorithms like bzip2, gzip, LZO, etc. Different algorithms can be used in different scenarios based on their capabilities.
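As the explanation notes, different codecs trade compression speed for ratio. A quick round-trip sketch using Python's standard-library `gzip` and `bz2` modules (standing in here for Hadoop's compression codecs, which wrap the same algorithms):

```python
import bz2
import gzip

# Highly repetitive data, like log files, compresses well with either codec
payload = b"hadoop stores large volumes of repetitive log data " * 200

gz = gzip.compress(payload)   # gzip: fast, widely supported
bz = bz2.compress(payload)    # bzip2: slower, usually a better ratio

print(len(payload), len(gz), len(bz))

# Both codecs are lossless: decompression restores the original bytes
assert gzip.decompress(gz) == payload
assert bz2.decompress(bz) == payload
```

In a Hadoop deployment the choice also involves splittability (bzip2 output can be split across mappers; plain gzip output cannot), which is why different algorithms suit different scenarios.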

2. Point out the correct statement: a) Hadoop does need specialized hardware to process the data b) Hadoop 2.0 allows live stream processing of real-time data c) In the Hadoop programming framework, output files are divided into lines or records d) None of the mentioned

Answer: b Explanation: Hadoop batch-processes data distributed over hundreds to thousands of computers.

4. Hive also supports custom extensions written in: a) C# b) Java c) C d) C++

Answer: b Explanation: Hive also supports custom extensions written in Java, including user-defined functions (UDFs) and serializer-deserializers for reading and, optionally, writing custom formats.

1. ________ is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets. a) Pig Latin b) Oozie c) Pig d) Hive

Answer: c Explanation: Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs.
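A typical Pig data flow is LOAD → FILTER → GROUP → aggregate. Since this quiz has no runnable Pig environment, here is that flow simulated in plain Python; the records, field names, and filter condition are made up for illustration:

```python
from itertools import groupby
from operator import itemgetter

# "LOAD": raw records as (user, action) tuples
records = [("alice", "click"), ("bob", "view"), ("alice", "view"),
           ("bob", "click"), ("alice", "click")]

# "FILTER ... BY action == 'click'": keep only click events
clicks = [r for r in records if r[1] == "click"]

# "GROUP ... BY user" then "FOREACH ... GENERATE COUNT": clicks per user
clicks.sort(key=itemgetter(0))  # groupby requires sorted input
per_user = {user: len(list(grp))
            for user, grp in groupby(clicks, key=itemgetter(0))}
print(per_user)  # {'alice': 2, 'bob': 1}
```

In Pig Latin each of these steps is a single relational operator, and the platform compiles the whole script into MapReduce jobs that run over data far too large for one machine.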

6. What was Hadoop named after? a) Creator Doug Cutting's favorite circus act b) Cutting's high school rock band c) The toy elephant of Cutting's son d) A sound Cutting's laptop made during Hadoop's development

Answer: c Explanation: Doug Cutting, Hadoop's creator, named the framework after his child's stuffed toy elephant.

7. Which of the following platforms does Hadoop run on? a) Bare metal b) Debian c) Cross-platform d) Unix-like

Answer: c Explanation: Hadoop is cross-platform and runs on many operating systems.

5. Point out the wrong statement: a) Hadoop's processing capabilities are huge and its real advantage lies in the ability to process terabytes and petabytes of data b) Hadoop uses a programming model called "MapReduce"; all programs should conform to this model in order to work on the Hadoop platform c) The programming model, MapReduce, used by Hadoop is difficult to write and test d) All of the mentioned

Answer: c Explanation: The programming model, MapReduce, used by Hadoop is simple to write and test.

1. As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including: a) Improved data storage and information retrieval b) Improved extract, transform and load features for data integration c) Improved data warehousing functionality d) Improved security, workload management and SQL support

Answer: d Explanation: Adding security to Hadoop is challenging because not all of the interactions follow the classic client-server pattern.

6. ________ is the most popular high-level Java API in the Hadoop ecosystem. a) Scalding b) HCatalog c) Cascalog d) Cascading

Answer: d Explanation: Cascading hides many of the complexities of MapReduce programming behind more intuitive pipes and data flow abstractions.

9. _______ jobs are optimized for scalability but not latency. a) MapReduce b) Drill c) Oozie d) Hive

Answer: d Explanation: Hive Queries are translated to MapReduce jobs to exploit the scalability of MapReduce.
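To illustrate the translation the explanation describes, a HiveQL GROUP BY can be read as a map phase that emits the grouping key and a reduce phase that counts per key. A toy sketch in Python (the `employees` table and the query are hypothetical):

```python
from collections import Counter

# Hypothetical HiveQL: SELECT dept, COUNT(*) FROM employees GROUP BY dept;
employees = [("eng", "alice"), ("eng", "bob"), ("sales", "carol")]

# Map: emit the grouping key for each row;
# shuffle + reduce: count occurrences per key (Counter does both here)
dept_counts = Counter(dept for dept, _name in employees)
print(dict(dept_counts))  # {'eng': 2, 'sales': 1}
```

Because every such query becomes one or more batch MapReduce jobs with full scans and shuffles, Hive scales to huge tables but cannot answer in the milliseconds a latency-optimized system would.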

