Chapter 2 - Data science

The final or main cause of the industrial revolution was the effects created by

Agricultural revolution

What are the four basic kinds of devices

Memory, Microprocessors, Logic, and Networks

Communication between human and machine requires an interface. The interface may include

- Gesture recognition - Voice recognition - Terminal

What are the goals of data analysis in the Big Data Value Chain?

- Highlighting relevant data - Synthesizing and extracting useful hidden information with high potential from a business point of view.

What are the characteristics of the Hadoop ecosystem?

- Reliability - Flexibility - Scalability

List things introduced or generated in the Second Industrial Revolution

- Telegraph, Telephone, and Electrical power - The modern lightbulb, the assembly line, the automobile, aircraft, and the construction of the transcontinental railroad

How many types of data are there? A. 2 B. 3 C. 5 D. 4

Hadoop is a framework that works with a variety of related tools. Common cohorts include: A. MapReduce, Hive and HBase B. MapReduce, MySQL and Google Apps C. MapReduce, Hummer and Iguana D. MapReduce, Heron and Trumpet

In the Hadoop ecosystem, which components perform data management? A. Spark and MapReduce B. Oozie and ZooKeeper C. Sqoop and Flume D. Hive and Pig

Which of the following are the Goals of HDFS? A. Fault detection and recovery B. Huge datasets C. Hardware at data D. All of the above

All of the above

The minimum amount of data that HDFS can read or write is called a _____________. A. Datanode B. Namenode C. Block D. None of the above

Block
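
To make the idea of a block concrete, here is a minimal sketch that computes how many blocks a file would occupy. It assumes a block size of 128 MiB, which is the usual default in Hadoop 2.x and later but can be changed per cluster; the function name `block_count` is hypothetical and not part of any Hadoop API.

```python
# Assumption: 128 MiB block size (the common default in Hadoop 2.x and later;
# clusters may configure a different value).
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024


def block_count(file_size_bytes, block_size=DEFAULT_BLOCK_SIZE):
    """Return the number of blocks needed to hold a file of the given size."""
    if file_size_bytes < 0 or block_size <= 0:
        raise ValueError("sizes must be non-negative and block size positive")
    # Integer ceiling division: the last block may be only partly filled.
    return (file_size_bytes + block_size - 1) // block_size


# A 300 MiB file is split into two full blocks and one partial block.
blocks_for_300_mib = block_count(300 * 1024 * 1024)
```

The last block only uses as much space as the remaining data, so a small file still maps to a single block.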

What is a general term that describes the delivery of on-demand services, usually through the internet, on a pay per use basis

Cloud Computing

On which platform does Hadoop run? A. Bare metal B. Debian C. Cross-platform D. Unix-like

Cross-platform

All of the following are service enabling devices except A. Modems B. Routers C. Switches D. Hadoop

____________ is the re-structuring or re-ordering of data by people or machines to increase its usefulness and add value for a particular purpose.

Data processing

______ is defined as the persistence and management of data in a scalable way that satisfies the needs of applications that require fast access to the data

Data storage

What is a data value chain and briefly explain the different chain activities in it

Describes the process of data creation and use from first identifying a need for data to its final use and possible reuse

___ and ___ are used to ingest data from external sources into HDFS

Flume and Sqoop

Which Hadoop component is used for a distributed data storage unit?

HDFS - Hadoop Distributed File System

_______ is described as a transition to new manufacturing processes

IR 1.0

_______ IR is also known as the Technological Revolution

IR 2.0

The stage of the industrial revolution characterized by applying IoT, AI, and big data technologies to industry to enable intelligent production

Industry 4.0

Which technique in big data life cycle is used to transfer data from various sources to Hadoop?

Ingest

____is the input, or what you tell the computer to do or save

Input

Which one of the following is not a common example of a data type? A. Integer B. Float C. Text D. Variable

Variable
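
A short Python sketch of the distinction behind this question: integer, float, and text are data types, while a variable is only a name bound to a value of some type. The variable names and values below are hypothetical, chosen only for illustration.

```python
# Hypothetical example values; each literal has a data type.
count = 42          # integer
price = 19.99       # float
label = "sensor-7"  # text (string)


def type_name(value):
    """Return the name of the data type of a value."""
    return type(value).__name__


# The same variable can be rebound to a value of another type,
# which is why "variable" is not itself a data type.
reading = 10
first = type_name(reading)
reading = "ten"
second = type_name(reading)
```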

What are examples of unstructured data?

Audio files, video files, and free-form text (JSON and XML are semi-structured, not unstructured)

What are the specific functions of Logic Devices

Logic devices provide specific functions, including device-to-device interfacing, data communication, signal processing, data display, timing and control operations

Which component of Hadoop is used for programming based data processing? A. HIVE B. Spark MLLib C. MapReduce D. Oozie

MapReduce

Which of the following is some sort of hardware architecture or software framework that allows the software to run? A. Hadoop file system B. Platform C. Malware D. Chatbot

Platform

The goal of most big data systems is A. To save cost B. To reduce complexity C. To surface insight D. To secure data

C. To surface insight

"Information Technology" is an example of A. Primary Industry B. Secondary Industry C. Tertiary Industry D. Quaternary Industry

Quaternary

Which of the following is not a feature of Hadoop? A. Suitable for Big Data Analysis B. Scalability C. Robust D. Fault Tolerance

Robust

In which agricultural revolution was mass crop production introduced?

Second Agricultural Revolution or The British Agricultural Revolution

What type of data is self-describing data?

Semi-structured data, e.g. JSON and XML
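
As an illustration of what "self-describing" means, the sketch below parses a small JSON record and lists its fields and value types using nothing but the record itself, with no external schema. The record contents and the helper name `describe` are hypothetical.

```python
import json

# Hypothetical record; the field names travel with the data.
raw = '{"id": 7, "name": "Sara", "tags": ["student", "cs"], "address": {"city": "Addis Ababa"}}'
record = json.loads(raw)


def describe(value, prefix=""):
    """List every field path together with the type of its value."""
    rows = []
    if isinstance(value, dict):
        for key, child in value.items():
            rows.extend(describe(child, f"{prefix}{key}."))
    else:
        rows.append((prefix.rstrip("."), type(value).__name__))
    return rows


fields = describe(record)
```

A fully structured table would need a separate schema to say what each column means; here the keys and nesting carry that information inside the document.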

T/F Data is independent of information whereas information is dependent on data

Which stage of the industrial revolution provides services like teaching and nursing?

The fourth industrial revolution

What is the core factor of IR 3.0 revolution?

The mass production and widespread use of digital logic circuits

List and explain basic components of Hadoop system?

There are three components of Hadoop: Hadoop HDFS - Hadoop Distributed File System (HDFS) is the storage unit. Hadoop MapReduce - Hadoop MapReduce is the processing unit. Hadoop YARN - Yet Another Resource Negotiator (YARN) is a resource management unit.

A much bigger percentage of all the data in our world is unstructured data. T/F

True

True or false? Hadoop can be used to create distributed clusters, based on commodity servers, that provide low-cost processing and storage for unstructured data, log files and other forms of big data.

True

True or false? MapReduce can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of unstructured data.

True
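
The MapReduce programming model can be sketched in plain Python with the classic word-count example. This is a single-process illustration of the map, shuffle, and reduce steps, not Hadoop's actual API; all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big", "data lake"])
```

In a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data across the network; the logic per key stays the same.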

The type of data that doesn't have any predefined data model?

Unstructured data

Which characteristic of big data indicates uncertainty in data due to inconsistency and ambiguity?

Veracity

The primary goal of HCI is to improve the interaction between computers and other computers. T/F

False. It is the interaction between humans and computers.

The bi-directional information flow between a human brain and a machine is referred to as A. Neuro-modulation B. Social computing C. Human machine interface D. Brain computer interface

Which of the following is not a feature of HDFS? A. It is suitable for distributed storage and processing. B. Streaming access to file system data. C. HDFS provides file permissions and authentication. D. Hadoop does not provide a command interface to interact with HDFS.

In which technique in the big data life cycle is the data stored in the distributed file system, HDFS, and the NoSQL distributed data, HBase?

Processing

_____ is the process of gathering, filtering, and cleaning data before it is put in a data warehouse

Data acquisition
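
The three steps named in this card (gathering, filtering, cleaning) can be sketched as a tiny pipeline. The record layout, the filter rule, and every function name here are assumptions made for illustration; real acquisition tooling is far more involved.

```python
def gather(sources):
    """Gather: pull records from every source into one stream."""
    for source in sources:
        yield from source


def keep(record):
    """Filter: drop records that have no measured value."""
    return record.get("value") is not None


def clean(record):
    """Clean: normalize the sensor name and convert the value to float."""
    return {
        "sensor": record["sensor"].strip().lower(),
        "value": float(record["value"]),
    }


def acquire(sources):
    """Gather, filter, and clean records before they are stored."""
    return [clean(record) for record in gather(sources) if keep(record)]


# Hypothetical input: two sources with inconsistent formatting.
sources = [
    [{"sensor": " A1 ", "value": "3.5"}, {"sensor": "a2", "value": None}],
    [{"sensor": "B7", "value": 2}],
]
ready_for_warehouse = acquire(sources)
```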

Which of the following is regarded as smart Revolution A. Information revolution B. Agriculture revolution C. Industrial revolution (second revolution) D. Knowledge revolution

________ is used to transfer data from an RDBMS to HDFS in a Hadoop ecosystem

Sqoop

T/F. Rural to Urban migration is not the result of industrial revolution

In which industrial revolution was the internet introduced?

The Third Industrial Revolution

What do we mean when we say data science is multi-disciplinary, and why is it multi-disciplinary?

This approach generally includes the fields of data mining, forecasting, machine learning, predictive analytics, statistics, and text analytics.

List Service Enabling Devices (SEDs)

- Traditional channel service unit (CSU) and data service unit (DSU) - Modems - Routers - Switches - Conferencing equipment - Network appliances (NIDs and SIDs) - Hosting equipment and servers

_______is an open-source framework intended to make interaction with big data easier

Hadoop

All of the following are the specific functions of Logic Devices except A. Control operations B. Data communication C. Signal processing D. None

Data curators are also known as

Data annotators

List and explain the five characteristics of big data (the 5 V's).

Volume, Velocity, Variety, Veracity and Value.
- Volume: how much data is actually collected. Volume is the base of big data, as it is the initial size and amount of data collected. If the volume is large enough, the data can be considered big data.
- Veracity: how reliable the data is. It relates to consistency, accuracy, quality, and trustworthiness, and covers bias, noise, and abnormality in data.
- Velocity: how fast data is generated, gathered, and analysed, and how quickly it moves. This matters for companies that need their data to flow quickly so that it is available at the right time to make the best possible business decisions.
- Variety: the diversity of data types. An organization might obtain data from a number of different sources, which may vary in value.
- Value: the usefulness of the gathered data for the business. Data by itself, regardless of its volume, usually isn't very useful; to be valuable, it needs to be converted into insights or information, and that is where data processing steps in.

True or false? Due to Hadoop's ability to manage unstructured and semi-structured data and because of its scale-out support for handling ever-growing quantities of data, many experts view it as a replacement for the enterprise data warehouse.

False

What is human-computer interaction (HCI), and how do human beings interact with computers?

HCI is the study of how people interact with computers and to what extent computers are or are not developed for successful interaction with human beings. The user interacts directly with hardware for human input and output, such as displays, e.g. through a graphical user interface.

____is a distributed file system that may run on a cluster of commodity machines, where the storage of data is distributed among the cluster and the processing is distributed too.

Hadoop

List the importance of cluster computing

High availability through fault tolerance and resilience, load balancing and scaling capabilities, and performance improvements

What are the different activities of Big-data life cycle A. Ingesting data, persisting data, computing and analyzing data, and visualizing results B. Acquisition, analysis, curation, storage, and usage C. Veracity, variability, and value D. Input, processing, and output

Which of the following is not a future trend of networks A. 5G technology B. Rise of centralization C. Embedded computation D. Network developments in edge computing

What is the massive amount of data which cannot be stored, processed, and analyzed using the traditional ways?

Big data

A device that provides interfacing, data communication, signal processing and other key similar functionalities is A. Microprocessor B. Memory device C. Logic device D. Network device

Which IR is known as Digital revolution

IR 3.0

_______ is a model for enabling convenient, on-demand network access to a shared pool of computing resources

Cloud computing

Discuss the difference between cloud computing and cluster computing.

Cloud computing: the on-demand availability of computer system resources, especially data storage and computing power, without direct active management by the user. Large clouds often have functions distributed over multiple locations, each location being a data center.
Cluster computing: a computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike cloud computing, computer clusters have each node set to perform the same task, controlled and scheduled by software.

_______ IR Introduced the transition from mechanical and analog electronic technology to digital electronics

IR 3.0

What is a cyber-physical system?

A mechanism that is controlled or monitored by computer-based algorithms and tightly integrated with the Internet and its users.

According to analysts, for what can traditional IT systems provide a foundation when they're integrated with big data technologies like Hadoop? A, Big data management and data mining B, Collecting and storing unstructured data C, Management of Hadoop clusters D, Data warehousing and business intelligence

Hadoop benefits big data users for the following reasons except: A. It can store and process vast amounts of structured, semi-structured and unstructured data, quickly B. It can support real-time analytics to help drive better operational decision-making C. It protects application and data processing against hardware failures D. It requires data to be pre-processed for storage before filtering it for specific analytic uses

All of the following accurately describe Hadoop, EXCEPT: A, Open source B, Real-time C, Java-based D, Distributed computing approach

Real-time

_____ is concerned with making the raw data acquired amenable to use in decision making as well as domain-specific usage

Data Analysis

In data value chain, the activities of ensuring that data are trustworthy, discoverable, accessible, reusable and fit their purpose is called

Data Curation

__________ covers the data-driven business activities that need access to data, its analysis, and the tools needed to integrate the data analysis within the business activity

Data Usage

___________is performed by expert curators that are responsible for improving the accessibility and quality of data.

Data curation

_____ describes the information flow within a big data system that generates hidden patterns and useful knowledge from data.
