Chapter 7 edited
In the financial services industry, Big Data can be used to improve
both A & B
How does Hadoop work?
It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on multiple computers.
Define MapReduce.
MapReduce is a programming model that is used to process and generate big datasets with a parallel distributed algorithm.
All of the following statements about MapReduce are true EXCEPT
MapReduce runs without fault tolerance
The ________ Node in a Hadoop cluster provides client information on where in the cluster particular data is stored and if any nodes fail.
Name
HBase, Cassandra, MongoDB, and Accumulo are examples of ________ databases.
NoSQL
What is NoSQL as used for Big Data? Describe its major downsides.
NoSQL is a new form of databases that processes and stores unstructured data that is not in a tabular format . NoSQL is high performance and highly scalable . The downside is that they trade ACID compliance for performance and scalability.
In a Hadoop "stack," what is a slave node?
a node where data is stored and processed
As volumes of Big Data arrive from multiple sources such as sensors, machines, social media, and clickstream interactions, the first step is to ________ all the data reliably and cost effectively.
capture
HBase is a nonrelational ________ that allows for low-latency, quick lookups in Hadoop.
database
In the Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse case study, what was the analytic goal?
determine differences in rates of disease in urban and rural populations
Hadoop is primarily a(n) ________ file system and lacks capabilities we'd associate with a DBMS, such as indexing, random access to data, and support for SQL.
distributed
In a network analysis, what connects nodes?
edges
As the size and the complexity of analytical systems increase, the need for more ________ analytical systems is also increasing to obtain the best performance.
efficient
Big Data comes from ________.
everywhere
Which Big Data approach promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources?
grid computing
Allowing Big Data to be processed in memory and distributed across a dedicated set of nodes can solve complex problems in near-real time with highly accurate insights. What is this process called?
in-memory analytics
In the Alternative Data for Market Analysis or Forecasts case study, satellite data was NOT used for
monitoring individual customer patterns
In the Twitter case study, how did influential users support their tweets?
objective data
Why are some portions of tape backup workloads being redirected to Hadoop clusters today?
- difficulty of retrieval , data that is stored offline takes long to retrieve , tape formats change over time , and are prone to loss of data . There is a value in keeping historical data online and accessable.
A newly popular unit of data in the Big Data era is the petabyte (PB), which is
10^15 bytes
What is Big Data's relationship to the cloud?
Amazon and Google have working Hadoop cloud offerings
________ bring together hardware and software in a physical unit that is not only fast but also scalable on an as-needed basis.
Appliances
________ speeds time to insights and enables better data governance by performing data integration and analytic functions inside the database.
In-database analytics
________ of data provides business value; pulling of data from multiple subject areas and numerous applications into one repository is the raison d'être for data warehouses.
Integration
In the world of Big Data, ________ aids organizations in processing and analyzing large volumes of multistructured data. Examples include indexing and search, graph analysis, etc.
MapReduce
Which of the following sources is likely to produce Big Data the fastest?
RFID tags
In the opening vignette, why was the Telecom company so concerned about the loss of customers, if customer churn is common in that industry?
The loss was at such a high rate . Th company had been losing customer faster than gaining customers . It was identified that the lost of customers could be traced back to customer service interactions.
________ refers to the conformity to facts: accuracy, quality, truthfulness, or trustworthiness of the data.
Veracity
The problem of forecasting economic activity or microclimates based on a variety of data beyond the usual retail data is a very recent phenomenon and has led to another buzzword — ________.
alternative data
Using data to understand customers/clients and business operations to sustain and foster growth and profitability is
an increasingly challenging task for today's enterprises.
In-motion ________ is often overlooked today in the world of BI and Big Data.
analytics
In open-source databases, the most important performance enhancement to date is the cost-based ________.
optimizer
Big Data employs ________ processing techniques and nonrelational data storage capabilities in order to process unstructured and semistructured data.
parallel
In a Hadoop "stack," what node periodically replicates and stores data from the Name Node should it fail?
secondary node
In the energy industry, ________ grids are one of the most impactful applications of stream analytics.
smart
Companies with the largest revenues from Big Data tend to be
the largest computer and IT services firms.
Traditional data warehouses have not been able to keep up with
the variety and complexity of data
A job ________ is a node in a Hadoop cluster that initiates and coordinates MapReduce jobs, or the processing of the data.
tracker
Under which of the following requirements would it be more appropriate to use Hadoop over a data warehouse?
unrestricted, ungoverned sandbox explorations
What is the Hadoop Distributed File System (HDFS) designed to handle?
unstructured and semistructured non-relational data
The ________ of Big Data is its potential to contain more useful patterns and interesting anomalies than "small" data.
value proposition
Data flows can be highly inconsistent, with periodic peaks, making data loads hard to manage. What is this feature of Big Data called?
variability
Organizations are working with data that meets the three V's-variety, volume, and ________ characterizations.
velocity