ISM6404 Chapter 7
MapReduce runs without fault tolerance.
All of the following statements about MapReduce are true EXCEPT
in-memory analytics
Allowing Big Data to be processed in memory and distributed across a dedicated set of nodes can solve complex problems in near-real time with highly accurate insights. What is this process called?
True
Big Data is being driven by the exponential growth, availability, and use of information
Objective Data
In the Twitter case study, how did influential users support their tweets?
regulatory oversight & decision making
In the financial services industry, Big Data can be used to improve
True
In the opening vignette, the Access Telecom (AT), built a system to better visualize customers who were unhappy before they canceled their service.
True
It is important for Big Data and self-service business intelligence to go hand in hand to get maximum value from analytics
True
MapReduce can be easily understood by skilled programmers due to its procedural nature.
False
Big Data simplifies data governance issues, especially for global firms.
False
Big Data uses commodity hardware, which is expensive, specialized hardware that is custom built for a client or application.
True
Current total storage capacity lags behind the digital information being generated in the world.
Variability
Data flows can be highly inconsistent, with periodic peaks, making data loads hard to manage. What is this feature of Big Data called?
True
The quality and objectivity of information disseminated by influential users of Twitter is higher than that disseminated by noninfluential users.
True
The term "Big Data" is relative as it depends on the size of the using organization.
True
For low latency, interactive reports, a data warehouse is preferable to Hadoop.
False
Hadoop and MapReduce require each other to work.
True
Hadoop was designed to handle petabytes and exabytes of data distributed over multiple nodes in parallel.
It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on multiple computers.
How does Hadoop work?
True
If you have many flexible programming languages running in parallel, Hadoop is preferable to a data warehouse.
True
In Application Case 7.6, Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse, it was found that urban individuals have a higher number of diagnosed disease conditions.
False
In most cases, Hadoop is used to replace data warehouses
determine differences in rates of disease in urban and rural populations
In the Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse case study, what was the analytic goal?
False
In the Salesforce case study, streaming data is used to identify services that customers use most
True
Satellite data can be used to evaluate the activity at retail locations as a source of alternative data.
True
Social media mentions can be used to chart and predict flu outbreaks.
True
There is a clear difference between the type of information support provided by influential users versus the others on Twitter.
unrestricted, ungoverned sandbox explorations
Under which of the following requirements would it be more appropriate to use Hadoop over a data warehouse?
Amazon and Google have working Hadoop cloud offerings.
What is Big Data's relationship to the cloud?
unstructured and semistructured non-relational data
What is the Hadoop Distributed File System (HDFS) designed to handle?
grid computing
Which Big Data approach promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources?
RFID tags
Which of the following sources is likely to produce Big Data the fastest?
10^15 bytes.
A newly popular unit of data in the Big Data era is the petabyte (PB), which is
True
Despite their potential, many current NoSQL tools lack mature management and monitoring tools