ISM Study Guide CH.7
In a Hadoop "stack," what is a slave node? A) a node where bits of programs are stored B) a node where metadata is stored and used to organize data processing C) a node where data is stored and processed D) a node responsible for holding all the source programs
a node where data is stored and processed
Using data to understand customers/clients and business operations to sustain and foster growth and profitability is A) easier with the advent of BI and Big Data B) essentially the same now as it has always been C) an increasingly challenging task for today's enterprises D) now completely automated with no human intervention required
an increasingly challenging task for today's enterprises
In the financial services industry, Big Data can be used to improve A) regulatory oversight B) decision making C) customer service D) both A & B
both A & B
In the Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse case study, what was the analytic goal? A) determine if diseases are accurately diagnosed B) determine probabilities of diseases that are comorbid C) determine differences in rates of disease in urban and rural populations D) determine differences in rates of disease in males v. females
determine differences in rates of disease in urban and rural populations
In a network analysis, what connects nodes? A) edges B) metrics C) paths D) visualizations
edges
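To make the node/edge idea concrete, here is a minimal sketch of a network stored as a list of edges. The names and connections are hypothetical, invented purely for illustration. Each edge connects two nodes, and counting a node's edges gives its degree, one of the simplest network metrics.

```python
# Hypothetical toy network: each edge is a pair of connected nodes.
edges = [("alice", "bob"), ("bob", "carol"), ("alice", "carol"), ("carol", "dave")]


def build_adjacency(edge_list):
    """Return a dict mapping each node to the set of nodes it shares an edge with."""
    adjacency = {}
    for a, b in edge_list:
        # Edges are treated as undirected, so record the link in both directions.
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


adjacency = build_adjacency(edges)
# Degree = number of edges touching a node.
degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
print(degree)
```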
Which Big Data approach promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources? A) in-memory analytics B) in-database analytics C) grid computing D) appliances
grid computing
Allowing Big Data to be processed in memory and distributed across a dedicated set of nodes can solve complex problems in near-real time with highly accurate insights. What is this process called? A) in-memory analytics B) in-database analytics C) grid computing D) appliances
in-memory analytics
In the Alternative Data for Market Analysis or Forecasts case study, satellite data was NOT used for A) evaluating retail traffic B) monitoring activity at factories C) tracking agricultural estimates D) monitoring individual customer patterns
monitoring individual customer patterns
In the Twitter case study, how did influential users support their tweets? A) opinion B) objective data C) multiple posts D) references to other users
objective data
In a Hadoop "stack," which node periodically replicates and stores data from the Name Node in case it fails? A) backup node B) secondary node C) substitute node D) slave node
secondary node
Companies with the largest revenues from Big Data tend to be A) the largest computer and IT services firms B) small computer and IT services firms C) pure open source Big Data firms D) non-U.S. Big Data firms
the largest computer and IT services firms
Traditional data warehouses have not been able to keep up with A) the evolution of the SQL language B) the variety and complexity of data C) expert systems that run on them D) OLAP
the variety and complexity of data
Under which of the following requirements would it be more appropriate to use Hadoop over a data warehouse? A) ANSI 2003 SQL compliance is required B) online archives alternative to tape C) unrestricted, ungoverned sandbox explorations D) analysis of provisional data
unrestricted, ungoverned sandbox explorations
What is the Hadoop Distributed File System (HDFS) designed to handle? A) unstructured and semistructured relational data B) unstructured and semistructured non-relational data C) structured and semistructured relational data D) structured and semistructured non-relational data
unstructured and semistructured non-relational data
Data flows can be highly inconsistent, with periodic peaks, making data loads hard to manage. What is this feature of Big Data called? A) volatility B) periodicity C) inconsistency D) variability
variability
A newly popular unit of data in the Big Data era is the petabyte (PB), which is A) 10^9 bytes B) 10^12 bytes C) 10^15 bytes D) 10^18 bytes
10^15 bytes
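The four options are consecutive decimal (SI) storage units, each 1,000 times the previous one. The short sketch below lists them and checks how many terabytes fit in a petabyte. It assumes the decimal definitions (powers of 10) rather than the binary ones (powers of 2).

```python
# Decimal (SI) storage units, expressed in bytes.
UNITS = {
    "gigabyte (GB)": 10**9,
    "terabyte (TB)": 10**12,
    "petabyte (PB)": 10**15,
    "exabyte (EB)": 10**18,
}

for name, size in UNITS.items():
    # The exponent is the number of zeros after the leading 1.
    exponent = len(str(size)) - 1
    print(f"{name}: 10^{exponent} bytes")

terabytes_per_petabyte = UNITS["petabyte (PB)"] // UNITS["terabyte (TB)"]
print(terabytes_per_petabyte)
```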
What is Big Data's relationship to the cloud? A) Hadoop cannot be deployed effectively in the cloud just yet B) Amazon and Google have working Hadoop cloud offerings C) IBM's homegrown Hadoop platform is the only option D) Only MapReduce works in the cloud; Hadoop does not
Amazon and Google have working Hadoop cloud offerings
Which of the following sources is likely to produce Big Data the fastest? A) order entry clerks B) cashiers C) RFID tags D) online customers
RFID tags
How does Hadoop work? A) It integrates Big Data into a whole so large data elements can be processed as a whole on one computer B) It integrates Big Data into a whole so large data elements can be processed as a whole on multiple computers C) It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on one computer D) It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on multiple computers
It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on multiple computers
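The split-then-process-in-parallel idea can be sketched in a few lines. This is not Hadoop code: it is a toy word count in which worker threads stand in for separate computers, and the function names and sample records are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_chunks):
    """Deal records round-robin into n_chunks lists (the 'break up' step)."""
    return [records[i::n_chunks] for i in range(n_chunks)]


def count_words(chunk):
    """Process one chunk independently of the others (the 'map' step)."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(partials):
    """Combine the per-chunk results into one answer (the 'reduce' step)."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


records = ["big data is big", "hadoop splits data", "data is processed in parallel"]
chunks = split_into_chunks(records, 3)

# Each worker thread plays the role of one computer in the cluster.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(count_words, chunks))

word_counts = merge_counts(partial_counts)
print(word_counts["data"], word_counts["big"])
```

The key design point is that `count_words` never needs to see another worker's chunk, which is what allows the parts to be analyzed at the same time.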
All of the following statements about MapReduce are true EXCEPT A) MapReduce is a general-purpose execution engine B) MapReduce handles the complexities of network communication C) MapReduce handles parallel programming D) MapReduce runs without fault tolerance
MapReduce runs without fault tolerance
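Option D is the false statement because fault tolerance is part of what the framework provides: when a task fails, it is run again rather than failing the whole job. The sketch below imitates that behavior with a simulated failure. It is a simplified illustration, not MapReduce's actual scheduler, and `flaky_sum` and `run_with_retries` are hypothetical names.

```python
failed_once = set()


def flaky_sum(chunk):
    """Hypothetical task: fails the first time it sees a chunk, then succeeds."""
    key = tuple(chunk)
    if key not in failed_once:
        failed_once.add(key)
        raise RuntimeError("simulated worker failure")
    return sum(chunk)


def run_with_retries(task, chunk, max_attempts=3):
    """Re-run a task after a failure, up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk), attempt
        except RuntimeError:
            if attempt == max_attempts:
                raise  # Give up only after the final attempt fails.


result, attempts = run_with_retries(flaky_sum, [1, 2, 3])
print(result, attempts)
```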