ISBA 510 - Chapter 7
- In a network analysis, what connects nodes?
o Edges
- Current total storage capacity lags behind the digital information being generated in the world.
True
- Despite their potential, many current NoSQL tools lack mature management and monitoring tools.
True
- For low latency, interactive reports, a data warehouse is preferable to Hadoop.
True
- It is important for big data and self service business intelligence to go hand in hand to get maximum value from analytics.
True
- MapReduce can be easily understood by skilled programmers due to its procedural nature.
True
- Satellite data can be used to evaluate the activity at retail locations as a source of alternative data.
True
- A newly popular unit of data in the big data era is the petabyte (PB), which is
o 10^15 bytes
- In a Hadoop "stack", what is a slave node?
o A node where data is stored and processed
- The problem of forecasting economic activity or microclimates based on a variety of data beyond the usual retail data is a very recent phenomenon and has led to another buzzword ___
o Alternative data
- What is big data's relationship to the cloud?
o Amazon and Google have working Hadoop cloud offerings
- Using data to understand customers/clients and business operations to sustain and foster growth and profitability is
o An increasingly challenging task for today's enterprises
- In motion ____ is often overlooked today in the world of BI and big data
o Analytics
- ___ bring together hardware and software in a physical unit that is not only fast but also scalable on an as needed basis
o Appliances
- As volumes of big data arrive from multiple sources such as sensors, machines, social media, and clickstream interactions, the first step is to ___ all the data reliably and cost effectively
o Capture
- Big data comes from
o Everywhere
- Which big data approach promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources?
o Grid computing
- ___ speeds time to insights and enables better data governance by performing data integration and analytics functions inside the database
o In database analytics
- Allowing big data to be processed in memory and distributed across a dedicated set of nodes can solve complex problems on near real time with highly accurate insights. What is this process called?
o In memory analytics
- ____ refers to the conformity to facts: accuracy, quality, truthfulness, or trustworthiness of the data
o Veracity
- Big data is being driven by exponential growth, availability, and use of information.
True
- Hadoop was designed to handle petabytes and exabytes of data distributed over multiple nodes in parallel.
True
- If you have many flexible programming languages running in parallel, Hadoop is preferable to a data warehouse.
True
- In Application Case 7.6, Analyzing Disease Patterns from an Electric Medical Records Data Warehouse, it was found that urban individuals have a higher number of diagnosed disease conditions.
True
- In the opening vignette, the Access Telecom (AT), built a system to better visualize customers who were unhappy before they canceled their service.
True
- In the Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse case study, what was the analytic goal?
o Determine differences in rates of disease in urban and rural populations
- Hadoop is primarily a ___ file system and lacks capabilities we'd associate with a DBMS, such as indexing, random access to data, and support for SQL
o Distributed
- As the size and complexity of analytics systems increase, the need for more ___ analytics systems is also increasing to obtain the best performance
o Efficient
- ___ of data provides business value: pulling data from multiple subject areas and numerous applications into one repository in the raison d'etre for data warehouses
o Integration
- How does Hadoop work?
o It breaks up big data into multiple parts so each part can be processed and analyzed at the same time on multiple computers
- In the world of big data, ___ aids organizations in processing and analyzing large volumes of multistructured data. Examples including indexing and search, graph analysis, etc.
o MapReduce
- In the Alternative Data for Market Analysis of Forecasts case study, satellite data was NOT used for
o Monitoring individual customer patterns
- In the Twitter case study, how did influential users support their tweets?
o Objective data
- In open source databases, the most important performance enhancement to data is the cost based ___
o Optimizer
- Big data employs ___ processing techniques and nonrelational data storage capabilities in order to process unstructured and semistructured data
o Parallel
- Which of the following sources is likely to produce big data the fastest?
o RFID tags
- In the financial services industry, big data can be used to improve
o Regulatory oversight and decision making
- In a Hadoop "stack", what node periodically replicates and stores data from the Name Node should it fail?
o Secondary node
- In the energy industry, ___ grids are one of the most impactful applications of stream analytics
o Smart
- Companies with the largest revenues from big data tend to be
o The largest computer and IT services firms
- Traditional data warehouses have not been able to keep up with
o The variety and complexity of data
- A job ___ is a node in a Hadoop cluster that initiates and coordinates MapReduce jobs, or the processing of the data
o Tracker
- Under which of the following requirements would it be more appropriate to use Hadoop over a data warehouse?
o Unrestricted, ungoverned sandbox explorations
- What is the Hadoop Distributed File System (HDFS) designed to handle?
o Unstructured and semistructured non-relational data
- The ___ of big data is its potential to contain more useful patterns and interest anomalies than "small" data
o Value proposition
- Data flows can be highly inconsistent, with periodic peaks, making data loads hard to manage. What is this feature of big data called?
o Variability
- Organizations are working with data that meets the three V's - variety, volume and ___ characteristics
o Velocity
- HBase is a nonrelational ___ that allows for low latency, quick lookups in Hadoop
o Database
- All of the following statements about MapReduce are true EXCEPT
o MapReduce runs without fault tolerances
- The ___ Node in Hadoop cluster provides client information on where in the cluster particular data is stored and if any nodes fail
o Name
- HBase, Cassandra, MongoDB, and Accumulo are examples of ___ databases
o NoSQL
- Big data simplifies data governance issues, especially for global firms.
False
- Big data uses commodity hardware, which is expensive, specialized hardware that is custom built for a client or application.
False
- Hadoop and MapReduce require each other to work.
False
- In most cases, Hadoop is used to replace data warehouses.
False
- In the Salesforce case study, streaming data is used to identify services that customers use most.
False
- Social media mentions can be used to chart and predict flu outbreaks.
True
- The quality and objectivity of information disseminated by influential users of Twitter is higher than that disseminated by noninfluential users.
True
- The term "big data" is relative as it depends on the size of the using organization.
True
- There is a clear difference between the type of information support provided by influential users versus the others on Twitter.
True