ISBA 510 - Chapter 7

Pataasin ang iyong marka sa homework at exams ngayon gamit ang Quizwiz!

- In a network analysis, what connects nodes?

o Edges

- Current total storage capacity lags behind the digital information being generated in the world.

True

- Despite their potential, many current NoSQL tools lack mature management and monitoring tools.

True

- For low latency, interactive reports, a data warehouse is preferable to Hadoop.

True

- It is important for big data and self service business intelligence to go hand in hand to get maximum value from analytics.

True

- MapReduce can be easily understood by skilled programmers due to its procedural nature.

True

- Satellite data can be used to evaluate the activity at retail locations as a source of alternative data.

True

- A newly popular unit of data in the big data era is the petabyte (PB), which is

o 10^15 bytes

- In a Hadoop "stack", what is a slave node?

o A node where data is stored and processed

- The problem of forecasting economic activity or microclimates based on a variety of data beyond the usual retail data is a very recent phenomenon and has led to another buzzword ___

o Alternative data

- What is big data's relationship to the cloud?

o Amazon and Google have working Hadoop cloud offerings

- Using data to understand customers/clients and business operations to sustain and foster growth and profitability is

o An increasingly challenging task for today's enterprises

- In motion ____ is often overlooked today in the world of BI and big data

o Analytics

- ___ bring together hardware and software in a physical unit that is not only fast but also scalable on an as needed basis

o Appliances

- As volumes of big data arrive from multiple sources such as sensors, machines, social media, and clickstream interactions, the first step is to ___ all the data reliably and cost effectively

o Capture

- Big data comes from

o Everywhere

- Which big data approach promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources?

o Grid computing

- ___ speeds time to insights and enables better data governance by performing data integration and analytics functions inside the database

o In database analytics

- Allowing big data to be processed in memory and distributed across a dedicated set of nodes can solve complex problems on near real time with highly accurate insights. What is this process called?

o In memory analytics

- ____ refers to the conformity to facts: accuracy, quality, truthfulness, or trustworthiness of the data

o Veracity

- Big data is being driven by exponential growth, availability, and use of information.

True

- Hadoop was designed to handle petabytes and exabytes of data distributed over multiple nodes in parallel.

True

- If you have many flexible programming languages running in parallel, Hadoop is preferable to a data warehouse.

True

- In Application Case 7.6, Analyzing Disease Patterns from an Electric Medical Records Data Warehouse, it was found that urban individuals have a higher number of diagnosed disease conditions.

True

- In the opening vignette, the Access Telecom (AT), built a system to better visualize customers who were unhappy before they canceled their service.

True

- In the Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse case study, what was the analytic goal?

o Determine differences in rates of disease in urban and rural populations

- Hadoop is primarily a ___ file system and lacks capabilities we'd associate with a DBMS, such as indexing, random access to data, and support for SQL

o Distributed

- As the size and complexity of analytics systems increase, the need for more ___ analytics systems is also increasing to obtain the best performance

o Efficient

- ___ of data provides business value: pulling data from multiple subject areas and numerous applications into one repository in the raison d'etre for data warehouses

o Integration

- How does Hadoop work?

o It breaks up big data into multiple parts so each part can be processed and analyzed at the same time on multiple computers

- In the world of big data, ___ aids organizations in processing and analyzing large volumes of multistructured data. Examples including indexing and search, graph analysis, etc.

o MapReduce

- In the Alternative Data for Market Analysis of Forecasts case study, satellite data was NOT used for

o Monitoring individual customer patterns

- In the Twitter case study, how did influential users support their tweets?

o Objective data

- In open source databases, the most important performance enhancement to data is the cost based ___

o Optimizer

- Big data employs ___ processing techniques and nonrelational data storage capabilities in order to process unstructured and semistructured data

o Parallel

- Which of the following sources is likely to produce big data the fastest?

o RFID tags

- In the financial services industry, big data can be used to improve

o Regulatory oversight and decision making

- In a Hadoop "stack", what node periodically replicates and stores data from the Name Node should it fail?

o Secondary node

- In the energy industry, ___ grids are one of the most impactful applications of stream analytics

o Smart

- Companies with the largest revenues from big data tend to be

o The largest computer and IT services firms

- Traditional data warehouses have not been able to keep up with

o The variety and complexity of data

- A job ___ is a node in a Hadoop cluster that initiates and coordinates MapReduce jobs, or the processing of the data

o Tracker

- Under which of the following requirements would it be more appropriate to use Hadoop over a data warehouse?

o Unrestricted, ungoverned sandbox explorations

- What is the Hadoop Distributed File System (HDFS) designed to handle?

o Unstructured and semistructured non-relational data

- The ___ of big data is its potential to contain more useful patterns and interest anomalies than "small" data

o Value proposition

- Data flows can be highly inconsistent, with periodic peaks, making data loads hard to manage. What is this feature of big data called?

o Variability

- Organizations are working with data that meets the three V's - variety, volume and ___ characteristics

o Velocity

- HBase is a nonrelational ___ that allows for low latency, quick lookups in Hadoop

o Database

- All of the following statements about MapReduce are true EXCEPT

o MapReduce runs without fault tolerances

- The ___ Node in Hadoop cluster provides client information on where in the cluster particular data is stored and if any nodes fail

o Name

- HBase, Cassandra, MongoDB, and Accumulo are examples of ___ databases

o NoSQL

- Big data simplifies data governance issues, especially for global firms.

False

- Big data uses commodity hardware, which is expensive, specialized hardware that is custom built for a client or application.

False

- Hadoop and MapReduce require each other to work.

False

- In most cases, Hadoop is used to replace data warehouses.

False

- In the Salesforce case study, streaming data is used to identify services that customers use most.

False

- Social media mentions can be used to chart and predict flu outbreaks.

True

- The quality and objectivity of information disseminated by influential users of Twitter is higher than that disseminated by noninfluential users.

True

- The term "big data" is relative as it depends on the size of the using organization.

True

- There is a clear difference between the type of information support provided by influential users versus the others on Twitter.

True


Kaugnay na mga set ng pag-aaral

HY 121 Chapter 15 - Reconstruction

View Set

Psychosocial Chapter 16 (pre-lecture)

View Set

PsychMental Health Practice Test

View Set

The Call of the Wild Chapter 4 Vocab

View Set

Ch. 7: Fraud, Internal Control, and Cash

View Set

Fundamentals and Technical Analysis

View Set