Chapter 7

When considering Big Data projects and architecture, list and describe five challenges designers should be mindful of in order to make the journey to analytics competency less stressful.

- Data volume
- Data integration
- Processing capabilities
- Data governance
- Skill availability
- Solution cost

Why are some portions of tape backup workloads being redirected to Hadoop clusters today?

Retrieval is difficult: data stored offline takes a long time to retrieve, tape formats change over time, and tapes are prone to data loss. There is value in keeping historical data online and accessible.

A newly popular unit of data in the Big Data era is the petabyte (PB), which is

10^15 bytes

Big Data simplifies data governance issues, especially for global firms.

False

In most cases, Hadoop is used to replace data warehouses.

False

________ speeds time to insights and enables better data governance by performing data integration and analytic functions inside the database.

In-database analytics

How does Hadoop work?

It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on multiple computers.

In the world of Big Data, ________ aids organizations in processing and analyzing large volumes of multistructured data. Examples include indexing and search, graph analysis, etc.

MapReduce

Define MapReduce.

MapReduce is a programming model that is used to process and generate big datasets with a parallel distributed algorithm.
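
The definition above can be made concrete with a word count, the usual introductory example. This is only a minimal single-process sketch, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the parallel distribution across nodes is simulated by a plain loop.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # On a real cluster each split would be mapped on a different node in
    # parallel; here the splits are processed sequentially in one process.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = word_count(["big data is big", "data is everywhere"])
```

Because each call to `map_phase` depends only on its own split, and each call to `reduce_phase` only on its own key, both phases could be spread across many machines without changing the result.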

All of the following statements about MapReduce are true EXCEPT

MapReduce runs without fault tolerance

HBase, Cassandra, MongoDB, and Accumulo are examples of ________ databases.

NoSQL

Big Data is being driven by the exponential growth, availability, and use of information.

True

Social media mentions can be used to chart and predict flu outbreaks.

True

In-motion ________ is often overlooked today in the world of BI and Big Data.

analytics

In the financial services industry, Big Data can be used to improve

both A & B

As volumes of Big Data arrive from multiple sources such as sensors, machines, social media, and clickstream interactions, the first step is to ________ all the data reliably and cost effectively.

capture

In a network analysis, what connects nodes?

edges

Big Data comes from ________.

everywhere

In the Alternative Data for Market Analysis or Forecasts case study, satellite data was NOT used for

monitoring individual customer patterns

In the Twitter case study, how did influential users support their tweets?

objective data

In a Hadoop "stack," what node periodically replicates and stores data from the Name Node should it fail?

secondary node

In the energy industry, ________ grids are one of the most impactful applications of stream analytics.

smart

Companies with the largest revenues from Big Data tend to be

the largest computer and IT services firms.

A job ________ is a node in a Hadoop cluster that initiates and coordinates MapReduce jobs, or the processing of the data.

tracker

Under which of the following requirements would it be more appropriate to use Hadoop over a data warehouse?

unrestricted, ungoverned sandbox explorations

What is the Hadoop Distributed File System (HDFS) designed to handle?

unstructured and semistructured non-relational data

Organizations are working with data that meets the three V's (variety, volume, and ________) characterizations.

velocity

What is NoSQL as used for Big Data? Describe its major downsides.

NoSQL is a newer class of databases that store and process unstructured data that is not in a tabular format. NoSQL databases offer high performance and high scalability. The downside is that they trade ACID compliance for that performance and scalability.

Satellite data can be used to evaluate the activity at retail locations as a source of alternative data.

True

Hadoop was designed to handle petabytes and exabytes of data distributed over multiple nodes in parallel.

True

In a Hadoop "stack," what is a slave node?

a node where data is stored and processed

Big Data employs ________ processing techniques and nonrelational data storage capabilities in order to process unstructured and semistructured data.

parallel

Traditional data warehouses have not been able to keep up with

the variety and complexity of data

Hadoop is primarily a(n) ________ file system and lacks capabilities we'd associate with a DBMS, such as indexing, random access to data, and support for SQL.

distributed

As the size and the complexity of analytical systems increase, the need for more ________ analytical systems is also increasing to obtain the best performance.

efficient

Allowing Big Data to be processed in memory and distributed across a dedicated set of nodes can solve complex problems in near-real time with highly accurate insights. What is this process called?

in-memory analytics

List and describe four of the most critical success factors for Big Data analytics.

1) A clear business need: Investments should be made for the good of the business, not for the sake of technological advancement. The driver should be a business need.
2) Strong, committed sponsorship: Without strong executive sponsorship, it is difficult to succeed with Big Data analytics.
3) Alignment between the business and IT strategy: Analytical work should support the business strategy.
4) A fact-based decision-making culture: Decisions follow the numbers, not gut feelings.

List and briefly discuss the three characteristics that define and make the case for data warehousing.

1) Data warehouse performance, driven by the cost-based optimizer. 2) Integration of data that provides business value and answers business questions. 3) Interactive BI tools that give access to the data warehouse.

In the Salesforce case study, streaming data is used to identify services that customers use most.

False

In the opening vignette, why was the Telecom company so concerned about the loss of customers, if customer churn is common in that industry?

The loss was at such a high rate. The company had been losing customers faster than it was gaining them. The loss of customers could be traced back to customer service interactions.

MapReduce can be easily understood by skilled programmers due to its procedural nature.

True

The quality and objectivity of information disseminated by influential users of Twitter is higher than that disseminated by noninfluential users.

True

The term "Big Data" is relative as it depends on the size of the using organization.

True

Which Big Data approach promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources?

grid computing

Current total storage capacity lags behind the digital information being generated in the world.

True

It is important for Big Data and self-service business intelligence to go hand in hand to get maximum value from analytics.

True

The problem of forecasting economic activity or microclimates based on a variety of data beyond the usual retail data is a very recent phenomenon and has led to another buzzword — ________.

alternative data

Using data to understand customers/clients and business operations to sustain and foster growth and profitability is

an increasingly challenging task for today's enterprises.

What is Big Data's relationship to the cloud?

Amazon and Google have working Hadoop cloud offerings

________ bring together hardware and software in a physical unit that is not only fast but also scalable on an as-needed basis.

Appliances

________ of data provides business value; pulling of data from multiple subject areas and numerous applications into one repository is the raison d'être for data warehouses.

Integration

The ________ Node in a Hadoop cluster provides client information on where in the cluster particular data is stored and if any nodes fail.

Name

HBase is a nonrelational ________ that allows for low-latency, quick lookups in Hadoop.

database

In the Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse case study, what was the analytic goal?

determine differences in rates of disease in urban and rural populations

List and describe the three main "V"s that characterize Big Data.

Volume: the amount of data being generated and stored. Variety: the range of formats the data comes in, from structured to semistructured and unstructured. Velocity: how fast data is being produced and how fast it must be processed.

________ refers to the conformity to facts: accuracy, quality, truthfulness, or trustworthiness of the data.

Veracity

In open-source databases, the most important performance enhancement to date is the cost-based ________.

optimizer

The ________ of Big Data is its potential to contain more useful patterns and interesting anomalies than "small" data.

value proposition

Data flows can be highly inconsistent, with periodic peaks, making data loads hard to manage. What is this feature of Big Data called?

variability

There is a clear difference between the type of information support provided by influential users versus the others on Twitter.

True

Which of the following sources is likely to produce Big Data the fastest?

RFID tags

Describe data stream mining and how it is used.

Data stream mining is the process of extracting novel patterns and knowledge structures from continuous, rapid data records. A data stream is a continuous flow of ordered instances that, in many applications of data stream mining, can be read or processed only once or a small number of times. In many data stream mining applications, the goal is to predict the class or value of new instances in the stream.
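
The "read only once, predict the next instance" idea can be sketched as a test-then-train loop. The model used here is a deliberately simple majority-class baseline chosen for illustration, not a technique named in the chapter, and `prequential_accuracy` is a hypothetical helper name.

```python
from collections import Counter

def prequential_accuracy(stream):
    """Test-then-train over a stream of class labels, reading each instance once."""
    seen = Counter()
    correct = total = 0
    for label in stream:
        if seen:  # predict only after at least one instance has been observed
            prediction = seen.most_common(1)[0][0]
            correct += prediction == label
            total += 1
        seen[label] += 1  # update the model, then discard the instance
    return correct / total if total else 0.0

# The iterator can be consumed only once, mimicking a stream that cannot be replayed.
accuracy = prequential_accuracy(iter(["ham", "ham", "spam", "ham", "spam", "ham"]))
```

Only the running counts are kept in memory, so the cost per instance stays constant no matter how long the stream runs.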

Big Data uses commodity hardware, which is expensive, specialized hardware that is custom built for a client or application.

False

Hadoop and MapReduce require each other to work.

False

What are the differences between stream analytics and perpetual analytics? When would you use one or the other?

Stream analytics: applies transaction-level logic to real-time observations. The rules applied to these observations take previous observations into account as long as they occurred within a prescribed window, and the window must have a defined size. Perpetual analytics: evaluates every incoming observation against all prior observations, with no window size. I would use stream analytics when the transactional volume is high and the time to decision is too short, favoring nonpersistence and smaller window sizes. I would use perpetual analytics when the mission is critical and the transaction volume can be managed in real time.
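
The difference between a windowed and an unwindowed evaluation can be illustrated with running averages. This is a simplified sketch: the function names are hypothetical, and the perpetual version keeps a running total as a stand-in for "all prior observations" rather than storing each one.

```python
from collections import deque

def windowed_means(observations, window_size):
    """Stream analytics: each result uses only the last `window_size` observations."""
    window = deque(maxlen=window_size)
    means = []
    for value in observations:
        window.append(value)  # the oldest value drops out automatically
        means.append(sum(window) / len(window))
    return means

def perpetual_means(observations):
    """Perpetual analytics: each result reflects every observation seen so far."""
    total = 0.0
    means = []
    for count, value in enumerate(observations, start=1):
        total += value
        means.append(total / count)
    return means

readings = [10, 20, 30, 40]
stream_result = windowed_means(readings, window_size=2)
perpetual_result = perpetual_means(readings)
```

With a window of two, `stream_result` tracks recent readings closely, while `perpetual_result` moves more slowly because early readings never leave the calculation.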

Despite their potential, many current NoSQL tools lack mature management and monitoring tools.

True

For low latency, interactive reports, a data warehouse is preferable to Hadoop.

True

If you have many flexible programming languages running in parallel, Hadoop is preferable to a data warehouse.

True

In Application Case 7.6, Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse, it was found that urban individuals have a higher number of diagnosed disease conditions.

True

In the opening vignette, Access Telecom (AT) built a system to better visualize customers who were unhappy before they canceled their service.

True

