CHAPTER 6


1. market research 2. social media 3. census data 4. election databases

*APPLICATION CASE 6.4: BIG DATA AND ANALYTICS IN POLITICS The inputs to the analytic system include:

NoSQL

*____ is a technology that is used to store and process large volumes of unstructured, semi-structured, and multi-structured data
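
As a rough illustration of the schema-less storage style behind this answer, here is a minimal Python sketch of a document-style key-value store modeled with a plain dict. It is not any specific NoSQL product; the keys and records are hypothetical.

```python
# Minimal sketch (not any specific NoSQL product): a document-style store
# modeled with a Python dict, illustrating schema-less storage of
# unstructured, semi-structured, and multi-structured records.

import json

store = {}  # key -> JSON-like document; no fixed schema is enforced

def put(key, document):
    """Store a document under a key; documents may have different fields."""
    store[key] = document

def get(key):
    """Retrieve a document by key, or None if it is absent."""
    return store.get(key)

# Records with different shapes coexist, unlike rows in a relational table.
put("user:1", {"name": "Ava", "clicks": [101, 205], "segment": "loyal"})
put("log:9", {"raw_text": "GET /cart 200", "ts": "2014-05-01T12:00:00Z"})

print(json.dumps(get("user:1"), indent=2))
```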

Hadoop

*____ is an open source framework for processing, storing, and analyzing massive amounts of distributed unstructured data

variability

*____ means that data can be highly inconsistent, with periodic peaks, making data loads hard to manage

velocity

*____ refers to both how fast data is being produced and how fast the data must be processed to meet the need or demand

veracity

*____ refers to the conformity to facts: accuracy, quality, truthfulness, or trustworthiness of big data

MapReduce + Hadoop

*_____ + ______ = Big data core technology

NoSQL

*_____ can handle big data better than traditional relational database technology

value proposition

*_____ is big data's potential to contain more useful patterns and interesting anomalies than small data

fault tolerance

Hadoop data is replicated across multiple nodes, allowing for _____ in the system
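
To make the replication idea concrete, the sketch below simulates HDFS-style block placement in plain Python; the node and block names are made up, and only the 3-copy default replication factor is drawn from HDFS itself.

```python
# Conceptual sketch of HDFS-style block replication (not real Hadoop code).
# Each block is copied to several nodes, so losing one node does not lose data.

import random

REPLICATION = 3                      # HDFS defaults to 3 copies per block
nodes = ["node1", "node2", "node3", "node4", "node5"]
blocks = ["block-a", "block-b", "block-c"]

# Place each block's replicas on distinct nodes.
placement = {b: random.sample(nodes, REPLICATION) for b in blocks}

failed = "node2"                     # simulate a node failure
for block, holders in placement.items():
    survivors = [n for n in holders if n != failed]
    print(f"{block}: replicas on {holders}, still readable from {survivors}")
```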

open source

Hadoop is an _____ project, with hundreds of contributors continuously improving the core technology

petabytes and exabytes, multiple

Hadoop is designed to handle _____ of data distributed over ____ nodes in parallel

Apache Software Foundation

Hadoop is now part of ______

commodity machines, internet

Hadoop typically uses ______ connected via the _____

MapReduce

Hadoop utilizes the ______ framework to implement distributed parallelism
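
A common way to see this in practice is Hadoop Streaming, where any program that reads lines from stdin and writes tab-separated key/value pairs to stdout can serve as a mapper or reducer. The word-count script below is an illustrative sketch, not the textbook's example; it would typically be submitted with the Hadoop Streaming jar, whose exact path and options depend on the installation.

```python
#!/usr/bin/env python3
# Word-count mapper/reducer in the Hadoop Streaming style: each phase reads
# lines on stdin and writes tab-separated key/value pairs to stdout.
# (Illustrative sketch; how the script is launched is installation-specific.)

import sys

def mapper(lines):
    """Map phase: emit (word, 1) for every word in the input."""
    for line in lines:
        for word in line.strip().split():
            print(f"{word}\t1")

def reducer(lines):
    """Reduce phase: input arrives sorted by key; sum counts per word."""
    current, total = None, 0
    for line in lines:
        word, count = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    phase = sys.argv[1] if len(sys.argv) > 1 else "map"
    (mapper if phase == "map" else reducer)(sys.stdin)
```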

1. Hadoop Distributed File System (HDFS) 2. name node 3. secondary node 4. job tracker 5. slave nodes

Hadoop's main components:

Web

Hadoop, with its distributed file system and flexible handling of data formats, is advantageous when working with information commonly found on the _____

relational database technologies

High-performance computing differs from regular analytics, which tends to focus on ______

1. in-memory analytics 2. in-database analytics 3. grid computing 4. appliances

High-performance computing includes:

high-performance computing

In order to keep up with the computational needs of big data, a number of new and innovative analytics computational techniques and platforms have been developed. These techniques are collectively called ______

1. as the repository and refinery of raw data 2. as an active archive of historical data

In terms of its use cases, Hadoop is differentiated in 2 ways:

patient monitoring

In the health services industry, the biggest potential source of big data is _______

data in-motion analytics or real-time data analytics

Stream analytics is sometimes called ______

map, single

The ____ function in MapReduce breaks a problem into sub-problems, each of which can be processed by a _____ node, with many nodes working in parallel.

HDFS

The ____ is the default storage layer in any given Hadoop cluster

reduce

The _____ function in MapReduce merges the results from each of the nodes into the final results.

job tracker

The _____ is the node of a Hadoop cluster that initiates and coordinates MapReduce jobs or the processing of the data.

rapidly

The big data vendor landscape is developing very ______

not capable; new breed of technologies

The traditional means of capturing, storing, and analyzing data are ____ of dealing with big data effectively and efficiently, so a ______ is needed to take on big data.

data scientist

"The sexiest job of the 21st century"

classic smart grid

*A use case in the energy industry for stream analytics is a _______ application for the electric power supply chain.

data integration

*APPLICATION CASE 6.1: BIG DATA ANALYTICS HELPS LUXOTTICA In the Luxottica case study, the technique the company uses to gain visibility into its customers is ______

includes everything they can find about their customer interactions

*APPLICATION CASE 6.1: BIG DATA ANALYTICS HELPS LUXOTTICA What does big data mean to Luxottica?

*Luxottica did not outsource its data storage and promotional campaign development and management, nor did it merge with companies in Asia.* A 10% improvement in marketing effectiveness; identification of the highest-value customers out of nearly 100 million.

*APPLICATION CASE 6.1: BIG DATA ANALYTICS HELPS LUXOTTICA What were the results?

can handle the high-volume, high-variability, continuously streaming data that trading banks need to deal with

*APPLICATION CASE 6.2: TOP 5 INVESTMENT BANK How can Big data benefit large-scale trading banks?

single source of the truth

*APPLICATION CASE 6.2: TOP 5 INVESTMENT BANK This case illustrates an excellent example in the banking industry, where disparate data sources are integrated into a big data infrastructure to achieve a _______

moved many old disparate systems to a new unified system

*APPLICATION CASE 6.2: TOP 5 INVESTMENT BANK What was the solution?

had a relational database, and as data volumes and variability increased, the system was not fast enough to respond to growing needs and requirements

*APPLICATION CASE 6.2: TOP 5 INVESTMENT BANK What was their challenge?

1. real-time access to trading data 2. new alert feature 3. less downtime for maintenance 4. much faster capacity to process complex changes 5. reduced operations costs

*APPLICATION CASE 6.2: TOP 5 INVESTMENT BANK What were the obtained results/benefits?

did not integrate into a single big data center infrastructure

*APPLICATION CASE 6.3: EBAY'S BIG DATA SOLUTION What did eBay not do for their solution?

developed a multi-data-center deployment using NoSQL and Hadoop, a scale-out architecture that enables eBay to deploy multiple DataStax Enterprise clusters

*APPLICATION CASE 6.3: EBAY'S BIG DATA SOLUTION What was eBay's solution?

experiencing explosive data growth and needed a solution that did not have the typical bottlenecks, scalability issues, and transactional constraints

*APPLICATION CASE 6.3: EBAY'S BIG DATA SOLUTION What were the challenges?

1. more cost-effective 2. much faster 3. greatly enhanced reliability and fault tolerance

*APPLICATION CASE 6.3: EBAY'S BIG DATA SOLUTION What were the obtained results?

needed the ability to turn the enormous volumes of data it generates into useful insights for customers

*APPLICATION CASE 6.3: EBAY'S BIG DATA SOLUTION Why did eBay need a big data solution?

It could have made a difference in the outcomes of the 2008 and 2012 elections, because many people agree that the Democrats had a clear competitive advantage over the Republicans in using big data.

*APPLICATION CASE 6.4: BIG DATA AND ANALYTICS IN POLITICS Could big data change the outcome of an election?

1. voter mobilization 2. organize movements 3. increase number of volunteers 4. raise money contributions

*APPLICATION CASE 6.4: BIG DATA AND ANALYTICS IN POLITICS The analytic system outputs or goals include:

can help predict election outcomes as well as target potential voters and donors; has become a critical part of political campaigns

*APPLICATION CASE 6.4: BIG DATA AND ANALYTICS IN POLITICS What is the role of analytics and big data in modern day politics?

geospatial data, timetables

*APPLICATION CASE 6.5: DUBLIN CITY COUNCIL By integrating _____ from buses and data on bus ____ into a central geographic information system, you can create a digital map of the city.

ease traffic problems and better understand the traffic network

*APPLICATION CASE 6.5: DUBLIN CITY COUNCIL For the Dublin case, big data analytics were used primarily to ________

Operators can look at which buses are on time and which are delayed; when buses are delayed, they can focus on trying to relieve congestion in that area.

*APPLICATION CASE 6.5: DUBLIN CITY COUNCIL How can big data analytics help ease the traffic problem in big cities?

good picture of traffic in the city from a high-level perspective

*APPLICATION CASE 6.5: DUBLIN CITY COUNCIL The major problem with Dublin was the difficulty in getting a _________

teamed up with IBM and created a digital map of the city overlaid with real-time positions of their buses

*APPLICATION CASE 6.5: DUBLIN CITY COUNCIL What was their solution?

operators gained the ability to see the system as a whole instead of just individual corridors

*APPLICATION CASE 6.5: DUBLIN CITY COUNCIL What were the obtained results?

Splunk

*APPLICATION CASE 6.7: TURNING MACHINE-GENERATED STREAMING DATA INTO VALUABLE BUSINESS INSIGHTS The US telecommunications company selected to work with _____, one of the leading analytics service providers in the area of turning machine-generated streaming data into valuable insights

threat assessments and security monitoring

*APPLICATION CASE 6.7: TURNING MACHINE-GENERATED STREAMING DATA INTO VALUABLE BUSINESS INSIGHTS The US telecommunications company's use of stream analytics via dashboards has helped to improve the effectiveness of the company's _______

stream analytics

*APPLICATION CASE 6.7: TURNING MACHINE-GENERATED STREAMING DATA INTO VALUABLE BUSINESS INSIGHTS The company uses ____ to boost customer satisfaction and competitive advantage

1. application troubleshooting 2. operations 3. compliance 4. security

*APPLICATION CASE 6.7: TURNING MACHINE-GENERATED STREAMING DATA INTO VALUABLE BUSINESS INSIGHTS What were some of the beneficial results?

1. MapReduce 2. Hadoop 3. NoSQL

*Big Data technologies include:

a clear business need

*Business investments ought to be made for the good of the business, not merely for the sake of technological advancement. Therefore, the main driver for Big Data analytics should be alignment with the business vision and strategy at any level: strategic, tactical, or operational. Which of the critical success factors for Big Data analytics is being described?

*1. Skill availability (data scientists are in short supply)* 2. data volume 3. data integration 4. processing capabilities 5. data governance 6. solution cost

*Challenges of big data analytics:

*1. A clear business need (alignment with the vision and the strategy)* *2. Strong, committed sponsorship (executive champion)* *3. A fact-based decision-making culture* 4. Alignment between the business and IT strategy 5. A strong data infrastructure 6. The right analytics tools 7. The right people with the right skills

*Critical success factors for big data analytics:

Doug Cutting at Yahoo

*Hadoop was originally created by ______

a fact-based decision-making culture

*In __________, the numbers rather than intuition, gut feeling, or supposition drive decision making. There is also a culture of experimentation to see what works and doesn't. To create ________, senior management needs to do the following: recognize that some people can't or won't adjust; be a vocal supporter; stress that outdated methods must be discontinued; ask to see what analytics went into decisions; link incentives and compensation to desired behaviors.

a strong, committed sponsorship

*It is a well-known fact that if you don't have committed executive backing, it is difficult (if not impossible) to succeed. If the scope is a single or a few analytical applications, the support can be at the departmental level. However, if the target is enterprise-wide organizational transformation, which is often the case for Big Data initiatives, _____________________ needs to be at the highest levels and organization-wide. Which one best Critical Success Factor for Big Data Analytics best fills the blank in the previous sentence?

volume

*Many factors contributed to the exponential increase in data _____, such as transaction-based data stored through the years, text data constantly streaming in from social media, increasing amounts of sensor data being collected, automatically generated RFID and GPS data, and so forth.

machines

*Most big data is generated by ______

new style

*NoSQL is a _____ of database

Geneva, Switzerland

*OPENING VIGNETTE: BIG DATA MEETS BIG SCIENCE AT CERN Cern is located near _____

particle physics

*OPENING VIGNETTE: BIG DATA MEETS BIG SCIENCE AT CERN Cern is the world's largest ______ laboratory.

volume

*The most common trait of big data is ____

massive volumes of data

*Traditionally, "big data"=____

A. an increasingly challenging task for today's enterprises B. not a new technological fad, but rather a business priority

*Using data to understand customers/clients and business operations to sustain and foster growth and profitability is:

variety

*___ means that data comes in all types of formats

simplify

*____ because it is hard to keep track of all of the new database vendors, open source projects, and Big Data service providers. It will be even more crowded and complicated in the years ahead.

visualize

*______ because according to leading analytics research companies like Forrester and Gartner, enterprises find advanced data visualization platforms to be essential tools that enable them to monitor business, find patterns, and take action to avoid threats and snatch opportunities.

govern

*______ because data has always been a challenging issue in IT, and it is getting even more puzzling with the advent of Big Data.

MapReduce

*______ is a technique popularized by Google that distributes the processing of very large multi-structured data files across a large cluster of ordinary machines/computer processors.

empower

*_______ because Big Data and self-service business intelligence go hand in hand. Across a range of uses, from tackling new business problems and developing entirely new products and services to finding actionable intelligence in less than an hour and blending data from disparate sources, Big Data has fired the imagination of what is possible through the application of analytics.

integrate

*_______ because blending data from disparate sources for your organization is an essential part of Big Data Analytics. Organizations that can blend different relational, semi-structured, and raw data sources in real time, without expensive up-front costs, will be the ones that get the best value from their data.

coexist

*_______ because using the strengths of each database platform and enabling them to work together in your organization's data architecture are essential. There is ample literature on the necessity of keeping traditional data warehouses synchronized with the capabilities of new platforms.

evangelize

*_______ because with the backing of one or more executive sponsors, future business graduates from LSU E.J. Ourso College of Business like yourself can get the ball rolling and instill a virtuous cycle: The more departments in your organization realize actionable benefits, the more pervasive analytics becomes across your organization. Fast, easy-to-use visual analytics is the key that opens the door to organization-wide analytics adoption and collaboration.

perpetual analytics

*_______ evaluates every incoming observation against all prior observations when analyzing big data in the context of intelligent systems; recognizing how the new observation relates to all prior observations enables the discovery of real-time insights.
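
A minimal sketch of the idea, assuming a simple z-score rule chosen purely for illustration: every new observation is compared against all observations seen so far before it is added to the history.

```python
# Minimal sketch of "perpetual analytics": every incoming observation is
# evaluated against ALL prior observations, here via a simple z-score check.
# The threshold and data are illustrative choices, not a vendor's method.

from statistics import mean, pstdev

history = []

def observe(value, threshold=3.0):
    """Compare a new observation to everything seen so far, then store it."""
    insight = None
    if len(history) >= 2:
        mu, sigma = mean(history), pstdev(history)
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            insight = f"anomaly: {value} is far from the prior mean {mu:.1f}"
    history.append(value)
    return insight

for v in [10, 11, 9, 10, 12, 10, 58]:
    alert = observe(v)
    if alert:
        print(alert)
```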

stream

A ____ can be thought of as an unbounded flow or sequence of data elements, arriving continuously at high velocity

name node

A _____ is a node in a Hadoop cluster that provides the client with information on where in the cluster particular data is stored and whether any nodes have failed

deployed the Customer Intelligence Application from an IBM Business Partner

APPLICATION CASE 6.1: BIG DATA ANALYTICS HELPS LUXOTTICA What was their proposed solution?

Because Luxottica outsourced both data storage and promotional campaign development and management, there was a disconnect between data analytics and marketing execution

APPLICATION CASE 6.1: BIG DATA ANALYTICS HELPS LUXOTTICA What were the main challenges?

online marketplace

APPLICATION CASE 6.3: EBAY'S BIG DATA SOLUTION eBay is the world's largest ____

1. social services 2. tax collection 3. sanitation services 4. environmental management 5. crime prevention 6. management of police and fire departments

APPLICATION CASE 6.5: DUBLIN CITY COUNCIL What are areas that large cities could greatly benefit from big data analytics?

value proposition

Although all of the Vs are important characteristics, ____ is probably the most important for decision makers

historical data

Because Hadoop can handle such huge volumes of data, ____ can be managed easily with this approach

worthless

Big data by itself is _____ unless business users do something with it that delivers value to the organization.

velocity

Big data concerns about _____ are especially prevalent when dealing with streams

misnomer

Big data is a _____

1. volume 2. variety 3. velocity 4. veracity 5. variability 6. value proposition

Big data is characterized by these traits:

value

Big data plus "big" analytics yields ____

80-85%

By some estimates, ______ of all organizations' data is in some sort of unstructured or semi-structured format

1. process efficiency and cost reduction 2. brand management 3. revenue maximization, cross-selling, and up-selling 4. enhanced customer experience 5. churn identification, customer recruiting 6. improved customer service 7. identifying new products and market opportunities 8. risk management 9. regulatory compliance 10. enhanced security capabilities

Common business problems addressed by big data:

happening in real time

Critical event processing relates to stream analytics because the events are ________

soft skills

Data scientists are expected to have _____ such as creativity, curiosity, communication/interpersonal skills, domain expertise, problem definition skills, and managerial skills.

business, communication, & technical; current business analytics

Data scientists use a combination of their ______ skills to investigate big data, looking for ways to improve ______ and hence to improve decisions for new business opportunities.

1. sensor data 2. computer network traffic 3. phone conversations 4. ATM transactions 5. web searches 6. financial data

Examples of data streams include:

1. e-commerce 2. telecommunications 3. law enforcement 4. cyber security 5. the power industry *6. health services* 7. the government

Examples of industries that benefit from stream analytics:

distributed and parallelized

MapReduce is a programming model that allows the processing of large-scale data analysis problems to be ______
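
The single-machine simulation below sketches the model: the map phase is spread across worker processes, a shuffle groups intermediate pairs by key, and the reduce phase aggregates each group. The word-count task and input chunks are illustrative only.

```python
# Single-machine simulation of the MapReduce model: the map phase runs in
# parallel worker processes, a shuffle groups values by key, and the reduce
# phase aggregates each group. The word-count task and inputs are illustrative.

from collections import defaultdict
from multiprocessing import Pool

def map_phase(chunk):
    """Emit (word, 1) pairs for one chunk of input."""
    return [(word, 1) for word in chunk.split()]

def reduce_phase(item):
    """Sum the counts collected for a single word."""
    word, counts = item
    return word, sum(counts)

if __name__ == "__main__":
    chunks = ["big data big analytics", "big data value", "data streams"]

    with Pool() as pool:                          # map: one task per chunk
        mapped = pool.map(map_phase, chunks)

    shuffled = defaultdict(list)                  # shuffle: group by key
    for pairs in mapped:
        for word, count in pairs:
            shuffled[word].append(count)

    results = dict(map(reduce_phase, shuffled.items()))   # reduce
    print(results)   # e.g. {'big': 3, 'data': 3, ...}
```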

ACID (atomicity, consistency, isolation, durability)

NoSQL databases trade _____ compliance for performance and scalability

Not Only SQL

NoSQL stands for what?

investigates and looks for new possibilities

One of the biggest differences between a data scientist and a business intelligence user is that a data scientist __________, while a BI user analyzes existing business situations and operations

expertise in both technical and business application domains

One of the most sought-out characteristics of a data scientist is _______

70%

Organizations with Big Data over ______ are more likely than other organizations to have BI/BA projects that are driven primarily by the business community, not by the IT group.

They take advantage of commodity hardware to enable scale-out, parallel processing techniques; employ nonrelational data storage capabilities in order to process unstructured and semistructured data; and apply advanced analytics and data visualization technology to Big Data to convey insights to end users.

What are the common characteristics of emerging Big Data technologies?

variety and complexity

What has changed the landscape between data warehousing and big data in recent years is the _____ of data, which made data warehousing incapable of keeping up

achieving high performance with "simple" computers

What is the goal of MapReduce?

definition and the structure

What's new is that the _____ of big data constantly changes.

LSU ISDS Master of Science in Analytics

Where do data scientists come from?

NoSQL

____ databases are mostly aimed at serving up discrete data stored among large volumes of multi-structured data to end-user and automated big data applications

stream analytics

____ is the process of extracting actionable information from continuously flowing data
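
A minimal sketch of the idea, assuming a simulated sensor feed and an arbitrary alert threshold: information is extracted (an alert raised) while the data is still in motion, using only a small sliding window rather than a stored history.

```python
# Minimal stream-analytics sketch: actionable information (an alert) is
# extracted from a continuously flowing stream using a sliding window.
# The simulated sensor readings and the threshold are hypothetical.

from collections import deque
import itertools
import random

def sensor_stream():
    """Unbounded stream of readings, simulated here with random values."""
    while True:
        yield random.gauss(70.0, 5.0)

window = deque(maxlen=20)                 # only the most recent readings are kept
for reading in itertools.islice(sensor_stream(), 200):
    window.append(reading)
    moving_avg = sum(window) / len(window)
    if moving_avg > 75.0:                 # act on the data while it is in motion
        print(f"ALERT: moving average {moving_avg:.1f} exceeds threshold")
```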

secondary nodes

_____ serve as backups for name nodes in a Hadoop cluster

slave nodes

_____ in a Hadoop cluster store data and take direction to process it from the job tracker

critical event processing

_____ involves combining data from multiple sources to infer events or patterns of interest

data stream mining

_____ is the process of extracting novel patterns and knowledge structures from continuous, rapid data records.
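
One classic technique that fits this definition is the Misra-Gries frequent-items algorithm, sketched below; it finds heavy hitters in a single pass with bounded memory. The click stream used as input is hypothetical.

```python
# One classic data-stream-mining technique: the Misra-Gries algorithm keeps
# at most k-1 counters yet retains every item occurring more than n/k times,
# in a single pass with bounded memory. The example stream is illustrative.

def misra_gries(stream, k):
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:                              # decrement all counters instead
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters                        # candidate frequent items

clicks = ["ad1", "ad2", "ad1", "ad3", "ad1", "ad1", "ad2", "ad1"]
print(misra_gries(clicks, k=3))            # 'ad1' survives as a heavy hitter
```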

querying

______ for data in the Hadoop distributed system is accomplished via MapReduce

critical event processing

______ is a method of capturing, tracking, and analyzing streams of data to detect events (out of normal happenings) of certain types that are worthy of the effort
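
A toy sketch of the idea, combining two hypothetical sources (an authentication log and a geo-IP feed) to infer one event of interest; the field names, window, and rule are illustrative assumptions, not a real product's logic.

```python
# Toy critical-event-processing sketch: observations from two sources are
# combined to infer an event of interest (a suspicious login). The sources,
# field names, and detection rule are all hypothetical.

failed_logins = [          # source 1: authentication log
    {"user": "u42", "ts": 100},
    {"user": "u42", "ts": 102},
    {"user": "u42", "ts": 104},
]
location_changes = [       # source 2: geo-IP feed
    {"user": "u42", "ts": 105, "country_jump": True},
]

def detect_suspicious_access(logins, locations, window=10, min_failures=3):
    """Flag a user with many failed logins followed by an abrupt location change."""
    events = []
    for loc in locations:
        recent = [l for l in logins
                  if l["user"] == loc["user"] and 0 <= loc["ts"] - l["ts"] <= window]
        if loc.get("country_jump") and len(recent) >= min_failures:
            events.append({"user": loc["user"], "event": "suspicious access", "ts": loc["ts"]})
    return events

print(detect_suspicious_access(failed_logins, location_changes))
```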

data scientist

A person with the skills to investigate big data is called what?

big data guru

data scientist = ______

