CHAPTER 6
1. market research 2. social media 3. census data 4. election databases
*APPLICATION CASE 6.4: BIG DATA AND ANALYTICS IN POLITICS The inputs to the analytic system include:
NoSQL
*____ is a technology that is used to store and process large volumes of unstructured, semi-structured, and multi-structured data
Hadoop
*____ is an open source framework for processing, storing, and analyzing massive amounts of distributed unstructured data
variability
*____ means that data can be highly inconsistent, with periodic peaks, making data loads hard to manage
velocity
*____ refers to both how fast data is being produced and how fast the data must be processed to meet the need or demand
veracity
*____ refers to the conformity to facts: accuracy, quality, truthfulness, or trustworthiness of big data
MapReduce + Hadoop
*_____ + ______ = Big data core technology
NoSQL
*_____ can handle big data better than traditional relational database technology
value proposition
*_____ is big data's potential to contain more useful patterns and interesting anomalies than small data
fault tolerance
Hadoop data is replicated across multiple nodes, allowing for _____ in the system
open source
Hadoop is an _____ project, with hundreds of contributors continuously improving the core technology
petabytes and exabytes, multiple
Hadoop is designed to handle _____ of data distributed over ____ nodes in parallel
Apache Software Foundation
Hadoop is now part of ______
commodity machines, internet
Hadoop typically uses ______ connected via the _____
MapReduce
Hadoop utilizes the ______ framework to implement distributed parallelism
1. Hadoop distributed file system (HDFS) 2. name node 3. secondary node 4. job tracker 5. slave nodes
Hadoop's main components:
Web
Hadoop, with its distributed file system and flexibility of data formats, is advantageous when working with information commonly found on the _____
relational database technologies
High-performance computing differs from regular analytics, which tends to focus on ______
1. in-memory analytics 2. in-database analytics 3. grid computing 4. appliances
High-performance computing includes:
high-performance computing
In order to keep up with the computational needs of big data, a number of new and innovative analytics computational techniques and platforms have been developed. These techniques are collectively called ______
1. as the repository and refinery of raw data 2. as an active archive of historical data
In terms of its use cases, Hadoop is differentiated in 2 ways:
patient monitoring
In the health services industry, the biggest potential source of big data comes from _______
data in-motion analytics or real-time data analytics
Stream analytics is sometimes called ______
map, single
The ____ function in MapReduce breaks a problem into sub-problems, which can each be processed by _____ nodes in parallel.
HDFS
The ____ is the default storage layer in any given Hadoop cluster
reduce
The _____ function in MapReduce merges the results from each of the nodes into the final results.
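Study note: the map and reduce cards above can be pictured with a small single-process word count. This is only a sketch under stated assumptions: the names `map_chunk`, `reduce_counts`, and `word_count` and the sample chunks are hypothetical, and a real Hadoop job would send each chunk to a separate node instead of looping over them locally.

```python
from collections import Counter
from functools import reduce

def map_chunk(chunk: str) -> Counter:
    """Map step: count the words in one chunk (one sub-problem for one node)."""
    return Counter(chunk.lower().split())

def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial results into one."""
    return left + right

def word_count(chunks: list[str]) -> Counter:
    # Each chunk stands in for the data held by a single node (assumption).
    partials = [map_chunk(c) for c in chunks]
    # Merge the per-node results into the final result.
    return reduce(reduce_counts, partials, Counter())

chunks = ["big data big analytics", "data streams", "big value"]
totals = word_count(chunks)
```

Because each call to `map_chunk` touches only its own chunk, the map step could run on many machines at once; only the reduce step needs to see every partial result.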
job tracker
The _____ is the node of a Hadoop cluster that initiates and coordinates MapReduce jobs or the processing of the data.
rapidly
The big data vendor landscape is developing very ______
not capable, new breed of technologies
The traditional means for capturing, storing, and analyzing data are ____ of dealing with big data effectively and efficiently, so a ______ is needed to take on big data.
data scientist
"The sexiest job of the 21st century"
classic smart grid
*A use case in the energy industry for stream analytics is a _______ application for the electric power supply chain.
data integration
*APPLICATION CASE 6.1: BIG DATA ANALYTICS HELPS LUXOTTICA In the Luxottica case study, the technique the company uses to gain visibility into its customers is ______
includes everything they can find about their customer interactions
*APPLICATION CASE 6.1: BIG DATA ANALYTICS HELPS LUXOTTICA What does big data mean to Luxottica?
*Luxottica did not outsource their data storage and promotional campaign development and management, nor did they merge with companies in Asia.* 1. A 10% improvement in marketing effectiveness 2. Identification of the highest-value customers out of nearly 100 million
*APPLICATION CASE 6.1: BIG DATA ANALYTICS HELPS LUXOTTICA What were the results?
can handle the high volume, high variability, continuously streaming data that trading banks need to deal with
*APPLICATION CASE 6.2: TOP 5 INVESTMENT BANK How can Big data benefit large-scale trading banks?
single source of the truth
*APPLICATION CASE 6.2: TOP 5 INVESTMENT BANK This case illustrates an excellent example in the banking industry, where disparate data sources are integrated into a big data infrastructure to achieve a _______
moved many old disparate systems to a new unified system
*APPLICATION CASE 6.2: TOP 5 INVESTMENT BANK What was the solution?
had a relational database and as data volumes and variability increased, their system was not fast enough to respond to growing needs and requirements
*APPLICATION CASE 6.2: TOP 5 INVESTMENT BANK What was their challenge?
1. real-time access to trading data 2. new alert feature 3. less downtime for maintenance 4. much faster capacity to process complex changes 5. reduced operations costs
*APPLICATION CASE 6.2: TOP 5 INVESTMENT BANK What were the obtained results/benefits?
did not integrate into a single big data center infrastructure
*APPLICATION CASE 6.3: EBAY'S BIG DATA SOLUTION What did eBay not do for their solution?
developed a multi-data-center deployment using NoSQL and Hadoop: a scale-out architecture that enables them to deploy multiple DataStax Enterprise clusters
*APPLICATION CASE 6.3: EBAY'S BIG DATA SOLUTION What was eBay's solution?
experiencing explosive data growth and needed a solution that did not have the typical bottlenecks, scalability issues, and transactional constraints
*APPLICATION CASE 6.3: EBAY'S BIG DATA SOLUTION What were the challenges?
1. more cost-effective 2. much faster 3. greatly enhanced reliability and fault tolerance
*APPLICATION CASE 6.3: EBAY'S BIG DATA SOLUTION What were the obtained results?
needed the ability to turn the enormous volumes of data it generates into useful insights for customers
*APPLICATION CASE 6.3: EBAY'S BIG DATA SOLUTION Why did eBay need a big data solution?
It could have made a difference in the outcome of the 2008 and 2012 elections, because many people agree that the Democrats had a clear competitive advantage over the Republicans in using big data
*APPLICATION CASE 6.4: BIG DATA AND ANALYTICS IN POLITICS Could big data change the outcome of an election?
1. voter mobilization 2. organize movements 3. increase number of volunteers 4. raise money contributions
*APPLICATION CASE 6.4: BIG DATA AND ANALYTICS IN POLITICS The analytic system outputs or goals include:
can help predict election outcomes as well as targeting potential voters and donors; has become a critical part of political campaigns
*APPLICATION CASE 6.4: BIG DATA AND ANALYTICS IN POLITICS What is the role of analytics and big data in modern day politics?
geospatial data, timetables
*APPLICATION CASE 6.5: DUBLIN CITY COUNCIL By integrating _____ from buses and data on bus ____ into a central geographic information system, you can create a digital map of the city.
ease traffic problems and better understand the traffic network
*APPLICATION CASE 6.5: DUBLIN CITY COUNCIL For the Dublin case, big data analytics were used primarily to ________
operators see which buses are on time and which are delayed; when buses are delayed, they focus on relieving congestion in that area
*APPLICATION CASE 6.5: DUBLIN CITY COUNCIL How can big data analytics help ease the traffic problem in big cities?
good picture of traffic in the city from a high-level perspective
*APPLICATION CASE 6.5: DUBLIN CITY COUNCIL The major problem with Dublin was the difficulty in getting a _________
teamed up with IBM and created a digital map of the city overlaid with real-time positions of their buses
*APPLICATION CASE 6.5: DUBLIN CITY COUNCIL What was their solution?
operators gained the ability to see the system as a whole instead of just individual corridors
*APPLICATION CASE 6.5: DUBLIN CITY COUNCIL What were the obtained results?
Splunk
*APPLICATION CASE 6.7: TURNING MACHINE-GENERATED STREAMING DATA INTO VALUABLE BUSINESS INSIGHTS The US telecommunications company chose to work with _____, one of the leading analytics service providers in the area of turning machine-generated streaming data into valuable insights
threat assessments and security monitoring
*APPLICATION CASE 6.7: TURNING MACHINE-GENERATED STREAMING DATA INTO VALUABLE BUSINESS INSIGHTS The US telecommunications company's use of stream analytics via dashboards has helped to improve the effectiveness of the company's _______
stream analytics
*APPLICATION CASE 6.7: TURNING MACHINE-GENERATED STREAMING DATA INTO VALUABLE BUSINESS INSIGHTS The company uses ____ to boost customer satisfaction and competitive advantage
1. application troubleshooting 2. operations 3. compliance 4. security
*APPLICATION CASE 6.7: TURNING MACHINE-GENERATED STREAMING DATA INTO VALUABLE BUSINESS INSIGHTS What were some of the beneficial results?
1. MapReduce 2. Hadoop 3. NoSQL
*Big Data technologies include:
a clear business need
*Business investments ought to be made for the good of the business, not for the sake of mere technology advancements. Therefore, the main driver for Big Data analytics should be alignment with the vision and the strategy at any level: strategic, tactical, and operational. Which of the critical success factors for Big Data analytics is being described?
*1. Skill availability (data scientists are in short supply)* 2. data volume 3. data integration 4. processing capabilities 5. data governance 6. solution cost
*Challenges of big data analytics:
*1. A CLEAR BUSINESS NEED (ALIGNMENT WITH THE VISION AND THE STRATEGY)* *2. STRONG, COMMITTED SPONSORSHIP (EXECUTIVE CHAMPION)* *3. A FACT-BASED DECISION-MAKING CULTURE* 4. Alignment between the business and IT strategy 5. A strong data infrastructure 6. the right analytics tools 7. Right people with right skills
*Critical success factors for big data analytics:
Doug Cutting at Yahoo
*Hadoop was originally created by ______
a fact-based decision-making culture
*In __________, the numbers rather than intuition, gut feeling, or supposition drive decision making. There is also a culture of experimentation to see what works and doesn't. To create ________, senior management needs to do the following: recognize that some people can't or won't adjust; be a vocal supporter; stress that outdated methods must be discontinued; ask to see what analytics went into decisions; link incentives and compensation to desired behaviors.
a strong, committed sponsorship
*It is a well-known fact that if you don't have committed executive backing, it is difficult (if not impossible) to succeed. If the scope is a single or a few analytical applications, the support can be at the departmental level. However, if the target is enterprise-wide organizational transformation, which is often the case for Big Data initiatives, _____________________ needs to be at the highest levels and organization-wide. Which critical success factor for Big Data analytics best fills the blank in the previous sentence?
volume
*Many factors contributed to the exponential increase in data _____, such as transaction-based data stored through the years, text data constantly streaming in from social media, increasing amounts of sensor data being collected, automatically generated RFID and GPS data, and so forth.
machines
*Most big data is generated by ______
new style
*NoSQL is a _____ of database
Geneva, Switzerland
*OPENING VIGNETTE: BIG DATA MEETS BIG SCIENCE AT CERN CERN is located near _____
particle physics
*OPENING VIGNETTE: BIG DATA MEETS BIG SCIENCE AT CERN CERN is the world's largest ______ laboratory.
volume
*The most common trait of big data is ____
massive volumes of data
*Traditionally, "big data"=____
A. an increasingly challenging task for today's enterprises B. not a new technological fad but a business priority
*Using data to understand customers/clients and business operations to sustain and foster growth and profitability is:
variety
*___ means that data comes in all types of formats
simplify
*____ because it is hard to keep track of all of the new database vendors, open source projects, and Big Data service providers. It will be even more crowded and complicated in the years ahead.
visualize
*______ because according to leading analytics research companies like Forrester and Gartner, enterprises find advanced data visualization platforms to be essential tools that enable them to monitor business, find patterns, and take action to avoid threats and snatch opportunities.
govern
*______ because data governance has always been a challenging issue in IT, and it is getting even more puzzling with the advent of Big Data.
MapReduce
*______ is a technique popularized by Google that distributes the processing of very large multi-structured data files across a large cluster of ordinary machines/computer processors.
empower
*_______ because Big Data and self-service business intelligence go hand in hand. Across a range of uses - from tackling new business problems, developing entirely new products and services, finding actionable intelligence in less than an hour, and blending data from disparate sources - Big Data has fired the imagination of what is possible through the application of analytics.
integrate
*_______ because blending data from disparate sources for your organization is an essential part of Big Data analytics. Organizations that can blend different relational, semi-structured, and raw data sources in real time, without expensive up-front costs, will be the ones that get the best value from Big Data.
coexist
*_______ because using the strengths of each database platform and enabling them to work together in your organization's data architecture are essential. There is ample literature that talks about the necessity of maintaining and nurturing synchronicity of traditional data warehouses with the capabilities of new platforms
evangelize
*_______ because with the backing of one or more executive sponsors, future business graduates from LSU E.J. Ourso College of Business like yourself can get the ball rolling and instill a virtuous cycle: The more departments in your organization realize actionable benefits, the more pervasive analytics becomes across your organization. Fast, easy-to-use visual analytics is the key that opens the door to organization-wide analytics adoption and collaboration.
perpetual analytics
*_______ evaluates every incoming observation against all prior observations. When analyzing big data in the context of intelligent systems, recognizing how the new observation relates to all prior observations enables the discovery of real-time insights.
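Study note: one simple way to picture the card above, assuming a stream of numeric readings, is to keep a running summary of everything seen so far and score each new value against it before folding it into the history. The class `PerpetualStats` and the sample readings are hypothetical, and this is an illustration of the idea, not the textbook's method.

```python
import math

class PerpetualStats:
    """Running mean and variance over all observations seen so far (Welford's method)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def score(self, x: float) -> float:
        """Standard deviations between x and the mean of all prior values (0.0 if undefined)."""
        if self.n < 2:
            return 0.0
        std = math.sqrt(self._m2 / (self.n - 1))
        return 0.0 if std == 0 else (x - self.mean) / std

    def update(self, x: float) -> None:
        """Fold a new observation into the running summary."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

stats = PerpetualStats()
scores = []
for reading in [10.0, 12.0, 11.0, 30.0]:
    scores.append(stats.score(reading))  # compare against all prior observations
    stats.update(reading)                # then add the new one to the history
```

The summary uses constant memory, so every new observation is compared against the whole history without storing or rescanning it.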
stream
A ____ can be thought of as an unbounded flow or sequence of data elements, arriving continuously at high velocity
name node
A _____ is a node in a Hadoop cluster that provides the client information on where in the cluster particular data is stored and if any nodes fail
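Study note: the role described in the card above, together with the replication card earlier in this set, can be sketched as a toy directory that records which nodes hold a copy of each block. Everything here is hypothetical (the `NameNode` class, its methods, the node names, and the replica count passed in); it does not mirror the actual Hadoop API.

```python
class NameNode:
    """Toy directory: which nodes hold a replica of each block."""

    def __init__(self, replication: int) -> None:
        self.replication = replication
        self.locations: dict[str, list[str]] = {}
        self.failed: set[str] = set()

    def store(self, block: str, nodes: list[str]) -> None:
        """Record that the first `replication` nodes each hold a copy of the block."""
        if len(nodes) < self.replication:
            raise ValueError("not enough nodes for the requested replication")
        self.locations[block] = nodes[: self.replication]

    def mark_failed(self, node: str) -> None:
        """Remember that a node is down."""
        self.failed.add(node)

    def locate(self, block: str) -> list[str]:
        """Return the live nodes that still hold the block."""
        return [n for n in self.locations[block] if n not in self.failed]

nn = NameNode(replication=3)
nn.store("block-1", ["node-a", "node-b", "node-c", "node-d"])
nn.mark_failed("node-a")
live = nn.locate("block-1")
```

Even with one node down, the client still gets two live locations for the block, which is the fault tolerance that replication buys.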
deployed the Customer Intelligence Application for IBM Business Partner
APPLICATION CASE 6.1: BIG DATA ANALYTICS HELPS LUXOTTICA What was their proposed solution?
Because Luxottica outsourced both data storage and promotional campaign development and management, there was a disconnect between data analytics and marketing execution
APPLICATION CASE 6.1: BIG DATA ANALYTICS HELPS LUXOTTICA What were the main challenges?
online marketplace
APPLICATION CASE 6.3: EBAY'S BIG DATA SOLUTION eBay is the world's largest ____
1. social services 2. tax collection 3. sanitation services 4. environmental management 5. crime prevention 6. management of police and fire departments
APPLICATION CASE 6.5: DUBLIN CITY COUNCIL What are areas that large cities could greatly benefit from big data analytics?
value proposition
Although all of the Vs are important characteristics, ____ is probably the most important for decision makers
historical data
Because Hadoop can handle such huge volumes of data, ____ can be managed easily with this approach
worthless
Big data by itself is _____ unless business users do something with it that delivers value to the organization.
velocity
Big data concerns about _____ are especially prevalent when dealing with streams
misnomer
Big data is a _____
1. volume 2. variety 3. velocity 4. veracity 5. variability 6. value proposition
Big data is characterized by these traits:
value
Big data plus "big" analytics yields ____
80-85%
By some estimates, ______ of all organizations' data is in some sort of unstructured or semi-structured format
1. process efficiency and cost reduction 2. brand management 3. revenue maximization, cross-selling, and up-selling 4. enhanced customer experience 5. churn identification, customer recruiting 6. improved customer service 7. identifying new products and market opportunities 8. risk management 9. regulatory compliance 10. enhanced security capabilities
Common business problems addressed by big data:
happening in real time
Critical event processing relates to stream analytics because the events are ________
soft skills
Data scientists are expected to have _____ such as creativity, curiosity, communication/interpersonal skills, domain expertise, problem definition skills, and managerial skills.
business, communication, and technical; current business analytics
Data scientists use a combination of their ______ skills to investigate big data looking for ways to improve ______ and hence to improve decisions for new business opportunities.
1. sensor data 2. computer network traffic 3. phone conversations 4. ATM transactions 5. web searches 6. financial data
Examples of data streams include:
1. e-commerce 2. telecommunications 3. law enforcement 4. cyber security 5. the power industry *6. health services* 7. the government
Examples of industries that benefit from stream analytics:
distributed and parallelized
MapReduce is a programming model that allows the processing of large-scale data analysis problems to be ______
ACID (atomicity, consistency, isolation, durability)
NoSQL databases trade _____ compliance for performance and scalability
Not Only SQL
NoSQL stands for what?
investigates and looks for new possibilities
One of the biggest differences between a data scientist and a business intelligence user is that a data scientist __________, while a BI user analyzes existing business situations and operations
expertise in both technical and business application domains
One of the most sought-after characteristics of a data scientist is _______
70%
Organizations with Big Data are over ______ more likely than other organizations to have BI/BA projects that are driven primarily by the business community, not by the IT group.
They take advantage of commodity hardware to enable scale-out, parallel processing techniques; employ nonrelational data storage capabilities in order to process unstructured and semistructured data; and apply advanced analytics and data visualization technology to Big Data to convey insights to end users.
What are the common characteristics of emerging Big Data technologies?
variety and complexity
What has changed the landscape between data warehousing and big data in recent years is the _____ of data, which made data warehousing incapable of keeping up
achieving high performance with "simple" computers
What is the goal of MapReduce?
definition and the structure
What's new is that the _____ of big data constantly changes.
LSU ISDS Masters of Science in Analytics
Where do data scientists come from?
NoSQL
____ databases are mostly aimed at serving up discrete data stored among large volumes of multi-structured data to end-user and automated big data applications
stream analytics
____ is the process of extracting actionable information from continuously flowing data
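Study note: a minimal sketch of the idea in the card above, assuming numeric readings that arrive one at a time. The function `rolling_alerts`, the window size, and the limit are hypothetical choices for illustration; real stream analytics platforms are far more elaborate.

```python
from collections import deque
from typing import Iterable, Iterator

def rolling_alerts(
    stream: Iterable[float], window: int, limit: float
) -> Iterator[tuple[int, float]]:
    """Yield (index, rolling mean) whenever the mean of the last `window` values exceeds `limit`."""
    recent: deque[float] = deque(maxlen=window)  # keeps only the newest values
    for i, value in enumerate(stream):
        recent.append(value)
        if len(recent) == window:
            mean = sum(recent) / window
            if mean > limit:
                yield i, mean

readings = [1.0, 2.0, 3.0, 10.0, 11.0, 2.0]
alerts = list(rolling_alerts(readings, window=3, limit=5.0))
```

The generator never needs the whole stream in memory, so the same code works on an unbounded flow of data; the list here is only a stand-in for one.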
secondary nodes
_____ are backup for name nodes in a Hadoop cluster
slave nodes
_____ in a Hadoop cluster store data and take direction to process it from the job tracker
critical event processing
_____ involves combining data from multiple sources to infer events or patterns of interest
data stream mining
_____ is the process of extracting novel patterns and knowledge structures from continuous, rapid data records.
querying
______ for data in the Hadoop distributed system is accomplished via MapReduce
critical event processing
______ is a method of capturing, tracking, and analyzing streams of data to detect events (out of normal happenings) of certain types that are worthy of the effort
data scientist
A person with the skills to investigate big data is called a what?
big data guru
data scientist = ______