BIA Chapter 7: Big Data Concepts and Tools
High-performance computing to keep up with the computational needs of Big Data (4)
In-memory analytics, in-database analytics, grid computing, appliances
Applications of stream analytics (7)
e-Commerce, telecommunications, law enforcement and cybersecurity, power industry, financial services, health sciences, government
MapReduce
A technique popularized by Google that distributes the processing of very large, multistructured data files across a large cluster of machines
How does Hadoop work? It breaks the data up into ...
"parts" which are then loaded into a file system made up of multiple nodes running on commodity
Hadoop Pros (3) -Allows enterprises to... -Enterprises no longer must ... - x to get started with Hadoop
-Allows enterprises to process and analyze large volumes of unstructured and semistructured data -Enterprises no longer must rely on sample data sets but can process and analyze all relevant data - Inexpensive to get started with Hadoop
Most critical success factors for Big Data analytics (5) - Alignment between the business and IT strategy (2) Essential to make sure that... Analytics should play the....
-Essential to make sure that the analytics work always supports the business strategy -Analytics should play the enabling role in successfully executing the business strategy
The V's that define big data - Velocity (2) how x data is ... most x ....
-How fast data is being produced and how fast the data must be processed to meet the need or demand -The most overlooked characteristic of Big Data
Problems that can be addressed using Big Data analytics (10)
-Process efficiency and cost reduction -Brand management -Revenue maximization -Enhanced customer experience -Churn identification, customer recruiting -Improved customer service -Identifying new products and market opportunities -Risk management -Regulatory compliance -Enhanced security capabilities
Most critical success factors for Big Data analytics (5)
1. A clear business need (alignment with the vision and the strategy) 2. Strong, committed sponsorship (executive champion) 3. Alignment between the business and IT strategy 4. A fact-based decision-making culture 5. A strong data infrastructure
How to succeed with Big Data (7)
1. Simplify 2. Coexist 3. Visualize 4. Empower 5. Integrate 6. Govern 7. Evangelize
Coexistence of Hadoop and the data warehouse (5)
1. Use Hadoop for storing and archiving multistructured data 2. Use Hadoop for filtering, transforming, and/or consolidating multistructured data 3. Use Hadoop to analyze large volumes of multistructured data and publish the analytical results 4. Use a relational DBMS that provides MapReduce capabilities as an investigative computing platform 5. Use a front-end query tool to access and analyze data
Critical event processing
A method of capturing, tracking, and analyzing streams of data to detect events (out-of-the-ordinary happenings) of certain types that are worthy of the effort
Challenges that are found by business executives to have a significant impact on successful implementation of Big Data (6) - Data volume
Ability to capture, store, and process a huge volume of data at an acceptable speed so that the latest information is available to decision makers
Challenges that are found by business executives to have a significant impact on successful implementation of Big Data (6) - Data integration
Ability to combine data that is not similar in structure or source and to do so quickly and at a reasonable cost
Challenges that are found by business executives to have a significant impact on successful implementation of Big Data (6) - Data governance
Ability to keep up with the security, privacy, ownership, and quality issues of Big Data
Challenges that are found by business executives to have a significant impact on successful implementation of Big Data (6) - Processing capabilities
Ability to process data quickly as it is captured
High-performance computing to keep up with the computational needs of Big Data (4) - Appliances
Brings together hardware and software in a physical unit that is not only fast but also scalable on an as-needed basis
Use cases for Hadoop -Differentiators (2) - Hadoop is the repository and refinery for raw data.
Captures all the data reliably and cost-effectively; Hadoop refines the raw data
The V's that define big data - Variety (2) Comes in all... x% of all ....
-Comes in all types of formats -80-85% of all organizations' data is in some sort of unstructured or semistructured format
Challenges that are found by business executives to have a significant impact on successful implementation of Big Data (6) - Solution cost
Crucial to reduce the cost of the solutions used to find the value
The V's that define big data - variability (2) Data flows.... X peak data loads....
-Data flows can be highly inconsistent, with periodic peaks -Daily, seasonal, and event-triggered peak data loads can be highly variable and thus challenging to manage
Challenges that are found by business executives to have a significant impact on successful implementation of Big Data (6)
Data volume, data integration, processing capabilities, data governance, skills availability, solution cost
Use cases for data warehousing (3)
Data warehouse performance, integrating data that provides business value, interactive BI tools
Hadoop technical components (5) - Hadoop Distributed File System (HDFS)
Default storage layer in any given Hadoop cluster
Advantage of MapReduce
Developers do not have to be concerned with implementing parallel computing; this is handled transparently by the system
Perpetual analytics
Evaluates every incoming observation against all prior observations; there is no window size. Recognizing how the new observation relates to all prior observations enables the discovery of real-time insight
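A minimal Python sketch of the perpetual-analytics idea: every new observation updates state built from all prior observations, and no window is ever dropped. Using Welford's online mean/variance here is my choice of example, not something the card prescribes.

```python
# Perpetual analytics sketch: fold each arriving observation into
# all-history state (Welford's online algorithm, no window).
class PerpetualStats:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0        # running sum of squared deviations

    def observe(self, x: float):
        """Incorporate one incoming observation into the state."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = PerpetualStats()
for reading in [10.0, 12.5, 9.8, 30.2]:   # e.g. sensor values
    stats.observe(reading)
    print(f"n={stats.n} mean={stats.mean:.2f} var={stats.variance:.2f}")
```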
Hadoop technical components (5)
Hadoop Distributed File System (HDFS), NameNode, secondary node, JobTracker, slave nodes
Use cases for Hadoop -Differentiators (2)
Hadoop is the repository and refinery for raw data. Hadoop is a powerful, economical, and active archive.
Hadoop cons (3) -are x and x -x and x Hadoop clusters and performing ... -a X of x developers
-Immature and still developing -Implementing and managing Hadoop clusters and performing advanced analytics on large volumes of unstructured data require significant expertise, skill, and training -A dearth of Hadoop developers who can build and take advantage of complex Hadoop clusters
MapReduce is a X model ...
MapReduce is a programming model, not a programming language; that is, it is designed to be used by programmers rather than business users.
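A single-process Python sketch of that programming model: the programmer writes only the map and reduce functions, and the framework handles grouping by key. The names map_fn, reduce_fn, and run_mapreduce are illustrative; this is not Hadoop itself.

```python
# MapReduce-as-a-model: word count written as map() + reduce(),
# with the shuffle (group-by-key) done by the tiny "framework" below.
from collections import defaultdict

def map_fn(line: str):
    """Map: emit (key, value) pairs, here (word, 1)."""
    for word in line.lower().split():
        yield word, 1

def reduce_fn(key, values):
    """Reduce: combine all values that share a key."""
    return key, sum(values)

def run_mapreduce(lines):
    groups = defaultdict(list)          # shuffle phase: group by key
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

print(run_mapreduce(["big data big clusters", "data in clusters"]))
# {'big': 2, 'data': 2, 'clusters': 2, 'in': 1}
```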
Big data technologies (3)
MapReduce, Hadoop, NoSQL
The V's that define big data - Value proposition (3) Contains more ... Organization can ... X insights and x decisions...
-Contains more patterns and interesting anomalies than "small" data -Organizations can gain greater business value that they may not have otherwise -Greater insights and better decisions, something that every organization needs
NoSQL
Processes large volumes of multistructured data
High-performance computing to keep up with the computational needs of Big Data (4) - Grid computing
Promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources
Most critical success factors for Big Data analytics (5) - A fact-based decision-making culture To create a fact-based decision-making culture, senior management needs to (5) Recognize.. Be a ... Stress that ... Ask to see .... Link....
-Recognize that some people can't or won't adjust -Be a vocal supporter -Stress that outdated methods must be discontinued -Ask to see what analytics went into decisions -Link incentives and compensation to desired behaviors
The V's that define big data - Veracity (3) Refers to ... Tools and techniques are often used to handle ...
-Refers to conformity to facts: accuracy, quality, truthfulness, or trustworthiness of the data -Tools and techniques are often used to handle Big Data's veracity by transforming the data into quality and trustworthy insights
Challenges that are found by business executives to have a significant impact on successful implementation of Big Data (6) - Skills availability
Shortage of people with the skills to do the job
High-performance computing to keep up with the computational needs of Big Data (4) - In-memory analytics
Solves complex problems in near real time with highly accurate insights by allowing analytics computations to run in memory rather than on disk
High-performance computing to keep up with the computational needs of Big Data (4) - In-database analytics
Speeds time to insights and enables better data governance by performing data integration and analytics functions inside the database, so data does not have to be moved or converted repeatedly
Most critical success factors for Big Data analytics (5) - Strong, committed sponsorship (executive champion) Sponsorship needs to be ....
Sponsorship needs to be at the highest levels and organization-wide
Most critical success factors for Big Data analytics (5) - A strong data infrastructure Success requires ...
Success requires marrying the old with the new for a holistic infrastructure that works synergistically
Stream Analytics
Term commonly used for the analytics process of extracting actionable information from continuously flowing/streaming data.
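A minimal Python sketch of that idea: compute a rolling metric over a continuously flowing source without ever materializing the full stream. The event source and the 5-item window are illustrative assumptions.

```python
# Stream analytics sketch: a rolling average over an unbounded feed.
from collections import deque

def rolling_average(stream, window: int = 5):
    """Yield the average of the last `window` values as each arrives."""
    buf = deque(maxlen=window)      # old values fall out automatically
    for value in stream:
        buf.append(value)
        yield sum(buf) / len(buf)

# Stand-in for a continuous feed (sensor ticks, clickstream, trades, ...)
incoming = iter([3, 7, 4, 9, 2, 8, 6])
for avg in rolling_average(incoming):
    print(f"rolling avg: {avg:.2f}")
```

Contrast this with perpetual analytics above: here old observations leave the window, whereas perpetual analytics keeps folding every observation into all-history state.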
Most critical success factors for Big Data analytics (5) - A fact-based decision-making culture X drives decision making
The numbers, not intuition or gut feelings, drive decision making
The V's that define big data (6)
Volume, variety, velocity, veracity, variability, value proposition
MapReduce - High performance is ...
achieved by breaking the processing into small units of work that can be run in parallel across the hundreds, potentially thousands, of nodes in the cluster
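A Python sketch of where that performance comes from: the input is cut into independent splits that are mapped in parallel (local processes stand in for cluster nodes here), then merged in a reduce step. Worker count and the toy corpus are illustrative assumptions.

```python
# Parallel word count: independent map tasks per split, one merge/reduce.
from collections import Counter
from multiprocessing import Pool

def map_split(lines):
    """Map task: count words in one input split, independently."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def main():
    corpus = ["big data"] * 1000 + ["hadoop cluster"] * 500
    n_workers = 4
    # Partition the input into one split per worker.
    splits = [corpus[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        partials = pool.map(map_split, splits)   # parallel map phase
    total = sum(partials, Counter())             # reduce/merge phase
    print(total.most_common(3))

if __name__ == "__main__":
    main()
```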
Hadoop Distributed File System
adept at storing large volumes of unstructured and semistructured data, as it does not require data to be organized into relational rows and columns
NoSQL databases are ...
aimed, for the most part, at serving up discrete data stored among large volumes of multistructured data to end-user and automated Big Data applications
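A tiny Python sketch of that access pattern: a schema-less, NoSQL-style store serving discrete records by key. The store, keys, and fields are illustrative assumptions, not a real database API.

```python
# Document-store sketch: point lookups by key, no fixed schema, no joins.
store = {
    "user:42": {"name": "Ada", "clicks": 1705, "tags": ["mobile", "eu"]},
    "user:43": {"name": "Lin", "last_post": "shipping it"},  # different fields
}

def get(key: str):
    """Discrete lookup: fetch one record among many, directly by key."""
    return store.get(key)

print(get("user:42")["clicks"])   # fast point read of one discrete record
```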
Hadoop
an open source framework for processing, storing, and analyzing massive amounts of distributed, unstructured data
The procedural nature of MapReduce makes it ...
easily understood by skilled programmers
How does Hadoop work? A client accesses unstructured and semistructured data ...
from sources including log files, social media feeds, and internal data stores
Hadoop is designed to ...
handle petabytes and exabytes of data distributed over multiple nodes in parallel
Big data exceeds the reach of commonly used ....
hardware environments and/or capabilities of software tools to capture, manage, and process it within a tolerable time span for its user population
The V's that define big data - Volume issues
-How to determine relevance -How to create value from data that is deemed to be relevant
Data stream analytics
In-motion analytics
Why use MapReduce? MapReduce aids organizations in ...
processing and analyzing large volumes of multistructured data
Hadoop clusters run on ...
inexpensive commodity hardware so projects can scale out without breaking the bank
Hadoop technical components (5) - JobTracker
initiates and coordinates MapReduce jobs, or the processing of the data
Big data has been used to describe the ...
massive volumes of data analyzed by huge organizations like Google or research science projects at NASA
The V's that define big data - Volume
most common trait of Big Data
Hadoop technical components (5) - Secondary node
periodically replicates and stores data from the NameNode
Business problems addressed by big data analytics - Top business problems (2)
process efficiency, cost reduction
Hadoop technical components (5) - NameNode
provides the client with information on where in the cluster particular data is stored and whether any nodes have failed
Most critical success factors for Big Data analytics (5) - A clear business need (alignment with the vision and the strategy) Main driver for big data analytics ...
should be the needs of the business, at any level: strategic, tactical, or operational
NoSQL capability is ...
sorely lacking from relational database technology, which simply can't maintain needed application performance levels at a Big Data scale
Hadoop technical components (5) - Slave nodes
store data and take direction for processing it from the JobTracker
Critical event processing is an application of ...
stream analytics that combines data from multiple sources to infer events or patterns of interest either before they actually occur or as soon as they happen
Critical event processing goal
take rapid action to prevent these events from occurring or, in the case of a short window of opportunity, to take full advantage within the allowed time
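A minimal Python sketch of critical event processing: watch a live event stream and fire the moment a pattern of interest appears, here three failed logins on one account within 60 seconds. The events, threshold, and window are illustrative assumptions.

```python
# Event-pattern detection sketch: alert as soon as the pattern matches,
# while there is still time to act on it.
from collections import defaultdict, deque

WINDOW_SECONDS = 60
THRESHOLD = 3

recent_failures = defaultdict(deque)   # account -> failure timestamps

def on_event(account: str, kind: str, ts: float):
    """Process one event; return an alert the instant the pattern matches."""
    if kind != "login_failed":
        return None
    q = recent_failures[account]
    q.append(ts)
    while q and ts - q[0] > WINDOW_SECONDS:   # expire old observations
        q.popleft()
    if len(q) >= THRESHOLD:
        return f"ALERT: possible brute force on {account} at t={ts}"
    return None

events = [("bob", "login_failed", 1.0), ("bob", "login_failed", 20.0),
          ("ann", "login_ok", 25.0),    ("bob", "login_failed", 45.0)]
for account, kind, ts in events:
    alert = on_event(account, kind, ts)
    if alert:
        print(alert)
```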
Big data has become a popular ...
term to describe the exponential growth, availability, and use of information, both structured and unstructured
Big data is not new. What is new is ....
that the definition and the structure of Big Data constantly change
Data stream mining
the process of extracting novel patterns and knowledge structures from continuous, rapid data records.
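A Python sketch of one classic stream-mining technique: finding frequent items in a rapid, unbounded stream using bounded memory (the Misra-Gries "heavy hitters" summary). The parameter k and the sample clickstream are illustrative assumptions; the card does not name a specific algorithm.

```python
# Misra-Gries heavy hitters: one pass, at most k-1 counters in memory.
def misra_gries(stream, k: int = 3):
    """Summarize a stream; items with frequency > n/k are guaranteed kept."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Decrement every counter; drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters   # candidate frequent items (may include extras)

clicks = ["home", "cart", "home", "help", "home", "cart", "home"]
print(misra_gries(iter(clicks), k=3))   # "home" dominates the stream
```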
Downside of NoSQL
Trades ACID (atomicity, consistency, isolation, durability) compliance for performance and scalability