BIA Chapter 7: Big Data Concepts and Tools


High-performance computing approaches to keep up with the computational needs of Big Data (4)

-In-memory analytics -In-database analytics -Grid computing -Appliances

Applications of stream analytics (7)

-e-Commerce -Telecommunications -Law enforcement and cybersecurity -Power industry -Financial services -Health sciences -Government

MapReduce

A technique popularized by Google that distributes the processing of very large multistructured data files across a large cluster of machines

How does Hadoop work? It breaks the data up into ...

"parts," which are then loaded into a file system made up of multiple nodes running on commodity hardware

Hadoop Pros (3) -Allows enterprises to... -Enterprises no longer must ... - x to get started with Hadoop

-Allows enterprises to process and analyze large volumes of unstructured and semistructured data -Enterprises no longer must rely on sample data sets but can process and analyze all relevant data -Inexpensive to get started with Hadoop

Most critical success factors for Big Data analytics (5) - Alignment between the business and IT strategy (2) Essential to make sure that... Analytics should play the...

-Essential to make sure that the analytics work always supports the business strategy -Analytics should play the enabling role in successfully executing the business strategy

The V's that define Big Data - Velocity (2) How x data is... Most x...

-How fast data is being produced and how fast it must be processed to meet the need or demand -Most overlooked characteristic of Big Data

Problems that can be addressed using Big Data analytics (10)

-Process efficiency and cost reduction -Brand management -Revenue maximization -Enhanced customer experience -Churn identification, customer recruiting -Improved customer service -Identifying new products and market opportunities -Risk management -Regulatory compliance -Enhanced security capabilities

Most critical success factors for Big Data analytics (5)

1. A clear business need (alignment with the vision and the strategy) 2. Strong, committed sponsorship (executive champion) 3. Alignment between the business and IT strategy 4. A fact-based decision-making culture 5. A strong data infrastructure

How to succeed with Big Data (7)

1. Simplify 2. Coexist 3. Visualize 4. Empower 5. Integrate 6. Govern 7. Evangelize

Coexistence of Hadoop and the data warehouse (5)

1. Use Hadoop for storing and archiving multistructured data 2. Use Hadoop for filtering, transforming, and/or consolidating multistructured data 3. Use Hadoop to analyze large volumes of multistructured data and publish the analytical results 4. Use a relational DBMS that provides MapReduce capabilities as an investigative computing platform 5. Use a front-end query tool to access and analyze data

Critical event processing

A method of capturing, tracking, and analyzing streams of data to detect events (out-of-the-ordinary happenings) of certain types that are worth the effort

Challenges found by business executives to have a significant impact on successful implementation of Big Data (6) - Data volume

Ability to capture, store, and process a huge volume of data at an acceptable speed so that the latest information is available to decision makers

Challenges found by business executives to have a significant impact on successful implementation of Big Data (6) - Data integration

Ability to combine data that is not similar in structure or source and to do so quickly and at a reasonable cost

Challenges found by business executives to have a significant impact on successful implementation of Big Data (6) - Data governance

Ability to keep up with the security, privacy, ownership, and quality issues of Big Data

Challenges found by business executives to have a significant impact on successful implementation of Big Data (6) - Processing capabilities

Ability to process data quickly as it is captured

High-performance computing approaches to keep up with the computational needs of Big Data (4) - Appliances

Brings together hardware and software in a physical unit that is not only fast but also scalable on an as-needed basis

Use cases for Hadoop -Differentiators (2) - Hadoop is the repository and refinery for raw data.

-Captures all the data reliably and cost-effectively -Hadoop refines the raw data

The V's that define Big Data - Variety (2) Comes in all... x% of all...

-Comes in all types of formats -80-85% of all organizations' data is in some sort of unstructured or semistructured format

Challenges found by business executives to have a significant impact on successful implementation of Big Data (6) - Solution cost

Crucial to reduce the cost of the solutions used to find the value

The V's that define Big Data - Variability (2) Data flows... X peak data loads...

-Data flows can be highly inconsistent, with periodic peaks -Daily, seasonal, and event-triggered peak data loads can be highly variable and thus challenging to manage

Challenges found by business executives to have a significant impact on successful implementation of Big Data (6)

-Data volume -Data integration -Processing capabilities -Data governance -Skills availability -Solution cost

Use cases for data warehousing (3)

-Data warehouse performance -Integrating data that provides business value -Interactive BI tools

Hadoop technical components (5) - Hadoop distributed file system (HDFS)

Default storage layer in any given Hadoop cluster

Advantage of MapReduce

Developers do not have to be concerned with implementing parallel computing; this is handled transparently by the system

Perpetual analytics

Evaluates every incoming observation against all prior observations, with no window size. Recognizing how the new observation relates to all prior observations enables the discovery of real-time insights
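
To make this concrete, here is a minimal Python sketch of perpetual analytics; the 3-standard-deviation anomaly rule and the sample readings are illustrative assumptions, not part of the definition.

```python
import statistics

# Perpetual analytics sketch: every new observation is evaluated
# against ALL prior observations; there is no window size.
history = []

def evaluate(observation):
    """Relate a new observation to the entire history, then keep it."""
    insight = None
    if len(history) >= 2:
        mean = statistics.mean(history)
        stdev = statistics.stdev(history)
        # Assumed rule for illustration: flag values beyond 3 std devs.
        if stdev > 0 and abs(observation - mean) > 3 * stdev:
            insight = f"anomaly vs. all {len(history)} prior observations: {observation}"
    history.append(observation)
    return insight

for reading in [10, 11, 9, 10, 12, 58]:
    result = evaluate(reading)
    if result:
        print(result)  # 58 is flagged the moment it arrives
```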

Hadoop technical components (5)

-Hadoop Distributed File System (HDFS) -Name node -Secondary node -Job tracker -Slave nodes

Use cases for Hadoop -Differentiators (2)

Hadoop is the repository and refinery for raw data. Hadoop is a powerful, economical, and active archive.

Hadoop cons (3) -are x and x -x and x Hadoop clusters and performing ... -a X of x developers

-Immature and still developing -Implementing and managing Hadoop clusters and performing advanced analytics on large volumes of unstructured data require significant expertise, skill, and training -A dearth of Hadoop developers who can build and take advantage of complex Hadoop clusters

MapReduce is a X model ...

MapReduce is a programming model, not a programming language; that is, it is designed to be used by programmers rather than business users.
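
As a sketch of the model, here is the canonical word-count example in Python. A real framework such as Hadoop would run many map and reduce tasks in parallel across a cluster; this single-process version only shows the map, shuffle/sort, and reduce phases themselves.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(document):
    """Map: emit a (key, value) pair for every word."""
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    """Reduce: aggregate all values that share a key."""
    return (word, sum(counts))

documents = ["big data needs big tools", "hadoop processes big data"]

# Shuffle/sort: group intermediate pairs by key, as the framework would.
pairs = sorted(p for doc in documents for p in map_phase(doc))
for word, group in groupby(pairs, key=itemgetter(0)):
    print(reduce_phase(word, (count for _, count in group)))
```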

Big data technologies (3)

-MapReduce -Hadoop -NoSQL

The V's that define Big Data - Value proposition (3) Contains more ... Organizations can ... X insights and x decisions...

-Contains more patterns and interesting anomalies than "small" data -Organizations can gain greater business value that they may not have otherwise -Greater insights and better decisions, something that every organization needs

NoSQL

Processes large volumes of multistructured data

High-performance computing approaches to keep up with the computational needs of Big Data (4) - Grid computing

Promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources

Most critical success factors for Big Data analytics (5) - A fact-based decision-making culture To create a fact-based decision-making culture, senior management needs to (5) Recognize... Be a... Stress that... Ask to see... Link...

-Recognize that some people can't or won't adjust -Be a vocal supporter -Stress that outdated methods must be discontinued -Ask to see what analytics went into decisions -Link incentives and compensation to desired behaviors

The V's that define Big Data - Veracity (2) Refers to ... Tools and techniques are often used to handle ...

-Refers to conformity to facts: accuracy, quality, truthfulness, or trustworthiness of the data -Tools and techniques are often used to handle Big Data's veracity by transforming the data into quality and trustworthy insights

Challenges found by business executives to have a significant impact on successful implementation of Big Data (6) - Skills availability

Shortage of people with the skills to do the job

High-performance computing approaches to keep up with the computational needs of Big Data (4) - In-memory analytics

Solves complex problems in near real time with highly accurate insights by allowing analytical computations and Big Data to be processed in memory and distributed across a dedicated set of nodes

High-performance computing approaches to keep up with the computational needs of Big Data (4) - In-database analytics

Speeds time to insights and enables better data governance by performing data integration and analytics functions inside the database, so you won't have to move or convert data repeatedly
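
A minimal sketch of the idea using SQLite from Python's standard library (an assumption for illustration; real in-database analytics platforms are typically large parallel databases). The point is that the aggregation runs inside the database engine, so the raw rows never have to be moved out or converted first.

```python
import sqlite3

# In-database analytics sketch: push the computation to the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 120.0), ("west", 95.5), ("east", 210.25)],
)

# The database computes the summary; only the small result set
# crosses back to the application, not the raw rows.
for region, total in conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
):
    print(region, total)
conn.close()
```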

Most critical success factors for Big Data analytics (5) - Strong, committed sponsorship (executive champion) Sponsorship needs to be ...

Sponsorship needs to be at the highest levels and organization-wide

Most critical success factors for Big Data analytics (5) - A strong data infrastructure Success requires ...

Success requires marrying the old with the new for a holistic infrastructure that works synergistically

Stream Analytics

Term commonly used for the analytics process of extracting actionable information from continuously flowing/streaming data.
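
For illustration, here is a minimal Python sketch of stream analytics: a rolling average is extracted while the data is still flowing, rather than after it lands in storage. The window size of 3 and the sample feed are assumptions.

```python
from collections import deque

def moving_average(stream, window_size=3):
    """Extract actionable information (a rolling average) on the fly."""
    window = deque(maxlen=window_size)
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)

sensor_feed = iter([4.0, 5.0, 9.0, 6.0, 2.0])  # stands in for a live stream
for avg in moving_average(sensor_feed):
    print(f"rolling average so far: {avg:.2f}")  # available as data arrives
```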

Most critical success factors for Big Data analytics (5) - A fact-based decision-making culture X drive decision making

The numbers, not intuition or gut feelings, drive decision making

The V's that define Big Data (6)

-Volume -Variety -Velocity -Veracity -Variability -Value proposition

MapReduce - High performance is ...

achieved by breaking the processing into small units of work that can be run in parallel across the hundreds, potentially thousands, of nodes in the cluster

Hadoop distributed file system

adept at storing large volumes of unstructured and semistructured data, as it does not require data to be organized into relational rows and columns

NoSQL databases are ...

aimed, for the most part, at serving up discrete data stored among large volumes of multistructured data to end-user and automated Big Data applications
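
A minimal sketch of this access pattern in Python; a plain dict stands in for a distributed key-value/document store, which is an assumption for illustration only.

```python
# NoSQL-style access sketch: discrete records are served up by key from
# multistructured data; records need not share a relational schema.
profiles = {
    "user:1001": {"name": "Ada", "tags": ["analytics", "hadoop"]},
    "user:1002": {"name": "Lin", "clicks": 42},  # different fields are fine
}

def get(key):
    """Serve one discrete record by key: no joins, no full scans."""
    return profiles.get(key)

print(get("user:1001"))
```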

Hadoop

an open source framework for processing, storing, and analyzing massive amounts of distributed, unstructured data

The procedural nature of MapReduce makes it ...

easily understood by skilled programmers

How does Hadoop work? A client accesses unstructured and semistructured data ...

from sources including log files, social media feeds, and internal data stores

Hadoop designed to ...

handle petabytes and exabytes of data distributed over multiple nodes in parallel

Big data exceeds the reach of commonly used ....

hardware environments and/or capabilities of software tools to capture, manage and process it within a tolerable time span for its user population

The V's that define Big Data - Volume issues

-How to determine relevance -How to create value from data that is deemed to be relevant

Data stream analytics

In-motion analytics

Why use MapReduce? MapReduce aids organizations in ...

processing and analyzing large volumes of multistructured data

Hadoop clusters run on ...

inexpensive commodity hardware so projects can scale out without breaking the bank

Hadoop technical components (5) - Job tracker

Initiates and coordinates MapReduce jobs, or the processing of the data

Big data has been used to describe the ...

massive volumes of data analyzed by huge organizations like Google or research science projects at NASA

The V's that define Big Data - Volume

most common trait of Big Data

Hadoop technical components (5) - Secondary node

periodically replicates and stores data from the name node

Business problems addressed by big data analytics - Top business problems (2)

-Process efficiency -Cost reduction

Hadoop technical components (5) - Name Node

Provides the client with information on where in the cluster particular data is stored and whether any nodes have failed

Most critical success factors for Big Data analytics (5) - A clear business need (alignment with the vision and the strategy) Main driver for Big Data analytics ...

should be the needs of the business at any level: strategic, tactical, and operational

NoSQL capability is ...

sorely lacking from relational database technology, which simply can't maintain needed application performance levels at a Big Data scale

Hadoop technical components (5) - Slave nodes

Store data and take direction to process it from the job tracker

Critical event processing is an application of ...

stream analytics that combines data from multiple sources to infer events or patterns of interest either before they actually occur or as soon as they happen
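
A minimal sketch of the idea in Python, combining two hypothetical sources (badge-reader exits and a server log) to infer an event of interest as soon as it happens; the 60-second correlation rule and the sample feeds are assumptions.

```python
badge_exits = {"alice": 100}            # user -> timestamp of badge-out
log_events = [("alice", 130, "login")]  # (user, timestamp, action)

def detect(events, exits, window=60):
    """Infer an event by correlating two streams: a login occurring
    shortly after the same user badged out of the building."""
    for user, ts, action in events:
        if action == "login" and user in exits and 0 < ts - exits[user] <= window:
            yield f"ALERT: {user} logged in {ts - exits[user]}s after badging out"

for alert in detect(log_events, badge_exits):
    print(alert)
```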

Critical event processing goal

Take rapid action to prevent these events from occurring or, in the case of a short window of opportunity, take full advantage within the allowed time

Big data has become a popular ...

term to describe the exponential growth, availability, and use of information, both structured and unstructured

Big data is not new. What is new is ....

that the definition and the structure of Big Data constantly change

Data stream mining

the process of extracting novel patterns and knowledge structures from continuous, rapid data records.
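
As one classic technique (chosen here as an illustrative assumption; the card does not name an algorithm), the Misra-Gries heavy-hitters algorithm extracts frequent items from a continuous, rapid stream in a single pass with bounded memory.

```python
def misra_gries(stream, k=3):
    """Track at most k-1 counters; surviving items are the candidate
    frequent items (their counts underestimate true frequencies)."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Decrement every counter; drop those that reach zero.
            counters = {i: c - 1 for i, c in counters.items() if c > 1}
    return counters

clicks = ["a", "b", "a", "c", "a", "b", "a", "d", "a", "b"]
print(misra_gries(clicks))  # 'a' surfaces as the dominant pattern
```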

NoSQL downside

Trades ACID (atomicity, consistency, isolation, durability) compliance for performance and scalability

