Chapter 8 - Understanding Big Data & Its Impact on Business
Business Focus Areas of Big Data
-Data mining -Data analysis -Data visualization
text analysis
-analyzes unstructured data to find trends & patterns in words & sentences -text mining a firm's customer support email might identify which customer service representative is best able to handle the question, allowing the system to forward it to the right person
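A minimal sketch of the email-routing idea, assuming a simple keyword match stands in for real text mining. The team names and keyword lists are made up for illustration.

```python
import re

# Hypothetical teams and keywords, chosen only for this example.
ROUTES = {
    "billing": {"invoice", "refund", "charge"},
    "technical": {"error", "crash", "login"},
}

def route_email(text, routes=ROUTES, default="general"):
    """Send an email to the team whose keywords appear most often in it."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {team: len(words & keywords) for team, keywords in routes.items()}
    best = max(scores, key=scores.get)
    # Fall back to a default queue when no keyword matched at all.
    return best if scores[best] > 0 else default

team = route_email("Please refund my last invoice")
```

A production system would likely use a trained classifier rather than fixed keywords; the routing step itself would look similar.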
big data
-collection of large, complex data sets, including structured & unstructured data -cannot be analyzed using traditional database methods & tools
1. Variety
-diff forms of data -different forms of structured & unstructured data -data from spreadsheets & databases as well as from email, videos, photos, pdf, all of which have to be analyzed
Data-Mining Techniques
-estimation analysis -affinity grouping analysis -cluster analysis -classification analysis
speech analysis
-process of analyzing recorded calls to gather info; brings structure to customer interactions & exposes info buried in customer contact center interactions w/ an enterprise -heavily used in the customer service department to help improve processes by identifying angry customers & routing them to the appropriate customer service representative
unstructured data examples
-satellite images -photographic data -video data -social media data -text messages -voice mail data
3. Volume
-scale of data -includes enormous volumes of data generated daily -massive volume created by machines & networks -big data tools necessary to analyze zettabytes & brontobytes
structured data examples
-sensor data -Weblog data -financial data -click-stream data -point of sale data -accounting data
regression model
-statistical process for estimating the relationships among variables -include many techniques for modeling & analyzing several variables when the focus is on the relationship btw a dependent variable & one/ more independent variables
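A small sketch of the simplest case, one dependent variable and one independent variable, fitted with ordinary least squares. The ad-spend and sales figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

ad_spend = [1.0, 2.0, 3.0, 4.0]  # independent variable (made-up data)
sales = [3.0, 5.0, 7.0, 9.0]     # dependent variable (made-up data)
slope, intercept = fit_line(ad_spend, sales)
```

Here the slope estimates how much sales change for each extra unit of ad spend.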
forecasting model
-time-series info is time stamped info collected at a particular frequency -forecasts are predictions based on time-series info allowing users to manipulate the time series for forecasting activities
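One of the simplest ways to forecast from time-series info is a moving average. This sketch assumes monthly sales figures (invented here) and a 3-period window; it is one possible technique, not the only forecasting model.

```python
def moving_average_forecast(series, window=3):
    """Predict the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("need at least `window` observations")
    return sum(series[-window:]) / window

monthly_sales = [100, 110, 120, 130, 140, 150]  # made-up time series
next_month = moving_average_forecast(monthly_sales)
```

Changing `window` is one way a user can manipulate the time series: a larger window smooths more, a smaller one reacts faster to recent changes.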
Virtualization examples
-traditional computing environment: application, operating system, server -virtualized computing environment: multiple applications, operating system, server
2. Veracity
-uncertainty of data, including biases, noise, & abnormalities -uncertainty or untrustworthiness of data -data must be meaningful to the problem being analyzed -must keep data clean & implement processes to keep dirty data from accumulating in systems
Data-Mining Process Model Overview
1. Business understanding 2. Data understanding 3. Data preparation 4. Data modeling 5. Evaluation 6. Deployment
4 Common Characteristics of Big Data
1. Variety 2. Veracity 3. Volume 4. Velocity
Three Elements of Data Mining
1. data 2. discovery 3. deployment
Classification analysis example
Age -Young->student->yes/no -Old->credit score->yes/no
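The tree above can be sketched as nested conditions. Which answer each branch leads to is an assumption here (students and good credit scores map to "yes"); the card only gives the shape of the tree.

```python
def classify(age_group, is_student=False, good_credit=False):
    """Walk the example tree: split on age first, then on one more attribute."""
    if age_group == "young":
        # Young branch: the next question is whether the person is a student.
        return "yes" if is_student else "no"
    if age_group == "old":
        # Old branch: the next question is the credit score.
        return "yes" if good_credit else "no"
    raise ValueError("age_group must be 'young' or 'old'")

decision = classify("young", is_student=True)
```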
IoT
Internet of Things
recommendation engine
a data mining algorithm that analyzes a customer's purchases & actions on a website & then uses the data to recommend complementary products
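A rough sketch of the idea, assuming a "customers who bought this also bought" rule over past orders. The order data and function name are hypothetical; real engines use far richer signals.

```python
from collections import Counter

def recommend(cart, past_orders, top_n=2):
    """Suggest items that most often appear in past orders sharing an item with the cart."""
    cart = set(cart)
    scores = Counter()
    for order in past_orders:
        items = set(order)
        if items & cart:
            # Count only the items the customer does not already have.
            scores.update(items - cart)
    return [item for item, _ in scores.most_common(top_n)]

past_orders = [
    ["laptop", "mouse"],
    ["laptop", "mouse", "bag"],
    ["phone", "case"],
    ["laptop", "bag"],
    ["laptop", "mouse"],
]
suggestions = recommend(["laptop"], past_orders)
```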
2. Data understanding
analysis of all current data along w/ identifying any data quality issues & activities include: -gather data -describe data -explore data -verify data quality
Techniques used by data scientist to perform big data advanced analytics
analytics include: -behavioral analysis -correlation analysis -exploratory data analysis -pattern recognition analysis -social media analysis -speech analysis -text analysis -web analysis
5. Evaluation
analyze the trends & patterns to assess the potential for solving the business problem & activities include: -evaluate results -review process -determine next steps
social media analysis
analyzes text flowing across the internet, including unstructured text from blogs & messages
web analysis
analyzes unstructured data associated w/ websites to identify consumer behavior & website navigation
fast data
application of big data analytics to smaller data sets in near-real/ real-time in order to solve a problem/ create business value
4. Data modeling
apply mathematical techniques to identify trends & patterns in the data & activities include: -select modeling technique -design tests -build models
data artist
business analytics specialist who uses visual tools to help ppl understand complex data
pattern recognition analysis
classification/labeling of an identified pattern in the machine learning process
cube
common term for the representation of multidimensional info
virtualization
creation of a virtual version of computing resources, such as operating system, a server, a storage device, or network resources
outlier
data value that is numerically distant from most of the other data points in a set of data
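One common way to make "numerically distant" concrete is a z-score: how many standard deviations a value lies from the mean. The threshold of 2.0 and the order totals below are assumptions for illustration.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, so nothing is distant
    return [v for v in values if abs(v - mean) / stdev > threshold]

order_totals = [20, 22, 19, 21, 23, 20, 95]  # made-up data
outliers = find_outliers(order_totals)
```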
6. Deployment
deploy the discoveries to the org for work in everyday business & activities include: -plan deployment -monitor deployment -analyze results -review final reports
data visualization
describes technologies that allow users to see/visualize data to transform info into a business perspective
correlation analysis
determines a statistical relationship btw variables, often for the purpose of identifying predictive factors among the variables
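A short sketch using the Pearson correlation coefficient, one standard measure of a linear relationship btw two variables. Values near 1 or -1 indicate a strong relationship, values near 0 a weak one. The sample data is invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

store_visits = [1, 2, 3, 4]   # made-up data
purchases = [2, 4, 6, 8]      # made-up data
r = pearson(store_visits, purchases)
```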
estimation analysis
determines values for an unknown continuous variable behavior/estimated future value
market basket analysis
evaluates such items as websites & checkout scanner info to detect customers' buying behavior & predict future behavior by identifying affinities among customers' choices of products & services
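A minimal sketch of finding affinities: count how often each pair of products shows up in the same basket. The baskets are invented; real analyses usually add measures such as support and confidence.

```python
from collections import Counter
from itertools import combinations

def pair_counts(baskets):
    """Count how many baskets contain each pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sort so that (bread, butter) and (butter, bread) count as one pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]
top_pair, top_count = pair_counts(baskets).most_common(1)[0]
```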
data scientist
extracts knowledge from data by performing statistical analysis, data mining, & advanced analytics on big data to identify trends, market changes, & other relevant info
1. data
foundation for data-directed decision making
1. Business understanding
gain a clear understanding of the business problem that must be solved & how it impacts the company & activities include: -identify business goals -situation assessment -define data-mining goals -create project plan
3. Data preparation
gather & organize data in the correct formats & structures for analysis & activities include: -select data -cleanse data -integrate data -format data
exploratory data analysis
identifies patterns in data, including outliers, uncovering the underlying structure to understand relationships btw the variables
infographics
information graphics-present the results of data analysis, displaying the patterns, relationships, & trends in a graphical format
M2M
machine-to-machine communication
algorithms
mathematical formulas placed in software that performs an analysis on a data set
data visualization tools
move beyond Excel graphs & charts into sophisticated analysis techniques such as controls, instruments, maps, time-series graphs, & more
analysis paralysis
occurs when the user goes into an emotional state of over-analysis (or over-thinking) a situation so that a decision/action is never taken
Data mining modeling techniques for predictions
prediction models (3) 1. optimization model 2. forecasting model 3. regression model
data mining
process of analyzing data to extract info not offered by the raw data alone & uncovers patterns & trends for business analysis such as -analyzing customer buying patterns to predict future marketing & promotion campaigns -building budgets & other financial info -detecting fraud by identifying deceptive spending patterns -finding the best customers who spend the most money -keeping customers from leaving/migrating to competitors -promoting & hiring employees to ensure success for both the company & the individual
data profiling
process of collecting statistics & info about data in an existing source
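A small sketch of profiling a single column, assuming missing values are stored as `None`. The statistics chosen (count, missing, distinct, min, max) are typical examples, not a fixed standard.

```python
def profile_column(values):
    """Collect basic statistics about one column of an existing data source."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }

ages = [34, None, 51, 34]  # made-up column with one missing value
profile = profile_column(ages)
```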
2. discovery
process of identifying new patterns, trends, & insights
anomaly detection
process of identifying rare/unexpected items/events in a data set that do not conform to other items in the data set
3. deployment
process of implementing discoveries to drive success
classification analysis
process of organizing data into categories or groups for its most effective & efficient use (groups of political affiliation & charity donors)
data replication
process of sharing info to ensure consistency btw multiple data sources
distributed computing
processes & manages algorithms across many machines in a computing environment
affinity grouping analysis
reveals the relationship btw variables along w/ the nature & frequency of the relationships
analytics
science of fact-based decision making
Distributed Computing Environment
servers connect to the internet<-->distributed computing environment<-->computer desktops
prediction
statement abt what will happen/might happen in the future, for ex, predicting future sales/employee turnover
optimization model
statistical process that finds the way to make a design, system, or decision as effective as possible, for ex, finding the values of controllable variables that determine maximal productivity/minimal waste
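A toy sketch of the idea: price is the controllable variable, and a simple search finds the price that maximizes profit. The demand formula and unit cost are assumptions made up for this example.

```python
UNIT_COST = 10  # hypothetical cost to produce one unit

def profit(price):
    """Profit under an assumed linear demand curve: demand falls as price rises."""
    demand = max(0, 100 - 2 * price)
    return (price - UNIT_COST) * demand

# Try every whole-number price in a range and keep the most profitable one.
candidate_prices = range(10, 51)
best_price = max(candidate_prices, key=profit)
best_profit = profit(best_price)
```

Real optimization models use more efficient solvers than trying every value, but the goal is the same: pick the controllable values that give the best outcome.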
cluster analysis
technique used to divide an info set into mutually exclusive groups such that the members of each group are as close together as possible to one another & the diff groups are as far apart as possible
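A compact sketch using k-means on one-dimensional data, one common clustering method. The starting centers and the spending figures are assumptions for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Assign each value to its nearest center, then move centers to group means."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center when a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

weekly_spend = [1, 2, 3, 10, 11, 12]  # made-up data
centers, groups = kmeans_1d(weekly_spend, centers=[1, 12])
```

Each value lands in exactly one group, which matches the "mutually exclusive groups" part of the definition.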
Data Mining Process Model Activities
the 6 phases
business intelligence dashboards
track corporate metrics such as critical success factors & key performance indicators & include advanced capabilities such as interactive controls, allowing users to manipulate data for analysis
data mining tools
use a variety of techniques to find patterns & relationships in large volumes of info that predict future behavior & guide decision making
behavioral analysis
using data abt people's behaviors to understand intent & predict future actions