ISYS ch 8
Forecasting model
Time-series info is time-stamped info collected at a particular frequency. Forecasts are predictions based on time-series info, allowing users to manipulate the time series for forecasting activities (ex. web visits per hour)
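As an illustration of the idea (not a technique named in the notes), a minimal moving-average forecast can predict the next value of a time series from the mean of the most recent observations; the function name and sample hourly web-visit numbers below are hypothetical:

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    recent = series[-window:]
    return sum(recent) / window

# hypothetical hourly web visits
visits = [120, 135, 128, 140, 150, 145]
next_hour = moving_average_forecast(visits, window=3)
```

Real forecasting models (exponential smoothing, ARIMA) weight recent observations more heavily, but the moving average captures the core idea of projecting from time-stamped history.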
Data understanding (data mining)
analysis of all current data along with identifying any data-quality issues
evaluation (data mining)
analyze the trends and patterns to assess the potential for solving the business problem
social media analysis
analyzes text flowing across the Internet, including unstructured text from blogs and messages
web analysis
analyzes unstructured data associated with websites to identify consumer behavior and website navigation
text analysis
analyzes unstructured data to find trends and patterns in words and sentences
fast data
application of big data analytics to small data sets in near-real time or real time in order to solve a problem or create business value
data modeling (data mining)
apply mathematical techniques to identify trends and patterns in the data
data artist
business analytics specialist who uses visual tools to help people understand complex data
the term fast data is usually associated with?
business intelligence; the goal is to quickly gather and mine structured and unstructured data so that action can be taken
How is classification and cluster analysis different?
classification analysis requires that all classes be defined before the analysis begins, whereas cluster analysis discovers the groups during the analysis
big data
collection of large, complex data sets including structured and unstructured data, which cannot be analyzed using traditional database methods and tools
cube
common term for the representation of multidimensional info
What are the 2 primary computing models that have shaped the collection of big data?
distributed computing and virtualization
data preparation (data mining)
gather and organize the data in the correct formats and structures for analysis
exploratory data analysis
identifies patterns in data including outliers, uncovering the underlying structure to understand relationships between the variables
cluster analysis
technique used to divide an info set into mutually exclusive groups such that the members of each group are as close to one another as possible and the different groups are as far apart as possible (ex. target marketing based on zip codes)
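One common clustering algorithm (k-means, a standard technique though not named in the notes) can be sketched in a few lines for one-dimensional data; the function name and sample points below are hypothetical:

```python
def kmeans_1d(points, k=2, iters=10):
    """Tiny 1-D k-means sketch: repeatedly assign each point to the
    nearest centroid, then move each centroid to the mean of its group."""
    centroids = sorted(points)[:k]  # naive initialization: first k sorted values
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans_1d([1, 2, 3, 100, 101, 102], k=2)
```

After a few iterations the two groups separate cleanly, matching the definition: members within a group are close together, and the groups themselves are far apart.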
Data mining process model
1. Business Understanding 2. Data Understanding 3. Data Preparation 4. Data Modeling 5. Evaluation 6. Deployment
What does insights extracted from data profiling do?
Can determine how easy or difficult it will be to use existing data for other purposes along with providing metrics on data quality
what are the 3 elements of data mining?
Data, discovery and deployment
Regression model
a statistical process for estimating the relationships among variables; includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables (ex. predict the winners of a marathon based on gender and weight)
Optimization model
a statistical process that finds the way to make a design, system, or decision as effective as possible (ex. choose a combo of projects to max. overall earnings)
Business understanding (data mining)
gain a clear understanding of the business problem that must be solved and how it impacts the company
Recommendation engine
data mining algorithm that analyzes a customer's purchases and actions on a website and then uses the data to recommend complementary products
outlier
data value that is numerically distant from most of the other data points in a set of data; often identified by anomaly detection
deployment (data mining)
deploy the discoveries to the organization for work in everyday business
data visualization
describes technologies that allow users to see or visualize data to transform info into a business perspective
correlation analysis
determines a statistical relationship between variables, often for the purpose of identifying predictive factors among the variables
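The standard statistic for this is the Pearson correlation coefficient (a general technique, not one the notes name); a minimal pure-Python sketch, with a hypothetical function name:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples:
    covariance of x and y divided by the product of their spreads.
    Ranges from -1 (perfect inverse) through 0 (none) to +1 (perfect)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

A variable with a correlation near +1 or -1 against the outcome of interest is a candidate predictive factor; one near 0 is not.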
estimation analysis
determines values for an unknown continuous variable's behavior or estimates a future value; one of the least expensive modeling techniques
what does variety of big data mean?
different forms of structured/unstructured data, data from spreadsheets and databases as well as from email, videos, photos and PDFs, all of which must be analyzed
data scientist
extracts knowledge from data by performing statistical analysis, data mining and advanced analytics on big data to identify trends, market changes and other relevant info
big data includes data sources that include
extremely large volumes of data, with high velocity, wide variety and an understanding of the data veracity
Data
foundation for data-directed decision making
Data mining can determine relationships among...
internal factors (such as price, product positioning or staff skills) and external factors (economic indicators, competition, and customer demographics)
algorithms
mathematical formulas placed in software that performs an analysis on a data set
With distributed computing individual computers are...
networked together across geographical areas and work together to execute a workload or computing processes as if they were one single computing environment
estimation models predict
numeric outcomes based on historical data
analysis paralysis
occurs when the user goes into an emotional state of over-analyzing a situation so that a decision or action is never taken, in effect paralyzing the outcome
market basket analysis
one of the most common forms of association detection analysis; evaluates such items as websites and checkout scanner info to detect customers' buying behavior and predict future behavior by identifying affinities among customers' choices of products and services
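The two basic market-basket measures, support (how often items appear together) and confidence (how often the consequent follows the antecedent), can be sketched directly; the function names and sample baskets below are hypothetical:

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for b in baskets if items <= set(b)) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Estimated P(consequent | antecedent): support of both item sets
    together divided by support of the antecedent alone."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

baskets = [["bread", "butter"], ["bread", "milk"],
           ["bread", "butter", "milk"], ["milk"]]
conf = confidence(baskets, ["bread"], ["butter"])
```

A rule like "bread → butter" with high support and high confidence is exactly the kind of affinity a retailer uses to predict future buying behavior.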
3 common data mining techniques for predictions
optimization, forecasting, and regression models
infographics
present the results of data analysis, displaying the patterns, relationships, and trends in a graphical format
discovery
process of identifying new patterns, trends, and insights
deployment
process of implementing discoveries to drive success
Distributed computing
processes and manages algorithms across many machines in a computing environment
Data mining allows users to
recycle their work to become more efficient and effective at solving future problems
affinity grouping analysis
reveals the relationship between variables along with the nature and frequency of the relationship; creates rules to determine the likelihood of events occurring together at a particular time or following each other in a logical progression
data visualization tools
use sophisticated analysis techniques such as controls, instruments, maps, and time-series graphs
prediction
statement about what will happen or might happen in the future
what does velocity of big data mean?
the analysis of streaming data as it travels around the Internet; requires analyzing social media messages as they spread globally
pattern recognition analysis
the classification or labeling of an identified pattern in the machine learning process
virtualization
the creation of a virtual (rather than actual) version of computing resources, such as an operating system, a server, a storage device, or network resources
data mining
the process of analyzing data to extract info not offered by the raw data alone
speech analysis
the process of analyzing recorded calls to gather info; heavily used in customer service
Data profiling
the process of collecting statistics and info about data in an existing source
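For a single column, profiling typically means counting rows, missing values, and distinct values and finding the range; a minimal sketch with a hypothetical function name:

```python
def profile_column(values):
    """Collect simple profiling statistics for one column of data:
    row count, missing-value count, distinct values, min, and max.
    `None` entries are treated as missing."""
    present = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present),
        "max": max(present),
    }

stats = profile_column([3, 1, None, 3])
```

Metrics like the missing-value count are exactly how profiling surfaces the data-quality issues that determine whether existing data can be reused for other purposes.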
anomaly detection
the process of identifying rare or unexpected items or events in a data set that do not conform to other items in the data set
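A common, simple anomaly-detection rule (z-scores, a standard statistical approach not specifically named in the notes) flags values far from the mean in standard-deviation units; the function name and sample values are hypothetical:

```python
def zscore_outliers(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean.
    Uses the population standard deviation of the sample given."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [v for v in values if abs(v - mean) / std > threshold]

anomalies = zscore_outliers([10, 11, 9, 10, 12, 11, 10, 50])
```

Values the rule flags are the outliers of the previous definition: items that do not conform to the rest of the data set.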
classification analysis
the process of organizing data into categories or groups for its most effective and efficient use
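A tiny nearest-centroid classifier (one illustrative technique among many) makes the contrast with cluster analysis concrete: the categories exist before the analysis begins, and each new value is simply assigned to the closest one. The function name and category centroids below are hypothetical:

```python
def nearest_centroid_classify(point, centroids):
    """Assign a 1-D point to the predefined class whose centroid is
    closest. `centroids` maps class label -> representative value."""
    return min(centroids, key=lambda label: abs(point - centroids[label]))

# predefined classes (unlike clustering, these exist before analysis)
classes = {"low": 2.0, "high": 100.0}
label = nearest_centroid_classify(5, classes)
```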
data replication
the process of sharing info to ensure consistency between multiple data sources
what does volume of big data mean?
the scale of data; includes enormous volumes of data generated daily; massive volume created by machines and networks; big data tools necessary to analyze zettabytes and brontobytes
analytics
the science of fact-based decision making; uses software-based algorithms and statistics to derive meaning from data
what does veracity of big data mean?
the uncertainty of data, including biases, noise and abnormalities; uncertainty or untrustworthiness of data; data must be meaningful to the problem being analyzed; must keep data clean and implement processes to keep dirty data from accumulating in systems
Why do companies use data mining techniques?
to compile a complete picture of their operations, all within a single view, allowing them to identify trends and improve forecasts
business intelligence dashboards
track corporate metrics such as critical success factors and key performance indicators and include advanced capabilities such as interactive controls, allowing users to manipulate data for analysis
behavioral analysis
using data about people's behaviors to understand intent and predict future actions
data mining tools
variety of techniques to find patterns and relationships in large volumes of info that predict future behavior and guide decision making
What are the 4 common characteristics of big data?
variety, veracity, volume and velocity