Analytics
coefficient of determination
A statistical measure of goodness of fit; a measure of how closely a regression line fits the data on which it is based.
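A minimal sketch of how this measure is commonly computed, assuming the usual definition R^2 = 1 - SS_res / SS_tot. The function name and the sample data are hypothetical, chosen only for illustration.

```python
def r_squared(actual, predicted):
    """Return 1 - SS_res / SS_tot for paired observed and predicted values."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: variation the fitted line fails to explain.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: variation of the data around its own mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical data: observed values and the values a fitted line predicts.
observed = [2.0, 4.0, 6.0, 8.0]
fitted = [2.5, 3.5, 6.5, 7.5]
print(r_squared(observed, fitted))
```

A value near 1 means the line fits the data closely; a value near 0 means it explains little of the variation.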
inference engine
Part of the expert system that seeks information and relationships from the knowledge base and provides answers, predictions, and suggestions similar to the way a human expert would. (Interprets & evaluates facts to derive new rules based on its "experience", thereby expanding its knowledge.)
Understand data flow reporting
Source Systems (ERP; Data Warehouses; Databases; Flat files; Legacy systems; Web services; Sensor data); Semantic Layer (optional) (example: SAP Business Objects Semantic Layer); Authoring (Report design tool; Report authoring tool); Report Server (Report deployment); End-User (Web; Mobile; Print)
Know the difference between structured & unstructured data
Structured: stored in databases (numeric; character). Unstructured: stored as files (text; email; docs; comments; photos; diagrams; voice; audio; video)
Know the elements of a report
Text; Subreports; Geometric Shapes; Images; Charts; Maps
Overfitting
The process of fitting a model too closely to the training data for the model to be effective on other data.
neural networks (aka artificial neural networks) ANN
a type of machine learning modeled on biological neural networks, such as the human (or animal) brain
estimation models
attempt to approximate or otherwise determine outcomes based on multiple parameters & known relationships expressed as mathematical algorithms or parametric equations, in other words, equations that express a set of quantities as functions of independent variables
Naive Bayes Classifier
based on Bayes' theorem from statistics. focuses on conditional probability, which is the probability that an event A will occur given that another event B has occurred
negative feedback mechanism
one in which the feedback counteracts the change; called negative because it occurs in the direction opposite to the change
Cockpit
business dashboards are sometimes configured in related clusters to form this
feedback loop
control mechanism that is employed in economics, the sciences, & engineering to bring actual outcomes into alignment w/ desired outcomes (Input: desired results; System: business process; Output: actual results)
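A minimal sketch of such a loop, assuming a simple proportional correction: each step measures the gap between the desired and actual result and closes a fixed fraction of it. The function name, the `gain` parameter, and the numbers are hypothetical.

```python
def run_feedback_loop(target, actual, gain=0.5, steps=10):
    """Repeatedly nudge the actual outcome toward the desired outcome."""
    history = [actual]
    for _ in range(steps):
        error = target - actual   # compare desired result with actual result
        actual += gain * error    # correct the system by a fraction of the gap
        history.append(actual)
    return history

# Hypothetical case: desired output of 100 units, current output of 60.
history = run_feedback_loop(target=100.0, actual=60.0)
print(history[:4], history[-1])
```

Because the correction always opposes the gap, the actual result converges on the desired one.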
Validation Partition
Used to assess the predictive performance of each model so that you can compare models and choose the best one
What are the V's of big data
Volume; Variety; Velocity; Variability; Veracity; Volatility; Value
Decision cycle
data; analysis; insight(s); decision; action; outcome; assessment; improvement
Big data
describes the explosion of data generation, storage, & usage since the beginning of the 21st century
Association analysis
determines affinity or relationships among different variables w/ in the dataset
Understand what data mining is & why it is important
discovery & retrieval of useful data; these discoveries frequently are applied to new datasets & are utilized to predict future trends in order to implement effective strategies. Helps discover patterns, relationships & trends that are not evident using techniques learned thus far
Elements
dropdown menus: also called dropdown lists; combo boxes; accordions; list boxes; check boxes; dials & sliders; radio buttons; toggle buttons; slicers
node (different types)
each decision point w/ in the tree
Forecast
estimation of the value of a variable in the future
Fully manual decision cycle
every stage of the cycle is assessed, evaluated, & moved forward by humans
know the challenges to optimization
external events; unexpected events; competition; changing market forces; bad data; wrong decisions & actions
Deployment
final step in the authoring process
Clustering
group together similar data values based on chosen characteristic or attributes
Breaking
grouping; creating a break after each group
Recommendation engine
identify potential add-on sales for customers
Cluster size
indicates the number of members w/ in the cluster; the larger the circle, the more members w/ in it
control system
loop of constant feedback to adjust the system to achieve desired goals
simple linear regression
mathematical model that creates an arithmetic equation to explain the relationship between variables
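A minimal sketch of fitting such an equation, y = slope * x + intercept, by ordinary least squares. The function name and data are hypothetical; the card itself does not name a fitting method, so least squares is an assumption here.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```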
Conditional probability
measures the probability of an event given that another event has occurred
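A minimal sketch of the calculation, P(A | B) = P(A and B) / P(B), done by counting records. The purchase records and the function name are hypothetical.

```python
from fractions import Fraction

# Hypothetical records: (bought_printer, bought_ink) for each customer.
records = [
    (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False),
]

def probability_ink_given_printer(records):
    """P(ink | printer): among printer buyers, the share who also bought ink."""
    printer_buyers = [r for r in records if r[0]]
    bought_both = [r for r in printer_buyers if r[1]]
    return Fraction(len(bought_both), len(printer_buyers))

print(probability_ink_given_printer(records))
```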
Define: Classification
model that classifies a categorical target variable based on a set of independent variables that may be numerical or categorical. (a classifier places a target into appropriate categories)
Deterministic system
most predictable; w/ in the system, given the present state (the complete description of all system attributes at the present time), we can predict, at least in theory, all future states w/ full certainty
Multivariate time series
multiple variables change over time, & we want to model the interactions among them. (ex. measure temperature & carbon dioxide concentration [two variables] over the earth's history [time])
unstructured data
non-numeric information that is typically formatted in a way that is meant for human eyes and not easily understood by computers
positive feedback mechanism
one in which the feedback loop adds to the input; called positive because the feedback occurs in the same direction as the input
Understand clustering
one popular algorithm for identifying clusters is k-means, where k represents the number of clusters, or groupings
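A minimal sketch of k-means on 2-D points, assuming squared Euclidean distance and caller-supplied starting centroids (so k is the number of starting centroids). The function name and data are hypothetical, and real implementations add smarter initialization and a convergence check.

```python
def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for px, py in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (px - centroids[i][0]) ** 2 + (py - centroids[i][1]) ** 2,
            )
            clusters[nearest].append((px, py))
        # Move each centroid to the mean of its members (keep it if it has none).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical data with two obvious groups, so k = 2.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(centroids)
```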
Semantic Layer
optional intermediate layer between corporate data (from an informational system or from transactional system) and the authoring tool. Use to consolidate multiple data sources into a single source for the report authoring tool
classification models
or classifiers; used to classify or categorize data, entities, & events to identify patterns that explain how different predictor variables in a model contribute to an outcome
Chart of components
or selectors are standard user interface mechanisms. The availability of all of these components gives the user the ability to choose what is displayed in the dashboard & how it is displayed
optimization
overall goal of a cycle
cycle
pattern that displays highs & lows outside or in addition to the seasonal highs & lows; unlike seasonality, the length of it does not need to be constant
Real-time analysis
performed on a continuous basis; with results gained in time to alter the run-time system
Define: Data mining model
statistical technique that is chosen to find trends, patterns & relationships w/ in existing data, staged in the 1st phase of the process
expert system
subset of AI. These systems are rules-based & used to make decisions, particularly routine decisions, both w/ and w/o human help. usually specific to a knowledge domain. system continues to learn as it is employed
Machine Learning
supervised data models are also known as this; ideal for large, complex problems, & it is at the heart of artificial intelligence (the extraction of knowledge from data based on algorithms created from training data)
time series analysis
technique that analysts use to: uncover any implicit structure (patterns or trends) in the data; model that structure to make forecasts
trend
tendency of the mean of the data to increase, decrease, or stay the same over time
prediction interval
the interval w/ in which we can forecast that the variable value will fall w/ a certain probability
Define: Data mining
the process of analyzing large amounts of data to discover patterns, relationships, & trends that cannot be discovered through slicing & dicing techniques
what is the role of data analysis in the decision cycle
analysts assume significant responsibility for the accuracy & reliability of the analytical results. Risks to watch for: analysis paralysis; bias in analytics (in the data collection, analysis, insight, outcome, assessment, & improvement phases). Admit there are biases
Display Only Dashboards
those that do not allow for user interaction; ex. "display only"
market basket analysis
to cross-sell & upsell products or services to customers
Exponential Smoothing
technique used to decompose a time series into its components (trends, seasonality, cycles, & randomness)
decision tree
tree-like diagram of rules that consist of decisions & their outcomes. 2 types: classification trees & regression trees
Support Vector Machines (SVM)
type of machine learning; can be used for both estimation & classification
Define: Predictive models
use past data to analyze & discover trends, patterns, & relationships w/ the goal of applying results to future data to make predictions
Test Partition
used to assess the performance of the chosen model with new data; does not impact the training, selection or validation of the model
Training Partition
used to train the model based on existing data
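A minimal sketch of splitting a dataset into training, validation, and test partitions. The 60/20/20 proportions, the fixed seed, and the function name are assumptions for illustration, not values from the textbook.

```python
import random

def partition(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle the rows, then cut them into training, validation, and test partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train)
    n_validation = round(len(shuffled) * validation)
    training = shuffled[:n_train]
    validating = shuffled[n_train:n_train + n_validation]
    testing = shuffled[n_train + n_validation:]  # whatever remains is held out
    return training, validating, testing

# Hypothetical dataset of ten row ids.
training, validating, testing = partition(range(10))
print(len(training), len(validating), len(testing))
```

The test partition is set aside untouched until a model has been chosen, which is what keeps its performance estimate honest.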
partly automated decision cycle
utilize both human actions & computer automation
machine learning
utilizes AI, data mining, & statistics; along w/ algorithms that can learn from data to make predictions
fully automated decision cycle
completed entirely by computers in all stages, from data acquisition to decisions to improvements. no human assistance or intervention is needed
Genetic Algorithms (GAs)
computer scientists who work in artificial intelligence or in machine learning have devised methods to mimic evolution in their programs
residual
also called irregular data; random, unpredictable part of the dataset; cannot be avoided in real-world scenarios
Unsupervised data mining
also descriptive model; exploratory in nature
Chaotic system
also deterministic, though they are highly sensitive to slight fluctuations in input conditions
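A minimal sketch of that sensitivity, using the logistic map x -> r * x * (1 - x) with r = 4 as a stand-in chaotic system. The choice of system, the starting values, and the function name are assumptions; the card names no specific example.

```python
def logistic_map(x0, r=4.0, steps=60):
    """Iterate a fully deterministic rule: the same start always gives the same path."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

# Two starting states that differ by only one part in a billion.
a = logistic_map(0.2)
b = logistic_map(0.2 + 1e-9)
gaps = [abs(x - y) for x, y in zip(a, b)]
print(gaps[1], max(gaps))
```

The gap after one step is still tiny, but it grows with each iteration until the two paths bear no resemblance to each other.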
understand technologies that automate data analysis
AI; expert systems; machine learning
structured data
Data that (1) are typically numeric or categorical; (2) can be organized and formatted in a way that is easy for computers to read, organize, and understand; and (3) can be inserted into a database in a seamless fashion.
Steps in data mining process
Data Staging (acquisition, harmonization); Data Mining Model (choosing model); Validation (model validation); Deployment (model in transactions); Monitoring (evaluating the model & retraining when necessary)
Various predictive data models
Estimation: simple linear regression. Classification: Naive Bayes; K-nearest neighbors (KNN); Logistic regression; Decision trees; Neural networks; Genetic algorithms; Support vector machines (SVM)
Which models are applicable for which type of predictive scenarios?
Estimation: used to predict a specific value of a numeric dependent, or target, variable; Classification: uses existing data to train the model to predict a categorical variable
Know the examples of reports mentioned in the textbook
Expense reports; monthly statements; crime report for a police department; airline delay reports; account statement; paychecks
Identify analytical techniques used to develop forecasting models
Exponential Smoothing: single, effective for data w/ a purely random component (no trend or seasonality); double, for data that exhibit trends but no seasonality; triple, provides a means for decomposing data that have both trend & seasonality
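A minimal sketch of single exponential smoothing only, assuming the common recurrence s_t = alpha * x_t + (1 - alpha) * s_(t-1) seeded with the first observation. The function name, `alpha` value, and data are hypothetical; the double and triple variants add trend and seasonal terms not shown here.

```python
def single_exponential_smoothing(series, alpha):
    """Blend each new observation with the previous smoothed value."""
    smoothed = [series[0]]  # seed with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical demand figures with no trend or seasonality.
demand = [10.0, 12.0, 11.0, 13.0]
print(single_exponential_smoothing(demand, alpha=0.5))
```

A higher `alpha` weights recent observations more heavily; a lower one smooths out more of the random component.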
Know the steps for authoring a report
Identifying the needs of the report user; Identifying data sources; Building the layout for readability; Binding analytical components to data sources; Report structure; Adding prompts for end-users; Deployment
analysis paralysis
Refers to over-analyzing a situation so that a decision or action is severely delayed or never taken, in effect paralyzing the process of deciding
Subreport
Report contained within another report
Interactive dashboards
allows users to choose inputs that modify the visualization to meet their specific needs
Stochastic system
also called a non-deterministic system; does not have deterministic laws or rules. modeled using random variables, sometimes expressed as a range of values
logistic regression
classification model in which the dependent, or predicted variable, is categorical; that is, there are groupings into which a case is classified
K-nearest neighbor (K-NN)
classification; based on a simple observation that a case is most likely to be similar to its nearest neighbors
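A minimal sketch of the idea: find the k closest training cases and let them vote on the label. Squared Euclidean distance, k = 3, the function name, and the data are all assumptions for illustration.

```python
from collections import Counter

def knn_classify(training, query, k=3):
    """Label the query with the majority label among its k nearest training cases."""
    by_distance = sorted(
        training,
        key=lambda case: sum((a - b) ** 2 for a, b in zip(case[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training cases: (features, label).
training_cases = [
    ((1, 1), "low"), ((1, 2), "low"), ((2, 1), "low"),
    ((8, 8), "high"), ((9, 8), "high"), ((8, 9), "high"),
]
print(knn_classify(training_cases, (2, 2)), knn_classify(training_cases, (7, 7)))
```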
Identify unsupervised machine learning models used in descriptive data mining
clustering; association analysis
Affinity
co-occurrence of items
Binding
process by which a data component is linked to the data source from which its data will be extracted for display
Regression
process of estimating or defining the relationship between & among variables & developing a model of cause & effect. It answers the question of which variable(s) affect another variable & in what way
Transaction
referring to a collection of items that occur together in an identifiable event
seasonality
refers to a pattern of regular periodic fluctuations in the data over time
Association rules
relationships that are identified by associational analysis; stronger rules imply greater association
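A minimal sketch of scoring a rule's strength, assuming the two measures commonly used for this purpose, support and confidence (these names are not on the card). The transactions and function names are hypothetical.

```python
# Hypothetical transactions: the set of items bought together in one event.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Share of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# Rule under test: customers who buy bread also buy milk.
print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```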
Cluster density
represented by color intensity
confusion matrix
results of the validation dataset are presented as this, which is used to evaluate the performance of the classification model
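A minimal sketch of building one by counting (actual, predicted) pairs. The labels, the data, and the function name are hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each actual class was predicted as each class."""
    return Counter(zip(actual, predicted))

# Hypothetical validation results for a yes/no classifier.
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]

matrix = confusion_matrix(actual, predicted)
# Correct predictions are the pairs where actual and predicted agree.
accuracy = sum(n for (a, p), n in matrix.items() if a == p) / len(actual)
print(dict(matrix), accuracy)
```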
Anticipatory models
seek to determine what might happen in the future in order to inform business decisions. uses real-time data, must be monitored continuously
Time series
sequence of values of an attribute, or variable, measured at equidistant intervals of time
Univariate time series
single variable (ex. interest rate, global temperature, inventory stock value, & population)
Cluster distance
space between them; indicates how dissimilar customers in one are from customers in another one
Scorecards (balanced)
specialized types of dashboards used to display key performance indicators (KPIs) that are linked to corporate objectives & goals. Effective at creating goal congruence by using both financial & non-financial measures at various levels w/ in the organization