Analytics

coefficient of determination

A statistical measure of goodness of fit; a measure of how closely a regression line fits the data on which it is based.
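
As a rough illustration, the coefficient of determination (commonly written R²) can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the sample numbers below are made up for this sketch.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Variation left unexplained by the model's predictions
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation of the data around its mean
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [2.0, 4.0, 6.0, 8.0]
print(r_squared(actual, [2.0, 4.0, 6.0, 8.0]))  # 1.0 (perfect fit)
print(r_squared(actual, [5.0, 5.0, 5.0, 5.0]))  # 0.0 (no better than the mean)
```

A value near 1 means the line fits the data closely; a value near 0 means it explains little more than the mean would.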

inference engine

Part of the expert system that seeks information and relationships from the knowledge base and provides answers, predictions, and suggestions similar to the way a human expert would. (It interprets & evaluates facts to derive new rules based on its "experience", thereby expanding its knowledge.)

Understand data flow reporting

Source Systems (ERP; Data Warehouses; Databases; Flat files; Legacy systems; Web services; Sensor data); Semantic Layer (optional) (example: SAP Business Objects Semantic Layer); Authoring (Report design tool; Report authoring tool); Report Server (Report deployment); End-User (Web; Mobile; Print)

Know the difference between structured & unstructured data

Structured: stored in databases (numeric; character). Unstructured: stored as files (text; email; docs; comments; photos; diagrams; voice; audio; video)

Know the elements of a report

Text; Subreports; Geometric Shapes; Images; Charts; Maps

Overfitting

The process of fitting a model too closely to the training data for the model to be effective on other data.

neural networks (aka artificial neural networks) ANN

a type of machine learning based on biological neural networks, such as the human (or animal) brain

estimation models

attempt to approximate or otherwise determine outcomes based on multiple parameters & known relationships expressed as mathematical algorithms or parametric equations; in other words, equations that express a set of quantities as functions of independent variables

Naive Bayes Classifier

based on Bayes' theorem from statistics. focuses on conditional probability, which is the probability that an event A will occur given that another event B has occurred
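
A minimal sketch of the conditional probability calculation behind Bayes' theorem, P(A|B) = P(B|A) · P(A) / P(B). The spam-filter scenario and all probabilities are invented for illustration, and `bayes_posterior` is a hypothetical helper name.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Assumed numbers: 20% of emails are spam; the word "free" appears
# in 50% of spam emails and in 5% of non-spam emails.
p_spam = 0.2
p_word_given_spam = 0.5
p_word_given_ham = 0.05

# Total probability of seeing the word in any email
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

p_spam_given_word = bayes_posterior(p_word_given_spam, p_spam, p_word)
print(round(p_spam_given_word, 3))  # 0.714
```

A Naive Bayes classifier applies this idea to several predictors at once, under the "naive" assumption that the predictors are independent given the class.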

negative feedback mechanism

feedback that counteracts a change in the system; called negative because it occurs in the direction opposite to the change

Cockpit

business dashboards are sometimes configured in related clusters to form this

feedback loop

control mechanism that is employed in economics, the sciences, & engineering to bring actual outcomes into alignment w/ desired outcomes (Input: desired results System: business process Output: actual result)
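
One way to picture this is a loop that repeatedly compares the actual result with the desired result and corrects part of the gap. The sketch below is only illustrative: the `gain` parameter (the fraction of the gap corrected each step) and the numbers are assumptions, not something the definition above specifies.

```python
def run_feedback_loop(desired, actual, gain=0.5, steps=5):
    """Nudge the actual outcome toward the desired outcome each step."""
    history = [actual]
    for _ in range(steps):
        error = desired - actual   # compare output with the desired result
        actual += gain * error     # feed a correction back into the system
        history.append(actual)
    return history


print(run_feedback_loop(desired=100, actual=60))
# [60, 80.0, 90.0, 95.0, 97.5, 98.75]
```

Because each correction opposes the remaining gap, this is an example of negative feedback: the actual outcome converges toward the desired one.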

Validation Partition

Used to assess the predictive performance of each model so that you can compare models and choose the best one

What are the V's of big data

Volume; Variety; Velocity; Variability; Veracity; Volatility; Value

Decision cycle

data; analysis; insight(s); decision; action; outcome; assessment; improvement

Big data

describes the explosion of data generation, storage, & usage since the beginning of the 21st century

Association analysis

determines affinity or relationships among different variables w/ in the dataset

Understand what data mining is & why it is important

discovery & retrieval of useful data; these discoveries frequently are applied to new datasets & are utilized to predict future trends in order to implement effective strategies. Helps discover patterns, relationships & trends that are not evident using techniques learned thus far

Elements

dropdown menus: also called dropdown lists; combo boxes; accordions; list boxes; check boxes; dials & sliders; radio buttons; toggle buttons; slicers

node (different types)

each decision point w/ in the tree

Forecast

estimation of the value of a variable in the future

Fully manual decision cycle

every stage of the cycle is assessed, evaluated, & moved forward by humans

know the challenges to optimization

external events; unexpected events; competition; changing market forces; bad data; wrong decisions & actions

Deployment

final step in the authoring process

Clustering

groups together similar data values based on chosen characteristics or attributes

Breaking

grouping; creating a break after each group

Recommendation engine

identify potential add-on sales for customers

Cluster size

indicates the number of members w/ in the cluster; the larger the circle, the more members w/ in it

control system

loop of constant feedback to adjust the system to achieve desired goals

simple linear regression

mathematical model that creates an arithmetic equation to explain the relationship between variables
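
A minimal sketch of fitting such an equation, y = intercept + slope · x, by ordinary least squares. The function name `fit_line` and the data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)        # 2.0 1.0
print(intercept + slope * 5)   # 11.0 (estimate for x = 5)
```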

Conditional probability

measures the probability of an event given that another event has occurred

Define: Classification

model that classifies a categorical target variable based on a set of independent variables that may be numerical or categorical. (a classifier places a target into appropriate categories)

Deterministic system

most predictable; w/ in the system, given the present state (the complete description of all system attributes at the present time), we can predict, at least in theory, all future states w/ full certainty

Multivariate time series

multiple variables change over time, & we want to model the interactions among them. (ex. measure temperature & carbon dioxide concentration [two variables] over the earth's history [time])

unstructured data

non-numeric information that is typically formatted in a way that is meant for human eyes and not easily understood by computers

positive feedback mechanism

one in which the feedback loop adds to the input; called positive because the feedback occurs in the same direction as the input

Understand clustering

one popular algorithm for identifying clusters is k-means; k represents the number of clusters, or groupings
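
A simplified one-dimensional sketch of the k-means idea: assign each value to its nearest centroid, then move each centroid to the mean of its members, and repeat. Starting centroids are passed in explicitly so the run is repeatable; real implementations usually choose them randomly. All names and data are illustrative.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster numbers around k centroids (k = len(centroids))."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: each value joins its nearest centroid
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


centroids, clusters = k_means_1d([1, 2, 3, 10, 11, 12], centroids=[1, 12])
print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1, 2, 3], [10, 11, 12]]
```

Here k = 2 because two starting centroids were supplied.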

Semantic Layer

optional intermediate layer between corporate data (from an informational system or from transactional system) and the authoring tool. Use to consolidate multiple data sources into a single source for the report authoring tool

classification models

or classifiers; used to classify or categorize data, entities, & events to identify patterns that explain how different predictor variables in a model contribute to an outcome

Chart of components

or selectors are standard user interface mechanisms. The availability of all of these components gives the user the ability to choose what is displayed in the dashboard & how it is displayed

optimization

overall goal of a cycle

cycle

pattern that displays highs & lows outside or in addition to the seasonal highs & lows; unlike seasonality, its length does not need to be constant

Real-time analysis

performed on a continuous basis; with results gained in time to alter the run-time system

Define: Data mining model

statistical technique that is chosen to find trends, patterns & relationships w/ in existing data, staged in the 1st phase of the process

expert system

subset of AI. These systems are rules-based & used to make decisions, particularly routine decisions, both w/ and w/o human help. usually specific to a knowledge domain. system continues to learn as it is employed

Machine Learning

supervised data models are also known as this; ideal for large, complex problems, & it is at the heart of artificial intelligence (the extraction of knowledge from data based on algorithms created from training data)

time series analysis

technique that analysts use to: uncover any implicit structure (patterns or trends) in the data; model that structure to make forecasts

trend

tendency of the mean of the data to increase, decrease, or stay the same over time

prediction interval

the interval w/ in which we can forecast that the variable value will fall w/ a certain probability

Define: Data mining

the process of analyzing large amounts of data to discover patterns, relationships, & trends that cannot be discovered through slicing & dicing techniques

what is the role of data analysis in the decision cycle

data analysts assume significant responsibility for the accuracy & reliability of the analytical results. Pitfalls to watch for: analysis paralysis; bias in analytics, which can enter in the data collection, analysis, insight, outcome, assessment, & improvement phases. Analysts should admit that biases exist

Display Only Dashboards

those that do not allow for user interaction; ex. "display only"

market basket analysis

to cross-sell & upsell products or services to customers

Exponential Smoothing

to decompose a time series into its components (trends, seasonality, cycles, & randomness), use this.

decision tree

tree-like diagram of rules that consist of decisions & their outcomes. 2 types: classification trees & regression trees

Support Vector Machines (SVM)

type of machine learning; can be used for both estimation & classification

Define: Predictive models

use past data to analyze & discover trends, patterns, & relationships w/ the goal of applying results to future data to make predictions

Test Partition

used to assess the performance of the chosen model with new data; does not impact the training, selection or validation of the model

Training Partition

used to train the model based on existing data
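
A sketch of how a dataset might be divided into training, validation, and test partitions. The 60/20/20 proportions, the fixed seed, and the function name `partition` are assumptions for illustration; the source does not prescribe specific proportions.

```python
import random


def partition(records, train=0.6, validation=0.2, seed=0):
    """Shuffle records and split them into three partitions."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    training = shuffled[:n_train]                  # fit the model
    validating = shuffled[n_train:n_train + n_val] # compare candidate models
    testing = shuffled[n_train + n_val:]           # final check on unseen data
    return training, validating, testing


training, validating, testing = partition(range(10))
print(len(training), len(validating), len(testing))  # 6 2 2
```

Every record lands in exactly one partition, so the test partition never influences training or model selection.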

partly automated decision cycle

utilize both human actions & computer automation

machine learning

utilizes AI, data mining, & statistics; along w/ algorithms that can learn from data to make predictions

fully automated decision cycle

completed entirely by computers in all stages, from data acquisition to decisions to improvements. no human assistance or intervention is needed

Genetic Algorithms (GAs)

computer scientists who work in artificial intelligence or in machine learning have devised methods to mimic evolution in their programs

residual

also called irregular data; random, unpredictable part of the dataset; cannot be avoided in real-world scenarios

Unsupervised data mining

also descriptive model; exploratory in nature

Chaotic system

also deterministic, though they are highly sensitive to slight fluctuations in input conditions

understand technologies that automate data analysis

AI; expert systems; machine learning

structured data

Data that (1) are typically numeric or categorical; (2) can be organized and formatted in a way that is easy for computers to read, organize, and understand; and (3) can be inserted into a database in a seamless fashion.

Steps in data mining process

Data Staging (acquisition, harmonization); Data Mining Model (choosing model); Validation (model validation); Deployment (model in transactions); Monitoring (evaluating the model & retraining when necessary)

Various predictive data models

Estimation: simple linear regression. Classification: Naive Bayes; K-nearest neighbors (KNN); Logistic regression; Decision trees; Neural networks; Genetic algorithms; Support vector machines (SVM)

Which models are applicable for which type of predictive scenarios?

Estimation: used to predict a specific value of a numeric dependent, or target, variable; Classification: uses existing data to train the model to predict a categorical variable

Know the examples of reports mentioned in the textbook

Expense reports; monthly statements; crime report for a police department; airline delay reports; account statement; paychecks

Identify analytical techniques used to develop forecasting models

Exponential Smoothing: single, is effective for data w/ purely random component; no trend or seasonality. double, for data that exhibit trends but no seasonality. triple, provides a means for decomposing data that have both trend & seasonality
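
A minimal sketch of single exponential smoothing, the simplest of the three variants. Each smoothed value is a weighted average of the newest observation and the previous smoothed value. The smoothing constant `alpha` and the choice to start from the first observation are common conventions assumed here, not details given in the source.

```python
def single_exponential_smoothing(series, alpha):
    """Smooth a series with no trend or seasonality (0 < alpha <= 1)."""
    smoothed = [series[0]]  # assumed starting point: the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


smoothed = single_exponential_smoothing([10, 12, 11, 13], alpha=0.5)
print(smoothed)      # [10, 11.0, 11.0, 12.0]
print(smoothed[-1])  # 12.0, used as the forecast for the next period
```

Double and triple exponential smoothing extend this by adding separate smoothed terms for trend and for seasonality.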

Know the steps for authoring a report

Identifying the needs of the report user; Identifying data sources; Building the layout for readability; Binding analytical components to data sources; Report structure; Adding prompts for end-users; Deployment

analysis paralysis

Refers to over-analyzing a situation so that a decision or action is severely delayed or never taken, in effect paralyzing the process of deciding

Subreport

Report contained within another report

Interactive dashboards

allows users to choose inputs that modify the visualization to meet their specific needs

Stochastic system

also called non-deterministic systems; they do not have deterministic laws or rules. they are modeled using random variables, sometimes expressed as a range of values

logistic regression

classification model in which the dependent, or predicted variable, is categorical; that is, there are groupings into which a case is classified

K-nearest neighbor (K-NN)

classification; based on a simple observation that a case is most likely to be similar to its nearest neighbors
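
A small sketch of the nearest-neighbor idea: find the k training cases closest to a new case and let them vote on its class. The function name, the Euclidean distance measure, and the toy data are illustrative assumptions.

```python
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """Label a query point by majority vote of its k nearest neighbors."""
    # Sort training cases by straight-line distance to the query
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


training = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
            ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_classify(training, (2, 2)))  # A
print(knn_classify(training, (8, 7)))  # B
```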

Identify unsupervised machine learning models used in descriptive data mining

clustering; association analysis

Affinity

co-occurrence of items

Binding

process by which a data component is linked to the data source from which its data will be extracted for display

Regression

process of estimating or defining the relationship between & among variables & developing a model of cause & effect. It answers the question of which variable(s) affect another variable & in what way

Transaction

referring to a collection of items that occur together in an identifiable event

seasonality

refers to a pattern of regular periodic fluctuations in the data over time

Association rules

relationships that are identified by associational analysis; stronger rules imply greater association
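
Rule strength is often quantified with support (how often the items occur together across transactions) and confidence (how often the consequent appears when the antecedent does). These two measures are standard in association analysis but are not named in the definition above, so treat this as a supplementary sketch with made-up transactions.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions with the antecedent, the share that also have the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


transactions = [{"bread", "butter"}, {"bread", "butter", "jam"},
                {"bread"}, {"milk"}]
print(support(transactions, {"bread", "butter"}))                 # 0.5
print(round(confidence(transactions, {"bread"}, {"butter"}), 3))  # 0.667
```

Here the rule "bread → butter" holds in two of the three transactions that contain bread.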

Cluster density

represented by color intensity

confusion matrix

results of the validation dataset are presented as this, which is used to evaluate the performance of the classification model
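
A minimal sketch of building a confusion matrix by counting each (actual, predicted) pair, then deriving accuracy from it. The labels and predictions are invented for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each actual class was predicted as each class."""
    return Counter(zip(actual, predicted))


actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]

matrix = confusion_matrix(actual, predicted)
print(matrix[("yes", "yes")])  # 2 (correctly predicted "yes")
print(matrix[("yes", "no")])   # 1 (actual "yes" predicted as "no")
print(matrix[("no", "yes")])   # 1 (actual "no" predicted as "yes")
print(matrix[("no", "no")])    # 2 (correctly predicted "no")

# Accuracy: share of cases on the matrix diagonal
correct = sum(n for (a, p), n in matrix.items() if a == p)
print(round(correct / len(actual), 3))  # 0.667
```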

Anticipatory models

seek to determine what might happen in the future in order to inform business decisions. uses real-time data, must be monitored continuously

Time series

sequence of values of an attribute, or variable, measured at equidistant intervals of time

Univariate time series

single variable (ex. interest rate, global temperature, inventory stock value, & population)

Cluster distance

space between them; indicates how dissimilar customers in one are from customers in another one

Scorecards (balanced)

specialized types of dashboards used to display key performance indicators (KPIs) that are linked to corporate objectives & goals. Effective at creating goal congruence by using both financial & non-financial measures at various levels w/ in the organization

