Intro to Data Analytics Chapter 4
Test partition
A second level of validation often done with a second set of historical data
What are other names for association rule?
Affinity Analysis and Market Basket Analysis
Conformed dimensions
Common dimensions shared by more than one table (usually several), with the same meaning wherever they are used
How do you read a lift chart?
The straight line represents no model being used. The curved line represents the application of the data model. The larger the distance between the lines, the better the model's performance.
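A minimal sketch of the two lines in a lift (cumulative gains) chart, assuming made-up scores and outcomes; the function name `cumulative_gains` and the sample data are illustrative only. Records are ranked by predicted score, and the running count of actual positives is compared with the straight-line baseline that no model would give.

```python
def cumulative_gains(scores, actuals):
    """Return (model_curve, baseline) as cumulative counts of positives."""
    # Rank records from the highest to the lowest predicted score.
    ranked = [a for _, a in sorted(zip(scores, actuals),
                                   key=lambda pair: pair[0], reverse=True)]
    total_positives = sum(ranked)
    n = len(ranked)

    # Curved line: positives captured as we work down the ranked list.
    model_curve, running = [], 0
    for actual in ranked:
        running += actual
        model_curve.append(running)

    # Straight line: with no model, positives turn up at the average rate.
    baseline = [total_positives * (i + 1) / n for i in range(n)]
    return model_curve, baseline


# Hypothetical predicted scores and actual outcomes (1 = positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
actuals = [1, 1, 0, 1, 0, 0, 1, 0]

model_curve, baseline = cumulative_gains(scores, actuals)
# The gap between the lines at each point; larger gaps suggest a better model.
gaps = [m - b for m, b in zip(model_curve, baseline)]
print(model_curve)
print(baseline)
print(gaps)
```

Both lines end at the same point (all positives found); the model only helps by finding them earlier.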
Where are the foreign keys in a star schema?
They are located in the fact table. Measures can also be called facts.
What is the lowest level of detail?
Transactional
What type of data supplies measures/facts in the fact table?
Transactional data
What combinations can you have in a confusion matrix?
True Positive (TP) = top left; False Positive (FP) = top right; False Negative (FN) = bottom left; True Negative (TN) = bottom right
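As a rough illustration, the four combinations can be counted from a list of actual and predicted labels. The function name `confusion_counts` and the sample labels are assumptions made for this sketch; the layout follows this card, with TP top left and TN bottom right.

```python
def confusion_counts(actual, predicted):
    """Return [[TP, FP], [FN, TN]] for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # predicted yes, was yes
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # predicted yes, was no
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # predicted no, was yes
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # predicted no, was no
    # Rows are the predicted class, columns are the actual class.
    return [[tp, fp], [fn, tn]]


# Hypothetical labels for eight records.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

matrix = confusion_counts(actual, predicted)
for row in matrix:
    print(row)
```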
How is data mining different from other analytics?
Uses mathematical modeling and algorithms; uses machine learning to learn patterns in the data; many data mining algorithms mimic the learning process of the human brain; answers "the unasked question" by uncovering hidden patterns in the data
One dimensional table connected to two others provides...
a hierarchy of detail consisting of three levels
In a confusion matrix, the cells on the diagonal from top left to bottom right are...
correct classifications
Association rule
Determines what goes with what based on past patterns of behavior. The results of the algorithm are a series of "If ... Then" rules. UNSUPERVISED
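One common way to score an "If ... Then" rule is by its support and confidence. The sketch below assumes a tiny made-up set of shopping baskets; the helper names `count`, `support`, and `confidence` are illustrative and not tied to any particular package.

```python
# Hypothetical market baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for basket in transactions if itemset <= basket)


def support(itemset):
    """Share of all transactions that contain the itemset."""
    return count(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """Of the baskets with the 'If' items, the share that also has the 'Then' items."""
    return count(antecedent | consequent) / count(antecedent)


# Rule: If bread, Then milk.
rule_support = support({"bread", "milk"})
rule_confidence = confidence({"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

No output variable is supplied here; the rule comes only from which items appear together, which is why the technique is unsupervised.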
What is another way to refer to "data detail"?
grain
In a confusion matrix, the cells on the diagonal from top right to bottom left are...
incorrect classifications
Supervised learning
involves input variables and a defined output variable
Unsupervised learning
involves input variables and no defined output variables
Clustering
Partitions a collection of things into segments of natural groupings, where each group shares similar characteristics. UNSUPERVISED
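A minimal sketch of clustering, using a one-dimensional k-means as one possible algorithm (the card does not name a specific method). The function name `k_means_1d`, the starting centers, and the values are all assumptions chosen for illustration.

```python
def k_means_1d(values, centers, iterations=10):
    """Group numbers around the nearest center, then move each center to its group mean."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign the value to whichever center is closest.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Recompute each center; keep the old one if its group is empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical values with two natural groupings.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = k_means_1d(values, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

The groups are found from the input values alone, with no labels provided.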
What data types are analyzed in data mining?
quantitative data (numeric) and qualitative data (nominal and ordinal)
What two categories always have a hierarchy of detail?
Territory (continent, country, city) and Date (year, month, week, day)
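To illustrate a date hierarchy, the sketch below rolls the same transactional rows up to month and then year level. The function name `roll_up` and the sample sales rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical transactional rows at the lowest level of detail (day).
sales = [
    ("2024-01-05", 100.0),
    ("2024-01-20", 50.0),
    ("2024-02-03", 75.0),
    ("2025-01-10", 20.0),
]


def roll_up(rows, level):
    """Total the amounts by date, keeping `level` parts: 1 = year, 2 = month, 3 = day."""
    totals = defaultdict(float)
    for date, amount in rows:
        key = "-".join(date.split("-")[:level])
        totals[key] += amount
    return dict(totals)


by_month = roll_up(sales, 2)
by_year = roll_up(sales, 1)
print(by_month)
print(by_year)
```

Moving from day to month to year loses detail but keeps the totals consistent, which is what a hierarchy of detail allows.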
When was the greenbar report popular?
The 1960s, mainframe era
What is the Cross-Industry Standard Process for Data Mining (CRISP-DM)?
Steps: business understanding, data understanding, data preparation, model building, test and evaluation, and deployment
What are some types of UNsupervised learning?
Association rules and Clustering
What is the most widely used performance management system?
Balanced scorecard
What are some examples of commonly used classification algorithms?
Bayes Classifier, Decision Trees, k-Nearest Neighbors, Neural networks
Lift charts/ROC charts
Evaluate data mining models by showing how effective they are at classifying and predicting.
How does the fact table connect to the dimensional tables?
Foreign keys in the fact table connect to primary keys in the dimensional tables
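A small sketch of this connection using Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, `dim_date`, `sales_amount`) and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: each has its own primary key.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table: holds the measure plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    sales_amount REAL
);

INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_date VALUES (10, 2024, 1), (11, 2024, 2);
INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 11, 150.0), (2, 10, 80.0);
""")

# Join the fact table's foreign key to the dimension's primary key.
rows = conn.execute("""
    SELECT p.product_name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_id = p.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(rows)
conn.close()
```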
What are some software packages used in data mining?
IBM SPSS Modeler, SAS Enterprise Miner, R, Dell Statistica, RapidMiner, XLMiner
Confusion matrix
Table used to evaluate the accuracy of a classification model (how effective is it at classifying data to match reality?)
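As a rough worked example, accuracy can be read off a confusion matrix by adding the correct cells and dividing by the total. The counts below are made up, and the layout assumes TP top left, FP top right, FN bottom left, and TN bottom right.

```python
# Hypothetical counts: rows are the predicted class, columns are the actual class.
matrix = [
    [3, 1],  # predicted positive: TP, FP
    [1, 3],  # predicted negative: FN, TN
]

total = sum(sum(row) for row in matrix)

# Top-left to bottom-right diagonal: correct classifications.
correct = matrix[0][0] + matrix[1][1]
# Top-right to bottom-left diagonal: incorrect classifications.
incorrect = matrix[0][1] + matrix[1][0]

accuracy = correct / total
error_rate = incorrect / total
print(accuracy, error_rate)
```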
What are some common mistakes made in data mining projects?
Incorrect identification of problem; not managing expectations about data mining; beginning without an end result in mind; not having access to proper necessary data; not properly preparing data; not drilling down into detail/authors
How do machine learning and the confusion matrix work together?
Some of the historical data is assigned to the training partition and another portion to the validation partition. The algorithm looks at the data in the training partition and tries to draw patterns from it. The model then makes guesses on the validation data, and the confusion matrix compares those guesses with the known outcomes to show how well the model learned.
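A minimal sketch of the training and validation partitions, assuming a deliberately simple one-variable model that learns a cutoff value. The names `fit_threshold` and `predict`, the 6/4 split, and the data are all illustrative assumptions, not a method from the chapter.

```python
def fit_threshold(rows):
    """Learn a cutoff halfway between the average value of each class."""
    positives = [x for x, label in rows if label == 1]
    negatives = [x for x, label in rows if label == 0]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def predict(threshold, x):
    """Guess class 1 when the value is at or above the learned cutoff."""
    return 1 if x >= threshold else 0


# Hypothetical historical data: (input value, known outcome).
history = [
    (1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1),
    (2.5, 0), (7.5, 1), (5.5, 0), (6.0, 1),
]

# Partition the history: the model only ever learns from the training rows.
training, validation = history[:6], history[6:]
threshold = fit_threshold(training)

# Check the model's guesses against the held-back validation outcomes.
guesses = [predict(threshold, x) for x, _ in validation]
hits = sum(1 for guess, (_, label) in zip(guesses, validation) if guess == label)
validation_accuracy = hits / len(validation)
print(threshold, guesses, validation_accuracy)
```

Because the validation rows were not used for learning, the score gives a fairer picture of how the model would handle new data.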
Where can data mining be used?
Medical/healthcare, pharmaceuticals, retail, banking/credit card (identify fraudulent activities)
Neural networks
Mimic the function of the brain and can be used to predict the weather, approve credit, and formulate target marketing
What three layers of information should all dashboards have?
Monitoring, analysis, and management
Data mining
Non-trivial process of identifying valid, novel, potentially useful and understandable patterns in the data found in STRUCTURED databases
What are some types of supervised learning?
Prediction and Classification
What are the six data mining tasks?
Prediction, classification, clustering, association, visualization, and time series forecasting
What are some other names for data mining?
Predictive analytics, knowledge extraction, pattern analysis, data archaeology, and information harvesting
What Microsoft tool is used for building an OLAP cube?
SQL Server Analysis Services
Classification
SUPERVISED machine learning technique that learns patterns from historical data and uses that learning to place new instances into their respective groups
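As one possible illustration, the sketch below uses k-Nearest Neighbors (one of the classification algorithms listed in this set) to place new instances into groups. The function name `knn_classify`, the labels, and the points are hypothetical.

```python
from collections import Counter


def knn_classify(training, new_point, k=3):
    """Label a new point by majority vote among its k closest training rows."""
    # Sort the historical rows by squared distance to the new point.
    by_distance = sorted(
        training,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], new_point)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical historical data: (input variables, known group).
training = [
    ((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((2.0, 1.0), "low"),
    ((8.0, 8.0), "high"), ((9.0, 8.5), "high"), ((8.5, 9.0), "high"),
]

first = knn_classify(training, (2.0, 2.0))
second = knn_classify(training, (8.0, 9.0))
print(first, second)
```

The known group on each historical row is the defined output variable, which is what makes this supervised.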
What is a label that doesn't constitute a dimensional table?
Sales discount, since it's just transactional data in the fact table