Intro to Data Analytics Chapter 4

Test partition

A second level of validation often done with a second set of historical data

What are other names for association rules?

Affinity Analysis and Market Basket Analysis

Conformed dimensions

Common dimensions connecting other dimensional tables (more than one usually)

How do you read a lift chart?

The straight line shows the result with no model in use. The curved line shows the result with the data mining model applied. The larger the distance between the two lines, the better the model's performance.
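A minimal sketch of the two lines on a lift chart, assuming a small made-up set of model scores and actual outcomes (the function name and the data are hypothetical). The model curve counts how many positives are captured as records are examined from highest score to lowest; the baseline is the straight line expected when records are picked with no model.

```python
def cumulative_gains(scores, labels):
    """Return (model_curve, baseline): positives captured after examining 1..n records."""
    # Rank records from the highest model score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(labels)
    n = len(labels)

    model_curve, running = [], 0
    for _, label in ranked:
        running += label
        model_curve.append(running)

    # With no model, positives are expected to accumulate at a constant rate.
    baseline = [total_positives * (i + 1) / n for i in range(n)]
    return model_curve, baseline


# Hypothetical scores and outcomes (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
model_curve, baseline = cumulative_gains(scores, labels)
print(model_curve)  # [1, 2, 2, 3, 3, 3]
print(baseline)     # [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
```

The gap between the two lists at each position is the distance between the lines that the card describes.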

Where are the foreign keys in a star schema?

They are located in the fact table. Measures can also be called facts.

What is the lowest level of detail?

Transactional

What type of data supplies measures/facts in the fact table?

Transactional data

What combinations can you have in a confusion matrix?

True Positive (TP) = top left; False Positive (FP) = top right; False Negative (FN) = bottom left; True Negative (TN) = bottom right
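A short sketch of how the four cells could be counted, using the layout from this card (rows are predicted classes, columns are actual classes). The function name and the sample data are made up for illustration.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Return [[TP, FP], [FN, TN]]: top row = predicted positive, bottom row = predicted negative."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return [[tp, fp], [fn, tn]]


# Hypothetical actual outcomes and model predictions.
actual = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

matrix = confusion_matrix(actual, predicted)
# Accuracy = correct classifications (the TP/TN diagonal) divided by all records.
accuracy = (matrix[0][0] + matrix[1][1]) / len(actual)
print(matrix)    # [[3, 1], [1, 3]]
print(accuracy)  # 0.75
```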

How is data mining different from other analytics?

Uses mathematical modeling and algorithms; uses machine learning to learn patterns in the data; many data mining algorithms mimic the learning process of the human brain; answers "the unasked question" by uncovering hidden patterns in the data

One dimensional table connected to two others provides...

a hierarchy of detail consisting of three levels

The cells on the diagonal from top left to bottom right are...

correct classifications

Association rule

Determines what goes with what based on past patterns of behavior. The results of the algorithm are expressed as a series of "If ... Then" rules. UNSUPERVISED
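A rough sketch of scoring one "If ... Then" rule from past baskets. Support and confidence are standard measures for association rules but are not named on this card, so treat them as an assumption here; the baskets and function names are made up.

```python
# Hypothetical past transactions (market baskets).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, baskets):
    """Share of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the 'If' part, the share that also contain the 'Then' part."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


# Rule: If bread Then milk.
rule_support = support({"bread", "milk"}, baskets)
rule_confidence = confidence({"bread"}, {"milk"}, baskets)
print(rule_support)  # 0.5
```

No output variable is defined anywhere in the data, which is why the technique counts as unsupervised.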

What is another way to refer to "data detail"?

grain

The cells on the diagonal from top right to bottom left are...

incorrect classifications

Supervised learning

involves input variables and a defined output variable

Unsupervised learning

involves input variables and no defined output variables

Clustering

Partitions a collection of things into segments (natural groupings) where the members of each group share similar characteristics. UNSUPERVISED
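One way to illustrate clustering is a tiny one-dimensional k-means loop. The card does not name k-means, so this is a swapped-in example technique, and the values, starting centers, and function name are all hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Repeatedly assign each value to its nearest center, then move each center to its group's mean."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical measurements with two natural groupings.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, groups = kmeans_1d(values, centers=[0.0, 5.0])
print(centers)  # [2.0, 11.0]
print(groups)   # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

The groups emerge from the input values alone; no labels are supplied.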

What data types are analyzed in data mining?

quantitative data (numeric) and qualitative data (nominal and ordinal)

What two categories always have a hierarchy of detail?

Territory (continent, country, city) and Date (year, month, week, day)

When was the greenbar report popular?

The 1960s, mainframe era

What are the steps of the Cross-Industry Standard Process for Data Mining (CRISP-DM)?

Business understanding, data understanding, data preparation, model building, testing and evaluation, and deployment

What are some types of UNsupervised learning?

Association rules and Clustering

What is the most widely used performance management system?

Balanced scorecard

What are some examples of commonly used classification algorithms?

Bayes Classifier, Decision Trees, k-Nearest Neighbors, Neural networks

Lift charts/ROC charts

Evaluate data mining models by showing how effective they are at classifying and predicting.

How does the fact table connect to the dimensional tables?

Foreign keys in the fact table connect to primary keys in the dimensional tables
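A small sketch of that key relationship using plain dictionaries in place of database tables. Every table, column, and value below is hypothetical.

```python
# Dimension table: the primary key (product_key) maps to descriptive attributes.
product_dim = {
    1: {"name": "Widget"},
    2: {"name": "Gadget"},
}

# Fact table: each row holds a foreign key (product_key) plus a measure (sales_amount).
sales_fact = [
    {"product_key": 1, "sales_amount": 100.0},
    {"product_key": 2, "sales_amount": 75.0},
    {"product_key": 1, "sales_amount": 50.0},
]


def sales_by_product(fact_rows, dimension):
    """Follow each fact row's foreign key to the dimension row and total the measure by product name."""
    totals = {}
    for row in fact_rows:
        name = dimension[row["product_key"]]["name"]
        totals[name] = totals.get(name, 0.0) + row["sales_amount"]
    return totals


totals = sales_by_product(sales_fact, product_dim)
print(totals)  # {'Widget': 150.0, 'Gadget': 75.0}
```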

What are some software packages used in data mining?

IBM SPSS Modeler, SAS Enterprise Miner, R, Dell Statistica, Rapid Miner, XLMiner

Confusion matrix

Table used to evaluate the accuracy of a classification model (how effective is it at classifying data to match reality?)

What are some common mistakes made in data mining projects?

Incorrect identification of the problem; not managing expectations about data mining; beginning without an end result in mind; not having access to the necessary data; not properly preparing data; not drilling down into detail/authors

How does machine learning/confusion matrix work?

Assign some of the historical data to the training partition and another portion to the validation partition. The algorithm looks for patterns in the training partition, then makes predictions for the validation partition. Those predictions are checked against the known outcomes, and a confusion matrix summarizes how well the model learned.
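A minimal sketch of splitting historical records into partitions, including the optional test partition described elsewhere in this set. The 60/20/20 proportions, the fixed seed, and the function name are assumptions chosen for illustration, not values from the course.

```python
import random


def partition(records, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle the records, then cut them into training, validation, and test partitions."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains
    return train, valid, test


# Hypothetical historical records, represented here by ID numbers.
records = list(range(10))
train, valid, test = partition(records)
print(len(train), len(valid), len(test))  # 6 2 2
```

The model would be fit on `train` only, so its predictions for `valid` and `test` are checked against records it has not seen.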

Where can data mining be used?

Medical/healthcare, pharmaceuticals, retail, banking/credit card (identify fraudulent activities)

Neural networks

Mimics the function of the brain and can be used to predict the weather, approve credit, and formulate target marketing

What three layers of information should all dashboards have?

Monitoring, analysis, and management

Data mining

Non-trivial process of identifying valid, novel, potentially useful and understandable patterns in the data found in STRUCTURED databases

What are some types of supervised learning?

Prediction and Classification

What are the six data mining tasks?

Prediction, classification, clustering, association, visualization, and time series forecasting

What are some other names for data mining?

Predictive analytics, knowledge extraction, pattern analysis, data archaeology, and information harvesting

What Microsoft tool is used for building an OLAP cube?

SQL Server Analysis Services

Classification

SUPERVISED machine learning technique that learns patterns from historical data and uses that learning to place new instances into their respective groups
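A brief sketch of classification with k-nearest neighbors, one of the algorithms listed in this set. The historical records, the labels, and the function name are made up, and `k=3` is an arbitrary choice.

```python
from collections import Counter


def knn_classify(history, new_point, k=3):
    """Label new_point by majority vote among the k closest historical records."""
    by_distance = sorted(
        history,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], new_point)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical historical data: (input variables, known output label).
history = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.5), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 8.5), "high"),
    ((8.5, 9.0), "high"),
]

print(knn_classify(history, (2.0, 2.0)))  # low
print(knn_classify(history, (8.0, 9.0)))  # high
```

The known labels in `history` are the defined output variable that makes this a supervised technique.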

What is a label that doesn't constitute a dimensional table?

Sales discount, since it is just transactional data in the fact table

