Data Mining Test 1

Rank the levels of measurement from lowest to highest:

1. Nominal 2. Ordinal 3. Interval

In a Decision tree, what happens with each split?

1. Each split increases purity. 2. The change in purity is measured as information gain. 3. The split that achieves the largest information gain is chosen.

What are the two phases of constructing a decision tree?

1. Tree Construction 2. Tree Pruning

What is a data warehouse?

A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data, in a standardized format.

Describe a decision tree

A series of nested tests made up of nodes and leaves. Each node represents a test on one attribute. Each leaf is an end node that holds a prediction.

The size of a company is a _________ used to describe it. A. Dimension B. Variable C. Feature D. All of the above

All of the above (Dimension, Variable, Feature)

Which of the following is (are) major step(s) in the data mining process? A. Interpretation/Evaluation B. Data Processing C. Data Modeling D. All of the above

All of the above (Interpretation/Evaluation, Data Processing, Data Modeling)

What do column/bar charts measure?

They compare values across different categories.

Do Attribute, Dimension, Feature, and Variable all mean the same thing?

Yes

What do box plots measure?

An alternative type of chart for showing the distribution of a variable. Useful for comparing the distribution of a variable across different categories.

What is the definition of information gain?

expected reduction in entropy
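
As a rough illustration of "expected reduction in entropy", the sketch below computes the entropy of a parent group and subtracts the size-weighted entropy of the groups a split produces. The function names and the tiny buy/no-buy sample are made up for this example; base-2 logarithms are assumed.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical sample: 4 buyers and 4 non-buyers.
parent = ["buy"] * 4 + ["no"] * 4
perfect_split = [["buy"] * 4, ["no"] * 4]                  # both children pure
useless_split = [["buy", "buy", "no", "no"]] * 2           # children as mixed as the parent

gain_perfect = information_gain(parent, perfect_split)     # 1.0
gain_useless = information_gain(parent, useless_split)     # 0.0
```

A split that leaves the children as mixed as the parent gains nothing, which is why the split with the largest gain is preferred.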

What do histograms measure?

The most common type of chart for showing the distribution of a numerical variable. Great for showing the shape of the distribution.

What is doing the "right" thing vs. doing things "right" in data mining?

• Don't solve problems that do not help the business • Rigor takes a backseat to usefulness

Which of the following are advantages of the decision tree?

• Easy to understand and interpret - tree structure specifies entire decision structure • Easy to implement • Running time is low even with large data sets • Very popular method

Give examples for the "Identify" step in the data mining cycle

• Planning a new product introduction • Planning a direct marketing campaign • Understanding customer attrition/churn • Evaluating the results of a market test

What are the most popular tree stopping criteria?

• Pruning • Restricting the minimum node size (the minimum number of customers in a node): not good, but makes managerial sense • Setting a threshold on the value of the splitting criterion

What are weaknesses of the decision tree?

• Volatile: small changes in underlying data result in very different models • Cannot capture interactions between variables • Can result in large error

How to measure purity in a decision tree?

1. Gini Index 2. Entropy
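
The card names the Gini index without defining it. A minimal sketch, assuming the usual definition (one minus the sum of squared class proportions) and a made-up yes/no sample:

```python
from collections import Counter

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

# Hypothetical nodes with ten customers each.
pure_node = ["yes"] * 10
mixed_node = ["yes"] * 5 + ["no"] * 5

gini_pure = gini(pure_node)    # 0.0: every customer has the same label
gini_mixed = gini(mixed_node)  # 0.5: the most mixed a two-class node can be
```

Like entropy, the index is 0 for a pure node and grows as the classes become more evenly mixed.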

Explain what pruning a decision tree is

Identifying and removing branches in order to avoid overfitting.

What are examples of possible data sources?

1. "House" Data - Info kept within the company 2. Purchased Data - Data purchased outside 3. Test Data - Data generated with experiments conducted by the company. 4. Surveys - Useful for measuring consumer attitudes and competitive activity 5. Text Data - Content of customer emails or social network activities.

What are single numerical variable statistics?

1. Central Tendency and Location (Mean, Median, Min, Max, percentiles, quartiles) 2. Dispersion/Variability (Variance, Std. Deviation) 3. Distribution Shape (Skewness: lack of symmetry; Kurtosis: peakedness or flatness of a distribution)
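
These statistics can be sketched with Python's standard `statistics` module. The purchase amounts are invented, and the skewness and kurtosis lines use one common moment-based convention (other conventions apply bias corrections or subtract 3 from kurtosis).

```python
import statistics

# Hypothetical sample: purchase amounts for nine customers.
amounts = [12, 15, 15, 18, 20, 22, 25, 30, 95]
n = len(amounts)

mean = statistics.mean(amounts)                  # 28
median = statistics.median(amounts)              # 20
lowest, highest = min(amounts), max(amounts)
quartiles = statistics.quantiles(amounts, n=4)   # three cut points: Q1, Q2, Q3

variance = statistics.pvariance(amounts)         # population variance
std_dev = statistics.pstdev(amounts)             # population standard deviation

# Moment-based shape measures (an assumed convention, see the note above).
skewness = sum((x - mean) ** 3 for x in amounts) / (n * std_dev ** 3)
kurtosis = sum((x - mean) ** 4 for x in amounts) / (n * std_dev ** 4)
```

The single large value (95) pulls the mean above the median and makes the skewness positive.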

What are the main categories of potential predictors?

1. Customer Characteristics - Examples: Demographics, psychographics 2. Previous behavior - Examples: Previous purchases, response to previous marketing efforts 3. Previous Marketing - Previous marketing efforts targeted at the customer. 4. Big Data - Other Relevant Information

What is a predictor variable?

Stuff you can use to predict the target

What are the key steps of data mining?

1. Data Selection/Sampling 2. Data Preprocessing (Cleaning, Reduction & transformation) 3. Choosing Techniques 4. Data Mining 5. Pattern Evaluation & Knowledge Presentation

What is the process of predictive modeling?

1. Defining the problem 2. Preparing the data 3. Estimating the model 4. Evaluating the model 5. Making decisions

What are descriptive functions of data mining techniques?

Summarization, Statistics, Tables, Graphs, Visualization, Association Rules, Clustering

Name the data mining technique that answers the following question: What items are commonly bought together?

Association - Descriptive
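
One simple way to see what "commonly bought together" means is to count how often each pair of items shares a basket. The baskets below are invented, and "support" (the share of baskets containing both items) is the standard association-rule term rather than something this card defines.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, each a set of purchased items.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort so each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = share of baskets that contain both items of the pair.
support = {pair: count / len(baskets) for pair, count in pair_counts.items()}
```

Here `support[("bread", "milk")]` is 0.5, since two of the four baskets contain both items.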

Why bother data mining?

Because of the data explosion, and because cheap, powerful computers and the Internet make collecting and crunching gigantic datasets possible. It also creates value.

A bank uses data to predict which customers are more likely to default on their loans. Which of the following methods is NOT an appropriate data mining tool for the objective? A. Neural Network B. Decision Tree C. Exploration D. Regression

C. Exploration - You can use Neural Network, Decision Tree and Regression

What is another word for nominal and ordinal measures?

Categorical

What are predictive functions of data mining techniques?

Classification, Regression

Name the data mining technique that answers the following question: How likely is a customer to respond to a marketing campaign?

Classification/Regression - Predictive

Name the data mining technique that answers the following question: What will the sales look like for the new product?

Classification/Regression - Predictive

What is the best use for the decision tree?

Classifying an unknown sample (prediction)

Name the data mining technique that answers the following question: What cohesive groups of customers do we have?

Clustering - Descriptive

What are multi numerical variable statistics?

Covariance (relationship between two variables) and correlation (measure of linear association between two variables)
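
A small sketch of both statistics, written out by hand so the formulas stay visible. It assumes the sample (n - 1) form of covariance and the Pearson correlation; the ad-spend and sales figures are made up.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data: advertising spend and sales for five periods.
ad_spend = [1, 2, 3, 4, 5]
sales = [12, 14, 16, 18, 20]

cov = covariance(ad_spend, sales)     # 5.0
corr = correlation(ad_spend, sales)   # 1.0 up to rounding: perfectly linear
```

Covariance depends on the units of both variables, while correlation is unit-free and always falls between -1 and 1.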

Where does data mining apply?

Customers, Products/Services, Competition, Reports, Nonbusiness Applications (politics, security & terrorism)

What are three popular models?

Decision Tree, Regression, Neural Network

What is the extraction of potentially useful (yet previously unknown) patterns or knowledge from large volumes of data?

Definition of Data Mining

What are other names for target variable?

Dependent Variables, Response Variables

What category of data mining answers the questions: Who are my best customers? What items are purchased together? On what and how much do they usually spend?

Descriptive

Which data mining techniques characterize properties of the data?

Descriptive

What two categories do data mining tasks fall into?

Descriptive and Predictive

What are other words for Attribute?

Dimension, Feature, Variable

Our sample customers have different intentions to buy our products. Out of 900 customers, 450 are willing to buy and the others are not. What is the entropy of this customer sample?

Entropy is 1
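
The arithmetic behind this answer, assuming base-2 logarithms: half the customers buy and half do not, so each class has proportion 0.5.

```python
from math import log2

buyers, total = 450, 900
p_buy = buyers / total    # 0.5
p_not = 1 - p_buy         # 0.5

# Entropy = -(p_buy * log2(p_buy) + p_not * log2(p_not))
sample_entropy = -(p_buy * log2(p_buy) + p_not * log2(p_not))
print(sample_entropy)     # 1.0
```

Since log2(0.5) is -1, each term contributes 0.5 and the total is 1, the maximum for a two-class sample.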

T or F: The measurement level of a variable can only be interval.

False

T or F: We evaluate the performance of a predictive model based on its performance on the training data.

False
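
A toy sketch of why training-data performance misleads. The "model" here just memorizes its training examples, and the labels are coin flips, so there is nothing real to learn. The data, the seed, and all names are assumptions made for illustration.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: a customer id paired with a purely random label.
data = [(customer_id, random.choice(["buy", "no"])) for customer_id in range(200)]
train, holdout = data[:100], data[100:]

# "Model" that memorizes every training example and otherwise
# falls back to the most common training label.
memory = dict(train)
fallback = Counter(memory.values()).most_common(1)[0][0]

def predict(customer_id):
    return memory.get(customer_id, fallback)

def accuracy(examples):
    return sum(predict(x) == y for x, y in examples) / len(examples)

train_accuracy = accuracy(train)       # 1.0 by construction
holdout_accuracy = accuracy(holdout)   # roughly chance level, since labels are noise
```

The perfect training score says nothing about new customers, which is why performance is judged on data the model has not seen.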

What are the entropy ranges?

From 0 (most pure) to log2(k) when k classes are equally represented, which is 1 for two classes

Which type of graph would you use for the distribution of an interval variable?

Histogram

What are the steps of the data mining cycle?

1. Identify: business opportunities where analyzing data can provide value 2. Transform: data into actionable information using data mining techniques 3. Act: on the information 4. Measure: the results of the efforts to complete the learning cycle (ITAM: Identify, Transform, Act, Measure)

What are other names for predictor variables?

Independent Variables, Explanatory Variables

Which of the statements about measurement level is true? A. Interval variables can be measured as nominal. B. Nominal variables can be measured as interval. C. Categorical variables can be measured as interval. D. None of the above

A. Interval variables can be measured as nominal. (Remember the order I, O, N: Interval, Ordinal, Nominal.)

What can an interval-scaled attribute be measured by?

Interval, Ordinal, or Nominal measures

Which type of graph would you use for trend over time?

Line Chart

What is an ordinal measure?

A meaningful order or ranking, but the distance between rankings has no meaning. Example: age group

What is interval measure?

Measured on a scale with meaningful differences. We can perform arithmetic operations on them.

What do line charts measure?

Variables that change, usually over time. Best for showing a trend over a time period. Univariate analysis.

What is a nominal measure?

Symbols or names of things. Valid operations are = and ≠.

Does correlation always imply causality?

No

What is a binary measure?

Nominal attribute with only two categories. Examples: Yes/No, Positive/Negative, True/False, Male/Female.

What can a nominal-scaled attribute be measured by?

Nominal measures only, NOT ordinal or interval measures!

What is another word for interval measures?

Numeric or Continuous

What can an ordinal-scaled attribute be measured by?

Ordinal and Nominal measures, NOT interval measures

How do you construct a decision tree?

Partition training instances into purer and purer subgroups. (Group A is "purer" than group B if more members in A are similar than members in B.) Trees are constructed by recursively partitioning instances.
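
The recursive partitioning described above can be sketched as follows. This is a minimal illustration, not a production algorithm: it assumes categorical attributes, equality tests (attribute == value), entropy as the purity measure, and two stopping rules (a minimum node size of at least 1 and a minimum gain). All names and the four-customer sample are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())

def best_split(rows, labels, min_node_size):
    """Find the test (attribute == value) with the largest information gain."""
    best_gain, best_attr, best_value = 0.0, None, None
    for attr in rows[0]:
        for value in sorted({row[attr] for row in rows}):
            yes = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            no = [lab for row, lab in zip(rows, labels) if row[attr] != value]
            if len(yes) < min_node_size or len(no) < min_node_size:
                continue  # stopping rule: a child node would be too small
            weighted = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
            gain = entropy(labels) - weighted
            if gain > best_gain:
                best_gain, best_attr, best_value = gain, attr, value
    return best_gain, best_attr, best_value

def build_tree(rows, labels, min_node_size=1, min_gain=1e-9):
    """Recursively partition the instances; a leaf predicts its majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    gain, attr, value = best_split(rows, labels, min_node_size)
    if attr is None or gain < min_gain:
        return majority  # leaf: no split improves purity enough
    yes = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
    no = [(r, l) for r, l in zip(rows, labels) if r[attr] != value]
    return {
        "attr": attr,
        "value": value,
        "yes": build_tree([r for r, _ in yes], [l for _, l in yes], min_node_size, min_gain),
        "no": build_tree([r for r, _ in no], [l for _, l in no], min_node_size, min_gain),
    }

def predict(tree, row):
    """Follow the nested tests until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        tree = tree["yes"] if row[tree["attr"]] == tree["value"] else tree["no"]
    return tree

# Hypothetical training data: membership predicts buying, region does not.
rows = [
    {"member": "yes", "region": "north"},
    {"member": "yes", "region": "south"},
    {"member": "no", "region": "north"},
    {"member": "no", "region": "south"},
]
labels = ["buy", "buy", "no", "no"]

tree = build_tree(rows, labels)
prediction = predict(tree, {"member": "yes", "region": "south"})  # "buy"
```

Raising `min_node_size` or `min_gain` stops the partitioning earlier and yields a smaller tree.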

Which type of graph would you use to measure the composition of student backgrounds?

Pie Chart

What category of data mining answers the following questions?: Will this customer default on their loan? Will this customer cancel their gym membership?

Predictive

What is the use of statistical models to make predictions called?

Predictive Modeling

Which data mining techniques make inferences from data for prediction?

Predictive

What is a "pattern" in data mining?

Relationships, regularities, and structures hidden in data

Which type of graph would you use to measure the relationship between two variables?

Scatterplot

What do scatterplots/bubble graphs measure?

They show the relationship between two variables. Multivariate analysis.

What is a target variable?

The thing you want to predict

Name the data mining technique that answers the following question: What will stock price movements look like?

Time-Series Analysis

T or F: Pruning a tree helps to address the overfitting problem.

True

T or F: An attribute can be measured at its own level or at any lower level.

True

T or F: Attributes may vary from one object to another (cross-sectional) or from one time to another (longitudinal).

True

T or F? Measuring at a lower level loses information and limits possible analyses.

True
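
A short sketch of that information loss, using an invented income variable. The cut points and category names are arbitrary assumptions.

```python
# Hypothetical interval-scaled variable: annual income in thousands.
incomes = [18, 42, 55, 73, 120]

def to_ordinal(income):
    """Collapse an interval value into ranked bands (cut points are arbitrary)."""
    if income < 40:
        return "low"
    if income < 80:
        return "medium"
    return "high"

def to_nominal(income):
    """Collapse further into a label treated as an unordered category."""
    return "high earner" if income >= 80 else "other"

ordinal = [to_ordinal(x) for x in incomes]   # low, medium, medium, medium, high
nominal = [to_nominal(x) for x in incomes]

# The interval version supports arithmetic such as a mean; the bands do not.
mean_income = sum(incomes) / len(incomes)    # 61.6

# Fewer distinct values survive at each lower level.
distinct_values = (len(set(incomes)), len(set(ordinal)), len(set(nominal)))  # (5, 3, 2)
```

Going the other way is not possible: the band "medium" cannot be turned back into 42, 55, or 73.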

True or False: A scatter plot is suitable for exploring the relationship between two interval variables.

True

True or False: Data mining is an iterative process in which data and models are updated based on feedback, new situations, etc.

True

True or False: It is a predictive task to classify credit card purchases into fraudulent and legitimate ones.

True

