MIS 302F Unit 6
Which of the following types of data can be used for data mining?
All of these are used for data mining: transaction data, surfing data, emotions and social interaction data, and sensor data
Data mining is ______?
A process of finding meaningful patterns in data to improve decisions
Advanced Scout
Advanced Scout is data mining software used by the National Basketball Association (NBA) to understand which players perform better and how these players fare against various opponents
Which data mining method uses an 'if' antecedent item set, 'then' consequent item set?
Association rules
Which of the following was NOT one of the methods Tesco used to beat Walmart?
Avoided use of club cards which Tesco learned created confusion for customers
Which of these is a commonly used predictive method?
Classification tree
confidence
Confidence is the conditional probability that C will be purchased given that A is purchased. A high confidence level is a good indication of customer behavior. Confidence = (transactions with both A and C) / (transactions with A)
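The confidence ratio can be worked through on a handful of baskets. This is a minimal sketch: the transactions, the item names, and the function name `confidence` are made up for illustration and do not come from the course material.

```python
# Hypothetical market baskets; each set is one transaction.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "bread"},
    {"diapers", "milk"},
    {"bread", "milk"},
]


def confidence(transactions, antecedent, consequent):
    """Share of the transactions containing the antecedent that also contain the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Transactions that contain every item of the antecedent (A).
    with_a = [t for t in transactions if antecedent <= t]
    if not with_a:
        raise ValueError("the antecedent never occurs in the transactions")
    # Of those, the ones that also contain every item of the consequent (C).
    with_a_and_c = [t for t in with_a if consequent <= t]
    return len(with_a_and_c) / len(with_a)


conf_beer_diapers = confidence(transactions, {"beer"}, {"diapers"})
print(conf_beer_diapers)
```

In this toy data, three baskets contain beer and two of those also contain diapers, so the confidence of "if beer, then diapers" is 2/3.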
Which of these data mining methods finds human-interpretable patterns that describe data?
Descriptive data mining
Advanced Scout is data mining software used by the National Basketball Association (NBA) to understand which of these?
Determine which NBA players perform best and identify how these players fare against various opponents
Before beginning to conduct data mining, a business should do which of these?
Focus on its business and data mining goals
General Mills
General Mills uses a data warehouse to identify when and where to offer coupons to move its inventory
Google
Google tracks users throughout the Internet to present them with custom advertising
Methods that Tesco used to beat Walmart
Identified socio-economic groups of people likely to leave Tesco and targeted those customers; identified which items customers who were likely to leave Tesco mostly bought; undercut Walmart on specific item prices to lower incentives for customers to switch
Direct Marketing
Identifies which prospects should be included in a mailing list to obtain the highest response rate
retention
Keeps the customer from going to a competitor by better understanding what that customer wants or needs
Which of these methods uses variables to predict unknown or future values of the outcome?
Predictive
Interactive Marketing
Predicts what each individual accessing a website is most likely interested in seeing
Trend Analysis
Reveals the difference in the purchasing behavior among customers this month relative to a previous month
Target
Target studies customer purchase patterns and tries to target consumers with ads
In the case of Tesco versus Walmart, what happened?
Tesco was able to beat Walmart in Europe by embracing data mining
fraud
There are many ways to uncover fraud. Credit card companies can examine patterns in your purchases and see if there are any deviations to be concerned with. Typically credit card thieves will see if a card works by testing it at a gas station. Knowing this, credit card companies look for this pattern, deny the charge upfront, and put a hold on the card until you call to unlock it
Market Basket Analysis
Used to understand what products or services are commonly purchased together, such as beer and diapers
Walmart
Walmart studies its customers carefully. It attempts to use this customer data to predict what customers will buy. If there is a hurricane in the area, the store knows to stock up on beer and Pop-Tarts
up selling
When a firm attempts to sell more expensive products, upgrades, or accessories to existing customers to increase revenue. It is possible that a customer who doesn't want to upgrade, but is a drain on the firm's resources, can be divorced as well
classification
A large association rule with a decision at the end. Example: a loan. The bank asks for your income, education, etc., and decides whether or not to give you a loan
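The loan example can be read as a chain of if/then checks that ends in a decision. The sketch below only illustrates that shape: the function name `approve_loan`, the inputs, and every threshold are hypothetical and are not taken from any real bank or from the course.

```python
def approve_loan(income, years_of_education, has_prior_default):
    """Walk a small hand-written decision tree and return True to approve the loan.

    All thresholds below are made up for illustration only.
    """
    if has_prior_default:
        return False  # first split: a prior default ends in a rejection
    if income >= 50_000:
        return True  # second split: a high income ends in an approval
    if income >= 30_000 and years_of_education >= 16:
        return True  # third split: a moderate income plus a degree
    return False  # every other path ends in a rejection


print(approve_loan(income=35_000, years_of_education=16, has_prior_default=False))
```

In practice a classification tree learns its splits from historical data rather than having them written by hand; the hand-written rules here only show what the resulting decision path looks like.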
lift ratio
An indicator of the strength of an association rule. You compare the confidence of the rule with a benchmark value: the confidence you would expect if the occurrence of the consequent item set in a transaction were independent of the occurrence of the antecedent.
Benchmark confidence = (no. of transactions with consequent item set) / (no. of transactions in database)
Lift ratio = confidence / benchmark confidence = P(C given A) / P(C)
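The two ratios can be computed directly from transaction counts. This is a minimal sketch under assumed data: the baskets and the helper names `support_count` and `lift_ratio` are hypothetical.

```python
# Hypothetical market baskets; each set is one transaction.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "bread"},
    {"diapers", "milk"},
    {"bread", "milk"},
]


def support_count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)


def lift_ratio(transactions, antecedent, consequent):
    """Confidence of the rule divided by the benchmark confidence."""
    a, c = set(antecedent), set(consequent)
    n_a = support_count(transactions, a)
    n_c = support_count(transactions, c)
    if n_a == 0 or n_c == 0:
        raise ValueError("antecedent or consequent never occurs")
    # P(C given A): transactions with both A and C over transactions with A.
    confidence = support_count(transactions, a | c) / n_a
    # P(C): transactions with C over all transactions.
    benchmark = n_c / len(transactions)
    return confidence / benchmark


lift = lift_ratio(transactions, {"beer"}, {"diapers"})
print(lift)
```

Here the confidence is 2/3 and the benchmark confidence is 3/5, so the lift ratio is a little above 1. A lift ratio above 1 means the consequent shows up more often with the antecedent than it does overall; a ratio near 1 means the rule says little beyond chance.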
hierarchical clustering
Each data point is treated as a separate cluster to begin with. Then you repeatedly merge the two clusters that are closest to each other until there is only one group
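The merge loop can be sketched in a few lines. The card does not say how the distance between two clusters is measured, so this sketch assumes single linkage (the distance between their two closest members) and one-dimensional points; the function name and the sample data are hypothetical.

```python
def hierarchical_clustering(points):
    """Agglomerative clustering of 1-D points, assuming single linkage.

    Returns the merges in the order they happen, as
    (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


merges = hierarchical_clustering([1.0, 2.0, 6.0, 7.0, 15.0])
for a, b, d in merges:
    print(a, "+", b, "at distance", d)
```

Reading the merge list from the top shows the tightest groups forming first; cutting the list off early leaves more, smaller clusters.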
clustering
Grouping objects based on certain characteristics. A good clustering is one where data within a cluster are close together and/or any two clusters are far apart. Two methods of clustering: k-means and hierarchical
association rule
If I buy A, then I buy C: if you buy a certain product, how likely are you to buy another certain product? The rule is not always true; it holds with some level of confidence.
customer churn
predicts which customers are likely to leave a company and go to a competitor
descriptive method
These methods find human-interpretable patterns that describe the data. Examples: clustering, association rules
predictive method
These methods use variables to predict unknown or future values of the outcome. Also used for understanding influences, predicting fraud, sensing market trends, analyzing the market basket, mining emotions, and more. Example: classification tree
market segmentation
used to identify the common characteristics of customers who buy the same products from your company
market basket analysis
what do customers usually buy together
cross selling
When a firm attempts to sell other products or services to a customer who has purchased some product or service from that firm. Using this process, a firm can increase revenue and customer reliance, and benefit from economies of scope
k-means clustering
You choose the number of clusters K you want and then partition the data into K subgroups. Pick K means at random and assign each data point to the mean closest to it. Once all points are assigned, recalculate the mean of each cluster and repeat the process until the clusters change little.
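The assign-then-recompute loop can be sketched as follows. This is a minimal illustration on one-dimensional data: the function name `k_means`, the parameters `max_iters`, `tol`, and `seed`, and the sample points are assumptions, and "change little" is interpreted here as every mean moving by less than `tol`.

```python
import random


def k_means(points, k, max_iters=100, tol=1e-9, seed=0):
    """Basic k-means on 1-D points; returns (means, clusters)."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # pick K starting means at random from the data
    clusters = []
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - means[i]))
            clusters[nearest].append(p)
        # Update step: recompute each mean (keep the old mean if a cluster is empty).
        new_means = [
            sum(c) / len(c) if c else means[i] for i, c in enumerate(clusters)
        ]
        shift = max(abs(a - b) for a, b in zip(means, new_means))
        means = new_means
        if shift < tol:  # the means have stopped moving, so the clusters are stable
            break
    return means, clusters


means, clusters = k_means([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], k=2)
print(sorted(means))
```

Because the starting means are random, different seeds can visit different intermediate steps; on data this well separated they settle on the same two groups.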