Business Analytics with SAS


Valid Association Rules

- A rule has to meet a minimum support and a minimum confidence level
- Both thresholds are determined by the modeler
- Consider an association rule found in a cell phone company database containing all call destinations for each account:

The problem is decomposed into two sub-problems

1) Find all frequent itemsets (i.e., all sets of items with support above min_sup)
2) From each frequent itemset, generate rules that use items from that frequent itemset, keeping only the rules with confidence above min_conf
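The two sub-problems can be sketched in Python as follows. This is a brute-force illustration under stated assumptions, not the Apriori algorithm and not how SAS implements it: each transaction is assumed to be a Python set of item names, both thresholds are treated as inclusive (>=), and `frequent_itemsets`, `rules_from` and the `baskets` data are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_sup):
    """Sub-problem 1: enumerate every candidate itemset, keep those with support >= min_sup."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            s = support(itemset, transactions)
            if s >= min_sup:
                frequent[itemset] = s
    return frequent


def rules_from(frequent, min_conf):
    """Sub-problem 2: split each frequent itemset into antecedent => consequent."""
    rules = []
    for itemset, sup in frequent.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                # every subset of a frequent itemset is itself frequent, so lhs is in the dict
                conf = sup / frequent[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, sup, conf))
    return rules


# hypothetical toy baskets
baskets = [{"beer", "diapers", "milk"}, {"beer", "diapers"}, {"milk", "bread"}, {"beer", "bread"}]
for lhs, rhs, sup, conf in rules_from(frequent_itemsets(baskets, 0.5), 0.6):
    print(set(lhs), "=>", set(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

Enumerating every possible itemset grows exponentially with the number of distinct items, which is why practical tools prune the search instead.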

Evaluating Association Rules: Lift

- Lift provides information about the increase in probability of the consequent, given the antecedent, i.e., does including the antecedent make the consequent more likely than random chance?
- Lift takes statistical (in)dependence into account: a lift of 1 means the antecedent and consequent are independent
$$\text{lift} = \frac{\text{confidence of the rule}}{\text{support for the consequent}} = \frac{P(\text{Consequent} \mid \text{Antecedent})}{P(\text{Consequent})}$$
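As a minimal Python sketch of the lift formula (assuming transactions are Python sets; `lift` and the `baskets` data are hypothetical), lift can be computed directly from transaction counts:

```python
def lift(antecedent, consequent, transactions):
    """lift = confidence(antecedent => consequent) / support(consequent)."""
    n = len(transactions)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_cons = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    confidence = n_both / n_ante       # P(Consequent | Antecedent)
    return confidence / (n_cons / n)   # divided by P(Consequent)


# hypothetical toy baskets
baskets = [{"beer", "diapers", "milk"}, {"beer", "diapers"}, {"milk", "bread"}, {"beer", "bread"}]
print(lift({"diapers"}, {"beer"}, baskets))  # above 1: diapers make beer more likely
print(lift({"milk"}, {"beer"}, baskets))     # below 1: milk makes beer less likely
```

Since lift equals P(Antecedent and Consequent) divided by P(Antecedent) times P(Consequent), swapping the two sides gives the same value.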

Other Evaluation Criteria

- Many rules are either obvious or uninteresting, e.g., {Maternity Ward} ⇒ {Patient is Female}
- Screen for rules that are of particular interest and significance
- Use domain-specific conditions to filter generated rules
- Some considerations:
  - Actionability - Keep only those rules that can be acted upon
  - Interestingness - Rules are interesting if they are new (sometimes) or contradict what is currently known
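One way to apply domain-specific screening is a post-filter over the generated rules. The sketch below is illustrative only: the `obvious` rule set and the `actionable` item set are hypothetical inputs a domain expert would supply, and rules are assumed to be (antecedent, consequent) pairs of frozensets.

```python
def screen_rules(rules, obvious, actionable):
    """Keep rules that are not already known and that involve at least one actionable item."""
    kept = []
    for lhs, rhs in rules:
        if (lhs, rhs) in obvious:
            continue  # drop rules the domain expert already knows to be true
        if not (lhs | rhs) & actionable:
            continue  # drop rules that mention nothing the business can act on
        kept.append((lhs, rhs))
    return kept


# hypothetical rules and expert inputs
rules = [
    (frozenset({"Maternity Ward"}), frozenset({"Patient is Female"})),
    (frozenset({"beer"}), frozenset({"diapers"})),
]
obvious = {(frozenset({"Maternity Ward"}), frozenset({"Patient is Female"}))}
print(screen_rules(rules, obvious, actionable={"diapers"}))
```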

Limitations to Market Basket Analysis

- Requires a large number of real transactions
- Results may be unreliable if products do not occur with similar frequency (e.g., rarely purchased items seldom reach the minimum support threshold)
- Market basket analysis can sometimes capture the results of a previous successful marketing campaign, rather than the natural tendencies of customers

Association Rules

1) An association rule is a pattern that suggests that when one event occurs, another event is likely to occur as well
2) Association rules are structured as sets of if/then statements
3) Each rule suggests co-occurrence, not causality
4) Market basket analysis is the most common example of the use of association rules

Association rule requirement

1) Association rules require data in transactional format
2) Meaningful rules can only be computed for categorical data
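Transactional format means one basket of items per transaction. A minimal Python sketch, assuming the raw data arrives as hypothetical (transaction_id, item) rows of categorical values, groups those rows into baskets; numeric fields would first need to be binned into categories.

```python
from collections import defaultdict


def to_baskets(rows):
    """Group (transaction_id, item) rows into one set of categorical items per transaction."""
    baskets = defaultdict(set)
    for transaction_id, item in rows:
        baskets[transaction_id].add(item)  # a set ignores repeated scans of the same item
    return dict(baskets)


# hypothetical point-of-sale rows
rows = [(1, "beer"), (1, "diapers"), (2, "milk"), (1, "beer"), (2, "bread")]
print(to_baskets(rows))
```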

What is Market Basket Analysis?

1) Given a transactional database (set of transactions), find rules that predict the occurrence of an item based on the occurrences of other items in the database
2) People tend to buy things together, so we might as well help, guide, or exploit that behavior
3) This is the most widely used and, in many ways, most successful data mining algorithm
4) The parable of beer and diapers was popularized by Osco Drug

Generating Association Rules

1) The most common approach for generating association rules is the Apriori algorithm (Agrawal et al. 1994)
2) The algorithm generates all association rules that have:
  - Support greater than the user-specified support threshold (min_sup)
  - Confidence greater than the user-specified confidence threshold (min_conf)
3) The algorithm performs a (relatively) efficient search over the data to find such rules
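The level-wise search behind Apriori's frequent-itemset step can be sketched in Python as below. This is a simplified teaching version, not SAS's implementation; it assumes transactions are Python sets and treats min_sup as inclusive, and the `apriori` function and `baskets` data are hypothetical. Its key idea is that an itemset with any infrequent subset cannot be frequent, so such candidates are pruned before their support is counted.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {itemset: support} for every itemset with support >= min_sup."""
    n = len(transactions)

    def keep_frequent(candidates):
        """Count support for each candidate and keep those meeting min_sup."""
        counted = {c: sum(1 for t in transactions if c <= t) / n for c in candidates}
        return {c: s for c, s in counted.items() if s >= min_sup}

    # level 1: frequent single items
    level = keep_frequent({frozenset([item]) for t in transactions for item in t})
    frequent = dict(level)
    k = 2
    while level:
        # join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # prune step: drop candidates that have any infrequent (k-1)-item subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
        frequent.update(level)
        k += 1
    return frequent


# hypothetical toy baskets
baskets = [{"beer", "diapers", "milk"}, {"beer", "diapers"}, {"milk", "bread"}, {"beer", "bread"}]
print(apriori(baskets, 0.5))
```

Rules are then generated from the returned itemsets and filtered by min_conf, exactly as in the second sub-problem.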

Applications of Market Basket Analysis

1) Recommendation - Sellers can suggest products which may also appeal to the consumer (e.g., "you might also like...")
2) Bundling - Sellers can bundle commonly purchased products together, or bundle commonly purchased products with less frequently purchased products
3) Product placement - Retail stores can use this information to place these products in the same area of the store
4) Other applications:
  - Customer insight and buying behaviors
  - Subsequent purchases
  - And many others...
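For the recommendation use case, a sketch might look like the following. It assumes rules have already been mined and are given as hypothetical (antecedent, consequent, confidence) triples with made-up confidence values; consequent items are suggested whenever a rule's antecedent is fully contained in the shopper's cart, highest confidence first.

```python
def recommend(cart, rules, top_n=3):
    """Suggest consequent items from rules whose antecedent is already in the cart."""
    scores = {}
    for antecedent, consequent, confidence in rules:
        if antecedent <= cart:  # the rule's "if" part is satisfied by the cart
            for item in consequent - cart:  # never suggest what is already in the cart
                scores[item] = max(scores.get(item, 0.0), confidence)
    # highest confidence first; ties broken alphabetically for a stable order
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# hypothetical mined rules
rules = [
    (frozenset({"beer"}), frozenset({"diapers"}), 0.67),
    (frozenset({"milk"}), frozenset({"bread"}), 0.5),
    (frozenset({"beer"}), frozenset({"chips"}), 0.8),
]
print(recommend({"beer"}, rules))
```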

Association Rules for Market Basket Analysis

1) Rules are written in the form "left-hand side (LHS) implies right-hand side (RHS)"
  - LHS is the antecedent
  - RHS is the consequent
$$\text{if } \{\text{set of items}\} \Rightarrow \text{then } \{\text{set of items}\}$$
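The LHS ⇒ RHS form maps naturally onto a small data structure. The `Rule` class below is a hypothetical Python sketch; it also enforces the usual convention that the antecedent and consequent are non-empty and share no items.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    lhs: frozenset  # antecedent
    rhs: frozenset  # consequent

    def __post_init__(self):
        if not self.lhs or not self.rhs:
            raise ValueError("both sides of a rule must be non-empty")
        if self.lhs & self.rhs:
            raise ValueError("antecedent and consequent must not share items")

    def __str__(self):
        left = ", ".join(sorted(self.lhs))
        right = ", ".join(sorted(self.rhs))
        return f"if {{{left}}} => then {{{right}}}"


print(Rule(frozenset({"beer"}), frozenset({"diapers"})))
```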

Evaluating Association Rules: Confidence

Confidence is a measure of the strength of the rule:
- Proportion of transactions containing the antecedent that also contain the consequent
- i.e., it is the conditional probability that a transaction containing the antecedent also contains the consequent
$$\text{confidence} = \frac{\#\text{ of transactions where the rule is true}}{\#\text{ of transactions containing the antecedent}} = P(\text{Consequent} \mid \text{Antecedent})$$
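A minimal Python sketch of the confidence formula, assuming transactions are Python sets (the `confidence` helper and `baskets` data are hypothetical):

```python
def confidence(antecedent, consequent, transactions):
    """Transactions where the rule is true / transactions containing the antecedent."""
    containing_antecedent = [t for t in transactions if antecedent <= t]
    rule_true = [t for t in containing_antecedent if consequent <= t]
    return len(rule_true) / len(containing_antecedent)


# hypothetical toy baskets
baskets = [{"beer", "diapers", "milk"}, {"beer", "diapers"}, {"milk", "bread"}, {"beer", "bread"}]
print(confidence({"beer"}, {"diapers"}, baskets))
print(confidence({"diapers"}, {"beer"}, baskets))
```

Unlike support and lift, confidence depends on direction: here every basket with diapers also has beer, but not every basket with beer has diapers.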

Evaluating Association Rules: Support

Support is a measure of the relevance of the rule:
- Frequency of transactions in which the items in the antecedent and the consequent co-occur
- i.e., it is the probability that a transaction contains both the antecedent and the consequent
$$\text{support} = \frac{\#\text{ of transactions where the rule is true}}{\#\text{ of transactions in the database}} = P(\text{Antecedent} \cap \text{Consequent})$$
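A matching Python sketch of the support formula, again assuming transactions are Python sets (the `support` helper and `baskets` data are hypothetical):

```python
def support(antecedent, consequent, transactions):
    """Transactions where the rule is true / all transactions in the database."""
    items = antecedent | consequent
    rule_true = sum(1 for t in transactions if items <= t)
    return rule_true / len(transactions)


# hypothetical toy baskets
baskets = [{"beer", "diapers", "milk"}, {"beer", "diapers"}, {"milk", "bread"}, {"beer", "bread"}]
print(support({"beer"}, {"diapers"}, baskets))
```

Because only the union of the two sides matters, support is the same whichever side is the antecedent.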

Evaluating Association Rules

To make effective use of a rule, three numeric measures about that rule must be considered:
1) Support refers to the percentage of baskets where the rule was true (both left- and right-hand side products were present)
2) Confidence measures what percentage of baskets that contained the left-hand products also contained the right-hand products
3) Lift measures how many times larger the confidence is than the expected (baseline) confidence, i.e., the support of the right-hand products. A lift value greater than 1 is desirable.
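A worked example with hypothetical counts (not from any real data set) shows how the three measures fit together for the rule {diapers} ⇒ {beer}: suppose there are 1,000 baskets, 100 contain diapers, 200 contain beer, and 50 contain both.

```latex
\begin{aligned}
\text{support} &= \frac{50}{1000} = 0.05 \\
\text{confidence} &= \frac{50}{100} = 0.50 \\
\text{baseline confidence} = P(\text{beer}) &= \frac{200}{1000} = 0.20 \\
\text{lift} &= \frac{0.50}{0.20} = 2.5
\end{aligned}
```

A lift of 2.5 means a basket containing diapers is 2.5 times as likely to contain beer as a basket chosen at random, so this hypothetical rule would pass the "lift greater than 1" test.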

Comparison to Traditional DB Queries

Traditional DB Queries
- Can be tedious and difficult to quantify
- Support hypothesis verification about relationships (e.g., do diapers and beer co-occur?)
Data Mining
- Relatively easy to automatically discover association rules from data
- User does not have to specify what to look for in advance (data driven)
- Potential for finding unexpected correlations
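The contrast between the two approaches can be sketched in Python. Both functions are hypothetical illustrations, assuming transactions are Python sets of item names: the first answers one question the analyst asks in advance, while the second counts every item pair and reports whatever turns out to be frequent.

```python
from collections import Counter
from itertools import combinations


def verify_pair(a, b, transactions):
    """Query style: the analyst states the hypothesis (a and b co-occur) and counts matches."""
    return sum(1 for t in transactions if a in t and b in t)


def discover_pairs(transactions, min_count):
    """Mining style: count every item pair and report the frequent ones, no hypothesis needed."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return {pair: c for pair, c in counts.items() if c >= min_count}


# hypothetical toy baskets
baskets = [{"beer", "diapers", "milk"}, {"beer", "diapers"}, {"milk", "bread"}, {"beer", "bread"}]
print(verify_pair("beer", "diapers", baskets))
print(discover_pairs(baskets, 2))
```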

