Module 12 of 14 Association Rules
Constraint-Based Data Mining
An Association Rule Variant where rulesets are limited based on external criteria. Helps identify patterns where small sales trigger big sales.
Phase 2 of Apriori Algorithm
Find the association rules for each frequent itemset containing 2 or more items. Then retain the rules considered strong (i.e., those that meet or exceed the confidence criterion).
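A minimal sketch of this rule-generation step, assuming the frequent itemsets and their supports are already available as a dictionary. The function name `generate_rules`, the support values, and the 0.8 confidence threshold are illustrative assumptions, not part of the course material.

```python
from itertools import combinations

def generate_rules(supports, min_confidence):
    """Build LHS -> RHS rules from frequent itemsets of size >= 2.

    supports maps each frequent itemset (a frozenset) to its support.
    Every subset of a frequent itemset is assumed to be in the dict too.
    """
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset as the antecedent (LHS).
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                rhs = itemset - lhs
                confidence = sup / supports[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, rhs, confidence))
    return rules

# Illustrative supports (assumed values, not real data).
supports = {
    frozenset({"beer"}): 0.75,
    frozenset({"diapers"}): 0.75,
    frozenset({"chips"}): 0.5,
    frozenset({"beer", "diapers"}): 0.5,
    frozenset({"beer", "chips"}): 0.5,
}
strong_rules = generate_rules(supports, min_confidence=0.8)
```

With these assumed numbers only {chips} -> {beer} survives, because its confidence is 0.5 / 0.5 = 1.0 while the other candidate rules sit near 0.67.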
Phase 1 of Apriori Algorithm
First, find all frequent itemsets of size 1 (e.g., {beer} and {diapers}). Then expand these by counting the frequency of all itemsets of size 2 that include the frequent itemsets of size 1. Finally, take the itemsets of size 2 that are frequent and keep expanding until a maximum itemset size defined by the modeler is reached or no further expansion is possible.
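The level-wise expansion described above can be sketched roughly as follows. This is a simplified illustration, not an optimized Apriori implementation; the function name `frequent_itemsets`, the baskets, and the thresholds are all assumptions made for the example.

```python
def frequent_itemsets(transactions, min_support, max_size):
    """Grow frequent itemsets one size at a time, starting from single items."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Size 1: every single item that meets the support threshold.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}

    frequent = {}
    size = 1
    while current and size <= max_size:
        for itemset in current:
            frequent[itemset] = support(itemset)
        # Candidates of the next size: unions of two frequent sets of this size.
        size += 1
        candidates = {a | b for a in current for b in current
                      if len(a | b) == size}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Illustrative baskets (assumed data).
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "chips"},
    {"diapers", "milk"},
]
result = frequent_itemsets(baskets, min_support=0.5, max_size=3)
```

Here {beer}, {diapers}, {chips}, {beer, diapers}, and {beer, chips} come out as frequent; {milk} and {diapers, chips} fall below the assumed 50% threshold, so nothing of size 3 is frequent.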
Association Rule Mining
A flexible, unsupervised technique (like cluster analysis) that lets us determine that event B is likely to occur when event A occurs. It captures co-occurrence, NOT causality. Its particular relevance to marketing and retail product placement is why it is also referred to as market basket analysis.
Natural Hierarchy
An Association Rule Variant for Multi-level Association Rules. It is a hierarchy in which, at every level, there is a one-to-many relationship between members in that level and members in the next lower level. Products are often categorized, which helps modelers discover broader purchasing patterns. As you aggregate products into groups, the support increases.
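A small sketch of why support rises when products are rolled up into categories. The product names, the `category_of` mapping, and the baskets are hypothetical and chosen only to make the effect visible.

```python
# Hypothetical one-to-many hierarchy: each category has many products.
category_of = {
    "skim milk": "dairy",
    "whole milk": "dairy",
    "cheddar": "dairy",
    "rye bread": "bakery",
    "bagel": "bakery",
}

# Illustrative baskets at the product level (assumed data).
baskets = [
    {"skim milk", "rye bread"},
    {"whole milk", "bagel"},
    {"cheddar", "rye bread"},
    {"skim milk"},
]

def support(itemset, baskets):
    """Fraction of baskets containing every item in the itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

# Replace each product with its category to move one level up.
rolled_up = [{category_of[item] for item in b} for b in baskets]

item_level = support({"skim milk", "rye bread"}, baskets)
category_level = support({"dairy", "bakery"}, rolled_up)
```

In this toy data the product-level pair appears in 1 of 4 baskets, while the category-level pair {dairy, bakery} appears in 3 of 4.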
Analyzing Sequential Patterns
An Association Rule Variant: given a set of sequences, find the complete set of frequent sequences. Can be used for customer shopping sequences, such as a consumer buying a computer, then a DVD-ROM, then a camera within 3 months. Helps determine how a series of events commonly occurs over time.
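As a rough illustration, the support of one candidate sequence can be counted by checking whether its steps appear in order in each customer's purchase history. This sketch only tests a single given pattern and ignores the time window; it is not a full sequential pattern miner, and the customer histories are assumed data.

```python
def contains_in_order(sequence, pattern):
    """True if the pattern's steps appear in the sequence in the same order
    (other purchases may occur in between)."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the matching step.
    return all(step in remaining for step in pattern)

# Illustrative purchase histories, oldest purchase first (assumed data).
customers = {
    "c1": ["computer", "DVD-ROM", "camera"],
    "c2": ["computer", "printer", "DVD-ROM", "camera"],
    "c3": ["camera", "computer"],
    "c4": ["computer", "camera"],
}

pattern = ["computer", "DVD-ROM", "camera"]
matches = sum(contains_in_order(seq, pattern) for seq in customers.values())
sequence_support = matches / len(customers)
```

Two of the four assumed customers follow the computer, DVD-ROM, camera path, so the pattern's support here is 0.5.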
Virtual Items (Events)
An Association Rule Variant where sales data are augmented with extra information. These bits of information are included in the dataset just as though they were items purchased. Helps us identify patterns that might be missed in the absence of certain characteristics.
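A minimal sketch of adding virtual items to a basket. The record layout, the `add_virtual_items` helper, and the "daypart" and "payment" labels are hypothetical; any characteristic of the transaction could be encoded the same way.

```python
# Illustrative transaction records (assumed structure and data).
transactions = [
    {"items": {"beer", "diapers"}, "hour": 18, "payment": "card"},
    {"items": {"milk", "bread"}, "hour": 9, "payment": "cash"},
]

def add_virtual_items(txn):
    """Return the basket with extra facts added as if they were purchased items."""
    basket = set(txn["items"])
    # Encode time of day as an item (17:00 to 18:59 counts as evening here).
    basket.add("daypart=evening" if 17 <= txn["hour"] < 19 else "daypart=other")
    # Encode payment method as an item.
    basket.add(f"payment={txn['payment']}")
    return basket

augmented = [add_virtual_items(t) for t in transactions]
```

Once augmented, the baskets can go through the same rule mining as before, so a rule such as {daypart=evening, diapers} -> {beer} becomes discoverable.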
Statistical Independence
Can be revealed when support is used to test for positive vs. negative correlation between two events. For example, if the support for (probability of) tea and coffee together is 0.15, while the product of the individual probabilities is 0.18, then the two are negatively correlated rather than statistically independent. This makes sense because people are often tied to their beverage of choice, and these two options are substitutes for one another, making them competitive.
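The check can be worked through numerically. Under independence, the expected joint probability is the product of the individual probabilities. The individual probabilities below (0.20 and 0.90) are assumptions picked so that the expected value comes out to 0.18; only the 0.15 and 0.18 figures come from the card.

```python
import math

p_tea = 0.20      # assumed individual probability
p_coffee = 0.90   # assumed individual probability
p_both = 0.15     # observed support for tea and coffee together

# Expected joint probability if the two purchases were independent.
expected_if_independent = p_tea * p_coffee   # about 0.18

# Ratio of observed to expected joint probability.
ratio = p_both / expected_if_independent

if math.isclose(ratio, 1.0):
    relationship = "independent"
elif ratio < 1.0:
    relationship = "negative"
else:
    relationship = "positive"
```

Because the observed 0.15 is below the expected 0.18, the ratio is below 1 and the items look like substitutes.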
Association Rules for Market Basket Analysis
LHS (left-hand side) is the "if," or antecedent. RHS (right-hand side) is the "then," or consequent. E.g., IF {milk, bread} THEN {beer} (read as: milk and bread imply beer).
Other Evaluation Criteria
On top of support, confidence, and lift, modelers should consider interestingness. Many rules may be obvious, like {Maternity Ward} --> {Patient is Female}, and thus are uninteresting and should be filtered. Modelers should also consider actionability. When analyzing clothing purchases, it may be best to run rules based on geographic location rather than on sales data alone, because clothing purchases are highly dependent on weather conditions. Just because short-shorts are mad popular, it does not mean Alaska wants to see them in its stores, nor does it have a use for them (unless.... lol)
Transactional Format
One of the more challenging requirements for association rule mining. Instead of attributes arranged in many columns with a unique identifier per row, a given identifier may occur multiple times. Meaningful rules can only be computed for nominal data.
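A short sketch of turning transactional-format rows, where the same identifier repeats once per item, into one basket per transaction. The transaction IDs and items are made up for illustration.

```python
from collections import defaultdict

# Transactional format: one (identifier, item) pair per row (assumed data).
rows = [
    (1001, "milk"),
    (1001, "bread"),
    (1002, "beer"),
    (1002, "diapers"),
    (1002, "milk"),
]

# Group the rows so each identifier maps to the set of items it contains.
baskets = defaultdict(set)
for transaction_id, item in rows:
    baskets[transaction_id].add(item)
```

After grouping, transaction 1001 holds {milk, bread} and transaction 1002 holds {beer, diapers, milk}, which is the shape the mining steps work on.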
Market Basket Analysis
One of the most common applications of association rule mining. This approach looks for patterns among items contained in a shopping cart (transaction). Interesting patterns (like beer and diaper transactions co-occurring between 5 and 7 PM) come to light, allowing miners to help businesses figure out how to exploit them.
Confidence
One of the three measures to consider. This measures the STRENGTH of the rule and is the ratio of transactions where the rule is true to the transactions containing the antecedent (LHS). It measures the percentage of baskets containing the left-hand products that also contain the right-hand products.
Lift
One of the three measures to consider. This compares the strength of the observed rule to what would be expected if there were no RELATIONSHIP between the left- and right-hand sides. It measures how many times larger the confidence is than the expected (baseline) confidence. A value greater than 1 is desirable.
Support
One of the three measures to consider. This measures the RELEVANCE of the rule and refers to the percentage of baskets where the rule was true. (Both LHS and RHS were present)
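The three measures can be computed directly from basket counts. The sketch below follows the definitions on these cards; the `rule_metrics` name and the five baskets are assumptions for illustration, and the function does not guard against an antecedent that never occurs.

```python
def rule_metrics(lhs, rhs, baskets):
    """Return (support, confidence, lift) for the rule lhs -> rhs."""
    lhs, rhs = set(lhs), set(rhs)
    n = len(baskets)
    n_lhs = sum(lhs <= b for b in baskets)            # baskets with the antecedent
    n_rhs = sum(rhs <= b for b in baskets)            # baskets with the consequent
    n_both = sum((lhs | rhs) <= b for b in baskets)   # baskets where the rule is true

    support = n_both / n              # share of all baskets where the rule holds
    confidence = n_both / n_lhs       # share of LHS baskets that also have the RHS
    lift = confidence / (n_rhs / n)   # confidence relative to the RHS baseline rate
    return support, confidence, lift

# Illustrative baskets (assumed data).
baskets = [
    {"milk", "bread", "beer"},
    {"milk", "bread"},
    {"milk", "beer"},
    {"bread"},
    {"beer"},
]
support, confidence, lift = rule_metrics({"milk", "bread"}, {"beer"}, baskets)
```

For {milk, bread} -> {beer} in this toy data, support is 1/5, confidence is 1/2, and lift is below 1 because beer shows up in 3 of 5 baskets overall, which is more often than it does alongside milk and bread.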
Applications of Market Basket Analysis
Patterns from previous purchases allow sellers to make recommendations. Sellers can bundle products that are frequently bought together. Product placement can be enhanced.
Limitations of Market Basket Analysis
Requires a large number of real transactions. The data's accuracy may be compromised if the products do not occur with similar frequency. Market basket analysis can sometimes capture the results of a previous successful marketing campaign rather than the natural tendencies of customers.
Apriori Algorithm
The most common approach for generating association rules, which begins by determining which events co-occur with enough frequency to be of interest. It addresses two sub-problems: it finds all frequent itemsets, then generates and retains strong rules that use items from those itemsets. If a frequent itemset has size n, all of its subsets of size n-1 are also frequent. Example: if {diapers, beer} is frequent, then {diapers} and {beer} are also frequent.
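The subset property is what lets candidates be discarded early: if any subset of size n-1 is not frequent, the size-n candidate cannot be frequent either. A minimal sketch of that check, with a hypothetical helper name and assumed frequent pairs:

```python
from itertools import combinations

def all_subsets_frequent(candidate, frequent):
    """True if every subset one item smaller than the candidate is frequent."""
    k = len(candidate) - 1
    return all(frozenset(subset) in frequent
               for subset in combinations(candidate, k))

# Assumed set of frequent pairs from an earlier pass.
frequent_pairs = {
    frozenset({"beer", "diapers"}),
    frozenset({"beer", "chips"}),
}

# {diapers, chips} is not frequent, so the triple can be skipped
# without counting it in the data.
keep_triple = all_subsets_frequent({"beer", "diapers", "chips"}, frequent_pairs)
```

Here `keep_triple` is False, so {beer, diapers, chips} would be pruned before its support is ever counted.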
Valid Association Rules
The point is, support, confidence, and lift are all needed in order to determine a rule's value to the business. The rule must meet a MINIMUM support and a MINIMUM confidence level, both of which are determined by the modeler. For example, a rule {Germany} --> {France, Belgium} could have 100% confidence that when someone calls Germany, they will also call France AND Belgium. However, support may suggest this event only occurs once in a blue moon. Thus, it may not make sense to invest in developing a strategy to capitalize on this rule. That does not mean it should be ignored, though. Knowing the co-occurring events that led to the purchase or action can help businesses.
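A sketch of screening rules against modeler-chosen thresholds, treating both the support and confidence thresholds as lower bounds, which is the usual convention. All rule statistics and threshold values below are invented for illustration.

```python
# Illustrative rules with made-up statistics (not real results).
rules = [
    {"rule": "{Germany} -> {France, Belgium}", "support": 0.001, "confidence": 1.00},
    {"rule": "{milk, bread} -> {beer}", "support": 0.20, "confidence": 0.50},
    {"rule": "{chips} -> {beer}", "support": 0.25, "confidence": 0.90},
]

MIN_SUPPORT = 0.05      # assumed threshold chosen by the modeler
MIN_CONFIDENCE = 0.60   # assumed threshold chosen by the modeler

# A rule is kept only if it clears both thresholds.
valid = [
    r["rule"]
    for r in rules
    if r["support"] >= MIN_SUPPORT and r["confidence"] >= MIN_CONFIDENCE
]
```

The Germany rule is dropped despite perfect confidence because its support is tiny, and the milk-and-bread rule is dropped for low confidence, leaving only {chips} -> {beer}.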
Support, Confidence, and Lift
These are the three numeric measures of association rules that must be considered to make effective use of a rule.
Caveat about Confidence
When a strong relationship between two products is suggested, a business can exploit this. However, it is often the case that the probability of, say, buying coffee drops once tea has been bought. This means the two products are actually competing with one another. This becomes clearer when "support" is evaluated.