Ch 14 - Association Rules and Collaborative Filtering

Apriori Algorithm

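- The Apriori algorithm is a fast way of finding frequent item sets. It builds each level only from item sets that were frequent at the previous level, since every subset of a frequent item set must itself be frequent.
- Generating frequent item sets for k products:
  - The user sets a minimum support criterion.
  - Next, generate the list of one-item sets that meet the support criterion.
  - Use the list of one-item sets to generate the list of two-item sets that meet the support criterion.
  - Use the list of two-item sets to generate the list of three-item sets that meet the support criterion.
  - Continue up through k-item sets.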
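
The level-wise procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a production implementation: the function name `apriori`, the choice to express `min_support` as an absolute transaction count, and the `baskets` data are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every frequent item set.

    min_support is an absolute number of transactions (an assumption made for
    this sketch; a fraction of all transactions would work equally well).
    """
    transactions = [frozenset(t) for t in transactions]

    def keep_frequent(candidates):
        # Count the transactions containing each candidate item set and
        # keep only the candidates that meet the support criterion.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: one-item sets that meet the support criterion.
    items = {item for t in transactions for item in t}
    frequent = keep_frequent({frozenset([i]) for i in items})
    result = dict(frequent)

    k = 2
    while frequent:
        # Join pairs of frequent (k-1)-item sets to form k-item candidates ...
        prev = list(frequent)
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # ... and prune any candidate that has an infrequent (k-1)-item subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        frequent = keep_frequent(candidates)
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy data: each set is one transaction.
baskets = [
    {"red", "white", "green"},
    {"red", "white"},
    {"white", "blue"},
    {"red", "white", "blue"},
    {"red", "blue"},
]

for itemset, count in apriori(baskets, min_support=2).items():
    print(sorted(itemset), count)
```

With this data and a minimum support of 2 transactions, the frequent item sets are {red}, {white}, {blue} and the three pairs among them; {green} and {red, white, blue} each appear in only one transaction and are dropped.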

Association Rules/Market Basket Analysis

- Association rules (also called affinity analysis or market basket analysis) produce rules about associations between items, derived from a database of transactions.
- Widely used in recommender systems.
- The most popular method is the Apriori algorithm.
- To reduce computation, only "frequent" item sets are considered (frequency is measured by support).
- Rule performance is measured by confidence and lift.
- The method can produce a profusion of rules; review is required to identify useful rules and to reduce redundancy.

Process of rule selection

- Generate all rules that meet the specified support and confidence:
  - Find frequent item sets (those with sufficient support; see above).
  - From these item sets, generate rules with sufficient confidence.
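
The two-step selection process can be sketched as below. To keep the example self-contained, step 1 uses brute-force enumeration of all item sets instead of Apriori, which is only practical for tiny data. The function names `frequent_itemsets` and `generate_rules`, the thresholds, and the `baskets` data are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: {frozenset: count} for item sets meeting min_support.

    Brute-force enumeration (not Apriori), acceptable only for tiny data.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            n = sum(1 for t in transactions if s <= t)
            if n >= min_support:
                result[s] = n
    return result


def generate_rules(freq, min_confidence):
    """Step 2: split each frequent item set into antecedent -> consequent."""
    rules = []
    for itemset, n_both in freq.items():
        for r in range(1, len(itemset)):  # no splits for one-item sets
            for ante in combinations(sorted(itemset), r):
                antecedent = frozenset(ante)
                consequent = itemset - antecedent  # disjoint by construction
                # Every subset of a frequent item set is frequent, so the
                # antecedent's count is guaranteed to be in freq.
                confidence = n_both / freq[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, n_both, confidence))
    return rules


# Hypothetical toy data: each set is one transaction.
baskets = [
    {"red", "white", "green"},
    {"red", "white"},
    {"white", "blue"},
    {"red", "white", "blue"},
    {"red", "blue"},
]

freq = frequent_itemsets(baskets, min_support=2)
for ante, cons, support, conf in generate_rules(freq, min_confidence=0.6):
    print(f"IF {sorted(ante)} THEN {sorted(cons)}: support={support}, confidence={conf:.2f}")
```

With these thresholds, four rules survive: "if red, then white" and "if white, then red" (confidence 0.75), and "if blue, then red" and "if blue, then white" (about 0.67). "If red, then blue" and "if white, then blue" (0.50) are discarded even though their item sets are frequent.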

Caution: the role of chance

- Random data can generate apparently interesting association rules.
- The more rules you produce, the greater this danger.
- Rules based on large numbers of records are less subject to this danger.
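
A small simulation can illustrate the point about record counts. It generates transactions in which every item is included independently at random, so no real association exists and any rule with lift well above 1 is spurious; it then counts how many one-item rules reach a lift threshold. The function name `spurious_rule_count`, the inclusion probability, and the threshold of 1.5 are hypothetical choices, and the exact counts depend on the random seed.

```python
import random
from itertools import combinations


def spurious_rule_count(n_transactions, n_items, p, lift_threshold, seed=0):
    """Count item pairs whose rule lift reaches lift_threshold in random data.

    Each item appears in each transaction independently with probability p,
    so no real association exists between any two items.
    """
    rng = random.Random(seed)
    data = [
        {i for i in range(n_items) if rng.random() < p}
        for _ in range(n_transactions)
    ]
    count = 0
    for a, b in combinations(range(n_items), 2):
        n_a = sum(1 for t in data if a in t)
        n_b = sum(1 for t in data if b in t)
        n_ab = sum(1 for t in data if a in t and b in t)
        if n_a == 0 or n_b == 0:
            continue
        # Lift of "if a, then b" = confidence / benchmark confidence.
        # For one-item rules this equals the lift of "if b, then a".
        lift = (n_ab / n_a) / (n_b / n_transactions)
        if lift >= lift_threshold:
            count += 1
    return count


# Same random process: few records vs. many records.
print(spurious_rule_count(50, 20, 0.2, 1.5))
print(spurious_rule_count(5000, 20, 0.2, 1.5))
```

With only 50 records, chance co-occurrences typically push a sizable share of the 190 item pairs past the threshold; with 5,000 records the estimated lifts cluster near the true value of 1 and the count drops to essentially zero.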

Generating Rules

Terms:
- "IF" part = antecedent
- "THEN" part = consequent
- "Item set" = the items (e.g., products) comprising the antecedent or consequent
- Antecedent and consequent are disjoint (i.e., they have no items in common)

Many rules are possible:
- For example, Transaction 1 supports several rules, such as
  - "If red, then white" ("If a red faceplate is purchased, then so is a white one")
  - "If red and white, then green"
- Only frequent item sets are considered; the criterion for "frequent" is support.

Support:
- Support for an item set = the number (or percentage) of transactions that include the item set
  - Example: support for the item set {red, white} is 4 out of 10 transactions, or 40%
- Support for a rule = the number (or percentage) of transactions that include both the antecedent and the consequent
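
Both support definitions can be expressed in a few lines. This is a sketch, assuming transactions are stored as sets of item names; the function names `itemset_support` and `rule_support` and the `baskets` data are hypothetical (this is not the 10-transaction data behind the 40% example).

```python
def itemset_support(transactions, itemset, as_fraction=False):
    """Number (or fraction) of transactions containing every item in itemset."""
    itemset = frozenset(itemset)
    n = sum(1 for t in transactions if itemset <= set(t))
    return n / len(transactions) if as_fraction else n


def rule_support(transactions, antecedent, consequent, as_fraction=False):
    """Support of "IF antecedent THEN consequent"."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    if antecedent & consequent:
        raise ValueError("antecedent and consequent must be disjoint")
    # A transaction supports the rule only if it contains both sides.
    return itemset_support(transactions, antecedent | consequent, as_fraction)


# Hypothetical toy data: each set is one transaction.
baskets = [
    {"red", "white", "green"},
    {"red", "white"},
    {"white", "blue"},
    {"red", "white", "blue"},
    {"red", "blue"},
]

print(itemset_support(baskets, {"red", "white"}))                    # 3
print(itemset_support(baskets, {"red", "white"}, as_fraction=True))  # 0.6
print(rule_support(baskets, {"red", "white"}, {"green"}))            # 1
```

Note that the support of a rule is simply the support of the union of its antecedent and consequent, which is why the disjointness check matters.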

Lift - measure of rule performance

Lift = confidence / benchmark confidence
- Lift > 1 indicates a rule that is useful in finding consequent item sets (i.e., more useful than just selecting transactions at random).
- The lift ratio shows how effective the rule is in finding consequents (useful if finding particular consequents is important).
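
A minimal sketch of the lift calculation, assuming transactions are sets of item names. The function name `lift` and the `baskets` data are hypothetical; the function computes confidence and benchmark confidence internally so that it stands on its own.

```python
def lift(transactions, antecedent, consequent):
    """Lift of "IF antecedent THEN consequent" = confidence / benchmark confidence."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_cons = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if n_ante == 0 or n_cons == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = n_both / n_ante
    benchmark = n_cons / len(transactions)
    return confidence / benchmark


# Hypothetical toy data: each set is one transaction.
baskets = [
    {"red", "white", "green"},
    {"red", "white"},
    {"white", "blue"},
    {"red", "white", "blue"},
    {"red", "blue"},
]

print(lift(baskets, {"green"}, {"red"}))  # 1.25
print(lift(baskets, {"red"}, {"white"}))  # about 0.94, i.e. below 1
```

Here "if green, then red" has lift 1.25: knowing a transaction contains green raises the chance of finding red above the 80% baseline. "If red, then white" has a lift just below 1, so it does worse than picking transactions at random, even though its confidence (0.75) looks high on its own.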

Confidence - measure of rule performance

Confidence = (no. of transactions containing both the antecedent and consequent item sets) / (no. of transactions containing the antecedent item set)
- Confidence shows the rate at which consequents will be found (useful in learning costs of promotion).
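
The confidence formula translates directly into code. This sketch assumes transactions are sets of item names; the function name `confidence` and the `baskets` data are hypothetical. The example also shows that confidence is not symmetric: swapping antecedent and consequent changes the denominator.

```python
def confidence(transactions, antecedent, consequent):
    """Share of transactions with the antecedent that also contain the consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if n_ante == 0:
        raise ValueError("antecedent never occurs, so confidence is undefined")
    return n_both / n_ante


# Hypothetical toy data: each set is one transaction.
baskets = [
    {"red", "white", "green"},
    {"red", "white"},
    {"white", "blue"},
    {"red", "white", "blue"},
    {"red", "blue"},
]

print(confidence(baskets, {"green"}, {"red"}))  # 1.0
print(confidence(baskets, {"red"}, {"green"}))  # 0.25
```

Both rules are supported by the same single transaction, but "if green, then red" divides by the one green transaction while "if red, then green" divides by the four red ones.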

Benchmark confidence - measure of rule performance

Benchmark confidence = (no. of transactions containing the consequent item set) / (total no. of transactions in the database)
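
Benchmark confidence ignores the antecedent entirely: it is the rate at which the consequent would be found if transactions were picked at random. A minimal sketch, assuming transactions are sets of item names; the function name `benchmark_confidence` and the `baskets` data are hypothetical.

```python
def benchmark_confidence(transactions, consequent):
    """Share of all transactions that contain the consequent item set."""
    consequent = frozenset(consequent)
    n_cons = sum(1 for t in transactions if consequent <= t)
    return n_cons / len(transactions)


# Hypothetical toy data: each set is one transaction.
baskets = [
    {"red", "white", "green"},
    {"red", "white"},
    {"white", "blue"},
    {"red", "white", "blue"},
    {"red", "blue"},
]

print(benchmark_confidence(baskets, {"red"}))            # 0.8
print(benchmark_confidence(baskets, {"white", "blue"}))  # 0.4
```

Dividing a rule's confidence by this value gives its lift, so a common consequent (such as red here, at 0.8) sets a high bar that a rule must beat to achieve a lift above 1.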

