Association rule mining

FP-growth Algorithm

1) Builds a condensed representation of the database as an FP-tree. 2) Then uses a recursive divide-and-conquer approach to mine frequent itemsets.

2 main bottlenecks of the Apriori algorithm

1) Can generate very large candidate sets. 2) Must scan the DB multiple times to find support.

2 stages of the Apriori algorithm

1) Candidate generation. 2) Candidate test.

4 factors affecting Apriori complexity

1) Choice of minimum support threshold. A lower threshold means more frequent itemsets to consider. 2) Dimensionality. More dimensions means more space needed to store support counts. 3) DB size. Apriori scans the DB multiple times. 4) Maximum transaction width. Wider transactions may increase the max length of frequent itemsets.

2 principles of the FP-growth algorithm

1) Compress the database into a Frequent-Pattern tree (FP-tree). Avoids costly repeated database scans. 2) Use a divide-and-conquer method that breaks the mining task into smaller sub-tasks. Avoids the candidate generation issues of Apriori.

FP-Growth divide and conquer steps

1) For each frequent item, construct its "conditional pattern base", and then its conditional FP-tree. 2) Repeat the process on each newly created conditional FP-tree until the resulting FP-tree is empty, or it contains only one path.
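The recursion above can be sketched in a few lines. This is a simplified illustration, not a full FP-growth implementation: it assumes each conditional pattern base is kept as a plain list of (prefix, count) pairs instead of being compressed into a conditional FP-tree, and it omits the single-path shortcut. The names `fp_growth`, `pattern_base`, and `min_count` are made up for this example.

```python
from collections import Counter

def fp_growth(pattern_base, min_count, suffix=frozenset()):
    """Yield (itemset, count) pairs. pattern_base is a list of (items, count)."""
    # Count each item, weighting every path by how many transactions it stands for.
    counts = Counter()
    for items, n in pattern_base:
        for item in items:
            counts[item] += n
    # Frequent items in descending frequency order (ties broken alphabetically).
    order = sorted((i for i, c in counts.items() if c >= min_count),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(order)}
    # Drop infrequent items and put every path into that order.
    paths = [(sorted((i for i in items if i in rank), key=rank.get), n)
             for items, n in pattern_base]
    for item in reversed(order):  # start with the least frequent item
        pattern = suffix | {item}
        yield pattern, counts[item]
        # Conditional pattern base: the prefix that precedes `item` on each path.
        conditional = [(tuple(p[:p.index(item)]), n)
                       for p, n in paths if item in p and p.index(item) > 0]
        # Recurse; an empty conditional base yields nothing and ends the branch.
        yield from fp_growth(conditional, min_count, pattern)

transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
patterns = dict(fp_growth([(tuple(t), 1) for t in transactions], min_count=3))
```

On this toy data with a minimum count of 3, `patterns` holds four frequent single items and four frequent pairs; for instance, the conditional pattern base of "beer" leaves only "diapers" frequent, which produces the pattern {beer, diapers}.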

Advantages of FP-Tree structure

1) Scan the DB twice, and only twice. 2) Completeness: Tree contains all information related to mining frequent patterns. 3) Compactness: Tree height is bounded by the maximum number of items in a transaction.

FP-Tree construction steps

1) Scan the transaction DB and find the frequent single-item patterns. Order them in a list "L" in descending frequency order. (In other words, build a histogram of single items from the DB.) 2) For each transaction, order its frequent items according to the order in "L" (do not include infrequent single items). 3) Scan the DB a second time, and construct the FP-tree by inserting each "frequency-ordered transaction" into it.
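The construction steps above might look like the following minimal sketch. The class `FPNode`, the function `build_fp_tree`, and the alphabetical tie-break are assumptions made for this example; the header table is kept as a plain Python list of nodes per item, standing in for the linked node-links of a real FP-tree.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode

def build_fp_tree(transactions, min_count):
    # Scan 1: histogram of single items, then the ordered list "L".
    counts = Counter(item for t in transactions for item in t)
    order = sorted((i for i, c in counts.items() if c >= min_count),
                   key=lambda i: (-counts[i], i))  # ties broken alphabetically
    rank = {item: r for r, item in enumerate(order)}
    root = FPNode(None)
    header = {item: [] for item in order}  # item -> all nodes carrying that item
    # Scan 2: insert each transaction with its frequent items ordered as in "L".
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1  # one more transaction shares this sub-path
            node = child
    return root, header, order

transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
root, header, order = build_fp_tree(transactions, min_count=3)
```

With a minimum count of 3, "eggs" and "cola" are dropped, and four of the five transactions share the "bread" node directly under the root. Summing the counts of the nodes listed under an item in `header` gives back that item's support count.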

Apriori Algorithm

1) Use the frequent (k-1)-itemsets to generate potential candidates for frequent k-itemsets. 2) Scan the database and find the support for each candidate k-itemset. 3) Determine the frequent k-itemsets by comparing the support of each candidate with the min support. Repeat with k+1 until no frequent itemsets are found.
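A minimal sketch of this loop follows, assuming support is given as an absolute count (`min_count`) and transactions are Python sets. The function name `apriori` and the simple pairwise join used for candidate generation are choices made for this example, not a reference implementation.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset: support count} for every frequent itemset."""
    # Frequent 1-itemsets come from a first scan of the DB.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets and
        # prune any candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Candidate test: one DB scan to count the support of each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result

transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(transactions, min_count=3)
```

On this toy data the only 3-itemset that survives pruning is {bread, milk, diapers}, and it fails the candidate test with a count of 2, so the loop stops after k = 3. Each pass of the `while` loop costs one full DB scan, which is the second bottleneck listed earlier.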

Maximal Frequent Itemset

A frequent itemset for which none of its immediate supersets is frequent.

Closed Itemset

An itemset for which none of its immediate supersets has the same support as the itemset.
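The two definitions can be checked by brute force on a small example. The sketch below assumes a tiny dataset where enumerating every itemset is affordable; `all_frequent` and `closed_and_maximal` are hypothetical helper names, and only frequent itemsets are classified.

```python
from itertools import combinations

def all_frequent(transactions, min_count):
    """Brute-force {frozenset: support count}; only sensible for tiny data."""
    items = sorted(set().union(*transactions))
    freq = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            c = sum(1 for t in transactions if s <= t)
            if c >= min_count:
                freq[s] = c
    return freq

def closed_and_maximal(freq):
    closed, maximal = set(), set()
    for s, c in freq.items():
        # Frequent immediate supersets. Infrequent supersets have a support
        # below min_count, so they can never match the support of s.
        supers = [t for t in freq if len(t) == len(s) + 1 and s < t]
        if not supers:
            maximal.add(s)  # no immediate superset is frequent
        if all(freq[t] < c for t in supers):
            closed.add(s)   # no immediate superset has the same support
    return closed, maximal

transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
freq = all_frequent(transactions, min_count=3)
closed, maximal = closed_and_maximal(freq)
```

Here {beer} is frequent but not closed, because {beer, diapers} has the same support count of 3. Every maximal frequent itemset is also closed, so `maximal` is a subset of `closed`.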

FP-growth stages

FP-Tree -> conditional pattern bases -> conditional FP-Tree -> frequent patterns

Confidence

Fraction of transactions containing both the antecedent and consequent itemsets out of the total number of transactions containing the antecedent itemset. For rule: X -> Y, confidence is |Transactions with X and Y| / |Transactions with X|

Support

Fraction of transactions containing the itemset. For rule: X -> Y, support is |Transactions with X and Y| / total number of transactions.
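As a small worked example, support and confidence can be computed directly from a list of transactions. The helper names `count`, `support`, and `confidence` are made up for this sketch, and returning 0.0 when the antecedent never occurs is an assumption, not part of the definition.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def support(itemset, transactions):
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    covered = count(antecedent, transactions)
    if covered == 0:
        return 0.0  # assumption: treat an unseen antecedent as confidence 0
    both = count(set(antecedent) | set(consequent), transactions)
    return both / covered

transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
# Rule: {diapers} -> {beer}
rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
```

For the rule {diapers} -> {beer}, three of the five transactions contain both items (support 3/5), and three of the four transactions containing diapers also contain beer (confidence 3/4).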

Why do FP-Trees insert items of descending frequency order?

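Inserting items in descending frequency order places the most frequent items near the root, so more transactions share prefixes and the tree stays compact. If items were inserted in ascending frequency order, there would be many more branches at each node. (Answer in FP growth ppt, page 5)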

FP-Growth divide and conquer strategy

Recursively grow frequent patterns using the FP-tree: looking for shorter patterns recursively, then concatenating the suffix.

Coverage

The number of rules that a transaction is part of. (Diagram of coverage can be found in my written notes, 10/1)

FP-Tree

Tree built by iterating over all transactions, adding a branch for each itemset, and incrementing the support count at each node for transactions that share the same sub-path. A linked list of the nodes for each item is also built alongside the tree and maintained in a "Header table".

