Data Mining Test 1 - Chapter 2
What are the two steps in the association rule mining process?
1) Find all frequent itemsets. 2) Generate strong association rules from the frequent itemsets.
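A minimal sketch of step 2, assuming step 1 has already produced the support count of every frequent itemset. The names `generate_rules`, `support_counts`, and `min_conf` are illustrative, not taken from the cards; a rule is kept when its confidence meets `min_conf`.

```python
from itertools import combinations

def generate_rules(support_counts, min_conf):
    """Return (antecedent, consequent, confidence) for each strong rule.

    support_counts: hypothetical dict {frozenset: count} produced by step 1.
    """
    rules = []
    for itemset, count in support_counts.items():
        if len(itemset) < 2:
            continue  # a rule needs a nonempty antecedent and consequent
        for r in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), r):
                antecedent = frozenset(combo)
                # confidence = count(A and B together) / count(A)
                conf = count / support_counts[antecedent]
                if conf >= min_conf:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules

# Made-up support counts for illustration only.
support_counts = {
    frozenset("A"): 4, frozenset("B"): 4, frozenset("C"): 4,
    frozenset("AB"): 3, frozenset("AC"): 3, frozenset("BC"): 3,
}
rules = generate_rules(support_counts, min_conf=0.7)
```

Every 2-itemset here yields two rules with confidence 3/4, so all six pass a 0.7 threshold and none would pass 0.8.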
Frequent Itemset
A set of items that appear frequently together in a transaction data set.
Minimum Confidence Threshold
The minimum confidence a rule must reach to be considered strong. Confidence is the measure of certainty or trustworthiness associated with each discovered pattern.
What input does the Apriori algorithm use?
D, a database of transactions; min_sup, the minimum support count threshold.
What is the output of the Apriori algorithm?
L, frequent itemsets in D.
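The input and output above can be sketched as a function signature. This is a simplified illustration rather than a tuned implementation; the function name `apriori` and the dictionary return type are assumptions made for the example.

```python
from itertools import combinations

def apriori(D, min_sup):
    """Return {frozenset: support count} for every frequent itemset in D."""
    D = [frozenset(t) for t in D]
    # L1: count single items and keep those meeting min_sup.
    counts = {}
    for t in D:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s: n for s, n in counts.items() if n >= min_sup}
    L = dict(Lk)
    k = 2
    while Lk:
        prev = set(Lk)
        # Build k-itemset candidates from pairs of frequent (k-1)-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Drop any candidate that has an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Scan D to count the surviving candidates.
        counts = {c: sum(1 for t in D if c <= t) for c in candidates}
        Lk = {s: n for s, n in counts.items() if n >= min_sup}
        L.update(Lk)
        k += 1
    return L

# Made-up transactions for illustration only.
D = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
L = apriori(D, min_sup=3)
```

With `min_sup=3`, the three single items and the three pairs are frequent, while {A, B, C} appears in only two transactions and is excluded.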
Frequent Pattern
Patterns that appear frequently in a data set.
Confidence
The rule A ⇒ B has confidence c in the transaction set D, where c is the percentage of transactions in D containing A that also contain B. This is the conditional probability P(B|A).
Support
The rule A ⇒ B holds in the transaction set D with support s, where s is the percentage of transactions in D that contain A ∪ B (i.e., the union of sets A and B, or equivalently, both A and B).
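A small sketch of how support and confidence for a rule A ⇒ B could be computed over a list of transactions. The function names and the grocery data are made up for illustration, and the values are returned as fractions rather than percentages.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing A that also contain B."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    count_a = sum(1 for t in transactions if a <= t)
    count_both = sum(1 for t in transactions if both <= t)
    return count_both / count_a if count_a else 0.0

# Hypothetical transaction set D.
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
s = support(D, {"bread", "milk"})        # 2 of 4 transactions
c = confidence(D, {"bread"}, {"milk"})   # 2 of the 3 bread transactions
```

Here bread ⇒ milk has support 0.5 and confidence 2/3, which shows that the two measures use different denominators: all of D for support, only the transactions containing A for confidence.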
Minimum support threshold
The minimum support a pattern must reach to be considered frequent. The support of an association pattern is the percentage of task-relevant data transactions for which the pattern is true.
Apriori Property
All nonempty subsets of a frequent itemset must also be frequent.
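The property follows because any transaction containing an itemset also contains each of its subsets, so a subset's count can never be smaller. A quick check on made-up data (the helper names are illustrative):

```python
from itertools import combinations

def support_count(D, itemset):
    """Number of transactions in D that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in D if itemset <= t)

def nonempty_proper_subsets(itemset):
    """Yield every nonempty proper subset of `itemset` as a frozenset."""
    items = sorted(itemset)
    for r in range(1, len(items)):
        for combo in combinations(items, r):
            yield frozenset(combo)

# Hypothetical transaction set.
D = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"A", "B", "C"}]
X = {"A", "B", "C"}
count_X = support_count(D, X)
subset_counts = {s: support_count(D, s) for s in nonempty_proper_subsets(X)}
# Each subset occurs at least as often as X itself.
holds = all(n >= count_X for n in subset_counts.values())
```

Read the other way, the same fact gives the pruning rule: if any subset falls below the minimum support, the larger itemset cannot be frequent either.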
Ways to improve the efficiency of Apriori?
Hash-based technique (hashing itemsets into corresponding buckets); transaction reduction (reducing the number of transactions scanned in future iterations); partitioning (partitioning the data to find candidate itemsets); sampling (mining on a subset of the given data); dynamic itemset counting (adding candidate itemsets at different points during a scan).
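As one illustration, transaction reduction can be sketched as a filter: a transaction that contains no frequent k-itemset cannot contain a frequent (k+1)-itemset, so it can be skipped in later scans. The function name and data below are hypothetical.

```python
def reduce_transactions(D, frequent_k):
    """Keep only transactions that contain at least one frequent k-itemset."""
    frequent_k = [frozenset(s) for s in frequent_k]
    return [t for t in D if any(s <= t for s in frequent_k)]

# Made-up data: suppose {A, B} is the only frequent 2-itemset.
D = [{"A", "B"}, {"A", "C"}, {"D"}, {"A", "B", "C"}]
reduced = reduce_transactions(D, [{"A", "B"}])
```

Only the first and last transactions survive, so the scan for 3-itemsets touches two transactions instead of four.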
What is Apriori?
A seminal algorithm for mining frequent itemsets for Boolean association rules. It finds frequent itemsets using an iterative level-wise approach based on candidate generation.
What two kinds of actions does Apriori use?
Join and prune.
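A sketch of the two actions, assuming itemsets are stored as sorted tuples so they can be compared position by position. The name `apriori_gen` and the sample itemsets are illustrative.

```python
from itertools import combinations

def apriori_gen(L_prev):
    """Build candidate k-itemsets from frequent (k-1)-itemsets.

    L_prev: set of sorted tuples, all of the same length k-1 (an assumption).
    """
    L_prev = set(L_prev)
    candidates = set()
    for l1 in L_prev:
        for l2 in L_prev:
            # Join: same first k-2 items, and l1's last item sorts before l2's.
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                c = l1 + (l2[-1],)
                # Prune: every (k-1)-subset of c must itself be frequent.
                if all(s in L_prev for s in combinations(c, len(c) - 1)):
                    candidates.add(c)
    return candidates

# Hypothetical frequent 2-itemsets.
L2 = {("A", "B"), ("A", "C"), ("A", "E"), ("B", "C"), ("B", "E")}
C3 = apriori_gen(L2)
```

The join step proposes ABC, ABE, ACE, and BCE; the prune step removes ACE and BCE because their subset CE is not in `L2`, leaving ABC and ABE to be counted against the database.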