Chapter 6: Frequent Pattern Mining
association rules
Descriptive; discovers links or associations amongst data.
Frequent pattern growth
_______________________ is a method of mining frequent itemsets without candidate generation.
min_sup threshold
a hyperparameter and is highly important
names
another name for identifiers
frequent pattern mining
are there confident (significant) associations between combos? finding patterns, associations, etc.
Eclat method
stores transactional data in a vertical format: for every item, a list of the identifiers of the transactions in which the item occurs is stored
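Example (a minimal sketch, not taken from the chapter): converting a toy data set into the vertical format and reading an itemset's support off the intersection of the transaction-identifier lists. The data set and the helper names `to_vertical` and `tidset` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: transaction identifier -> items in that transaction
TRANSACTIONS = {
    1: {"bread", "milk"},
    2: {"bread", "diapers", "beer", "eggs"},
    3: {"milk", "diapers", "beer", "cola"},
    4: {"bread", "milk", "diapers", "beer"},
    5: {"bread", "milk", "diapers", "cola"},
}

def to_vertical(transactions):
    """Map every item to the set of identifiers of transactions containing it."""
    vertical = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)

def tidset(itemset, vertical):
    """Identifiers of transactions that contain all items of `itemset`."""
    sets = [vertical.get(item, set()) for item in itemset]
    return set.intersection(*sets) if sets else set()

vertical = to_vertical(TRANSACTIONS)
shared = tidset({"diapers", "beer"}, vertical)
support = len(shared) / len(TRANSACTIONS)
print(sorted(shared), support)
```

Once the vertical format is built, support is obtained by intersecting sets, without rescanning the transactions.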
nodes
each piece of the tree
frequent itemsets
form a frequent pattern; an arbitrary combo of items
frequent itemset mining
given the set of all items [I], a transactional dataset [T], and a threshold value min_sup, aims to find the "frequent itemsets": those whose support in T is at least min_sup
compressed dataset
has a reduced size
T
how are transactional datasets denoted?
joining the antecedent and the consequent
how do you find support within the association rules?
confidence(A --> C) = support(A ∪ C) / support(A)
how is the confidence of a rule denoted?
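Example (a minimal sketch under assumed toy data): computing the confidence of a rule as the support of the joined antecedent and consequent divided by the support of the antecedent. The function names and the data set are hypothetical.

```python
# Hypothetical toy data: each set is one transaction (market basket)
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent, transactions):
    """confidence(A --> C) = support(A ∪ C) / support(A).

    Both supports share the same denominator (the number of transactions),
    so dividing the raw counts gives the same ratio.
    Assumes the antecedent occurs in at least one transaction.
    """
    joined = antecedent | consequent
    return count(joined, transactions) / count(antecedent, transactions)

print(confidence({"diapers"}, {"beer"}, TRANSACTIONS))  # 0.75
print(confidence({"beer"}, {"diapers"}, TRANSACTIONS))  # 1.0
```

The two results differ because confidence is not symmetric: swapping antecedent and consequent changes the denominator.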
1- identify all frequent itemsets and their support 2- process items in decreasing order according to their support
how do you make an FP-Tree?
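Example (a rough sketch, not the chapter's own implementation): building an FP-Tree in two passes. The first pass counts single items and keeps the frequent ones; the second inserts each transaction with its items sorted by decreasing support. The `FPNode` class, the absolute threshold `min_count`, the alphabetical tie-break, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

class FPNode:
    """One node of the tree: an item, a counter, and links to parent/children."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Pass 1: support count of every single item; keep only the frequent ones
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    # Pass 2: insert each transaction, items in decreasing order of support
    root = FPNode(None)
    for t in transactions:
        ordered = sorted((i for i in t if i in frequent),
                         key=lambda i: (-frequent[i], i))  # ties: alphabetical
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1  # shared prefixes only increase a counter
            node = child
    return root

# Hypothetical toy data
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

root = build_fp_tree(TRANSACTIONS, min_count=3)
print({item: node.count for item, node in root.children.items()})
```

Transactions that start with the same frequent items share a path, which is where the compression comes from.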
monotonicity theorem
if an itemset is infrequent, then none of its supersets would be frequent
identifiers
indicate support of given itemset
A --> C
how is an association rule denoted?
superset
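an itemset that contains every item of the current itemset plus at least one more; supersets follow the current itemset in the enumeration tree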
Transactional Identifier Set
the list of identifiers of the transactions in which an item or itemset occurs; its size relative to the number of all transactions gives the support
enumeration tree
looks at how everything is connected
Apriori, Eclat method, FP-Growth, and Maximal/Closed Frequent Item Sets
methods to mine frequent itemsets
Apriori Algorithm
most common algorithm for association rule mining. Finds subsets that are common to at least a minimum number of the item sets. Uses a bottom-up approach. Widely used for data mining.
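Example (a simplified sketch of the bottom-up idea, not a tuned implementation): frequent itemsets of size k are joined into candidates of size k+1, and a candidate is dropped before counting if any of its subsets of size k is infrequent (the monotonicity rule). The function name `apriori` and the toy data are assumptions for illustration.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return {frequent itemset: support} found level by level."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1 candidates: every single item
    level = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while level:
        # Count the support of the current candidates and keep the frequent ones
        current = {c: s for c in level if (s := support(c)) >= min_sup}
        frequent.update(current)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates
        k += 1
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-item subset of a candidate must be frequent
        level = {c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent

# Hypothetical toy data
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

frequent = apriori(TRANSACTIONS, min_sup=0.6)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

Note that the data set is scanned again at every level to count candidate supports.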
Maximal Frequent Itemset
none follow it; is maximal if none of its supersets is frequent
Closed Frequent Itemset
none of its supersets have the same support
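Example (a brute-force sketch for small data only): separating maximal and closed itemsets from the full list of frequent itemsets. The helper names, the toy data, and the exhaustive enumeration are assumptions chosen for readability, not an efficient method.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_sup):
    """Brute force: {frequent itemset: number of transactions containing it}."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            hits = sum(1 for t in transactions if itemset <= t)
            if hits / n >= min_sup:
                result[itemset] = hits
    return result

def maximal_itemsets(frequent):
    """Frequent itemsets with no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}

def closed_itemsets(frequent):
    """Frequent itemsets with no proper superset of the same support.

    A superset with the same support is itself frequent, so it is enough
    to look among the frequent itemsets.
    """
    return {s for s, c in frequent.items()
            if not any(s < other and c == oc for other, oc in frequent.items())}

# Hypothetical toy data
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

frequent = frequent_itemsets(TRANSACTIONS, min_sup=0.6)
maximal = maximal_itemsets(frequent)
closed = closed_itemsets(frequent)
print(len(frequent), len(closed), len(maximal))
```

In this data {beer} is frequent but not closed, because {beer, diapers} occurs in exactly the same transactions. Every maximal itemset is also closed.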
lattice
the structure formed by all possible itemsets ordered by the subset relation; the search space explored when mining frequent itemsets
candidate itemsets
possible itemsets
support
ratio between the number of transactions (rows) in which the given itemset is present and the number of all transactions in the data set
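Example (a minimal sketch with made-up data): support as the share of transactions (rows) that contain every item of the itemset. The function name `support` and the toy data set are hypothetical.

```python
# Hypothetical toy data: each set is one transaction (row)
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset, transactions):
    """Transactions containing the whole itemset / all transactions."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

print(support({"bread"}, TRANSACTIONS))            # 0.8
print(support({"diapers", "beer"}, TRANSACTIONS))  # 0.6
```

Adding items to an itemset can only keep its support the same or lower it, which is what the monotonicity rule relies on.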
items
represent concepts that have more complex definitions and domains
confidence of a rule
the ratio of support of the rule to the support of the antecedent of the rule
finding frequent itemsets, finding association rules, and finding frequent sequences
typical tasks in frequent pattern mining
Twitter and Facebook
what are examples of social media sites?
costly and time-consuming if data set can hardly fit into memory and candidate patterns are too long
what are the disadvantages of both the Apriori algorithm and the Eclat method?
single transaction
what is a number of items connected to?
a given item is somehow connected to the given transaction
what do nonempty cells indicate?
a purchase or market basket (transaction)
what do the rows represent?
antecedent of rule
what does A represent for association rules?
consequent of the rule
what does C represent for association rules?
product
what does each item represent?
minimum support
what does min_sup stand for?
compress it into a Frequent Pattern Tree Structure
what does FP-Growth do to the data?
arbitrary itemset
what does the variable I (capital i) denote?
a small number of itemsets, which will probably not give any new information
what happens if you set min_sup too high?
results in a large number of itemsets that are too specific to be considered frequent
what happens if you set min_sup too low?
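Example (a brute-force sketch on made-up data): counting how many itemsets pass the threshold as min_sup changes. The function name `count_frequent`, the thresholds, and the toy data are assumptions; exhaustive enumeration like this is only feasible for a handful of items.

```python
from itertools import combinations

def count_frequent(transactions, min_sup):
    """Number of itemsets whose support is at least min_sup (brute force)."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    total = 0
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            hits = sum(1 for t in transactions if set(combo) <= t)
            if hits / n >= min_sup:
                total += 1
    return total

# Hypothetical toy data
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

THRESHOLDS = [0.2, 0.4, 0.6, 0.8, 1.0]
counts = {m: count_frequent(TRANSACTIONS, m) for m in THRESHOLDS}
for m in THRESHOLDS:
    print(f"min_sup={m}: {counts[m]} frequent itemsets")
```

Raising the threshold can only shrink the result: with this data only three single items survive at 0.8 and nothing survives at 1.0.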
avoids having to generate and test the support of all possible itemsets (provides a shortcut); narrows things down greatly
what is a pro of the monotonicity theorem/rule?
computational runtimes grow exponentially with the number of items
what is a problem that is faced when doing frequent itemset mining?
Amazon
what is an example of a hypermarket?
preferences
what is an example of an item?
Transactional Data
what is the type of data that has nonempty cells called?
very large data sets recorded in hypermarkets and social media sites
what was frequent pattern mining created for?
commercial domains
where is frequent pattern mining commonly applied?
a disadvantage of the Apriori method is that the entire transactional data set needs to be scanned at every step to count the support of candidate itemsets; especially problematic if the data set is very large
why was the Eclat Method created?