Chapter 6: Frequent Pattern Mining


association rules

Descriptive; discovers links or associations amongst data.

Frequent pattern growth

_______________________ is a method of mining frequent itemsets without candidate generation.

min_sup threshold

a hyperparameter; choosing its value well is highly important

names

another name for identifiers

frequent pattern mining

asks whether there are confident (significant) associations between combinations of items; finding patterns, associations, etc.

Eclat method

stores transactional data in a vertical format: for every item, it keeps a list of the identifiers of the transactions in which the item occurs
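A minimal sketch of the vertical format, assuming a small made-up dataset (the item names and the helper `to_vertical` are illustrative, not from the source). It shows how each item maps to its transaction identifiers and how intersecting two of those sets gives the support of the combined itemset:

```python
from collections import defaultdict

# Hypothetical toy data: transaction id -> set of items.
transactions = {
    1: {"bread", "milk"},
    2: {"bread", "butter"},
    3: {"bread", "milk", "butter"},
    4: {"milk"},
}

def to_vertical(transactions):
    """Map each item to the set of transaction ids that contain it."""
    tidsets = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)

tidsets = to_vertical(transactions)

# The transactions containing both items are the intersection of the two tidsets.
bread_milk_tids = tidsets["bread"] & tidsets["milk"]
support = len(bread_milk_tids) / len(transactions)
```

Because support comes from set intersections, the original transactions do not have to be rescanned for each new candidate.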

nodes

each piece of the tree

frequent itemsets

arbitrary combinations of items that form a frequent pattern (their support is at least min_sup)

frequent itemset mining

given the set of all items [I], a transactional dataset [T], and a threshold value min_sup, aims to find the "frequent itemsets": those whose support in T is at least min_sup

compressed dataset

has a reduced size

T

how are transactional datasets denoted?

joining the antecedent and the consequent

how do you find support within the association rules?

confidence(A --> C) = support(A ∪ C) / support(A)

how is the confidence of a rule computed?
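As a rough illustration of the formula, the sketch below computes support and confidence on a made-up list of transactions. The data and the function names `support` and `confidence` are assumptions chosen for the example:

```python
# Hypothetical toy data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(A ∪ C) / support(A); undefined if A never occurs."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)

conf_bread_milk = confidence({"bread"}, {"milk"}, transactions)
```

Here bread appears in 3 of 4 transactions and bread with milk in 2 of 4, so the confidence of bread --> milk is 2/3.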

1- identify all frequent items and their support; 2- process the items in decreasing order of support

how do you build an FP-tree?
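The two steps can be sketched roughly as follows. This is a simplified illustration, not a full FP-Growth implementation: it omits the header table and node links used for mining, and the class `FPNode`, the function `build_fp_tree`, and the alphabetical tie-breaking rule are assumptions made for the example.

```python
from collections import Counter

class FPNode:
    """One node of the tree: an item, a count, and child nodes keyed by item."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Step 1: count every item and keep only the frequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    root = FPNode(None)
    # Step 2: insert each transaction with its frequent items sorted by
    # decreasing support (ties broken alphabetically, an arbitrary choice).
    for t in transactions:
        ordered = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent

# Hypothetical toy data.
root, frequent = build_fp_tree(
    [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]],
    min_count=2,
)
```

Transactions that share a prefix of frequent items share a path in the tree, which is where the compression comes from.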

monotonicity theorem

if an itemset is infrequent, then none of its supersets would be frequent
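One way to see how this rule is used: a candidate of size k can be discarded without counting its support if any of its subsets of size k-1 is not already known to be frequent. The helper name `can_be_frequent` and the data below are hypothetical.

```python
from itertools import combinations

def can_be_frequent(candidate, frequent_smaller):
    """Return False if any (k-1)-subset of the candidate is not frequent."""
    k = len(candidate)
    return all(
        frozenset(sub) in frequent_smaller
        for sub in combinations(candidate, k - 1)
    )

# Hypothetical frequent 2-itemsets found in an earlier pass.
frequent_pairs = {
    frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]
}

keep_abc = can_be_frequent(("a", "b", "c"), frequent_pairs)
keep_abd = can_be_frequent(("a", "b", "d"), frequent_pairs)
```

The candidate {a, b, d} is pruned because its subset {a, d} is not frequent; {a, b, c} survives and would still need its support counted.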

identifiers

indicate support of given itemset

A --> C

how is an association rule denoted?

superset

an itemset that contains all items of the current itemset (the itemsets that follow it in the enumeration tree)

Transactional Identifier Set

the list of identifiers of the transactions that contain a given item or itemset; its size gives the support count

enumeration tree

looks at how everything is connected

Apriori, Eclat method, FP-Growth, and Maximal/Closed Frequent Item Sets

methods to mine frequent itemsets

Apriori Algorithm

most common algorithm for association rule mining. Finds itemsets that occur in at least a minimum number of transactions. Uses a bottom-up approach. Widely used for data mining.
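A compact sketch of the bottom-up idea, assuming support is measured as a fraction of transactions. The function `apriori` below is an illustrative implementation written for this example, not a library call, and it rescans the data for every candidate, which is the cost the algorithm is known for.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return a dict mapping each frequent itemset (frozenset) to its support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    level = {}
    for i in items:
        s = support(frozenset([i]))
        if s >= min_sup:
            level[frozenset([i])] = s
    frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Join frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count support of the surviving candidates.
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_sup:
                level[c] = s
        frequent.update(level)
        k += 1
    return frequent

# Hypothetical toy data.
result = apriori(
    [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}],
    min_sup=0.5,
)
```

With this data, three single items and two pairs are frequent; the only triple is pruned because {milk, butter} is infrequent.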

Maximal Frequent Itemset

a frequent itemset that no frequent itemset follows: it is maximal if none of its supersets is frequent

Closed Frequent Itemset

a frequent itemset for which none of its proper supersets has the same support
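To make the difference between maximal and closed concrete, here is a small sketch that classifies a given collection of frequent itemsets. It assumes the input already contains all frequent itemsets with their support counts; the function name `maximal_and_closed` and the data are made up for illustration.

```python
def maximal_and_closed(frequent):
    """frequent: dict mapping frozenset -> support count (all frequent itemsets)."""
    maximal, closed = set(), set()
    for itemset, count in frequent.items():
        # Proper supersets that are themselves frequent.
        supersets = [s for s in frequent if itemset < s]
        if not supersets:
            maximal.add(itemset)
        # A superset with the same support would itself be frequent,
        # so checking only frequent supersets is enough.
        if all(frequent[s] != count for s in supersets):
            closed.add(itemset)
    return maximal, closed

# Hypothetical frequent itemsets with support counts.
frequent = {
    frozenset({"bread"}): 3,
    frozenset({"milk"}): 3,
    frozenset({"butter"}): 2,
    frozenset({"bread", "milk"}): 2,
    frozenset({"bread", "butter"}): 2,
}
maximal, closed = maximal_and_closed(frequent)
```

Every maximal itemset is also closed, but not the other way around: {bread} is closed here without being maximal, while {butter} is neither because {bread, butter} has the same support.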

lattice

the structure formed by all possible itemsets ordered by the subset relation; the search space that frequent itemset mining explores

candidate itemsets

possible itemsets

support

ratio between the number of transactions (rows) in which the given itemset is present and the number of all transactions in the data set
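A short sketch of this ratio on a table where rows are transactions and a nonempty (here, 1) cell means the item is present. The matrix, the item names, and the function `support` are assumptions made for the example.

```python
# Hypothetical transactional data: one row per transaction, one column per item.
items = ["bread", "milk", "butter"]
matrix = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
]

def support(itemset, matrix, items):
    """Rows containing every item of the itemset, divided by the number of rows."""
    cols = [items.index(i) for i in itemset]
    hits = sum(1 for row in matrix if all(row[c] for c in cols))
    return hits / len(matrix)

sup_bread_milk = support(["bread", "milk"], matrix, items)
sup_butter = support(["butter"], matrix, items)
```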

items

represent concepts that have more complex definitions and domains

confidence of a rule

the ratio of support of the rule to the support of the antecedent of the rule

finding frequent itemsets, finding association rules, and finding frequent sequences

what are the typical tasks in frequent pattern mining?

Twitter and Facebook

what are examples of social media sites?

costly and time-consuming if the data set can hardly fit into memory and the candidate patterns are too long

what is a disadvantage shared by the Apriori algorithm and the Eclat method?

single transaction

what is a number of items connected to?

a given item is somehow connected to the given transaction

what do nonempty cells indicate?

a purchase or market basket (transaction)

what do the rows represent?

antecedent of rule

what does A represent for association rules?

consequent of the rule

what does C represent for association rules?

product

what does each item represent?

minimum support

what does min_sup stand for?

compress it into a Frequent Pattern Tree Structure

what does FP-Growth do to the data?

arbitrary itemset

what does the variable I (capital i) denote?

a small number of itemsets, which will probably not give any new information

what happens if you set min_sup too high?

results in a large number of itemsets, many of them too specific to be useful

what happens if you set min_sup too low?

avoids having to generate and test the support of all possible itemsets (provides a shortcut); narrows the search down greatly

what is an advantage of the monotonicity theorem/rule?

computational runtimes grow exponentially with the number of items

what is a problem that is faced when doing frequent itemset mining?
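The growth is easy to check: n items give 2^n - 1 non-empty itemsets. The sketch below counts them and, for a tiny case, enumerates them; the function names are illustrative.

```python
from itertools import combinations

def count_candidate_itemsets(n_items):
    """Number of non-empty itemsets that can be formed from n_items items."""
    return 2 ** n_items - 1

def enumerate_itemsets(items):
    """Yield every non-empty itemset, smallest first."""
    for k in range(1, len(items) + 1):
        yield from combinations(items, k)

all_itemsets = list(enumerate_itemsets(["a", "b", "c"]))
```

Three items give 7 itemsets, but 20 items already give more than a million, which is why pruning matters.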

Amazon

what is an example of a hypermarket?

preferences

what is an example of an item?

Transactional Data

what is the type of data that has nonempty cells called?

very large data sets recorded in hypermarkets and social media sites

what was frequent pattern mining created for?

commercial domains

where is frequent pattern mining frequently applied?

a disadvantage of the Apriori method is that the entire transactional data set needs to be scanned at every step to count the support of candidate itemsets; this is especially problematic if the data set is very large

why was the Eclat Method created?

