Frequent Data Mining

What is lift?

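lift(x to y) = supp(x and y) / (supp(x) * supp(y)), where each support is a fraction of all transactions. In raw counts this is (N * # of transactions with x and y) / ((# of transactions with x) * (# of transactions with y)), where N is the total number of transactions. A lift above 1 suggests a positive correlation, below 1 a negative one, and exactly 1 independence. We compare the lift of (x to y) with the lift of (x to not y).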
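A minimal sketch of the lift computation, assuming transactions are Python sets of item names. The function name `lift`, the `negate_y` flag, and the toy data are made up for illustration.

```python
def lift(transactions, x, y, negate_y=False):
    """Lift of the rule x -> y (or x -> not y when negate_y is True)."""
    n = len(transactions)
    has_x = [x in t for t in transactions]
    # (y in t) != negate_y flips the membership test when negate_y is True
    has_y = [(y in t) != negate_y for t in transactions]
    n_x = sum(has_x)
    n_y = sum(has_y)
    n_xy = sum(a and b for a, b in zip(has_x, has_y))
    # supports are fractions of all n transactions
    return (n_xy / n) / ((n_x / n) * (n_y / n))

# Hypothetical toy data
transactions = [
    {"basketball", "cereal"},
    {"basketball", "cereal"},
    {"basketball"},
    {"cereal"},
    {"cereal"},
]
lift_pos = lift(transactions, "basketball", "cereal")
lift_neg = lift(transactions, "basketball", "cereal", negate_y=True)
```

On this toy data the lift of (basketball to cereal) is below 1 and the lift of (basketball to not cereal) is above 1, so the two items look negatively correlated.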

What is max-pattern?

An itemset X is a max-pattern if X is frequent and there exists no frequent super-pattern Y such that X is a proper subset of Y.

What is a closed pattern?

An itemset X is a closed pattern if X is frequent and there exists no super-pattern Y, with X a proper subset of Y, that has the same support as X.
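To see how max-patterns and closed patterns differ, here is a small sketch that finds both by brute force. The helper names and the toy transactions are assumptions for illustration, and the exhaustive counting is only practical for tiny examples.

```python
from itertools import combinations

def frequent_itemsets(transactions, minsup):
    """Brute-force support counting over every possible itemset."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            count = sum(1 for t in transactions if s <= t)
            if count >= minsup:
                result[s] = count
    return result

def closed_and_max(freq):
    """Split frequent itemsets into closed patterns and max-patterns."""
    closed, maximal = set(), set()
    for x, supp in freq.items():
        # proper frequent supersets of x
        supersets = [y for y in freq if x < y]
        if not supersets:
            maximal.add(x)  # no frequent super-pattern at all
        if all(freq[y] < supp for y in supersets):
            closed.add(x)   # no super-pattern with the same support
    return closed, maximal

# Hypothetical toy data
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a"}]
freq = frequent_itemsets(transactions, minsup=2)
closed, maximal = closed_and_max(freq)
```

Here the closed patterns are {a}, {a, b} and {a, b, c}, while the only max-pattern is {a, b, c}. Every max-pattern is closed, but not the other way around.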

What are the concepts of constraint based frequent pattern mining?

Anti-monotonic: if an itemset violates constraint c, so does every superset, so further mining of that itemset can be terminated.
Monotonic: if an itemset satisfies c, so does every superset, so there is no need to check c again in its further mining.
Succinct: we can explicitly and precisely determine whether any itemset satisfies the constraint by examining whether it contains some specific items.
Convertible: c is neither monotonic, anti-monotonic, nor succinct, but it can be converted into an anti-monotonic or monotonic constraint if the items in the transaction are properly ordered.
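As an illustration of anti-monotonic pruning, the sketch below uses the constraint "sum of prices <= budget". The prices, item names, and function names are hypothetical, and prices are assumed non-negative so that adding an item can never lower the sum. It only shows the pruning step and does no support counting.

```python
# Hypothetical item prices; assumed non-negative so the sum can only grow.
PRICE = {"pen": 2, "book": 15, "lamp": 40, "laptop": 900}

def within_budget(itemset, budget):
    """Anti-monotonic constraint: sum of prices <= budget."""
    return sum(PRICE[i] for i in itemset) <= budget

def grow(itemset, remaining, budget, found):
    """Depth-first enumeration that stops extending an itemset once it violates the constraint."""
    for idx, item in enumerate(remaining):
        candidate = itemset | {item}
        if not within_budget(candidate, budget):
            continue  # every superset of candidate also violates it, so prune
        found.append(candidate)
        grow(candidate, remaining[idx + 1:], budget, found)
    return found

found = grow(frozenset(), sorted(PRICE), budget=50, found=[])
```

With a budget of 50, {book, lamp} already violates the constraint, so {book, lamp, pen} is never generated.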

What is the second approach and issues with multilevel rules?

The approach is to generate frequent patterns at the highest level first, then at the next highest level, and so on. One issue is that I/O requirements increase dramatically because we need to perform more passes over the data. We may also miss some potentially interesting cross-level association patterns.

What is the approach for statistics based methods?

First, withhold the target attribute from the rest of the data. Second, extract frequent itemsets from the remaining attributes, binarizing the continuous attributes (except for the target attribute). Third, for each frequent itemset, compute the corresponding descriptive statistics of the target attribute; a frequent itemset becomes a rule by introducing the target variable as the rule consequent. Lastly, apply a statistical test to determine the interestingness of the rule.
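The answer above does not say which statistical test is used. As one possible sketch, assume a two-sample Z statistic that compares the mean of the target attribute for records covered by the itemset against the records it does not cover. The function name, the `delta` parameter, and the numbers are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def z_statistic(covered, not_covered, delta=0.0):
    """Two-sample Z statistic for the difference in target means.

    covered / not_covered: target values of records that do / do not
    contain the frequent itemset. delta is an assumed minimum difference
    of interest.
    """
    n1, n2 = len(covered), len(not_covered)
    s1, s2 = variance(covered), variance(not_covered)  # sample variances
    return (mean(covered) - mean(not_covered) - delta) / sqrt(s1 / n1 + s2 / n2)

# Hypothetical target values (for example, age) for the two groups
covered = [20, 22, 24, 26]
not_covered = [30, 32, 34, 36]
z = z_statistic(covered, not_covered)
```

A large absolute value of `z` would suggest that the rule's target statistic differs noticeably from the rest of the data. With samples this small, a real analysis would need a more careful test.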

What is discretization with interval width?

If the interval is too wide, it may merge several disparate patterns or lose some of the interesting patterns. If the interval is too narrow, a pattern might be broken up into smaller patterns, each of which may lack sufficient support.

What are the types of constraints in pattern mining?

Knowledge type constraint, data constraint, dimension/level constraint, and interestingness constraint.

What are the steps of the Apriori Algorithm?

1. Let k = 1.
2. Generate frequent itemsets of length 1.
3. Repeat until no new frequent itemsets are identified:
   a. Generate length (k+1) candidate itemsets from length k frequent itemsets.
   b. Prune candidate itemsets containing subsets of length k that are infrequent.
   c. Count the support of each candidate by scanning the transaction table.
   d. Eliminate candidates that are infrequent, leaving only those that are frequent.
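The steps above can be sketched in a few lines of Python. This is a simplified illustration, not an optimized implementation: candidates are generated by taking unions of pairs of frequent k-itemsets, and the function name `apriori`, the minimum support count, and the toy transactions are assumptions.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return {itemset: support count} for all itemsets with count >= minsup."""
    # k = 1: count single items
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= minsup}
    result = dict(frequent)
    k = 1
    while frequent:
        # Generate (k+1)-candidates from pairs of frequent k-itemsets
        prev = list(frequent)
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k + 1}
        # Prune candidates that have an infrequent k-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k))}
        # Count support by scanning the transactions
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Keep only the frequent candidates
        frequent = {c: n for c, n in counts.items() if n >= minsup}
        result.update(frequent)
        k += 1
    return result

# Hypothetical toy data
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori(transactions, minsup=3)
```

With a minimum support count of 3, this finds four frequent single items and four frequent pairs. The candidate {bread, diapers, beer} is pruned without counting because {bread, beer} is infrequent.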

What is the first approach and issues with multilevel rules?

The approach is to extend the current association rule formulation by augmenting each transaction with higher-level items. Example: Original Transaction: {skim milk, wheat bread} Augmented Transaction: {skim milk, wheat bread, milk, bread, food}. One issue is that items residing at higher levels have much higher support counts, so if the support threshold is low there will be too many frequent patterns involving items from the higher levels. Another issue is the increased dimensionality of the data.
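A short sketch of the augmentation step, assuming the concept hierarchy is stored as a child-to-parent dictionary. The `PARENT` mapping and the `augment` function are hypothetical names.

```python
# Hypothetical concept hierarchy: child item -> parent item
PARENT = {
    "skim milk": "milk",
    "wheat bread": "bread",
    "milk": "food",
    "bread": "food",
}

def augment(transaction, parent=PARENT):
    """Add every ancestor of every item to the transaction."""
    augmented = set(transaction)
    for item in transaction:
        # walk up the hierarchy until an item has no parent
        while item in parent:
            item = parent[item]
            augmented.add(item)
    return augmented

augmented = augment({"skim milk", "wheat bread"})
```

The result reproduces the augmented transaction from the example: {skim milk, wheat bread, milk, bread, food}.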

What is downward closure?

Downward closure means that any subset of a frequent itemset must also be frequent.

chi-square

χ² = Σ [(Observed - Expected)² / Expected]. The larger χ² is, the more likely the attributes are related. Also, if supp(Basketball & Cereal) is lower than its expected value, the relation between "play basketball" and "eat cereal" is negative.
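A sketch of the chi-square computation for two binary attributes, using a 2x2 contingency table built from counts. The function name and the counts below are illustrative assumptions, not figures from the flashcard.

```python
def chi_square_2x2(n_both, n_a, n_b, n_total):
    """Chi-square statistic for two binary attributes A and B.

    n_both: records with A and B, n_a: records with A,
    n_b: records with B, n_total: all records.
    """
    observed = {
        (True, True): n_both,
        (True, False): n_a - n_both,
        (False, True): n_b - n_both,
        (False, False): n_total - n_a - n_b + n_both,
    }
    row = {True: n_a, False: n_total - n_a}
    col = {True: n_b, False: n_total - n_b}
    chi2 = 0.0
    expected = {}
    for (a, b), obs in observed.items():
        # expected count under independence
        exp = row[a] * col[b] / n_total
        expected[(a, b)] = exp
        chi2 += (obs - exp) ** 2 / exp
    return chi2, observed, expected

# Hypothetical counts: 5000 students, 3000 play basketball,
# 3750 eat cereal, 2000 do both.
chi2, observed, expected = chi_square_2x2(2000, 3000, 3750, 5000)
```

With these counts the observed number for (basketball, cereal) is 2000 against an expected 2250, so the relation is negative even though the chi-square value is large.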

What is frequent pattern analysis?

Frequent pattern analysis is the search for patterns that occur frequently in a dataset.

What are the different candidate generations?

Brute force and the F(k-1) x F(1) method.

What are the steps of frequent pattern tree mining?

First, construct the FP-tree: scan the DB once to find the frequent 1-itemsets, sort the frequent items in descending order of frequency, then scan the DB again and construct the FP-tree.
Second, recursively grow frequent patterns by pattern and database partitioning: for each frequent item, construct its conditional pattern base and then its conditional FP-tree. Repeat the process on each newly created conditional FP-tree until the resulting FP-tree is empty or contains only one path. A single path generates all the combinations of its sub-paths, each of which is a frequent pattern.
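The construction step (the two scans) can be sketched as below. This covers only building the tree, not the recursive mining of conditional FP-trees. The class and function names are assumptions, and ties in frequency are broken alphabetically as an arbitrary choice.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode

def build_fp_tree(transactions, minsup):
    # First scan: count items and keep the frequent ones
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= minsup}
    root = FPNode(None)
    # Second scan: insert each transaction's frequent items,
    # sorted in descending order of frequency
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1  # shared prefixes accumulate counts
            node = child
    return root, frequent

# Hypothetical toy data
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a"}]
root, frequent = build_fp_tree(transactions, minsup=2)
```

For this toy data the tree is a single path a:4, b:3, c:2, which is the case where all combinations of sub-paths can be read off directly.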

For multi level association rules, how do support and confidence vary as we traverse the concept hierarchy?

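If x is the parent item of both x1 and x2, then supp(x) >= max(supp(x1), supp(x2)); if x1 and x2 are the only children of x, then also supp(x) <= supp(x1) + supp(x2), because a transaction containing both children is counted only once for x.
If supp(x1 and y1) >= minsup, and x is the parent of x1 and y is the parent of y1, then supp(x and y1) >= minsup, supp(x1 and y) >= minsup, and supp(x and y) >= minsup.
If conf(x1 to y1) >= minconf, then conf(x1 to y) >= minconf.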

What is frequent pattern tree mining?

It is a recursive divide-and-conquer approach to mining the frequent itemsets.

What is the time complexity of interval generation? And what is interval generation?

Interval generation forms candidate intervals from the discretized range: with k interval boundaries there are k(k-1)/2 possible intervals. As for execution time, if the range is partitioned into k-1 intervals, there are O(k^2) new items.
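The k(k-1)/2 count comes from choosing any two of the k boundaries as the ends of an interval. A quick sketch, with a hypothetical function name and boundary values:

```python
from itertools import combinations

def candidate_intervals(boundaries):
    """All (low, high) intervals formed by choosing two of the k boundaries."""
    return list(combinations(sorted(boundaries), 2))

# Hypothetical example: k = 5 boundaries, so 4 base intervals
boundaries = [0, 10, 20, 30, 40]
intervals = candidate_intervals(boundaries)
k = len(boundaries)
```

With 5 boundaries this yields 10 candidate intervals, from (0, 10) up to (0, 40), which matches 5 * 4 / 2 and grows as O(k^2).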

