Data Mining Final

Given the transactions in Table 1 and minsup s = 50% and k=3, how many frequent k-itemsets are there?

How many clusters are there in the following?

Which of the following is not used as the dissimilarity/similarity metric in clustering?

Popularity

Choose all correct statements (Select all that apply)

(Data) Objects belonging to the same cluster are similar to each other. (Data) Objects belonging to different clusters are dissimilar to each other.

Given the following data table (right panel): (i) Calculate the sup of {apple, beer, rice}. (ii) Calculate the conf of {apple → beer}.

(i) sup=25% (ii) conf=75%

Calculate Confidence for {Laptop,Mouse} itemset by considering the dataset above (question 15)

0.28

In association rule mining, if sup = 20% and conf = 60%, describe what they mean with an example.

20% of customers buy bread and milk together, and a customer who buys bread has a 60% chance of also buying milk.

Suppose that {2, 3, 4} is frequent in a dataset with sup=50%. We can find proper nonempty subsets {2,3}, {2,4}, {3,4}, {2}, {3}, {4}, with sup=50%, 50%, 75%, 75%, 75%, 75% respectively. These generate these association rules: 2,3 → 4 with conf =100%, 2,4 → 3 with conf =100%, 3,4 → 2 with conf =67%, 2 → 3,4 with conf =67%, 3 → 2,4 with conf =67%, 4 → 2,3 with conf =67%. What would be the percentage of sup that all rules have, ____________%?

50
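The arithmetic behind this card can be checked with a short sketch. It uses only the supports quoted in the question and the standard definition conf(X → Y) = sup(X ∪ Y) / sup(X); the function name `rules_from` is a hypothetical helper, not something from the course material.

```python
from itertools import combinations

# Supports (as fractions) copied from the question above.
support = {
    frozenset({2, 3, 4}): 0.50,
    frozenset({2, 3}): 0.50,
    frozenset({2, 4}): 0.50,
    frozenset({3, 4}): 0.75,
    frozenset({2}): 0.75,
    frozenset({3}): 0.75,
    frozenset({4}): 0.75,
}

def rules_from(itemset, support):
    """Yield (antecedent, consequent, sup, conf) for every rule X -> (itemset minus X)."""
    itemset = frozenset(itemset)
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            x = frozenset(antecedent)
            # Every rule built from the same itemset shares that itemset's support.
            conf = support[itemset] / support[x]
            yield x, itemset - x, support[itemset], conf

rules = list(rules_from({2, 3, 4}, support))
for x, y, sup, conf in rules:
    print(f"{sorted(x)} -> {sorted(y)}: sup={sup:.0%}, conf={conf:.0%}")
```

Because each rule's support is the support of the full itemset {2, 3, 4}, all six rules share sup = 50%; only the confidence varies with the antecedent.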

Given the set of data below, calculate sup and conf when applying the rules {a, b} → {e} and {b} → {g}.
ID  Sequence
S1  {a, b}, {c}, {f}, {g}, {e}
S2  {a, d}, {c}, {b}, {a, b, e, f}
S3  {a}, {b}, {f}, {e}
S4  {b}, {f, g}
S5  {b}, {g}

{a, b} → {e}: sup = 60%, conf = 100%
{b} → {g}: sup = 60%, conf = 60%
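A rough sketch of how these numbers can be reproduced. It rests on an assumption about what "a sequence supports X → Y" means: every item of X has appeared (possibly spread over several elements), and every item of Y then appears in strictly later elements. This reading is chosen only because it matches the stated answer; a stricter subsequence definition would give different values. The helper names are hypothetical.

```python
# Sequences from the question; each sequence is an ordered list of itemsets.
sequences = {
    "S1": [{"a", "b"}, {"c"}, {"f"}, {"g"}, {"e"}],
    "S2": [{"a", "d"}, {"c"}, {"b"}, {"a", "b", "e", "f"}],
    "S3": [{"a"}, {"b"}, {"f"}, {"e"}],
    "S4": [{"b"}, {"f", "g"}],
    "S5": [{"b"}, {"g"}],
}

def covered_at(seq, items):
    """Index of the first element by which every item in `items` has appeared, else None."""
    seen = set()
    for i, element in enumerate(seq):
        seen |= element & items
        if seen == items:
            return i
    return None

def rule_holds(seq, x, y):
    """Assumed semantics: all of x appears, then all of y appears in strictly later elements."""
    p = covered_at(seq, x)
    if p is None:
        return False
    return covered_at(seq[p + 1:], y) is not None

def sup_conf(x, y):
    """Return (support, confidence) of the rule x -> y over all sequences."""
    n = len(sequences)
    n_x = sum(covered_at(s, x) is not None for s in sequences.values())
    n_rule = sum(rule_holds(s, x, y) for s in sequences.values())
    return n_rule / n, n_rule / n_x

print(sup_conf({"a", "b"}, {"e"}))
print(sup_conf({"b"}, {"g"}))
```

Under this reading, S1, S2 and S3 support {a, b} → {e}, and S1, S4 and S5 support {b} → {g}.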

Calculate Support for {Laptop} itemset by considering the dataset above.

Consider the following data transactions:
1: I1, I2, I3, I4, I5, I6
2: I7, I2, I3, I4, I5, I6
3: I1, I8, I4, I5
4: I1, I9, I10, I4, I6
5: I10, I2, I4, I11, I5

User-based collaborative filtering methods _____
(A) Are based on user's similarity only
(B) Are based on item's similarity only
(C) Have a complexity that grows linearly with the number of customers and items

A and C

What is the relation between candidate and frequent itemsets?

A frequent itemset must be a candidate itemset

The following statements are about clustering. Find the best match for each statement.

A link computes the distance between the closest elements in clusters -- single link
A link computes the mean of all pairwise distances -- average link
A link computes the distance between the means of two clusters -- centroid
A link computes the distance between the farthest elements in clusters -- complete link
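The four linkage definitions can be written out directly. This is a minimal sketch assuming Euclidean distance between points; the two clusters `A` and `B` are hypothetical toy data, not from the course.

```python
from itertools import product
from math import dist  # Euclidean distance (Python 3.8+)

def single_link(a, b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(dist(p, q) for p, q in product(a, b))

def complete_link(a, b):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(dist(p, q) for p, q in product(a, b))

def average_link(a, b):
    """Mean of all pairwise distances between the two clusters."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))

def _mean(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def centroid_link(a, b):
    """Distance between the means (centroids) of the two clusters."""
    return dist(_mean(a), _mean(b))

# Hypothetical toy clusters lying on a line.
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(3.0, 0.0), (5.0, 0.0)]
for link in (single_link, complete_link, average_link, centroid_link):
    print(link.__name__, link(A, B))
```

On this toy data, single link uses the pair (1, 0) and (3, 0), while complete link uses (0, 0) and (5, 0).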

Before any clustering is performed through a hierarchical approach, it is required to determine dissimilarity calculated by a distance function. Which of the following measures do we have for finding dissimilarity between two clusters?

A, B, and C: single link, complete link, and average link

Select all the CF evaluation metrics. (select all that apply)

Accuracy
Learning rate
Confidence
User satisfaction
Communication

A clustering method starts with 1 point and recursively adds two or more appropriate clusters. What is this method called?

Agglomerative

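Recursively adding (merging) two or more appropriate clusters is called the "divisive" type of clustering.

Disagree. Recursively merging clusters is agglomerative clustering; divisive clustering starts with one all-inclusive cluster and recursively splits it.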

Input dataset X
Initialize the rule r
While the termination criterion is not satisfied
    d = Scan(X)
    v = FindFrequentPatterns(d, r, o)
    r = FindAssociationRules(v)
End
Output r
------
What would be the possible output of r? ____________.

All association rules

What operations are performed on Big data?

All of the above Analytic Semantic Graph Processing

Select the correct statement

All of the above:
Hierarchical clustering algorithms typically have local objectives
Partitional algorithms typically have global objectives
Closeness can be measured by correlation
K-means is an exclusive clustering algorithm

Identify the correct statements related to Collaborative Filtering

All the listed options:
The problem of collaborative filtering is to predict how well a user will like an item that he has not rated, given a set of historical preference judgments for a community of users.
Predict the opinion the user will have on the different items
Recommend the 'best' items based on the user's previous likings and the opinions of like-minded users whose ratings are similar

The key difference between frequent pattern mining and other mining techniques is that the former is focused on finding out

An interesting pattern

Which of the following does not fall in the types of sequential pattern mining algorithms?

Apriori

Which of the following is a frequent pattern mining technique (select all that apply)?

Association rule mining
Sequential pattern mining

A simple example that is often used to explain the concept of a mining technique is discovering items that will be purchased together with items already purchased. What is the proper name of the mining technique?

Association rules

Suppose that you are given a set of sequences for sequential pattern mining. If you need to find the complete set of frequent subsequences, it can be a form of

Association rules

What does (a) indicate in the diagram?

Which of the following is not true?

Classification in deep learning is done separately (from the feature extraction)

Choose which data mining task is the most suitable for the following scenario: Determining which tour group is suitable to a new member based on her past location ratings

Clustering

To subdivide a market into a distinct subset of customers where each subset can be targeted with a distinct marketing mix

Clustering

Identify the correct recommendation system's algorithm(s) from given options.

Collaborative filtering

There are two main types of deep learning algorithms. What are they?

Convolutional neural nets
Recurrent neural nets

Which of the following is true (select all that apply)?

Deep learning is also called Deep Neural Network
Deep learning processes unstructured or unlabeled data

What is the goal of collaborative filtering?

Delivering recommended products or services

TensorFlow is a kind of deep learning technique

Disagree

Let minsup = 20% and minconf = 60%. The following are two examples of class association rules:
Student, School → Education
game → Sport
According to the mining of class association rules (CAR), what would be the sup and conf for each of these rules?

Student, School → Education: sup = 29%, conf = 100%
game → Sport: sup = 29%, conf = 67%

What is the output of the following algorithm?

Frequent itemset involving rare items

Select True or False

How to avoid the rare item problem? The value of support must be lower. HW 3 Question 18

Can you select some examples that do not require cluster analysis?

If we want to have class label information in the classification
Dividing the students into different registration groups alphabetically, by the last name
Graph partitioning

Select from the following that uses a proprietary semantic analyzer and data/knowledge graph to deliver expert-level data for on-page and on-site optimization. (case-sensitive)

InLinks

What does Apriori algorithm do?

It mines all frequent patterns through pruning rules with lesser Support

What do you think about the exclusive clustering approach?

It stipulates that each data object can only exist in one cluster. Clusters do not overlap; data are grouped exclusively. Example: K-means.

Which of the following algorithms is less expensive and scalable?

Item-to-Item Collaborative Filtering

After going through the following algorithm, identify the correct clustering approach

K-means partitional
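The algorithm figure for this card is not reproduced here, so the following is only a generic sketch of the usual K-means loop (assign each point to its nearest centroid, recompute centroids, repeat until nothing changes). The function name, toy points, and starting centroids are all hypothetical.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def kmeans(points, centroids, max_iter=100):
    """Plain K-means: returns the final centroids and the clusters as lists of points."""
    for _ in range(max_iter):
        # Assignment step: each point joins exactly one cluster (its nearest centroid).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical toy data with two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

The result is a single flat partition of the data, with every point in exactly one cluster, which is why K-means is classed as a partitional (and exclusive) method.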

What would d be if q = 1?

Manhattan distance
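The formula for d is not shown on this card. Assuming it is the Minkowski distance, which is what the answer suggests, a small sketch shows how q selects the metric. The points `x` and `y` are hypothetical.

```python
def minkowski(x, y, q):
    """Minkowski distance: (sum over i of |x_i - y_i|^q) raised to the power 1/q."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1 / q)

# Hypothetical example points.
x, y = (1, 2), (4, 6)
manhattan = minkowski(x, y, 1)  # q = 1: sum of absolute coordinate differences
euclidean = minkowski(x, y, 2)  # q = 2: straight-line distance
print(manhattan, euclidean)
```

With q = 1 the exponents disappear and the distance reduces to |1 - 4| + |2 - 6|, the Manhattan (city-block) distance.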

In MapReduce, which of the following is true?

Map > Combine > Reduce
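The ordering can be illustrated with a single-process word-count sketch. This only imitates the phases in plain Python and does not use Hadoop or any MapReduce library; the function names and the input splits are hypothetical.

```python
from collections import Counter, defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]

def combine_phase(pairs):
    """Combine: pre-aggregate one mapper's output locally before the shuffle."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def reduce_phase(key, values):
    """Reduce: merge all partial counts for one key."""
    return key, sum(values)

def run(splits):
    grouped = defaultdict(list)
    for split in splits:  # each split stands in for one mapper's input
        mapped = [pair for line in split for pair in map_phase(line)]
        for key, value in combine_phase(mapped):
            grouped[key].append(value)  # shuffle: group partial counts by key
    return dict(reduce_phase(k, v) for k, v in grouped.items())

# Hypothetical input, divided into two splits.
splits = [["big data big"], ["data mining", "big"]]
counts = run(splits)
print(counts)
```

The combiner runs on each mapper's output, so fewer pairs cross the shuffle before the reducer sees them.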

Which of the following is a programming model of Big Data?

MapReduce

Which of the following is direct application of frequent itemset mining?

Market Basket Analysis

Clustering deals with finding a structure in a collection of labeled data.

What do you mean by Support (A)?

Number of transactions containing A / Total number of transactions

The following formula is used in a clustering approach. What is to achieve here mainly?

Objective function

Enumerate all possible ways of partitioning the points into clusters and evaluate the 'goodness' of each potential set of clusters by using _______

Objective function-based clustering

Which big data analytics technique uses historical data to predict future outcomes?

Predictive analytics

Matching: classify each of the statements

Question 18 in class test

K-means clustering open ended

Question 19 in class test

Dissimilarity of data objects open ended

Question 20 in class test

The characteristics of Big Data can be defined by at least 3-scale. In the following, classify the phrases/statements according to the meaning of the 3-scale.

Question 7 L10QQ

Given the following database and rules. Consider min conf = 60% in database D. Apply the Apriori algorithm to find accepted and rejected rules from the following rules.

R1, R2, R5, R6 - ACCEPTED
R3, R4 - REJECTED

Beak detector in CNN can represent a small region with fewer parameters

Right

Clusters can be characterized by noise and outliers.

Right

Tell a name of an association rule mining algorithm.

Selected Apriori -- correct answer none?

A key issue of frequent pattern mining is the "order" of transactions, which is called

Sequential pattern

Choose which data mining task is suitable for the following scenario: first, buy digital camera, then buy large SD memory cards

Sequential pattern analysis

See the diagram and identify the types of clusters in (ii)

Soft clustering

There are some challenges with association rule mining. It implicitly assumes that all items in the dataset have similar frequencies. If the frequencies of items vary a great deal, we will encounter two problems. What are they? (select all that apply)

Some rare items will not be found
Some frequent items will be associated with one another in all possible ways

Which of the following is the clustering requirement?

Summarization (size and shape)

Support is an indication of how frequently the items appear in the database, expressed by:

Sup=Pr(X ∪ Y)

Recommender systems can be defined as

Systems that evaluate quality based on the preferences of others with a similar point of view

Suppose that you observe the following dendrogram once you conduct K-Means clustering analysis on a dataset. Which of the following decision can be drawn from the dendrogram?

The above dendrogram interpretation is not possible for K-means clustering analysis

The type of data Hadoop can deal with, while DBMS cannot, is ________

Unstructured

Why do we need clustering?

Useful in data concept construction
Pattern detection

Why do we need clustering? (Select all that apply)

Useful in data concept construction
Pattern detection

Which scale of big data may correspond to the data security analysis?

Veracity

Given a set of items I and a set of transactions T, the goal of the problem of the sequential pattern is to discover all the sequences with a minimum support where the minimum support of a sequence is defined as the fraction of all the data sequences that contain the particular sequence.

Yes

In many applications, some items appear very frequently in the data, while others rarely appear.

Yes

Given the following set of market-basket transactions: (a) Find two rules that have 60% sup and 75% conf. (b) For the following rules, calculate the sup and conf, respectively: {Diaper} → {Milk,Beer} and {Milk} → {Diaper,Beer}

a) {Bread} --> {Milk}, {Diaper} --> {Beer}
b) {Diaper} --> {Milk,Beer}: sup = 40%, conf = 50%
{Milk} --> {Diaper,Beer}: sup = 40%, conf = 50%
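The transaction table for this card is not reproduced here. The sketch below ASSUMES it is the widely used five-transaction bread/milk/diaper/beer example from textbooks; that table is an assumption, although it is consistent with the answers above. The function names are hypothetical.

```python
# ASSUMED table: the original is not shown on the card; this is the common
# five-transaction textbook example, used here only for illustration.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions)

def support(itemset):
    """Fraction of transactions containing the itemset."""
    return count(itemset) / len(transactions)

def confidence(x, y):
    """conf(X -> Y) = count(X and Y together) / count(X)."""
    return count(x | y) / count(x)

for x, y in [
    ({"Bread"}, {"Milk"}),
    ({"Diaper"}, {"Beer"}),
    ({"Diaper"}, {"Milk", "Beer"}),
    ({"Milk"}, {"Diaper", "Beer"}),
]:
    print(sorted(x), "->", sorted(y), support(x | y), confidence(x, y))
```

On the assumed table, the two rules in (a) each hold in 3 of 5 transactions, and the antecedent appears in 4, giving 60% and 75%.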

Association rules are mainly used to find ________________________.

all co-occurrence relationships

What is/are not the basis that a cluster can be identified as heterogeneous or homogeneous?

none of them (different sizes, different shapes, different densities)

Using the Apriori algorithm, find all k-item frequent itemsets from the following dataset. Consider k=3. First, you need to show all the scanning steps. Then, the final result for k=3 would be ________.
T1: 1, 3, 4, 7
T2: 2, 3, 5
T3: 1, 2, 3, 5, 8
T4: 2, 5
T5: 1, 7

scan T
C1: {1}:3, {2}:3, {3}:3, {4}:1, {5}:3, {7}:2, {8}:1
F1: {1}:3, {2}:3, {3}:3, {5}:3
C2: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}
scan T
C2: {1,2}:1, {1,3}:2, {1,5}:1, {2,3}:2, {2,5}:3, {3,5}:2
F2: {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2
C3: {2,3,5}
scan T
C3: {2,3,5}:2
F3: {2,3,5}
Resulting k=3 itemset is {2,3,5}
HW 3 question 22
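The level-wise scan can be sketched in a few lines. The card does not state a minimum support, so a minimum count of 2 is an assumption inferred from the trace, which keeps itemsets that occur twice. With that threshold, {7} and {1,7} also come out frequent at k=1 and k=2, which does not change the k=3 result. The function name `apriori` is a hypothetical helper.

```python
from itertools import combinations

# Transactions T1..T5 from the question.
transactions = [
    {1, 3, 4, 7},
    {2, 3, 5},
    {1, 2, 3, 5, 8},
    {2, 5},
    {1, 7},
]

def apriori(transactions, min_count):
    """Return {k: {itemset: count}} for every level k that has frequent itemsets."""
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset({i}) for i in items]
    frequent, k = {}, 1
    while candidates:
        # Scan the transactions once to count every candidate at this level.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        if not level:
            break
        frequent[k] = level
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent

# min_count=2 is an ASSUMPTION; the card does not give minsup.
frequent = apriori(transactions, min_count=2)
for k in sorted(frequent):
    print(k, {tuple(sorted(s)): n for s, n in frequent[k].items()})
```

The prune step is where the Apriori property does its work: a candidate such as {1,3,5} is discarded without counting because its subset {1,5} is not frequent.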
