Data Mining Final
Given the transactions in Table 1 and minsup s = 50% and k=3, how many frequent k-itemsets are there?
0
How many clusters are there in the following?
4
Which of the following is not used as the dissimilarity/similarity metric in clustering?
Popularity
Choose all correct statements (Select all that apply)
(Data) Objects belonging to the same cluster are similar to each other. (Data) Objects belonging to different clusters are dissimilar to each other.
Given the following data table (right panel): (i) Calculate the sup of {apple, beer, rice}. (ii) Calculate the conf of {apple → beer}.
(i) sup=25% (ii) conf=75%
Calculate Confidence for {Laptop,Mouse} itemset by considering the dataset above (question 15)
0.28
In association rule mining, if sup = 20% and conf = 60%, describe what they mean, with an example.
20% of the customers buy bread and milk together, while a customer who bought bread has a 60% chance of also buying milk
Suppose that {2, 3, 4} is frequent in a dataset with sup=50%. We can find proper nonempty subsets {2,3}, {2,4}, {3,4}, {2}, {3}, {4}, with sup=50%, 50%, 75%, 75%, 75%, 75% respectively. These generate these association rules: 2,3 → 4 with conf =100%, 2,4 → 3 with conf =100%, 3,4 → 2 with conf =67%, 2 → 3,4 with conf =67%, 3 → 2,4 with conf =67%, 4 → 2,3 with conf =67%. What would be the percentage of sup that all rules have, ____________%?
50
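The arithmetic behind the rules in the question above can be checked with a short sketch. It uses only the supports quoted in the question and the standard definition conf(X → Y) = sup(X ∪ Y) / sup(X); the function name `rules_from_itemset` is just an illustrative choice.

```python
from itertools import combinations

# Supports (as fractions) quoted in the question above.
support = {
    frozenset({2, 3, 4}): 0.50,
    frozenset({2, 3}): 0.50,
    frozenset({2, 4}): 0.50,
    frozenset({3, 4}): 0.75,
    frozenset({2}): 0.75,
    frozenset({3}): 0.75,
    frozenset({4}): 0.75,
}

def rules_from_itemset(itemset, support):
    """Return (antecedent, consequent, sup, conf) for every rule X -> (itemset minus X)."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            x = frozenset(antecedent)
            # Every rule built from the same itemset shares that itemset's support.
            conf = support[itemset] / support[x]
            rules.append((set(x), set(itemset - x), support[itemset], conf))
    return rules

rules = rules_from_itemset({2, 3, 4}, support)
for x, y, sup, conf in rules:
    print(f"{sorted(x)} -> {sorted(y)}: sup={sup:.0%}, conf={conf:.0%}")
```

All six rules come from the same itemset {2, 3, 4}, so they all carry that itemset's support; only the confidence changes with the antecedent.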
ID: Sequence. S1: {a, b}, {c}, {f}, {g}, {e}; S2: {a, d}, {c}, {b}, {a, b, e, f}; S3: {a}, {b}, {f}, {e}; S4: {b}, {f, g}; S5: {b}, {g}. Given the set of data above, calculate sup and conf for the rules {a, b} → {e} and {b} → {g}.
{a, b} → {e}: sup = 60%, conf = 100%; {b} → {g}: sup = 60%, conf = 60%
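A rough way to check these numbers, under one assumed matching rule: a sequence supports X → Y if every item of X has appeared by some element and every item of Y appears in a later element; the confidence denominator counts sequences that contain all items of X anywhere. That reading is an assumption (the course may define the rule differently), and the helper names are hypothetical.

```python
sequences = {
    "S1": [{"a", "b"}, {"c"}, {"f"}, {"g"}, {"e"}],
    "S2": [{"a", "d"}, {"c"}, {"b"}, {"a", "b", "e", "f"}],
    "S3": [{"a"}, {"b"}, {"f"}, {"e"}],
    "S4": [{"b"}, {"f", "g"}],
    "S5": [{"b"}, {"g"}],
}

def contains_items(seq, items):
    """True if every item occurs somewhere in the sequence."""
    return items <= set().union(*seq)

def satisfies_rule(seq, x, y):
    """Assumed semantics: all of x seen by element i, all of y found after element i."""
    seen = set()
    for i, element in enumerate(seq):
        seen |= element
        if x <= seen:
            later = set().union(*seq[i + 1:])
            return y <= later
    return False

def rule_stats(x, y):
    """Return (support, confidence) of the rule x -> y over the sequence database."""
    n = len(sequences)
    with_x = sum(contains_items(s, x) for s in sequences.values())
    with_rule = sum(satisfies_rule(s, x, y) for s in sequences.values())
    return with_rule / n, with_rule / with_x

print(rule_stats({"a", "b"}, {"e"}))
print(rule_stats({"b"}, {"g"}))
```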
Calculate Support for {Laptop} itemset by considering the dataset above.
70
Consider the following data transactions: 1: I1, I2, I3, I4, I5, I6; 2: I7, I2, I3, I4, I5, I6; 3: I1, I8, I4, I5; 4: I1, I9, I10, I4, I6; 5: I10, I2, I4, I11, I5
<I1>, <I2>, <I4>, <I5>, <I6>, <I1, I4>, <I2, I4>, <I2, I5>, <I4, I5>, <I4, I6>, <I2, I4, I5>
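The card does not state the minimum support. As a sketch, assuming a minimum count of 3 out of 5 transactions (an assumption, chosen because it reproduces the listed answer), a brute-force enumeration gives the same eleven itemsets. The function name is illustrative only.

```python
from itertools import combinations

transactions = [
    {"I1", "I2", "I3", "I4", "I5", "I6"},
    {"I7", "I2", "I3", "I4", "I5", "I6"},
    {"I1", "I8", "I4", "I5"},
    {"I1", "I9", "I10", "I4", "I6"},
    {"I10", "I2", "I4", "I11", "I5"},
]
MIN_COUNT = 3  # assumed threshold; not given on the card

def frequent_itemsets(transactions, min_count):
    """Brute force: count every k-item combination, stop at the first empty level."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, k):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_count:
                result[candidate] = count
                found = True
        if not found:
            break  # no frequent k-itemset means no frequent (k+1)-itemset
    return result

result = frequent_itemsets(transactions, MIN_COUNT)
for itemset, count in result.items():
    print(itemset, count)
```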
User-based collaborative filtering methods _____ (A) Are based on user's similarity only (B) Are based on item's similarity only (C) Have a complexity that grows linearly with the number of customers and items
A and C
What is the relation between candidate and frequent itemsets?
A frequent itemset must be a candidate itemset
The following statements are about clustering. Find the best match for each statement.
A link computes the distance between the closest elements in clusters -- Single link; A link computes the mean of all pairwise distances -- Average link; A link computes the distance between the means of two clusters -- Centroid; A link computes the distance between the farthest elements in clusters -- Complete link
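The four link definitions can be compared side by side with a small sketch. The two clusters below are made-up points, and Euclidean distance is assumed as the underlying metric.

```python
from itertools import product
from math import dist

def single_link(a, b):
    """Distance between the closest pair of points across the two clusters."""
    return min(dist(p, q) for p, q in product(a, b))

def complete_link(a, b):
    """Distance between the farthest pair of points across the two clusters."""
    return max(dist(p, q) for p, q in product(a, b))

def average_link(a, b):
    """Mean of all pairwise distances across the two clusters."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))

def centroid_link(a, b):
    """Distance between the mean points (centroids) of the two clusters."""
    mean = lambda pts: tuple(sum(c) / len(pts) for c in zip(*pts))
    return dist(mean(a), mean(b))

# Hypothetical clusters, for illustration only.
cluster_a = [(0, 0), (0, 2)]
cluster_b = [(3, 0), (5, 0)]

for link in (single_link, complete_link, average_link, centroid_link):
    print(link.__name__, round(link(cluster_a, cluster_b), 3))
```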
Before any clustering is performed through a hierarchical approach, it is required to determine dissimilarity calculated by a distance function. Which of the following measures do we have for finding dissimilarity between two clusters?
A, B, and C: single link, complete link, and average link
Select all the CF evaluation metrics. (select all that apply)
Accuracy; Learning rate; Confidence; User satisfaction; Communication
A clustering method starts with 1 point and recursively adds two or more appropriate clusters. What are the methods here?
Agglomerative
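Recursively adding two or more appropriate clusters is called the "divisive" type of clustering.
Disagree (recursively adding, i.e. merging, clusters is agglomerative; divisive clustering starts with one big cluster and recursively splits it)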
Input: dataset X
Initialize the rule r
While the termination criterion is not satisfied
    d = Scan(X)
    v = FindFrequentPatterns(d, r, o)
    r = FindAssociationRules(v)
End
Output r
What would be the possible output of r? ____________.
All association rules
What operations are performed on Big data?
All of the above Analytic Semantic Graph Processing
Select the correct statement
All of the above: Hierarchical clustering algorithms typically have local objectives; Partitional algorithms typically have global objectives; Closeness can be measured by correlation; K-means is an exclusive clustering algorithm
Identify the correct statements related to Collaborative Filtering
All the listed options: The problem of collaborative filtering is to predict how well a user will like an item that he has not rated, given a set of historical preference judgments for a community of users; Predict the opinion the user will have on the different items; Recommend the 'best' items based on the user's previous likings and the opinions of like-minded users whose ratings are similar
The key difference between frequent pattern mining and other mining techniques is that the former is focused on finding out
An interesting pattern
Which of the following does not fall in the types of sequential pattern mining algorithms?
Apriori
Which of the following is a frequent pattern mining technique (select all that apply)?
Association rule mining; Sequential pattern mining
A simple example that is often used to explain the concept of a mining technique is discovering items that will be purchased together with items already purchased. What is the proper name of the mining technique?
Association rules
Suppose that you are given a set of sequences for sequential pattern mining. If you need to find the complete set of frequent subsequences, it can be a form of
Association rules
What does (a) indicate in the diagram?
CF
Which of the following is not true?
Classification in deep learning is done separately (from the feature extraction)
Choose which data mining task is the most suitable for the following scenario: Determining which tour group is suitable to a new member based on her past location ratings
Clustering
To subdivide a market into a distinct subset of customers where each subset can be targeted with a distinct marketing mix
Clustering
Identify the correct recommendation system's algorithm(s) from given options.
Collaborative filtering
There are two main types of deep learning algorithms. What are they?
Convolutional neural nets; Recurrent neural nets
Which of the following is true (select all that apply)?
Deep learning is also called Deep Neural Network; Deep learning processes unstructured or unlabeled data
What is the goal of collaborative filtering?
Delivering recommended products or services
TensorFlow is a kind of deep learning technique
Disagree
Let minsup = 20% and minconf = 60%. The following are two examples of class association rules: (1) Student, School → Education; (2) game → Sport. According to the mining class association rules (CAR), what would be the sup and conf for both of these rules, respectively?
First part: Student, School --> Education: sup = 29%, conf = 100%. Second part: game --> Sport: sup = 29%, conf = 67%
What is the output of the following algorithm?
Frequent itemset involving rare items
Select True or False
How to avoid the rare item problem? The value of support must be lower. HW 3 Question 18
Can you select some examples that do not require cluster analysis?
If we want to have class label information in the classification; Dividing the students into different registration groups alphabetically, by the last name; Graph partitioning
Select from the following that uses a proprietary semantic analyzer and data/knowledge graph to deliver expert-level data for on-page and on-site optimization. (case-sensitive)
InLinks
What does Apriori algorithm do?
It mines all frequent patterns by pruning candidates whose support is below the minimum support
What do you think about the exclusive clustering approach?
It stipulates that each data object can only exist in one cluster; Clusters do not overlap; Data are grouped exclusively; K-means (last picture)
Which of the following algorithms is less expensive and more scalable?
Item-to-Item Collaborative Filtering
After going through the following algorithm, identify the correct clustering approach
K-means partitional
What would d be if q = 1?
Manhattan distance
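The formula itself is not on the card; assuming d is the Minkowski distance, d(x, y) = (Σ |x_i − y_i|^q)^(1/q), a quick sketch shows why q = 1 gives the Manhattan distance and q = 2 the Euclidean distance. The sample points are made up.

```python
def minkowski(x, y, q):
    """Minkowski distance of order q between two equal-length points."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1 / q)

x, y = (1, 2), (4, 6)           # hypothetical points
print(minkowski(x, y, 1))       # q = 1: sum of absolute differences (Manhattan)
print(minkowski(x, y, 2))       # q = 2: straight-line distance (Euclidean)
```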
In MapReduce, which of the following is true?
Map > Combine > Reduce
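The Map > Combine > Reduce order can be illustrated with a word count simulated in plain Python. This is not the Hadoop API; the function names and the two input splits are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]

def sum_by_key(pairs):
    """Shared by combine and reduce: add up the values of each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Two hypothetical input splits, each handled by its own mapper.
splits = [["big data big", "data mining"], ["mining big data"]]

combined = []
for split in splits:
    mapped = [pair for line in split for pair in map_phase(line)]
    # Combine: pre-aggregate locally so fewer pairs are passed to the reducer.
    combined.extend(sum_by_key(mapped).items())

# Reduce: merge the partial counts from all splits.
word_counts = sum_by_key(combined)
print(word_counts)
```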
Which of the following is a programming model of Big Data?
MapReduce
Which of the following is direct application of frequent itemset mining?
Market Basket Analysis
Clustering deals with finding a structure in a collection of labeled data.
No
What do you mean by Support (A)?
Number of transactions containing A / Total number of transactions
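As a minimal sketch of that definition, together with the matching confidence ratio, on a made-up set of five baskets (the data and function names are hypothetical):

```python
# Hypothetical baskets, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
    {"butter"},
]

def support(itemset, transactions):
    """Number of transactions containing the itemset / total number of transactions."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(x, y, transactions):
    """Share of the transactions containing x that also contain y."""
    with_x = sum(1 for t in transactions if x <= t)
    with_both = sum(1 for t in transactions if (x | y) <= t)
    return with_both / with_x

print(support({"bread"}, transactions))
print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```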
The following formula is used in a clustering approach. What is to achieve here mainly?
Objective function
Enumerate all possible ways of partitioning the points into clusters and evaluate the 'goodness' of each potential set of clusters by using _______
Objective function-based clustering
Which big data analytics technique uses historical data to predict future outcomes?
Predictive analytics
Matching: classify each of the statements
Question 18 in class test
K-means clustering open ended
Question 19 in class test
Dissimilarity of data objects open ended
Question 20 in class test
The characteristics of Big Data can be defined by at least 3-scale. In the following, classify the phrases/statements according to the meaning of the 3-scale.
Question 7 L10QQ
Given the following database and rules. Consider min conf = 60% in database D. Apply the Apriori algorithm to find accepted and rejected rules from the following rules.
R1, R2, R5, R6: ACCEPTED; R3, R4: REJECTED
Beak detector in CNN can represent a small region with fewer parameters
Right
Clusters can be characterized by noise and outliers.
Right
Tell a name of an association rule mining algorithm.
Selected Apriori -- correct answer none?
A key issue of frequent pattern mining is the "order" of transactions, which is called
Sequential pattern
Choose which data mining task is suitable for the following scenario: first, buy digital camera, then buy large SD memory cards
Sequential pattern analysis
See the diagram and identify the types of clusters in (ii)
Soft clustering
There are some challenges with association rule mining. It implicitly assumes that all items in the dataset have similar frequencies. If the frequencies of items vary a great deal, we will encounter two problems. What are they? (select all that apply)
Some rare items will not be found; Some frequent items will be associated with one another in all possible ways
Which of the following is the clustering requirement?
Summarization (size and shape)
Support is an indication of how frequently the items appear in the database, expressed by:
Sup=Pr(X ∪ Y)
Recommender systems can be defined as
Systems that evaluate quality based on the preferences of others with a similar point of view
Suppose that you observe the following dendrogram after conducting K-Means clustering analysis on a dataset. Which of the following conclusions can be drawn from the dendrogram?
The above dendrogram interpretation is not possible for K-means clustering analysis
The type of data Hadoop can deal with, while DBMS cannot, is ________
Unstructured
Why do we need clustering?
Useful in data concept construction; Pattern detection
Why do we need clustering? (Select all that apply)
Useful in data concept construction; Pattern detection
Which scale of big data may correspond to the data security analysis?
Veracity
Given a set of items I and a set of transactions T, the goal of the problem of the sequential pattern is to discover all the sequences with a minimum support where the minimum support of a sequence is defined as the fraction of all the data sequences that contain the particular sequence.
Yes
In many applications, some items appear very frequently in the data, while others rarely appear.
Yes
Given the following set of market-based transactions: (a) Find two rules that have 60% sup and 75% conf. (b) For the following rules, calculate the sup and conf, respectively: {Diaper} → {Milk, Beer}; {Milk} → {Diaper, Beer}
(a) {Bread} --> {Milk}, {Diaper} --> {Beer} (b) {Diaper} --> {Milk, Beer}: sup = 40%, conf = 50%; {Milk} --> {Diaper, Beer}: sup = 40%, conf = 50%
Association rule mainly used to find ________________________.
all co-occurrence relationships
What is/are not the basis that a cluster can be identified as heterogenous or homogeneous?
none of them (different sizes, different shapes, different densities)
T1: 1, 3, 4, 7; T2: 2, 3, 5; T3: 1, 2, 3, 5, 8; T4: 2, 5; T5: 1, 7. Using the Apriori algorithm, find all k-item frequent itemsets from this dataset. Consider k=3. First, you need to show all the scanning steps. Then, the final result for k=3 would be ________.
Scan 1 of T: C1: {1}:3, {2}:3, {3}:3, {4}:1, {5}:3, {7}:2, {8}:1; F1: {1}:3, {2}:3, {3}:3, {5}:3; candidates C2: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}. Scan 2 of T: C2: {1,2}:1, {1,3}:2, {1,5}:1, {2,3}:2, {2,5}:3, {3,5}:2; F2: {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2; candidates C3: {2,3,5}. Scan 3 of T: C3: {2,3,5}:2; F3: {2,3,5}. Resulting k=3 itemset is {2,3,5} (HW 3 question 22)
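The level-wise scans can be reproduced with a compact Apriori sketch over T1 to T5. The card does not state the minimum support, so a minimum count of 2 is assumed here; with that threshold {7} also passes the first scan, which does not change the k=3 result. The function name and structure are illustrative, not a reference implementation.

```python
from itertools import combinations

transactions = [
    {1, 3, 4, 7},     # T1
    {2, 3, 5},        # T2
    {1, 2, 3, 5, 8},  # T3
    {2, 5},           # T4
    {1, 7},           # T5
]
MIN_COUNT = 2  # assumed threshold; not given on the card

def apriori(transactions, min_count):
    """Return a list [F1, F2, ...] of dicts mapping frequent itemsets to counts."""
    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted(set().union(*transactions))
    level = {frozenset([i]): count(frozenset([i])) for i in items}
    level = {c: n for c, n in level.items() if n >= min_count}
    levels, k = [], 1
    while level:
        levels.append(level)
        k += 1
        prev = set(level)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Scan step: count the surviving candidates and keep the frequent ones.
        level = {c: count(c) for c in candidates}
        level = {c: n for c, n in level.items() if n >= min_count}
    return levels

levels = apriori(transactions, MIN_COUNT)
for k, frequent in enumerate(levels, start=1):
    print(f"F{k}:", {tuple(sorted(c)): n for c, n in frequent.items()})
```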