Final

D

(1) Which of the following is correct about the dissimilarity/similarity metric used to measure clustering quality? A. (Dis)similarity is expressed in terms of a distance function, typically a metric: d(i, j) B. The definitions of distance functions are usually rather different for different types of attributes, such as nominal vs. numerical. C. Weights should be associated with different attributes based on applications and data semantics. D. All of the above.

T

(2) The idea behind K-means clustering is that a good clustering is one for which the within-cluster variation is as small as possible.
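To make "within-cluster variation" concrete: it is typically computed as the sum of squared distances from each point to its cluster centroid (the SSE). A minimal NumPy sketch, with toy points and a hypothetical cluster assignment:

```python
import numpy as np

# Toy 2-D points and a hypothetical assignment into two clusters.
points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
labels = np.array([0, 0, 1, 1])

# Within-cluster variation (SSE): sum of squared distances to each centroid.
sse = 0.0
for k in np.unique(labels):
    members = points[labels == k]
    centroid = members.mean(axis=0)
    sse += ((members - centroid) ** 2).sum()

print(sse)  # K-means searches for the assignment that minimizes this quantity
```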

B

(2) Which of the following should we use to measure distances in high-dimensional data? A. Euclidean Distance B. Manhattan Distance C. Both A and B D. Neither A nor B
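For reference, both distance functions named above are easy to compute directly; a small sketch with assumed toy vectors:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(((x - y) ** 2).sum())  # L2: straight-line distance
manhattan = np.abs(x - y).sum()            # L1: sum of per-axis differences

print(euclidean, manhattan)  # ~3.606 and 5.0
```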

D

(3) Which of the following is NOT an advantage of ANN as a classifier? A. Good predictive ability B. Can capture complex relationships C. High tolerance to noisy data D. Ability to classify trained patterns

D

(4) Classification methods can be compared and evaluated according to different criteria. Which of the following is NOT one of the criteria? A. Accuracy B. Speed C. Scalability D. Quality

D

(5) Which of the following are classification methods? A. Decision tree-based methods B. Rule-based methods C. Logistic regression D. All of the above

A

(6) Which of the following is NOT an advantage of decision tree-based classification? A. Easy to interpret for big-sized trees B. Extremely fast at classifying unknown records C. Accuracy is comparable to other classification techniques for many simple data sets D. Inexpensive to construct

T

1) An association rule is an implication expression of the form X -> Y, where X and Y are disjoint itemsets. (Tan Chapter 6.1, Slide 351, Textbook page 329)

T

1) In K-means clustering, we seek to partition the observations into a pre-specified number of clusters.

C

(1) In what terms is the strength of an association rule measured? (Tan Chapter 6.1, Slide 351, Textbook page 329) a) Support b) Confidence c) Both A and B d) None of the above
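As a worked illustration of both measures, a minimal sketch over a hypothetical five-transaction basket (item names invented): support is the fraction of transactions containing X ∪ Y, and confidence is how often Y appears in transactions that contain X.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

X, Y = {"diapers"}, {"beer"}
n = len(transactions)

support = sum(X | Y <= t for t in transactions) / n   # fraction containing X and Y
confidence = (sum(X | Y <= t for t in transactions)
              / sum(X <= t for t in transactions))    # frequency of Y given X

print(support, confidence)  # 0.6 and 0.75 for the rule {diapers} -> {beer}
```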

C

1. A brute force approach for finding frequent itemsets is to determine the support count for every: a. Rule generation b. Binary representation c. Candidate Itemset d. Support-based pruning

T

1. A tree structure called a dendrogram is commonly used to represent the process of hierarchical clustering.
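A sketch of how such a dendrogram is typically produced with SciPy (the five toy points are assumed):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

points = np.array([[0.0, 0.0], [0.1, 0.2], [4.0, 4.0], [4.2, 3.9], [9.0, 0.5]])

# Agglomerative clustering with single linkage (merge the two nearest clusters).
Z = linkage(points, method="single")

dendrogram(Z)  # the tree records every merge and the distance at which it happened
plt.show()
```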

F

1. An agglomerative hierarchical clustering method uses a top-down strategy.

T

1. An item can be treated as a binary variable whose value is one if the item is present in a transaction and zero otherwise.

T

1. Association analysis is useful for discovering interesting relationships hidden in large data sets.

F

1. Attribute subset selection does not reduce the data set size by removing irrelevant or redundant attributes (or dimensions). Han Ch 3 page 104

A

1. Because the presence of an item in a transaction is often considered more important than its absence, an item is an ___________ binary variable. Tan Ch 6.1 page 328 a) Asymmetric b) Symmetric c) Null d) None of the above

T

1. Candidate Pruning eliminates some of the candidate k-itemsets using the support-based pruning strategy. Tan Ch 6.2 page 360

B

1. Cluster analysis is sometimes referred to as: (a) Supervised classification (b) Unsupervised classification (c) Segmentation (d) Partitioning

T

1. Clustering is the process of grouping a set of data objects into multiple groups so that objects within a cluster have high similarity but are very dissimilar to objects in other clusters.

A

1. Descriptive Modeling is a classification model that can serve as an explanatory tool to distinguish between objects of different: Tan Ch 4.1 page 145 a) Classes b) Predictive models c) Decision trees d) None of the above

T

1. Ensemble methods are techniques for improving classification accuracy by aggregating the predictions of multiple classifiers.

F

1. Lowering the support threshold often results in fewer itemsets being declared as frequent.

A

1. Support is an important measure because a rule that has very low support may occur simply by: Tan, Chapter 6, Page 330 A. Chance B. Property C. Condition D. Relationship

F

1. The test condition for a binary attribute does not generate two potential outcomes. Tan Ch 4.3 page 157

T

1. Two of the most prominent prototype-based clustering techniques are known as K-means and K-medoids.

A

1. What are the two steps of classification? a. Learning step and classification step b. Classification step and optimization c. Optimization and regression d. K-means and clustering

A

1. What is a way to reduce the computational complexity of frequent itemset generation? a. Reduce the number of comparisons b. Reduce principle c. Rule generation d. Reduce property (Tan Ch 6, page 333)

B

1. Which step's objective is to find all the itemsets that satisfy the minsup threshold? a. Rule generation b. Frequent Itemset Generation c. Neither d. Both

A

1. ________ is the process of partitioning a set of data objects (or observations) into subsets Han pg 444 a) clustering b) data segmentation c) classification d) outlier detection

A

1. _________ is useful for discovering interesting relationships hidden in large data sets. A. Association analysis B. Market basket transaction C. Asymmetric D. Itemset

A

10. Apriori is the first ______ ______ ______ algorithm that pioneered the use of support-based pruning to systematically control the exponential growth of candidate itemsets. Tan Ch 6.2 page 357 a) Association rule mining b) Random selection rule c) Linear regression rule d) None of the above

A

10. Which one is incorrect about bagging? a. The sampling can't be done with replacement b. It is also known as bootstrap aggregating. c. The sampling is done with replacement d. Some instances may appear several times in the same training set.

A

100. _________ and _______ are two examples of ensemble methods that manipulate their training sets. a. Bagging and Boosting b. Linear; decision tree c. Simple regression; multiple regression d. None of the above
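To see how bagging manipulates the training set, one bootstrap round can be sketched as follows (the ten-record training set is a stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
training_set = np.arange(10)  # stand-in for ten training records

# Bagging draws n records *with replacement*: duplicates are expected, and
# some records may not appear at all in a given bootstrap sample.
bootstrap = rng.choice(training_set, size=len(training_set), replace=True)
print(bootstrap)
```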

D

101. The simplest and most fundamental version of cluster analysis is ________, which organizes the objects of a set into several exclusive groups or clusters. a. Predictive b. Normalization c. Both A & B d. None of the above

D

102. ______________ is a linear signal processing technique that, applied to a data vector X, transforms it to a numerically different vector, X', of wavelet coefficients. a. The discrete wavelet transform (DWT) b. Linear regression c. Decision tree d. None of the above

T

11. Ensemble method and classifier combination method are the same.

A

11. Which of the following eliminates some of the candidate k-itemsets using the support-based pruning strategy: Tan Ch 6.2 page 360 a) Candidate Pruning b) Random selection c) Canceling Pruning d) None of the above

A

11. Which one is not an attribute selection measure? a. Induction b. Information gain c. Minimum description length d. Multivariate splits

A

12. Candidate Pruning eliminates some of the candidate k-itemsets using the: Tan Ch 6.2 page 360 a) Support-based pruning strategy b) Principal components analysis c) Data Analysis d) None of the above

A

12. Which one is incorrect about tree pruning? a. Pruned trees tend to be larger and more complex b. It is a method that addresses the problem of overfitting the data c. Pruned trees are easier to comprehend d. Prepruning and postpruning are two approaches to tree pruning

D

13. A decision tree has which of the following nodes? a. Root node b. Internal nodes c. Leaf/terminal nodes d. All of the above

T

13. A model generated by a learning algorithm should both fit the input data well and correctly predict the class labels of records it has never seen before.

A

14. A linear SVM is often known as a a. Maximal margin classifier b. Linear Decision Boundary c. Minimal margin classifier d. Structural risk minimization

T

14. The provision of a training set is the first step in classification

T

15. Most classification algorithms seek models that attain the highest accuracy

D

15. Which of the following is a component(s) of Bias-Variance Decomposition? a. Bias b. Variance c. Noise d. All of the above

T

16. Training tuples in the learning step are randomly sampled from the database under analysis.

D

16. Which of the following is an example of ensemble methods a. Bagging b. Boosting c. Random Forests d. All of the above

T

17. If the class label of each training tuple is provided, this step is also known as supervised learning.

E

17. Which of the following is a method for constructing the ensemble of classifiers? a. By manipulating the training set b. By manipulating the input features c. By manipulating the class labels d. By manipulating the learning algorithm e. All of the above

C

18. Data classification consists of which of the following steps? a. Learning step b. Classification step c. Both A and B d. None of the above

T

18. Unsupervised learning is also known as clustering

T

19. Decision boundaries with large margins tend to have better generalization errors than those with small margins.

B

19. The target function is also known informally as a ________ model. a. Descriptive b. Classification c. Target d. Predictive

T

2) Apriori is the first association rule mining algorithm that pioneered the use of support-based pruning to systematically control the exponential growth of candidate itemsets. (Tan Chapter 6.2, Slide 357, Textbook page 335)
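A minimal sketch of one Apriori iteration: generate candidate 2-itemsets from the frequent 1-itemsets, then discard candidates below minsup (support-based pruning). The transactions and threshold are assumed:

```python
from itertools import combinations

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
minsup = 3  # assumed minimum support count

def support(itemset):
    return sum(itemset <= t for t in transactions)

# Level 1: frequent single items ("d" is pruned, so no candidate containing "d"
# is ever generated -- this is how the exponential growth is controlled).
items = sorted({i for t in transactions for i in t})
frequent1 = [frozenset([i]) for i in items if support(frozenset([i])) >= minsup]

# Level 2: candidates from frequent 1-itemsets, then support-based pruning.
candidates = [a | b for a, b in combinations(frequent1, 2)]
frequent2 = [c for c in candidates if support(c) >= minsup]

print(frequent2)
```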

B

2) Which of the following distance algorithms terminates the clustering process when the distance between the nearest clusters exceeds a user-defined threshold? a) Nearest-neighbor clustering algorithm b) Single-linkage algorithm c) Farthest-neighbor clustering algorithm d) Complete-linkage algorithm

E

2) Which of the following is a basic clustering method(s)? (Han Chapter 10.1, Slide 485-487, Textbook page 448-450) a) Partitioning method b) Hierarchical method c) Density-based method d) Grid-based method e) All of the above

T

2. A discrete wavelet transform or DWT is a linear signal processing technique that, when applied to a data vector X, transforms it to a numerically different vector X'.

F

2. A divisive hierarchical clustering method employs a bottom up strategy.

C

2. Cluster analysis, or simply clustering, is the process of partitioning a set of data objects into: (Han, Chapter 10, Page 444) A. Objects B. Search C. Subsets D. Diagnosis

T

2. Clustering is also called data segmentation in some applications.

T

2. Clustering is the process of partitioning a set of data objects (or observations) into subsets

A

2. In association analysis, a collection of zero or more items is termed an ____. (Tan) A. Itemset B. Support Count C. Transaction D. Association

T

2. Multiple linear regression is an extension of (simple) linear regression, which allows a response variable, y, to be modeled as a linear function of two or more predictor variables. Han Ch 3 page 106

T

2. Multiple linear regression is an extension of (simple) linear regression. Han Ch 3 page 106

T

2. One method for constructing an ensemble classifier is by manipulating the learning algorithm.

F

2. Ordinal attribute values cannot be grouped as long as the grouping does violate the proper order of the attribute values. Tan Ch 4.3 page 157

A

2. Predictive Modeling is a classification model that can also be used to predict the class label of: Tan Ch 4.1 page 146 a) Unknown records b) Linear regression c) Decision tree d) Classes

T

2. The greater the similarity within a group and the greater the difference between groups, the better or more distinct the clustering.

A

2. What is the process of determining the frequency of occurrence for every candidate itemset that survives the candidate pruning step of the apriori-gen function? a. Support counting b. Support threshold c. Dimensionality d. Brute-force method

T

2. When the term classification is used without any qualification within data mining, it typically refers to supervised classification.

D

2. Which of the following does not affect the computational complexity of the Apriori algorithm? a. Support threshold b. Number of items c. Number of transactions d. Average length (Tan Ch 6, page 346)

C

2. Which itemset-generation approach is used by the Apriori algorithm? a. Level-wise algorithm b. Generate-and-test c. Both

D

2. Which of the following is not an attribute selection measure? a. Information Gain b. Gain Ratio c. Gini Index d. Splitting Index

D

2. Which of these is not a type of clustering? (a) Hierarchical/Partitional (b) Exclusive/Overlapping/Fuzzy (c) Complete/Partial (d) None of the above

A

2. ________ is the step whose objective is to find all the itemsets that satisfy the minsup threshold. Tan Ch 6.1 page 353 a) Frequent Itemset Generation b) Partial Itemset Generation c) None of the above

T

2. An agglomerative hierarchical clustering algorithm that uses the minimum distance measure is also called a minimal spanning tree algorithm, where a spanning tree of a graph is a tree that connects all vertices, and a minimal spanning tree is the one with the least sum of edge weights.

T

20. The margin of the decision boundary is given by the distance between these two hyperplanes.

B

20. The target function is also known informally as a ________ model. a. Descriptive b. Classification c. Target d. Predictive

T

21. In Hunt's algorithm, a decision tree is grown in a recursive fashion by partitioning the training records into successively purer subsets.

D

21. Which of these is an attribute type? a. Binary b. Ordinal c. Nominal d. All of the above

T

22. In a decision tree, a test condition for a binary attribute generates two potential outcomes.

F

23. In SVM, decision boundaries with small margins tend to have better generalization errors than those with large margins.

T

24. Classifiers that produce decision boundaries with small margins are therefore more susceptible to model overfitting.

T

25. Accuracy is the number of correct predictions divided by the total number of predictions.
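A tiny worked example of that definition (the label vectors are invented):

```python
predictions = ["yes", "no", "yes", "yes", "no"]
actuals     = ["yes", "no", "no",  "yes", "no"]

correct = sum(p == a for p, a in zip(predictions, actuals))
accuracy = correct / len(predictions)
print(accuracy)  # 4 correct out of 5 predictions -> 0.8
```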

T

26. Ensemble methods are techniques for improving classification accuracy by aggregating the predictions of multiple classifiers.

F

27. Oversampling is decreasing the number of negative tuples.

A

27. What does each classification technique employ? a. Learning algorithm b. Code c. Calculus d. Loops

T

28. The test condition for a binary attribute generates two potential outcomes.

T

29. Ordinal attribute values can be grouped as long as the grouping does not violate the order property of the attribute values.

T

3) Clustering is known as unsupervised learning because the class label information is not present. (Han Chapter 10.1, Slide 482, Textbook page 445)

T

3) Hierarchical clustering is an alternative approach which does not require that we commit to a particular choice of K.

A

3) Which of the following is a characteristic of the partitioning method for clustering? (Han Chapter 10.1, Slide 487, Textbook page 450) a) It is distance-based b) Uses a multiresolution grid data structure c) May filter out outliers d) None of the above

B

3) Which of the following is not a way in which cluster analysis supports other data analysis techniques by characterizing each cluster? a) Summarization b) Reduction c) Compression d) Efficiently finding nearest neighbors

T

3. A hierarchical clustering method works by grouping data objects into a "tree" of clusters

T

3. A hierarchical clustering method works by grouping data objects into a hierarchy or "tree" of clusters.

T

3. A partitional clustering is simply a division of the set of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset.

T

3. Bagging is a special case of random forest using all predictors in the dataset

T

3. Clustering can also be used for outlier detection.

F

3. Clustering is also known as supervised learning.

A

3. Evaluation of the performance of a ________ ______ is based on the counts of test records correctly and incorrectly predicted by the model. Tan Ch 4.1 page 149 a) Classification Model b) Decision Tree c) Linear Regression d) None of the above

F

3. In types of clustering, partitional is known as nested and hierarchical is known as unnested.

B

3. It divides the data into k groups such that each group must contain at least one object. Han, Chapter 10, Page 448 A. Clustering Method B. Partitioning Method C. Hierarchical Method D. Density-Based Method

T

3. Log-linear models approximate discrete multidimensional probability distributions. Han Ch 3 page 106

T

3. Multiple linear regression has multiple explanatory variables. Han Ch 3 page 106

A

3. Prototype-Based clusters are also known as: a. Center-based clusters b. Graph-based clusters c. Density-based clusters d. Well-separated clusters

F

3. Random forest does not manipulate its input features and uses decision trees as its base classifiers. Tan Ch 4.3 page 157

D

3. The computational complexity of the Apriori algorithm can be affected by which of the following factors? a. Support threshold b. Number of items c. Average width of transactions d. All of the above

A

3. This strategy of trimming the exponential search space based on the support measure is known as __________. (Tan page 334) a) support-based pruning b) anti-monotone c) Monotonicity Property d) level-wise

A

3. What is a decision tree in the classification process? a. A decision tree is a flowchart-like tree structure. b. It's a training tuple tool that helps analyze data. c. It is a root node in a flowchart-like tree structure. d. None of the above

C

3. Which is not one of the 3 simple techniques involved in cluster analysis? a. K-means b. Agglomerative Hierarchical c. Conglomerative Hierarchical d. DBSCAN (Tan Ch 8, page 495)

A

3. _______ is the process of determining the frequency of occurrence for every candidate itemset that survives the pruning step. (Tan, page 364) A. Support Counting B. Pruning C. Brute-force D. Candidate Generation

A

3. _______________ is the step whose objective is to extract all the high-confidence rules from the frequent itemsets found in the previous step. These rules are called strong rules. Tan Ch 6.1 page 353 a) Rule Generation b) Law Generation c) Sum generation d) None of the above

T

3. If the clustering process is terminated when the maximum distance between nearest clusters exceeds a user-defined threshold, it is called a complete-linkage algorithm.

T

30. Random forest is an ensemble method that manipulates its input features and uses decision trees as its base classifiers.
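A brief scikit-learn sketch of that statement; max_features is the knob that restricts which input features each tree split may consider (the dataset and parameter values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ensemble of decision trees; each split sees only a random subset of features.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))
```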

B

31. A classification technique that has received considerable attention is: A. Support Virtual Machine B. Support Vector Machine C. Maximum Margin Hyperplanes D. Rationale for Maximum Margin

T

31. Most classification algorithms seek models that attain the highest accuracy, or equivalently, the lowest error rate when applied to the test set.

A

32. Each technique employs a learning algorithm to identify a model that best fits the relationship between the attribute set and: a. Class label of the input data b. Class label of the output data c. Class label of the predictable data d. Class label of the model data

F

32. It is the internal nodes that are assigned a class label.

D

33. All are examples of Decision Tree nodes except? a. A Root Node b. An internal Node c. Leaf or terminal Node d. Temperature Node

T

33. Decision Boundaries with large margins tend to have better generalization errors than those with small margins.

T

34. Numeric prediction is where the model constructed predicts a continuous-valued function, or ordered value, as opposed to a class label.

C

34. The ensemble of classifiers can be constructed in many ways EXCEPT a. By manipulating the training set. b. By manipulating the input features c. By manipulating the bias-variance d. By manipulating the class labels e. By manipulating the learning algorithm

F

35. The decision tree starts as multiple nodes.

A

35. The class can be obtained by taking a majority vote on the individual predictions or by weighting each prediction with the accuracy of the: a. base classifier b. base method c. base number d. base aggregating

D

36. A classifier is usually trained to minimize its: a. training boundary b. training variability c. training compositions d. training error

F

36. An ensemble tends to be less accurate than its base classifiers.

T

37. Support vector classifiers (SVCs) work with more than 2 predictors.

A

38. Each classification technique employs a ____ to identify a model that best fits the relationship between the attribute set and class label. A. Learning Algorithm B. Regressive model C. Data cleaning technique D. Visualization output

D

39. A decision tree has which type of nodes? A. Root node B. Internal node C. Terminal node D. All of the above

T

39. Using a radial basis kernel, you get lower error rates than with polynomial kernel functions.

B

4) Which of the following is a characteristic of the Grid-based method for clustering? (Han Chapter 10.1, Slide 487, Textbook page 450) a) It is distance-based b) Uses a multiresolution grid data structure c) Both A and B d) None of the above

A

4) Which of the following is a group of three legitimate or correct types of clusters? a) Well-Separated, Graph-Based, Shared-Property b) Well-Separated, Unionized, Density-Based c) Prototype-Based, Unionized, Density-Based d) Prototype-Based, Associated, Shared-Property

C

4. A hierarchical clustering method works by grouping data objects into a hierarchy or: (Han, Chapter 10, Page 457) A. Data Summarization B. Visualization C. Tree of Clusters D. Sub Clusters

A

4. Bagging and Boosting are two examples of ensemble methods that manipulate their _______: Tan Ch 4.1 page 149 a) Training sets b) Decision tree c) Simple regression d) None of the above

A

4. Bagging, boosting, and random forests are examples of a. Ensemble methods b. Regression c. Association analysis d. Text mining

T

4. In a decision tree, each leaf node is assigned a class label.

C

4. What is another name for clustering? a. Data differentiation b. Data aggregation c. Data segmentation d. None

B

4. What is the process of reducing the number of random variables or attributes under consideration? a. Numerosity reduction b. Dimensionality reduction c. Data compression d. Data reduction

D

4. Which is not a type of cluster? a. Well separated b. Prototype based c. Graph based d. Nearest (Tan Ch 8, page 493)

C

4. Which one of these is a strategy to decrease the total sum of the squared error (SSE) by increasing the number of clusters? (a) Split a cluster (b) Introduce a new cluster centroid (c) All of the above (d) None of the above

A

4. ______ is the first association rule mining algorithm that pioneered the use of support-based pruning to control the exponential growth of candidate itemsets. (Tan, page 357) A. Apriori B. Regression C. Logistics D. Cluster

A

4. __________: This operation generates new candidate k-itemsets based on the frequent (k - 1)-itemsets found in the previous iteration. (Tan page 338) a) candidate generation b) anti-monotone c) Monotonicity Property d) level-wise

A

4. ____________ is the first association rule mining algorithm that pioneered the use of support-based pruning to systematically control the exponential growth of candidate itemsets. Tan Ch 6.2 page 357 a) Apriori b) Random selection c) Linear regression d) None of the above

A

40. A linear SVM, a classifier that searches for the hyperplane with the largest margin, is also known as a: A. Maximal Margin Classifier B. Capacity C. Structural Risk Minimization D. Universal Approximators

T

40. An association rule is an implication expression of the form X -> Y, where X and Y are disjoint itemsets.

T

41. Clustering is known as unsupervised learning because the class label information is not present.

A

41. _______ and numeric prediction are two major types of prediction problems. A. Classification B. Clustering C. Regression D. Unsupervised Learning

C

42. The _____ of a classifier on a given test set is the percentage of test set tuples that are correctly classified by the classifier. A. Supervised learning B. Unsupervised learning C. Accuracy D. Overfit

T

42. When the term classification is used without any qualification within data mining, it typically refers to supervised classification.

F

43. The test condition for a binary attribute does not generate two potential outcomes.

D

43. What is a classification model used for? a. Descriptive modeling b. Predictive Modeling c. Neither d. Both

F

44. Ordinal attribute values cannot be grouped as long as the grouping does violate the proper order of the attribute values.

A

44. Which set of data is used to build a classification model? a. Training set b. Test set c. Neither

F

45. Random forest does not manipulate its input features and uses decision trees as its base classifiers.

T

46. In Hunt's algorithm, a decision tree is grown in a recursive fashion by partitioning the training records into successively purer subsets.

D

46. Which of these are examples of Ensemble methods? a. Bagging b. Boosting c. Random Forests d. All of the above

T

47. The similarity function, K, which is computed in the original attribute space, is known as the kernel function.

C

47. Which approach does not involve sampling? a. Bootstrapping b. Leave one out c. Threshold moving

T

48. The kernel trick helps to address some of the concerns about how to implement nonlinear SVM. First, we do not have to know the exact form of the mapping function, because the kernel functions used in nonlinear SVM must satisfy a mathematical principle known as Mercer's theorem.

A

48. What is Boosting used for? a. Boosting is used to change the distribution of training sets b. Boosting is used to change the error rates c. Boosting is used to fix certain models

F

49. All classification approaches have two steps: model construction (learning) and model usage (applying).

A

49. ______________ is a classification model that can serve as an explanatory tool to distinguish between objects of different classes. a) Descriptive Modeling b) Predictive modeling c) Decision tree d) None of the above

A

5) Which of the following is a characteristic of the Density-based method for clustering? (Han Chapter 10.1, Slide 487, Textbook page 450) a) It can find arbitrarily shaped clusters b) Distance-based c) Both A and B d) None of the above

C

5) Which of the following is a group of three legitimate or correct requirements for clustering in data mining? a) Scalability, small size, ability to deal with noisy data b) Scalability, discovery of clusters with arbitrary shape, small size c) Ability to deal with noisy data, interpretability and usability, and domain knowledge to determine input parameters d) Ability to deal with different types of attributes, color coordination, and interpretability and usability

A

5. The ______ states that if an itemset is frequent, then all of its subsets must also be frequent. (Tan, page 355) A. Apriori Principle B. Algorithm C. Pseudocode D. Support Counting

A

5. The discrete wavelet transform is a linear signal processing technique that, applied to a data vector X, transforms it to a numerically different vector, X', of _____: Tan Ch 3.4 page 100 a) Wavelet coefficients b) Multiple coefficients c) Decision tree coefficients d) None of the above

B

5. The space requirements for K-means are modest because only the data points and: (Tan, Chapter 8, page 505) A. Centroids are deleted B. Centroids are stored C. Centroids are small D. Centroids are simple

B

5. This can be used as a data reduction technique because it allows a large data set to be represented by a much smaller sample. a. Clustering b. Sampling c. Hash Tree d. Pruning

T

5. Using the soft-margin approach, the formulation can be modified to learn a decision boundary that is tolerant of small training errors.

D

5. What method uses a nonlinear mapping to transform the original training data into a higher dimension and in this new dimension, it searches for the linear optimal separating hyperplane. a. Bagging b. Boosting c. Random Forest d. SVM

D

5. Which is not a requirement for clustering in data mining? a. Scalability b. Ability to deal with noisy data c. Deal with different attributes d. Linearity (Han Ch 10, page 446)

C

5. Which one of these is a strategy to decrease the number of clusters, while trying to minimize the increase in the total sum of the squared error (SSE)? (a) Disperse a cluster (b) Merge two clusters (c) All of the above (d) None of the above

B

5. Which clustering method quantizes the object space into a finite number of cells? a. Hierarchical b. Grid-based c. Density-based

A

5. ___________: This operation eliminates some of the candidate k-itemsets using the support-based pruning strategy. Tan pg 338 a) Candidate Pruning b) classification c) outlier detection d) Scalability

A

5. ________________: This operation eliminates some of the candidate k-itemsets using the support-based pruning strategy. Tan Ch 6.2 page 360 a) Candidate Pruning b) Candidate Generation c) Canceling d) None of the above

T

50. Model construction is to describe a set of predetermined classes.

A

50. _______________ is a classification model that can also be used to predict the class label of unknown records. a. Predictive Modeling b. Linear regression c. Decision tree d. None of the above

A

51. Evaluation of the performance of a classification model is based on the counts of test records correctly and incorrectly predicted by the model. These counts are tabulated in a table known as a ______________. a) Confusion matrix b) Normal matrix c) Simple matrix d) None of the above

T

51. Model usage is for classifying future or unknown records.

F

52. The test condition for a binary attribute generates one potential outcome.

A

52. _________ and _______ are two examples of ensemble methods that manipulate their training sets. a) Bagging and Boosting b) Linear; decision tree c) Simple regression; multiple regression d) None of the above

F

53. Ordinal attribute values cannot be grouped as long as the grouping does not violate the order property of the attribute values.

T

54. Random forest, which is described in Section 5.6.6, is an ensemble method that manipulates its input features and uses decision trees as its base classifiers.

A

54. The simplest and most fundamental version of cluster analysis is ________, which organizes the objects of a set into several exclusive groups or clusters. a) Partitioning b) Normalization c) Predictive d) None of the above

B

55. This algorithm is the basis of many existing decision tree induction algorithms. a. C4.5 b. Hunt's c. CART d. K-means

A

56. What kind of decision tree can be used to allow test conditions that involve more than one attribute? a. Oblique b. Skeleton c. Decision boundary d. Gain Ratio

C

57. Another name for a formal explanation relating the margin of a linear classifier to its generalization error is: a. SVM b. CART c. SRM d. Slack variable

D

58. This is the basic algorithm for: Create a node N; if tuples in D are all of the same class, C, then... return N as a leaf node labeled with the class C; if attribute list is empty then ... return N as a leaf node labeled with the majority class in D; // majority voting a. K-Means b. Clustering c. C4.5 d. Decision Tree
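The two return branches quoted in the question are the base cases of recursive decision tree induction. A hedged sketch of just those base cases (the attribute-selection and partitioning step is deliberately elided):

```python
from collections import Counter

def build_tree(tuples, attribute_list):
    """Tuples are (record, class_label) pairs; only the base cases are shown."""
    classes = [label for _, label in tuples]
    if len(set(classes)) == 1:              # all tuples in D share one class C
        return {"leaf": classes[0]}
    if not attribute_list:                  # attribute list empty -> majority vote
        return {"leaf": Counter(classes).most_common(1)[0][0]}
    raise NotImplementedError("splitting and recursion omitted in this sketch")

print(build_tree([("r1", "yes"), ("r2", "yes")], ["humidity"]))     # {'leaf': 'yes'}
print(build_tree([("r1", "yes"), ("r2", "no"), ("r3", "no")], []))  # {'leaf': 'no'}
```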

C

59. Traditionally, learning models assume that data classes are balanced and well distributed, however, in real life, data is class-imbalanced. This is also known as: a. Cost and benefits b. Cross validation c. Class imbalance problem d. boosting

B

6) Which of the following are examples of cluster analysis applications? i. Biology ii. Wrapping Presents iii. Psychology and Medicine iv. Tying Shoes a) I and II b) I and III c) I, III, and IV d) I, II, III, IV

A

6) Which of the following is a characteristic of the Hierarchical method for clustering? (Han Chapter 10.1, Slide 487, Textbook page 450) a) It cannot correct erroneous merges or splits b) Distance-based c) Uses a multiresolution grid data structure d) Can find arbitrarily shaped clusters

F

6. Information gain is not an attribute selection measure.

A

6. The simplest and most fundamental version of ______ ______ is partitioning, which organizes the objects of a data set into several exclusive groups or clusters. Han Ch 10.2 Page 451 a) Cluster analysis b) Normalization analysis c) Predictive analysis d) None of the above

C

6. This algorithm is sensitive to outliers because such objects are far away from the majority of data and thus, when assigned to a cluster, they can dramatically distort the mean value of the cluster. a. Decision tree b. Clustering c. K-means d. Linear regression

D

6. Two strategies that decrease the total SSE by increasing the number of clusters are: (Tan, Chapter 8, page 507) A. Split a cluster B. Introduce a new cluster centroid C. Disperse a cluster D. A and B

D

6. What type of SVM would you use in order to correctly classify nonlinear data? a. Linear b. Radial c. Polynomial d. B or C

D

6. Which is not a characteristic of the hierarchical method? a. Clustering is a hierarchical decomposition (i.e., multiple levels) b. Cannot correct erroneous merges or splits c. May incorporate other techniques like micro clustering or "linkages" d. May filter outliers

D

6. Which of the following is a data reduction strategy? (a) dimensionality reduction (b) numerosity reduction (c) data compression (d) All of the above

A

6. _____ is the process of partitioning a set of data objects into subsets. A. Cluster Analysis B. Outlier detection C. Supervised Learning D. Machine Learning

A

6. __________ works by grouping data objects into a hierarchy or "tree" of clusters. Han pg 457 a) hierarchical clustering b) classification c) outlier detection d) Scalability

A

6. __________________ (also called the Karhunen-Loeve, or K-L, method) searches for k n-dimensional orthogonal vectors that can best be used to represent the data, where k <= n. Han Ch 3 page 102 a) Principal components analysis b) Data Analysis c) Financial analysis d) None of the above
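A short scikit-learn sketch of PCA reducing assumed 5-dimensional data to k = 2 orthogonal components:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # hypothetical data with n = 5 dimensions

# Keep the k = 2 orthogonal components that best represent the data (k <= n).
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # (100, 2)
```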

B

60. Another name for showing interesting relationships between attribute-value pairs that occur frequently in a given data set. a. Classification b. Frequent patterns c. Associative classification d. Association rules

D

61. Which is not an algorithm parameter? a. D, the complete set of training tuples and associated class labels b. Attribute list c. Attribute selection method d. Attribute method

D

62. Which is not a scenario for splitting attribute? a. The splitting attribute is Discrete-valued b. The splitting attribute is Continuous-valued c. The splitting attribute is Discrete-valued and Binary tree d. The splitting attribute is Continuous-valued and Binary tree

D

63. Which is not a method for expressing attributes? a. Binary Attribute b. Nominal Attribute c. Ordinal Attribute d. Discrete Attribute

C

64. Which is not a popular Ensemble method? a. Bagging b. Boosting c. Bumping d. Random Forest

D

65. Which is not a method for partitioning? a. Holdout b. Random sampling c. Cross Validation d. Pruning

D

66. Which is not a type of tree node? a. Root Node b. Internal Node c. Leaf Node d. Stem Node

A

67. What is SVM? a. Support Vector Machines b. Supply Victory Man c. Supplemental Vaccine Master d. Super Victory Machine

A

68. What is C, or the optimal separating hyperplane, in the topic of Support Vector Machines? a. It is the minimum perpendicular distance between each point and the separating line b. It is the crossing value c. It is the distance between triangle and rectangle d. It is the choosing point

A

69. What is the basic focus of a support vector classifier? a. Separable hyperplanes b. Hyper x c. Essential lining d. Having variables

T

7) The Apriori Principle states "If an itemset is frequent, then all of its subsets must also be frequent".

T

7. Ensemble methods work better with unstable classifiers.

A

7. What is an item? Tan Ch 6.1 page 328 a) Asymmetric binary variable b) Symmetric binary variable c) Null d) None of the above

A

7. Which statement about tuple (x,y) is correct? a. x is the attribute set and y is a special attribute b. x and y are attribute sets c. x and y are special attributes d. x is a special attribute and y is the attribute set

D

70. Common kernel functions do NOT include: a. Linear b. Polynomial c. Radial Basis d. Visualization tools

A

71. What are Non-Separating Classes? a. for any straight line or plane drawn there will always be at least some points on the wrong side of the line b. for any classes you have, the support vector classifier gets more complex c. for all the points in the map, they are all different classes and have identities d. for every class, there is a point associated with it and they don't separate

A

72. The support vector classifier is fairly easy to think about. However, it may not be all that powerful. Why? a. because it only allows for a linear decision boundary b. because it is hard to interpret c. because it is outdated d. because it only allows using certain variables, specific computer applications to run

A

73. What is anti-monotone property? a. Support for an itemset never exceeds the support for its subsets b. A rule against constants c. A constant variable d. A training set

A

74. What is an association rule? a. An implication expression of the form x -> y where x and y are disjoint itemsets b. A rule to constrain variables c. A rule to change variables d. A rule to swap variables

A

75. What is clustering? a. Grouping variables b. Separating variables c. Making variables close d. Removing outliers

A

76. What are some of the requirements for cluster analysis? a. Scalability, ability to deal with different types of attributes, ability to deal with noisy data b. Outliers, scalability, noisy data c. Scalability, extrapolation, noisy data d. Noisy data, ability to deal with difference

C

77. In what terms is the strength of an association rule measured? a. Support b. Confidence c. Both A and B d. None of the above

C

78. Which one of these is a strategy to decrease the number of clusters, while trying to minimize the increase in the total sum of the squared error (SSE)? a. Disperse a cluster b. Merge two clusters c. All of the above d. None of the above

A

79. Descriptive Modeling is a classification model that can serve as an explanatory tool to distinguish between objects of different: a. Classes b. Predictive models c. Decision trees d. None of the above

T

8) The main advantage of grid-based clustering methods is the fast processing time.

T

8. Support vector machines (SVMs) are a method for the classification of both linear and nonlinear data.
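A compact scikit-learn sketch contrasting the linear and nonlinear (kernel) cases on an assumed toy dataset:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(noise=0.1, random_state=0)  # a nonlinearly separable toy set

linear = SVC(kernel="linear").fit(X, y)  # linear decision boundary
rbf = SVC(kernel="rbf").fit(X, y)        # nonlinear boundary via the kernel trick

# The RBF kernel typically fits this nonlinear data much better.
print(linear.score(X, y), rbf.score(X, y))
```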

D

8. Which of the following is not a component of principal components analysis? Han Ch 3 page 102 a) Data analysis b) Financial Analysis c) Decision Tree Analysis d) None of the above

A

8. Which one is not an example of classification technique? a. Induction b. Decision tree c. Rule-based classifiers d. Neural networks

A

80. Predictive Modeling is a classification model that can also be used to predict the class label of: a. Unknown records b. Linear regression c. Decision tree d. Classes

A

81. Evaluation of the performance of a ________ ______ is based on the counts of test records correctly and incorrectly predicted by the model. a. Classification Model b. Decision Tree c. Linear Regression d. None of the above

A

82. Bagging and Boosting are two examples of ensemble methods that manipulate their _______. a. Training sets b. Decision tree c. Simple regression d. None of the above

A

83. The discrete wavelet transform is a linear signal processing technique that, applied to a data vector X, transforms it to a numerically different vector, X', of _____: a. Wavelet coefficients b. Multiple coefficients c. Decision tree coefficients d. None of the above

A

84. The simplest and most fundamental version of ______ ______ is partitioning, which organizes the objects of a data set into several exclusive groups or clusters. a. Cluster analysis b. Normalization analysis c. Predictive analysis d. None of the above

A

85. ________ is the task of learning a target function f that maps each attribute set x to one of the predefined class labels y. a. Classification b. Modelling c. Descriptive modelling d. Regression

A

86. ________ is a classification model that can also be used to predict the class label of unknown records. a. Predictive Modeling b. Performance metric c. Decision tree d. Internal nodes

A

87. ________, each of which has exactly one incoming edge and two or more outgoing edges. a. Internal nodes b. Leaf c. Root node d. Exterior nodes

A

88. _________ where a classification algorithm builds the classifier by analyzing or "learning from" a training set made up of database tuples and their associated class labels. a. learning step b. training set c. attribute vector d. training tuples

A

89. ________ is constructed to predict class (categorical) labels, such as "safe" or "risky" for the loan application data; "yes" or "no" for the marketing data; or "treatment A," "treatment B," or "treatment C" for the medical data. a. classifier b. predictor c. classification d. learning step

F

9) Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) clusters large amounts of numeric data with hierarchical clustering and totally foregoes partitional clustering.

T

9. AdaBoost is short for Adaptive Boosting.

A

9. Which step extracts high-confidence rules from the frequent itemsets? These rules are called strong rules. Tan Ch 6.1 page 353 a) Rule Generation b) Law Generation c) Sum generation d) None of the above

A

9. Which one is a classifier that searches for a hyperplane with the largest margin? a. A linear SVM b. Minimal margin classifier c. A non-linear SVM d. Kernel trick

A

90. _________ is a heuristic for selecting the splitting criterion that "best" separates a given data partition, D, of class-labeled training tuples into individual classes. a. attribute selection measure b. splitting rules c. information gain d. Induction of a decision tree

D

92. The decision tree has three types of nodes. Which of the following is correct? ① root node ② internal node ③ leaf or terminal node a. ① and ② b. ② and ③ c. ① and ③ d. ①, ②, ③

D

93. Which of the following are classification methods? a. Decision tree-based methods b. Rule-based methods c. Logistic regression d. All of the above

D

94. Classification methods can be compared and evaluated according to different criteria. Which of the following is NOT one of the criteria? a. Accuracy b. Speed c. Scalability d. Quality

A

96. Which of the following is NOT an advantage of decision tree-based classification? a. Easy to interpret for big-sized trees b. Extremely fast at classifying unknown records c. Accuracy is comparable to other classification techniques for many simple data sets d. Inexpensive to construct

D

97. ______________ is a classification model that can serve as an explanatory tool to distinguish between objects of different classes. a. Linear regression b. Predictive modeling c. Both A & B d. None of the above

D

98. _______________ is a classification model that can also be used to predict the class label of unknown records. a. Decision tree b. Linear regression c. Both A & B d. None of the above

D

99. Evaluation of the performance of a classification model is based on the counts of test records correctly and incorrectly predicted by the model. These counts are tabulated in a table known as a ______________. a. Simple matrix b. Normal matrix c. Both A & B d. None of the above

A

A classification model is useful for lots of purposes. Which of the following is correct? ① Descriptive Modeling ② Predictive Modeling ③ Linear Modeling a. ① and ② b. ② and ③ c. only ① d. only ②

T

A common strategy adopted by many association rule mining algorithms is to decompose the problem into frequent itemset generation and rule generation

T

A rule can be measured in terms of its support and confidence

T

Agglomerative and Divisive are the two types of hierarchical clustering

A

Classification is: a. The task of assigning objects to one of several predefined categories b. Identifying trends c. Removing anomalies d. Adding anomalies

T

Data classification is a two-step process, the learning step and classification step.

T

KNN normally uses 3-10 neighbors.

T

KNN is a simple algorithm that classifies new cases based on a similarity measure (e.g., distance functions).
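A minimal scikit-learn sketch of that idea, with k = 5 chosen from the 3-10 range mentioned above (the dataset is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Each new case takes the majority class of its k nearest training cases,
# using Euclidean distance as the similarity measure by default.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict(X[:3]))
```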

T

SRM stands for structural risk minimization

C

Support vector machines (SVMs) are a method for classifying which type of data? a. Linear data b. Nonlinear data c. All of the above d. None of the above

T

The apriori principle states that if an itemset is frequent, then all of its subsets must also be frequent.

A

The learning step in data classification is also known as? a. Training phase b. Training set c. Training tuples d. None of the above

C

Weaknesses of K-means: a. Need to specify k in advance b. Need to know the number of clusters in advance c. All of the above d. None of the above

A

What are classification techniques most suited for? a. Predicting or describing data sets with binary or nominal categories b. Sorting data c. Finding data d. Extrapolating data

A

What are some of the requirements for cluster analysis? A. Scalability, ability to deal with different types of attributes, ability to deal with noisy data B. Outliers, scalability, noisy data C. Scalability, extrapolation, noisy data D. Noisy data, ability to deal with different types of attributes, boosting

D

What are the requirements for an effective candidate generation procedure? a) It should avoid generating too many unnecessary candidates b) It must ensure that the candidate set is complete c) It should not generate the same candidate itemset more than once d) All of the above

A

What is a class label attribute? a. A discrete-valued and unordered predefined class determined by another database attribute b. A speed up to data analysis c. A fast method for analyzing data d. A quick look at data

A

What is a training tuple? a. Individual tuples making up the training set b. A decision tree-based dataset c. A database with huge datasets d. A decision tree that branches off many nodes

A

What is anti-monotone property? A. Support for an itemset never exceeds the support for its subsets B. A rule against constants C. A constant variable D. A training set

A

What is clustering? A. Set of clusters resulting from a cluster analysis B. Separating variables C. Making variables close D. Removing outliers

A

What is one use of clustering? A. Outlier detection B. Separating variables C. Removing data D. Extrapolating data

A

What is regression analysis? a. A statistical methodology that is most often used for numeric prediction b. A technique that splits sample size c. A technique that increases sample size d. A technique that decreases sample size

A

What is support-based pruning? A. Strategy of trimming the exponential search space based on the support measure B. Trimming data C. Trimming information D. Trimming support variables

A

What is an association rule? A. An implication expression of the form x -> y, where x and y are disjoint itemsets B. A rule to constrain variables C. A rule to change variables D. A rule to swap variables

D

Which of these is a type of cluster? (a) Prototype-based (b) Graph-based (c) Density-based (d) all of the above

D

Which of these is an example of ensemble method? a. Bagging b. Boosting c. Random forests d. All of the above

A

Clustering is also called ________ in some applications because clustering partitions large data sets into groups according to their similarity. Han pg 445 a) data segmentation b) classification c) outlier detection d) scalability

D

What is NOT an example of numerical attributes? a. Temperature b. Height c. Weight d. People's names

A

What restriction does Euclidean distance have? a. No restriction; direct measurement b. Restricted to only use numbers c. Restricted to only use strings d. Restricted to only use Booleans

A

What restriction does Manhattan distance have? a. Restrictions: only move in one direction b. No restrictions c. Restrictions: only move in all directions d. Restrictions: only move in the opposite direction

C

Which of the following is a common distance function(s)? a. Euclidean Distance b. Manhattan Distance c. All of the above d. None of the above

C

Which of the following is a type(s) of clustering method? a. Partitional clustering b. Hierarchical clustering c. All of the above d. None of the above

