Chapter 7 Data Mining Review


12. Unsupervised evaluation can be internal or external. Which of the following is an internal method for evaluating alternative clusterings produced by the K-Means algorithm? a. Use a production rule generator to compare the rule sets generated for each clustering. b. Compute and compare class resemblance scores for the clusters formed by each clustering. c. Compare the sum of squared error differences between instances and their corresponding cluster centers for each alternative clustering. d. Create and compare the decision trees determined by each alternative clustering. (7.7)

Compare the sum of squared error differences between instances and their corresponding cluster centers for each alternative clustering
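
As a minimal sketch of this internal measure, the snippet below computes the sum of squared error for two alternative clusterings of the same instances. The toy data, cluster assignments, and the helper name `sum_squared_error` are assumptions made for illustration; they are not taken from the chapter.

```python
def sum_squared_error(instances, assignments, centers):
    """Sum of squared Euclidean distances from each instance to its assigned center."""
    total = 0.0
    for point, cluster in zip(instances, assignments):
        center = centers[cluster]
        total += sum((p - c) ** 2 for p, c in zip(point, center))
    return total

# Hypothetical toy data: four 2-D instances and two alternative clusterings.
instances = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (6.0, 5.0)]

# Clustering A groups nearby points; each center is the mean of its members.
assign_a = [0, 0, 1, 1]
centers_a = [(1.0, 1.5), (5.5, 5.0)]

# Clustering B isolates the first point and lumps the rest together.
assign_b = [0, 1, 1, 1]
centers_b = [(1.0, 1.0), (4.0, 4.0)]

sse_a = sum_squared_error(instances, assign_a, centers_a)
sse_b = sum_squared_error(instances, assign_b, centers_b)
print(sse_a, sse_b)  # the clustering with the smaller value fits its centers more tightly
```

The comparison needs no class labels or outside information, which is what makes it an internal evaluation.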

4. The hypothesis of no significant difference. a. nil b. invalid c. null d. void

null

3. If a real-valued attribute is normally distributed, we know that approximately 95% of all attribute values lie within a. one standard deviation of the mean. b. two standard deviations of the mean. c. three standard deviations of the mean. d. four standard deviations of the mean.

two standard deviations of the mean

6. Data used to optimize the parameter settings of a supervised learner model. a. training b. test c. verification d. validation

validation

7. We have performed a supervised classification on a dataset containing 100 test set instances. Eighty of the test set instances were correctly classified. The 95% test set accuracy confidence boundaries are: a. 76% and 84% b. 72% and 88% c. 78% and 82% d. 70% and 90%

72% and 88%
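
The arithmetic behind this answer can be sketched as follows. The function name is hypothetical, and the multiplier `z=2.0` is an assumption matching the "two standard deviations" rule of thumb for 95%; using 1.96 instead would give a slightly narrower interval.

```python
import math

def accuracy_confidence_interval(correct, total, z=2.0):
    """Approximate confidence bounds for test set accuracy.

    z=2.0 is the two-standard-error rule of thumb for 95% (an assumption here).
    """
    accuracy = correct / total
    # Standard error of a proportion: sqrt(p * (1 - p) / n)
    standard_error = math.sqrt(accuracy * (1 - accuracy) / total)
    return accuracy - z * standard_error, accuracy + z * standard_error

low, high = accuracy_confidence_interval(80, 100)
print(f"{low:.0%} to {high:.0%}")
```

With 80 of 100 instances correct, the standard error is 0.04, so the bounds fall 0.08 on either side of 0.80.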

10. The correlation between the number of years an employee has worked for a company and the salary of the employee is 0.75. What can be said about employee salary and years worked? a. There is no relationship between salary and years worked. b. Individuals that have worked for the company the longest have higher salaries. c. Individuals that have worked for the company the longest have lower salaries. d. The majority of employees have been with the company a long time. e. The majority of employees have been with the company a short period of time.

Individuals that have worked for the company the longest have higher salaries

5. A decision tree is built to determine individuals likely to default on an unsecured loan. The null hypothesis states that an individual will not default on the loan. The decision tree correctly classifies 80% of the instances in a test dataset. Fifteen percent of the mistakes made by the model are type 1 errors. What can be said about the performance of the model? a. The accuracy of the model for correctly determining those individuals who did not default on their loan was at least 75%. b. The accuracy of the model for correctly determining those individuals who defaulted on their loan was at least 75%. c. The majority of errors made by the model accepted individuals who defaulted. d. The majority of errors made by the model rejected individuals who did not default. e. More than one of a, b, c, or d is correct. (7.3)

The majority of errors made by the model accepted individuals who defaulted

9. We have built and tested two supervised learner models, M1 and M2. We compare the test set accuracy of the models using the classical hypothesis testing paradigm with a 95% confidence setting. The computed value of P is 2.53. What can we say about this result? a. Model M1 performs significantly better than M2. b. Model M2 performs significantly better than M1. c. Both models perform at the same level of accuracy. d. The models differ significantly in their performance. e. More than one of a, b, c, or d is correct. (7.5)

The models differ significantly in their performance
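
One way such a test statistic might be computed is sketched below. It assumes a common two-proportion form, with each model's test set error rate and the average error rate `q`; the exact formula in the source text may differ, and the error rates and test set sizes are hypothetical. The threshold 1.96 is the usual two-sided critical value at the 95% level.

```python
import math

def model_comparison_statistic(error1, error2, n1, n2):
    """Test statistic for the difference between two test set error rates.

    Assumed form: |E1 - E2| / sqrt(q * (1 - q) * (1/n1 + 1/n2)), q = (E1 + E2) / 2.
    """
    q = (error1 + error2) / 2
    return abs(error1 - error2) / math.sqrt(q * (1 - q) * (1 / n1 + 1 / n2))

# Hypothetical error rates for M1 and M2, each tested on 200 instances.
p_value_statistic = model_comparison_statistic(0.10, 0.20, 200, 200)
significant = p_value_statistic >= 1.96
print(round(p_value_statistic, 2), significant)
```

Because the statistic uses the absolute difference, a value above 1.96 says only that the models differ significantly; by itself it does not say which model is better.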

2. The standard error is defined as the square root of this computation. a. The sample variance divided by the total number of sample instances. b. The population variance divided by the total number of sample instances. c. The sample variance divided by the sample mean. d. The population variance divided by the sample mean.

The sample variance divided by the total number of sample instances
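
The definition can be sketched directly with the standard library. The sample values are hypothetical; `statistics.variance` computes the sample variance (dividing by n - 1).

```python
import math
import statistics

def standard_error(sample):
    """Square root of (sample variance / number of sample instances)."""
    return math.sqrt(statistics.variance(sample) / len(sample))

# Hypothetical sample of eight real-valued measurements.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
se = standard_error(sample)
print(round(se, 4))
```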

8. Bootstrapping allows us to a. choose the same training instance several times. b. choose the same test set instance several times. c. build models with alternative subsets of the training data several times. d. test a model with alternative subsets of the test data several times. (7.4)

choose the same training instance several times
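
A minimal sketch of bootstrap sampling follows. Training instances are drawn with replacement, so the same instance can be chosen several times. Using the never-chosen instances as the test set is one common convention and is an assumption here; the function name, data, and seed are hypothetical.

```python
import random

def bootstrap_sample(instances, rng):
    """Draw len(instances) training instances with replacement.

    Instances that were never drawn form the test set (an assumed convention).
    """
    n = len(instances)
    chosen = [rng.randrange(n) for _ in range(n)]
    training = [instances[i] for i in chosen]
    chosen_set = set(chosen)
    test = [instances[i] for i in range(n) if i not in chosen_set]
    return training, test

rng = random.Random(0)        # fixed seed so the sketch is repeatable
instances = list(range(20))   # hypothetical dataset of 20 instance ids
training, test = bootstrap_sample(instances, rng)
print(sorted(training))       # repeated ids are instances chosen more than once
print(test)
```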

13. The average squared difference between classifier predicted output and actual output. a. mean squared error b. root mean squared error c. mean absolute error d. mean relative error

mean squared error
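
To make the distinction among the first three options concrete, here is a small sketch computing each on hypothetical actual and predicted values. The function names are illustrative only.

```python
import math

def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted output."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(mean_squared_error(actual, predicted))

def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted output."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical outputs for four test instances.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

mse = mean_squared_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
mae = mean_absolute_error(actual, predicted)
print(mse, rmse, mae)
```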

1. Selecting data so as to assure that each class is properly represented in both the training and test set. a. cross validation b. stratification c. verification d. bootstrapping (7.2)

stratification
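
Stratification can be sketched as splitting each class separately so that both sets keep the original class proportions. The function name, labels, and 25% test fraction below are hypothetical choices for illustration.

```python
import random
from collections import defaultdict

def stratified_split(instances, labels, test_fraction, rng):
    """Split so each class contributes the same fraction of its members to the test set."""
    by_class = defaultdict(list)
    for instance, label in zip(instances, labels):
        by_class[label].append((instance, label))

    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Hypothetical dataset: 8 "yes" instances and 4 "no" instances.
labels = ["yes"] * 8 + ["no"] * 4
instances = list(range(12))
train, test = stratified_split(instances, labels, 0.25, random.Random(0))

test_labels = [label for _, label in test]
train_labels = [label for _, label in train]
print(test_labels.count("yes"), test_labels.count("no"))
```

Both sets keep the 2:1 ratio of "yes" to "no" found in the full dataset.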

11. The correlation coefficient for two real-valued attributes is -0.85. What does this value tell you? a. The attributes are not linearly related. b. As the value of one attribute increases the value of the second attribute also increases. c. As the value of one attribute decreases the value of the second attribute increases. d. The attributes show a curvilinear relationship. (7.6)

As the value of one attribute decreases the value of the second attribute increases
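
The sign of the coefficient can be illustrated with a small sketch of the Pearson correlation. The data are hypothetical and perfectly linear, so the coefficients come out at the extremes of -1 and +1 rather than -0.85.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of real values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys_falling = [10.0, 8.0, 6.0, 4.0, 2.0]  # decreases as xs increases
ys_rising = [2.0, 4.0, 6.0, 8.0, 10.0]   # increases as xs increases

r_negative = pearson_r(xs, ys_falling)
r_positive = pearson_r(xs, ys_rising)
print(r_negative, r_positive)
```

A negative coefficient means the attributes move in opposite directions; a positive one means they move together.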
