MGT 3604 Final Exam Adaptive Test Prep Questions


In the k-nearest neighbors method, when the value of k is set to 1, A) The classification or prediction of a new observation is based solely on the single most similar observation from the training set. B) The new observation's class is naïvely assigned to the most common class in the training set. C) The new observation's prediction is used to estimate the anticipated error rate on future data over the entire training set. D) The classification or prediction of a new observation is subject to the smallest possible classification error.

A) The classification or prediction of a new observation is based solely on the single most similar observation from the training set.
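
To make the k = 1 case concrete, here is a minimal sketch of a 1-nearest-neighbor classifier. The function name, the training data, and the choice of Euclidean distance are assumptions for illustration; the source does not say which distance measure is used.

```python
import math

def nearest_neighbor_predict(train_X, train_y, new_x):
    """Return the label of the single training observation closest to new_x."""
    # Euclidean distance from new_x to every training observation (assumed metric).
    distances = [math.dist(row, new_x) for row in train_X]
    # Index of the most similar (closest) training observation.
    nearest = min(range(len(train_X)), key=distances.__getitem__)
    return train_y[nearest]

# Hypothetical training set: two numeric features and a binary class label.
train_X = [(1.0, 1.0), (2.0, 1.5), (8.0, 9.0), (9.0, 8.5)]
train_y = [0, 0, 1, 1]

print(nearest_neighbor_predict(train_X, train_y, (8.5, 8.0)))  # prints 1
```

With k = 1 no voting takes place: the new observation simply inherits the class of its closest neighbor.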

32. Logistic regression is similar to linear regression, except that it attempts to classify ______________ as a linear function of explanatory variables. A) a categorical response. B) a continuous response. C) categorical predictors. D) independent variables.

A) a categorical response.

17. If the outcome variable is ________, we can use logistic regression as a classification tool. A) binary B) continuous C) a predictor variable D) multivariate

A) binary

40. A _________ chart compares the number of actual Class 1 observations identified when observations are considered in decreasing order of their estimated probability of being in Class 1 with the number of actual Class 1 observations identified when observations are randomly selected. A) cumulative lift B) classification confusion C) decile-wise lift D) ROC curve

A) cumulative lift
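
The two curves on a cumulative lift chart can be sketched as follows. This is an illustrative sketch only: the function name and the eight observations are hypothetical, and the random-selection curve is assumed to be the overall Class 1 rate times the number of observations selected.

```python
def cumulative_lift(actual, prob):
    """Return (model_hits, random_hits) after each additional observation is selected."""
    # Rank observations by estimated Class 1 probability, largest first.
    ranked = sorted(zip(prob, actual), key=lambda pair: pair[0], reverse=True)
    base_rate = sum(actual) / len(actual)
    model_hits, random_hits, running = [], [], 0
    for n, (_, label) in enumerate(ranked, start=1):
        running += label
        model_hits.append(running)         # actual Class 1 found using the model's ranking
        random_hits.append(n * base_rate)  # expected Class 1 found by random selection
    return model_hits, random_hits

# Hypothetical data: actual class and estimated probability of Class 1.
actual = [1, 0, 1, 0, 0, 1, 0, 0]
prob = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]

model_hits, random_hits = cumulative_lift(actual, prob)
print(model_hits)   # prints [1, 2, 2, 3, 3, 3, 3, 3]
print(random_hits)
```

The further the model curve sits above the random curve, the better the model is at ranking Class 1 observations first.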

15. The choice of __________ affects classification errors. A) cutoff value B) error rate C) accuracy measure D) class

A) cutoff value

37. Instead of Y as outcome variable, in logistic regression, we use a _______ called the logit. A) function of Y B) subset of Y C) class of Y D) classification of Y

A) function of Y
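
For reference, the logit is usually written as the log odds of the outcome. The symbols below (p for the probability that Y = 1, the coefficients, and q predictors) are notation assumed for this note, not taken from the source.

```latex
\operatorname{logit}(p) = \ln\!\left(\frac{p}{1-p}\right)
  = \beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q,
\qquad p = P(Y = 1 \mid x_1, \ldots, x_q)
```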

35. The set of recorded values of variables associated with a single entity is a(n) A) observation. B) data point. C) classification. D) location.

A) observation.

24. Estimation methods are also referred to as A) prediction methods. B) clustering methods. C) association methods. D) supervised methods.

A) prediction methods.

27. A(n) _______________ is often displayed as a row of values in a spreadsheet or database in which the columns correspond to the variables. A) record B) data point C) classification D) location

A) record

23. Data-mining methods for predicting an outcome based on a set of input variables are referred to as A) supervised learning. B) unsupervised learning. C) dimension reduction. D) data sampling.

A) supervised learning.

33. A positive average error on the validation data suggests a tendency to _________ the output variable in the validation data. A) underestimate B) overestimate C) accurately estimate D) inaccurately estimate

A) underestimate
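
A short sketch of why the sign matters, assuming error is defined as actual minus predicted (the convention consistent with the answer above). The function names and data are hypothetical.

```python
import math

def average_error(actual, predicted):
    """Mean of (actual - predicted); the sign shows the direction of the bias."""
    return sum(a - p for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; never negative, so it shows size, not direction."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical validation data in which every prediction is too low.
actual = [10.0, 12.0, 15.0]
predicted = [9.0, 10.0, 14.0]

print(f"{average_error(actual, predicted):.3f}")  # prints 1.333 (positive: underestimating)
print(f"{rmse(actual, predicted):.3f}")           # prints 1.414
```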

10. The data used to evaluate candidate predictive models are called the A) validation set. B) training set. C) test set. D) estimation set.

A) validation set.

8. A characteristic or quantity of interest that can take on different values is a(n) A) variable. B) observation. C) record. D) quality.

A) variable.

16. From the lift chart shown below, we can infer that if 200 observations with the largest estimated probabilities of being in Class 1 were selected, 150 of them would correspond to actual Class 1 members. If 200 cases were selected at random, approximately how many could we expect to be in Class 1? A) 100 B) 150 C) 200 D) 250

A) 100
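
The chart itself is not reproduced here, so the arithmetic below only uses the numbers stated in the question and the keyed answer. The variable names are illustrative.

```python
# Counts stated in the question and its keyed answer.
selected = 200      # observations chosen
hits_model = 150    # actual Class 1 members among the top 200 by estimated probability
hits_random = 100   # expected Class 1 members among 200 random picks (keyed answer)

lift = hits_model / hits_random     # how many times better than random selection
base_rate = hits_random / selected  # implied overall share of Class 1

print(lift, base_rate)  # prints 1.5 0.5
```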

36. Use the classification confusion matrix to determine the number of Class 1's in the data set that are correctly classified as Class 1. A) 201 B) 85 C) 25 D) 2689

A) 201

30. Given the following classification confusion matrix, what is the overall error rate? A) 3.67% B) 7.53% C) 92.47% D) 96.33%

A) 3.67%
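
The matrix is not shown here. As a check, the sketch below assumes this question uses the same matrix as questions 36, 12, 7, and 2 (201, 85, 25, and 2689), which reproduces the keyed answer. The variable names are illustrative.

```python
# Counts taken from the answers to questions 36, 12, 7, and 2 (assumed to be the same matrix).
class1_as_class1 = 201   # Class 1 correctly classified as Class 1
class1_as_class0 = 85    # Class 1 incorrectly classified as Class 0
class0_as_class1 = 25    # Class 0 incorrectly classified as Class 1
class0_as_class0 = 2689  # Class 0 correctly classified as Class 0

total = class1_as_class1 + class1_as_class0 + class0_as_class1 + class0_as_class0
errors = class1_as_class0 + class0_as_class1

overall_error_rate = errors / total
accuracy = 1 - overall_error_rate

print(f"{overall_error_rate:.2%}  {accuracy:.2%}")  # prints 3.67%  96.33%
```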

11. Given the following classification confusion matrix, what is the overall error rate? A) 3.88% B) 7.49% C) 92.51% D) 96.12%

A) 3.88%

29. Given the following classification confusion matrix, what is the accuracy of the model? A) 96.12% B) 92.51% C) 7.49% D) 3.88%

A) 96.12%

1. Given the following classification confusion matrix, what is the accuracy of the model? A) 96.33% B) 92.47% C) 7.53% D) 3.67%

A) 96.33%

25. As we increase the cutoff value, _______ error will decrease and _________ error will rise. A) Class 0, Class 1 B) Class 1, Class 0 C) the overall, the individual D) sensitivity, specificity

A) Class 0, Class 1
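
The effect of the cutoff can be seen in a small sketch. It assumes an observation is assigned to Class 1 when its estimated probability is greater than or equal to the cutoff; the function name and data are hypothetical.

```python
def class_errors(actual, prob, cutoff):
    """Return (class0_error, class1_error) when Class 1 is assigned if prob >= cutoff."""
    predicted = [1 if p >= cutoff else 0 for p in prob]
    n0 = actual.count(0)
    n1 = actual.count(1)
    # Class 0 error: actual Class 0 observations classified as Class 1.
    false_pos = sum(1 for a, y in zip(actual, predicted) if a == 0 and y == 1)
    # Class 1 error: actual Class 1 observations classified as Class 0.
    false_neg = sum(1 for a, y in zip(actual, predicted) if a == 1 and y == 0)
    return false_pos / n0, false_neg / n1

# Hypothetical data: actual class and estimated probability of Class 1.
actual = [1, 0, 1, 0, 0, 1, 0, 0]
prob = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]

for cutoff in (0.25, 0.5, 0.75):
    print(cutoff, class_errors(actual, prob, cutoff))
```

On this data the Class 0 error falls from 0.6 to 0.2 to 0.0 as the cutoff rises, while the Class 1 error goes from 0.0 to about 0.33 at the highest cutoff.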

28. ______________ involves descriptive statistics, data visualization, and clustering. A) Data exploration B) Data partitioning C) Data preparation D) Model assessment

A) Data exploration

21. __________ is a method of extracting data relevant to the business problem under consideration. It is the first step in the data mining process. A) Data sampling B) Data partitioning C) Model construction D) Model assessment

A) Data sampling

13. Which classification confusion matrix shown below represents the following information? The number of Class 0's that are incorrectly classified as Class 1 is 28. The number of Class 0's that are correctly classified as Class 0 is 3258. The number of Class 1's that are correctly classified as Class 1 is 224. The number of Class 1's that are incorrectly classified as Class 0 is 85. A) Pic 1 B) Pic 2 C) Pic 3 D) Pic 4

A) Pic 1

34. ______ is one minus the Class 1 error rate. A) Sensitivity B) Specificity C) Accuracy D) Cutoff value

A) Sensitivity
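
A minimal sketch of the relationship, using the counts from questions 36, 12, 7, and 2 as example input. The function name and the dictionary keys are illustrative.

```python
def class_rates(tp, fn, fp, tn):
    """Per-class error rates plus sensitivity and specificity from confusion-matrix counts."""
    class1_error = fn / (tp + fn)  # share of actual Class 1 classified as Class 0
    class0_error = fp / (fp + tn)  # share of actual Class 0 classified as Class 1
    return {
        "class1_error": class1_error,
        "class0_error": class0_error,
        "sensitivity": 1 - class1_error,  # one minus the Class 1 error rate
        "specificity": 1 - class0_error,  # one minus the Class 0 error rate
    }

rates = class_rates(tp=201, fn=85, fp=25, tn=2689)
print(f"sensitivity={rates['sensitivity']:.4f}  specificity={rates['specificity']:.4f}")
```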

In classification, which of the following would be considered the categorical outcome variable for a credit approval decision for a requester? A) Marital status of the requester B) Reject or accept credit approval C) Income of the requester D) Gender of the requester

B) Reject or accept credit approval

An observation is classified as Class 1 if: A) The predicted probability of this observation to be in Class 1 is less than the cutoff value. B) The predicted probability of this observation to be in Class 1 is greater than or equal to the cutoff value. C) The allowable probability of making Class 1 error is less than the test p-value. D) The allowable probability of making Class 1 error is greater than or equal to the test p-value.

B) The predicted probability of this observation to be in Class 1 is greater than or equal to the cutoff value

Data used to build a data mining model. A) Validation data B) Training data C) Test data D) Hidden data

B) Training data

19. A negative average error suggests a tendency to ________ the output variable in the test data. A) underestimate B) overestimate C) accurately estimate D) inaccurately estimate

B) overestimate

42. The data used to build the candidate predictive model are called the A) validation set. B) training set. C) test set. D) estimation set.

B) training set.

7. Use the classification confusion matrix to determine the number of Class 0 observations in the data set that are incorrectly classified as Class 1. A) 201 B) 85 C) 25 D) 2689

C) 25

20. ______ is one minus the overall error rate. A) Sensitivity B) Specificity C) Accuracy D) Cutoff value

C) Accuracy

_____ is the process of estimating the value of a categorical outcome variable. A) Sampling B) Prediction C) Classification D) Validation

C) Classification

14. __________ is the manipulation of data with the goal of putting it in a form suitable for formal modeling. A) Data sampling B) Data partitioning C) Data preparation D) Model assessment

C) Data preparation

3. __________ is the step in data mining that includes addressing missing and erroneous data, reducing the number of variables, defining new variables, and data exploration. A) Data sampling B) Data partitioning C) Data preparation D) Model assessment

C) Data preparation

The test set is the data set used to: A) Build the data mining model. B) Estimate accuracy of candidate models on unseen data. C) Estimate accuracy of final model on unseen data. D) Show counts of actual versus predicted class values.

C) Estimate accuracy of final model on unseen data.

Supervised learning and unsupervised clustering both require at least one A) Hidden attribute. B) Output attribute. C) Input attribute. D) Categorical attribute.

C) Input attribute.

18. ______________ occurs when the analyst builds a model that does a great job of explaining the sample of data on which it is based, but fails to accurately predict outside the sample data. A) Supervised learning B) Classification C) Model overfitting D) Estimation

C) Model overfitting

6. The best value of k can be determined by building models with k between 1 and 20 and selecting the value of k that results in the smallest A) value of k. B) number of observations. C) classification error. D) number of neighbors.

C) classification error.
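
The selection procedure can be sketched as below. This is an illustration under assumptions: the function names and data are hypothetical, Euclidean distance and majority voting are assumed, and only three candidate values of k are tried to keep the example short.

```python
import math
from collections import Counter

def knn_predict(train, new_x, k):
    """Majority class among the k training observations closest to new_x."""
    neighbors = sorted(train, key=lambda obs: math.dist(obs[0], new_x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def validation_error(train, validation, k):
    """Share of validation observations that are misclassified for a given k."""
    wrong = sum(1 for x, y in validation if knn_predict(train, x, k) != y)
    return wrong / len(validation)

def best_k(train, validation, candidates):
    """Return the candidate k with the smallest classification error, plus all errors."""
    errors = {k: validation_error(train, validation, k) for k in candidates}
    return min(errors, key=errors.get), errors

# Hypothetical data: ((feature1, feature2), class). One Class 1 point sits inside the Class 0 cluster.
train = [((1.0, 1.0), 0), ((1.5, 1.2), 0), ((1.2, 1.8), 0), ((1.4, 1.5), 1),
         ((5.0, 5.0), 1), ((5.5, 4.8), 1), ((4.8, 5.6), 1)]
validation = [((1.45, 1.55), 0), ((5.1, 5.2), 1), ((1.1, 1.1), 0)]

k, errors = best_k(train, validation, candidates=(1, 3, 5))
print(k, errors)
```

Here k = 1 is misled by the single noisy neighbor, while k = 3 outvotes it. Ties between candidates go to the first one tried.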

41. A tree that classifies a categorical outcome variable by splitting observations into groups via a sequence of hierarchical rules is called a(n) A) decision tree. B) tree diagram. C) classification tree. D) regression tree.

C) classification tree.

2. Use the classification confusion matrix to determine the number of Class 0 observations in the data set that are correctly classified as Class 0. A) 201 B) 85 C) 25 D) 2689

D) 2689

The effectiveness of a classification method can be judged by computing the misclassification errors and summarizing them in a A) Pivot table B) Payoff table C) Dendrogram D) Confusion matrix

D) Confusion matrix

38. Which of the following equations is a logistic regression equation? A) Pic 1 B) Pic 2 C) Pic 3 D) Pic 4

D) Pic 4

39. _____ is a category of data-mining techniques in which an algorithm learns how to predict or classify an outcome variable of interest. A) Data sampling B) Data partitioning C) Model construction D) Supervised learning

D) Supervised learning

22. Which of these is NOT a step in the data mining process? A) data sampling B) data partitioning C) data exploration D) data explanation

D) data explanation

26. An important part of ________ is applying the chosen model to the test data as a final evaluation of model performance. A) data sampling B) data partitioning C) model construction D) model assessment

D) model assessment

4. A tree that predicts values of a continuous outcome variable by splitting observations into groups via a sequence of hierarchical rules is called a(n) A) decision tree. B) tree diagram. C) classification tree. D) regression tree.

D) regression tree.

31. Logistic regression equations have what shape? A) line B) quadratic C) exponential D) s-shape

D) s-shape
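
The S-shape comes from inverting the logit. A brief sketch follows, with hypothetical coefficients for a one-predictor model; the function name is illustrative.

```python
import math

def logistic(z):
    """Inverse of the logit: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients: logit(p) = b0 + b1 * x.
b0, b1 = -3.0, 1.0

for x in range(0, 7):
    p = logistic(b0 + b1 * x)
    print(f"x={x}  p={p:.3f}")
```

The printed probabilities rise slowly at first, fastest near p = 0.5 (here at x = 3), then level off toward 1.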

9. Given the XLMiner output shown below, the best k for the Classification Model is A) 1 B) 10 C) 20 D) 2

B) 10

12. Use the classification confusion matrix to determine the number of Class 1's in the data set that are incorrectly classified as Class 0. A) 201 B) 85 C) 25 D) 2689

B) 85

5. ___________ is dividing the sample data into three sets for training, validation, and testing of the data-mining algorithm performance. A) Data sampling B) Data partitioning C) Data preparation D) Model assessment

B) Data partitioning
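
One way to sketch the partitioning step is shown below. The 50/30/20 split, the seed, and the function name are assumptions for illustration; the source does not specify proportions.

```python
import random

def partition(records, train_frac=0.5, validation_frac=0.3, seed=0):
    """Shuffle records and split them into training, validation, and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * validation_frac)
    train = shuffled[:n_train]                        # used to build candidate models
    validation = shuffled[n_train:n_train + n_valid]  # used to compare candidate models
    test = shuffled[n_train + n_valid:]               # used to evaluate the final model
    return train, validation, test

records = list(range(100))  # hypothetical record IDs
train, validation, test = partition(records)
print(len(train), len(validation), len(test))  # prints 50 30 20
```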

_____ is a generalization of linear regression for predicting an outcome of a binary variable. A) Multiple linear regression B) Logistic regression C) The k-nearest neighbors method D) Cluster analysis

B) Logistic regression

