Quiz 4

Which of the following is NOT an advantage of Naive Bayes classifiers? - Assumption of independence of features - Ability to handle both categorical and numerical variables - Easy to build and understand - Robustness to irrelevant features

Assumption of independence of features

With the k-NN model for a numerical target, after we have determined the k nearest neighbors of a new data record, how is the target value predicted? - Average of the neighbors - Majority vote determines the predicted class - Through a linear combination of neighbors - Through a logistic regression between the neighbors

Average of the neighbors
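
A minimal sketch of the averaging step, assuming Euclidean distance and a small made-up data set. The function name `knn_regress` and the toy records are illustrative, not part of the quiz.

```python
import math

def knn_regress(train, query, k):
    """Predict a numeric target as the mean target of the k nearest records.

    train: list of (features, target) pairs; features are equal-length tuples.
    """
    # Sort training records by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda rec: math.dist(rec[0], query))
    nearest = by_distance[:k]
    # The prediction is the plain average of the neighbors' target values.
    return sum(target for _, target in nearest) / len(nearest)

# Hypothetical toy data: (feature vector, numeric target).
train = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]
prediction = knn_regress(train, (2.1,), k=3)
print(prediction)
```

With k=3 the distant record at 10.0 is not among the neighbors, so it has no influence on the average.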

The following chart shows the prediction error of a decision tree based on the training set and validation set as functions of the number of splits. To avoid overfitting what is the best number of splits? (graph with blue training set line that curves down to the right / red validation set line curves down at same rate until split number 5 and then increases) - 5 - 10 - 16 - 3

5
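
The selection rule behind this answer can be sketched as picking the number of splits with the lowest validation error. The error values below are hypothetical placeholders chosen to mimic the shape described, not numbers read from the chart.

```python
# Hypothetical validation errors keyed by number of splits (assumed values).
validation_error = {1: 0.40, 3: 0.31, 5: 0.22, 10: 0.27, 16: 0.35}

# Training error keeps falling with more splits, so it is not a useful guide;
# the validation error is what signals where overfitting begins.
best_splits = min(validation_error, key=validation_error.get)
print(best_splits)
```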

In the logistic regression model the target variable is: - A categorical variable - A numeric variable - Either a numeric or a binary variable - A number between 0 and 1

A categorical variable

Which of the following techniques is NOT useful for preventing overfitting of a decision tree? - Adding duplicate records - Cross-validation - Early stopping - Pruning a tree

Adding duplicate records

What is the primary method to avoid underfitting when you are training a classification and regression tree model? - Increasing the number of tests (splits) - Increasing the entropy or the Gini index of the leaves - Converting numerical variables to categorical - Substituting multi-level categorical variables with binary dummy variables

Increasing the number of tests (splits)

True or False: In the k-nearest neighbor models, increasing the value of k leads to overfitting.

False

How can we turn the logistic regression model into a classification model? - By setting a cutoff value and comparing the predicted probability with it - By setting a cutoff value and comparing the predicted odds with it - By introducing inverse natural log function to the model - By setting a cutoff value and comparing the predicted log odds with it

By setting a cutoff value and comparing the predicted probability with it
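
A small sketch of the cutoff step, assuming a one-predictor model. The coefficients (intercept -3, slope 1) and the function names are hypothetical and chosen only for illustration.

```python
import math

def predict_probability(x, intercept, slope):
    """Logistic model: convert the linear log-odds into a probability."""
    log_odds = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-log_odds))

def classify(x, intercept, slope, cutoff=0.5):
    """Assign class 1 when the predicted probability reaches the cutoff."""
    return 1 if predict_probability(x, intercept, slope) >= cutoff else 0

# Same record, two cutoffs: the predicted probability is about 0.73.
label_default_cutoff = classify(4.0, -3.0, 1.0)             # cutoff 0.5
label_strict_cutoff = classify(4.0, -3.0, 1.0, cutoff=0.8)  # cutoff 0.8
print(label_default_cutoff, label_strict_cutoff)
```

The same record changes class when the cutoff moves, which is why the cutoff is usually tuned against a performance measure rather than fixed in advance.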

True or False: A decision tree can be pre- or post-pruned to avoid underfitting a classification and regression tree.

False

Which statement is INCORRECT about a CART trained to predict a numerical target? - Impurity of the leaves can be measured with a Gini index or Entropy - Prediction is computed as the average of the numerical target at the leaves - Training procedure is similar to training a CART for classification - Pruning procedures and techniques are similar to those for the classification tree

Impurity of the leaves can be measured with a Gini index or Entropy

Which of the following statements is INCORRECT about the logistic regression model? - In the logistic regression, the intercept cannot be zero because of the natural logarithm function - Logistic regression can be used for classification - Logistic regression uses odds and the natural logarithm function - Logistic regression can be developed for a binary or a multi-class target variable

In the logistic regression, the intercept cannot be zero because of the natural logarithm function
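
To see why a zero intercept is unproblematic, note that the natural logarithm is applied to the odds, not to the intercept. The sketch below uses a hypothetical slope of 2.0 and shows that a log-odds of zero simply maps to a probability of 0.5.

```python
import math

def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Intercept 0.0 and predictor value 0.0 give a log-odds of exactly zero.
p_at_zero_intercept = inverse_logit(0.0 + 2.0 * 0.0)
print(p_at_zero_intercept, logit(p_at_zero_intercept))
```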

We are building a decision tree to predict loan default with four predictors: Age, Income, Gender, and Credit Score. For the first split, we have calculated the Gini index of each test. Based on the following information, which predictor is the best for the first split? Gini Age = 0.49 Gini Income = 0.40 Gini Gender = 0.72 Gini Credit = 0.58 - Income - Gender - Age - Credit

Income
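
The choice can be sketched as taking the minimum of the Gini values given in the question, since a lower Gini index after the split means purer child nodes. The dictionary name is illustrative.

```python
# Gini index of each candidate first split, as stated in the question.
gini_by_predictor = {
    "Age": 0.49,
    "Income": 0.40,
    "Gender": 0.72,
    "Credit Score": 0.58,
}

# The best first split is the test with the lowest impurity.
best_predictor = min(gini_by_predictor, key=gini_by_predictor.get)
print(best_predictor)
```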

Which statement is INCORRECT about Naive Bayes classifier? - It computes and includes prior probability predictors - It returns the event with which the joint probability of that event and predictors is maximized - It examines the existing evidence to predict the probability of target levels - It identifies the dependent variable's level (i.e. event) that increases the probability of the desired target class label

It computes and includes prior probability predictors
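
As a rough illustration of the "maximize the joint probability" idea, the sketch below multiplies each class prior by the conditional probabilities of the observed evidence and returns the class with the largest product. All probabilities and level names are hypothetical.

```python
import math

# Hypothetical estimates from a training set (assumed, not from the quiz).
priors = {"default": 0.3, "repay": 0.7}
likelihoods = {
    "default": {"income=low": 0.6, "owns_home=no": 0.8},
    "repay": {"income=low": 0.2, "owns_home=no": 0.4},
}

def joint_scores(evidence):
    """Class prior times the product of per-predictor likelihoods (naive assumption)."""
    return {
        cls: priors[cls] * math.prod(likelihoods[cls][e] for e in evidence)
        for cls in priors
    }

scores = joint_scores(["income=low", "owns_home=no"])
predicted = max(scores, key=scores.get)
print(scores, predicted)
```

Here the less common class wins because the evidence is much more likely under it, even though its prior is smaller.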

The following chart shows the prediction error of a decision tree based on the training set and validation set as functions of the number of splits. What phenomenon is causing the gap between the two curves at higher numbers of splits? (graph with blue training set line that curves down to the right / red validation set line curves down at same rate until split number 5 and then increases) - Model overfitting - Model underfitting - Model instability - Model variability

Model overfitting

Which statement is INCORRECT about the structure of decision trees? - Numerical attributes cannot be tested in the tree - Each node represents a test result on a predictor - Each branch is a test on the predictor - Each leaf is a terminal node with prediction

Numerical attributes cannot be tested in the tree

If events A and B are statistically independent, what is P(A|B), that is the conditional probability of A, given B? - P(A) - P(A)*P(B) - P(B|A) - P(B)

P(A)
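
This can be checked by enumeration on a simple case. The sketch assumes two fair coin flips, with A = "first flip is heads" and B = "second flip is heads"; the setup is an illustrative choice, not part of the quiz.

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair coin flips: HH, HT, TH, TT (equally likely).
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event as the share of outcomes that satisfy it."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def a(o):
    return o[0] == "H"  # first flip is heads

def b(o):
    return o[1] == "H"  # second flip is heads

p_a = prob(a)
# Definition of conditional probability: P(A|B) = P(A and B) / P(B).
p_a_given_b = prob(lambda o: a(o) and b(o)) / prob(b)
print(p_a, p_a_given_b)
```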

Which statement explains the issues when linear regression is used to model binary target variables? - Predicted probabilities can be >1 or <0 leading to model interpretation difficulties - Instability of the model coefficients - Complexity of logarithmic calculations - Large intercept value associated with the linear regression models

Predicted probabilities can be >1 or <0 leading to model interpretation difficulties
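
A quick way to see the problem is to fit an ordinary least-squares line to a 0/1 target and predict outside the middle of the data. The data set below is made up for illustration, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical binary target: 0 for small x, 1 for large x.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
intercept, slope = fit_line(xs, ys)

low = intercept + slope * 0   # prediction falls below 0
high = intercept + slope * 8  # prediction rises above 1
print(low, high)
```

Neither value can be read as a probability, which is the interpretation difficulty the answer refers to.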

What is the predicted variable in the logistic regression model? - Probability of class membership - Confusion matrix - RMSE - A number between -1 and 1

Probability of class membership

Which statement about Entropy and the Gini index is correct? - Smaller values of both metrics indicate higher purity of a node - Larger values of both metrics indicate higher purity of a node - Larger values of Entropy and smaller values of the Gini index indicate higher purity of a node - Larger values of the Gini index and smaller values of Entropy indicate higher purity of a node

Smaller values of both metrics indicate higher purity of a node
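
Both metrics can be computed from the class counts in a node. The sketch below uses the standard definitions (Gini as one minus the sum of squared class proportions, Entropy in bits) on two hypothetical nodes.

```python
import math

def gini(counts):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Entropy in bits; empty classes contribute nothing to the sum."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

pure_node = [10, 0]   # all records in one class
mixed_node = [5, 5]   # evenly split between two classes

print(gini(pure_node), entropy(pure_node))
print(gini(mixed_node), entropy(mixed_node))
```

Both metrics are at their minimum of zero for the pure node and at their two-class maximum for the evenly mixed node.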

Which statement is correct about the cutoff value of the probability calculated by a logistic regression model to be used for classification? - The cutoff value is an arbitrary value determined by model performance assessment - Larger cutoff values result in higher model performance - Smaller cutoff values result in higher model performance - The cutoff value must always be set to 0.5

The cutoff value is an arbitrary value determined by model performance assessment

Which statement is correct about the k-nearest neighbor (k-NN) method? - The value of k can control model overfitting and underfitting - Underfitted k-NN models can be fixed by adding a dummy variable for accuracy - Overfitted k-NN models can be fixed by decreasing k - Logistic regression is a special case of k-NN

The value of k can control model overfitting and underfitting

True or False: K-nearest neighbor (k-NN) is a supervised method that can be used for predicting categorical and numerical targets.

True

True or False: Statistical independence for two events is present when the outcome of the first event has no impact on the probability of the second event.

True

True or False: The overall goal of building a decision tree for classification is to create leaves that are purer in terms of class labels.

True

True or False: With the Naive Bayes classification method, the zero frequency problem occurs if a given scenario for a single predictor has not been observed.

True
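
The effect can be sketched with a single conditional probability. The counts are hypothetical, and additive (Laplace) smoothing is shown as one common remedy; the quiz itself does not name a fix.

```python
def conditional_probability(count, class_total, n_levels, alpha=0.0):
    """Estimate P(predictor level | class) with optional additive smoothing.

    alpha=0.0 is the raw frequency; alpha=1.0 is Laplace smoothing.
    """
    return (count + alpha) / (class_total + alpha * n_levels)

# Hypothetical: among 20 records of a class, one of 3 predictor levels never occurs.
unsmoothed = conditional_probability(0, 20, n_levels=3)
smoothed = conditional_probability(0, 20, n_levels=3, alpha=1.0)

# A zero factor wipes out the whole product of likelihoods for that class.
other_likelihood = 0.9
print(unsmoothed * other_likelihood, smoothed * other_likelihood)
```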

Which statement is INCORRECT about the k-nearest neighbor (k-NN) method? - When k=1 (closest record) the classifier performance is maximum - k is an arbitrary number that can be selected by trial-and-error - Too small a value for k may lead to over-fitting - Different k values can change the performance of the classifier

When k=1 (closest record) the classifier performance is maximum
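
A toy sketch of why k=1 is not automatically best: on the training set every record is its own nearest neighbor, so accuracy looks perfect, yet a single noisy record can flip predictions nearby. The one-dimensional data and the function name are hypothetical.

```python
from collections import Counter

def knn_classify(train, query, k):
    """Majority vote among the k training records closest to the query."""
    nearest = sorted(train, key=lambda rec: abs(rec[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data: the record at x=3 is a noisy "B" surrounded by "A" records.
train = [(1, "A"), (2, "A"), (3, "B"), (4, "A"), (5, "A"),
         (8, "B"), (9, "B"), (10, "B")]

# With k=1 each training record matches itself, so training accuracy is perfect.
train_accuracy_k1 = sum(knn_classify(train, x, 1) == y for x, y in train) / len(train)

# A new record near the noisy point: k=1 follows the noise, k=3 outvotes it.
print(train_accuracy_k1, knn_classify(train, 3.2, 1), knn_classify(train, 3.2, 3))
```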

