Machine Learning Quiz Answers


What are the odds (not probability) of rolling a 6 on a fair six-sided die?

0.2
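
The arithmetic behind the answer, for reference: the probability of a 6 is 1/6, and odds are defined as p/(1−p):

```latex
\text{odds} = \frac{p}{1-p} = \frac{1/6}{5/6} = \frac{1}{5} = 0.2
```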

Select all operations you can perform with lin reg that will never decrease the R² on the training set: adding features, removing features, scaling features by a positive value, scaling features by a negative value, scaling the features by any real-valued constant, scaling the target by a positive real-valued constant

Adding features, scaling features by a positive value, scaling features by a negative value, scaling the target by a positive constant
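
A quick sanity check of the "adding features" part: OLS can always assign the new feature a zero coefficient, so the training fit never gets worse. A minimal sketch on synthetic data (all values here are illustrative):

```python
# Sketch: adding a feature (even pure noise) never decreases training R^2,
# because OLS can always set the new coefficient to zero.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=100)

r2_small = LinearRegression().fit(X[:, :2], y).score(X[:, :2], y)
X_big = np.hstack([X, rng.normal(size=(100, 1))])  # add a pure-noise column
r2_big = LinearRegression().fit(X_big, y).score(X_big, y)

print(r2_small <= r2_big)  # True: training R^2 never drops
```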

Select all statements that are true about cubic natural splines. They are: differentiable, continuous, nonlinear, have some linear segments

All of the above

Which is the most common method for estimating the confidence in estimated parameters (such as β in linear regression) of a machine learning method?

Bootstrapping
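
A minimal sketch of the idea on synthetic data (the model, sample size, and 1,000 resamples are illustrative assumptions): refit the model on resampled data sets and read a confidence interval off the spread of the refitted coefficients.

```python
# Sketch: bootstrap a confidence interval for the slope of a simple
# linear regression by resampling rows with replacement and refitting.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2.0 * x + 1.0 + rng.normal(size=n)  # true slope 2, intercept 1

slopes = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)          # sample indices with replacement
    slopes.append(np.polyfit(x[idx], y[idx], 1)[0])  # refit, keep the slope

lo, hi = np.percentile(slopes, [2.5, 97.5])   # 95% percentile interval
print(f"beta_1 95% bootstrap CI: [{lo:.2f}, {hi:.2f}]")
```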

Which of the following is true about decision trees: can be used for classification and regression; a disadvantage is they are not scale invariant; a disadvantage is that they can only be used for quantitative features; an advantage is that they can conveniently accommodate any prior knowledge; an advantage is that they can be interpreted

Can be used for classification and regression, they can be interpreted

If your data contains a categorical feature/predictor with multiple valid values, what would be a reasonable approach for using it in your machine learning algorithm?

Constructing an indicator/dummy variable
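
A minimal pandas sketch (the "color" feature is hypothetical): one-hot encode the categorical column; `drop_first=True` yields the p−1 dummies mentioned later in this set.

```python
# Sketch: encode a 3-level categorical feature as indicator variables.
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
dummies = pd.get_dummies(df["color"], prefix="color", drop_first=True)
print(dummies)  # two binary columns encode the three categories
```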

Linear regression with cubic regression splines fits functions that are: equivalent to a single cubic function, linear, continuously differentiable, differentiable, continuous

Continuously differentiable, differentiable, continuous

What is a benefit of pruning a decision tree?

Decreased variance

Which of the following is true about decision trees: can only be used for regression; can only be used for qualitative features; can only work with centered and normalized features; usually branch using 2-feature conditions; easy to interpret

Easy to interpret

If QDA has a higher area under the ROC curve than LDA, then

Either QDA or LDA can be preferable based on the relative costs of false positives or false negatives

True or False: ridge regression will never overfit because it is regularized

False

True or False: the primary disadvantage of Lasso is that it requires all predictors to have non-negative weights in the final model

False

True or false: RSS on the training set increases with the addition of features

False

True or false: using kernels in SVMs prevents overfitting

False

Which would be best suited for a recurrent neural net: learning the same problem multiple times, image recognition, unsupervised learning, genre detection, predicting tides

Genre detection

Which statement is true about using KNN for classification with k=1: is common in practice because it is very fast; has a 0 error on the training set; has a 0 error on the test set; has a linear decision boundary; never overfits

Has a 0 error on the training set
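
This is easy to verify on synthetic data (the labels here are deliberately random to show pure memorization): every training point is its own nearest neighbor, so training accuracy is exactly 1.

```python
# Sketch: 1-NN memorizes the training set, so training error is 0.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = rng.integers(0, 2, size=50)  # even random labels get memorized

knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(knn.score(X, y))  # 1.0: each point is its own nearest neighbor
```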

Select which of the following are examples of prediction: identifying patients as sick based on temp, antibody levels, and presence of a cough; developing a new process for evaluating loans based on the splits in a decision tree; reviewing feature coefficients after performing log reg on whether or not a student passed a test; labeling a student as likely to pass a test based on previous exam grades and class attendance

Identifying patients as sick, labeling a student as likely to pass

Select all valid reasons for reducing the number of features: improve interpretability of parameters, increase flexibility, reduce error on the test set, reduce error on the training set

Improve interpretability, reduce error on test set

How does QDA generalize LDA?

It does not assume that all classes share the same covariance matrix

Select all methods that are examples of unsupervised learning: decision trees, LDA, KNN, K-Means, PCA

K-means, PCA

Assume that the LDA classification error is 0.9 on the test set, then

LDA is worse than always predicting the most common class

Select all of the following machine learning methods that are generative: LDA, QDA, KNN, Log Reg, Lin Reg

LDA, QDA

Select all parametric classification methods: KNN, LDA, QDA, Lin Reg, Log Reg

LDA, QDA, Log Reg

If you have a small training set and a flexible classifier, what is the most likely appropriate strategy for model validation?

LOO (leave-one-out) cross-validation
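
A minimal sklearn sketch on synthetic data (the model and sample size are illustrative): with LOO, each of the n points is held out once, so nearly all of a small training set is still used for every fit.

```python
# Sketch: leave-one-out CV fits the model n times, holding out one point
# per fit, so nearly all of a small data set is used for training.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

scores = cross_val_score(LogisticRegression(), X, y, cv=LeaveOneOut())
print(scores.mean())  # LOO accuracy estimate (one score per held-out point)
```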

Assume that you have a problem with many features and you expect only a few of them to be important. Select all methods you would consider: Lin Reg, Lasso, forward feature selection, Ridge, KNN

Lasso, forward feature selection

Select ALL methods that you would consider using if you have more features than data points: Lin Reg, Lasso, Ridge Regression, Bootstrap

Lasso, ridge regression

Select all ML methods for which scaling features will never change the training error or the test error: Lin Reg, KNN, LDA, QDA, Log Reg

Lin Reg, LDA, QDA, Log Reg

Select all that are true: lin reg can fail if the data set is not iid; outlier data does not impact a linear regression model; linear regression can be used to fit a nonlinear function; linear regression is a generative model

Lin reg can fail if the data set is not iid; lin reg can be used to fit a nonlinear function

For which machine learning method will scaling features never change the training error or the test error: Lin Reg, Lasso, Ridge, KNN

Linear regression

Select all classification methods that have linear classification boundaries in 2-class classification: Log Reg, LDA, QDA, KNN, Lin Reg

Log Reg, LDA

Consider a neural network with a single output node, several input nodes, and no hidden layers. The single unit in this network uses a sigmoid activation function. If you train this network using the cross-entropy objective, which other machine learning method will make the most similar predictions? SVM, feed forward neural net, LDA, decision tree trained using CART, linear regression, poisson regression, recurrent neural net, random forest, log reg

Log reg
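
A rough demonstration of why, on synthetic data (the learning rate, iteration count, and near-zero regularization via `C=1e6` are illustrative choices): a single sigmoid unit trained with cross-entropy is the same function and the same loss as logistic regression, so gradient descent should recover nearly the same weights.

```python
# Sketch: one sigmoid unit + cross-entropy loss, trained by plain
# gradient descent, versus near-unregularized logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
# Noisy labels keep the classes non-separable so both fits stay finite.
y = (X @ np.array([2.0, -1.0]) + rng.normal(size=500) > 0).astype(float)

w, b = np.zeros(2), 0.0
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid activation
    w -= 0.5 * (X.T @ (p - y)) / len(y)      # cross-entropy gradient step
    b -= 0.5 * np.mean(p - y)

lr = LogisticRegression(C=1e6).fit(X, y)     # large C => almost no penalty
print(w, lr.coef_[0])  # the two weight vectors should nearly coincide
```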

Which of the following fit a separating hyperplane: Log Reg, KNN, LDA, support vector classifiers, Lin Reg

Log reg, LDA, support vector classifiers

The decision boundary is linear in the feature space for the following methods: SVM with a polynomial kernel, maximum margin classifier, single layer neural net, recurrent neural net, log reg, LDA

Maximum margin classifier, log reg, LDA

If the ROC curve of method A is never below the ROC curve of method B, then on this dataset

Method A is no worse than Method B

It is a good practice to run LOO CV multiple times to get a better estimate of the desired parameter

No (LOO CV is deterministic, so repeated runs produce identical estimates)

Select the method for which the following statement is satisfied. If the coefficient βₙ for the feature Xₙ is smaller than the coefficient β₀ for the feature X₀, then Xₙ is less important: Lin Reg, Log Reg, KNN, Naive Bayes, LDA and QDA, none of the above

None of the above

Select all of the following machine learning methods which are generative: SVM, Log Reg, QDA, Lasso, LDA, PCA

QDA, LDA

What is used to approximate the purity of a node in a regression tree? Gini index, RSS, cross validation, confusion matrix, ROC

RSS

Which of the following are examples of ensemble learning: KNN, random forests, decision trees, LOO, boosted decision trees

Random forests, boosted decision trees

What are valid reasons to reduce the number of features

Reduce overfitting, reduce computational complexity

Select all true statements: forward stepwise feature selection is guaranteed to achieve the minimal error on the training set; regularization can be seen as a heuristic approach to feature selection; CV is appropriate for determining the right number of features to use; regularization penalizes model complexity

Regularization can be seen as a heuristic approach to feature selection; CV is appropriate for determining the number of features; regularization penalizes model complexity

What is one of the benefits of using the L1 norm in regularization of linear regression vs the L2 norm?

Results in sparse solutions
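
A minimal sketch on synthetic data (the penalty strengths are illustrative): with the same design matrix, the L1 penalty zeroes coefficients exactly, while the L2 penalty only shrinks them.

```python
# Sketch: L1 (Lasso) produces exact zeros; L2 (ridge) only shrinks.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=100)  # 2 relevant features

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)
print((lasso.coef_ == 0).sum())  # several coefficients exactly zero
print((ridge.coef_ == 0).sum())  # typically none: small but nonzero values
```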

Which of the following is true: ridge regression shrinks regression coefficients towards 0; lasso expands the regression coefficients towards infinity; ridge regression doesn't show any improvement over lin reg; ridge regression tends to assign non-zero coefficients to fewer features than best subset selection

Ridge shrinks coefficients towards 0

Which of the following are true: SVMs effectively eliminate the bias-variance trade-off; the number of support vectors is independent of the kernel used; SVMs are generalizations of the maximal margin classifier; SVMs can fit nonlinear decision boundaries

SVMs are generalizations of the maximal margin classifier; SVMs can fit nonlinear decision boundaries

The overall strategy in bootstrapping is to

Sample with replacement

What is a valid reason for using boosting over a single decision tree?

Single decision trees cannot include predictive power from multiple, overlapping regions of the feature space

Recall

TP/(TP+FN)
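
A tiny check of the formula with sklearn (the label vectors are made up for illustration):

```python
# Sketch: recall = TP / (TP + FN) on made-up labels.
from sklearn.metrics import recall_score

y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]  # TP=2, FN=2, FP=1, TN=1
print(recall_score(y_true, y_pred))  # 2 / (2 + 2) = 0.5
```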

True or false: a benefit of pruning decision trees is that it may help decrease variance

True

True or false: the purpose of random forests is to decorrelate trees when doing bagging

True

Which are true about hyperplanes: an appropriate interpretation is that a hyperplane divides a p-dimensional space into three equal-size partitions; a line is a hyperplane in three-dimensional space; hyperplanes can only be defined in 2 dimensions; β₀+β₁X₁+β₂X₂+β₃X₃+β₄X₄+β₅X₅=0 is a valid equation for a hyperplane in five-dimensional space; a line is a hyperplane in two-dimensional space

The five-dimensional equation is valid; a line is a hyperplane in two-dimensional space

Decision trees can handle qualitative features ______

always

Which values usually increase when increasing the regularization coefficient in ridge regression from 0: bias, variance, training error, test error

Bias, training error

Cubic regression splines are: continuously differentiable, continuous, piecewise linear, piecewise constant

continuously differentiable, continuous

The principal components identified by PCA are: eigenvectors of the covariance matrix, always positive, orthogonal, always negative, parallel

eigenvectors, orthogonal
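
Both facts can be checked numerically on synthetic data (the 3-feature setup is illustrative): the PCA directions match the covariance eigenvectors up to sign, and they are mutually orthogonal.

```python
# Sketch: PCA directions are covariance eigenvectors and are orthogonal.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))  # correlated features

pca = PCA().fit(X)
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))

# Top component equals the top eigenvector up to sign...
print(np.allclose(np.abs(pca.components_[0]), np.abs(eigvecs[:, -1])))
# ...and the components form an orthonormal set.
print(np.allclose(pca.components_ @ pca.components_.T, np.eye(3)))
```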

Suppose that for a linear regression fit, the coefficient βₙ for the feature Xₙ is smaller than β₀ for the feature X₀. Then removing Xₙ would increase the prediction error ____ compared to X₀

either more or less

Logistic regression assumes a linear model of log odds and therefore

has a linear decision boundary between classes in the feature space
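
The one-step derivation: the model makes the log odds linear in x, and the decision boundary sits where both classes are equally likely (p = 1/2, i.e. log odds = 0):

```latex
\log\frac{p(x)}{1 - p(x)} = \beta_0 + \beta^{\top} x,
\qquad
p(x) = \tfrac{1}{2} \;\Leftrightarrow\; \beta_0 + \beta^{\top} x = 0
```

and β₀ + βᵀx = 0 is the equation of a hyperplane in the feature space.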

Data points with high leverage in simple linear regression are ones that

have a very different X (feature) value from other data points

Simple linear regression assumes that the target is a linear combination of feature values plus a noise term εₙ for each data point n. Select ALL statements consistent with what linear regression assumes about the noise εₙ: they are heteroscedastic, they are homoscedastic, they are statistically independent, they are identically distributed, they are normally distributed, they are positive

homoscedastic, independent, identically distributed, normally distributed

Select all that are true about convolutional neural nets: they are especially appropriate for image recognition; they can only be used with ReLU as an activation function; they are a type of recurrent neural net with constraints; they are a type of feed forward neural network; they were developed especially for understanding natural language

image recognition, feed forward neural net

QDA will never have a greater ______ than LDA

Negative log-likelihood of the training set

Adding interaction features in linear regression will

never increase the training RSS

Random variables X and Y are independent _______ their correlation coefficient is zero

only if

Using dummy variables for a qualitative feature with p classes adds___ new binary features

p-1

Select all statements that are true about the k-means algorithm: it is randomized; it automatically finds the number of clusters; it minimizes the classification error; its output depends on whether the features are normalized; it does not require features to be centered

randomized, output depends on normalization of features, does not require features to be centered
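
The normalization point is easy to demonstrate on synthetic data (the cluster locations and the ×100 stretch are illustrative): k-means relies on Euclidean distance, so rescaling one feature can change the clustering entirely.

```python
# Sketch: rescaling one feature changes what k-means finds.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
# Two clear clusters along the second feature; the first is pure noise.
X = np.column_stack([
    rng.normal(size=200),
    np.concatenate([rng.normal(-3, 1, 100), rng.normal(3, 1, 100)]),
])
X_stretched = X * np.array([100.0, 1.0])  # blow up the noise feature

km_a = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
km_b = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_stretched)
print(adjusted_rand_score(km_a.labels_, km_b.labels_))  # well below 1.0
```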

Bootstrapping constructs data sets by sampling ________

randomly with replacement

The first principal component in PCA is: the same direction as computed using total least squares, the same direction as computed using linear regression, the direction that minimizes the data's variance, the direction that maximizes the data's variance

same direction as total least squares, maximizes variance

In simple linear regression (Y = β₀ + β₁X), the R² statistic is equal to

the square of the correlation coefficient between X and Y
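
A quick numerical confirmation on synthetic data (the slope and noise level are illustrative):

```python
# Sketch: training R^2 of a simple linear fit equals corr(X, Y)^2.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)

slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(np.isclose(r2, np.corrcoef(x, y)[0, 1] ** 2))  # True
```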

QDA will never have a higher error than LDA on the

training set

True or false: the ROC curve shows the true positive rate as a function of the false positive rate

true
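
A minimal sketch with made-up scores: `roc_curve` returns exactly those two axes, the false positive rate and the true positive rate, swept over score thresholds.

```python
# Sketch: roc_curve sweeps thresholds and returns FPR (x) and TPR (y).
from sklearn.metrics import roc_curve

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # classifier scores for the positive class
fpr, tpr, thresholds = roc_curve(y_true, scores)
print(fpr)  # false positive rate at each threshold
print(tpr)  # true positive rate at each threshold
```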

True or false: the principal components identified by PCA are orthogonal

true

Select all that are true about PCA: PCA is falling out of favor to Lasso; PCA is an unsupervised learning technique; PCA seeks dimensions that minimize variance; PCA expands the dimensionality of the data; PCA is an effective technique for linear dimensionality reduction

unsupervised learning technique, effective technique for linear dimensionality reduction

What option should you take when an SVM with a polynomial kernel overfits on the training set: use a polynomial kernel of a smaller degree, use a polynomial kernel of a larger degree, create more data by bootstrapping

Use a polynomial kernel of a smaller degree

Select reasons to use slack variables in SVMs and maximum margin classifiers: when classes are not separable, to use a nonlinear kernel, to handle nonlinear kernels, to increase the number of support vectors, to reduce sensitivity to outliers, to decrease the number of support vectors

When classes are not separable, to increase the number of support vectors, to reduce sensitivity to outliers

