SML

A convolutional layer in a neural network can only appear directly after the input layer.

False

A marketing company wants to build a model for predicting the number of visitors to a web page. Since the number of visitors is an integer, this is best viewed as a classification problem.

False

A neural network is an ensemble method.

False

An epoch, in the context of stochastic gradient descent, is the number of iterations required for the training to converge

False

Bootstrap aggregating (bagging) works best on simple models with high bias and low variance.

False

Classification problems have only qualitative inputs

False

Compared to Bagging, Random Forest on the same trainset performs better by decreasing the bias and increasing the variance.

False

Deep learning is a nonparametric method

False

Deep neural networks can only be used for regression, not classification.

False

For classification, the input variables have to be categorical.

False

In neural networks, sigmoid activation function is a common choice for the last layer in a multi-class classification problem.

False
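
As a minimal sketch of why softmax, rather than an element-wise sigmoid, is the usual last-layer choice for multi-class problems: softmax outputs sum to one and can be read as a distribution over classes, while independent sigmoids generally do not. The logit values below are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Normalized exponentials; the max is subtracted for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical last-layer outputs for three classes
sigmoid_outputs = [sigmoid(z) for z in logits]
softmax_outputs = softmax(logits)

print(sum(sigmoid_outputs))  # generally not equal to 1
print(sum(softmax_outputs))  # 1 up to floating-point rounding
```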

K-nearest neighbors is a generative model.

False

LASSO and Ridge Regression are mathematically equivalent

False

Like the bagging technique, cross-validation helps to reduce the flexibility of the model.

False

Linear discriminant analysis (LDA) is a non-parametric model.

False

Linear regression requires all input variables to be numerical (quantitative).

False

Logistic regression and linear discriminant analysis (LDA) will always produce the same decision boundary for binary classification problems.

False

Neural networks can only be used for classification problems, and not for regression problems.

False

Normalizing the dataset is important for the performance of a classification tree.

False

QDA is a non-linear classification algorithm and always has a quadratic decision surface.

False

Regularization can only be used for regression methods, and not for classification methods.

False

Standard Gradient Descent is guaranteed to converge and find the global minimum.

False

The least squares problem always has a unique solution β̂.

False
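
One way to see the non-uniqueness is a design matrix with linearly dependent columns. The toy data below is an assumption made for illustration: the second input column duplicates the first, so several different parameter vectors attain the same (here zero) sum of squared residuals.

```python
# Toy design matrix (hypothetical): the second column duplicates the first,
# so the columns are linearly dependent.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
y = [2.0, 4.0, 6.0]

def sum_squared_residuals(beta):
    """Least squares cost for the parameter vector beta."""
    preds = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y))

# Three different parameter vectors that all fit the data equally well.
candidates = [(2.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
costs = [sum_squared_residuals(beta) for beta in candidates]
print(costs)
```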

The optimal weights in deep learning always have a closed form solution, but it is too expensive to compute when the number of data points is large.

False

The partitioning of the input space shown below could be generated by recursive binary splitting

False

The training error usually increases when we increase the model flexibility

False

To talk about the bias-variance tradeoff only has meaning when learning by minimizing the mean squared error cost function.

False

Using bootstrap aggregating, we never have to worry about how much data we have collected. We can always sample more datasets to improve our models.

False

When bootstrapping a dataset, it is important to sample without replacement.

False
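
A minimal sketch of bootstrapping, assuming the standard definition: draw as many points as the original dataset has, uniformly at random and with replacement. The helper name `bootstrap_sample` and the seed are illustrative choices. Because of the replacement, some points appear several times and others not at all; in expectation roughly 63% of the distinct points end up in a bootstrap sample.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
data = list(range(1000))
sample = bootstrap_sample(data, rng)

# Fraction of distinct original points that made it into the bootstrap sample.
unique_fraction = len(set(sample)) / len(data)
print(len(sample), round(unique_fraction, 2))
```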

k-NN is a linear classifier if k = 1

False

Logistic regression is a regression method.

False. It is a classification method (despite its name).

A nonlinear classifier can never have a linear decision boundary

False, a non-linear classifier can still produce a linear decision boundary.

One should not split datasets randomly into training and test data, but always take the last data points as the test data.

False, if the data is collected in a non-random fashion, this could lead to a test dataset that only contains data of a certain type.

Random forest is a special version of boosting with trees

False, random forest is a bagging algorithm, not a boosting algorithm.

Regularization decreases the bias of the model

False, Regularization is used to reduce the variance and will thus increase the bias

A classifier Ĝ(X) is said to be linear if the function Ĝ, which maps each input to a predicted class, is a linear function of the model parameters.

False, a classifier is said to be linear if its decision boundary is linear. Ĝ takes values in a discrete set and cannot be a linear function!

The model bias typically tends to zero as the number of training data points tends to infinity.

False, any mismatch between the postulated model and the true input-output relationship will result in a model bias which does not vanish as the number of data points becomes large.

Boosting is typically used to improve large models with small bias and high variance.

False, boosting is typically used with simple base models with high bias and low variance.

A non-parametric model for classification always achieves zero training error since the complexity grows with data.

False, consider e.g. k-nearest-neighbors with k = 10.

Misclassification loss is sensitive to outliers, i.e. incorrectly classified training data points far from the decision boundary.

False, misclassification loss yields a loss of 1 for any misclassified point, regardless of how far from the decision boundary it is.
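
To make the contrast concrete, here is a small sketch written in terms of the margin y·f(x) with labels y in {-1, +1}, which is an assumed convention. The exponential loss is included only as a comparison of my choosing: it grows with the distance on the wrong side of the boundary, whereas the misclassification loss stays at 1.

```python
import math

def misclassification_loss(margin):
    """0-1 loss: 1 for any point on the wrong side of the boundary, else 0."""
    return 1.0 if margin < 0 else 0.0

def exponential_loss(margin):
    """Comparison loss (an illustrative choice): grows as the margin gets more negative."""
    return math.exp(-margin)

# Two misclassified points: one just across the boundary, one far away.
margins = [-0.1, -5.0]
zero_one = [misclassification_loss(m) for m in margins]
exponential = [exponential_loss(m) for m in margins]
print(zero_one)
print(exponential)
```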

The correlation between any pair of ensemble members of a bagged regression model $\hat{f}^{B}_{\text{bag}}(x) = \frac{1}{B}\sum_{b=1}^{B} \hat{f}^{*b}(x)$ tends to zero as the number of ensemble members B tends to infinity.

False, the ensemble members are conditionally independent (given the training data set) and the correlation between any pair of ensemble members is independent of B.

The Bayes classifier cannot be implemented in practice, but if it could it would always attain zero test error.

False, there is typically an irreducible error.

An underfitted model has high bias and low variance. Therefore, it shows a low accuracy on trainset and high accuracy on testset.

False.

In a linear regression model with Gaussian noise, MLE and MSE always give the same result.

False.

Solving a logistic regression problem using gradient descent can lead to multiple local optimum solutions.

False. (Because the logistic loss is convex. Reference in the draft (April 30, 2021) SML book on page 96: "Examples of convex functions are the cost functions for logistic regression, linear regression and L1-regularized linear regression.")

The model y = θ1 x1 + θ21 x2 + ε is an example of a linear regression model with parameters θ1, θ21, input variables x1, x2 and a noise term ε.

False. (Reference in the draft (April 30, 2021) SML book on page 49: "linear regression is a model which is linear in its parameters")

It is easy to parallelize the training of a boosted model.

False. (Reference in the draft (April 30, 2021) SML book on page 151: "Another unfortunate aspect of the sequential nature of boosting is that it is not possible to parallelize the learning.")

The bias of the model decreases as the size of the training dataset goes to infinity.

False. (See Figure 4.9 (a) in the draft (April 30, 2021) SML book. The bias is approximately constant as the number of training examples increases.)

If gradient descent converges, the solution is guaranteed to be a local minimum.

False. Gradient descent can converge to saddle points.
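
A small sketch of this failure mode, using a test function of my choosing: f(x, y) = x² − y² has a saddle point at the origin (the Hessian has eigenvalues 2 and −2). Started exactly on the x-axis, plain gradient descent converges to the origin even though it is not a local minimum. The step size and starting point are illustrative assumptions.

```python
def grad(x, y):
    """Gradient of f(x, y) = x**2 - y**2."""
    return 2.0 * x, -2.0 * y

x, y = 1.0, 0.0      # start exactly on the x-axis (illustrative choice)
learning_rate = 0.1  # illustrative step size

for _ in range(200):
    gx, gy = grad(x, y)
    x -= learning_rate * gx
    y -= learning_rate * gy

# The iterates approach (0, 0), where the gradient vanishes,
# yet f decreases along the y-direction, so this is a saddle point.
print(x, y)
```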

A classifier is called linear if the function that maps each input to a predicted class is linear in the parameters.

False. It is called linear if it has a linear decision boundary.

When using LDA for binary classification, the midpoint between two clusters μ = (μ1+μ2)/2 will always give p(y = 1|x = μ) = 1/2.

False. It is not true in general. However, it is true if π̂1 = π̂2 = 1/2.

Random forest is an extension of Adaboost.

False. Random forest is an extension of bagging

A model with lower bias always performs better than a model with higher bias in terms of the mean squared error on test data.

False. The mean squared error is the sum of the squared bias, the variance and the irreducible error. An increase in bias can reduce the variance.

A classification tree with a single binary split is a linear classifier.

True

A k-nearest neighbors classifier always attains zero training error for k = 1 for datasets where no inputs are repeated, i.e. xi ≠ xj ∀ i ≠ j.

True
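
The reason is that each training point is its own nearest neighbor (at distance zero) when no inputs are repeated. A minimal sketch on a hypothetical toy dataset, with Euclidean distance assumed:

```python
def predict_1nn(train_x, train_y, x):
    """Return the label of the training point closest to x (squared Euclidean distance)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[best]

# Hypothetical training data with distinct inputs and mixed labels.
train_x = [(0.0, 0.0), (1.0, 0.5), (0.9, 0.6), (2.0, 2.0)]
train_y = [0, 1, 0, 1]

# Evaluate on the training set itself: every point finds itself first.
train_errors = sum(
    predict_1nn(train_x, train_y, x) != y for x, y in zip(train_x, train_y)
)
print(train_errors)
```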

A linear classifier has a linear decision boundary.

True

A neural network with linear activation functions is linear in the input variables.

True
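
To illustrate, stacking two layers with linear (identity) activations gives W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2), which is a single affine map of the input. The weights below are arbitrary illustrative values and the helper functions are hypothetical.

```python
def matvec(W, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Matrix-matrix product for list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

# Arbitrary illustrative weights: 2 inputs -> 3 hidden units -> 1 output.
W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
b1 = [0.5, -0.5, 1.0]
W2 = [[1.0, 0.0, 2.0]]
b2 = [0.25]

def two_layer(x):
    h = add(matvec(W1, x), b1)     # hidden layer with identity activation
    return add(matvec(W2, h), b2)  # output layer with identity activation

# Collapse both layers into one affine map.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)

def single_layer(x):
    return add(matvec(W, x), b)

x = [1.0, 2.0]
print(two_layer(x), single_layer(x))
```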

Boosting primarily increases performance by reducing the bias of the base model.

True

Cross-validation can be used to learn the regularization parameter λ in ridge regression.

True

Deep learning is a parametric method

True

In binary classification, the output can take only two possible values.

True

LASSO and Ridge Regression are two different methods for regularization

True

LASSO regularization can be used as an input selection method.

True

LDA is a special case of QDA

True

Logistic regression is a linear classifier.

True

One could use LDA as base classifier in boosting.

True

Quadratic discriminant analysis is a parametric model.

True

Regression models have quantitative outputs.

True

Regularization allows us to restrict the flexibility of a model.

True

Regularization can be used to avoid overfitting in linear regression.

True

Regularization may prevent overfit

True

Regularization, like ridge regression and LASSO, adds an extra term to the cost function.

True

The absolute error loss function is more robust to outliers than the squared error loss function.

True
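
A quick way to see the difference is to fit a constant predictor: the squared error is minimized by the mean and the absolute error by the median. The numbers below are made up for illustration, with one outlier.

```python
import statistics

# Hypothetical targets with a single outlier.
targets = [1.0, 2.0, 3.0, 4.0, 100.0]

# Best constant prediction under each loss.
best_for_squared_error = statistics.mean(targets)     # pulled toward the outlier
best_for_absolute_error = statistics.median(targets)  # stays with the bulk of the data

print(best_for_squared_error, best_for_absolute_error)
```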

The expected mean squared error for new, previously unseen data points can be decomposed into a sum of squared bias, variance and irreducible error.

True
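
Written out at a single test input, the decomposition can be sketched as follows. The notation is an assumption made here: the data follows y⋆ = f0(x⋆) + ε with zero-mean noise of variance σ², independent of the training data T, and f̄(x⋆) denotes the average trained model E_T[f̂(x⋆; T)].

```latex
\[
\mathbb{E}\Big[\big(y_\star - \hat{f}(x_\star;\mathcal{T})\big)^2\Big]
= \underbrace{\big(\bar{f}(x_\star) - f_0(x_\star)\big)^2}_{\text{squared bias}}
+ \underbrace{\mathbb{E}_{\mathcal{T}}\Big[\big(\hat{f}(x_\star;\mathcal{T}) - \bar{f}(x_\star)\big)^2\Big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible error}}
\]
```

The expectation on the left is over both the training data and the noise; the cross terms vanish because the noise has zero mean and is independent of the training data.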

The k-NN classifier most often suffers from overfitting when k = 1

True

When using bagging, an out-of-bag estimate of the expected new data error Enew is computationally much cheaper than a k-fold cross-validation.

True

c-fold cross validation can be used for selecting a good value of k in k-NN.

True

k-NN is a nonparametric method

True

In a neural network model, a convolutional layer uses significantly fewer parameters compared to the dense layer with the same number of hidden units.

True (Because of the sparse interactions and parameter sharing in convolutional layers. Reference in the draft (April 30, 2021) SML book on page 126: "Furthermore, a convolutional layer uses significantly fewer parameters compared to the corresponding dense layer.")

Convolutional neural networks are well suited for classification problems where the input is an image.

True, convolutional neural networks are well suited for problems where the data has a local structure, such as images or time series.

The model y = β0 + β1 x1 + β2 sin(x2) + ε is a linear regression model (β0, β1 and β2 are the unknown parameters).

True, It is linear in the parameters and thus a linear model

Probabilistic models assign probability distributions to unknown model parameters.

True, in a probabilistic model the belief about unknown model parameters is represented using probability distributions.

The model bias of k-NN typically increases as k increases.

True, the model becomes less flexible (= larger bias) as k increases. For large enough k the model will always predict according to the dominating class.

Both CART and K-Nearest Neighbor are non-parametric.

True.

Dropout is a regularization technique which prevents overfitting and generalizes the model.

True.

Bagging allows us to estimate the expected new data error Enew without cross-validation.

True. (Reference in the draft (April 30, 2021) SML book on page 141: "When using bagging, it turns out that there is a way to estimate the expected new data error Enew without using cross-validation.")

Enforcing a maximum depth for the tree can help reduce overfitting in decision trees.

True. (Reference in the draft (April 30, 2021) SML book on page 29)

For models that are trained iteratively, a lower training error Etrain can be achieved by training longer.

True. (Reference in the draft (April 30, 2021) SML book on page 240: "For models that are trained iteratively we can reduce Etrain by training longer.")

A higher value of the regularization hyperparameter in a linear regression problem with L1 regularization (also called LASSO) leads to a more sparse model, where fewer model parameters are non-zero.

True. Reference in the draft (April 30, 2021) SML book on page 94: "Whereas L2 regularization pushes all parameters towards small values (but not necessarily exactly zero), L1 tends to favor so-called sparse solutions where only a few of the parameters are non-zero, and the rest are exactly zero."
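
The effect can be sketched in a simplified setting that is assumed here, not taken from the book: with an orthonormal design matrix and the cost ½‖y − Xβ‖² + λ‖β‖₁, the LASSO solution is the soft-thresholded least squares estimate, while the ridge cost ½‖y − Xβ‖² + (λ/2)‖β‖² only rescales it by 1/(1 + λ). The least squares coefficients below are hypothetical.

```python
def soft_threshold(b, lam):
    """LASSO solution per coefficient under an orthonormal design (assumed setting)."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

# Hypothetical least squares coefficients.
ols = [3.0, -0.4, 0.2, -1.5]

lasso_nonzero_counts = []
ridge_nonzero_counts = []
for lam in (0.1, 0.5, 2.0):
    lasso = [soft_threshold(b, lam) for b in ols]
    ridge = [b / (1.0 + lam) for b in ols]  # shrinks, but never exactly to zero
    lasso_nonzero_counts.append(sum(c != 0.0 for c in lasso))
    ridge_nonzero_counts.append(sum(c != 0.0 for c in ridge))

# Larger lam leaves fewer non-zero LASSO coefficients; ridge keeps all of them.
print(lasso_nonzero_counts, ridge_nonzero_counts)
```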

A too large number of ensemble members leads to an increased complexity of a bagging model and results in a higher variance.

False. (Reference in the draft (April 30, 2021) SML book on page 140: "It is important to understand that by the construction of bagging, more ensemble members does not make the resulting model more flexible, but only reduces the variance.")

