SRM - Missed Qualitative Questions
Determine which of the following statements about Principal Components Regression (PCR) is/are true. 1) When performing PCR, it is recommended to standardize each predictor prior to generating the principal components. 2) In the absence of standardization, high-variance variables tend to play a larger role in the principal components obtained. 3) PCR guarantees that the directions that best explain the predictors will also be the best directions to use for predicting the target.
1 and 2
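Statements 1 and 2 go together, which a minimal numpy sketch can illustrate. The data here are made up (two independent predictors, the second on a much larger scale) and are not part of the original question:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: two independent predictors, x2 on a much larger scale.
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 100, 200)])

def first_pc_loading(X):
    """Loading vector of the first principal component (largest eigenvalue)."""
    vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return vecs[:, np.argmax(vals)]

raw = first_pc_loading(X)                   # unstandardized predictors
std = first_pc_loading(X / X.std(axis=0))   # each predictor scaled to unit variance
print(abs(raw[1]))  # near 1: the high-variance x2 dominates the first PC
print(abs(std[1]))  # near 1/sqrt(2): the scales are balanced after standardizing
```

Without standardization the first component essentially just points along the high-variance variable, which is why standardizing first is recommended.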
Determine which of the following statements about random forests is/are true. 1) To build each tree, a bootstrapped sample of 𝑛 observations is used, and for each split within the tree, a new random selection of 𝑚 predictors is made. 2) Out-of-bag estimation can be used to estimate the test error for random forests. 3) Random forests reduce bias through the averaging of multiple decorrelated trees.
1 and 2
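The two sampling steps in statement 1 can be sketched in pure Python; the sizes n = 100, p = 10, m = 3 are hypothetical (m near the square root of p is the usual classification default):

```python
import random

random.seed(1)

def bootstrap_sample(n):
    """Draw n observation indices with replacement (one bagged training set)."""
    return [random.randrange(n) for _ in range(n)]

def candidate_features(p, m):
    """Fresh random subset of m of the p predictors, drawn anew at every split."""
    return random.sample(range(p), m)

n, p, m = 100, 10, 3
boot = bootstrap_sample(n)          # used for the whole tree
split1 = candidate_features(p, m)   # re-drawn at each split, so different
split2 = candidate_features(p, m)   # splits can consider different predictors
```

The per-split redraw is what decorrelates the trees; the averaging then reduces variance, not bias, which is why statement 3 is false.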
Determine which of the following statements is/are true for a simple linear relationship, 𝑦=𝛽0+𝛽1𝑥+𝜀. 1) If 𝜀=0, the 95% confidence interval is equal to the 95% prediction interval. 2) The prediction interval is always at least as wide as the confidence interval. 3) The prediction interval quantifies the possible range for E(𝑦∣𝑥).
1 and 2
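The key is the extra irreducible-error term in the prediction interval. In standard simple-regression notation (s² estimating Var(ε), S_xx = Σ(x_i − x̄)²), the two intervals at a new point x_0 are:

```latex
% Confidence interval for E(y \mid x_0):
\hat{y}_0 \pm t_{\alpha/2,\,n-2}\, s \sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}
% Prediction interval for an individual response y_0:
\hat{y}_0 \pm t_{\alpha/2,\,n-2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}
```

The extra 1 under the radical makes the prediction interval at least as wide (statement 2), and it vanishes when ε = 0 so the two coincide (statement 1). The prediction interval covers an individual response y_0, not E(y ∣ x_0), which is why statement 3 is false.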
Determine which of the following statements about decision trees versus linear models is/are true. 1) Decision trees are easier to interpret than linear models. 2) Decision trees are more robust than linear models. 3) Decision trees handle qualitative predictors more easily than linear models.
1 and 3
Determine which of the following statements on bagging is/are true. 1) 𝐵 different bootstrapped training data sets are generated, a method is trained on the 𝑏th bootstrapped training set to get f̂*_b(x), and finally all the predictions are averaged to obtain f̂_bag(x) = (1/B) Σ_{b=1}^{B} f̂*_b(x). 2) Bagging has been demonstrated to give impressive improvements in accuracy by combining together hundreds or even thousands of trees into a single procedure. 3) The out-of-bag (OOB) approach for estimating the test error is particularly convenient when performing bagging on large data sets for which cross-validation would be computationally onerous.
1, 2 and 3
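The recipe in statement 1 takes only a few lines of Python. The data are made up and the "model" is just a stand-in (the sample mean of y) so the averaging step is easy to see:

```python
import random

random.seed(42)
# Toy data: true slope 2 plus noise (hypothetical, not from the question).
data = [(x, 2.0 * x + random.gauss(0, 0.5)) for x in range(20)]

def fit_mean_model(sample):
    """Stand-in for the b-th fitted learner: here just the sample mean of y."""
    return sum(y for _, y in sample) / len(sample)

B = 200
preds = []
for _ in range(B):
    boot = [random.choice(data) for _ in data]   # b-th bootstrapped training set
    preds.append(fit_mean_model(boot))           # f-hat-star-b

f_bag = sum(preds) / B                           # average of the B predictions
```

Each observation is left out of roughly a third of the bootstrap samples, which is what the OOB error estimate in statement 3 exploits.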
Let p̂_mk represent the proportion of training observations in the 𝑚th region that are from the 𝑘th class. Determine which of the following statements about the Gini index and entropy is/are true. 1) The Gini index is defined by G = Σ_{k=1}^{K} p̂_mk (1 − p̂_mk), which measures the total variance across the 𝐾 classes. 2) The entropy is defined by D = −Σ_{k=1}^{K} p̂_mk log p̂_mk. 3) The entropy and Gini index are similar measures numerically. 4) A large value of the Gini index indicates that a node contains predominantly observations from a single class.
1, 2 and 3
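Statements 1 through 3 can be checked directly. A minimal sketch using natural-log entropy (statement 4 is false because a large Gini index means an impure node, and both measures are near zero for a pure one):

```python
import math

def gini(p):
    """Gini index G = sum of p_k * (1 - p_k) over class proportions p."""
    return sum(pk * (1 - pk) for pk in p)

def entropy(p):
    """Entropy D = -sum of p_k * log(p_k); a term with p_k = 0 contributes 0."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

pure, mixed = [1.0, 0.0], [0.5, 0.5]
# Both measures are 0 for a pure node and maximal for a 50/50 split,
# tracking each other closely in between (statement 3).
values = [(gini(p), entropy(p)) for p in (pure, mixed)]
```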
You are performing a principal components analysis on a data set with 50 observations from three independent continuous variables. Consider the following statements: 1) The maximum number of principal components that can be extracted from this data is three. 2) The first principal component represents the direction along which the data vary the most. 3) The third principal component will be orthogonal to the first principal component. Determine which of the above statements is/are true.
1, 2 and 3
Consider a two-class setting in which a node has class proportions 𝑝1 and 𝑝2 = 1 − 𝑝1. Determine which of the following statements is/are true. 1) If 𝑝1 = 𝑝2, the Gini index is equal to the classification error rate. 2) The cross-entropy is always greater than or equal to the Gini index regardless of the value of 𝑝1. 3) The cross-entropy is always greater than or equal to the classification error rate regardless of the value of 𝑝2.
1, 2, and 3
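The orderings in statements 2 and 3 can be verified numerically over a grid of 𝑝1 values. A sketch for a two-class node with natural-log cross-entropy:

```python
import math

def two_class_measures(p1):
    """Node impurity measures for a two-class node with proportions (p1, 1 - p1)."""
    p2 = 1 - p1
    error = min(p1, p2)                           # classification error rate
    gini = p1 * (1 - p1) + p2 * (1 - p2)          # equals 2 * p1 * (1 - p1)
    entropy = -sum(p * math.log(p) for p in (p1, p2) if p > 0)
    return error, gini, entropy

# cross-entropy >= Gini >= classification error over the whole range of p1,
# with equality of Gini and error at p1 = 0.5 (statement 1).
ordered = True
for i in range(101):
    e, g, d = two_class_measures(i / 100)
    ordered = ordered and d >= g - 1e-12 and g >= e - 1e-12
```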
Sarah is applying principal component analysis to a large data set with four variables. Loadings for the first four principal components are estimated. Determine which of the following statements is/are true with respect to the loadings. 1) The loadings are unique. 2) For a given principal component, the sum of the squares of the loadings across the four variables is one. 3) Together, the four principal components explain 100% of the variance.
2 and 3
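Statements 2 and 3, and the reason statement 1 fails, show up in a short numpy sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))            # hypothetical data set with four variables
cov = np.cov(X, rowvar=False)
eigvals, loadings = np.linalg.eigh(cov)  # columns are the PC loading vectors

# Statement 2: each loading vector has unit length (squared loadings sum to 1).
unit_length = np.allclose((loadings ** 2).sum(axis=0), 1.0)
# Statement 3: the four PCs together account for the total variance (trace).
full_variance = np.isclose(eigvals.sum(), np.trace(cov))
# Statement 1 fails: flipping the sign of any loading vector is equally valid,
# so the loadings are only unique up to sign.
```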
Determine which of the following statements about cost complexity pruning is NOT true. 1) It is a way to select a small set of subtrees for consideration, also known as weakest link pruning. 2) Rather than considering every possible subtree, this method considers a sequence of trees indexed by a nonnegative tuning parameter 𝛼. 3) As the tuning parameter 𝛼 value increases, there is a price to pay for having a tree with many terminal nodes, so the error sum of squares plus the number of terminal nodes will be minimized for a smaller subtree.
3
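For reference, the quantity actually minimized in cost complexity pruning, for a subtree T with terminal regions R_m and tuning parameter α ≥ 0, is:

```latex
\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 + \alpha\, |T|
```

Statement 3 misstates this: the penalty is α times the number of terminal nodes |T|, not the raw node count added to the error sum of squares.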
Determine which of the following statements about decision trees are true. 1) They generally have better predictive accuracy compared to other statistical methods. 2) Like in linear regression, categorical variables should be handled using dummy variables. 3) Pruning helps to reduce variance and leads to a smoother fit.
3
You are given the following statements concerning decision trees: 1) A decision tree with 𝑛 leaves has 𝑛 branches. 2) A stump is a decision tree with no leaves. 3) The number of branches is not less than the number of internal nodes. Determine which of the statements is/are true.
3
Determine the scenario for which linear models are preferred over decision trees.
A stable model is desired even when anticipating periodic changes to the data set.
Determine which of the following statements about model selection criteria—Mallows' 𝐶𝑝, Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and adjusted 𝑅2—is true.
Adjusted 𝑅2 is the only criterion among these that directly measures the proportion of variance explained by the model, adjusting for the number of predictors in the model.
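The adjustment is easy to see with the formula and hypothetical numbers: a predictor that raises R² only slightly can still lower adjusted R².

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1): penalizes extra predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical fits: the second model gains 0.01 in R^2 but adds 7 predictors.
print(adjusted_r2(0.80, 50, 3))   # ~0.787
print(adjusted_r2(0.81, 50, 10))  # ~0.761, lower despite the higher R^2
```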
Determine which of the following statements about the bias-variance trade-off is FALSE.
As method flexibility increases, the expected test MSE decreases regardless of the nature of the true function 𝑓.
Determine which of the following statements about 𝐾-Nearest Neighbors is FALSE.
As 𝐾 grows, the decision boundary becomes overly flexible and finds patterns in the data that don't correspond to the Bayes decision boundary.
Determine which of the following statements is true regarding ridge regression.
As 𝜆 increases, the training RSS will increase.
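This follows because λ = 0 gives the OLS fit, which minimizes the training RSS by construction; shrinking the coefficients away from it can only increase it. A numpy sketch with made-up data, using the closed-form ridge solution without an intercept for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))                      # hypothetical design matrix
y = X @ rng.normal(size=p) + rng.normal(0, 0.5, size=n)

def ridge_rss(lam):
    """Training RSS of the ridge fit beta = (X'X + lam*I)^(-1) X'y."""
    beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    resid = y - X @ beta
    return resid @ resid

rss = [ridge_rss(lam) for lam in (0.0, 1.0, 10.0, 100.0)]
nondecreasing = all(a <= b for a, b in zip(rss, rss[1:]))
```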
Determine which of the following statements about the model selection criteria—Mallows' 𝐶𝑝, Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and adjusted 𝑅2—is NOT true.
BIC and 𝐶𝑝 both use the likelihood function to adjust for the number of parameters, ensuring that simpler models are not unduly favored over more complex models that better capture the underlying patterns.
Determine which of the statements about boosting is NOT true.
Boosting constructs its models in parallel, similar to bagging.
An analyst is assessing four statistical models to determine the best fit for predicting sales based on historical data. Each model is evaluated using different model selection methods: Mallows' 𝐶𝑝, Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and cross-validation error. Considering the inherent characteristics of these methods, determine which of the following statements correctly describes a unique feature of one of these methods compared to the others.
Cross-validation error, unlike 𝐶𝑝, AIC, and BIC, provides a direct measurement of the model's prediction error on new, unseen data by utilizing separate subsets for training and testing.
Determine which of the following statements regarding errors in basic linear regression models is true.
Error terms are also known as disturbance terms.
Determine which of the following statements represents assumptions commonly associated with generalized linear models.
Explanatory variables are fixed and not random.
Determine which of the following statements about the linear probability model is FALSE.
Fitted values always fall between 0 and 1, aligning with the bounds of probability.
Determine which of the following statements about bagging is true.
For a sufficiently large number of bootstrap samples, out-of-bag error is virtually equivalent to leave-one-out-cross-validation error.
Determine which of the following statements about the assumptions of Ordinary Least Squares (OLS) and Generalized Linear Models (GLMs) is FALSE.
GLMs assume that the errors follow a linear relationship with the predictors.
Determine which of the following statements about bagging and its variable importance measures is NOT true.
In bagging classification trees, we can add up the total amount that the Gini index is increased by splits over a given predictor, averaged over all trees.
Determine which of the following statements regarding ensemble methods is false.
In bagging, the bagged trees are grown deep and then pruned.
Determine which of the following statements about backward stepwise selection is NOT true.
It can be applied in a high-dimensional setting, unlike forward stepwise selection.
Determine which of the following statements about the linear probability model and non-linear models is NOT true.
It is easy to distinguish between logit and probit models graphically, since the forms of their functions are quite different.
Determine which one of the following statements makes the best argument for choosing LOOCV over 5-fold CV.
Models fit on smaller subsets of the training data result in greater overestimates of the test error.
Determine which of the following statements about the Poisson regression is true.
Pearson residuals are useful for assessing model fit in Poisson regression.
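Pearson residuals standardize each raw residual by the model's standard deviation, which for a Poisson response is the square root of the mean. A sketch with hypothetical observed counts and fitted means:

```python
import math

def pearson_residual(y, mu):
    """Pearson residual for a Poisson fit: (y - mu)/sqrt(mu), since Var(Y) = mu."""
    return (y - mu) / math.sqrt(mu)

# Hypothetical observed counts and fitted Poisson means.
obs = [3, 0, 7, 2]
fit = [2.5, 1.2, 5.0, 2.3]
resid = [pearson_residual(y, mu) for y, mu in zip(obs, fit)]
# The sum of squared Pearson residuals is the Pearson chi-square statistic,
# a standard overall goodness-of-fit summary for the model.
chi_sq = sum(r * r for r in resid)
```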
Determine which of the following statements regarding regression trees is true.
Recursive binary splitting divides the predictor space into non-overlapping regions.
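One greedy step of recursive binary splitting can be sketched in pure Python on a single predictor with toy data; a full implementation repeats this search over every predictor and every resulting region:

```python
def best_split(xs, ys):
    """Find the cutpoint s minimizing total RSS of the two resulting regions."""
    def rss(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    for s in sorted(set(xs))[1:]:                    # candidate cutpoints
        left = [y for x, y in zip(xs, ys) if x < s]  # region {x < s}
        right = [y for x, y in zip(xs, ys) if x >= s]
        total = rss(left) + rss(right)
        if best is None or total < best[1]:
            best = (s, total)
    return best

# Toy data with an obvious jump between x = 3 and x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
print(best_split(xs, ys))  # cutpoint 10 separates the two clusters
```

Because each cutpoint is chosen to be best at that step, without looking ahead, the procedure is greedy; the resulting regions are non-overlapping by construction.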
Which of the following situations demonstrates the use of modeling for inference?
Researchers in a clinical trial use statistical learning to identify the largest risk factor for heart diseases.
Determine which of the following methods is NOT appropriate to decide the number of principal components to use.
Retain the principal components whose eigenvectors have an absolute value greater than a specified threshold.
Determine which of the following statements with regard to Principal Component Analysis (PCA) is true.
The PCA process involves linear transformations for dimensionality reduction.
Determine which of the following statements about ordinal dependent variables is true.
The cumulative probit model will give results that are very similar to the cumulative logit model.
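The closeness can be illustrated numerically. This sketch compares the plain logit and probit response curves, using the standard rule of thumb that rescaling the probit index by roughly 1.7 makes the two nearly coincide:

```python
import math

def logit_cdf(z):
    """Logistic response function used by the logit model."""
    return 1 / (1 + math.exp(-z))

def probit_cdf(z):
    """Standard normal CDF (probit response function), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# With the probit index rescaled by ~1.7, the curves differ by under 0.02
# everywhere on [-5, 5], which is why they are hard to tell apart graphically.
max_gap = max(abs(logit_cdf(z) - probit_cdf(z / 1.7))
              for z in (i / 10 for i in range(-50, 51)))
```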
Consider the following relationship between a response variable 𝑌 and an explanatory variable 𝑥: 𝑌=𝑓(𝑥)+𝜀
The variance of 𝜀 remains constant, regardless of our choice of f̂.
Determine which of the following statements about bagging for regression trees is FALSE.
To apply bagging to regression trees, we simply construct 𝐵 regression trees using a single bootstrapped training set, and average the resulting predictions.
Determine when a tree is said to be 'optimally fit.'
When additional splits no longer significantly improve the model's performance on validation data.
Determine which of the following algorithms is considered a greedy algorithm. 1) Recursive binary splitting 2) Forward stepwise selection 3) Backward stepwise selection
1, 2 and 3
Determine which of the following statements about the validation set approach is/are NOT true. 1) The process entails randomly partitioning the available observations into two segments: a training set and a hold-out set. 2) The model is trained using the training set, and then the trained model is employed to forecast the responses of the observations within the hold-out set. 3) The validation set error rate, commonly evaluated using mean squared error when dealing with a quantitative response, gives an approximation of the test error rate.
None; all three statements are true.
Determine which of the following statements about tree-based methods is/are true. 1) Combining a large number of trees can often result in dramatic improvements in prediction accuracy, at the expense of some loss in interpretation. 2) Decision trees can be applied to both regression and classification problems. 3) When constructing a regression tree, the goal is to find boxes R_1, ..., R_J that minimize the RSS, given by Σ_{j=1}^{J} Σ_{i∈R_j} (y_i − ŷ_{R_j})², where ŷ_{R_j} is the mean response for the training observations within the 𝑗th box.
1, 2 and 3