SRM: Conceptual Questions
You are given the following statements: I. All white noise processes are non-stationary. II. As time, t, increases, the variance of a random walk increases. III. First-order differencing a random walk series results in a white noise series. Determine which of the above statements is/are false.
I only
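Statements II and III can be illustrated with a small simulation. This is a sketch under stated assumptions: the random walk is y_t = y_{t−1} + e_t with y_0 = 0 and standard normal white noise e_t. The seed, path counts and helper names (random_walk, variance_at) are arbitrary illustrative choices.

```python
import random

rng = random.Random(1)  # arbitrary seed for reproducibility

def random_walk(n):
    """Return (noise, walk) where walk[t] is the running sum of noise[0..t]."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    walk, level = [], 0.0
    for e in noise:
        level += e
        walk.append(level)
    return noise, walk

def variance_at(paths, t):
    """Sample variance of y_t across simulated paths (t is 1-based)."""
    values = [path[t - 1] for path in paths]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

# Statement II: Var(y_t) = t * sigma^2, so the spread grows with t.
paths = [random_walk(100)[1] for _ in range(2000)]
var_10, var_100 = variance_at(paths, 10), variance_at(paths, 100)

# Statement III: first differences y_t - y_{t-1} recover the white noise.
noise, walk = random_walk(50)
diffs = [walk[0]] + [walk[t] - walk[t - 1] for t in range(1, len(walk))]
print(round(var_10, 2), round(var_100, 2))
```

With σ² = 1, var_10 and var_100 should land near 10 and 100, and diffs matches the simulated noise up to floating-point rounding.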
Which of the following leads to unreliable results from a multiple linear regression? I. Excluding a key predictor II. Including as many predictors as possible III. Errors not following a normal distribution
I, II and III
For the K-nearest neighbors classifier, which of the following is/are true as K increases? I. Flexibility increases. II. Squared bias increases. III. Variance decreases.
II and III
Sarah is applying principal component analysis to a large data set with four variables. Loadings for the first four principal components are estimated. Determine which of the following statements is/are true with respect to the loadings. I. The loadings are unique. II. For a given principal component, the sum of the squares of the loadings across the four variables is one. III. Together, the four principal components explain 100% of the variance.
II and III
Which of the following is a property of K-means clustering but not of hierarchical clustering?
Its algorithm has an unspecified number of iterations to obtain the final cluster assignments.
Which of the following methods can use out-of-bag error to estimate test error? I. Bagging II. Boosting III. Random forest
Bagging and Random Forest
Determine which of the following statements is false regarding a simple linear regression model with a response variable y and an explanatory variable x.
The choice of explanatory variable x affects the total sum of squares
You are interested in modeling the aggregate auto claims for a specific block of business. Determine which model has the most appropriate distribution.
GLM with a Tweedie response
Which of the following is an unsupervised learning technique? I. K-means clustering II. K-nearest neighbors III. k-fold cross-validation
I only
Which of the following is a property of hierarchical clustering?
It is commonly a bottom-up or agglomerative approach.
Determine which one of the following statements about ridge regression is false.
Ridge regression shrinks the coefficient estimates, which has the benefit of reducing the bias
Which of the following is true regarding models that use numerous decision trees to obtain a prediction function f̂?
They address the lack of predictive accuracy of a single decision tree.
Two Poisson regressions are modeled using the same data: one accounts for varying exposures and the other does not. Which of the following are true? I. The coefficient estimates should not be the same for both models. II. With all else equal, a unit change in predictor xj changes the estimated means of both models by the same factor if the corresponding coefficient estimate is the same. III. One model is better at handling overdispersion than the other.
I and II
You are given the following statements on supervised learning: I. The variance and the squared bias are inversely related. II. The squared bias of a statistical learning method increases as the method's flexibility decreases. III. As model flexibility increases, the test mean squared error (MSE) monotonically decreases. Determine which of the following statements is/are true.
I and II
Determine which of the following statements is/are true for a simple linear relationship, y = β0 + β1x + ε. I. If ε = 0, the 95% confidence interval is equal to the 95% prediction interval. II. The prediction interval is always at least as wide as the confidence interval. III. The prediction interval quantifies the possible range for E(y∣x).
I and II (this answer was not among options A, B or C).
Damon considers using either a simple moving average with length k or exponential smoothing with weight w to smooth a time series. You are given the following statements: I. The larger the value of k, the smoother the time series. II. w=a^2 results in a smoother time series than w=a, where a≠1 and a≠0. III. Using a simple moving average with length k=1 or exponential smoothing with weight w=0 will result in the same smoothed series. Determine which of the statements is/are true.
I and III
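A minimal sketch of both smoothers, assuming the exponential smoothing recursion ŝ_t = (1 − w)y_t + w·ŝ_{t−1} started at ŝ_0 = y_0 (the convention under which w = 0 reproduces the data and a larger w smooths more). The data and the roughness measure (sum of squared first differences) are hypothetical choices made only for illustration.

```python
def moving_average(y, k):
    """Simple moving average of length k: mean of y_t, ..., y_{t-k+1} (0-based t >= k - 1)."""
    return [sum(y[t - k + 1:t + 1]) / k for t in range(k - 1, len(y))]

def exp_smooth(y, w):
    """Exponential smoothing s_t = (1 - w) * y_t + w * s_{t-1}, started at s_0 = y_0."""
    s = [y[0]]
    for t in range(1, len(y)):
        s.append((1 - w) * y[t] + w * s[-1])
    return s

def roughness(s):
    """Sum of squared first differences: a smaller value means a smoother series."""
    return sum((s[t] - s[t - 1]) ** 2 for t in range(1, len(s)))

y = [3.0, 7.0, 2.0, 8.0, 1.0, 9.0, 4.0, 6.0]  # hypothetical zig-zag data

# Statement III: k = 1 and w = 0 both return the original series.
print(moving_average(y, 1) == y, exp_smooth(y, 0) == y)
# Statement I: a longer moving average is smoother on this data.
print(roughness(moving_average(y, 2)), roughness(moving_average(y, 4)))
# Statement II with a = 0.5: w = a^2 = 0.25 is rougher than w = a = 0.5.
print(roughness(exp_smooth(y, 0.25)), roughness(exp_smooth(y, 0.5)))
```

For 0 < a < 1, a² < a, so w = a² puts more weight on the current observation and smooths less, which is why statement II is false.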
Determine which of the following statements about clustering is/are true. I. Cutting a dendrogram at a lower height will not decrease the number of clusters. II. K-means clustering requires plotting the data before determining the number of clusters. III. For a given number of clusters, hierarchical clustering can sometimes yield less accurate results than K-means clustering
I and III
Determine which of the following statements about random forests is/are true. I. If the number of predictors used at each split is equal to the total number of available predictors, the result is the same as using bagging. II. When building a specific tree, the same subset of predictor variables is used at each split. III. Random forests are an improvement over bagging because the trees are decorrelated.
I and III
Trish runs a regression on a data set of n observations. She then calculates a 95% confidence interval (t, u) on y for a given set of predictors. She also calculates a 95% prediction interval (v, w) on y for the same set of predictors. Determine which of the following must be true. I. lim_{n→∞} (u − t) = 0 II. lim_{n→∞} (w − v) = 0 III. w − v > u − t
I and III
You are given the following statements relating to scaled deviance of GLMs: I. Scaled deviance can be used to assess the quality of fit for nested models. II. A small scaled deviance indicates a poor fit for a model. III. A saturated model has a scaled deviance of zero. Determine which of the above statements are true.
I and III
Determine which of the following statements is/are true regarding control charts. I. Control charts are used to detect nonstationarity in a time series. II. A control chart has superimposed lines called reference limits. III. An R chart examines the stability of the mean of a time series
I only
You are given the following statements about different resampling methods: I. Leave-one-out cross-validation (LOOCV) is a special case of k-fold cross-validation. II. k-fold cross-validation has higher variance than LOOCV when k < n. III. LOOCV tends to overestimate the test error rate in comparison to the validation set approach. Determine which of the above statements are correct.
I only
Consider the following statements concerning cross-validation and a data set with n observations: I. In general, performing LOOCV requires fitting a model for a total of n times. II. With least squares regression, the LOOCV estimate for the test MSE can be calculated by fitting a model once. III. Performing k-fold cross validation requires fitting a model for a total of k times. Determine which of the statements is/are true.
I, II and III
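Statement II refers to the leverage shortcut for least squares, CV(n) = (1/n) Σ ((y_i − ŷ_i)/(1 − h_i))². The sketch below assumes a simple linear regression on hypothetical data and checks that brute-force LOOCV (n refits) and the one-fit shortcut agree; the helper names are illustrative.

```python
def fit_slr(x, y):
    """Least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

def loocv_brute_force(x, y):
    """Statement I: refit the model n times, each time leaving out observation i."""
    errs = []
    for i in range(len(x)):
        b0, b1 = fit_slr(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        errs.append((y[i] - (b0 + b1 * x[i])) ** 2)
    return sum(errs) / len(errs)

def loocv_shortcut(x, y):
    """Statement II: one fit, then rescale each residual by 1 / (1 - h_i)."""
    n = len(x)
    b0, b1 = fit_slr(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    total = 0.0
    for xi, yi in zip(x, y):
        h = 1 / n + (xi - xbar) ** 2 / sxx  # leverage of observation i
        total += ((yi - (b0 + b1 * xi)) / (1 - h)) ** 2
    return total / n

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical data
y = [1.2, 1.9, 3.2, 3.8, 5.3, 5.9]
cv_brute, cv_short = loocv_brute_force(x, y), loocv_shortcut(x, y)
print(cv_brute, cv_short)
```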
Determine which of the following indicates that a nonstationary time series can be represented as a random walk. I. A control chart of the series detects a linear trend in time and increasing variability. II. The differenced series follows a white noise model. III. The standard deviation of the original series is greater than the standard deviation of the differenced series.
I, II and III
Determine which of the following statements is/are true about Pearson residuals. I. They can be used to calculate a goodness-of-fit statistic. II. They can be used to detect if additional variables of interest can be used to improve the model specification. III. They can be used to identify unusual observations.
I, II and III
Determine which of the following statements about hierarchical clustering is/are true. I. The method may not assign extreme outliers to any cluster. II. The resulting dendrogram can be used to obtain different numbers of clusters. III. The method is not robust to small changes in the data
II and III
Which of the following is/are considered a benefit of K-means clustering over hierarchical clustering? I. Running the algorithm once is guaranteed to find clusters with the global minimum of the total within-cluster variation. II. It is less restrictive in its clustering structure. III. There are fewer areas of consideration in clustering a dataset.
II and III
Determine which of the following statements about selecting the optimal number of clusters in K-means clustering is/are true. I. K should be set equal to n, the number of observations. II. Choose K such that the total within-cluster variation is minimized. III. The determination of K is subjective and there does not exist one method to determine the optimal number of clusters
III only
You are given the following three statements regarding shrinkage methods in linear regression: I. As tuning parameter, λ, increases towards ∞, the penalty term has no effect and a ridge regression will result in the unconstrained estimates. II. For a given dataset, the number of variables in a lasso regression model will always be greater than or equal to the number of variables in a ridge regression model. III. The issue of selecting a tuning parameter for a ridge regression can be addressed with cross-validation. Determine which of the above statements are true.
III only
You are given the following statements concerning bagging in the context of decision trees: I. Bagging is a procedure used to reduce bias of a statistical learning method. II. For bagging, a large number of trees, B, leads to overfitting. III. Out-of-bag observations are observations not used to fit a given bagged tree. Determine which of the statements is/are true
III only (not among the listed options).
Determine which of the following statements is not a drawback of using linear probability models over nonlinear functions of explanatory variables to model a Bernoulli response.
Explanatory variables that are highly correlated can result in severe multicollinearity
Determine which of the following statements is/are true. I. The leverage for each observation in a linear model must be between 1/n and 1. II. The n leverages in a linear model must sum to the number of explanatory variables. III. If an explanatory variable is uncorrelated with all other explanatory variables, the corresponding variance inflation factor would be zero.
I only
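For simple linear regression the leverage has the closed form h_i = 1/n + (x_i − x̄)²/Σ_j (x_j − x̄)². The sketch below, on hypothetical data, checks statement I and shows why statement II fails: the leverages sum to the number of regression coefficients, k + 1 = 2 here, not to the number of explanatory variables k = 1.

```python
def leverages(x):
    """Leverages h_i = 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2 for simple linear regression."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

x = [2.0, 3.0, 5.0, 7.0, 20.0]  # hypothetical; 20.0 lies far from the other values
h = leverages(x)
print([round(hi, 3) for hi in h], round(sum(h), 6))  # each h_i in [1/n, 1]; sum = k + 1
```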
For any statistical learning method, which of the following increases monotonically as flexibility increases: training MSE, test MSE, squared bias, or variance?
Variance only (not among the listed options).
Which of the following is relevant to unsupervised learning? I. Heteroscedasticity II. Hierarchical clustering III. Hierarchical principle
II only
Which of the following is not a reasonable course of action to address multicollinearity?
Transform the response variable with a concave function
Determine which of the following statements are true. I. The deviance is useful for testing the significance of explanatory variables in nested models. II. The deviance for normal distributions is proportional to the residual sum of squares. III. The deviance is defined as a measure of distance between saturated and fitted model.
All
A modeler creates a regression tree model using recursive binary splitting. Looking at the results, the decision tree appears to be too large, which causes overfitting. The modeler would like to adjust the model to make it more interpretable. Determine which of the following actions does not help solve the interpretability issue.
Apply bagging in constructing the decision tree model
Determine which of the following considerations may make decision trees preferable to other statistical learning methods. I. Decision trees are easily interpretable. II. Decision trees can be displayed graphically. III. Decision trees are easier to explain than linear regression methods.
I, II and III (not among the listed options).
From an investigation of the residuals of fitting a linear regression by ordinary least squares, it is clear that the spread of the residuals increases as the predicted values increase. Observed values of the dependent variable range from 0 to 100. Determine which of the following statements is/are true with regard to transforming the dependent variable to make the variance of the residuals more constant. I. Taking the logarithm of one plus the value of the dependent variable may make the variance of the residuals more constant. II. A square root transformation may make the variance of the residuals more constant. III. A logit transformation may make the variance of the residuals more constant
I and II
Determine the situation in which ridge regression will outperform lasso regression in terms of prediction accuracy.
Ridge regression will outperform if the response is a function of all of the predictors
Brett is given a series of observations from a longitudinal data set. He is interested in knowing whether the observations are realizations of a stationary model or a non-stationary model. If it turns out to be a non-stationary model, then he is interested in whether it is a random walk model. Which of the following observations and corresponding conclusions is inaccurate?
If the sample variance of the series is less than the sample variance of the differenced series, then the series is likely a random walk model.
Determine which of the following pairs of distribution and link function is the most appropriate to model if a person is hospitalized or not.
Binomial distribution, logit link function
You are given the following three statistical learning tools: I. Cluster Analysis II. Logistic Regression III. Ridge Regression Determine which of the above are examples of supervised learning.
II and III
Determine which of the following is/are true regarding k-means clustering. I. At every iteration of the algorithm, the number of clusters changes. II. Categorical variables can be used in the analysis. III. Inversions are a major drawback of performing k-means clustering.
None
Which of the following is/are true for a Poisson regression? I. A square root link is as appropriate as a log link. II. If the model is adequate, the deviance is a realization from a chi-square distribution. III. A large Pearson chi-square statistic indicates that overdispersion is likely more severe.
II and III
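A sketch of the Pearson chi-square statistic and the resulting dispersion estimate for a Poisson model, where the Pearson residual is (y_i − μ̂_i)/√μ̂_i because Var(Y) = μ. The fitted means are hypothetical (not produced by an actual fit), and two regression parameters are assumed when computing the residual degrees of freedom n − 2.

```python
import math

def pearson_residuals(y, mu):
    """Poisson Pearson residuals (y - mu) / sqrt(mu), since Var(Y) = mu."""
    return [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu)]

def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square statistic divided by the residual degrees of freedom."""
    chi2 = sum(r ** 2 for r in pearson_residuals(y, mu))
    return chi2 / (len(y) - n_params)

mu = [2.0, 2.0, 4.0, 4.0, 8.0, 8.0]  # hypothetical fitted means
y_close = [1, 3, 3, 5, 7, 9]         # counts close to the fitted means
y_spread = [0, 5, 0, 9, 1, 13]       # same total, much more spread

print(pearson_dispersion(y_close, mu, 2), pearson_dispersion(y_spread, mu, 2))
```

A dispersion estimate near 1 is consistent with the Poisson assumption; the y_spread counts give a value well above 1, pointing to overdispersion.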
Consider the following statements regarding the tuning parameter λ in the lasso model-fitting procedure: I. As λ increases, the number of predictors in the chosen model will increase. II. As λ increases, the bias of the parameters in the chosen model will increase. III. As λ increases, the variance of the predictions made by the chosen model will increase. Determine which of the above statements are true.
II only
Determine which of the following statements is/are true. I. The number of clusters must be pre-specified for both K-means and hierarchical clustering. II. The K-means clustering algorithm is less sensitive to the presence of outliers than the hierarchical clustering algorithm. III. The K-means clustering algorithm requires random assignments while the hierarchical clustering algorithm does not.
III only
Given a family of distributions where the variance is related to the mean through a power function, Var[Y] = φ·E[Y]^p, where φ is the scale parameter, one can characterize members of the exponential family of distributions using this formula. You are given the following statements on the value of p for a given distribution: I. Normal (Gaussian) distribution, p = 0 II. Compound Poisson-gamma distribution, 1 < p < 2 III. Inverse Gaussian distribution, p = −1 Determine which of the above statements are correct.
I and II
Determine which of the following statements concerning decision tree pruning is/are true. I. The recursive binary splitting method can lead to overfitting the data. II. A tree with more splits tends to have lower variance. III. When using the cost complexity pruning method, α=0 results in a very large tree
I and III
You are given statements about the following autoregressive model: y_t = β0 + β1·y_{t−1} + ε_t, t = 1, 2, ... I. If β1 ≥ 1, the model is non-stationary. II. If β0 = 0, the model reduces to a white noise process. III. If β1 = 0, the lag k autocorrelation is zero for any positive integer k. Determine which statements is/are true.
I and III
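For a stationary AR(1) (|β1| < 1) the lag-k autocorrelation is ρ_k = β1^k, which is zero for every k ≥ 1 when β1 = 0. The simulation below is a sketch assuming standard normal errors; the parameter values, seeds and series length are arbitrary.

```python
import random

def simulate_ar1(beta0, beta1, n, seed=0):
    """Simulate y_t = beta0 + beta1 * y_{t-1} + e_t with e_t ~ N(0, 1); assumes |beta1| < 1."""
    rng = random.Random(seed)
    y = [beta0 / (1 - beta1)]  # start at the stationary mean beta0 / (1 - beta1)
    for _ in range(n - 1):
        y.append(beta0 + beta1 * y[-1] + rng.gauss(0.0, 1.0))
    return y

def sample_acf(y, k):
    """Lag-k sample autocorrelation r_k."""
    n = len(y)
    ybar = sum(y) / n
    denom = sum((v - ybar) ** 2 for v in y)
    num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n))
    return num / denom

y_ar = simulate_ar1(beta0=2.0, beta1=0.6, n=20000, seed=0)
y_wn = simulate_ar1(beta0=2.0, beta1=0.0, n=20000, seed=1)   # beta1 = 0: white noise around 2
y_neg = simulate_ar1(beta0=0.0, beta1=-0.6, n=20000, seed=2)  # negative beta1

for k in (1, 2, 3):
    print(k, round(sample_acf(y_ar, k), 3), round(0.6 ** k, 3), round(sample_acf(y_wn, k), 3))
```

A negative β1 makes the autocorrelations alternate in sign (the y_neg series), so the lag-k autocorrelation of a stationary AR(1) is not always non-negative.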
You are considering using k-fold cross-validation (CV) in order to estimate the test error of a regression model, and have two options for the choice of k: 5-fold CV or leave-one-out CV (LOOCV). Determine which one of the following statements makes the best argument for choosing LOOCV over 5-fold CV.
Models fit on smaller subsets of the training data result in greater overestimates of the test error.
Determine which of the following statements is applicable to K-means clustering and is not applicable to hierarchical clustering.
The algorithm must be initialized with an assignment of the data points to a cluster
Determine which of the following statements is NOT true about clustering methods.
Clustering is used to reduce the dimensionality of a dataset while retaining explanation for a good fraction of the variance
Determine which of the following statements is/are true for a linear regression with one explanatory variable. I. R^2 is the fraction of the variation in y about ȳ that is explained by the linear relationship of y with x. II. R^2 is the ratio of the regression sum of squares to the total sum of squares. III. The standard error of the regression provides an estimate of the variance of y for a given x based on n−1 degrees of freedom.
I and II
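A sketch on hypothetical data showing that R^2 = regression SS / total SS = 1 − error SS / total SS, and that the residual variance estimate s² divides the error sum of squares by n − 2 (two estimated coefficients), not n − 1. The function name slr_summary is illustrative.

```python
def slr_summary(x, y):
    """Sums of squares, R^2 and s^2 for a least squares fit of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    fitted = [b0 + b1 * xi for xi in x]
    tss = sum((yi - ybar) ** 2 for yi in y)                  # total SS: depends on y only
    regss = sum((fi - ybar) ** 2 for fi in fitted)           # regression SS
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # error SS
    return {"tss": tss, "regss": regss, "rss": rss,
            "r2": regss / tss, "s2": rss / (n - 2)}          # n - 2 degrees of freedom

stats = slr_summary([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])
print(stats)
```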
Determine which of the following approaches can be used to detect multicollinearity. I. Inspect the correlation matrix II. Calculate variance inflation factors III. Inspect the residuals versus fitted values plot
I and II
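With exactly two explanatory variables, the R_j² used in VIF_j = 1/(1 − R_j²) equals the squared sample correlation between them, so inspecting the correlation matrix and computing VIFs are closely linked. The sketch assumes that two-predictor case and hypothetical data; with more predictors, R_j² would come from regressing x_j on all the others. Note that an uncorrelated predictor has a VIF of 1, not 0.

```python
import math

def corr(a, b):
    """Sample (Pearson) correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with two predictors, R^2 of x1 on x2 is corr(x1, x2)^2."""
    return 1 / (1 - corr(x1, x2) ** 2)

x1 = [1.0, 2.0, 3.0, 4.0]
x2_uncorrelated = [1.0, -1.0, -1.0, 1.0]  # hypothetical, correlation exactly 0 with x1
x2_collinear = [2.1, 3.9, 6.2, 7.8]       # hypothetical, close to 2 * x1

print(vif_two_predictors(x1, x2_uncorrelated), vif_two_predictors(x1, x2_collinear))
```

A common rule of thumb treats a VIF above 10 as a sign of severe multicollinearity.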
Determine which of the following statements is/are true about clustering methods: I. If K is held constant, K-means clustering will always produce the same cluster assignments. II. Given a linkage and a dissimilarity measure, hierarchical clustering will always produce the same cluster assignments for a specific number of clusters. III. Given identical data sets, cutting a dendrogram to obtain five clusters produces the same cluster assignments as K-means clustering with K=5
II only
Determine which of the following statements on hierarchical and k-means clustering is/are true. I. Applying hierarchical and k-means clustering on the same dataset results in the same clusters if the dendrogram is cut at a height that results in k clusters. II. k-means clustering is greedy. III. Standardizing the variables does not affect the result of hierarchical or k-means clustering.
II only
You are given the following three statistical learning tools: I. Linear Regression II. Boosting III. Lasso Regression Rank these statistical learning tools based on their flexibility in descending order.
II>I>III
Determine which of the following statements is true about lasso regression, relative to ordinary least squares.
Lasso regression is less flexible and thus results in an improved prediction accuracy when its decrease in variance is greater than its increase in bias.
A dataset of 1,000 variables on the 500 companies listed in the S&P 500 is available for studying the change in their stock prices one day after Christmas. Using linear regression, Melody is interested in a good model with 10 variables from among the 1,000 to explain the change in stock prices. Bethany has a similar goal, except she does not require her model to have 10 variables. Determine which approaches are appropriate for the two analysts.
Melody: lasso regression; Bethany: forward selection
Determine which of the following statements is/are true about autoregressive models of order one, AR(1). I. An AR(1) model is a meandering process. II. A stationary AR(1) model is a generalization of both a white noise process and a random walk model. III. The lag k autocorrelation of a stationary AR(1) model is always non-negative.
None
Determine which of the following statements is/are true about subset selection in the context of normal linear models. I. Best subset selection results in a nested set of best models, each with different number of predictors. II. Residual sum of squares is a suitable metric for selecting the best model among models with different number of predictors. III. Forward stepwise selection cannot be used in high-dimensional settings.
None
You are given the following three statistical learning tools: I. Boosting II. K-Nearest Neighbors (KNN) III. Regression Tree Determine which of the above are examples of unsupervised learning.
None are unsupervised learning
Determine which of the following GLM selection considerations is true
Other things equal, when the number of observations > 1,000, BIC penalizes more for the number of parameters used in the model than AIC.
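Assuming the common forms AIC = −2 ln L + 2k and BIC = −2 ln L + k ln(n), where k is the number of parameters, BIC charges ln(n) per parameter versus 2 for AIC, so BIC penalizes more once n > e² ≈ 7.39. A quick check:

```python
import math

def penalty_per_parameter(n):
    """Per-parameter penalty (AIC, BIC), assuming AIC = -2 ln L + 2k and BIC = -2 ln L + k ln(n)."""
    return 2.0, math.log(n)

# BIC's penalty exceeds AIC's once ln(n) > 2, i.e. n > e^2.
smallest_n = next(n for n in range(1, 100) if math.log(n) > 2)
aic_pen, bic_pen = penalty_per_parameter(1001)
print(smallest_n, aic_pen, round(bic_pen, 3))
```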
Andy wants to study the impact of the number of miles driven on the frequency of personal auto insurance claims. Determine which of the following distribution and link function is most appropriate for his model.
Poisson distribution, log link function
Simon uses a statistical learning method to estimate the number of ears of corn produced per acre of land. He has multiple training data sets and applies the same statistical method to all of them. The results are not identical, but very similar, between the training data sets. Which of the following best describes the statistical learning method?
The method has low variance
Which of the following describes a benefit of boosting?
The prediction is obtained through fitting successive trees.
Determine the relationship between the classification error rate, the total Gini index, and the total cross-entropy.
cross-entropy > Gini index > classification error rate
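A sketch of the three node impurity measures, with cross-entropy computed using the natural logarithm and the totals taken as observation-weighted sums over terminal nodes (both assumptions). For any node, entropy ≥ Gini ≥ classification error; Gini and error can tie (for example at a 50/50 two-class node), and all three are zero for a pure node. The leaf sizes and class proportions are hypothetical.

```python
import math

def node_impurities(props):
    """(classification error, Gini index, cross-entropy) for one node's class proportions."""
    error = 1 - max(props)
    gini = sum(p * (1 - p) for p in props)
    entropy = -sum(p * math.log(p) for p in props if p > 0)
    return error, gini, entropy

def total_impurities(nodes):
    """Weight each node's measures by its share of observations; nodes = [(n_obs, props)]."""
    n = sum(count for count, _ in nodes)
    totals = [0.0, 0.0, 0.0]
    for count, props in nodes:
        for j, value in enumerate(node_impurities(props)):
            totals[j] += count / n * value
    return tuple(totals)

leaves = [(50, [0.7, 0.3]), (30, [0.5, 0.5]), (20, [0.9, 0.1])]  # hypothetical leaves
err, gini, ent = total_impurities(leaves)
print(round(err, 4), round(gini, 4), round(ent, 4))
```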
Which of the following are true regarding random forest? I. It is a special case of bagging. II. Every tree is constructed independently of every other tree. III. Relative to bagging, it attempts to make the constructed trees less similar.
II and III
Which of the following is true regarding multicollinearity? I. It is a defect in the data caused by certain observations. II. It can be detected using variance inflation factors. III. It can lead to inflated estimates of standard errors used for inference.
II and III
Which of the following models can be used to accommodate underdispersion relative to the Poisson model? I. Negative binomial model II. Zero-inflated model III. Hurdle model
III only
Determine which of the following statements is true about the linear probability, logistic, probit, and complementary log-log regression models for binary dependent variables.
The logit function is more similar to the complementary log-log function than the probit function for negative values of z
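Evaluated on the same linear predictor z, the three inverse links give the probabilities below. This is a sketch using only the standard library: the standard normal CDF is written via math.erfc, and the complementary log-log inverse uses math.expm1 for accuracy at very negative z. For very negative z, both the logistic and complementary log-log probabilities behave like e^z, while the probit probability falls off much faster.

```python
import math

def logit_prob(z):
    """Inverse logit: 1 / (1 + exp(-z))."""
    return 1 / (1 + math.exp(-z))

def probit_prob(z):
    """Standard normal CDF, Phi(z) = 0.5 * erfc(-z / sqrt(2))."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def cloglog_prob(z):
    """Inverse complementary log-log: 1 - exp(-exp(z)), computed stably with expm1."""
    return -math.expm1(-math.exp(z))

for z in (-1.0, -3.0, -5.0):
    print(z, logit_prob(z), cloglog_prob(z), probit_prob(z))
```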
You are given the following equations on a stationary time series {y_t} for any positive integers t and s where t ≠ s: I. E(y_{t+4}) = E(y_{s+5}) II. Cov(y_t, y_{t+3}) = Cov(y_{s+3}, y_s) III. Var(y_t) = Var(y_{s+2}) Determine which equations are true.
I, II and III
Which of the following is/are true? I. The domain of a Tweedie distribution makes it suitable for modeling a count response variable. II. A negative binomial model is a special case of a heterogeneity model. III. A latent class model is a special case of a zero-inflated model.
II only
You want to perform principal component analysis on an n×p dataset X with n observations and p standardized features. Determine which of the following statements is/are true. I. The column variances of X are one, while the row means of X are zero. II. The principal component score vectors have length p, while the principal component loading vectors have length n. III. The eigenvectors of the matrix XᵀX are the principal component directions, while the eigenvalues are the variances of the components.
III only
You are given the following statements: I. Unsupervised learning is used to draw inferences from datasets without a specified response variable. II. For any statistical learning method, the interpretability and the flexibility of the model are inversely related. III. Logistic regression is an example of a parametric method, while classification tree is an example of a non-parametric method. Determine which of the statements is/are true.
I, II and III
Skyler uses a statistical learning method to estimate the number of ears of corn produced per acre of land. He has multiple training data sets and applies the same statistical method to all of them. The results vary tremendously between the training data sets. Which of the following best describes the statistical learning method?
The method has high flexibility
You are given three statements about the K-means clustering algorithm. I. The K-means clustering algorithm requires that the observations be standardized to have mean zero and standard deviation one. II. The K-means clustering algorithm seeks to find subgroups of homogeneous observations. III. The K-means clustering algorithm looks for a low-dimensional representation of the observations that explains a significant amount of the variance Determine which of the statements I, II, or III are true.
II only
For a bagging procedure, 80% of the bagged trees' first splits are by the same dummy variable. Given this information, which of the following is a recommended course of action? I. Run a random forest instead. II. Drop the dummy variable from the predictors and rerun the bagging procedure. III. Make no changes; this information is not an issue.
I only
Consider the following statements: I. The proportion of variance explained by an additional principal component never decreases as more principal components are added. II. The cumulative proportion of variance explained never decreases as more principal components are added. III. Using all possible principal components provides the best understanding of the data. IV. A scree plot provides a method for determining the number of principal components to use. Determine which of the statements are correct.
II and IV
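Given the component variances (eigenvalues), the proportion of variance explained (PVE) and its cumulative sum are simple ratios. The eigenvalues below are hypothetical, chosen to sum to 4 as they would for four standardized variables; a scree plot would chart pve against the component number and look for an elbow.

```python
from itertools import accumulate

# Hypothetical component variances (eigenvalues), sorted from largest to smallest.
eigenvalues = [2.5, 1.0, 0.4, 0.1]
total_variance = sum(eigenvalues)

pve = [ev / total_variance for ev in eigenvalues]  # never increases from one PC to the next
cumulative_pve = list(accumulate(pve))             # never decreases; reaches 1 with all PCs

for m, (p, c) in enumerate(zip(pve, cumulative_pve), start=1):
    print(f"PC{m}: PVE = {p:.3f}, cumulative = {c:.3f}")
```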
Determine which of the following statements regarding statistical learning methods is/are true. I. Methods that are highly interpretable are more likely to be highly flexible. II. When inference is the goal, there are clear advantages to using a lasso method versus a bagging method. III. Using a more flexible method will produce a more accurate prediction against unseen data.
II only
You are given the following statements concerning the relationship between a response Y and p predictors, X = (X1, X2, ..., Xp): Y = f(X) + ε I. The accuracy of the prediction for Y depends on both the reducible error and the irreducible error. II. The variability of ε can be reduced by using the most appropriate learning method to estimate f. III. ε has a mean of zero. Determine which of the statements is/are false.
II only
You are given the following statements concerning decision trees: I. A decision tree with n leaves has n branches. II. A stump is a decision tree with no leaves. III. The number of branches is not less than the number of internal nodes. Determine which of the statements is/are true.
III only
Which of the following is/are potential options for addressing outliers? I. Include the observation but comment on its effects. II. Delete the observation from the dataset. III. Create a binary variable to indicate the presence of an outlier.
I, II and III
For a Poisson regression using its canonical link, which of the following is/are true? I. An offset is useful for adjusting against overdispersion. II. Using a log link instead would change the model results. III. To estimate the variance of the response, the only parameters that need estimating are the regression coefficients.
III only
An analyst is modeling the probability of a certain phenomenon occurring. The analyst has observed that the simple linear model currently in use results in predicted values less than zero and greater than one. Determine which of the following is the most appropriate way to address this issue.
Use a logit link, so that the linear predictor is mapped to predicted values strictly between 0 and 1.
You are performing a principal components analysis on a data set with 50 observations from three independent continuous variables. Consider the following statements: I. The maximum number of principal components that can be extracted from this data is three. II. The first principal component represents the direction along which the data vary the most. III. The third principal component will be orthogonal to the first principal component. Determine which of the above statements is true.
I, II, and III
Determine which of the following statements about hierarchical clustering is/are true. I. Running the algorithm multiple times with the same dissimilarity measure and linkage results in different cluster assignments. II. Variables must always be standardized before hierarchical clustering is performed. III. At every step before two clusters fuse, C(i, 2) = i(i − 1)/2 pairwise inter-cluster dissimilarities are compared, where i is the number of clusters before fusion.
III only
Which of the following is true regarding studentized residuals in multiple linear regression?
They should be realizations of a t-distribution.
You are given the following statements on the bias-variance tradeoff: I. Bias refers to the error arising from the method's sensitivity towards the training data set. II. Variance refers to the error arising from the assumptions made in the statistical learning tool. III. The variance of a statistical learning method increases as the method's flexibility increases. Determine which of the statements is/are true.
III only
You are given the following statements comparing k-fold cross validation (with k<n) and Leave-One-Out Cross Validation (LOOCV), used on a GLM with log link and gamma error. I. k-fold validation has a computational advantage over LOOCV. II. k-fold validation has an advantage over LOOCV in bias reduction. III. k-fold validation has an advantage over LOOCV in variance reduction. Determine which of the above statements are true
I and III
Which of the following is/are true regarding hierarchical clustering but not K-means clustering? I. Choosing a linkage is necessary. II. Rerunning the algorithm could produce different cluster results. III. The choice to standardize variables has a significant impact on the cluster results.
I only
Determine which of the following statements is/are true about selection criteria in the context of regression models with Gaussian errors, i.e. Y = β0 + β1X1 + ... + βpXp + ε, ε ∼ N(0, σ²). I. Mallows' Cp is an unbiased estimate of the test MSE if it is calculated using an unbiased estimate of σ². II. Mallows' Cp and the Akaike information criterion are proportional to each other. III. A large value of Mallows' Cp indicates a model with a high test error.
I, II and III
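A sketch assuming the textbook (ISLR-style) least-squares forms Cp = (RSS + 2dσ̂²)/n and AIC = (RSS + 2dσ̂²)/(nσ̂²), with additive constants dropped, where d is the number of predictors and σ̂² is an estimate of the error variance, typically from the full model. Under these forms AIC = Cp/σ̂², so both criteria rank models identically. Other presentations scale AIC differently, but for least squares with a fixed σ̂² the two remain proportional. The candidate models and numbers are hypothetical.

```python
def mallows_cp(rss, d, n, sigma2_hat):
    """C_p = (RSS + 2 * d * sigma2_hat) / n for a least-squares model with d predictors."""
    return (rss + 2 * d * sigma2_hat) / n

def aic_least_squares(rss, d, n, sigma2_hat):
    """AIC = (RSS + 2 * d * sigma2_hat) / (n * sigma2_hat), additive constants dropped."""
    return (rss + 2 * d * sigma2_hat) / (n * sigma2_hat)

n, sigma2_hat = 100, 4.0  # hypothetical; sigma2_hat would usually come from the full model
candidates = {1: 520.0, 2: 410.0, 3: 404.0, 5: 398.0}  # hypothetical {d: RSS}

cps = {d: mallows_cp(rss, d, n, sigma2_hat) for d, rss in candidates.items()}
aics = {d: aic_least_squares(rss, d, n, sigma2_hat) for d, rss in candidates.items()}

best_by_cp = min(cps, key=cps.get)     # smallest C_p: lowest estimated test MSE
best_by_aic = min(aics, key=aics.get)  # same choice, since AIC = C_p / sigma2_hat
print(cps, best_by_cp, best_by_aic)
```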
