Midterm
T
"Collinearity refers to the situation in which two or more predictor variables are closely related to one another."
B
(1) What is RSS? A. Discrepancy between the actual outcome and the predicted outcome B. Total squared discrepancy between the actual outcomes and the predicted outcomes, or fit C. Total sum of squares D. None of the above.
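For study reference, RSS can be computed directly from the actual and fitted values; a minimal sketch in Python (the data values are made up for illustration):

```python
def rss(actual, predicted):
    """Residual sum of squares: the total squared discrepancy
    between the actual outcomes and the fitted values."""
    return sum((y - yhat) ** 2 for y, yhat in zip(actual, predicted))

# Illustrative actual vs. fitted values
print(rss([3.0, 5.0, 7.0], [2.5, 5.5, 6.0]))  # 0.25 + 0.25 + 1.0 = 1.5
```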
B
(1) Which of the following statements is TRUE? A. The lasso produces simpler but less interpretable models than ridge regression. B. The lasso produces simpler and more interpretable models than ridge regression. C. Ridge regression produces simpler but less interpretable models than the lasso. D. Ridge regression produces simpler and more interpretable models than the lasso.
T
(2) The first step in multiple regression analysis is to compute the F-statistic and to examine the associated p-value.
D
(2) Which of the following statements is NOT true about the similarity between forward stepwise selection and backward stepwise selection? A. They both provide an efficient alternative to best subset selection. B. Both of them can be applied in settings where p is too large to apply best subset selection. C. Neither of them is guaranteed to yield the best model containing a subset of the p predictors. D. All of the above are correct
A
(2) Which quantity helps assess the quality of a linear regression fit? A. RSE B. RSS C. TSS D. R
C
(3) Which one is the best-known technique for shrinking the regression coefficients towards zero? A. Ridge regression B. The lasso C. Both of them D. None of them
D
(3) Why do we need another method, when we have logistic regression? A. When the classes are well-separated, the parameter estimates for the logistic regression model are surprisingly unstable. Linear discriminant analysis does not suffer from this problem. B. If n is small and the distribution of the predictors X is approximately normal in each of the classes, the linear discriminant model is more stable than the logistic regression model. C. Linear discriminant analysis is popular when we have more than two response classes. D. All of the above.
D
1) Which of the following is not true of Quadratic Discriminant Analysis (QDA)? a) It is a classification method b) It works in an identical fashion to LDA except that it estimates separate variances/covariances for each class c) It assumes every class has the same variance/covariance d) Its boundary is straight
E
1. In our hospital we notice the combined presence of two predictors (X1 being the degree of infection by a pathogen and X2 being a measure of a certain gene expression) results in a worse prognosis than one where only X1 or X2 is present. The p-value for X2 is in fact quite large. In addition, the prognosis gets even worse for female patients. How might we set up a regression model? a. ŷ = B0 + B1X1 + B2X2 + B3(X1+X2) + error b. ŷ = B0 + B1X1 + B2X2 + B3(X1X2) + B4(X3) + error where X3 is either 1 or -1 c. ŷ = B0 + B1X1 + B2X2 + B3(X1X2) + B4(X3) + error where X3 is either 1 or 0 d. ŷ = B0 + B1X1 + B3(X1X2) + B4(X3) + error where X3 is a dummy variable e. b and c
A
1. LOOCV uses which statistical measurement? A. MSE B. Mean C. Standard deviation D. Variance
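To see why MSE is the relevant measurement: LOOCV fits the model n times, each time holding out one observation, and averages the squared prediction errors. A minimal sketch for simple linear regression (helper names and toy data are illustrative):

```python
def fit_simple(xs, ys):
    """Least squares intercept and slope for y = b0 + b1*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
          / sum((x - xbar) ** 2 for x in xs))
    return ybar - b1 * xbar, b1

def loocv_mse(xs, ys):
    """Leave-one-out CV: hold out each observation in turn, fit on
    the rest, and average the squared prediction errors (the MSE)."""
    errs = []
    for i in range(len(xs)):
        b0, b1 = fit_simple(xs[:i] + xs[i+1:], ys[:i] + ys[i+1:])
        errs.append((ys[i] - (b0 + b1 * xs[i])) ** 2)
    return sum(errs) / len(errs)

# Perfectly linear toy data: every held-out prediction is (numerically) exact
print(loocv_mse([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```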
A
1. The ___________ is the average error that results from using a statistical learning method to predict the response on a new observation— that is, a measurement that was not used in training the method. (Chapter 5.1 page 176) a) Test error b) Training error rate c) Variance d) Mean error
T
1. The dummy variable approach cannot be easily extended to accommodate qualitative responses with more than two levels.
T
1. There is no one best method for all possible data sets
D
1. Validation set approach is a strategy that involves randomly dividing the available set of observations into 2 parts. Which one below is not one of them? (a) Training set (b) Validation set (c) Hold-out set (d) None of the above
A
1. What is linear regression? (03-linear_regression-Abu.pdf/ Page 1) a) A simple approach to supervised learning b) A simple approach to unsupervised learning c) A simple data mining tool d) A regression function that is always linear
E
1. What is one possible way to deal with collinearity of variables? a. LASSO regression b. stepwise regression c. use penalty terms d. a and c e. all of the above
A
1. What is the best way to measure closeness? a. Minimize the least squares criterion b. Maximize the least squares criterion c. Minimize the most squares criterion d. Maximize the most squares criterion
F
1. When having more than two qualitative responses, it is best to use the dummy variable approach.
A
1. Which could be considered as an advantage when using the validation set approach? a. Simple to use b. Only a subset is needed to fit the model c. A lot of variability among MSE's d. All of the above
C
1. Which model can be used to model a qualitative variable? A. Linear regression B. Multiple regression C. Logistic regression D. All of the above
A
1. Which one is a similarity of Linear Discriminant Analysis (LDA) and Logistic Regression? a. Both produce linear boundaries b. Both assume that observations are drawn from the normal distribution c. Both a & b d. None of the above
D
1. Which one of the approaches below is not part of the 3 classical approaches that are automated and efficient for choosing a smaller set of models? (a) Forward selection (b) Backward selection (c) Mixed selection (d) None of the above
C
1. ____________ refers to the amount by which f̂ would change if we estimated it using a different training data set. (Chapter 2.2 Page 34) a) Variance b) Bias c) MSE d) Yield
B
1. There are two main reasons that we may wish to estimate f: Trevor Chapter 2, Page 17 A. Inference and Situation B. Prediction and Inference C. Prediction and Inputs D. Situation and Inputs
T
2) "Backward selection cannot be used if p > n, while forward selection can always be used."
A
2) What is a null model? (Trevor Chapter 3, slide 93, textbook page 78) a) A model that contains an intercept but no predictors b) A model that contains predictors but no intercepts c) A model that contains both intercepts and predictors d) A model that contains nothing
C
2) Which of the following is a correct assumption of Linear Discriminant Analysis? a) The observations are from a random sample b) Each individual predictor variable is normally distributed c) Both A and B d) It is stable when the classes are well separated
F
2. In both the regression and classification settings, choosing the correct level of flexibility does not affect the success of any statistical learning method.
F
2. Logical Discriminant Analysis (LDA) is simple, mathematically robust, and often produces models whose accuracy is as good as that of more complex methods.
C
2. Pick the term most associated with logistic regression a. classifier discriminant analysis b. least squares c. maximum likelihood d. MSE
C
2. The subset selection approach: a. Uses projections as predictors to fit linear regression model by least squares b. Can perform variable selection c. Identifies a subset of the p predictors believed to be related to the response d. None of the above
A
2. The validation set approach error rate tends to _______________ the test error rate for the model fit on the entire data set. A. Overestimate B. Underestimate C. Near D. Further
F
2. We cannot consider the problem of predicting a binary response using multiple predictors.
B
2. What is a con of using best subset selection? (a) Conceptual simplicity (b) Computational limitations (c) None of the above (d) All of the above
A
2. What is a residual? (03-linear_regression-Abu.pdf/ Page 5) a) A discrepancy between the actual outcome and the predicted outcome b) A discrepancy between the predicted outcome and the actual outcome c) A function between the predicted outcome and the actual outcome d) None of the above
A
2. What is the equation of simple linear regression? A. Y = b0 + b1X B. Y = b0 + b1 C. Y = b0 + X D. Y = b0X
C
2. When we fit a linear regression model to a particular data set, many problems may occur. Which is NOT a problem in linear regression model? Trevor Chapter 3, Page 92 A. Non-linearity of the response-predictor relationships and correlation of error terms B. Non-constant variance of error terms and outliers C. Correlation of error terms and applications D. High-leverage points and collinearity
A
2. Which are differences when comparing Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA)? a. QDA estimates separate variances/covariances for each class while LDA does not b. LDA estimates separate variances/covariances for each class while QDA does not c. QDA assumes that every class has the same variance/covariance while LDA does not d. None of the above
E
2. Which of the following is an advantage of LOOCV? a. b. can be used for any predictive model c. less likely to overestimate test error d. less computationally expensive e. b and c
D
2. Which of these is a common problem that occurs when we fit a linear regression model to a particular data set? (a) Outliers (b) Collinearity (c) High-leverage points (d) All of the above
B
2. ____________ is a supervised alternative to Principal Components Regression (PCR). (Chapter 6.3.1 page 230) a) Principal Components Regression (PCR) b) Partial least squares (PLS) c) Select Tuning Parameter d) Ridge Regression
C
2. ____________ refers to the error that is introduced by approximating a real-life problem, which may be extremely complicated, by a much simpler model. (Chapter 2.2 page 35) a) Variance b) Linear relationship c) Bias d) MSE
D
3) What are reasons to not use logistic regression in a scenario? a. When there are more than two response classes b. When the classes are well separated c. When n is small and the distribution of the predictors X is approximately normal in each class d. All of the above
A
3) What does Backward selection start with? (Trevor Chapter 3, slide 94, textbook page 79) a) All variables in the model b) The null model c) No variables d) All of the above
A
3. Another word for when two or more variables are related to one another. a. Collinearity b. High Leverage Point c. Non-linearity d. Correlation
B
3. Classification problems occur often, perhaps even more so than regression problems. Which is NOT an example: Trevor Chapter 4, page 128 A. A person arrives at the emergency room with a set of symptoms that could possibly be attributed to one of three medical conditions B. The logistic function will always produce an S-shaped curve of this form, regardless of the value of X. C. An online banking service must be able to determine whether or not a transaction being performed on the site is fraudulent, on the basis of the user's IP address, past transaction history, and so forth. D. On the basis of DNA sequence data for a number of patients with and without a given disease, a biologist would like to figure out which DNA mutations are deleterious (disease-causing) and which are not.
B
3. Forward stepwise selection: a. Begins with model containing all predictors, then deletes one predictor at a time b. Begins with model containing no predictors, then adds one predictor at a time c. Begins with model containing no predictors, then deletes one predictor at a time d. Begins with model containing a random set of predictors, then deletes one predictor at a time
D
3. In which of the following metrics is it preferable to have a larger value? a. AIC b. BIC c. Bayes classifier d. Adjusted R² e. Cp
B
3. Simple linear regression is a straightforward approach for predicting: (a) a quantitative response X on the basis of a single predictor variable Y (b) a quantitative response Y on the basis of a single predictor variable X (c) a qualitative response X on the basis of a single predictor variable Y (d) a qualitative response Y on the basis of a single predictor variable X
D
3. We are trying to sort data into 3 classes. What type of method would you employ? a. LDA/QDA b. KNN regression c. logarithmic regression d. a and b e. all of the above
D
3. What are the resampling methods? A. Cross Validation B. LOOCV C. Bootstrap D. All of the above
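As a concrete illustration of one resampling method, the bootstrap draws repeated samples with replacement and recomputes a statistic on each to estimate its variability; a minimal sketch (the data, B, and the seed are arbitrary choices):

```python
import random

def bootstrap_se(data, stat, B=1000, seed=0):
    """Estimate the standard error of `stat` by recomputing it on
    B resamples drawn with replacement from the original data."""
    rng = random.Random(seed)
    n = len(data)
    reps = [stat([rng.choice(data) for _ in range(n)]) for _ in range(B)]
    mean = sum(reps) / B
    return (sum((r - mean) ** 2 for r in reps) / (B - 1)) ** 0.5

data = [2.1, 3.5, 4.0, 5.2, 6.8, 7.1, 8.3]
se = bootstrap_se(data, stat=lambda xs: sum(xs) / len(xs))
print(round(se, 3))  # bootstrap estimate of the SE of the sample mean
```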
B
3. What is a computationally efficient alternative to best subset selection? (Chapter 6.1.2 page 207) a) Backward stepwise selection b) Forward stepwise selection c) Optimal Model d) Validation and Cross-Validation
C
3. When does the Bayes classifier have the lowest possible error rate of all classifiers? a. Always b. Never c. Only when ALL the terms are correctly specified in the p(X) equation d. When some of the terms are specified in the p(X) equation
B
3. Which model is best for modeling many predictors? A. Linear regression B. Multiple Regression C. Logistic Regression D. All of the above
C
3. Which of these is an efficient alternative to best subset selection? (a) Forward stepwise selection (b) Backward stepwise selection (c) All of the above (d) None of the above
F
4) Bayes' classifier is an attainable gold standard for error reduction.
F
4. In KNN, bias increases as K decreases.
T
5) An advantage of KNN is its ability to surpass the accuracy of Linear Discriminant Analysis (LDA) and Logistic Regression when the decision boundary is highly non-linear.
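A minimal sketch of the KNN classifier behind this question: a point is classified by majority vote among its K nearest training points (the toy data and K = 3 are illustrative):

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training
    points, using squared Euclidean distance."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(pt, x)), label)
        for pt, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Two well-separated toy clusters labelled 0 and 1
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.1)]
y = [0, 0, 0, 1, 1, 1]
print(knn_predict(X, y, (0.3, 0.3)))  # 0 (nearest the first cluster)
```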
A
A model of the form Y = β0 + β1X + e is a: a. Linear regression b. Logistic regression c. Linear discriminant analysis d. Quadratic discriminant analysis
T
An outlier is a point for which yi is far from the value predicted by the model.
F
Bias refers to the accuracy that is introduced by approximating a real life problem.
T
Collinearity refers to the situation in which two or more predictor variables are closely related to one another.
T
Degrees of freedom is a quantity that summarizes the flexibility of a curve
F
In logistic regression, if β_1>0, this means there is no relationship between Y and X.
T
In regression model, some predictors are not quantitative but are qualitative, taking a discrete set of values.
F
In simple linear regression, there are several predictors that predict a response
F
In the KNN approach, as K grows, the method becomes more flexible and produces a decision boundary that is close to linear.
B
In which type of dataset can the KNN classifier method best be applied? (Trevor PG 105) a. Quantitative b. Qualitative c. Conceptual d. Methodological
A
In which type of dataset can the KNN regression method best be applied? (Trevor PG 105) a. Quantitative b. Qualitative c. Conceptual d. Methodological
C
In which type of dataset can linear regression be applied? (Trevor PG 61) a. Supervised b. Unsupervised c. Quantitative d. Qualitative
T
KNN uses the Bayes classifier to group data points (TRUE/ FALSE)(Trevor pg 54)
T
Linear regression is a simple approach to supervised learning. It assumes that the dependence of the outcome Y on the predictors X1, X2, ..., Xp is linear.
F
Logistic regression is a method used to predict numerical values using a number of predictors (TRUE/ FALSE)(Trevor pg 146)
T
Most statistical learning problems fall into one of two categories: supervised or unsupervised
T
Most traditional statistical techniques for regression and classification are intended for the low dimensional setting in which n, the number of observations, is much greater than p, the number of features.
T
Smaller RSS means the model fits more tightly and has smaller variance. (PPT)
T
The least squares approach chooses slope and intercept to minimize the residual sum of squares.
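For reference, setting the partial derivatives of RSS(β0, β1) = Σ(yi − β0 − β1xi)² to zero gives the least squares choices of slope and intercept:

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^{2}},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}
```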
T
Testing the null hypothesis (that there is no relationship between X and Y) is the most common test to perform, and it uses the standard error of the coefficient estimate.
C
What are 2 reasons we might not prefer to just use the ordinary least squares (OLS) estimates? a. Not Enough Information and Not Enough Technology b. Too Messy and Not Readable c. Prediction Accuracy and Model Interpretability d. Lack of Validation and Model Accuracy
C
What does R^2 represent in a linear model? (pg 85 Trevor) a. the slope of the regression line b. the strength of the correlation of a dataset c. the proportion of variation in the response that is explained by the model d. the residual sum of squares
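R² = 1 − RSS/TSS; a minimal sketch computing it from actual and fitted values (the data values are made up for illustration):

```python
def r_squared(actual, predicted):
    """R^2 = 1 - RSS/TSS: the fraction of the variation in the
    response that the fitted model explains."""
    mean = sum(actual) / len(actual)
    rss = sum((y - f) ** 2 for y, f in zip(actual, predicted))
    tss = sum((y - mean) ** 2 for y in actual)
    return 1 - rss / tss

# A perfect fit explains all of the variation
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
```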
B
What is Lasso? a. A technique for analyzing multiple regression data that suffer from multicollinearity b. A regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces c. A statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables d. All of the above
A
What is Mixed selection a combination of? (Trevor Chapter 3, slide 94, textbook page 79) a) Forward and Backward selection b) Only Forward selection c) Only Backward selection d) None of the above
C
What is a catch-all for what we miss with a model, where the true relationship may not be linear, other variables may cause variation in Y, and there may be measurement error? a. Simple Linear Regression b. Multiple Linear Regression c. Error Term d. Population Regression Line
A
What is the advantage LOOCV has over the validation set approach? (Trevor PG 194) a. It has far less bias. b. It is far less unbiased. c. It can be applied to larger datasets. d. It has far less test error.
A
What is the population regression line? a. The best linear approximation to the true relationship between X and Y b. The mean population line c. The linearity of the model with x and y d. The same as the simple linear regression line
A
What is the purpose of performing Cross Validation? (PG 176) a. Estimating test error. b. Estimating variance of data points. c. Estimating a predictor. d. Estimating the best subset.
A
Which validation method attempts to address the drawbacks of the validation set approach? a. Leave-one-out cross validation b. Hold-out validation set c. Validation set approach d. First-one-in validation
D
Which is NOT a selection model? a. Forward b. Backward c. Mixed d. Inverse
B
Which of the following is NOT a Classification method? (Pg 71 Trevor) a. LDA b. Multiple linear regression c. KNN d. Logistic regression
D
Which of the following is a resampling method? (Trevor pg 193) a. LDA b. QDA c. KNN d. LOOCV
C
Which of the following is not a cross-validation method? (Trevor PG 176) a. Leave-One-Out b. K-Fold c. Classification d. Bootstrap
C
Which value is a statistical measure of how close the data are to the fitted regression line? a. t-statistic b. p-value c. R-squared d. F-statistic
E
a) What is the formula for linear regression? (Chapter 3.1 page 61) b) X ≈ β0 + β1Y c) Y ≈ β1 + β0X d) X² = log x · β0Y + β1 e) Y ≈ β0 + β1X
B
What is a residual (in the statistical sense)? a. Left over from something b. The discrepancy between the actual outcome and the predicted outcome c. Finding similarities and differences in statistical testing d. Something you don't want
A
What is the linear regression model? a. y = B0 + B1X + e b. y = mX + b c. y = BB + B2X + e d. y = B0 + A1X + e
D
What is the full name of LOOCV? a. Look-Ozone-Outspace City Validation b. Lambda-Octagon-Operation Cyber Visualization c. Least-Original-Organizational Civil Variable d. Leave-One-Out Cross Validation
C
What is a null model? a. A model that uses no values b. A model that uses the null hypothesis c. A model that contains an intercept but no predictors d. A model that contains all values except null
A
x_i = 1 if the ith person is female, 0 if the ith person is male; y_i = β_0 + β_1 x_i + ε_i. Which model is for the female variable? a. y_i = β_0 + β_1 + ε_i b. y_i = β_0 + ε_i c. y_i = β_1 x_i + ε_i d. y_i = β_0 + 2β_1 + ε_i
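Plugging the dummy variable in: x_i = 1 gives the female model y_i = β_0 + β_1 + ε_i, and x_i = 0 gives y_i = β_0 + ε_i for males. A minimal sketch of how the dummy coding shifts the fitted mean (the coefficient values are made up for illustration):

```python
def predict(b0, b1, is_female):
    """Dummy-variable model y = b0 + b1*x with x = 1 for female, 0 for male."""
    x = 1 if is_female else 0
    return b0 + b1 * x

b0, b1 = 50.0, 4.0  # illustrative coefficients
print(predict(b0, b1, True))   # female mean: b0 + b1 = 54.0
print(predict(b0, b1, False))  # male mean:   b0      = 50.0
```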