BADM 211 - Test 2 Quizzes
What could be the objectives behind fitting a regression model? 1. Explanatory or descriptive: assessing the average impact of inputs on an outcome 2. Explanatory or descriptive: generating descriptive statistics among a set of predictors 3. Predictive: given a set of input values for an observation, predicting the outcome variable for that observation 1 and 3 1 and 2 1, 2, and 3
1 and 3
Which of the following statements is/are true with respect to the equation 6.1 in the book? 1. 𝛽0, 𝛽1, ... 𝛽𝑝 are the coefficients estimated by the model 2. ε is the unexplained part and is called noise 3. 𝑋0, 𝑋1, ... 𝑋𝑝 are the regressors or covariates Only 3 Only 2 1, 2, and 3 Only 1
1, 2, and 3
Which of the following is/are true about cross-validation? 1. It is an alternative approach to data partitioning 2. It's immensely useful when the dataset is not large enough for partitioning into train, test and validation set Only 1 Neither 1 nor 2 Both 1 and 2 Only 2
Both 1 and 2
Which of the following statement(s) is/are true with respect to Tables 6.3 and 6.4? 1. Table 6.3 reports the errors on the training data 2. Table 6.4 reports the errors on the validation data Neither 1 nor 2 Both 1 and 2 Only 2 Only 1
Both 1 and 2
What is another name for confusion matrix? coefficient Correlation matrix Classification matrix
Classification matrix
A classifier that performs well will have a ROC curve that is a perfectly diagonal line. True False
False
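The point above can be illustrated with scikit-learn (the scores below are made up): a perfectly diagonal ROC curve corresponds to an AUC of 0.5, i.e. random guessing, while a well-performing classifier's curve bows toward the top-left corner with AUC near 1.

```python
# Hypothetical scores: a diagonal ROC curve means AUC = 0.5 (random guessing);
# a well-performing classifier bows toward the top-left (AUC near 1).
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 0, 1, 1, 1]
good_scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]  # separates the classes well
flat_scores = [0.5] * 6                        # no separation at all

print(roc_auc_score(y_true, good_scores))  # 1.0
print(roc_auc_score(y_true, flat_scores))  # 0.5 -> the diagonal line
```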
A higher RMSE means better performance True False
False
If for variable i, p_i < 0.05, then it is not significantly associated with the outcome variable. True False
False
Sensitivity is defined as the ability of a classifier to rule out unimportant classes accurately. True False
False
In the used Toyota Corolla cars example in the book (Page no. 165), which of the variables mentioned in Table 6.1 can NOT be modeled as outcome variable in linear regression? Age Kilometers Price Fuel Type
Fuel Type
Which of the following is not a feature selection algorithm? Backward Elimination Exhaustive Search Multiple Linear Regression Forward Selection
Multiple Linear Regression
Which of the following best describes 𝛽1 in equation 6.1? 1. A unit change in Y is associated with a 𝛽1-unit change in 𝑋1 2. A unit change in 𝑋1 is associated with a 𝛽1-unit change in Y 3. Predictor 𝑋1 helps to decrease the prediction error in Y by 𝛽1 units Only 3 Only 2 Only 1 None of them
Only 2
Which of the following is/are true in the context of Linear Regression models? 1. It fits a relationship between multiple numerical outcome variables and a set of predictors at once. 2. It fits a relationship between a numerical outcome and a set of predictors. 3. It attempts to fit multiple relationships between a numerical outcome and a set of predictors simultaneously. Only 2 Only 1 Only 3 All: 1, 2, 3
Only 2
Which of these statements are true? 1. Interpreting regression coefficients plays an important role in predictive modeling 2. Interpreting regression coefficients plays an important role in explanatory modeling Both 1 and 2 Neither 1 nor 2 Only 2 Only 1
Only 2
Which of the following is true about PCA? PCA is an unsupervised dimension reduction algorithm PCA stands for principal correspondence analysis. PCA is a supervised learning algorithm
PCA is an unsupervised dimension reduction algorithm
Similar MAE on the training and validation sets implies __
The model did not overfit the training data
If for variable i, beta_i > 0, then there is a positive association between variable i and the outcome variable. True False
True
If the assumptions of MLR hold, then the MEAN ERROR (ME) = 0. True False
True
PCA is most effective when original features are correlated. TRUE/FALSE
True
Sensitivity and specificity are important accuracy measures when it is more important to predict membership in one class. True False
True
The PCs are uncorrelated. TRUE/FALSE
True
To estimate future classification error, the confusion matrix is computed from the Training set Validation set Entire dataset
Validation set
kc_df.shape → (21613, 20) Based on the above code and its output, there are ______________ records in the data. a. 21613 b. 2
a. 21613
MLR assumes a. A linear relationship between the predictors and outcome variable b. A nonlinear relationship between the predictors and outcome variable
a. A linear relationship between the predictors and outcome variable
Which of the following is TRUE about LASSO? a. By increasing the penalty parameter lambda, we are shrinking more features to zero b. By increasing the penalty parameter lambda, we are shrinking fewer features to zero
a. By increasing the penalty parameter lambda, we are shrinking more features to zero
Assume the data is named credit_df; which of the following outputs the number of rows and columns in the data? a. credit_df.shape b. credit_df.count() c. credit_df.describe() d. credit_df.head()
a. credit_df.shape
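A quick sketch on a made-up DataFrame (the column names and values are illustrative) contrasting the four options: only `.shape` returns the (rows, columns) tuple directly.

```python
import pandas as pd

# Made-up data for illustration
credit_df = pd.DataFrame({"income": [45, 60, 52], "loan": [10, 25, 15]})

print(credit_df.shape)  # (3, 2) -> 3 rows, 2 columns
# credit_df.count()     -> number of non-missing values per column
# credit_df.describe()  -> summary statistics per column
# credit_df.head()      -> first 5 rows of the data
```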
Which of the following is a classification problem? a. Identification of digits ( 0-9) using images of handwritten digits. b. Predicting lifetime value of customers. c. Forecasting next day sales d. Forecasting number of people watching Super Bowl to decide whether to advertise or not on TV
a. Identification of digits ( 0-9) using images of handwritten digits.
The main limitation of exhaustive search is that a. It is computationally infeasible b. It eliminates predictors that perform well
a. It is computationally infeasible
We estimate beta coefficients by a. Minimizing error in the training set b. Maximizing error in the training set c. Minimizing error in the validation set d. Maximizing error in the validation set
a. Minimizing error in the training set
In this problem we are trying to predict the loan amount to approve a customer. This is a/an a. Regression problem b. Classification problem c. Unsupervised learning problem
a. Regression problem
In LASSO, increasing the penalty parameter lambda results in a. Shrinking more beta coefficients towards zero b. Shrinking fewer beta coefficients towards zero
a. Shrinking more beta coefficients towards zero
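A sketch on synthetic data with scikit-learn's `Lasso` (which names the penalty `alpha` rather than lambda): as the penalty grows, more coefficients are shrunk to exactly zero.

```python
# Synthetic data: y depends strongly on feature 0, weakly on feature 1,
# and not at all on features 2-4.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100)

# Increasing the penalty drives more coefficients to exactly zero.
for alpha in [0.01, 0.5, 5.0]:
    coefs = Lasso(alpha=alpha).fit(X, y).coef_
    print(alpha, int((coefs == 0).sum()))  # count of features shrunk to zero
```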
The naive predictor in regression is a. The mean of the outcome variable in the validation set b. The median of the outcome variable in the validation set c. The variance of the outcome variable in the validation set
a. The mean of the outcome variable in the validation set
Which of the following is TRUE about LASSO? a. The penalty parameter lambda acts as a tradeoff between the number of features and prediction error b. The penalty parameter lambda acts as a tradeoff between the number of records and prediction error
a. The penalty parameter lambda acts as a tradeoff between the number of features and prediction error
The dimensionality of data corresponds to the number of a. Variables in the data b. Records in the data c. Variables and records combined in the data
a. Variables in the data
What is the naïve rule for classifying? a. Randomly assign a class, where the probability of being assigned one of m classes is based on the frequency of that class in the dataset. b. Classify the record as a member of the majority class c. Randomly assign a class, where the probability of being assigned one of m classes is 1/m.
b. Classify the record as a member of the majority class
Which of the following error metrics cannot be used to assess predictive performance? a. RMSE b. Mean error c. Mean absolute error
b. Mean error
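A sketch with made-up errors showing why mean error fails as a performance metric: positive and negative errors cancel, so ME can be zero even when every prediction is wrong, while MAE and RMSE cannot cancel.

```python
import numpy as np

# Hypothetical actual vs. predicted values, each prediction off by +/-5
actual = np.array([10, 20, 30, 40])
predicted = np.array([15, 15, 35, 35])
errors = actual - predicted

print(errors.mean())                  # ME = 0.0, despite every prediction being wrong
print(np.abs(errors).mean())          # MAE = 5.0
print(np.sqrt((errors ** 2).mean()))  # RMSE = 5.0
```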
In this problem we are trying to predict the loan amount to approve a customer. This is a/an a. Explanatory modeling b. Predictive modeling
b. Predictive modeling
Two popular ways of normalizing the data are 1) subtracting the mean and dividing by the standard deviation, and 2) __ a. Take the logarithm (base 2) of the variable b. Rescale the variable to a uniform range c. Multiply the data by a constant
b. Rescale the variable to a uniform range
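A sketch of the two normalization schemes on a toy column (the values are made up): z-score standardization (subtract the mean, divide by the standard deviation) and min-max rescaling to a uniform [0, 1] range.

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])  # hypothetical variable

z = (x - x.mean()) / x.std()                  # z-score: mean 0, std 1
minmax = (x - x.min()) / (x.max() - x.min())  # rescaled to [0, 1]

print(z)
print(minmax)  # [0, 1/3, 2/3, 1]
```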
Before we build a predictive model, we need to divide the data into training and validation set. The main reason for doing this is a. To have a simpler model b. To avoid overfitting c. To have a more balanced data
b. To avoid overfitting
Which dataset is used to create the predictive model? a. Validation set b. Training set c. Both training and validation set
b. Training set
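A minimal partitioning sketch with scikit-learn's `train_test_split` (a hypothetical 60/40 split on toy arrays): the model is then fit on the training portion only, and the held-out validation portion estimates performance on new data, guarding against overfitting.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 records, 2 predictors
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 40% of the records as the validation set
train_X, valid_X, train_y, valid_y = train_test_split(
    X, y, test_size=0.4, random_state=1)

print(train_X.shape, valid_X.shape)  # (6, 2) (4, 2)
```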
A propensity score is used in combination with a _____ to determine class membership. a. Confusion matrix b. cutoff value c. ROC curve
b. cutoff value
If you have 4 variables/predictors, then in exhaustive search you need to evaluate a. 8 models b. 12 models c. 15 models d. 20 models
c. 15 models
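The count follows from subset arithmetic: exhaustive search fits one model per non-empty subset of the p predictors, and a set of p elements has 2^p − 1 non-empty subsets.

```python
# Number of models exhaustive search must evaluate for p predictors
for p in [2, 3, 4, 5]:
    print(p, 2 ** p - 1)  # p = 4 -> 15 models
```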
What is the minimum number of Principal components needed to explain at least 60% of the variance in the data? a. 1 b. 2 c. 3 d. 4
c. 3
Misclassification error arises when ____ a. An outlier is erroneously left in the dataset. b. A categorical variable is misclassified as numerical. c. A record belongs to one class but is classified as another.
c. A record belongs to one class but is classified as another.
Which of the following is TRUE about Multiple Linear Regression (MLR)? a. MLR assumes a non-linear relationship between outcome and predictors. b. Estimated beta coefficients are the ones that maximize prediction error in the training set. c. Estimated beta coefficients are the ones that minimize prediction error in the training set. d. Estimated beta coefficients are the ones that maximize prediction error in the validation set.
c. Estimated beta coefficients are the ones that minimize prediction error in the training set.
I want to visualize the relationship between Income and Loan_Amount. Therefore, I need to use a/an a. Histogram b. Bar chart c. Scatter plot d. Heatmap
c. Scatter plot
What do the off-diagonal cells in a confusion matrix tell us? a. Total number of records b. The number of correct classifications c. The number of misclassifications
c. The number of misclassifications
PCA is most useful when _______. a. There are categorical predictors b. There are many rows but only a few predictors c. There is high correlation among predictors d. There is no correlation among predictors
c. There is high correlation among predictors
Would you expect root mean square error (RMSE) to be higher for the validation data than for the training data? a. Yes, there are more observations in the validation data, thus RMSE is expected to be higher for the validation data b. No, there are more observations in the train data, thus RMSE is expected to be higher for the train data c. Yes, predictions are likely to be worse for the validation data than the training data d. No, the model can't be validated if the error is not smallest for the validation data
c. Yes, predictions are likely to be worse for the validation data than the training data
Suppose you have 5 features in the data and you are transforming your data using PCA. How many PCs do you have? a. 1 b. 2 c. 3 d. 5 e. 10
d. 5
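A sketch with scikit-learn on synthetic data: by default `PCA` produces one component per original feature, so 5 features yield 5 PCs, ordered so that PC1 explains the most variance.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data with 5 features
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))

pca = PCA().fit(X)
print(pca.n_components_)              # 5 -> one PC per original feature
print(pca.explained_variance_ratio_)  # sorted in decreasing order: PC1 first
```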
In table 6.3 the Python command "pd.get_dummies" ______. a. Converts numerical predictors into dummy variables b. Converts a numerical outcome into a categorical outcome c. Partitions the data d. Converts categorical predictors into dummy variables
d. Converts categorical predictors into dummy variables
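A sketch of `pd.get_dummies` on a made-up Toyota-style DataFrame: each category of the categorical column becomes its own 0/1 dummy column, while numeric columns pass through unchanged.

```python
import pandas as pd

# Hypothetical rows loosely modeled on the Toyota Corolla example
cars = pd.DataFrame({"Age": [23, 24, 26],
                     "Fuel_Type": ["Diesel", "Petrol", "CNG"]})

dummies = pd.get_dummies(cars)
print(list(dummies.columns))
# ['Age', 'Fuel_Type_CNG', 'Fuel_Type_Diesel', 'Fuel_Type_Petrol']
```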
What is regularization in the context of linear models? a. It's a data science philosophy that prescribes testing our model's performance on irregular data, such as outliers b. It's a data science philosophy that prescribes regular updates to our models using the latest data c. It's a method to shrink the data so that model performance can be improved on the training data d. It's a method to drop a few predictors from the model in order to simplify the model
d. It's a method to drop a few predictors from the model in order to simplify the model
Which of the following is a feature extraction method? a. Exhaustive search b. Subset selection c. Regularization d. PCA
d. PCA
Which of the following is not an assumption of Linear Regression models? a. There is a linear relationship between the predictors and the outcome variable b. The cases (records) are independent of each other c. The variability in Y for a given set of predictors is the same regardless of the values of the predictors d. The noise follows a Poisson distribution
d. The noise follows a Poisson distribution
Which of the following is not a reason for dimensionality reduction? a. Cost of obtaining extra features b. To avoid multicollinearity in the data c. To avoid overfitting d. To avoid imbalance in the outcome variable
d. To avoid imbalance in the outcome variable
Which of the following is TRUE about PCA? a. The first PC explains the least variability in the data. b. The first PC is a non-linear combination of original features. c. If the data has 5 original features, then we have 2 PCs. d. You can reduce redundancy in the data by using PCA.
d. You can reduce redundancy in the data by using PCA.
Dimensionality reduction means _______________ . a. reducing the number of records in the data b. reducing the number of missing values in the data c. reducing the number of outliers in the data d. reducing the number of features in the data
d. reducing the number of features in the data
Suppose you have 5 features in the data and you are transforming your data using PCA. Which PC explains the most variance in the data? a. PC5 b. PC4 c. PC3 d. PC2 e. PC1
e. PC1
How many cells will a confusion matrix have if there are m classes? 4 (m-1)^2 m^2
m^2
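A sketch with made-up labels: for m classes, scikit-learn's `confusion_matrix` is m × m, giving m² cells; diagonal cells count correct classifications and off-diagonal cells count misclassifications.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical 3-class labels
actual    = ["a", "a", "b", "b", "c", "c"]
predicted = ["a", "b", "b", "b", "c", "a"]

cm = confusion_matrix(actual, predicted)
print(cm.shape)  # (3, 3) -> 3**2 = 9 cells for m = 3 classes
print(cm)        # diagonal = correct, off-diagonal = misclassified
```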
Principal component analysis is a/an __________ method for dimensionality reduction. supervised unsupervised
unsupervised