Quiz 3 (Exam 2)
A linear regression model that is developed on the original data can be used to compare the effect of predictors on the predicted target variable.
False
Consider two models A and B. If the prediction accuracy of Model A is higher than that of Model B for the training dataset, we can safely say that Model A is better than Model B.
False
In evaluating a predictive model with a numerical target, the mean absolute error (MAE) can be negative or positive, but the mean error (ME) is always positive.
False
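For reference, a minimal NumPy sketch (with made-up residuals, not the quiz's data) showing that the mean error can be negative because signed errors cancel, while MAE and RMSE are always non-negative and RMSE keeps the target's unit:

```python
# Minimal sketch (NumPy only). The residuals are illustration values.
import numpy as np

y_true = np.array([10.0, 12.0, 9.0, 15.0])
y_pred = np.array([11.0, 10.5, 9.5, 16.0])

errors = y_true - y_pred                 # signed residuals
me = errors.mean()                       # mean error: sign can go either way
mae = np.abs(errors).mean()              # mean absolute error: never negative
rmse = np.sqrt((errors ** 2).mean())     # RMSE: never negative, same unit as y

print(f"ME={me:.2f}  MAE={mae:.2f}  RMSE={rmse:.2f}")  # ME=-0.25  MAE=1.00  RMSE=1.06
```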
In the k-means clustering technique, the desired number of clusters (k) is determined in the middle of the algorithm by calculating the model error.
False
What is the error rate of the following confusion matrix? (rounded to 2 decimal places)
0.41, computed as (b+c) / (a+b+c+d)
What is the fall-out score of the following confusion matrix given that "1" is positive? (rounded to 2 decimal places)
0.47, computed as c / (c+d)
What is the sensitivity score of the following confusion matrix given that "1" is positive? (rounded to 2 decimal places)
0.71, computed as a / (a+b)
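The three confusion-matrix questions above use the same cell labels; the sketch below recomputes each metric for a placeholder matrix with a = TP, b = FN, c = FP, d = TN (taking "1" as the positive class). The counts are illustrative, not the matrix from the quiz.

```python
# Minimal sketch of error rate, fall-out, and sensitivity with placeholder counts.
a, b, c, d = 40, 10, 15, 35              # a = TP, b = FN, c = FP, d = TN

error_rate = (b + c) / (a + b + c + d)   # misclassified / all records
fall_out = c / (c + d)                   # false positive rate: FP / (FP + TN)
sensitivity = a / (a + b)                # recall: TP / (TP + FN)

print(round(error_rate, 2), round(fall_out, 2), round(sensitivity, 2))  # 0.25 0.3 0.8
```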
What is the Euclidean distance between the following two records WITHOUT normalization? Round your answer to 1 decimal place.
11.5
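The original records are not reproduced here; the sketch below uses made-up two-column records to show how min-max normalization changes the Euclidean distance.

```python
# Minimal sketch of Euclidean distance with and without min-max normalization.
# The records and the column ranges are hypothetical.
import numpy as np

rec1 = np.array([25.0, 60000.0])   # hypothetical [age, income]
rec2 = np.array([30.0, 58000.0])

# Without normalization the large-scale column (income) dominates the distance.
raw_dist = np.sqrt(((rec1 - rec2) ** 2).sum())

# With min-max normalization every column is rescaled to [0, 1] first.
col_min = np.array([20.0, 40000.0])
col_max = np.array([60.0, 90000.0])
n1 = (rec1 - col_min) / (col_max - col_min)
n2 = (rec2 - col_min) / (col_max - col_min)
norm_dist = np.sqrt(((n1 - n2) ** 2).sum())

print(round(raw_dist, 1), round(norm_dist, 2))   # 2000.0 0.13
```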
The following chart shows the within-cluster sum of square errors versus the number of clusters in a k-means clustering model. Based on the Elbow method, what value of k is optimum for clustering?
4
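The WSS chart itself is not reproduced; a minimal scikit-learn sketch (on synthetic data) of how such an elbow plot is produced:

```python
# Minimal sketch of the elbow method: fit k-means for a range of k and plot
# the within-cluster sum of squares (inertia_).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 4, 8, 12)])

ks = range(1, 9)
wss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(list(ks), wss, marker="o")
plt.xlabel("k (number of clusters)")
plt.ylabel("Within-cluster sum of squares (WSS)")
plt.title("Choose k at the elbow, where the WSS curve flattens out")
plt.show()
```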
We have trained two classification models and their ROC curves are shown below. Given that the Area Under the Curve (AUC) is our performance metric, which model is performing better?
A
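A minimal sketch of comparing models by AUC with scikit-learn, using made-up labels and scores rather than the models from the quiz; the model with the larger area under the ROC curve is preferred.

```python
# Minimal sketch: compute AUC for two score vectors over the same labels.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 1, 0]
scores_a = [0.1, 0.3, 0.8, 0.7, 0.2, 0.9, 0.6, 0.4]   # model A's predicted probabilities
scores_b = [0.4, 0.6, 0.5, 0.7, 0.3, 0.6, 0.5, 0.5]   # model B's predicted probabilities

print(f"AUC A = {roc_auc_score(y_true, scores_a):.2f}")   # 1.00
print(f"AUC B = {roc_auc_score(y_true, scores_b):.2f}")   # 0.78
```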
Which of the following models (the dark blue line) shows a case of underfitting?
A
In the following confusion matrix, which cell is the FALSE POSITIVE?
C
The following linear model is developed on the normalized data to predict used car prices. Which of the predictors has the LARGEST effect on the predicted price?
CC
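A minimal sketch of why this comparison is valid only on normalized (standardized) predictors: once all predictors share one scale, the largest absolute coefficient marks the largest effect. The data and column names below are synthetic stand-ins for the used-car example, with Price deliberately constructed so that CC dominates.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 200
cars = pd.DataFrame({
    "KM": rng.uniform(10_000, 200_000, n),
    "Age": rng.uniform(1, 15, n),
    "CC": rng.uniform(1_000, 2_500, n),
    "HP": rng.uniform(60, 200, n),
})
cars["Price"] = (
    20_000
    - 0.02 * cars["KM"]
    - 300 * cars["Age"]
    + 8 * cars["CC"]
    + 10 * cars["HP"]
    + rng.normal(0, 500, n)
)

predictors = ["KM", "Age", "CC", "HP"]
X = StandardScaler().fit_transform(cars[predictors])   # one common scale
coefs = LinearRegression().fit(X, cars["Price"]).coef_
print(pd.Series(coefs, index=predictors).abs().sort_values(ascending=False))
```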
Which statement is INCORRECT about clustering?
Clustering is useful for predicting association rules
Which of the following variable search methods for the linear regression model examines all possible combinations of variables?
Exhaustive search
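A rough sketch of what exhaustive (best-subset) search does, assuming a pandas DataFrame X of candidate predictors and a target vector y; adjusted R2 is used here as the selection criterion, and the cost grows as 2^p - 1 subsets.

```python
# Rough sketch of exhaustive search: fit a linear model on every subset of
# predictors and keep the best one by adjusted R2.
from itertools import combinations
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def exhaustive_search(X, y):
    best = None
    n = len(y)
    for k in range(1, X.shape[1] + 1):
        for subset in combinations(X.columns, k):
            cols = list(subset)
            model = LinearRegression().fit(X[cols], y)
            r2 = r2_score(y, model.predict(X[cols]))
            adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
            if best is None or adj_r2 > best[0]:
                best = (adj_r2, cols)
    return best   # (best adjusted R2, predictor subset that achieves it)

# usage (hypothetical): best_adj_r2, best_cols = exhaustive_search(X_train, y_train)
```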
We run two k-means clustering models on the same data with k=3 and k=5. The model with k=3 is necessarily better than the other one because a smaller value of k is always better for clustering.
False
Which statement is INCORRECT about choosing the number of clusters in the k-means clustering method?
Maximizing the within-cluster sums of squared errors (WSS) is the goal when selecting k
The following figure shows residual plots of two linear regression models A and B. Which of the following statements is CORRECT?
Model B is violating the homoscedasticity assumption
We have developed a linear regression model and the residual plots are shown in the following figure. What statement is CORRECT about the model?
The model is violating the linearity assumption
Which of the following is NOT a strategy to prevent model over-fitting?
Set a limit on the value of the R2 metric
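For contrast, a common legitimate strategy is to hold out a validation set and compare training and validation error; a minimal sketch on synthetic data:

```python
# Minimal sketch of an anti-over-fitting check: train/validation split and
# comparison of the two RMSE values.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.0, 0.5]) + rng.normal(scale=1.0, size=300)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)

rmse_tr = mean_squared_error(y_tr, model.predict(X_tr)) ** 0.5
rmse_val = mean_squared_error(y_val, model.predict(X_val)) ** 0.5
# A validation RMSE far above the training RMSE is a warning sign of over-fitting.
print(f"train RMSE = {rmse_tr:.2f}   validation RMSE = {rmse_val:.2f}")
```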
You are building a multiple linear regression model to predict median house price (MEDV) in Boston using a data set with 12 predictors, as shown in the following correlation matrix. Based on the matrix, between which variables would you expect a violation of the multicollinearity assumption?
TAX & RAD (Hint: multicollinearity means a strong linear relationship between two predictors (independent variables)).
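A minimal sketch of how such predictor pairs can be flagged from pairwise correlations; the data below is synthetic, with TAX deliberately built to track RAD, loosely mimicking the Boston housing example.

```python
# Minimal sketch: rank predictor pairs by absolute correlation; |r| near 1
# between two predictors flags multicollinearity.
from itertools import combinations
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 500
rad = rng.integers(1, 25, n).astype(float)
df = pd.DataFrame({
    "RAD": rad,
    "TAX": 300 + 15 * rad + rng.normal(0, 20, n),   # strongly tied to RAD
    "RM": rng.normal(6, 0.7, n),
    "AGE": rng.uniform(0, 100, n),
})

pairs = sorted(
    ((abs(df[a].corr(df[b])), a, b) for a, b in combinations(df.columns, 2)),
    reverse=True,
)
for r, a, b in pairs[:3]:
    print(f"{a} & {b}: |r| = {r:.2f}")
```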
Which statement is INCORRECT about the k-means clustering algorithm?
The algorithm starts with initial centroids that are determined by a distance function
In evaluating a predictive model with a numerical target, the root mean squared error (RMSE) has the same unit as the predicted variable.
True
In the standardized linear regression model, normalized predictors don't have the same unit and scale as the original predictors.
True
When a model is over-fitted, the regression coefficients represent noise in the data rather than the genuine relationships in the population.
True