KNN
Which of the following values of k in k-NN would minimize the leave-one-out cross-validation error? A) 3 B) 5 C) Both have the same error D) None of these
B) 5. 5-NN will have the lowest leave-one-out cross-validation error.
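The question's data are not reproduced here, but the check itself is mechanical. Below is a minimal sketch of comparing k = 3 and k = 5 by leave-one-out cross-validation, assuming scikit-learn; load_iris is a stand-in dataset, not the question's data.

```python
# Compare LOOCV error for k = 3 and k = 5 (illustrative dataset).
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

for k in (3, 5):
    knn = KNeighborsClassifier(n_neighbors=k)
    # LeaveOneOut() holds out one sample per fold; the mean score is the LOOCV accuracy.
    scores = cross_val_score(knn, X, y, cv=LeaveOneOut())
    print(f"k={k}: LOOCV error = {1 - scores.mean():.3f}")
```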
Suppose you want to predict the class of the new data point x=1 and y=1 using Euclidean distance in 3-NN. To which class does this data point belong? A) + Class B) - Class C) Can't say D) None of these
A) + Class. All three nearest points are of the + class, so this point will be classified as + class.
Which of the following will be the Euclidean distance between the two data points A(1,3) and B(2,3)? A) 1 B) 2 C) 4 D) 8
A) 1. Euclidean distance = sqrt((1-2)^2 + (3-3)^2) = sqrt(1 + 0) = 1
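The same arithmetic in code, for anyone who wants to verify it:

```python
# Euclidean distance between A(1, 3) and B(2, 3).
import math

A, B = (1, 3), (2, 3)
dist = math.sqrt((A[0] - B[0]) ** 2 + (A[1] - B[1]) ** 2)
print(dist)  # 1.0
```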
Imagine you are solving a classification problem with a highly imbalanced class. The majority class is observed 99% of the time in the training data. Your model has 99% accuracy after taking the predictions on the test set. Which of the following is true in such a case? 1. The accuracy metric is not a good idea for imbalanced class problems. 2. The accuracy metric is a good idea for imbalanced class problems. 3. Precision and recall metrics are good for imbalanced class problems. 4. Precision and recall metrics aren't good for imbalanced class problems. A) 1 and 3 B) 1 and 4 C) 2 and 3 D) 2 and 4
A) 1 and 3 1. The accuracy metric is not a good idea for imbalanced class problems. 3. Precision and recall metrics are good for imbalanced class problems.
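A minimal sketch of why this is so, assuming scikit-learn: a degenerate model that always predicts the majority class reaches 99% accuracy, yet its precision and recall on the minority class are zero.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [0] * 99 + [1]   # 99% majority class, 1% minority class
y_pred = [0] * 100        # degenerate model: always predict the majority

print(accuracy_score(y_true, y_pred))                    # 0.99
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0
print(recall_score(y_true, y_pred))                      # 0.0
```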
When you find noise in the data, which of the following options would you consider in k-NN? A) I will increase the value of k B) I will decrease the value of k C) Noise does not depend on the value of k D) None of these
A) I will increase the value of k. To be more confident in the classifications you make, you can increase the value of k; a larger neighborhood averages out individual noisy points.
Which of the following is true about Manhattan distance? A) It can be used for continuous variables B) It can be used for categorical variables C) It can be used for categorical as well as continuous D) None of these
A) It can be used for continuous variables
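Manhattan (L1) distance sums the absolute differences of numeric coordinates, which is why it presumes continuous variables. A minimal sketch:

```python
# Manhattan (L1) distance between two continuous points.
def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

print(manhattan((1, 3), (2, 3)))  # 1
```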
A company has built a k-NN classifier that gets 100% accuracy on training data. When they deployed this model on the client side, it was found that the model is not at all accurate. Which of the following might have gone wrong? Note: The model was successfully deployed, and no technical issues were found on the client side except the model's performance. A) It is probably an overfitted model B) It is probably an underfitted model C) Can't say D) None of these
A) It is probably an overfitted model. An overfitted model seems to perform well on training data, but it is not generalized enough to give the same results on new data.
Which of the following machine learning algorithms can be used for imputing missing values of both categorical and continuous variables? A) K-NN B) Linear Regression C) Logistic Regression
A) K-NN. k-NN can impute both categorical and continuous variables, since a missing entry can be filled in from the values of the nearest neighbors.
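A minimal sketch of k-NN imputation, assuming scikit-learn's KNNImputer. Note that KNNImputer operates on numeric arrays, so categorical variables would first need to be encoded.

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0],
              [3.0, np.nan],
              [5.0, 6.0]])

# The NaN is replaced by the mean of that column over the 2 nearest rows.
imputer = KNNImputer(n_neighbors=2)
print(imputer.fit_transform(X))
```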
Suppose we have a dataset that can be trained with 100% accuracy with the help of a decision tree of depth 6. Now consider the points below and choose the option based on these points. Note: All other hyperparameters are the same, and other factors are not affected. 1. Depth 4 will have high bias and low variance 2. Depth 4 will have low bias and low variance A) Only 1 B) Only 2 C) Both 1 and 2 D) None of the above
A) Only 1. A depth-4 tree is a simpler model than the depth-6 tree that fits the data perfectly, so it will have high bias and low variance.
Suppose you want to develop a machine learning algorithm that predicts the number of views on the articles in a blog. Your data analysis is based on features like author name, number of articles written by the same author, etc. Which of the following evaluation metrics would you choose in that case? 1. Mean Square Error 2. Accuracy 3. F1 Score A) Only 1 B) Only 2 C) Only 3 D) 1 and 3 E) 2 and 3 F) 1 and 2
A) Only 1. Predicting the number of views is a regression problem, so Mean Square Error is the appropriate metric; accuracy and F1 score are classification metrics.
Adding a non-important feature to a linear regression model may result in: 1. Increase in R-square 2. Decrease in R-square A) Only 1 is correct B) Only 2 is correct C) Either 1 or 2 D) None of these
A) Only 1 is correct. Adding a feature can never decrease the R-square on the training data; even an unimportant feature can only leave it unchanged or increase it slightly.
True or False: It is possible to construct a 2-NN classifier by using the 1-NN classifier. A) TRUE B) FALSE
A) TRUE You can implement a 2-NN classifier by ensembling 1-NN classifiers
In k-NN, what will happen when you increase/decrease the value of k? A) The boundary becomes smoother with increasing value of K B) The boundary becomes smoother with decreasing value of K C) Smoothness of the boundary doesn't depend on the value of K D) None of these
A) The boundary becomes smoother with increasing value of K. The decision boundary becomes smoother as K increases, because each prediction averages over a larger neighborhood.
Which of the following will be true about k in k-NN in terms of bias? A) When you increase k, the bias will increase B) When you decrease k, the bias will increase C) Can't say D) None of these
A) When you increase k, the bias will increase. A large k means a simpler model, and a simpler model is generally considered to have high bias.
Which of the following statements are true about the k-NN algorithm? 1. k-NN performs much better if all of the data have the same scale 2. k-NN works well with a small number of input variables (p), but struggles when the number of inputs is very large 3. k-NN makes no assumptions about the functional form of the problem being solved
All of the above. All three statements are true.
Which of the following option is true about k-NN algorithm? A) It can be used for classification B) It can be used for regression C) It can be used in both classification and regression
C) It can be used in both classification and regression
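A minimal sketch of both uses, assuming scikit-learn; the toy data are made up for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = [[0], [1], [2], [3]]

clf = KNeighborsClassifier(n_neighbors=3).fit(X, [0, 0, 1, 1])
reg = KNeighborsRegressor(n_neighbors=3).fit(X, [0.0, 0.5, 1.0, 1.5])

print(clf.predict([[1.5]]))  # majority vote of the 3 nearest labels
print(reg.predict([[1.5]]))  # average of the 3 nearest targets
```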
Which of the following is a good test dataset characteristic? A. Large enough to yield meaningful results B. Is representative of the dataset as a whole C. Both A and B D. None of the above
C. Both A and B
What is the purpose of performing cross-validation? A. To assess the predictive performance of the models B. To judge how the trained model performs outside the sample on test data C. Both A and B
C. Both A and B A. To assess the predictive performance of the models B. To judge how the trained model performs outside the sample on test data
Which of the following is a disadvantage of decision trees? A. Factor analysis B. Decision trees are robust to outliers C. Decision trees are prone to be overfit D. None of the above
C. Decision trees are prone to be overfit
Which of the following statements is true for k-NN classifiers? A) The classification accuracy is better with larger values of k B) The decision boundary is smoother with smaller values of k C) The decision boundary is linear D) k-NN does not require an explicit training step
D) k-NN does not require an explicit training step. k-NN is a lazy learner: it simply stores the training data and defers all distance computation to prediction time.
How do you handle missing or corrupted data in a dataset? A. Drop missing rows or columns B. Replace missing values with mean/median/mode C. Assign a unique category to missing values D. All of the above
D. All of the above
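A minimal pandas sketch of the three strategies in option D; the column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40], "city": ["NY", None, "LA"]})

dropped = df.dropna()                             # A: drop rows with missing values
df["age"] = df["age"].fillna(df["age"].median())  # B: replace with the median
df["city"] = df["city"].fillna("Unknown")         # C: assign a unique category
print(dropped, df, sep="\n")
```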
The most widely used metrics and tools to assess a classification model are: A. Confusion matrix B. Cost-sensitive accuracy C. Area under the ROC curve D. All of the above
D. All of the above
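A minimal sketch of two of these tools, assuming scikit-learn; cost-sensitive accuracy has no single standard helper, so it is omitted here. The labels and scores are made up for illustration.

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = [0, 0, 1, 1, 1, 0]
y_score = [0.1, 0.4, 0.8, 0.7, 0.3, 0.2]          # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

print(confusion_matrix(y_true, y_pred))  # rows: actual class, columns: predicted class
print(roc_auc_score(y_true, y_score))    # area under the ROC curve
```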
Which of the following is a widely used and effective machine learning algorithm based on the idea of bagging? A. Decision Tree B. Regression C. Classification D. Random Forest
D. Random Forest
For which of the following hyperparameters is a higher value better for the decision tree algorithm? 1. Number of samples used for split 2. Depth of tree 3. Samples for leaf A) 1 and 2 B) 2 and 3 C) 1 and 3 D) 1, 2 and 3 E) Can't say
E) Can't say. For all three hyperparameters, increasing the value does not necessarily improve performance. For example, with a very high value for the depth of the tree, the resulting tree may overfit the data and fail to generalize; with a very low value, it may underfit. So we can't say for sure that "higher is better."
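A minimal sketch of this trade-off for one of the hyperparameters (tree depth), assuming scikit-learn; the dataset is a stand-in and the depths swept are arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (2, 6, None):  # None lets the tree grow until the leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, tree.score(X_tr, y_tr), tree.score(X_te, y_te))
```

Training accuracy keeps rising with depth, while test accuracy typically plateaus or drops once the tree starts to overfit.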
The k-NN algorithm does more computation at test time than at train time.
True. The training phase consists only of storing the feature vectors and class labels of the training samples; all distance computation happens when a new point is classified.
Imagine you are working on a project which is a binary classification problem. You trained a model on the training dataset and obtained a confusion matrix on the validation dataset. Based on that confusion matrix, choose which option(s) below will give you the correct predictions. 1. Accuracy is ~0.91 2. Misclassification rate is ~0.91 3. True Negative rate is ~0.95 4. True Positive rate is ~0.95 A) 1 and 3 B) 2 and 4 C) 1 and 4 D) 2 and 3
C) 1 and 4 1. Accuracy is ~0.91 4. True positive rate is ~0.95
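The confusion matrix itself is not reproduced above, so the counts below are hypothetical stand-ins chosen only to match the stated answer; the point is the formulas used to check each option.

```python
# Hypothetical confusion-matrix counts (the original matrix is not shown).
TP, FN, FP, TN = 95, 5, 45, 455

total = TP + FN + FP + TN
accuracy = (TP + TN) / total          # option 1: ~0.91
misclassification = 1 - accuracy      # option 2: ~0.08, not 0.91
tnr = TN / (TN + FP)                  # option 3: ~0.91, not 0.95
tpr = TP / (TP + FN)                  # option 4: 0.95
print(accuracy, misclassification, tnr, tpr)
```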
You are given the following two statements. Which of these options is/are true in the case of k-NN? 1. In case of a very large value of k, we may include points from other classes in the neighborhood. 2. In case of a too-small value of k, the algorithm is very sensitive to noise. A) 1 B) 2 C) 1 and 2 D) None of these
C) 1 and 2. Both statements are true and are self-explanatory.
k-NN is very likely to overfit due to the curse of dimensionality. Which of the following options would you consider to handle such a problem? 1. Dimensionality Reduction 2. Feature selection A) 1 B) 2 C) 1 and 2 D) None of these
C) 1 and 2. In such a case, you can use either a dimensionality reduction algorithm or a feature selection algorithm.
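A minimal sketch of both remedies feeding a k-NN classifier, assuming scikit-learn; the dataset and the number of retained features are arbitrary choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_breast_cancer(return_X_y=True)

# Remedy 1: dimensionality reduction; remedy 2: feature selection.
reduced = make_pipeline(PCA(n_components=10), KNeighborsClassifier())
selected = make_pipeline(SelectKBest(f_classif, k=10), KNeighborsClassifier())

print(reduced.fit(X, y).score(X, y))
print(selected.fit(X, y).score(X, y))
```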
In the image below, which would be the best value for k, assuming that the algorithm you are using is k-Nearest Neighbors? A) 3 B) 10 C) 20 D) 50
B) 10. The best value of k is the one with the lowest validation error, which in the referenced plot is 10.
Which of the following will be true about k in k-NN in terms of variance? A) When you increase k, the variance will increase B) When you decrease k, the variance will increase C) Can't say D) None of these
B) When you decrease k, the variance will increase. A large k gives a simpler model, which is considered a low-variance model; a small k fits the training data more closely and therefore has higher variance.
The following two statements are given for the k-NN algorithm. Which of the statement(s) is/are true? 1. We can choose the optimal value of k with the help of cross-validation 2. Euclidean distance treats each feature as equally important A) 1 B) 2 C) 1 and 2 D) None of these
C) 1 and 2. Cross-validation is the standard way to pick k (as in the leave-one-out sketch earlier), and Euclidean distance weights every coordinate equally.
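Statement 2 is also the usual argument for scaling features before k-NN: because every coordinate carries equal weight, a feature on a large scale dominates the distance. A minimal sketch, assuming scikit-learn; the toy data are made up for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Feature 2 is on a much larger scale than feature 1.
X = [[1.0, 1000.0], [2.0, 2000.0], [1.5, 9000.0], [2.5, 8000.0]]
y = [0, 0, 1, 1]

knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1))
knn.fit(X, y)
print(knn.predict([[1.2, 8500.0]]))  # distances are computed on standardized features
```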
Two statements are given below. Which of the following is true about these statements? 1. An advantage of k-NN's memory-based approach is that the classifier immediately adapts as we collect new training data. 2. The computational complexity for classifying new samples grows linearly with the number of samples in the training dataset in the worst-case scenario. A) 1 B) 2 C) 1 and 2 D) None of these
C) 1 and 2. Both are true: k-NN simply stores the training set, so new data are used immediately, and a brute-force search must compare a query point against every stored sample.
