Sid Final Quiz Questions
A die is rolled. Let A be the event that it shows a 6 and B be the event that it shows a 1. What is P(A|B)?
0
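A quick sanity check of this answer: A (rolling a 6) and B (rolling a 1) are mutually exclusive, so P(A and B) = 0 and therefore P(A|B) = 0. A minimal sketch in Python:

```python
from fractions import Fraction

# Six equally likely die outcomes.
outcomes = range(1, 7)

# P(B): probability the roll shows a 1.
p_b = Fraction(sum(1 for o in outcomes if o == 1), 6)

# P(A and B): no outcome is both a 6 and a 1, so this is 0.
p_a_and_b = Fraction(sum(1 for o in outcomes if o == 6 and o == 1), 6)

# Conditional probability: P(A|B) = P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # 0
```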
The following table gives the number of households that have 0, 1, 2, and 3 or more cars.

Cars:        0    1    2    3+
Households:  100  500  300  100

What is the probability that a randomly selected household has at least 2 cars?
0.4
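The computation behind this answer: 300 + 100 = 400 of the 1,000 households have at least 2 cars, giving 400/1000 = 0.4. As a sketch:

```python
# Households by number of cars, taken from the table above.
households = {"0": 100, "1": 500, "2": 300, "3+": 100}

total = sum(households.values())  # 1000 households in all

# "At least 2 cars" covers the "2" and "3+" rows.
p_at_least_2 = (households["2"] + households["3+"]) / total
print(p_at_least_2)  # 0.4
```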
Based on the above training data, what will be the predicted class (Y) of X = {Size=Small, Color=Blue} based on the Naïve Bayes algorithm?
1
When using 3-NN for numeric prediction, if the three nearest neighbors of an unlabeled data point are labeled 11, 23 and 23, then what will be the predicted value for the unlabeled data point?
19
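For numeric prediction, k-NN averages the values of the k nearest neighbors, so (11 + 23 + 23) / 3 = 19. A minimal sketch:

```python
# Labels of the three nearest neighbors from the question.
neighbor_labels = [11, 23, 23]

# k-NN numeric prediction: average the neighbors' values.
prediction = sum(neighbor_labels) / len(neighbor_labels)
print(prediction)  # 19.0
```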
Consider the following figure. In the figure, X1 and X2 are the two features, and the data points are represented by dots (-1 is the negative class and +1 is the positive class). You first split the data on feature X1 (say the splitting point is x11), shown in the figure by the vertical line. Every value less than x11 will be predicted as the positive class and every value greater than x11 as the negative class. How many more splits on X1 do you require to perfectly classify all the points (i.e., such that there is no error)?
2
In the dendrogram below, if you draw a horizontal line at y = 2 on the y-axis, what will be the number of clusters formed?
2
What should be the best choice of the number of clusters based on the following results?
3
If P(A) = 0.20, P(B|A) = 0.60 and P(B|Aᶜ) = 0.25, then what is P(A|B)? Hint: Aᶜ means "not A".
3/8
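Working this out with Bayes' rule: P(B) = P(B|A)P(A) + P(B|Aᶜ)P(Aᶜ) = 0.12 + 0.20 = 0.32, so P(A|B) = 0.12/0.32 = 3/8. A sketch using exact fractions:

```python
from fractions import Fraction

p_a = Fraction(20, 100)              # P(A)
p_b_given_a = Fraction(60, 100)      # P(B|A)
p_b_given_not_a = Fraction(25, 100)  # P(B|A^c)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|A^c)P(A^c).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: P(A|B) = P(B|A)P(A) / P(B).
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 3/8
```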
Suppose you build a classifier to predict which customers will respond to a promotional mail. The (black) cumulative response curve or gains chart (depicted above) was obtained by running the classifier on a testing sample of 10,000 customers, of which 1% respond. The horizontal axis denotes the percentage of solicitation mailings ordered by the likelihood of response (greatest to lowest). Based on the vertical axis, how many more customers will respond if you contact the top 20% of customers using the classifier, compared to randomly mailing 20% of the customers?
30
If we know the support of itemset {a, b} is 10, which of the following numbers are the possible supports of itemset {a, b, c}? [Multiple Answers Question]
9, 10
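The key property here is that support is anti-monotone: adding an item to an itemset can never increase its support, so support({a, b, c}) must lie between 0 and 10. A toy check on made-up transactions (hypothetical data, not from the quiz):

```python
# Hypothetical toy transactions to illustrate anti-monotonicity.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]

def support(itemset):
    """Count transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t)

# A superset can never have higher support than its subset.
print(support({"a", "b"}))       # 3
print(support({"a", "b", "c"}))  # 2
```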
Which of the following is/are true about PCA?
1. It searches for the directions in which the data have the largest variance.
2. The number of principal components <= the number of features.
3. All principal components are orthogonal to each other.
All of the Above
The ROC curve is drawn for a specific ________ .
Class
Trying to predict whether a stock price will go up or down tomorrow is an example of _____________ .
Classification
Which metric do we consider to check if two itemsets are not bought together just by chance?
Lift
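Lift compares how often two itemsets co-occur against what independence would predict: lift(X→Y) = P(X and Y) / (P(X) P(Y)), with a value of 1 meaning "together just by chance". A sketch on hypothetical toy transactions:

```python
# Hypothetical toy transactions to illustrate lift.
transactions = [{"a", "b"}, {"a"}, {"b"}, {"a", "b"}, {"c"}]
n = len(transactions)

def supp(itemset):
    """Fraction of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t) / n

# lift(a -> b) = P(a and b) / (P(a) * P(b)).
lift = supp({"a", "b"}) / (supp({"a"}) * supp({"b"}))
print(lift)  # > 1: a and b co-occur more often than chance alone predicts
```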
Which of the following metrics do we use to best fit the data in Logistic Regression?
Maximum Likelihood
Which of the following evaluation metrics cannot be applied in the case of logistic regression?
Mean-Squared-Error
Are the results of K-Means clustering stable, i.e., do they always lead to the same clusters?
No
A high R2 of a regression model does not necessarily imply great predictive power.
True
If events A and B are independent, then P(B|A) = P(B).
True
It is not necessary to have a target variable for applying dimensionality reduction algorithms such as PCA.
True
Lasso regression can be used for feature selection.
True
Normalization is not necessary for k-NN when there is only one feature.
True
The confusion matrix is used when the outcome of the classifier has two or more classes.
True
Unlike the filter approach, the wrapper approach selects features based on the performance on the validation data.
True
We can choose the optimal value of k with the help of cross-validation.
True
We require both a distance measure and a linkage measure to perform hierarchical clustering.
True
Increasing the minimum number of observations for a split to be attempted at a decision node ________ the complexity of a decision tree.
decreases
The confusion matrices of the two classifiers are given as above. For the rare class, Classifier ___ has a higher precision, and Classifier ___ has a higher recall.
I, II
In the following figures, the red region represents the target, and the blue points are the predictions. Then, relatively speaking, the figures present a scenario of:
High variance, Low Bias; Low variance, High Bias
Association Rule X->Y implies that buying itemset X causes one to buy more of itemset Y.
False
Classification accuracy alone can be misleading if classes are well balanced in the data.
False
Entropy at the root node is the lowest among all the decision nodes.
False
Let X, Y and Z be three random variables. If X and Y are independent given Z, i.e., X⊥Y∣Z, then p(x,y|z)=p(x)*p(y) always holds true.
False
Normalization of features is required before training a Logistic Regression.
False
One should not use the Naïve Bayes algorithm for classification if they believe that, given the class, the features are not independent (i.e., the conditional independence assumption does not hold).
False
Roughly equal eigenvalues indicate that the PCA is helpful in dimensionality reduction.
False
To avoid overfitting, one should use all the labeled data points for learning the predictive relationship between the features and the target variable.
False
When facing the imbalanced data problem, one oversamples the rare class in both the training and testing data.
False
The k-NN algorithm takes significantly more time as k increases.
False
While training regression trees, we minimize ____________ to decide splits.
Sum of squared deviations
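To illustrate: at each candidate split, a regression tree computes the sum of squared deviations from the mean within each child and keeps the split that minimizes their total. A minimal sketch on hypothetical 1-D toy data:

```python
# Hypothetical toy data: one feature (xs, sorted) and a numeric target (ys).
xs = [1, 2, 3, 10, 11, 12]
ys = [5, 6, 5, 20, 21, 19]

def ssd(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

# Try every split position; keep the one minimizing total child SSD.
best_cost, best_index = min(
    (ssd(ys[:i]) + ssd(ys[i:]), i) for i in range(1, len(xs))
)
print(xs[best_index])  # the split cleanly separating the two value groups
```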
