Sid Final Quiz Questions
A die is rolled. Let A be the event that it shows a 6 and B be the event that it shows a 1. What is P(A|B)?
0
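A quick sanity check of this answer: A (rolling a 6) and B (rolling a 1) are mutually exclusive, so P(A and B) = 0 and therefore P(A|B) = 0. A minimal sketch in Python:

```python
from fractions import Fraction

# Six equally likely die outcomes.
outcomes = range(1, 7)

# P(B): probability the roll shows a 1.
p_b = Fraction(sum(1 for o in outcomes if o == 1), 6)

# P(A and B): no outcome is both a 6 and a 1, so this is 0.
p_a_and_b = Fraction(sum(1 for o in outcomes if o == 6 and o == 1), 6)

# Conditional probability: P(A|B) = P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # 0
```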
The following table gives the number of households that have 0, 1, 2, and 3 or more cars.

Cars:        0    1    2    3+
Households:  100  500  300  100

What is the probability that a randomly selected household has at least 2 cars?
0.4
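The computation behind this answer: 300 + 100 = 400 of the 1,000 households have at least 2 cars, giving 400/1000 = 0.4. As a sketch:

```python
# Households by number of cars, taken from the table above.
households = {"0": 100, "1": 500, "2": 300, "3+": 100}

total = sum(households.values())  # 1000 households in all

# "At least 2 cars" covers the "2" and "3+" rows.
p_at_least_2 = (households["2"] + households["3+"]) / total
print(p_at_least_2)  # 0.4
```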
Based on the above training data, what will be the predicted class (Y) of X = {Size=Small, Color=Blue} based on the Naïve Bayes algorithm?
1
When using 3-NN for numeric prediction, if the three nearest neighbors of an unlabeled data point are labeled 11, 23 and 23, then what will be the predicted value for the unlabeled data point?
19
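For numeric prediction, k-NN averages the values of the k nearest neighbors, so (11 + 23 + 23) / 3 = 19. A minimal sketch:

```python
# Labels of the three nearest neighbors from the question.
neighbor_labels = [11, 23, 23]

# k-NN numeric prediction: average the neighbors' values.
prediction = sum(neighbor_labels) / len(neighbor_labels)
print(prediction)  # 19.0
```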
Consider the following figure. In the figure, X1 and X2 are the two features, and the data points are represented by dots (-1 is the negative class and +1 is the positive class). You first split the data on feature X1 (say the splitting point is x11), shown in the figure by the vertical line. Every value less than x11 will be predicted as the positive class and every value greater than x11 as the negative class. How many more splits on X1 do you require to perfectly classify all the points (i.e., such that there is no error)?
2
In the dendrogram below, if you draw a horizontal line at y = 2 on the y-axis, what will be the number of clusters formed?
2
What should be the best choice of the number of clusters based on the following results?
3
If P(A) = 0.20, P(B|A) = 0.60 and P(B|Aᶜ) = 0.25, then what is P(A|B)? Hint: Aᶜ means "not A".
3/8
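Working this out with Bayes' rule: P(B) = P(B|A)P(A) + P(B|Aᶜ)P(Aᶜ) = 0.12 + 0.20 = 0.32, so P(A|B) = 0.12/0.32 = 3/8. A sketch using exact fractions:

```python
from fractions import Fraction

p_a = Fraction(20, 100)              # P(A)
p_b_given_a = Fraction(60, 100)      # P(B|A)
p_b_given_not_a = Fraction(25, 100)  # P(B|A^c)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|A^c)P(A^c).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: P(A|B) = P(B|A)P(A) / P(B).
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 3/8
```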
Suppose you build a classifier to predict which customers will respond to a promotional mail. The (black) cumulative response curve or gains chart (depicted above) was obtained by running the classifier on a testing sample of 10,000 customers, of which 1% respond. The horizontal axis denotes the percentage of solicitation mailings ordered by the likelihood of response (greatest to lowest). Based on the vertical axis, how many more customers will respond if you contact the top 20% of customers using the classifier, compared to randomly mailing 20% of the customers?
30
If we know the support of itemset {a, b} is 10, which of the following numbers are the possible supports of itemset {a, b, c}? [Multiple Answers Question]
9, 10
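The key property here is that support is anti-monotone: adding an item to an itemset can never increase its support, so support({a, b, c}) must lie between 0 and 10. A toy check on made-up transactions (hypothetical data, not from the quiz):

```python
# Hypothetical toy transactions to illustrate anti-monotonicity.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]

def support(itemset):
    """Count transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t)

# A superset can never have higher support than its subset.
print(support({"a", "b"}))       # 3
print(support({"a", "b", "c"}))  # 2
```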
Which of the following is/are true about PCA?
1. It searches for the directions in which the data have the largest variance.
2. The number of principal components <= the number of features.
3. All principal components are orthogonal to each other.
All of the Above
The ROC curve is drawn for a specific ________ .
Class
Trying to predict whether a stock price will go up or down tomorrow is an example of _____________ .
Classification
Which metric do we consider to check if two itemsets are not bought together just by chance?
Lift
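Lift compares how often two itemsets co-occur against what independence would predict: lift(X→Y) = P(X and Y) / (P(X) P(Y)), with a value of 1 meaning "together just by chance". A sketch on hypothetical toy transactions:

```python
# Hypothetical toy transactions to illustrate lift.
transactions = [{"a", "b"}, {"a"}, {"b"}, {"a", "b"}, {"c"}]
n = len(transactions)

def supp(itemset):
    """Fraction of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t) / n

# lift(a -> b) = P(a and b) / (P(a) * P(b)).
lift = supp({"a", "b"}) / (supp({"a"}) * supp({"b"}))
print(lift)  # > 1: a and b co-occur more often than chance alone predicts
```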
Which of the following metrics do we use to best fit the data in Logistic Regression?
Maximum Likelihood
Which of the following evaluation metrics cannot be applied in the case of logistic regression?
Mean-Squared-Error
Are the results of K-Means clustering stable, i.e., do they always lead to the same clusters?
No
A high R2 of a regression model does not necessarily imply great predictive power.
True
If events A and B are independent, then P(B|A) = P(B).
True
It is not necessary to have a target variable for applying dimensionality reduction algorithms such as PCA.
True
Lasso regression can be used for feature selection.
True
Normalization is not necessary for k-NN when there is only one feature.
True
The confusion matrix is used when the outcome of the classifier has two or more classes.
True
Unlike the filter approach, the wrapper approach selects features based on the performance on the validation data.
True
We can choose the optimal value of k with the help of cross-validation.
True
We require both a distance measure and a linkage measure to perform hierarchical clustering.
True
Increasing the minimum number of observations for a split to be attempted at a decision node ________ the complexity of a decision tree.
decreases
The confusion matrices of the two classifiers are given as above. For the rare class, Classifier ___ has a higher precision, and Classifier ___ has a higher recall.
I, II
In the following figures, the red region represents the target, and the blue points are the predictions. Then, relatively speaking, the figures present a scenario of:
High variance, Low Bias; Low variance, High Bias
Association Rule X->Y implies that buying itemset X causes one to buy more of itemset Y.
False
Classification accuracy alone can be misleading if classes are well balanced in the data.
False
Entropy at the root node is the lowest among all the decision nodes.
False
Let X, Y and Z be three random variables. If X and Y are independent given Z, i.e., X⊥Y∣Z, then p(x,y|z)=p(x)*p(y) always holds true.
False
Normalization of features is required before training a Logistic Regression.
False
One should not use the Naïve Bayes algorithm for classification if they believe that, given the class, the features are not independent (i.e., the conditional independence assumption does not hold).
False
Roughly equal eigenvalues indicate that the PCA is helpful in dimensionality reduction.
False
To avoid overfitting, one should use all the labeled data points for learning the predictive relationship between the features and the target variable.
False
When facing the imbalanced data problem, one oversamples the rare class in both the training and testing data.
False
The k-NN algorithm takes significantly more time as k increases.
False
While training regression trees, we minimize ____________ to decide splits.
Sum of squared deviations
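To illustrate: at each candidate split, a regression tree computes the sum of squared deviations from the mean within each child and keeps the split that minimizes their total. A minimal sketch on hypothetical 1-D toy data:

```python
# Hypothetical toy data: one feature (xs, sorted) and a numeric target (ys).
xs = [1, 2, 3, 10, 11, 12]
ys = [5, 6, 5, 20, 21, 19]

def ssd(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

# Try every split position; keep the one minimizing total child SSD.
best_cost, best_index = min(
    (ssd(ys[:i]) + ssd(ys[i:]), i) for i in range(1, len(xs))
)
print(xs[best_index])  # the split cleanly separating the two value groups
```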
