Business Analytics - All Quizzes
Entropy at the root node of a decision tree is the lowest among all the decision nodes.
False
The choice of the number of clusters to partition a set of data into:
depends on why you are clustering the data
All the points on a ROC curve represent FPR-TPR for the same ________ .
Classifier
Trying to predict whether a stock price will go up or down tomorrow is an example of _____________ .
Classification
Association Rule X->Y implies that buying itemset X causes one to buy more of itemset Y.
False
If you increase the number of hidden layers in a neural network, the classification error of test data always decreases.
False
In boosting, individual learners (models) are independent of each other.
False
Let X, Y and Z be three random variables. If X and Y are independent given Z, i.e., X⊥⊥Y|Z, then p(x,y|z)=p(x)*p(y).
False
Normalization of features is required before training a Logistic Regression.
False
One should not use the Naïve Bayes algorithm for classification if they believe that, given the class, the features are not independent (i.e., the conditional independence assumption does not hold).
False
Roughly equal eigenvalues indicate that PCA is helpful in dimensionality reduction.
False
To avoid overfitting, one should use all the labeled data points for learning the predictive relationship between the features and the target variable.
False
In the following figures, the red region represents the target, and the blue points are the predictions. The figures present a scenario of:
High variance, low bias; low variance, high bias
Which metric do we consider to check if two item sets are not bought together just by chance?
Lift
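For reference, a minimal sketch of how lift is computed when supports are expressed as fractions of transactions (the helper and example numbers are illustrative, not from any specific library); a lift close to 1 suggests the itemsets co-occur about as often as chance would predict:
    # Lift for rule X -> Y: support(X and Y) / (support(X) * support(Y))
    def lift(support_xy, support_x, support_y):
        return support_xy / (support_x * support_y)

    # Example: X and Y each appear in 40% of baskets and together in 16%
    print(lift(0.16, 0.4, 0.4))  # 1.0, i.e., no association beyond chance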
Which of the following metrics do we use to best fit the data in Logistic Regression?
Likelihood
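A minimal sketch of the (log-)likelihood that logistic regression maximizes, assuming labels y in {0, 1} and predicted probabilities p (the example values are illustrative):
    import math

    # Log-likelihood of the observed labels under the predicted probabilities
    def log_likelihood(y_true, p_pred):
        return sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                   for y, p in zip(y_true, p_pred))

    print(log_likelihood([1, 0, 1], [0.9, 0.2, 0.8]))  # higher (less negative) means a better fit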
Cross-validation is more useful when you have _________ labeled data
Limited
A high R² implies
Low training error
Which of the following evaluation metrics cannot be applied in the case of logistic regression?
Mean-Squared-Error
What happens if we use a learning rate that is too large while training a neural network model?
Network may not converge
Are the results of K-Means clustering stable, i.e., do they always lead to the same clusters?
No
If you remove any one of the non-circled points from the data, will the SVM decision boundary change much?
No
Random forests have as much interpretability as decision trees.
No
A fair coin is tossed. Let A be the event that it's heads and B be the event that it's tails. What is P(A|B)?
0
The following table gives the number of households with 0, 1, 2, and 3 or more cars (0 cars: 100 households; 1 car: 200; 2 cars: 300; 3+ cars: 400). What is the probability that a randomly selected household has at least 2 cars?
0.7
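A quick check of the arithmetic, using the counts from the question:
    households = {0: 100, 1: 200, 2: 300, "3+": 400}
    p_at_least_2 = (households[2] + households["3+"]) / sum(households.values())
    print(p_at_least_2)  # (300 + 400) / 1000 = 0.7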
Consider the following training data with features Size and Color and class Y: (Small, Red, 0); (Big, Red, 0); (Small, Blue, 1); (Big, Red, 1). What will be the predicted class (Y) of X = {Size=Small, Color=Blue} based on the Naïve Bayes algorithm?
1
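A minimal sketch of the Naïve Bayes computation for this question, using the four training rows given above:
    # Score(Y) = P(Y) * P(Size|Y) * P(Color|Y); the class with the higher score wins
    rows = [("Small", "Red", 0), ("Big", "Red", 0), ("Small", "Blue", 1), ("Big", "Red", 1)]

    def score(y, size, color):
        subset = [r for r in rows if r[2] == y]
        p_y = len(subset) / len(rows)
        p_size = sum(r[0] == size for r in subset) / len(subset)
        p_color = sum(r[1] == color for r in subset) / len(subset)
        return p_y * p_size * p_color

    print(score(0, "Small", "Blue"), score(1, "Small", "Blue"))  # 0.0 vs 0.125 -> predict Y = 1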
The following two statements are given for the k-NN algorithm; which statement(s) is/are true? 1. We can choose the optimal value of k with the help of cross-validation. 2. Euclidean distance treats each feature as equally important.
1 and 2
When using 3-NN for numeric prediction, if the three nearest neighbors of an unlabeled data point are labeled 11, 23 and 23, then what will be the predicted value for the unlabeled data point?
19
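For numeric prediction, k-NN simply averages the neighbours' values; a quick check with the three labels from the question:
    neighbours = [11, 23, 23]
    print(sum(neighbours) / len(neighbours))  # (11 + 23 + 23) / 3 = 19.0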
Consider the following figure for answering this question. In the figure, X1 and X2 are the two features, and the data points are represented by dots (-1 is the negative class and +1 is the positive class). You first split the data based on feature X1 (say the splitting point is x11), shown in the figure as a vertical line: every value less than x11 will be predicted as the positive class and every value greater than x11 as the negative class. How many more splits on X1 do you require to perfectly classify all the points?
2
In the dendrogram below, if you draw a horizontal line at y = 2, what will be the number of clusters formed?
2
Which of the following options is true? 1. You need to initialize parameters in PCA. 2. You don't need to initialize parameters in PCA.
2
What should be the best choice for the number of clusters based on the following results?
3
A fair coin is tossed twice. What is the probability that the first one is heads OR the second one is tails?
3/4
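A quick enumeration of the four equally likely outcomes confirms the answer:
    from itertools import product

    outcomes = list(product("HT", repeat=2))                          # HH, HT, TH, TT
    favourable = [o for o in outcomes if o[0] == "H" or o[1] == "T"]  # first heads OR second tails
    print(len(favourable) / len(outcomes))                            # 3/4 = 0.75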
If P(A) = 0.20, P(B|A) = 0.60 and P(B|Aᶜ) = 0.25, then what is P(A|B)? Hint: Aᶜ means "not A".
3/8
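This is Bayes' rule with the total probability of B in the denominator; a quick check with the given numbers:
    # P(A|B) = P(B|A)P(A) / (P(B|A)P(A) + P(B|not A)P(not A))
    p_a, p_b_given_a, p_b_given_not_a = 0.20, 0.60, 0.25
    numerator = p_b_given_a * p_a
    p_a_given_b = numerator / (numerator + p_b_given_not_a * (1 - p_a))
    print(p_a_given_b)  # 0.12 / 0.32 = 0.375 = 3/8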
What are the steps for using a gradient descent algorithm? 1. Calculate the error between the actual value and the predicted value. 2. Reiterate until you find the best weights of the network. 3. Pass an input through the network and get values from the output layer. 4. Initialize random weight and bias. 5. Go to each neuron (node) that contributes to the error and change its respective values to reduce the error.
4, 3, 1, 5, 2
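A minimal sketch of that ordering on a toy one-parameter "network"; the data, learning rate, and epoch count are assumed purely for illustration:
    import random

    data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]   # toy (x, y) pairs where y = 2x
    w, b = random.random(), random.random()       # step 4: initialize random weight and bias
    lr = 0.1                                      # illustrative learning rate

    for _ in range(200):                          # step 2: reiterate until the weights are good
        for x, y in data:
            y_hat = w * x + b                     # step 3: pass the input through the network
            error = y_hat - y                     # step 1: error between predicted and actual value
            w -= lr * error * x                   # step 5: adjust each parameter that contributed
            b -= lr * error                       #         to the error so as to reduce it

    print(round(w, 2), round(b, 2))               # should approach w ≈ 2, b ≈ 0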
Consider the above figure depicting '+' and '-' classes in a 2-feature space. Which of the following values of k in k-NN would give a lower leave-one-out cross-validation (LOOCV) error? Hint: LOOCV is a special case of cross-validation where the number of folds equals the number of instances in the dataset, i.e., in each fold you hold out one instance for validation.
5
Suppose you build a classifier to predict which customers will respond to a promotional mail. The (black) cumulative response curve or gains chart (depicted above) was obtained by running the classifier on a test sample of 100,000 customers out of which 1% respond. The horizontal axis denotes the percentage of solicitation mailings. Based on the vertical axis, how many customers will respond if you contact 20% of the customers using the classifier?
500
If we know the support of itemset {a, b} is 10, which of the following numbers are the possible supports of itemset {a, b, c}? [Multiple Answers Question]
9, 10
Which of the following is/are true about PCA? 1. It searches for the directions in which the data have the largest variance. 2. The number of principal components is <= the number of features. 3. All principal components are orthogonal to each other.
All of the above
How can we assign weights to the outputs of different models in an ensemble? 1. Use an algorithm to return the optimal weights. 2. Choose the weights using cross-validation. 3. Give higher weights to more accurate models.
All of the above
Classification accuracy alone can be misleading if you have ________ number of observations in each class.
An unequal
Increasing the minimum number of observations for a split to be attempted at a decision node ________ the complexity/size of a decision tree.
Decreases or doesn't affect
Generally, an ensemble method works better if the individual (base) models have ____________.
Less correlation among predictions
Which of the following statement(s) is/are true for Gradient Descent (GD) and Stochastic Gradient Descent (SGD)? 1. In GD and SGD, you update a set of parameters in an iterative manner to minimize the error function. 2. In SGD, you have to run through all the samples in your training set for a single update of a parameter in each iteration.
Only 1
While training regression trees, we minimize ____________ to decide splits.
Sum of squared deviations
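A minimal sketch of the split criterion: for each candidate split, compute the sum of squared deviations from the mean within each child node and pick the split with the smallest total (the toy values below are illustrative):
    # Sum of squared deviations of a node's target values from their mean
    def ssd(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    left, right = [10, 12, 11], [30, 28, 31]      # target values falling in the two child nodes
    print(ssd(left) + ssd(right))                 # total SSD for this candidate split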
Increasing the minimum confidence threshold may decrease the number of rules found.
True
It is not necessary to have a target variable for applying dimensionality reduction algorithms such as PCA.
True
Lasso regression can be used for feature selection.
True
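A minimal sketch of Lasso-based feature selection, assuming scikit-learn is available; the alpha value and the synthetic dataset are purely illustrative:
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso

    # Synthetic data: 10 features, only 3 of which actually drive the target
    X, y = make_regression(n_samples=200, n_features=10, n_informative=3, random_state=0)

    model = Lasso(alpha=1.0).fit(X, y)
    selected = [i for i, c in enumerate(model.coef_) if c != 0]  # the L1 penalty zeroes out weak features
    print(selected)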
The confusion matrix is used when there are two or more classes as the output of the classifier.
True
We require both a distance measure and a linkage measure to perform hierarchical clustering.
True
The k-NN algorithm does more computation at test time than at training time.
True
While training an SVM, as the cost of misclassification increases, the margin is likely to ________ .
become narrower