ML final CECS 456
Quiz 2 Q 7 Consider the following binary vectors: x1 = (1, 1, 1, 1, 1) x2 = (1, 1, 1, 0, 0) What is the Euclidean distance for the pair of vectors x1 and x2?
\sqrt{2}. We can calculate the Euclidean distance step by step. Subtract the corresponding elements (x1 − x2): (1−1, 1−1, 1−1, 1−0, 1−0) = (0, 0, 0, 1, 1). Square each difference: 0^2, 0^2, 0^2, 1^2, 1^2 = 0, 0, 0, 1, 1. Sum the squared differences: 0 + 0 + 0 + 1 + 1 = 2. Take the square root of the sum: \sqrt{2}.
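A quick way to verify this with NumPy (the vectors are the ones given in the question):

```python
import numpy as np

x1 = np.array([1, 1, 1, 1, 1])
x2 = np.array([1, 1, 1, 0, 0])

diff = x1 - x2                     # (0, 0, 0, 1, 1)
dist = np.sqrt(np.sum(diff ** 2))  # sqrt(0 + 0 + 0 + 1 + 1) = sqrt(2)
print(dist)                        # 1.4142...
```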
Quiz 5 Q 9
https://csulb.instructure.com/courses/67060/quizzes/265530
Quiz 6 Q 16
https://csulb.instructure.com/courses/67060/quizzes/266225
Quiz 4 Q 10 Consider the decision trees shown in Figures (a) and (b). For each approach described below, compute the generalization errors for both trees and decide which tree is better. The training data set is shown in Figure (c). If we use the optimistic approach (which assumes the generalization error equals the training error), then the error rates for trees A and B are [0%, 10%, 20%, 30%] and [0%, 10%, 20%, 30%], respectively. So tree [A, B, They have equal generalization errors!, They cannot be determined!] is better.
10%, 20%, A. Tree A has the lower training error (10% vs. 20%), so under the optimistic approach tree A is better.
Quiz 4 Q 11 Consider the decision trees shown in Figures (a) and (b). For each approach described below, compute the generalization errors for both trees and decide which tree is better. The training data set is shown in Figure (c). If we use the pessimistic approach (which accounts for both the training error and the model complexity), then the error rates for trees A and B are [1/10, 6/10, 10/10, 11/10] and [2/10, 6/10, 10/10, 11/10], respectively. So tree [A, B, They have equal generalization errors!, They cannot be determined!] is better. Assume trade-off hyper-parameter Ω = 2.
11/10, 10/10, B. The pessimistic estimate is (number of training errors + Ω × number of leaf nodes) / N, which penalizes the more complex tree; since 10/10 < 11/10, tree B is better.
Quiz 7 Q 6 What is backpropagation in the context of neural networks? A method for preprocessing input data A method for visualizing neural network architectures A method for updating network weights based on the gradient of the loss function A method for regularizing neural networks
A method for updating network weights based on the gradient of the loss function
Quiz 3 Q 5 In decision trees, what does a leaf node represent? A decision based on a feature A root node A split in the dataset A prediction or classification
A prediction or classification
Quiz 6 Q 7 Which performance metric is most influenced by the class distribution in the dataset? Precision Recall Accuracy F1 Score
Accuracy
Quiz 8 Q 6 Which optimization technique combines momentum and RMSProp? Stochastic Gradient Descent (SGD) Adadelta Adam AdaGrad
Adam
Quiz 1 Q 5 We have stock market data, which include the prices and volumes of various stocks on different trading days. Determine which machine learning task the following task belongs to: Identify unusual trading days for a given stock (e.g., unusually high volume) Anomaly detection Association rule mining Clustering Classification
Anomaly detection
Quiz 2 Q 2 Consider the following dataset that contains the age and gender information for 9 users who visited a given website. Suppose you apply the equal frequency approach to discretize the Age attribute into 3 bins. What will be the size of each of the 3 bins?
Bin 1: 3, Bin 2: 3, Bin 3: 3. Since there are 9 users and 3 bins, every bin must contain 3 users: Bin 1: users 1, 2, 3; Bin 2: users 4, 5, 6; Bin 3: users 7, 8, 9.
Quiz 2 Q 1 Consider the following dataset that contains the age and gender information for 9 users who visited a given website. Suppose you apply the equal interval width approach to discretize the Age attribute into 3 bins. What will be the size of each of the 3 bins?
Bin 1: 5, Bin 2: 3, Bin 3: 1. Bin width = (68 − 17) / 3 = 17, giving intervals [17, 34), [34, 51), and [51, 68]. Bin 1: users 1, 2, 3, 4, 5; Bin 2: users 6, 7, 8; Bin 3: user 9.
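Both binning questions above can be reproduced with a short sketch. The individual ages are not included in the transcription, so the nine ages used here are hypothetical, chosen only to match the stated minimum (17), maximum (68), and the resulting bin sizes:

```python
import numpy as np

ages = np.array([17, 19, 22, 25, 30, 38, 45, 50, 68])  # hypothetical ages for the 9 users

# Equal interval width: width = (max - min) / 3 = (68 - 17) / 3 = 17
edges = np.linspace(ages.min(), ages.max(), 4)   # [17, 34, 51, 68]
width_bins = np.digitize(ages, edges[1:-1])      # bin index 0, 1, or 2 per user
print(np.bincount(width_bins))                   # [5 3 1]

# Equal frequency: 9 sorted users split into 3 groups of 3
freq_bins = np.array_split(np.sort(ages), 3)
print([len(b) for b in freq_bins])               # [3, 3, 3]
```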
Quiz 6 Q 6 Which ensemble learning method combines predictions from multiple models using a weighted sum? Blending Stacking Bagging error-correcting output coding Boosting
Boosting
Quiz 4 Q 7 In K-nearest neighbor (KNN) classification, how is the class label of a new data point determined? By fitting a linear decision boundary based on its K nearest neighbors By calculating the mean of the features of its K nearest neighbors By assigning the majority class label among its K nearest neighbors By selecting the class label of its farthest neighbor
By assigning the majority class label among its K nearest neighbors
Quiz 8 Q 2 How does batch normalization contribute to faster training? By reducing learning rate By initializing parameters more effectively By eliminating the need for data augmentation By ensuring consistent gradients throughout training
By ensuring consistent gradients throughout training
Quiz 8 Q 12 How can adding momentum to SGD help avoid overshooting? By preventing weights from changing too rapidly. By increasing the learning rate. By allowing smaller steps in areas with steep gradients. By adjusting the batch size.
By preventing weights from changing too rapidly.
Quiz 1 Q 1 We have stock market data, which include the prices and volumes of various stocks on different trading days. Determine which machine learning task the following task belongs to: Predict whether the stock price will go up or down the next trading day. Anomaly detection Association rule mining Clustering Classification
Classification
Quiz 1 Q 3 We have Ambulatory Medical Care data, which contains the demographic and medical visit information for each patient (e.g., gender, age, duration of visit, physician's diagnosis, symptoms, medication, etc.). Determine which machine learning task the following task belongs to: Diagnose whether a patient has a disease Anomaly detection Association rule mining Clustering Classification
Classification
Quiz 1 Q 2 We have a Major League Baseball (MLB) database. Determine which machine learning task the following task belongs to: Identify groups of players with similar statistics Anomaly detection Association rule mining Clustering Classification
Clustering
Quiz 1 Q 7 Classify the following attribute as: discrete or continuous; qualitative or quantitative; nominal, ordinal, interval, or ratio. Average number of hours a user spent on the Internet in a week is [discrete or continuous], [qualitative or quantitative], and [nominal, ordinal, interval, or ratio].
Continuous, Quantitative, Ratio
Quiz 2 Q 10 Suppose we have two time series that have the same minimum and maximum values. Which one of the following proximity measures works best to identify lagged relationships between these time series? Euclidean Distance Correlation Jaccard Coefficient Cosine
Correlation
Quiz 2 Q 12 Which measure is most appropriate to compare the similarity of items bought by customers at a grocery store? Assume each customer is represented by an integer-valued vector of items (where each value indicates how many times the customer has previously bought that item). Correlation Euclidean distance Simple Matching Coefficient Cosine similarity
Cosine similarity
Quiz 7 Q 13 Which technique is used to combat overfitting by creating synthetic training examples? Parameter Sharing L1 Regularization Data Augmentation Dropout
Data Augmentation
Quiz 7 Q 18 Which technique involves adding noise to the input data to improve the generalization ability of the model? Dropout Data augmentation Early stopping L1 Regularization
Data augmentation
Quiz 5 Q 6 What is a support vector in SVM? Data points closest to the hyperplane Data points located on the hyperplane Data points farthest from the hyperplane All data points in the dataset
Data points closest to the hyperplane
Quiz 2 Q 4 Consider a data set from an online social media Web site that contains information about the age and number of friends for 5,000 users. Suppose the covariance between age and number of friends calculated using the 4,000 users (with no missing values) is 20. If you replace the missing values for age with the average age of the 4,000 users, would the covariance between age and number of friends increase, decrease, or stay the same (as 20)? Assume that the average number of friends for all 5,000 users is the same as the average for the 4,000 users. Stays the same! Decrease Increase Can not be determined!
Decrease. The 1,000 imputed users all have age equal to the mean age of the observed users, so they contribute essentially nothing to the sum of cross-products, while the divisor grows from 3,999 to 4,999; the covariance therefore shrinks in magnitude below 20.
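A small synthetic experiment illustrates the mechanism (all numbers below are made up; only the direction of the change matters):

```python
import numpy as np

rng = np.random.default_rng(0)
age = rng.normal(30, 8, size=4000)
friends = 5 * age + rng.normal(0, 40, size=4000)   # positively related to age, so cov > 0

cov_observed = np.cov(age, friends)[0, 1]          # covariance over the 4,000 observed users

# 1,000 extra users whose missing age is imputed with the observed mean age
age_full = np.concatenate([age, np.full(1000, age.mean())])
friends_full = np.concatenate([friends, rng.normal(friends.mean(), 40, size=1000)])

cov_imputed = np.cov(age_full, friends_full)[0, 1]
print(cov_observed, cov_imputed)                   # the second value is smaller in magnitude
```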
Quiz 3 Q 4 How is information gain calculated in Hunt's algorithm? Difference between the Gini impurity of the parent node and the weighted sum of the Gini impurities of the child nodes Ratio of the entropy of the parent node to the entropy of the child nodes Difference between the entropy of the parent node and the weighted sum of the entropies of the child nodes Ratio of the Gini impurity of the parent node to the Gini impurity of the child nodes
Difference between the Gini impurity of the parent node and the weighted sum of the Gini impurities of the child nodes. Note that for Hunt's algorithm we use the Gini index instead of entropy.
Quiz 1 Q 6 Classify the following attribute as: discrete or continuous; qualitative or quantitative; nominal, ordinal, interval, or ratio. Movie ratings provided by users is [discrete or continuous], [qualitative or quantitative], and [nominal, ordinal, interval, or ratio].
Discrete, Quantitative, Ordinal
Quiz 1 Q 8 Classify the following attribute as: discrete or continuous; qualitative or quantitative; nominal, ordinal, interval, or ratio. Age in years is [discrete or continuous], [qualitative or quantitative], and [nominal, ordinal, interval, or ratio].
Discrete, Quantitative, Ratio
Quiz 6 Q 4 For which application is recall the best metric? Spam detection Medical diagnosis Fraud detection Disease detection
Disease detection
Quiz 7 Q 16 Which technique is used to monitor the performance of a neural network during training and stop the training process when performance stops improving? Weight Initialization Early Stopping Backpropagation Gradient Descent
Early Stopping
Quiz 5 Q 8 Consider a training set with two features, X1 and X2, for a binary classification problem. The class distribution is shown in the table below.
X1 || X2 || Number of positive examples || Number of negative examples
1 || 1 || 20 || 8
1 || 0 || 20 || 17
0 || 1 || 5 || 8
0 || 0 || 5 || 17
True or False: Based on the information above, X1 and X2 are independent of each other. Hint: You only need to find one example to prove or disprove the statement.
False. If X1 and X2 are independent, then P(X1, X2) = P(X1) * P(X2) for all values. Test X1 = 1 and X2 = 1:
P(X1 = 1, X2 = 1) = (20 + 8) / 100 = 0.28
P(X1 = 1) * P(X2 = 1) = ((20 + 8) + (20 + 17)) / 100 * ((20 + 8) + (5 + 8)) / 100 = 0.65 * 0.41 = 0.2665
Since 0.28 != 0.2665, X1 and X2 are NOT independent of each other.
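The same check can be scripted directly from the count table in the question (a minimal sketch):

```python
# (X1, X2) -> total number of records (positives + negatives) with that feature combination
counts = {
    (1, 1): 20 + 8,
    (1, 0): 20 + 17,
    (0, 1): 5 + 8,
    (0, 0): 5 + 17,
}
total = sum(counts.values())                                          # 100

p_x1 = sum(v for (x1, _), v in counts.items() if x1 == 1) / total    # 0.65
p_x2 = sum(v for (_, x2), v in counts.items() if x2 == 1) / total    # 0.41
p_joint = counts[(1, 1)] / total                                      # 0.28

print(p_joint, p_x1 * p_x2)   # 0.28 vs 0.2665 -> not independent
```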
Quiz 8 Q 3 What can a high learning rate lead to during training? Increased weight regularization Decreased model complexity Slow convergence Faster convergence and potential overshooting of minima
Faster convergence and potential overshooting of minima
Quiz 7 Q 1 Which algorithm is commonly used for optimizing the weights of a neural network during backpropagation? K-Means Decision Trees Support Vector Machines Gradient Descent
Gradient Descent
Quiz 7 Q 3 In a multi-layer perceptron (MLP), what are the layers other than input and output layers called? Transparent layers Accessible layers Hidden layers Visible layers
Hidden layers
Quiz 8 Q 10 Which of the following can lead to faster convergence during training? Large batch sizes Low momentum Using a constant learning rate High learning rates
High learning rates
Quiz 7 Q 20 Which of the following is a common problem associated with overfitting in neural networks? Underfitting Unstable gradients High bias High variance
High variance
Quiz 5 Q 2 How does Naive Bayes handle missing values in the dataset? Removes the instances with missing values Imputes missing values using a specific strategy Substitutes missing values with the mean Ignores missing values during calculation None of the above
Ignores missing values during calculation
Quiz 7 Q 15 Which of the following is a drawback of using dropout regularization? Increased risk of underfitting Inability to prevent overfitting Decreased model complexity Increased training time
Increased risk of underfitting
Quiz 4 Q 1 What effect does increasing the depth of a decision tree typically have on overfitting? Eliminates overfitting entirely Reduces overfitting Has no effect on overfitting Increases overfitting
Increases overfitting
Quiz 8 Q 15 How does momentum help in training neural networks? It prevents overfitting. It adjusts the batch size. It accelerates updates and smooths the path to convergence. It increases the learning rate over time.
It accelerates updates and smooths the path to convergence.
Quiz 5 Q 7 Which of the following statements is true about the soft-margin SVM classifier? (soft-margin is referred to the case that the dataset is not separable) It allows for misclassification of training examples It does not support nonlinear decision boundaries It has no regularization parameter It has a larger margin compared to the hard-margin classifier
It allows for misclassification of training examples
Quiz 4 Q 6 In k-fold cross-validation, what is the significance of the value of k? It determines the number of folds into which the dataset will be divided. It determines the size of the test set. It determines the size of the training set. It determines the number of times the model will be trained.
It determines the number of folds into which the dataset will be divided.
Quiz 5 Q 5 What is the purpose of the kernel trick in SVM? It efficiently computes dot products in high-dimensional space It prevents the model from overfitting It transforms the data to a lower-dimensional space None of the above It regularizes the model to handle noisy data
It efficiently computes dot products in high-dimensional space
Quiz 4 Q 3 Which of the following is a characteristic of post-pruning? It is computationally expensive It involves removing branches from the tree after it has been fully grown It tends to result in overly complex trees It stops the growth of the tree based on certain conditions during training
It involves removing branches from the tree after it has been fully grown
Quiz 3 Q 1 Which of the following statements about Hunt's algorithm is true? It works best with linearly separable data. It guarantees an optimal decision tree for any dataset. It is a recursive partitioning method that aims to find the best splits in the feature space based on impurity measures. It always produces the same tree structure regardless of the dataset.
It is a recursive partitioning method that aims to find the best splits in the feature space based on impurity measures.
Quiz 4 Q 9 How does increasing the value of K in K-nearest neighbor classification affect the decision boundary? It has no effect on the decision boundary It makes the decision boundary less sensitive to noise It makes the decision boundary more linear It makes the decision boundary more complex
It makes the decision boundary less sensitive to noise
Quiz 4 Q 8 How does the choice of distance metric impact the performance of the K-nearest neighbor algorithm? It only affects the computational efficiency of the algorithm It determines the optimal value of K for the dataset It has no impact on performance It may affect the sensitivity of the algorithm to feature scales
It may affect the sensitivity of the algorithm to feature scales
Quiz 8 Q 11 Why might a large batch size be problematic in training neural networks? It may require more frequent weight updates. It may cause overfitting. It may lead to more noisy updates. It may result in slower convergence for large dataset sizes.
It may result in slower convergence for large dataset sizes.
Quiz 4 Q 4 Which of the following is a characteristic of pre-pruning? It is computationally expensive It tends to result in overly complex trees It stops the growth of the tree based on certain conditions during training It is only applied after the tree has been fully grown
It stops the growth of the tree based on certain conditions during training
Quiz 2 Q 11 Which measure is most appropriate to compare how similar the locations visited by tourists at an amusement park are? Assume the location information is stored as binary yes/no attributes (yes means a location was visited by the tourist and no means a location has not been visited). Jaccard Coefficient Cosine Euclidean Distance Simple Matching Coefficient
Jaccard Coefficient. Here, places visited by the tourists should play a more significant role in computing similarity than places they did not visit.
Quiz 7 Q 14 Which type of regularization technique encourages sparsity in the neural network by penalizing the absolute values of weights? L2 Regularization Early Stopping Dropout L1 Regularization
L1 Regularization
Quiz 7 Q 4 Which of the following is a commonly used regularization technique for neural networks? Gradient Descent Random Forest L1 Regularization Feature Scaling
L1 Regularization
Quiz 7 Q 7 Which of the following regularization techniques adds a penalty term to the loss function based on the squared magnitude of weights? Dropout L2 Regularization L1 Regularization Early Stopping
L2 Regularization
Quiz 8 Q 4 Which of the following is a challenge when training deep neural networks? Loss of gradient Overparameterization Weight constraints Underfitting
Loss of gradient
Quiz 5 Q 4 SVM aims to find the hyperplane that: Is orthogonal to the axes Maximizes the margin between classes Divides the dataset into multiple hyperplanes None of the above Minimizes the margin between classes
Maximizes the margin between classes
Quiz 3 Q 2 Which of the following is NOT a step in Hunt's algorithm? Select the feature that maximizes information gain. Repeat the process recursively for each subset until stopping criteria are met. Merge the decision tree with a random forest ensemble. Split the dataset into training and test sets.
Merge the decision tree with a random forest ensemble. Merging the decision tree with a random forest ensemble is not a step in Hunt's algorithm.
Quiz 3 Q 3 What is the main objective of Hunt's algorithm? Minimize tree depth Minimize impurity in tree nodes Maximize tree complexity Minimize information gain at each split
Minimize impurity in tree nodes The main objective of Hunt's algorithm is to minimize impurity at each node of the decision tree, usually by maximizing information gain or minimizing Gini impurity or entropy.
Quiz 6 Q 15 Model A:
 || Predicted (Negative) || Predicted (Positive)
Actual (Negative) || 800 || 100
Actual (Positive) || 20 || 80
Model B:
 || Predicted (Negative) || Predicted (Positive)
Actual (Negative) || 750 || 150
Actual (Positive) || 10 || 90
Which of the following statements is correct when comparing the performance of Model A and Model B? Model A has higher precision and higher recall compared to Model B Model A has lower precision and lower recall compared to Model B Model A has higher precision but lower recall compared to Model B Model A has lower precision but higher recall compared to Model B
Model A has higher precision but lower recall compared to Model B
Model A Precision = 80 / (80 + 100) = 0.444; Model B Precision = 90 / (90 + 150) = 0.375
Model A Recall = 80 / (80 + 20) = 0.8; Model B Recall = 90 / (90 + 10) = 0.9
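A few lines of Python confirm the comparison; the TP/FP/FN counts are read off the two confusion matrices above:

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts."""
    return tp / (tp + fp), tp / (tp + fn)

# Model A: TP=80, FP=100, FN=20;  Model B: TP=90, FP=150, FN=10
p_a, r_a = precision_recall(80, 100, 20)   # (0.444..., 0.8)
p_b, r_b = precision_recall(90, 150, 10)   # (0.375, 0.9)
print(p_a > p_b, r_a < r_b)                # True True -> A has higher precision, lower recall
```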
Quiz 2 Q 8 Which one of the following is NOT a property of a metric measure? Symmetry Negativity Triangle Inequality Positivity
Negativity
Quiz 1 Q 9 State the type of the attribute given below before and after we have performed the following transformation. Hair color of a person is mapped to the following values: black = 0, brown = 1, red = 2, blonde = 3, grey = 4, white = 5. nominal ordinal interval ratio
Nominal. The attribute is nominal both before and after the mapping; the integer codes are just labels with no meaningful order.
Quiz 2 Q 9 Suppose you are given census data, where every data object corresponds to a household and the following continuous attributes are used to characterize each household: total household income, number of household residents, property value, number of bedrooms, and number of vehicles owned. Suppose we are interested in clustering the households based on these attributes. Which one of the following proximity measures best suits the above problem? None of the proximity measures are suitable unless we standardize the data first. Euclidean Distance Cosine Correlation
None of the proximity measures are suitable unless we standardize the data first.
Quiz 3 Q 6 For continuous attributes, you want to choose the splitting value, i.e., v. One approach is to scan the database to gather the count matrix and compute its Gini index. What is the running time of this approach? Consider n to be the number of candidate splitting values. O(2^n) O(n) O(n^2) O(n log n)
O(n^2) For each candidate v, the data set is scanned once to count the number of records with annual income less than or greater than v. We then compute the Gini index for each candidate and choose the one that gives the lowest value. This approach is computationally expensive because it requires O(n) operations to compute the Gini index at each candidate split position. Since there are n candidates, the overall complexity of this task is O(n^2).
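A brute-force sketch of this approach; the (value, label) records below are hypothetical, and each of the n candidate split values triggers a full O(n) scan, giving O(n^2) overall:

```python
from collections import Counter

def gini(labels):
    """Gini index of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split_naive(records):
    """records: list of (value, label) pairs.
    n candidate split values, each requiring an O(n) scan => O(n^2) overall."""
    n = len(records)
    best_value, best_score = None, float("inf")
    for v in sorted({val for val, _ in records}):          # O(n) candidates
        left = [lab for val, lab in records if val <= v]   # full O(n) scan per candidate
        right = [lab for val, lab in records if val > v]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_value, best_score = v, score
    return best_value, best_score

# Hypothetical records: splitting at 75 separates the classes perfectly (weighted Gini = 0)
print(best_split_naive([(60, 0), (70, 0), (75, 0), (85, 1), (90, 1), (95, 1)]))
```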
Quiz 6 Q 3 F1 Score is the harmonic mean of: Precision and Recall Sensitivity and Specificity Accuracy and Recall Precision and Specificity
Precision and Recall
Quiz 8 Q 7 Which of the following is true about parameter initialization in neural networks? Parameter initialization has no impact on model performance. Improper initialization can lead to faster convergence. Zero initialization works best in most deep learning models. Proper initialization can help avoid vanishing and exploding gradients.
Proper initialization can help avoid vanishing and exploding gradients.
Quiz 4 Q 2 Which of the following techniques can be used to combat overfitting in decision trees? Pruning the tree Using a smaller training dataset Increasing the maximum depth of the tree Adding more noise to the training data
Pruning the tree
Quiz 8 Q 1 Which optimization algorithm only relies on the average of past gradients to smooth out parameter updates? AdaGrad AdaBoost RMSProp Adam
RMSProp
Quiz 6 Q 2 In which one of the following methods do we create a group of classifiers by manipulating the input features? Boosting Random forest Bagging error-correcting output coding
Random forest
Quiz 6 Q 1 Bagging is a technique used in ensemble learning primarily for: Reducing variance Increasing interpretability Reducing bias Improving accuracy
Reducing variance
Quiz 8 Q 9 How does Stochastic Gradient Descent (SGD) differ from batch gradient descent? SGD uses the entire dataset for each update. SGD does not adjust the learning rate. SGD uses a subset of data points for each update. SGD's convergence trajectory is less noisy than batch gradient descent.
SGD uses a subset of data points for each update.
Quiz 7 Q 9 Which activation function is commonly used in the output layer of a binary classification neural network? ReLU Sigmoid Softmax Tanh
Sigmoid
Quiz 8 Q 14 How can batch size affect the stability and performance of a neural network during training? Larger batches allow more frequent parameter updates Larger batches may reduce generalization capability. Smaller batches tend to increase the learning rate. Smaller batches generally lead to less stable and noisier updates.
Smaller batches generally lead to less stable and noisier updates.
Quiz 6 Q 8 Which metric measures the proportion of true negatives correctly identified? Precision F1 Score Recall Specificity
Specificity
Quiz 2 Q 3 Consider a data set from an online social media Web site that contains information about the age and number of friends for 5,000 users. Suppose the number of friends for each user is known. However, only 4,000 out of 5,000 users provide their age information. The average age of the 4,000 users is 30 years old. If you replace the missing values for age with the value 30, will the average age computed for the 5,000 users increase, decrease, or stay the same (as 30)? Stays the same! Decrease Increase Can not be determined!
Stays the same!
Quiz 8 Q 13 Which method updates model parameters using a randomly sampled subset of data in each iteration? Stochastic Gradient Descent (SGD) Newton's Method Batch Gradient Descent Coordinate Descent
Stochastic Gradient Descent (SGD)
Quiz 6 Q 10 To draw an ROC graph, which metrics are used? TNR and FNR TPR and FNR TNR and FPR TPR and FPR
TPR and FPR (True Positive Rate and False Positive Rate).
Quiz 6 Q 11 Consider a binary classification model with the following ROC curve: Which of the following statements is true regarding the performance of the model (i.e., red curve) based on the ROC curve? The model's performance is better than random guessing The model has a sensitivity of 0.5 The model's performance is worse than random guessing The model has perfect discrimination between positive and negative classes
The model's performance is better than random guessing
Quiz 5 Q 1 What assumption does Naive Bayes make about the features? They are conditionally independent They are dependent None of the above They are normally distributed They are uncorrelated
They are conditionally independent
Quiz 5 Q 3 How are conditional dependencies represented in a BBN? Through directed edges Through bidirectional edges None of the above Through undirected edges Through nodes
Through directed edges
Quiz 8 Q 8 What is the role of momentum in optimization algorithms like SGD with momentum? To decrease the learning rate. To maintain a constant learning rate. To reduce the weight decay. To accelerate updates and dampen oscillations.
To accelerate updates and dampen oscillations.
Quiz 8 Q 5 What is the purpose of learning rate in optimization algorithms? To measure the loss function. To adjust the step size based on the value of the gradient. To determine the number of layers in the network. To control the model complexity.
To adjust the step size based on the value of the gradient.
Quiz 7 Q 10 In the context of neural networks, what is the purpose of the learning rate? To define the size of the input data To control the amount by which weights are updated during training To measure the complexity of the model To determine the number of layers in the network
To control the amount by which weights are updated during training
Quiz 7 Q 5 What is the primary objective of regularization techniques in deep learning? To overfit the training data To increase model complexity To decrease model complexity To fit training data perfectly
To decrease model complexity
Quiz 7 Q 2 What is the purpose of the activation function in a neural network? To normalize the input data To introduce non-linearity To calculate the loss function To control the learning rate
To introduce non-linearity
Quiz 7 Q 17 What is the purpose of the loss function in a neural network? To determine the number of layers in the network To initialize the weights To calculate the accuracy of the model To measure the difference between predicted and actual values
To measure the difference between predicted and actual values
Quiz 7 Q 12 What is the purpose of the softmax function in the output layer of a multi-class classification neural network? To introduce non-linearity To prevent overfitting To normalize the output probabilities To calculate the loss function
To normalize the output probabilities
Quiz 7 Q 8 What is the purpose of dropout regularization in neural networks? To decrease the learning rate To increase the number of neurons in the network To increase the weight regularization To randomly remove neurons during training to prevent overfitting
To randomly remove neurons during training to prevent overfitting
Quiz 4 Q 5 What is the purpose of a validation set in machine learning? To test the model's performance To evaluate the model on unseen data after training To train the model To tune hyperparameters and assess model performance during training
To tune hyperparameters and assess model performance during training
Quiz 6 Q 12 Consider a binary classification problem with the following confusion matrix:
 || Predicted (Negative) || Predicted (Positive)
Actual (Negative) || 850 || 50
Actual (Positive) || 30 || 70
What is the precision of the model?
0.5833 Precision = True Positives / (True Positives + False Positives) = 70 / (70 + 50) = 0.5833
Quiz 7 Q 19 In which phase of training does the model update its weights based on the calculated gradients? Initialization phase Evaluation phase Forward pass Backward pass
Backward pass
Quiz 6 Q 9 Which ensemble learning method trains multiple models on different subsets of the data and averages their predictions? Bagging Stacking Boosting Blending
Bagging
Quiz 2 Q 5 Consider the following binary vectors: x1 = (1, 1, 1, 1, 1) x2 = (1, 1, 1, 0, 0) What is the Jaccard coefficient for the pair of vectors x1 and x2?
0.6 Jaccard coefficient = f11 / (f01 + f10 + f11) = 3 / (0 + 2 + 3) = 0.6, where f11 = number of positions with x1 = 1 and x2 = 1, f01 = number with x1 = 0 and x2 = 1, f10 = number with x1 = 1 and x2 = 0, and f00 = number with x1 = 0 and x2 = 0.
Quiz 2 Q 6 Consider the following binary vectors: x1 = (1, 1, 1, 1, 1) x2 = (1, 1, 1, 0, 0) What is the Simple Matching Coefficient for the pair of vectors x1 and x2?
0.6 SMC = (f00 + f11) / (f00 + f01 + f10 + f11) = (0 + 3) / (0 + 0 + 2 + 3) = 0.6, using the same frequency counts as in the previous question (f11 = 3, f10 = 2, f01 = 0, f00 = 0).
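Both the Jaccard coefficient from the previous question and the SMC can be verified with the same four frequency counts (a minimal sketch for the two vectors above):

```python
x1 = [1, 1, 1, 1, 1]
x2 = [1, 1, 1, 0, 0]

f11 = sum(a == 1 and b == 1 for a, b in zip(x1, x2))   # 3
f10 = sum(a == 1 and b == 0 for a, b in zip(x1, x2))   # 2
f01 = sum(a == 0 and b == 1 for a, b in zip(x1, x2))   # 0
f00 = sum(a == 0 and b == 0 for a, b in zip(x1, x2))   # 0

jaccard = f11 / (f01 + f10 + f11)                      # 3 / 5 = 0.6
smc = (f00 + f11) / (f00 + f01 + f10 + f11)            # 3 / 5 = 0.6
print(jaccard, smc)
```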
Quiz 3 Q 7 Consider a dataset with the following target classes and corresponding frequencies: Class A: 20 instances Class B: 30 instances Class C: 50 instances Calculate the Gini index for this dataset.
0.62 Compute the Gini index from the class proportions: Gini = 1 - (20/100)^2 - (30/100)^2 - (50/100)^2 = 1 - 0.04 - 0.09 - 0.25 = 0.62
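A quick check of this arithmetic in Python, using the class counts given in the question:

```python
counts = [20, 30, 50]   # instances of classes A, B, C
total = sum(counts)
gini = 1 - sum((c / total) ** 2 for c in counts)
print(round(gini, 2))   # 0.62
```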
Quiz 6 Q 14 Consider a binary classification problem with the following confusion matrix:
 || Predicted (Negative) || Predicted (Positive)
Actual (Negative) || 850 || 50
Actual (Positive) || 30 || 70
What is the F1 score of the model?
0.6364 F1 Score = 2 * ((Precision * Recall) / (Precision + Recall)) = 2 * ((0.5833 * 0.7) / (0.5833 + 0.7)) ≈ 0.6364
Quiz 6 Q 13 Consider a binary classification problem with the following confusion matrix:
 || Predicted (Negative) || Predicted (Positive)
Actual (Negative) || 850 || 50
Actual (Positive) || 30 || 70
In the same binary classification problem, what is the recall (sensitivity) of the model?
0.7 Recall = TP / (TP + FN) = 70 / (70 + 30) = 0.7
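The precision, recall, and F1 questions (Quiz 6 Q 12-14) all use the same confusion matrix, so one short sketch covers all three (TP = 70, FP = 50, FN = 30, TN = 850):

```python
tp, fp, fn, tn = 70, 50, 30, 850

precision = tp / (tp + fp)                          # 70 / 120 ≈ 0.5833
recall = tp / (tp + fn)                             # 70 / 100 = 0.7
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.6364
print(round(precision, 4), round(recall, 4), round(f1, 4))
```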
Quiz 1 Q 4 We have Ambulatory Medical Care data, which contains the demographic and medical visit information for each patient (e.g., gender, age, duration of visit, physician's diagnosis, symptoms, medication, etc.). Determine which machine learning task the following task belongs to: Identify the symptoms and medical conditions that co-occur together frequently Anomaly detection Association rule mining Clustering Classification
Association rule mining
Quiz 6 Q 5 A confusion matrix is used to: Visualize the performance of a classification model Measure the goodness of fit of a time series model Evaluate the performance of a clustering algorithm Compute the accuracy of a regression model
Visualize the performance of a classification model
Quiz 7 Q 11 Which of the following is NOT a step in the backpropagation algorithm? Forward Pass Gradient Descent Error Computation Weight Initialization
Weight Initialization
Quiz 3 Q 8 Consider the following data set that contains 60 training examples (30 labeled as the positive class and the remainder labeled as the negative class).
X || Y || No. of + Examples || No. of - Examples
1 || 1 || 10 || 0
1 || 0 || 20 || 0
0 || 1 || 0 || 15
0 || 0 || 0 || 15
Based on the dataset above, attribute X should be selected as the splitting attribute based on the Gini index. The Gini index value for attribute X is 0.5 while for attribute Y is 0.47.
X, 0, 0.47
If we choose X, we get:
X || N1 || N2
+ || 30 || 0
- || 0 || 30
If we choose Y, we get:
Y || N1 || N2
+ || 10 || 15
- || 20 || 15
The Gini index is lower when splitting on X. The Gini index for attribute X is 0 (it is easy to show). For attribute Y it is calculated as follows:
For N1: 1 - ((10/30)^2 + (20/30)^2) = 1 - (1/9 + 4/9) = 4/9
For N2: 1 - ((15/30)^2 + (15/30)^2) = 1 - (1/4 + 1/4) = 1/2
Weighted sum: (30/60) * 4/9 + (30/60) * 1/2 = 17/36 = 0.47
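A short sketch that reproduces the weighted Gini values used in the worked solution above; the child-node class counts are taken directly from the two tables in that solution:

```python
def gini(counts):
    """Gini index of a node given its class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def weighted_gini(children):
    """Weighted Gini of a split; children is a list of per-node class-count lists."""
    total = sum(sum(c) for c in children)
    return sum(sum(c) / total * gini(c) for c in children)

# Child-node counts from the solution's tables: [positives, negatives] per node
split_x = [[30, 0], [0, 30]]
split_y = [[10, 20], [15, 15]]
print(weighted_gini(split_x))   # 0.0
print(weighted_gini(split_y))   # 17/36 ≈ 0.472
```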