ML final CECS 456


Quiz 2 Q 7 Consider the following binary vectors: x1 = (1, 1, 1, 1, 1) x2 = (1, 1, 1, 0, 0) What is the Euclidean distance for the pair of vectors x1 and x2?

\sqrt{2} We can calculate the Euclidean distance step by step: subtract the corresponding elements, x1 - x2 = (1-1, 1-1, 1-1, 1-0, 1-0) = (0, 0, 0, 1, 1); square each of these differences: (0, 0, 0, 1, 1); sum the squared differences: 0+0+0+1+1 = 2; take the square root of the sum: \sqrt{2}.
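
A quick NumPy check of this calculation (a minimal sketch; the two vectors come straight from the question):

import numpy as np

x1 = np.array([1, 1, 1, 1, 1])
x2 = np.array([1, 1, 1, 0, 0])

# Euclidean distance: square root of the sum of squared element-wise differences
dist = np.sqrt(np.sum((x1 - x2) ** 2))
print(dist)  # 1.4142... = sqrt(2)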

Quiz 5 Q 9

https://csulb.instructure.com/courses/67060/quizzes/265530

Quiz 6 Q 16

https://csulb.instructure.com/courses/67060/quizzes/266225

Quiz 4 Q 10 Consider the decision trees shown in Figures (a) and (b). For each approach described below, you need to compute the generalization errors for both trees and decide which tree is better. The training data set is shown in Figure (c). If we use the optimistic approach (which assumes the generalization error is given by the training error), then the error rates for trees A and B are [0%, 10%, 20%, 30%] and [0%, 10%, 20%, 30%] respectively. So tree [A, B, They have equal generalization errors!, They cannot be determined!] is better.

10%, 20%, A

Quiz 4 Q 11 Consider the decision trees shown in Figures (a) and (b). For each approach described below, you need to compute the generalization errors for both trees and decide which tree is better. The training data set is shown in Figure (c). If we use the pessimistic approach (which assumes the generalization error is given by the training error plus a model-complexity penalty), then the error rates for trees A and B are [1/10, 6/10, 10/10, 11/10] and [2/10, 6/10, 10/10, 11/10], respectively. So tree [A, B, They have equal generalization errors!, They cannot be determined!] is better. Assume the trade-off hyper-parameter Ω=2.

11/10, 10/10, B

Quiz 7 Q 6 What is backpropagation in the context of neural networks? A method for preprocessing input data A method for visualizing neural network architectures A method for updating network weights based on the gradient of the loss function A method for regularizing neural networks

A method for updating network weights based on the gradient of the loss function

Quiz 3 Q 5 In decision trees, what does a leaf node represent? A decision based on a feature A root node A split in the dataset A prediction or classification

A prediction or classification

Quiz 6 Q 7 Which performance metric is most influenced by the class distribution in the dataset? Precision Recall Accuracy F1 Score

Accuracy

Quiz 8 Q 6 Which optimization technique combines momentum and RMSProp? Stochastic Gradient Descent (SGD) Adadelta Adam AdaGrad

Adam

Quiz 1 Q 5 We have stock market data, which includes the prices and volumes of various stocks on different trading days. Determine which machine learning task the following task belongs to: Identify unusual trading days for a given stock (e.g., unusually high volume) Anomaly detection Association rule mining Clustering Classification

Anomaly detection

Quiz 2 Q 2 Consider the following dataset that contains the age and gender information for 9 users who visited a given website. Suppose you apply the equal frequency approach to discretize the Age attribute into 3 bins. What will be the size of each of the 3 bins?

Bin 1: 3, Bin 2: 3, Bin 3: 3. Since there are 9 users and 3 bins, every bin must contain 3 users: Bin 1 gets users 1, 2, 3; Bin 2 gets users 4, 5, 6; Bin 3 gets users 7, 8, 9 (after sorting by age).

Quiz 2 Q 1 Consider the following dataset that contains the age and gender information for 9 users who visited a given website. Suppose you apply the equal interval width approach to discretize the Age attribute into 3 bins. What will be the size of each of the 3 bins?

Bin 1: 5, Bin 2: 3, Bin 3: 1. Bin width = (68 - 17)/3 = 17, so the bin boundaries are 17-34, 34-51, and 51-68. Bin 1: users 1, 2, 3, 4, 5; Bin 2: users 6, 7, 8; Bin 3: user 9.
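
A hedged pandas sketch of both discretization approaches. The actual ages appear only in the quiz figure, so the list below is hypothetical; it is chosen just to match the answers above (9 users, minimum age 17, maximum age 68):

import pandas as pd

# Hypothetical ages consistent with the answers above (real values are in the quiz figure)
ages = pd.Series([17, 20, 23, 27, 30, 38, 42, 45, 68])

# Equal frequency: 3 bins over 9 users -> 3 users per bin
print(pd.qcut(ages, q=3).value_counts().sort_index())    # 3, 3, 3

# Equal interval width: width = (68 - 17) / 3 = 17 -> bins 17-34, 34-51, 51-68
print(pd.cut(ages, bins=3).value_counts().sort_index())   # 5, 3, 1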

Quiz 6 Q 6 Which ensemble learning method combines predictions from multiple models using a weighted sum? Blending Stacking Bagging error-correcting output coding Boosting

Boosting

Quiz 4 Q 7 In K-nearest neighbor (KNN) classification, how is the class label of a new data point determined? By fitting a linear decision boundary based on its K nearest neighbors By calculating the mean of the features of its K nearest neighbors By assigning the majority class label among its K nearest neighbors By selecting the class label of its farthest neighbor

By assigning the majority class label among its K nearest neighbors

Quiz 8 Q 2 How does batch normalization contribute to faster training? By reducing learning rate By initializing parameters more effectively By eliminating the need for data augmentation By ensuring consistent gradients throughout training

By ensuring consistent gradients throughout training

Quiz 8 Q 12 How can adding momentum to SGD help avoid overshooting? By preventing weights from changing too rapidly. By increasing the learning rate. By allowing smaller steps in areas with steep gradients. By adjusting the batch size.

By preventing weights from changing too rapidly.

Quiz 1 Q 1 We have stock market data, which includes the prices and volumes of various stocks on different trading days. Determine which machine learning task the following task belongs to: Predict whether the stock price will go up or down the next trading day. Anomaly detection Association rule mining Clustering Classification

Classification

Quiz 1 Q 3 We have Ambulatory Medical Care data, which contains the demographic and medical visit information for each patient (e.g., gender, age, duration of visit, physician's diagnosis, symptoms, medication, etc.). Determine which machine learning task the following task belongs to: Diagnose whether a patient has a disease Anomaly detection Association rule mining Clustering Classification

Classification

Quiz 1 Q 2 We have a Major League Baseball (MLB) database. Determine which machine learning task the following task belongs to: Identify groups of players with similar statistics Anomaly detection Association rule mining Clustering Classification

Clustering

Quiz 1 Q 7 Classify the following attributes as: discrete or continuous; qualitative or quantitative; nominal, ordinal, interval, or ratio. Average number of hours a user spent on the Internet in a week is [discrete or continuous], [qualitative or quantitative], and [nominal, ordinal, interval, or ratio].

Continuous, Quantitative, Ratio

Quiz 2 Q 10 Suppose we have two time series which have the same minimum and maximum values. Which one of the following proximity measures works best to identify lagged relationships between these time series? Euclidean Distance Correlation Jaccard Coefficient Cosine

Correlation

Quiz 2 Q 12 Which measure is most appropriate to compare the similarity of items bought by customers at a grocery store? Assume each customer is represented by an integer-valued vector of items (where each value indicates how many times the customer has previously bought that item). Correlation Euclidean distance Simple Matching Coefficient Cosine similarity

Cosine similarity

Quiz 7 Q 13 Which technique is used to combat overfitting by creating synthetic training examples? Parameter Sharing L1 Regularization Data Augmentation Dropout

Data Augmentation

Quiz 7 Q 18 Which technique involves adding noise to the input data to improve the generalization ability of the model? Dropout Data augmentation Early stopping L1 Regularization

Data augmentation

Quiz 5 Q 6 What is a support vector in SVM? Data points closest to the hyperplane Data points located on the hyperplane Data points farthest from the hyperplane All data points in the dataset

Data points closest to the hyperplane

Quiz 2 Q 4 Consider a data set from an online social media Web site that contains information about the age and number of friends for 5,000 users. Suppose the covariance between age and number of friends calculated using the 4,000 users (with no missing values) is 20. If you replace the missing values for age with the average age of the 4,000 users, will the covariance between age and number of friends increase, decrease, or stay the same (as 20)? Assume that the average number of friends for all 5,000 users is the same as the average for the 4,000 users. Stays the same! Decrease Increase Cannot be determined!

Decrease. The imputed users have an age deviation of zero (their age equals the mean), so they add nothing to the sum of cross-products, while the sample size in the denominator grows from 4,000 to 5,000 users; the magnitude of the covariance therefore shrinks.
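
A small NumPy demonstration with synthetic, purely illustrative data (the real ages and friend counts are not given; the imputed users are assigned the mean number of friends so the friend average stays fixed, as the question assumes):

import numpy as np

rng = np.random.default_rng(0)
age = rng.normal(30, 8, size=4000)                 # synthetic stand-in for 4,000 observed ages
friends = 5 * age + rng.normal(0, 50, size=4000)   # friend counts correlated with age

cov_before = np.cov(age, friends)[0, 1]

# Impute the 1,000 missing ages with the observed mean age,
# keeping the average number of friends unchanged.
age_full = np.concatenate([age, np.full(1000, age.mean())])
friends_full = np.concatenate([friends, np.full(1000, friends.mean())])

cov_after = np.cov(age_full, friends_full)[0, 1]
print(cov_before, cov_after)  # cov_after = cov_before * 3999/4999, i.e. smaller in magnitude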

Quiz 3 Q 4 How is information gain calculated in Hunt's algorithm? Difference between the Gini impurity of the parent node and the weighted sum of the Gini impurities of the child nodes Ratio of the entropy of the parent node to the entropy of the child nodes Difference between the entropy of the parent node and the weighted sum of the entropies of the child nodes Ratio of the Gini impurity of the parent node to the Gini impurity of the child nodes

Difference between the Gini impurity of the parent node and the weighted sum of the Gini impurities of the child nodes. Note that for Hunt's algorithm we use the Gini index instead of entropy.

Quiz 1 Q 6 Classify the following attributes as: discrete or continuous; qualitative or quantitative; nominal, ordinal, interval, or ratio. Movie ratings provided by users is [discrete or continuous], [qualitative or quantitative], and [nominal, ordinal, interval, or ratio].

Discrete, Quantitative, Ordinal

Quiz 1 Q 8 Classify the following attributes as: discrete or continuous; qualitative or quantitative; nominal, ordinal, interval, or ratio. Age in years is [discrete or continuous], [qualitative or quantitative], and [nominal, ordinal, interval, or ratio].

Discrete, Quantitative, Ratio

Quiz 6 Q 4 For which application is recall the best metric? Spam detection Medical diagnosis Fraud detection Disease detection

Disease detection

Quiz 7 Q 16 Which technique is used to monitor the performance of a neural network during training and stop the training process when performance stops improving? Weight Initialization Early Stopping Backpropagation Gradient Descent

Early Stopping

Quiz 5 Q 8 Consider a training set with 2 features, X1 and X2, for a binary classification problem. The class distribution is shown in the table below.
X1 || X2 || Number of positive examples || Number of negative examples
1 || 1 || 20 || 8
1 || 0 || 20 || 17
0 || 1 || 5 || 8
0 || 0 || 5 || 17
True or False: Based on the information above, X1 and X2 are independent of each other. Hint: You only need to find one example to prove or disprove the statement.

False. If X1 and X2 are independent, then P(X1, X2) = P(X1) * P(X2) for all values. Test X1 = 1 and X2 = 1: P(X1 = 1, X2 = 1) = (20 + 8)/100 = 0.28, while P(X1 = 1) * P(X2 = 1) = ((20+8) + (20+17))/100 * ((20+8) + (5+8))/100 = 0.65 * 0.41 = 0.2665. Since 0.28 != 0.2665, X1 and X2 are NOT independent of each other.
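
A short Python check of the same test, built from the counts in the question:

# (x1, x2) -> total number of examples (positive + negative)
counts = {(1, 1): 20 + 8, (1, 0): 20 + 17, (0, 1): 5 + 8, (0, 0): 5 + 17}
n = sum(counts.values())  # 100

p_x1 = (counts[(1, 1)] + counts[(1, 0)]) / n   # P(X1 = 1) = 0.65
p_x2 = (counts[(1, 1)] + counts[(0, 1)]) / n   # P(X2 = 1) = 0.41
p_joint = counts[(1, 1)] / n                   # P(X1 = 1, X2 = 1) = 0.28

print(p_joint, p_x1 * p_x2)  # 0.28 vs 0.2665 -> not independent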

Quiz 8 Q 3 What can a high learning rate lead to during training? Increased weight regularization Decreased model complexity Slow convergence Faster convergence and potential overshooting of minima

Faster convergence and potential overshooting of minima

Quiz 7 Q 1 Which algorithm is commonly used for optimizing the weights of a neural network during backpropagation? K-Means Decision Trees Support Vector Machines Gradient Descent

Gradient Descent

Quiz 7 Q 3 In a multi-layer perceptron (MLP), what are the layers other than input and output layers called? Transparent layers Accessible layers Hidden layers Visible layers

Hidden layers

Quiz 8 Q 10 Which of the following can lead to faster convergence during training? Large batch sizes Low momentum Using a constant learning rate High learning rates

High learning rates

Quiz 7 Q 20 Which of the following is a common problem associated with overfitting in neural networks? Underfitting Unstable gradients High bias High variance

High variance

Quiz 5 Q 2 How does Naive Bayes handle missing values in the dataset? Removes the instances with missing values Imputes missing values using a specific strategy Substitutes missing values with the mean Ignores missing values during calculation None of the above

Ignores missing values during calculation

Quiz 7 Q 15 Which of the following is a drawback of using dropout regularization? Increased risk of underfitting Inability to prevent overfitting Decreased model complexity Increased training time

Increased risk of underfitting

Quiz 4 Q 1 What effect does increasing the depth of a decision tree typically have on overfitting? Eliminates overfitting entirely Reduces overfitting Has no effect on overfitting Increases overfitting

Increases overfitting

Quiz 8 Q 15 How does momentum help in training neural networks? It prevents overfitting. It adjusts the batch size. It accelerates updates and smooths the path to convergence. It increases the learning rate over time.

It accelerates updates and smooths the path to convergence.

Quiz 5 Q 7 Which of the following statements is true about the soft-margin SVM classifier? (Soft margin refers to the case where the dataset is not separable.) It allows for misclassification of training examples It does not support nonlinear decision boundaries It has no regularization parameter It has a larger margin compared to the hard-margin classifier

It allows for misclassification of training examples

Quiz 4 Q 6 In k-fold cross-validation, what is the significance of the value of k? It determines the number of folds into which the dataset will be divided. It determines the size of the test set. It determines the size of the training set. It determines the number of times the model will be trained.

It determines the number of folds into which the dataset will be divided.

Quiz 5 Q 5 What is the purpose of the kernel trick in SVM? It efficiently computes dot products in high-dimensional space It prevents the model from overfitting It transforms the data to a lower-dimensional space None of the above It regularizes the model to handle noisy data

It efficiently computes dot products in high-dimensional space

Quiz 4 Q 3 Which of the following is a characteristic of post-pruning? It is computationally expensive It involves removing branches from the tree after it has been fully grown It tends to result in overly complex trees It stops the growth of the tree based on certain conditions during training

It involves removing branches from the tree after it has been fully grown

Quiz 3 Q 1 Which of the following statements about Hunt's algorithm is true? It works best with linearly separable data. It guarantees an optimal decision tree for any dataset. It is a recursive partitioning method that aims to find the best splits in the feature space based on impurity measures. It always produces the same tree structure regardless of the dataset.

It is a recursive partitioning method that aims to find the best splits in the feature space based on impurity measures.

Quiz 4 Q 9 How does increasing the value of K in K-nearest neighbor classification affect the decision boundary? It has no effect on the decision boundary It makes the decision boundary less sensitive to noise It makes the decision boundary more linear It makes the decision boundary more complex

It makes the decision boundary less sensitive to noise

Quiz 4 Q 8 How does the choice of distance metric impact the performance of the K-nearest neighbor algorithm? It only affects the computational efficiency of the algorithm It determines the optimal value of K for the dataset It has no impact on performance It may affect the sensitivity of the algorithm to feature scales

It may affect the sensitivity of the algorithm to feature scales

Quiz 8 Q 11 Why might a large batch size be problematic in training neural networks? It may require more frequent weight updates. It may cause overfitting. It may lead to more noisy updates. It may result in slower convergence for large dataset sizes.

It may result in slower convergence for large dataset sizes.

Quiz 4 Q 4 Which of the following is a characteristic of pre-pruning? It is computationally expensive It tends to result in overly complex trees It stops the growth of the tree based on certain conditions during training It is only applied after the tree has been fully grown

It stops the growth of the tree based on certain conditions during training

Quiz 2 Q 11 Which measure is most appropriate to compare how similar the locations visited by tourists at an amusement park are? Assume the location information is stored as binary yes/no attributes (yes means a location was visited by the tourist and no means a location has not been visited). Jaccard Coefficient Cosine Euclidean Distance Simple Matching Coefficient

Jaccard Coefficient. Places visited by the tourists should play a more significant role in computing similarity than places they did not visit.

Quiz 7 Q 14 Which type of regularization technique encourages sparsity in the neural network by penalizing the absolute values of weights? L2 Regularization Early Stopping Dropout L1 Regularization

L1 Regularization

Quiz 7 Q 4 Which of the following is a commonly used regularization technique for neural networks? Gradient Descent Random Forest L1 Regularization Feature Scaling

L1 Regularization

Quiz 7 Q 7 Which of the following regularization techniques adds a penalty term to the loss function based on the squared magnitude of weights? Dropout L2 Regularization L1 Regularization Early Stopping

L2 Regularization

Quiz 8 Q 4 Which of the following is a challenge when training deep neural networks? Loss of gradient Overparameterization Weight constraints Underfitting

Loss of gradient

Quiz 5 Q 4 SVM aims to find the hyperplane that: Is orthogonal to the axes Maximizes the margin between classes Divides the dataset into multiple hyperplanes None of the above Minimizes the margin between classes

Maximizes the margin between classes

Quiz 3 Q 2 Which of the following is NOT a step in Hunt's algorithm? Select the feature that maximizes information gain. Repeat the process recursively for each subset until stopping criteria are met. Merge the decision tree with a random forest ensemble. Split the dataset into training and test sets.

Merge the decision tree with a random forest ensemble. Merging the decision tree with a random forest ensemble is not a step in Hunt's algorithm.

Quiz 3 Q 3 What is the main objective of the Hunt's algorithm? Minimize tree depth Minimize impurity in tree nodes Maximize tree complexity Minimize information gain at each split

Minimize impurity in tree nodes. The main objective of Hunt's algorithm is to minimize impurity at each node of the decision tree, usually by maximizing information gain or minimizing Gini impurity or entropy.

Quiz 6 Q 15
Model A:
|| Predicted (Negative) || Predicted (Positive)
Actual (Negative) || 800 || 100
Actual (Positive) || 20 || 80
Model B:
|| Predicted (Negative) || Predicted (Positive)
Actual (Negative) || 750 || 150
Actual (Positive) || 10 || 90
Which of the following statements is correct when comparing the performance of Model A and Model B? Model A has higher precision and higher recall compared to Model B Model A has lower precision and lower recall compared to Model B Model A has higher precision but lower recall compared to Model B Model A has lower precision but higher recall compared to Model B

Model A has higher precision but lower recall compared to Model B. Model A: Precision = 80/(80+100) = 0.44, Recall = 80/(80+20) = 0.80. Model B: Precision = 90/(90+150) = 0.375, Recall = 90/(90+10) = 0.90.
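
A minimal Python check of those four numbers:

def precision_recall(tn, fp, fn, tp):
    # precision = TP / (TP + FP), recall = TP / (TP + FN)
    return tp / (tp + fp), tp / (tp + fn)

print(precision_recall(tn=800, fp=100, fn=20, tp=80))  # Model A: (0.444..., 0.8)
print(precision_recall(tn=750, fp=150, fn=10, tp=90))  # Model B: (0.375, 0.9)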

Quiz 2 Q 8 Which one of the following is NOT a property of a metric measure? Symmetry Negativity Triangle Inequality Positivity

Negativity

Quiz 1 Q 9 State the type of each attribute given below before and after we have performed the following transformation. Hair color of a person is mapped to the following values: black = 0, brown = 1, red = 2, blonde = 3, grey = 4, white = 5. nominal ordinal interval ratio

Nominal. The attribute remains nominal both before and after the mapping; assigning integer codes to hair colors does not introduce an ordering.

Quiz 2 Q 9 Suppose you are given census data, where every data object corresponds to a household and the following continuous attributes are used to characterize each household: total household income, number of household residents, property value, number of bedrooms, and number of vehicles owned. Suppose we are interested in clustering the households based on these attributes. Which one of the proximity measures is best suited for the above problem? None of the proximity measures are suitable unless we standardize the data first. Euclidean Distance Cosine Correlation

None of the proximity measures are suitable unless we standardize the data first.

Quiz 3 Q 6 For continuous attributes, you want to choose the splitting value, i.e., v. One approach is to scan the database to gather the count matrix and compute its Gini index. What is the running time of this approach? Consider n to be the number of splitting values. O(2^n) O(n) O(n^2) O(n log n)

O(n^2). For each candidate v, the data set is scanned once to count the number of records with annual income less than or greater than v. We then compute the Gini index for each candidate and choose the one that gives the lowest value. This approach is computationally expensive because it requires O(n) operations to compute the Gini index at each candidate split position. Since there are n candidates, the overall complexity of this task is O(n^2).
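
A brute-force sketch of that scan (hypothetical helper code, just to make the counting argument concrete): the outer loop visits each of the n candidate values and the inner loop re-scans all records, giving O(n^2) work overall.

def gini(class_counts):
    # Gini index of a node from its class counts, e.g. {'+': 3, '-': 5}
    total = sum(class_counts.values())
    return 1 - sum((c / total) ** 2 for c in class_counts.values()) if total else 0.0

def best_split(values, labels):
    n = len(values)
    best_v, best_g = None, float('inf')
    for v in sorted(set(values)):              # O(n) candidate split values
        left, right = {}, {}
        for x, y in zip(values, labels):       # O(n) scan per candidate
            side = left if x <= v else right
            side[y] = side.get(y, 0) + 1
        g = (sum(left.values()) / n) * gini(left) + (sum(right.values()) / n) * gini(right)
        if g < best_g:
            best_v, best_g = v, g
    return best_v, best_g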

Quiz 6 Q 3 F1 Score is the harmonic mean of: Precision and Recall Sensitivity and Specificity Accuracy and Recall Precision and Specificity

Precision and Recall

Quiz 8 Q 7 Which of the following is true about parameter initialization in neural networks? Parameter initialization has no impact on model performance. Improper initialization can lead to faster convergence. Zero initialization works best in most deep learning models. Proper initialization can help avoid vanishing and exploding gradients.

Proper initialization can help avoid vanishing and exploding gradients.

Quiz 4 Q 2 Which of the following techniques can be used to combat overfitting in decision trees? Pruning the tree Using a smaller training dataset Increasing the maximum depth of the tree Adding more noise to the training data

Pruning the tree

Quiz 8 Q 1 Which optimization algorithm only relies on the average of past gradients to smooth out parameter updates? AdaGrad AdaBoost RMSProp Adam

RMSProp

Quiz 6 Q 2 In which one of the following methods do we create a group of classifiers by manipulating the input features? Boosting Random forest Bagging error-correcting output coding

Random forest

Quiz 6 Q 1 Bagging is a technique used in ensemble learning primarily for: Reducing variance Increasing interpretability Reducing bias Improving accuracy

Reducing variance

Quiz 8 Q 9 How does Stochastic Gradient Descent (SGD) differ from batch gradient descent? SGD uses the entire dataset for each update. SGD does not adjust the learning rate. SGD uses a subset of data points for each update. SGD's convergence trajectory is less noisy than batch gradient descent.

SGD uses a subset of data points for each update.

Quiz 7 Q 9 Which activation function is commonly used in the output layer of a binary classification neural network? ReLU Sigmoid Softmax Tanh

Sigmoid

Quiz 8 Q 14 How can batch size affect the stability and performance of a neural network during training? Larger batches allow more frequent parameter updates Larger batches may reduce generalization capability. Smaller batches tend to increase the learning rate. Smaller batches generally lead to less stable and noisier updates.

Smaller batches generally lead to less stable and noisier updates.

Quiz 6 Q 8 Which metric measures the proportion of true negatives correctly identified? Precision F1 Score Recall Specificity

Specificity

Quiz 2 Q 3 Consider a data set from an online social media Web site that contains information about the age and number of friends for 5,000 users. Suppose the number of friends for each user is known. However, only 4,000 out of 5,000 users provide their age information. The average age of the 4,000 users is 30 years old. If you replace the missing values for age with the value 30, will the average age computed for the 5,000 users increase, decrease, or stay the same (as 30)? Stays the same! Decrease Increase Cannot be determined!

Stays the same!

Quiz 8 Q 13 Which method updates model parameters using a randomly sampled subset of data in each iteration? Stochastic Gradient Descent (SGD) Newton's Method Batch Gradient Descent Coordinate Descent

Stochastic Gradient Descent (SGD)

Quiz 6 Q 10 To draw an ROC graph, which metrics are used? TNR and FNR TPR and FNR TNR and FPR TPR and FPR

TPR and FPR (True Positive Rate and False Positive Rate)

Quiz 6 Q 11 Consider a binary classification model with the following ROC curve: Which of the following statements is true regarding the performance of the model (i.e., red curve) based on the ROC curve? The model's performance is better than random guessing The model has a sensitivity of 0.5 The model's performance is worse than random guessing The model has perfect discrimination between positive and negative classes

The model's performance is better than random guessing

Quiz 5 Q 1 What assumption does Naive Bayes make about the features? They are conditionally independent They are dependent None of the above They are normally distributed They are uncorrelated

They are conditionally independent

Quiz 5 Q 3 How are conditional dependencies represented in a BBN? Through directed edges Through bidirectional edges None of the above Through undirected edges Through nodes

Through directed edges

Quiz 8 Q 8 What is the role of momentum in optimization algorithms like SGD with momentum? To decrease the learning rate. To maintain a constant learning rate. To reduce the weight decay. To accelerate updates and dampen oscillations.

To accelerate updates and dampen oscillations.

Quiz 8 Q 5 What is the purpose of learning rate in optimization algorithms? To measure the loss function. To adjust the step size based on the value of the gradient. To determine the number of layers in the network. To control the model complexity.

To adjust the step size based on the value of the gradient.

Quiz 7 Q 10 In the context of neural networks, what is the purpose of the learning rate? To define the size of the input data To control the amount by which weights are updated during training To measure the complexity of the model To determine the number of layers in the network

To control the amount by which weights are updated during training

Quiz 7 Q 5 What is the primary objective of regularization techniques in deep learning? To overfit the training data To increase model complexity To decrease model complexity To fit training data perfectly

To decrease model complexity

Quiz 7 Q 2 What is the purpose of the activation function in a neural network? To normalize the input data To introduce non-linearity To calculate the loss function To control the learning rate

To introduce non-linearity

Quiz 7 Q 17 What is the purpose of the loss function in a neural network? To determine the number of layers in the network To initialize the weights To calculate the accuracy of the model To measure the difference between predicted and actual values

To measure the difference between predicted and actual values

Quiz 7 Q 12 What is the purpose of the softmax function in the output layer of a multi-class classification neural network? To introduce non-linearity To prevent overfitting To normalize the output probabilities To calculate the loss function

To normalize the output probabilities

Quiz 7 Q 8 What is the purpose of dropout regularization in neural networks? To decrease the learning rate To increase the number of neurons in the network To increase the weight regularization To randomly remove neurons during training to prevent overfitting

To randomly remove neurons during training to prevent overfitting

Quiz 4 Q 5 What is the purpose of a validation set in machine learning? To test the model's performance To evaluate the model on unseen data after training To train the model To tune hyperparameters and assess model performance during training

To tune hyperparameters and assess model performance during training

Quiz 6 Q 12 Consider a binary classification problem with the following confusion matrix:
|| Predicted (Negative) || Predicted (Positive)
Actual (Negative) || 850 || 50
Actual (Positive) || 30 || 70
What is the precision of the model?

0.5833 Precision = True Positives / (True Positives + False Positives) = 70/(70 + 50) = 0.5833
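
The same confusion matrix is reused in the recall and F1 questions below, so one short sketch verifies all three values:

tn, fp, fn, tp = 850, 50, 30, 70

precision = tp / (tp + fp)                          # 70/120 = 0.5833
recall = tp / (tp + fn)                             # 70/100 = 0.7
f1 = 2 * precision * recall / (precision + recall)  # about 0.6364
print(precision, recall, f1)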

Quiz 7 Q 19 In which phase of training does the model update its weights based on the calculated gradients? Initialization phase Evaluation phase Forward pass Backward pass

Backward pass

Quiz 6 Q 9 Which ensemble learning method trains multiple models on different subsets of the data and averages their predictions? Bagging Stacking Boosting Blending

Bagging

Quiz 2 Q 5 Consider the following binary vectors: x1 = (1, 1, 1, 1, 1) x2 = (1, 1, 1, 0, 0) What is the Jaccard coefficient for the pair of vectors x1 and x2?

0.6 Jaccard coefficient = f11/(f01 + f10 + f11) = 3/(0 + 2 + 3) = 0.6, where f11 = number of positions with x1 = 1 and x2 = 1 (here 3), f10 = positions with x1 = 1 and x2 = 0 (here 2), f01 = positions with x1 = 0 and x2 = 1 (here 0), and f00 = positions with x1 = 0 and x2 = 0 (here 0).

Quiz 2 Q 6 Consider the following binary vectors: x1 = (1, 1, 1, 1, 1) x2 = (1, 1, 1, 0, 0) What is the Simple Matching Coefficient for the pair of vectors x1 and x2?

0.6 SMC = (f00 + f11)/(f00 + f01 + f10 + f11) = (0 + 3)/(0 + 0 + 2 + 3) = 0.6, using the same match counts as above (f11 = 3, f10 = 2, f01 = 0, f00 = 0).
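
A small sketch that computes the match counts and both coefficients for these two vectors (it covers Q5 and Q6):

x1 = [1, 1, 1, 1, 1]
x2 = [1, 1, 1, 0, 0]

f11 = sum(a == 1 and b == 1 for a, b in zip(x1, x2))  # 3
f10 = sum(a == 1 and b == 0 for a, b in zip(x1, x2))  # 2
f01 = sum(a == 0 and b == 1 for a, b in zip(x1, x2))  # 0
f00 = sum(a == 0 and b == 0 for a, b in zip(x1, x2))  # 0

jaccard = f11 / (f01 + f10 + f11)            # 3/5 = 0.6
smc = (f00 + f11) / (f00 + f01 + f10 + f11)  # 3/5 = 0.6
print(jaccard, smc)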

Quiz 3 Q 7 Consider a dataset with the following target classes and corresponding frequencies: Class A: 20 instances Class B: 30 instances Class C: 50 instances Calculate the Gini index for this dataset.

0.62 The Gini index of the data set is computed from the class proportions: Gini = 1 - (20/100)^2 - (30/100)^2 - (50/100)^2 = 1 - 0.04 - 0.09 - 0.25 = 0.62
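
The same calculation in a couple of lines of Python:

class_counts = {'A': 20, 'B': 30, 'C': 50}
total = sum(class_counts.values())
gini = 1 - sum((c / total) ** 2 for c in class_counts.values())
print(gini)  # 0.62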

Quiz 6 Q 14 Consider a binary classification problem with the following confusion matrix:
|| Predicted (Negative) || Predicted (Positive)
Actual (Negative) || 850 || 50
Actual (Positive) || 30 || 70
What is the F1 score of the model?

0.6364 F1 Score = 2 * ((Precision * Recall)/(Precision + Recall)) = 2 * ((0.5833 * 0.7)/(0.5833 + 0.7)) = 0.6364

Quiz 6 Q 13 Consider a binary classification problem with the following confusion matrix:
|| Predicted (Negative) || Predicted (Positive)
Actual (Negative) || 850 || 50
Actual (Positive) || 30 || 70
In the same binary classification problem, what is the recall (sensitivity) of the model?

0.7 Recall = TP / (TP + FN) = 70 / (70 + 30) = 0.7

Quiz 1 Q 4 We have Ambulatory Medical Care data, which contains the demographic and medical visit information for each patient (e.g., gender, age, duration of visit, physician's diagnosis, symptoms, medication, etc.). Determine which machine learning task the following task belongs to: Identify the symptoms and medical conditions that co-occur together frequently Anomaly detection Association rule mining Clustering Classification

Association rule mining

Quiz 6 Q 5 A confusion matrix is used to: Visualize the performance of a classification model Measure the goodness of fit of a time series model Evaluate the performance of a clustering algorithm Compute the accuracy of a regression model

Visualize the performance of a classification model

Quiz 7 Q 11 Which of the following is NOT a step in the backpropagation algorithm? Forward Pass Gradient Descent Error Computation Weight Initialization

Weight Initialization

Quiz 3 Q 8 Consider the following data set that contains 60 training examples (30 labeled as positive class while the remainder are labeled as negative class).
X || Y || No. of + Examples || No. of - Examples
1 || 1 || 10 || 0
1 || 0 || 20 || 0
0 || 1 || 0 || 15
0 || 0 || 0 || 15
Based on the dataset above, attribute [X, Y] should be selected as the splitting attribute based on the Gini index. The Gini index value for attribute X is ___ while for attribute Y is ___.

X, 0, 0.47 If we choose X, the two child nodes are:
X || N1 || N2
+ || 30 || 0
- || 0 || 30
If we choose Y, we get:
Y || N1 || N2
+ || 10 || 15
- || 20 || 15
The Gini index is lower when splitting on X. The Gini index for attribute X is 0 (easy to show). For attribute Y it is calculated as follows: Gini(N1) = 1 - ((10/30)^2 + (20/30)^2) = 1 - (1/9 + 4/9) = 4/9; Gini(N2) = 1 - ((15/30)^2 + (15/30)^2) = 1 - (1/4 + 1/4) = 1/2. Weighted sum: (30/60)*(4/9) + (30/60)*(1/2) = 17/36 ≈ 0.47.
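
A short sketch of the weighted-Gini arithmetic for the Y split, using the child-node counts exactly as tabulated above:

def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

n1, n2 = [10, 20], [15, 15]   # (+, -) counts in the two child nodes for the Y split, as in the table above
n = sum(n1) + sum(n2)         # 60 examples in total
weighted = (sum(n1) / n) * gini(n1) + (sum(n2) / n) * gini(n2)
print(weighted)  # 17/36, approximately 0.47 (the X split gives a Gini of 0)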

