Machine Learning Algorithms: Supervised Learning Tip to Tail


The magnitude of weights in the loss function

L1 and L2 regularizers penalize: (a) The magnitude of weights in the loss function; (b) The magnitude of training data; (c) The distance between the line and the training data; (d) The lambda parameter.
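
To make the regularizer card concrete, here is a minimal sketch of a regularized squared loss for a linear model. The function name `regularized_loss` is hypothetical, and so is the choice to leave the bias unpenalized; this is not the course's own code. The point it shows is that the penalty term depends only on the weights, scaled by lambda (`lam`).

```python
def regularized_loss(weights, bias, xs, ys, lam, penalty="l2"):
    """Mean squared error of a linear model plus a lam-scaled L1 or L2 penalty on the weights."""
    n = len(xs)
    # Data term: average squared error of (w . x + b) against each target.
    mse = sum(
        (sum(w * xi for w, xi in zip(weights, x)) + bias - y) ** 2
        for x, y in zip(xs, ys)
    ) / n
    # Penalty term: depends only on the weights (the bias is left unpenalized here).
    if penalty == "l2":
        reg = sum(w * w for w in weights)
    elif penalty == "l1":
        reg = sum(abs(w) for w in weights)
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    return mse + lam * reg


# A perfect fit still pays for the size of its weights.
xs, ys = [[1.0], [2.0]], [2.0, 4.0]
print(regularized_loss([2.0], 0.0, xs, ys, lam=0.5, penalty="l2"))  # 2.0
print(regularized_loss([2.0], 0.0, xs, ys, lam=0.5, penalty="l1"))  # 1.0
```

Raising `lam` makes large weights more expensive, which is how the regularizer pushes the learner toward simpler hypotheses.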

By adding non-linear feature expansions.

How do you add model complexity to a linear model? (a) By adding features that are weighted sums of other features; (b) By adding features at random to see what works; (c) By adding non-linear feature expansions; (d) You can't, linear models are always linear models.
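
A small sketch of a non-linear feature expansion follows. It assumes a single scalar input and uses a polynomial expansion, which is only one of many possible expansions. The function names are hypothetical. The model stays linear in its weights, but its predictions trace a curve in the original input.

```python
def polynomial_features(x, degree):
    """Expand a scalar x into [x, x**2, ..., x**degree]."""
    return [x ** d for d in range(1, degree + 1)]


def predict(weights, bias, x, degree):
    """Linear in the weights, but non-linear in the original input x."""
    features = polynomial_features(x, degree)
    return bias + sum(w * f for w, f in zip(weights, features))


# With weights [0, 1] on [x, x**2], the "linear" model traces the parabola x**2.
print([predict([0.0, 1.0], 0.0, x, degree=2) for x in (-2, -1, 0, 1, 2)])
# [4.0, 1.0, 0.0, 1.0, 4.0]
```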

For the learning phase, you use "fit" and for the operational phase you use "predict".

How do you build and use classifiers in scikit-learn? (a) For the learning phase you use "classify" and for the operational phase you use "label"; (b) Pass the training data to the classifier constructor and then use "predict"; (c) For the learning phase, you use "fit" and for the operational phase you use "predict"; (d) For the learning phase you use "label" and for the operational phase you use "classify"; (e) For the learning phase you use "predict", for the operational phase you use "fit".
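
The fit/predict convention can be illustrated without scikit-learn itself. The class below is a hypothetical toy nearest-centroid classifier, written only to mirror the estimator pattern: `fit` learns from labelled data and returns `self`, and `predict` labels new examples. scikit-learn ships its own `NearestCentroid` in `sklearn.neighbors`, and this sketch is not that implementation.

```python
class ToyNearestCentroid:
    """Hypothetical classifier that follows the fit/predict naming convention."""

    def fit(self, X, y):
        # Learning phase: average the feature vectors of each class.
        sums, counts = {}, {}
        for x, label in zip(X, y):
            if label not in sums:
                sums[label] = [0.0] * len(x)
                counts[label] = 0
            sums[label] = [s + v for s, v in zip(sums[label], x)]
            counts[label] += 1
        self.centroids_ = {
            label: [s / counts[label] for s in total] for label, total in sums.items()
        }
        return self  # returning self allows clf.fit(X, y).predict(X_new)

    def predict(self, X):
        # Operational phase: assign each example to the closest class centroid.
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        return [
            min(self.centroids_, key=lambda label: sq_dist(x, self.centroids_[label]))
            for x in X
        ]


X_train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y_train = ["a", "a", "b", "b"]
clf = ToyNearestCentroid().fit(X_train, y_train)
print(clf.predict([[0.2, 0.3], [5.5, 4.9]]))  # ['a', 'b']
```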

The training error is low but the test error is high.

How do you know when your learning algorithm has overfit a model? (a) The test error is low; (b) The training error is low; (c) The performance is good; (d) The training error is low but the test error is high; (e) The training error and test error are high.

Set the derivative of the loss function equal to zero and solve; Use an iterative method like gradient descent.

How might a learning algorithm find a best line? (a) Set the derivative of the loss function equal to zero and solve; (b) Plot all possible lines and pick the one that looks best; (c) Brute force search; (d) Use an iterative method like gradient descent; (e) Trial and error.
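
Both strategies from this card can be sketched for a one-feature line y = w·x + b under mean squared error, assuming plain Python lists. The closed form comes from setting the derivatives of the loss to zero. The loop instead takes repeated gradient steps. The function names, learning rate and step count are illustrative choices that happen to converge on this small example; they are not tuned defaults.

```python
def fit_closed_form(xs, ys):
    """Least-squares slope and intercept from setting the loss derivatives to zero."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    return w, mean_y - w * mean_x


def fit_gradient_descent(xs, ys, lr=0.05, steps=5000):
    """Iteratively step against the gradient of the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]  # exactly y = 2x + 1
print(fit_closed_form(xs, ys))  # (2.0, 1.0)
print(tuple(round(v, 6) for v in fit_gradient_descent(xs, ys)))  # (2.0, 1.0)
```

Both reach the same line here because the squared-error loss is convex. Gradient descent becomes the practical option when no closed form exists or when the data set is too large to solve directly.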

AC = 10 units

If we are measuring the distance between three points (A, B, and C), and the distance from A to B is 5 units and the distance from B to C is 6 units, what else might be true? (a) CA = 12 units; (b) CB = 4 units; (c) AC = 10 units; (d) AB = -8 units; (e) Nothing, because it depends on what distance function you're using.
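
The reasoning behind this card can be written as a one-line bound. It assumes the distance is a metric: non-negative, symmetric, and obeying the triangle inequality. Under that assumption AC must lie between |AB - BC| and AB + BC. With AB = 5 and BC = 6 the range is 1 to 11, so:

- AC = 10 is possible.
- CA = 12 is too long.
- AB = -8 is negative.
- CB = 4 contradicts BC = 6 under symmetry.

The helper name is hypothetical.

```python
def third_distance_range(ab, bc):
    """Bounds a metric places on AC, given AB and BC (triangle inequality)."""
    if ab < 0 or bc < 0:
        raise ValueError("metric distances are never negative")
    return abs(ab - bc), ab + bc


low, high = third_distance_range(5, 6)
print(low, high)  # 1 11
for candidate in (10, 12):
    print(candidate, low <= candidate <= high)  # prints "10 True", then "12 False"
```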

A horizontal line

In a two-dimensional graph, what line has a slope of zero? (a) A horizontal line; (b) A line that goes up and to the right; (c) A vertical line.

Low bias, high variance

Overfitting usually means: (a) Overfitting doesn't have anything to do with bias and variance; (b) High bias, low variance; (c) Low bias, low variance; (d) High bias, high variance; (e) Low bias, high variance.

Generalization ability goes up until it starts overfitting, and then it goes down.

Given the same data, what does increasing model complexity do for the ability of the final QuAM to generalize? (a) Generalization ability always increases; (b) Generalization ability always decreases; (c) Generalization ability goes up until it starts overfitting, and then it goes down; (d) Generalization ability goes up until it starts underfitting, and then it goes down.

Penalizes incorrectly labelled data points that are further from the decision boundary more than those that are closer; Is a convex surrogate for misclassification loss

Hinge loss... (a) Penalizes incorrectly labelled data points that are further from the decision boundary more than those that are closer; (b) Changes angles for different points; (c) Is a convex surrogate for misclassification loss; (d) Is a kind of regression loss; (e) Is a differentiable surrogate for misclassification loss.
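
A minimal sketch comparing hinge loss with the 0-1 (misclassification) loss, assuming labels in {-1, +1} and a raw linear score. Both misclassified points below cost 1 under 0-1 loss. Hinge loss charges the one further from the boundary more. Hinge loss is convex (a maximum of linear functions), but it has a kink where y·score = 1. That kink is why "differentiable surrogate" does not describe it.

```python
def hinge_loss(y, score):
    """y is +1 or -1; score is the raw output of the linear function."""
    return max(0.0, 1.0 - y * score)


def zero_one_loss(y, score):
    """Misclassification loss: 1 for a wrong sign, 0 otherwise (score 0 counts as wrong)."""
    return 0.0 if y * score > 0 else 1.0


# Two misclassified positives: one just past the boundary, one far from it.
for score in (-0.5, -2.0):
    print(score, zero_one_loss(1, score), hinge_loss(1, score))
# -0.5 1.0 1.5
# -2.0 1.0 3.0
```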

Converts the output of a regression function to a class label

What does a transfer function do? (a) Returns the sign of the output of a regression function; (b) Converts the output of a regression function to a class label; (c) Translates an example from one class to another; (d) It depends, what do you want it to do?; (e) Lets you use regression for classification.
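
Two common transfer functions are sketched below. One is a sign/step function that maps a regression score to {-1, +1}. The other is a logistic squash followed by a 0.5 threshold, which maps to {0, 1}. Sending a score of exactly 0 to the positive class is an arbitrary tie-breaking choice made here. The function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def step_transfer(score):
    """Map a real-valued regression output to a label in {-1, +1}; 0 maps to +1 by choice."""
    return 1 if score >= 0 else -1


def logistic_transfer(score, threshold=0.5):
    """Squash the score into (0, 1), then threshold it into a label in {0, 1}."""
    return 1 if sigmoid(score) >= threshold else 0


for s in (-3.0, -0.1, 0.7):
    print(s, step_transfer(s), logistic_transfer(s))
# -3.0 -1 0
# -0.1 -1 0
# 0.7 1 1
```

With the default threshold of 0.5 both functions split at a score of 0. The logistic version also exposes a probability-like value, which is what logistic regression optimizes.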

Marked wrong: Transfer functions break loss functions; Regression doesn't work for binary values; Classifications are categorical rather than numeric values.

What's the problem with doing regression to find numeric class labels directly? (a) Classification isn't convex; (b) You can't actually convert class labels to numbers; (c) Transfer functions break loss functions; (d) Regression doesn't work for binary values; (e) It just works better to separate classes; (f) Classifications are categorical rather than numeric values.

The model is performing worse than random guessing

If you have an ROC curve with an AUC value of 0.4, what would this indicate? (a) The model is misclassifying everything; (b) The model is performing worse than random guessing; (c) The model is as good as making random guesses; (d) The model is performing well.
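
AUC can be read as a probability: the chance that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counting half. Random guessing gives 0.5, so 0.4 means the ranking is slightly worse than chance. The sketch below computes AUC by brute-force pairwise comparison. That is fine for small examples, but it is quadratic in the data size, and the scores are made up for illustration.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; labels are 1 or 0."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


scores = [0.5, 0.1, 0.2, 0.6, 0.7, 0.8]
labels = [1, 0, 0, 0, 0, 0]
print(auc(scores, labels))  # 0.4: the positive outranks only 2 of 5 negatives
print(auc([-s for s in scores], labels))  # 0.6: reversing the ranking gives 1 - AUC here
```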

The set of flat hyperplanes; The set of straight lines

What is the hypothesis space of linear regression? (a) The set of straight lines; (b) The set of flat hyperplanes; (c) The best-fit line; (d) All hypotheses that give numbers instead of classes; (e) The set of curved lines.

When your dataset is small and you want to use as much of it as possible for training and validation

Under what circumstances would you use cross-validation? (a) When your dataset is small and you want to use as much of it as possible for training and validation; (b) When your dataset is large and you don't care if you waste more data; (c) Because you don't need test data; (d) You never cross-validate, because cross-validation is a myth.
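
Here is a minimal sketch of how k-fold cross-validation reuses a small dataset. Every example lands in the validation fold exactly once and is used for training in the other k - 1 rounds. The generator name is hypothetical. It does not shuffle, so shuffle independent (non-time-series) data beforehand. In scikit-learn, `sklearn.model_selection.KFold` provides this splitting.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs; each of the n examples is validated exactly once."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


for train, val in k_fold_indices(7, 3):
    print(val, train)
# [0, 1, 2] [3, 4, 5, 6]
# [3, 4] [0, 1, 2, 5, 6]
# [5, 6] [0, 1, 2, 3, 4]
```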

Model complexity; Randomness in training data; An overly simple hypothesis space

The bias/variance tradeoff is impacted by (select all that apply): (a) Bad dart throwers; (b) Model complexity; (c) Randomness in training data; (d) An overly simple hypothesis space.

The number needs to be chosen carefully when there are three or more classes; The number shouldn't be too big, to prevent influence from very dissimilar points; The number shouldn't be too small, to prevent influence from local, minute variation.

What do you need to keep in mind when picking a "k" for k-Nearest Neighbours? (a) The number should be large to prevent bias; (b) The number doesn't matter that much and you can use whatever you feel like; (c) The number needs to be chosen carefully when there are three or more classes; (d) The number shouldn't be too small, to prevent influence from local, minute variation; (e) The number shouldn't be too big, to prevent influence from very dissimilar points; (f) The number should be odd to prevent ties; (g) The number should be four.
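
A minimal k-nearest-neighbours sketch shows how the choice of k changes the answer. It uses squared Euclidean distance and a plain majority vote; the function name and data are illustrative. The query point sits next to a small "b" cluster. A small k follows that local structure, while k = 5 lets the three more distant "a" points outvote it.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority label among the k training points closest to x."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    order = sorted(range(len(train_X)), key=lambda i: sq_dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


train_X = [[0.0], [1.0], [2.0], [10.0], [11.0]]
train_y = ["a", "a", "a", "b", "b"]
for k in (1, 3, 5):
    print(k, knn_predict(train_X, train_y, [9.0], k))
# 1 b
# 3 b
# 5 a
```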

It has captured details in the training data that are irrelevant to the question.

What does it mean if your model has overfit the data? (a) It has memorized the correct answers to the test; (b) It has captured details in the test data that are irrelevant to the question; (c) It has captured details in the training data that are irrelevant to the question; (d) It hasn't captured enough detail from the training data about the question; (e) It hasn't captured enough detail from the test data about the question.

The loss function used for optimization is convex with respect to the model function.

What does it mean to have a matching loss? (a) The loss function and model function are both linear; (b) The loss function used for optimization is convex with respect to the model function; (c) The loss function and model function share the same hypothesis space; (d) The loss function looks good in a graph.

The line separating one class from another

What is a decision boundary? (a) The line separating one class from another; (b) The function that returns the correct class for a given example; (c) The border at which you must choose your destiny; (d) The function that chooses the best action.

A set of hypotheses that might answer a given question.

What is a hypothesis space? (a) All hypotheses that are supported by the evidence; (b) A set of hypotheses that might answer a given question; (c) A set of hypotheses that might answer any question; (d) Another name for the scientific method; (e) The offset typesetters use for an M-dash.

Marked wrong: It is a way of mathematically measuring the number and magnitude of the errors made by a specific hypothesis; It is a way of mathematically measuring the number and magnitude of the correct predictions made by a specific hypothesis.

What is a loss function? (a) A function that decreases in value every time a mistake is made; (b) It is a way of mathematically measuring the errors made by a specific hypothesis; (c) A function that quantifies financial loss; (d) It is a way of mathematically measuring the number and magnitude of the correct predictions made by a specific hypothesis; (e) It is a way of mathematically measuring the number and magnitude of the errors made by a specific hypothesis; (f) It is a way of mathematically measuring the number and magnitude of the errors made by the best hypothesis.

They all output class labels; They all use optimization; They all use linear functions

What is common between the regression-based classification algorithms in this module, specifically Logistic Regression, Neural Networks, and SVMs? Select all that apply. (a) They all output class labels; (b) They all use the L2 norm; (c) They all use optimization; (d) They all use a regularized loss function; (e) They all use linear functions.

Create the split that makes the biggest difference in the resulting data set.

What is the decision tree learning algorithm trying to do at each node in the tree? (a) Split the data to achieve complete separation in nodes; (b) Create the split that minimizes the difference in the resulting sets; (c) Create the split that makes the biggest difference in the resulting data set; (d) Find a binary question that tells you whether an email is spam.
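
One common way to make "the split that makes the biggest difference" precise is to pick the split that most reduces Gini impurity. The course may use a different criterion, such as information gain from entropy. This is a sketch for a single numeric feature, trying midpoints between neighbouring values; the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity: 0 for a pure node, larger when classes are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(xs, ys):
    """Return (threshold, impurity decrease) for the best split x <= t versus x > t."""
    parent = gini(ys)
    n = len(ys)
    best = (None, 0.0)
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        if parent - child > best[1]:
            best = (t, parent - child)
    return best


print(best_threshold([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # (6.5, 0.5)
```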

For any two points on a graph, the line connecting the points is on or above the line of the graph.

What is the definition of a convex function? (a) A function with neither local minima nor global minima; (b) For any two points on a graph, the line connecting the points is on or above the line of the graph; (c) For any two points on a graph, the line connecting the points is on or below the line of the graph; (d) A function with both local minima and global minima.

True Positives / (True Positives + False Negatives)

What is the formula for the Recall measure? (a) True Positives / (True Positives + False Positives); (b) True Negatives / (True Negatives + False Positives); (c) True Positives / (True Positives + False Negatives); (d) (True Positives + True Negatives) / Total examples.
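
A minimal sketch computing recall next to precision from the confusion-matrix counts shows the difference between the first and third options above. It assumes binary labels with 1 as the positive class, and it reports 0.0 when a denominator is zero, which is one convention among several.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# 3 actual positives, of which only 1 is found (TP = 1, FN = 2); 1 false alarm (FP = 1).
p, r = precision_recall([1, 1, 1, 0, 0], [1, 0, 0, 1, 0])
print(p, round(r, 3))  # 0.5 0.333
```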

Number of misclassifications

What is the most important source of penalty when optimizing for classification? (a) Magnitude of errors; (b) Magnitude of misclassifications; (c) Number of misclassifications; (d) Distance between points in the same class; (e) Direction of misclassifications.

They penalize model complexity

What is the point of regularizers? (a) They penalize model inaccuracy; (b) They penalize model complexity; (c) They make a loss function convex; (d) They fix the mistakes in training data.

The line is the same distance from the closest training example from each class.

What is true of the line found by hard-margin SVM on linearly separable data? (a) It's always a two-dimensional line; (b) The line is the same distance to multiple training examples from all the classes; (c) It minimizes misclassifications; (d) The line is the same distance from the closest training example from each class.

Labels are categories; Labels form an unordered set

What makes classification different from regression? Select all that apply. (a) Labels are categories; (b) Labels form an unordered set; (c) Regression builds a QuAM; (d) Labels must be supplied by a human supervisor; (e) Classification does not require labels.

I don't know; don't worry, you'll pass with that anyway.

What method does scikit-learn use to find the best classification hypothesis for the training data?

svm

What package must you import to build Support Vector Machines in scikit-learn? (a) svm; (b) svc; (c) linear_model; (d) ensemble; (e) tree.

When the classes are linearly separable

When can you use the perceptron classifier? (a) When you're classifying observations; (b) When you are using a neural network; (c) It depends; (d) When the optimization function is differentiable; (e) When the classes are linearly separable; (f) Whenever you feel like it; (g) When the decision boundary is flat.
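
Here is a minimal sketch of the classic perceptron update rule, assuming labels in {-1, +1}. The perceptron convergence theorem guarantees that it stops making mistakes only when the classes are linearly separable. On non-separable data it keeps updating, which is why this sketch caps the number of passes with a hypothetical `epochs` parameter.

```python
def train_perceptron(X, y, epochs=100):
    """Return (weights, bias); stops early after a pass with no mistakes."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if label * score <= 0:  # wrong side of (or on) the boundary
                w = [wi + label * xi for wi, xi in zip(w, x)]
                b += label
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


X = [[2.0, 2.0], [3.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
print(w, b)  # [2.0, 2.0] 1.0
```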

Validation dataset

Which dataset would you use for hyperparameter tuning? (a) Training dataset; (b) Validation dataset; (c) Test dataset; (d) All the learning data.

Character recognition using 16x16 scans of characters, with one thousand training examples.

Which of the following is the best use case for SVMs? (a) Sentiment analysis using a dataset of one million training examples, each consisting of five features; (b) Character recognition using 16x16 scans of characters, with one thousand training examples; (c) Earthquake detection with 100 time-series features and examples from two stations; (d) Classifying mushrooms as poisonous or edible, using a dataset of one thousand examples, with the GPS coordinates each was found in, the shape of the cap, colour, and density.

Remove all the temporal dependencies by adding more features and then randomly split the data; Use the first x% of your chronologically ordered data as train data and test on the remaining data

Which of the following statements are true regarding splitting time series data into train and test data? (a) You can randomly split your dataset into train and test data; (b) For time series data, it isn't necessary to split into train and test data; (c) Remove all the temporal dependencies by adding more features and then randomly split the data; (d) Use the first x% of your chronologically ordered data as train data and test on the remaining data.
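
The chronological option can be sketched in a few lines. It assumes the records are already sorted oldest-first; the function name and the default fraction are illustrative. Every test record comes after every training record, so the model is never evaluated on the past after seeing the future. scikit-learn's `sklearn.model_selection.TimeSeriesSplit` offers a multi-fold version of the same idea.

```python
def chronological_split(records, train_fraction=0.8):
    """Split records that are already sorted oldest-first into (train, test)."""
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]


daily_values = list(range(8))  # stand-in for 8 time-ordered observations
train, test = chronological_split(daily_values, train_fraction=0.75)
print(train, test)  # [0, 1, 2, 3, 4, 5] [6, 7]
```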

Because non-linear feature expansions generally increase the size of the hypothesis space.

Why do non-linear feature expansions increase model complexity? (a) Because non-linear feature expansions are hard to calculate; (b) Because non-linear feature expansions generally increase the size of the hypothesis space; (c) Because non-linear feature expansions are more complicated than linear features.

Because not all loss functions are differentiable everywhere.

Why do we need iterative functions other than gradient descent to optimize loss functions? (a) Because the L2 loss function can have sharp corners; (b) We don't need anything but gradient descent to optimize loss functions; (c) Because not all loss functions are differentiable everywhere.

Because deciding how to translate the output of regression into the space of class labels deserves particular consideration

Why do we need transfer functions for classification? (a) Because every step of the classification model needs to be differentiable; (b) Because we said so; (c) Because deciding how to translate the output of regression into the space of class labels deserves particular consideration; (d) Because a number is not a class.

Because we're using step-wise updates to converge on a minimum.

Why is gradient descent considered an iterative approach? (a) Because we're using continuous updates to converge to a maximum; (b) Because we're using continuous updates to converge to a minimum; (c) Because we're using step-wise updates to converge to a maximum; (d) Because we're using step-wise updates to converge on a minimum.

Because in convex optimization a local minimum is guaranteed to be the global minimum.

Why is the convexity of the loss function important for machine learning? (a) Because in convex optimization a global minimum is guaranteed to be a local minimum; (b) Because convexity guarantees the smoothness of our loss function; (c) Because in convex optimization a local minimum is guaranteed to be the global minimum; (d) Because we like lines and lines are convex.

The model will have high variance and not generalize well to new data.

Why might we *not* want our model to fit perfectly to our training data? (a) The model will have high bias; (b) We always want our model to fit perfectly to our training data; (c) The model will have high variance and not generalize well to new data.

RMSE has the same units as the predicted value

Why would you use Root Mean Squared Error (RMSE) over Mean Squared Error (MSE)? (a) RMSE has the same units as the inputs provided to the QuAM; (b) RMSE has the same units as the predicted value; (c) MSE has the same units as the inputs provided to the QuAM; (d) MSE has the same units as the predicted value.
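
A small sketch shows why RMSE is easier to interpret. MSE averages squared errors, so predictions in metres give an MSE in square metres. Taking the square root returns to metres, the same units as the predicted value. The numbers below are made up for illustration.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error, in squared units of the target."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


# Both predictions are off by 3 metres.
print(mse([0.0, 0.0], [3.0, -3.0]))   # 9.0 (square metres)
print(rmse([0.0, 0.0], [3.0, -3.0]))  # 3.0 (metres)
```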

Finding the hypothesis in a specific class of hypotheses that best labels given data.

What is supervised learning doing? (a) Finding the best linear function in the set of all possible linear functions; (b) Filtering spam; (c) Finding the best hypothesis from the set of all possible hypotheses that label given data; (d) Finding the best classification function from the set of all possible classifiers; (e) Finding the hypothesis in a class of hypotheses that best clusters given data; (f) Finding the hypothesis in a specific class of hypotheses that best labels given data.
