Supervised Machine Learning: Regression and Classification


For linear regression, the model is f_{w,b}(x) = wx + b. Which of the following are the inputs, or features, that are fed into the model and with which the model is expected to make a prediction?

x. The input features x are fed into the model to generate a prediction f_{w,b}(x).
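
A minimal sketch of this in Python (the values of w and b below are made up for illustration, not learned parameters):

```python
# The linear regression model f_{w,b}(x) = w*x + b.
w = 200.0  # weight (slope); illustrative value
b = 100.0  # bias (intercept); illustrative value

def predict(x):
    """Feed the input feature x into the model to get a prediction."""
    return w * x + b

print(predict(1.2))  # prediction for input feature x = 1.2
```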

Of the circumstances below, for which one is feature scaling particularly helpful?

Feature scaling is helpful when one feature is much larger (or smaller) than another. For example, the feature "house size" in square feet may be as high as 2,000, which is much larger than the feature "number of bedrooms", whose value is between 1 and 5 for most houses.

Which are the two common types of supervised learning? (Choose two)

Regression and classification. Regression predicts a number from among infinitely many possible numbers. Classification predicts from among a limited set of categories (also called classes); these could be a limited set of numbers or labels such as "cat" or "dog".

Which of the following is a valid step used during feature scaling?

Subtract the mean (average) from each value and then divide by the range (max - min). This is called mean normalization.
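
A minimal sketch of mean normalization on one feature column, assuming NumPy and made-up sample values:

```python
import numpy as np

# Illustrative "house size" values in square feet.
x = np.array([2104.0, 1416.0, 1534.0, 852.0])

# Mean normalization: subtract the mean, then divide by (max - min).
x_norm = (x - x.mean()) / (x.max() - x.min())

print(x_norm)  # values now fall roughly in the range [-1, 1]
```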

Suppose you have a regularized linear regression model. If you increase the regularization parameter \lambda, what do you expect to happen to the parameters w_1, w_2, \ldots, w_n?

This will reduce the size of the parameters w_1, w_2, \ldots, w_n. Regularization reduces overfitting by reducing the size of the parameters.
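
A sketch of where \lambda enters the regularized cost for linear regression; the function name and data are mine, but the formula is the usual squared-error cost plus a (\lambda / 2m) \sum_j w_j^2 penalty:

```python
import numpy as np

def regularized_cost(X, y, w, b, lambda_):
    """Squared-error cost plus the regularization penalty (lambda/2m) * sum(w_j^2)."""
    m = X.shape[0]
    err = X @ w + b - y                       # prediction errors
    cost = (err @ err) / (2 * m)              # squared-error term
    reg = (lambda_ / (2 * m)) * np.sum(w**2)  # penalty grows with the size of each w_j
    return cost + reg

# Made-up data: increasing lambda_ raises the cost of large w_j,
# so gradient descent is pushed toward smaller parameter values.
X = np.array([[2104.0, 3.0], [1416.0, 2.0]])
y = np.array([400.0, 232.0])
w = np.array([0.1, 10.0])
print(regularized_cost(X, y, w, b=0.0, lambda_=10.0))
```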

True/False? With polynomial regression, the predicted value f_{w,b}(x) does not necessarily have to be a straight-line (linear) function of the input feature x.

True. A polynomial function can be non-linear. This can potentially help the model to fit the training data better.
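
A sketch of why: engineering powers of x as extra features makes the prediction linear in the new features but a curve in the original x (all values below are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# Engineered polynomial features: x, x^2, x^3.
X_poly = np.column_stack([x, x**2, x**3])

# With these illustrative parameters, f is a cubic (non-linear) function of x.
w = np.array([1.0, -0.5, 0.1])
b = 2.0
f = X_poly @ w + b
print(f)
```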

For the simplified loss function, if the label y^{(i)} = 0, what does this expression simplify to? (image 9)

-\log(1 - f_{w,b}(x^{(i)}))
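
To see why, substitute y^{(i)} = 0 into the full logistic loss as defined in the course:

```latex
L\big(f_{w,b}(x^{(i)}), y^{(i)}\big)
  = -y^{(i)} \log\big(f_{w,b}(x^{(i)})\big)
    - \big(1 - y^{(i)}\big) \log\big(1 - f_{w,b}(x^{(i)})\big)
```

With y^{(i)} = 0, the first term vanishes and the coefficient (1 - y^{(i)}) equals 1, leaving -\log(1 - f_{w,b}(x^{(i)})).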

In this lecture series, "cost" and "loss" have distinct meanings. Which one applies to a single training example?

Loss. In these lectures, loss is calculated on a single training example. It is worth noting that this definition is not universal; other lecture series may define it differently.

A cat photo classification model predicts 1 if it's a cat, and 0 if it's not a cat. For a particular photograph, the logistic regression model outputs g(z) (a number between 0 and 1). Which of these would be a reasonable criterion to decide whether to predict that it's a cat?

Predict it is a cat if g(z) >= 0.5. Think of g(z) as the probability that the photo is of a cat: when this number is at or above the threshold of 0.5, predict that it is a cat.
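
A sketch of that decision rule (the input values of z are made up):

```python
import math

def sigmoid(z):
    """The logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_cat(z, threshold=0.5):
    """Predict 1 (cat) when g(z) is at or above the threshold, else 0 (not cat)."""
    return 1 if sigmoid(z) >= threshold else 0

print(predict_cat(2.0))   # g(2.0) is about 0.88 -> predict cat (1)
print(predict_cat(-1.0))  # g(-1.0) is about 0.27 -> predict not cat (0)
```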

You fit logistic regression with polynomial features to a dataset, and your model looks like this. What would you conclude? (Pick one) (image 11)

The model has high variance (it overfits the training data); adding more training examples is likely to help.

For linear regression, if you find parameters w and b so that J(w,b) is very close to zero, what can you conclude?

The selected values of the parameters w and b cause the algorithm to fit the training set really well. When the cost is small, this means that the model fits the training set well.

Gradient descent is an algorithm for finding values of parameters w and b that minimize the cost function J. When \frac{\partial J(w,b)}{\partial w} is a negative number (less than zero), what happens to w after one update step?

w increases. The learning rate is always a positive number, so if you take w minus a negative number, you end up with a new value for w that is larger (more positive).
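
A quick numeric check of that reasoning, with made-up values:

```python
w = 3.0
alpha = 0.1   # learning rate: always positive
dj_dw = -2.0  # a negative partial derivative dJ/dw

# Update rule: w := w - alpha * dJ/dw.
# Subtracting a negative number increases w.
w_new = w - alpha * dj_dw
print(w_new)  # 3.2 > 3.0, so w increased
```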

Which of these is a type of unsupervised learning?

Clustering. Clustering groups data into groups, or clusters, based on how similar each item (such as a hospital patient or shopping customer) is to the others.

In the training set below (image 3), what is x_4^{(3)}? Please type in the number (an integer such as 123, no decimal point).

20. x_4^{(3)} is the 4th feature (4th column in the table) of the 3rd training example (3rd row in the table).

Which of the following are potential benefits of vectorization? Please choose the best option.

All of these are benefits of vectorization: it makes your code run faster, it can make your code shorter, and it allows your code to run more easily on parallel compute hardware.
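
A sketch contrasting an explicit loop with a vectorized NumPy dot product; both compute the same prediction, but the vectorized call is shorter and dispatches to optimized (potentially parallel) numerical code:

```python
import numpy as np

w = np.array([1.0, 2.5, -3.3])
x = np.array([10.0, 20.0, 30.0])
b = 4.0

# Loop version: explicit, but longer and slower.
f_loop = b
for j in range(len(w)):
    f_loop += w[j] * x[j]

# Vectorized version: a single optimized dot product.
f_vec = np.dot(w, x) + b

print(f_loop, f_vec)  # identical results
```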

True/False? No matter what features you use (including if you use polynomial features), the decision boundary learned by logistic regression will be a linear decision boundary.

False. The decision boundary can also be non-linear, as described in the lectures.

You are helping a grocery store predict its revenue, and have data on its items sold per week, and price per item. What could be a useful engineered feature?

For each product, calculate the number of items sold times the price per item. This engineered feature can be interpreted as the revenue generated by each product.
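
A sketch of that feature engineering step (the sample numbers are made up):

```python
import numpy as np

items_sold_per_week = np.array([120, 430, 55])  # one entry per product
price_per_item = np.array([2.50, 1.20, 8.00])

# Engineered feature: weekly revenue per product.
revenue = items_sold_per_week * price_per_item
print(revenue)  # [300. 516. 440.]
```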

Which of the following two statements more accurately describes gradient descent for logistic regression?

The update steps look like the update steps for linear regression, but the definition of f_{\vec{w},b}(\mathbf{x}^{(i)}) is different. For logistic regression, f_{\vec{w},b}(\mathbf{x}^{(i)}) is the sigmoid function instead of a straight line.
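
A sketch of that shared structure; the function and variable names are mine, but the point is that the update code matches linear regression except that f is the sigmoid of the linear function:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step_logistic(X, y, w, b, alpha):
    """One gradient descent update for logistic regression.

    Identical in form to the linear regression update; only f differs:
    f = sigmoid(X @ w + b) instead of f = X @ w + b.
    """
    m = X.shape[0]
    f = sigmoid(X @ w + b)
    err = f - y
    w_new = w - alpha * (X.T @ err) / m
    b_new = b - alpha * err.sum() / m
    return w_new, b_new
```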

For which case, A or B, was the learning rate α likely too large? (image 5)

Case B only. The cost is increasing as training continues, which likely indicates that the learning rate α is too large.

If z is a large positive number, then:

g(z) is near one (1), since the sigmoid g(z) = 1/(1 + e^{-z}) approaches 1 as z becomes large.

Which of the following can address overfitting?

All of the following:

- Apply regularization. Regularization is used to reduce overfitting.
- Collect more training data. If the model trains on more data, it may generalize better to new examples.
- Select a subset of the more relevant features. If the model trains on the more relevant features, and not on the less useful features, it may generalize better to new examples.

True/False? To make gradient descent converge about twice as fast, a technique that almost always works is to double the learning rate alpha.

False. Doubling the learning rate may result in a learning rate that is too large, causing gradient descent to fail to find the optimal values for the parameters w and b.

For linear regression, what is the update step for parameter b?

b = b - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)
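
A sketch of one full gradient descent step for single-feature linear regression, implementing the simultaneous updates for w and b (the function name and data are mine):

```python
import numpy as np

def gradient_step_linear(x, y, w, b, alpha):
    """One simultaneous gradient descent update for w and b."""
    m = x.shape[0]
    err = (w * x + b) - y   # f_{w,b}(x^(i)) - y^(i) for every example i
    dj_dw = (err @ x) / m   # partial derivative of J with respect to w
    dj_db = err.sum() / m   # partial derivative of J with respect to b
    return w - alpha * dj_dw, b - alpha * dj_db

# Tiny usage example with made-up data.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
w, b = gradient_step_linear(x, y, w=0.0, b=0.0, alpha=0.1)
print(w, b)
```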

