Machine Learning Foundations for Product Managers


Suppose we are building a supervised machine learning model to predict students' grades on the final exam for a math course based on their activity throughout the semester. Which of the following might we use as features for the model (select all that apply)? Students' scores on weekly quizzes and homework assignments Students' attendance at the weekly lectures Historical scores of students on the final exam Students' mathematical ability

Students' scores on weekly quizzes and homework assignments Students' attendance at the weekly lectures

Building a model which uses historical data to predict the future demand for electricity within a certain utility territory would likely be an application of which type of machine learning? Reinforcement learning Unsupervised learning Supervised learning

Supervised learning

true positive rate

TP / (TP + FN), where TP = true positives and FN = false negatives

Precision formula

TP/(TP+FP)

Each column of a dataset is commonly called

a feature. It is also referred to as a factor, a predictor, an X variable, an independent variable, an attribute, or even a dimension.

Mean absolute percent error, or MAPE, is calculated as

the absolute value of the difference between the actual value and the prediction, divided by the actual value. We sum that up across all predictions and divide by the total number of predictions to get our MAPE value. MAPE converts the error to a percentage rather than an absolute number. MAPE is popular, particularly among non-technical audiences, and it is a common metric to present to customers because it is easy to understand and interpret.
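
The MAPE calculation described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `mape` and the sample numbers are made up, and it assumes no actual value is zero.

```python
def mape(actual, predicted):
    """Mean absolute percent error, returned as a percentage.

    Assumes no actual value is zero (the division would fail).
    """
    # Sum |actual - predicted| / |actual| over all predictions, then average.
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100 * total / len(actual)


# Hypothetical actual values and predictions, for illustration only.
actual = [100, 200, 50]
predicted = [110, 180, 45]

# Every prediction here is off by 10% of its actual value.
print(round(mape(actual, predicted), 2))
```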

MAE is also influenced by ______________

the scale of the problem. It is therefore impossible to compare an __________ value on one problem to an __________ value on another problem.

One of the challenges of mean squared error

is that it is heavily influenced by outliers. When a particular instance has a large error, the square term in the formula applies a heavy penalty, and as a result we get a very high ______. _______ is also influenced by the scale of our data. Therefore, it's impossible to compare a ________ on one problem to a ____________ on another problem, because we're working with completely different data sets using different scales.

For regression modeling problems, we will typically use one of three common metrics,

mean squared error, mean absolute error, or mean absolute percent error.
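
For reference, the three metrics can be written out as formulas. The notation is an assumption on my part: y_i is the actual target value, \hat{y}_i is the model's prediction, and n is the number of predictions. These are the standard textbook definitions and match the verbal descriptions in this set.

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left\lvert y_i - \hat{y}_i \right\rvert
\qquad
\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n} \left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert
```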

MSE

mean squared error

false positive rate

FP/(FP+TN)
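
The rate formulas in this set (true positive rate, precision, false positive rate) all come from the four confusion-matrix counts. A minimal sketch follows; the function names and the counts are hypothetical, chosen only for illustration.

```python
def true_positive_rate(tp: int, fn: int) -> float:
    """Recall / sensitivity: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are real positives: TP / (TP + FP)."""
    return tp / (tp + fp)


def false_positive_rate(fp: int, tn: int) -> float:
    """Share of real negatives wrongly flagged as positive: FP / (FP + TN)."""
    return fp / (fp + tn)


# Hypothetical confusion-matrix counts for a test set of 1000 examples.
tp, fp, fn, tn = 40, 10, 10, 940

print(true_positive_rate(tp, fn))
print(precision(tp, fp))
print(round(false_positive_rate(fp, tn), 4))
```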

3 factors in choosing a model

Performance (accuracy) of the model, interpretability, and computational efficiency.

Advantages of MAE over MSE

_________ is more robust to outliers or very large errors. It tends to penalize large errors much less than MSE does, because it doesn't contain the square term in the formula. _______ can also be a little easier to interpret in the context of a problem, again because there is no square term in the formula. As a result, ______ tends to be on a similar scale to the values that we're trying to predict, so an ______ value is more intuitive than an MSE value in the context of the predictions we're trying to make.

We are building a model and decide to use K-folds cross validation to compare different versions of our model. We set k = 10. How many iterations would we run in our cross validation process, and during each iteration, what percentage of our data would be used for validation (and not training)? 10 iterations, 90% for validation each iteration 5 iterations, 20% for validation each iteration 5 iterations, 10% for validation each iteration 10 iterations, 10% for validation each iteration

10 iterations, 10% for validation each iteration. Since k = 10, we run 10 iterations, and in each iteration we use 1 of the 10 folds (10%) for validation and the remaining 9/10 for training.
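
The fold rotation can be sketched with plain index lists. This is a simplified illustration: the helper name `k_fold_indices` is made up, it does not shuffle, and it assumes the number of samples divides evenly by k.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, validation_indices) once per fold.

    Simplifying assumption: n_samples is divisible by k.
    """
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for i in range(k):
        start, stop = i * fold_size, (i + 1) * fold_size
        validation = indices[start:stop]           # one fold held out
        train = indices[:start] + indices[stop:]   # remaining k - 1 folds
        yield train, validation


splits = list(k_fold_indices(n_samples=100, k=10))
print(len(splits))                          # number of iterations
print(len(splits[0][1]) / 100)              # fraction used for validation
```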

We are building a classification model to identify manufacturing defects (a "positive" in our model) from among parts coming off a manufacturing line. In our test set we have 1000 images of parts. 50 of the 1000 contain defects ("positives") and the remaining images do not. Our model successfully identifies 40 true positives. What is the recall of our model? 95% We cannot calculate recall with the information provided 80% 4%

80% Our model successfully identified 40 out of the 50 positives (parts with defects), and so our recall is 40/50 = 80%
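
The same recall figure can be reproduced by counting over label lists. The lists below are constructed to match the question (50 defects, 40 found); the assumption that the model raises no false alarms is mine and does not affect recall.

```python
# 1 = defect ("positive"), 0 = no defect.
y_true = [1] * 50 + [0] * 950

# Assumption for illustration: the model flags 40 of the 50 defective parts
# and none of the good parts (false positives do not enter the recall formula).
y_pred = [1] * 40 + [0] * 10 + [0] * 950

true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
false_negatives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

recall = true_positives / (true_positives + false_negatives)
print(recall)  # 0.8
```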

Which of the following must we define in order to create and train a machine learning model (select all that apply)?

A loss function to use in optimizing our model A set of features of our data to use in our model An algorithm Values for the hyperparameters of the algorithm

area under the ROC

A measure of sensitivity that does not change with bias

Underfitting

A statistical model or a machine learning algorithm is said to have _____ when the model is too simple to capture the complexities of the data. It represents the inability of the model to learn the training data effectively, resulting in poor performance on both the training and testing data. In simple terms, an _____ model's predictions are inaccurate, especially when applied to new, unseen examples. It mainly happens when we use a very simple model with overly simplified assumptions.

Which of the following are examples of potential "outcome" metrics for a machine learning project (select all that apply)?

Additional sales revenue (in $/day) Time saved (in minutes/day)

Building a model to identify whether a patient has skin cancer based on images of the patient's skin is an example of which type of supervised learning task? Regression Classification Clustering

Classification

As in the previous question, we are building a model to identify defects ("positives" in this case) within products coming off a manufacturing line. We test our model on a test set of 1000 images of products coming off the line. Our model predicted that 50 of the images were positives (had defects) and the remaining 950 had no defects. We compare our model's predictions to the actual labels and determine that our model had 45 true positives. What was the precision of our model on the test set? 95% 90% 4.5% 5%

90% Our model predicted 50 positives and of those 45 were true positives, so our precision was 45/50 = 90%

What is the difference between "machine learning" and "deep learning"? Deep learning is a separate field unrelated to machine learning Deep learning is a subfield within the broad field of artificial intelligence while machine learning is not Deep learning involves the use of cloud or high-performance compute power to create models while machine learning does not Deep learning is a sub-field of machine learning focused on the use of neural network models consisting of multiple layers

Deep learning is a sub-field of machine learning focused on the use of neural network models consisting of multiple layers

If we are using cross-validation to compare multiple models and select the best one, how do we compare the models? For each model, we run cross-validation and calculate the error on the validation fold of each iteration. We then take the minimum error on a validation fold as representative of the cross-validation error of the model. We compare each model's minimum error and choose the best one For each model, we run cross-validation and calculate the error on the training folds for each iteration. We then take the average error across all training folds as representative of the model's performance, compare the average error of each model and select the model with the minimum average error For each model, we run cross-validation and calculate the error on the validation fold of each iteration. We then take the average error across iterations as representative of the model's performance, compare the average error of each model and select the model with the minimum average error

For each model, we run cross-validation and calculate the error on the validation fold of each iteration. We then take the average error across iterations as representative of the model's performance, compare the average error of each model and select the model with the minimum average error
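
A small sketch of this selection rule, using made-up per-fold validation errors for two hypothetical models. It also shows why the minimum fold error is a poor criterion: one lucky fold can make a worse model look best.

```python
from statistics import mean

# Hypothetical validation error on each of 5 folds (illustrative numbers only).
fold_errors = {
    "model_a": [0.10, 0.30, 0.32, 0.31, 0.29],  # one lucky fold
    "model_b": [0.20, 0.22, 0.21, 0.19, 0.23],  # consistently decent
}

# Average the validation error across iterations for each model.
average_error = {name: mean(errors) for name, errors in fold_errors.items()}

# Select the model with the minimum average validation error.
best_by_average = min(average_error, key=average_error.get)

# For contrast: picking by the single best fold would choose differently.
best_by_min_fold = min(fold_errors, key=lambda name: min(fold_errors[name]))

print(best_by_average, best_by_min_fold)
```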

We are working on a regression modeling project for a client who is very concerned about minimizing any particularly large errors of the model on outlier datapoints. To compare multiple versions of our model and select a final version for the project, which metric would we most likely use? Accuracy Mean squared error Mean absolute error

Mean squared error. MSE penalizes large errors more heavily than MAE, and so will be a better metric for comparing models if we desire to minimize infrequent large errors
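
To see the difference concretely, here is a minimal sketch with two hypothetical sets of predictions: one is slightly off everywhere, the other is close except for a single large miss. The function names and numbers are illustrative assumptions.

```python
def mse(actual, predicted):
    """Mean squared error: average of squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [10, 10, 10, 10, 10]
one_big_miss = [11, 11, 11, 11, 20]   # errors: 1, 1, 1, 1, 10
evenly_off = [13, 13, 13, 13, 13]     # errors: 3, 3, 3, 3, 3

# MAE slightly prefers the model with the single large miss...
print(mae(actual, one_big_miss), mae(actual, evenly_off))
# ...while MSE strongly penalizes it because the large error is squared.
print(mse(actual, one_big_miss), mse(actual, evenly_off))
```

A client who cares most about avoiding rare large errors would want the comparison made on MSE, which ranks `evenly_off` ahead here.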

False positive

How many of the negatives did the model incorrectly classify as positive

what is precision?

How many of the predicted positives are actually positives

What is the main difference between regression and classification? In regression we aim to predict one or more numerical variables, and in classification we are predicting a class or category With regression models we are making a prediction whereas with classification models we generally seek only to identify patterns in the relationships between the input and output variables For regression models we need only the input feature data to build the model, whereas for classification models we need both input feature data and a set of output targets to train the model For regression we always use a linear model, and for classification we always use a neural network

In regression we aim to predict one or more numerical variables, and in classification we are predicting a class or category

What is the purpose of the "algorithm" in building a machine learning model? It is used to evaluate the performance of the model during model training It acts as a "template" to define the form of the relationship between inputs and outputs that is used in the model It dictates which features are used in the model It is a "knob" that can be tuned during model training to adjust the performance of the model

It acts as a "template" to define the form of the relationship between inputs and outputs that is used in the model

Why is data leakage a dangerous thing in the modeling process? It can invalidate the estimated performance of our model based on the test set, and cause our performance estimate to be overly optimistic relative to the model's ability to generate predictions for new data It can significantly reduce the amount of data available for model training It can artificially reduce our model's performance on the test set, causing us to believe our model is not as good as it actually is in generating predictions for new data It commonly leads to underfitting on the training and test data, resulting in models that are too simple

It can invalidate the estimated performance of our model based on the test set, and cause our performance estimate to be overly optimistic relative to the model's ability to generate predictions for new data

Which of the following are characteristics of structured data (select all that apply)? It follows a structure consisting of a fixed number of defined fields It includes content such as video, text and images It is often stored in relational databases and works well with common tools used in organizations such as spreadsheet tools It comprises ~80% of a typical organization's total data

It follows a structure consisting of a fixed number of defined fields It is often stored in relational databases and works well with common tools used in organizations such as spreadsheet tools

What most likely happens to the recall / true positive rate of our model if we decrease the threshold value from the default of 0.5 to a value of 0.3? It goes up It goes down It stays the same

It goes up. Our model would classify more points as positives, likely increasing the TPR / recall
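
The effect of the threshold can be checked on a toy example. The scores and labels below are invented, and the convention that a score at or above the threshold counts as a positive prediction is an assumption.

```python
# Hypothetical predicted probabilities and true labels (1 = positive).
scores = [0.9, 0.6, 0.4, 0.35, 0.2, 0.7, 0.45, 0.1]
labels = [1, 1, 1, 1, 1, 0, 0, 0]


def recall_at(threshold: float) -> float:
    """Recall when every score >= threshold is classified as positive."""
    true_positives = sum(
        1 for s, y in zip(scores, labels) if y == 1 and s >= threshold
    )
    return true_positives / sum(labels)


print(recall_at(0.5))  # fewer points classified positive
print(recall_at(0.3))  # more points classified positive, recall cannot drop
```

Lowering the threshold can only add predicted positives, so recall stays the same or rises; the cost is usually more false positives.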

Why would we use cross-validation instead of a fixed validation set (select all that apply)? It is computationally less expensive It maximizes the data available for training the model, which is particularly important for smaller datasets It enables us to skip the use of a test set for evaluating model performance It provides a better evaluation of how well the model can generalize to new data because the validation performance is not biased by the choice of datapoints to use in a fixed validation set

It maximizes the data available for training the model, which is particularly important for smaller datasets It provides a better evaluation of how well the model can generalize to new data because the validation performance is not biased by the choice of datapoints to use in a fixed validation set

We would like to compare two models using R-squared as a metric. Model A has a R-squared value of 0.85, and Model B has a R-squared value of 0.2. Which model does a better job explaining the variability in our target variable? Model A Model B

Model A Model A has a much higher R-squared value, meaning it does a better job explaining the variability in the target variable

We are working on a binary classification modeling project and have developed two different models. The first model (Model A) has an Area under the ROC (AUROC) of 0.73, and the second model (Model B) has an Area under ROC of 0.43. Which model should we select if we are using AUROC as our performance evaluation criteria? Model A Model B Neither, since neither one is better than a classifier which guesses at random between the two classes

Model A Model A has the higher AUROC score of 0.73

Suppose we are building a model to predict which days of the year a normal, healthy person will have a common cold. Would accuracy be the best choice of metric to evaluate our model? Yes No

No Since we have a high class imbalance in this case because most days of the year a healthy person does not have a cold, accuracy is probably not the best choice of metric
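
A quick illustration of why accuracy misleads under class imbalance. The figure of 10 sick days per year is a made-up assumption; the "model" simply predicts "no cold" every day.

```python
# Hypothetical year: 10 days with a cold (1), 355 days without (0).
y_true = [1] * 10 + [0] * 355

# A useless model that never predicts a cold.
y_pred = [0] * 365

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
recall = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1) / sum(y_true)

print(round(accuracy, 3))  # high, despite the model never finding a cold
print(recall)              # 0.0
```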

What are the three primary considerations when selecting an algorithm to use in modeling (select three)? Performance Interpretability Validation & testing approach Computational efficiency

Performance Interpretability Computational efficiency

Suppose I am building a machine learning model to predict daily gasoline demand in a region over time given a number of factors such as population, weather, season of year, etc. I am using historical data on these factors and the corresponding actual gas demand that occurred. The historical dataset I have available to train my model is of size 105,000 rows by 36 columns, which includes one column which contains the target variable I am trying to predict (gas demand). Which of the following are correct about my data (select all that apply)? My dataset contains 36 features The feature "average high temperature" is a continuous variable My dataset contains 105,000 observations The feature "season of year" is a categorical variable

The feature "average high temperature" is a continuous variable My dataset contains 105,000 observations The feature "season of year" is a categorical variable

"Underfitting" refers to the situation in modeling when: The model is overly complex and unable to generalize well to make predictions on new data The model consistently generates predictions which are lower than the target values The model is too simple to fully capture the patterns in the data The model was not trained on enough data

The model is too simple to fully capture the patterns in the data

If we determine that our model is overfitting on the data, which of the following aspects of our model might we adjust to reduce the complexity (select all that apply)? The number of features used in the model The performance metric for evaluating our model The choice of algorithm The values of the hyperparameters used in our model

The number of features used in the model The choice of algorithm The values of the hyperparameters used in our model

What does the term "variance" in modeling refer to? The sensitivity of the model to small fluctuations in the training data The error introduced by modeling a real life problem using an over-simplified model The range of the output values of the model The total error of the model predictions

The sensitivity of the model to small fluctuations in the training data

When we split our data to create a test set, how much of our data do we generally use for training and how much for testing? Typically 70-90% for training and 10-30% for testing Typically 50% for training and 50% for testing Typically 70-90% for testing and 10-30% for training Typically 95% for training and 5% for testing

Typically 70-90% for training and 10-30% for testing
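
A minimal sketch of such a split using only the standard library. The helper name `split_train_test`, the 80/20 ratio, and the fixed seed are illustrative choices, not something this set prescribes.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold back a fraction as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))  # stand-in for 100 observations
train, test = split_train_test(rows, test_fraction=0.2)
print(len(train), len(test))
```

The test portion is then set aside and not touched again until the final model is evaluated.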

Which of the following are things that machine learning cannot do well (select all that apply)? Understand context of situations Automate routine tasks Determine causation Find solutions to problems

Understand context of situations Determine causation Find solutions to problems

Building a model which organizes news articles from the daily paper into groups by subject (e.g. sports, business, politics) using only the text of the articles, without being trained on previous labeled articles, would likely be an application of which type of machine learning? Supervised learning Unsupervised learning Reinforcement learning

Unsupervised learning

K-fold cross-validation benefits

First, when using a single fixed validation subset, we remove that subset from use in training our model. In cross-validation, because the validation set rotates each time, we're able to use all of the data available to us for training at some point during one of the iterations. Second, cross-validation generally provides a better evaluation of how well the model can generalize to generate accurate predictions on new data it has never seen before. One of the risks with using a single fixed validation set is that we may accidentally bias the model's performance on that set through the choice of data points to include in it. In cross-validation, because our validation subset rotates each iteration, we use every data point available to us exactly one time for validation.

What does the term "data leakage" mean? We use some of our training data in the test set to evaluate model performance We lose valuable data by removing rows with missing values We reduce the data available for training our model by carving a portion out to use as a test set We use our test set data at some point during the model building process and it influences the development of our model

We use our test set data at some point during the model building process and it influences the development of our model

Why do we split our data into a training set and test set, and then hold back the test set while training our model? We can then use the test set to compare versions of our model and select the best version as our final model We use the test set to calculate the performance of models with different sets of features, to help us with feature selection We select our model using the training set and then add our test set in, re-train our model, and calculate the model's performance across our full dataset We use the test set to evaluate performance of our final model as an unbiased indicator of its ability to generate quality predictions on new data

We use the test set to evaluate performance of our final model as an unbiased indicator of its ability to generate quality predictions on new data

Why does overfitting occur?

You only get accurate predictions if the machine learning model generalizes to all types of data within its domain. ______________ occurs when the model cannot generalize and fits too closely to the training dataset instead. ___________ happens due to several reasons, such as: • The training data size is too small and does not contain enough data samples to accurately represent all possible input data values. • The training data contains large amounts of irrelevant information, called noisy data. • The model trains for too long on a single sample set of data. • The model complexity is high, so it learns the noise within the training data.

Overfitting

_____________ is an undesirable machine learning behavior that occurs when the machine learning model gives accurate predictions for training data but not for new data. When data scientists use machine learning models for making predictions, they first train the model on a known data set. Then, based on this information, the model tries to predict outcomes for new data sets. An _________ model can give inaccurate predictions and cannot perform well for all types of new data.

AUROC

area under the ROC curve

Calculate the mean squared error by

summing up the squared differences between the actual target value and the predicted value, and then dividing by the number of observations that we have.

The last column we call our __________

target. It is also called the label, annotation, response, Y variable, or dependent variable.

MAE, our mean absolute error, is calculated by

summing up the absolute value of the difference between the target and the prediction across all of the predictions that we make, and dividing by the total number of predictions.

K-fold cross-validation

is a technique for evaluating predictive models. The dataset is divided into k subsets or folds. The model is trained and evaluated k times, using a different fold as the validation set each time. Performance metrics from each fold are averaged to estimate the model's generalization performance.

Each row of a dataset is also called

an observation. You'll also see it referred to as an instance of the data, an example, or a feature vector.

In the matrix of input variable data to a supervised learning model (often referred to as X), the rows of the matrix represent the ____ and the columns of the matrix represent the ____. features, targets observations, features features, observations observations, targets

observations, features

ROC Curve

plots the true positive rate versus the false positive rate for a variety of different threshold values.
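
A minimal sketch of how the points on an ROC curve are produced, along with the area under it (AUROC) via the trapezoid rule. The labels, scores, and thresholds are invented, and treating a score at or above the threshold as a positive prediction is an assumption.

```python
def roc_points(labels, scores, thresholds):
    """Return one (false positive rate, true positive rate) pair per threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoid-rule area under the curve, with points sorted by FPR."""
    ordered = sorted(points)
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(ordered, ordered[1:])
    )


# Hypothetical labels (1 = positive) and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
thresholds = [1.0, 0.75, 0.5, 0.35, 0.0]

points = roc_points(labels, scores, thresholds)
auroc = area_under_curve(points)
print(points)
print(round(auroc, 3))
```

The highest threshold gives the point (0, 0) and the lowest gives (1, 1); thresholds in between trace out the curve.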

An R-squared of one would indicate ______; an R-squared of zero would mean ______

An R-squared of ___ indicates a perfect model that is able to completely explain all of the variance found in the Y (target) values. An R-squared of ______ would indicate that the model explains none of the variance found in our Y (target) values.
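
The two extremes can be checked with a small sketch. It assumes the usual definition, R-squared = 1 - (sum of squared residuals) / (total sum of squares around the mean), which this set does not spell out; the function name and data are illustrative.

```python
def r_squared(actual, predicted):
    """R-squared as 1 - SS_residual / SS_total (assumed standard definition)."""
    mean_y = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_residual / ss_total


actual = [1, 2, 3, 4]

# Perfect predictions explain all of the variance in the target.
print(r_squared(actual, [1, 2, 3, 4]))

# Always predicting the mean of the target explains none of it.
print(r_squared(actual, [2.5, 2.5, 2.5, 2.5]))
```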

Outcome metrics

refer to the desired business impact of the model or the broader product that we're trying to create either for our own organization or for our customers. Typically the business impact is stated in terms of dollars, so it might be dollars of costs saved, might be dollars of revenue generated. Sometimes it can be a time as well, but typically it's referring to some sort of an impact on a customer or our own business operations.

Output metrics

refer to the desired output from our model. These are typically stated in terms of one of our model performance metrics that we're going to learn about later in this lesson. Typically the _________ for a model are not communicated to the customer except in rare cases. What our customer really cares about is the outcome that we're delivering to them. Not so much the output from the model itself. ________________ are also generally set after we've defined the desired outcome and we allow the choice of outcome metric to then dictate our selection of _______________ that we use to evaluate our model.

Class imbalance

refers to a situation in a dataset where the distribution of classes or categories is highly uneven.

true positive rate is also

sensitivity

One of the challenges with MAPE is

that it's skewed by high percentage errors for low values of y. Consider a case where the target has a very low value: we may have a very small error in absolute terms, but when we convert that small error to a percentage of the low target value, it ends up being a very high percent.
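
A two-row example makes the skew visible. The numbers are invented: the second observation has the smaller absolute error but by far the larger percent error, because its target value is low.

```python
# Hypothetical (actual, predicted) pairs: one large target, one very small.
pairs = [(1000, 990), (2, 4)]

rows = []
for actual_value, predicted_value in pairs:
    abs_error = abs(actual_value - predicted_value)
    pct_error = 100 * abs_error / abs(actual_value)
    rows.append((actual_value, abs_error, pct_error))

for actual_value, abs_error, pct_error in rows:
    print(f"actual={actual_value} abs_error={abs_error} pct_error={pct_error}%")

# MAPE averages the percent errors, so the low-value row dominates it.
print(sum(r[2] for r in rows) / len(rows))
```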

confusion matrix

true is y, predicted is y-hat

