Deep Learning Quiz #1
Match the term with the definition using the following values in a classification task : Accuracy
(TP+TN)/(TP+TN+FP+FN)
Match the method of normalization with the result new_score = (old_score - mean) / standard_deviation
0 mean and unit variance
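A minimal sketch of the standardization formula above, using only the Python standard library. The function name `z_score` and the sample scores are illustrative, and the use of the population standard deviation (`pstdev`) is an assumption; the card does not say which variant is meant.

```python
from statistics import mean, pstdev


def z_score(scores):
    """Rescale scores so they have zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("standard deviation is zero; cannot standardize")
    return [(s - mu) / sigma for s in scores]


# Illustrative data: mean is 5.0 and population standard deviation is 2.0.
raw_scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
standardized = z_score(raw_scores)
```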
Match the cross-validation technique with the description : Nested cross-validation
A cross-validation
Match the advanced machine learning concept to an example of it : Clustering
Babies learning that 'r'
Here is an analogy: "Rose" is to "Flower" as "Porsche" is to "Automobile", because the first word is a type of the second word. "North" is to "South" as "Black" is to "White", because the second word is the opposite of the first word, and so on. A similar analogy holds for four important concepts in machine learning. Fill in the blank: Classification is to regression in supervised learning as _____________ is to dimensionality reduction in unsupervised learning. Or, more succinctly: Classification is to regression as ___________________ is to dimensionality reduction.
Clustering
A friend in your machine learning class created a movie rating prediction system that judges how many stars (out of 5) a person would rate a movie they haven't seen yet given their ratings for other movies. They stated their rating system is 100% accurate according to their data. What is the best question to ask them?
Did you remember to separate your training set from your test set?
After asking a thousand people hundreds of questions about their personalities, which technique can you use to find a small set of numbers that may approximate the "Big 5" personality characteristics?
Dimensionality reduction
Normalization is not important in k-NN classification because the features with the larger range should always have a larger influence on the k-NN distance metric than other features.
False
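To see why the statement above is false, here is a small sketch of how an unscaled feature with a large range can dominate the k-NN distance. The features (income in dollars, age in years), the value ranges in `RANGES`, and the two neighbors are all hypothetical.

```python
import math

# Hypothetical (min, max) range for each feature: income in dollars, age in years.
RANGES = [(20_000, 120_000), (18, 80)]


def euclidean(a, b):
    """Plain Euclidean distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def scale(point):
    """Min-max scale each feature to [0, 1] using the assumed ranges."""
    return tuple((v - lo) / (hi - lo) for v, (lo, hi) in zip(point, RANGES))


query = (50_000, 25)
neighbors = {
    "a": (51_000, 60),  # similar income, very different age
    "b": (53_000, 26),  # slightly different income, almost the same age
}

# Without normalization, income differences swamp age differences.
nearest_raw = min(neighbors, key=lambda n: euclidean(query, neighbors[n]))

# After normalization, both features contribute on a comparable scale.
nearest_scaled = min(
    neighbors, key=lambda n: euclidean(scale(query), scale(neighbors[n]))
)
```

On the raw features the nearest neighbor is "a", only because a 35-year age gap is numerically tiny next to a 3,000-dollar income gap. After scaling, "b" is nearest.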
When cross-validation is performed on the validation set, the score of the best-fitting hyperparameters on that set is, on average, lower than the score of that same model on a separate test set.
False
When you use cross-validation to select the right hyperparameter, you need a separate set of data to properly measure the accuracy of the model with that hyperparameter (due to potential overfitting). Unfortunately, many scientists don't do this (although they should).
True
When you use cross-validation to select the right hyperparameters, you do not need a separate set of test data to properly measure the quality of the model because cross-validation already separates training from testing.
False
Match the advanced machine learning concept to an example of it : Feature selection
Including "IQ"
K-fold cross-validation will lead to lower accuracies than training on the full training set would, because only a fraction (K-1)/K of the data is used for training in each fold (e.g. 4/5 for K=5). The way to improve this is to increase K. But what is a problem with increasing K?
K models have to be trained which takes more time as K increases
There are three kinds of people who build machine learning models. Person A doesn't separate training from testing and just fits the model to all the data. Person B uses cross-validation over the entire data set to pick the best hyperparameters and reports the quality of the model on that same data set. Person C uses cross-validation on a validation set to pick hyperparameters and uses a separate test set to evaluate the model. Which person is evaluating their model properly?
Person C
The proportion of correctly identified samples of class A, among the test samples that were identified as belonging to class A, is called...
Precision
Select all scenarios that are examples of supervised learning
Predicting a buyer's chance of clicking on an online advertisement based on the previous behavior of similar online shoppers; Netflix using their database of user ratings to predict how you would rate a movie you haven't seen
In a given binary classification problem, out of all samples of class A in the test set, the proportion that are correctly identified as class A by the classifier is called...
Recall
Match the cross-validation technique with the description : Leave one out cross-validation
Same as K-fold cross-validation where K = the size of the data set
If I want to test my voice recognition software to see how well it will work on a new person it has not yet been trained on, what type of cross-validation would give me the best sense of accuracy?
Subject-wise cross-validation
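One way to sketch subject-wise cross-validation is to hold out all samples from one subject at a time, so that no speaker appears in both the training and the test split. The helper `subject_wise_folds` and the subject names are hypothetical; holding out exactly one subject per fold is one possible choice, not the only one.

```python
def subject_wise_folds(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train, test


# Hypothetical data set: one subject label per recorded sample.
sample_subjects = ["ann", "ann", "bob", "bob", "bob", "cat"]
subject_folds = list(subject_wise_folds(sample_subjects))
```

Each fold measures performance on a person the model has never heard, which is the situation the voice recognition question describes.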
Match the term with the definition using the following values in a classification task : Specificity (Recall for the negative case)
TN / (TN + FP)
Match the term with the definition using the following values in a classification task : Sensitivity (Recall for the positive case)
TP / (TP + FN)
Match the term with the definition using the following values in a classification task : Precision (for the positive case)
TP / (TP + FP)
Match the term with the definition using the following values in a classification task : F1 Score
The harmonic mean of precision and recall
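The confusion-matrix terms on these cards can be computed together, as in the sketch below. It uses the standard definitions in terms of TP, TN, FP, and FN; the function name and the counts are illustrative, and the code assumes every denominator is nonzero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute common metrics from confusion-matrix counts.

    Assumes no denominator is zero (no guard for empty classes).
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # recall for the positive case
    specificity = tn / (tn + fp)   # recall for the negative case
    precision = tp / (tp + fp)
    # F1 is the harmonic mean of precision and recall (sensitivity).
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Illustrative counts only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```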
Weather forecasters in Denton decided to build a model that predicts tomorrow's high temperature from the previous 30 days' high temperatures. To do this, they used the past year's weather data to train the model. They had perfect accuracy when using last year's data for testing. However, when they applied the same model to predict the next day's weather, they found it was off by 10 degrees. Select all statements that are likely to apply to their model.
The model is overfitting (it is too complex, too many variables); They should have used separate sets of data for training and for testing to pick the right model
What is the purpose of regularization in linear regression?
To achieve better prediction accuracy on a future test set than ordinary linear regression; To decrease the coefficient values for irrelevant terms in the regression model; To diminish the contribution of irrelevant features to the resulting model, effectively performing automated feature selection during learning
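As a rough illustration of the shrinking effect, the sketch below fits a one-feature linear model with no intercept under an L2 (ridge) penalty, where the slope has the closed form sum(x*y) / (sum(x*x) + lambda). The choice of an L2 penalty is an assumption, since the card does not name one; an L1 (lasso) penalty is the variant that can drive coefficients exactly to zero, which is what makes it act like feature selection. The data and the helper name `ridge_slope` are illustrative.

```python
def ridge_slope(xs, ys, lam):
    """Slope of a no-intercept, one-feature linear fit with L2 penalty `lam`.

    With lam = 0 this reduces to ordinary least squares.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Illustrative data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# A larger penalty pulls the coefficient toward zero.
slopes = {lam: ridge_slope(xs, ys, lam) for lam in (0.0, 10.0, 100.0)}
```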
Dimensionality reduction is useful to lower the number of features in a systematic way. Which is NOT a reason why it may be useful to reduce the dimensionality of your feature set?
To project the data into a higher dimensional space to create a linear separating hyperplane
Match the cross-validation technique with the description : K-fold cross-validation
Train your model on K-1 groups of the data set, and test on the Kth group. Repeat the process so that each of the other K-1 groups takes a turn as the test set
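The K-fold procedure can be sketched as an index generator. This is a minimal version that does not shuffle the data first (an assumption made for readability; in practice the samples are usually shuffled). The function name `k_fold_indices` is illustrative. Setting K equal to the number of samples gives leave-one-out cross-validation.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


five_folds = list(k_fold_indices(10, 5))       # each model trains on 8 of 10 samples
leave_one_out = list(k_fold_indices(4, 4))     # K = data set size
```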
Match the cross-validation technique with the description : Subject-wise cross-validation
When you use data
Match the advanced machine learning concept to an example of it : Deep Learning
a type of machine learning that automatically creates features from the data to help with more complex decisions. Often this is done with neural networks with multiple layers. Useful for difficult machine learning problems like speech recognition and visual object identification. The new hotness in complex machine learning problems currently.
Match the advanced machine learning concept to an example of it : Feature Engineering
creating 'day of the week' as a feature from date strings ("Sep 20, 2017") when trying to predict if someone will be driving to work that day.
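The date example above could look like the following sketch. The function name `day_of_week` is illustrative, and the format string assumes dates are always written like "Sep 20, 2017" with English month abbreviations (parsing `%b` depends on the active locale).

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_of_week(date_string):
    """Derive a 'day of the week' feature from a string like 'Sep 20, 2017'."""
    parsed = datetime.strptime(date_string, "%b %d, %Y")
    return DAY_NAMES[parsed.weekday()]  # weekday(): Monday is 0


day_feature = day_of_week("Sep 20, 2017")
is_weekday = day_feature not in ("Saturday", "Sunday")
```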
If you are picking among many different model variants you generally split your data into three different groups. Match the group to a property of that group : Training set
data used to create the models
If k-nearest neighbors were your model of how you make decisions, which value of k would be more likely to be superstitious (lead to poor generalization, fit to the noise, be "too complex" of a model...)?
k=1
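A toy sketch of why k=1 fits the noise: with a single mislabeled training point, k=1 copies the bad label for nearby queries, while k=3 outvotes it. The one-dimensional data, the labels, and the helper `knn_predict` are all made up for illustration.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query` (1-D features)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Points below 6 are class "A" except for one noisy label at x = 3.0.
train_points = [(1.0, "A"), (2.0, "A"), (3.0, "B"), (4.0, "A"),
                (5.0, "A"), (8.0, "B"), (9.0, "B")]

pred_k1 = knn_predict(train_points, 3.1, k=1)  # follows the noisy point
pred_k3 = knn_predict(train_points, 3.1, k=3)  # neighbors at 2.0 and 4.0 outvote it
```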
If you are picking among many different model variants you generally split your data into three different groups. Match the group to a property of that group : Validation set
like a test set but it is used to decide which model variant is selected to later apply to the test set
Match the method of normalization with the result new_score = (old_score - min) / (max - min)
range between 0 and 1
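A minimal sketch of the min-max formula above. The function name `min_max` and the sample scores are illustrative.

```python
def min_max(scores):
    """Rescale scores linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("all scores are equal; cannot rescale")
    return [(s - lo) / (hi - lo) for s in scores]


rescaled = min_max([10.0, 20.0, 25.0, 50.0])
```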
If you are picking among many different model variants you generally split your data into three different groups. Match the group to a property of that group : Test set
the data set used to evaluate the selected model that has been trained on the other two data sets
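The three groups described on these cards can be produced with a simple shuffled split, sketched below. The 60/20/20 proportions, the fixed seed, and the function name `three_way_split` are assumptions chosen for illustration; the cards do not prescribe any sizes.

```python
import random


def three_way_split(items, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle items and split them into (training, validation, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    training = shuffled[n_test + n_val:]
    return training, validation, test


training_set, validation_set, test_set = three_way_split(range(100))
```

The test set is touched only once, after the validation set has been used to choose among the model variants.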