Machine Learning - Exam One

univariate multivariable regression

A model with one outcome and several explanatory variables (most common).

Types of classification: Binary, Multiclass (Multinomial), Multilabel (multiple binary), Multioutput (multiple multiclass)

Binary: [Digit is 5 or not]. Multiclass (Multinomial): [Digit is 0, 1, ..., 9]. Multilabel (multiple binary): [Odd or not]; [Greater than 5 or not]. Multioutput (multiple multiclass): 28x28 labels for each image, and each label value is 1 to 256.

___________ is constraining a model to make it simpler and reduce the risk of overfitting.

Regularization

_________ uses many small validation sets, evaluating each model once per validation set after it is trained on the rest of the training data, and then averaging all the evaluations.

Repeated cross-validation
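
The averaging step can be sketched in plain Python as a single pass of k-fold cross-validation. The fold splitter and the toy "model" (always predict the training mean) are illustrative assumptions, not part of the original card.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    fold_size = n_samples // k
    for i in range(k):
        start = i * fold_size
        # The last fold absorbs any remainder.
        stop = n_samples if i == k - 1 else start + fold_size
        val_idx = list(range(start, stop))
        train_idx = [j for j in range(n_samples) if j < start or j >= stop]
        yield train_idx, val_idx


def cross_val_mae(y, k):
    """Average validation MAE of a toy model that predicts the training mean."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(y), k):
        train_mean = sum(y[j] for j in train_idx) / len(train_idx)
        mae = sum(abs(y[j] - train_mean) for j in val_idx) / len(val_idx)
        scores.append(mae)
    # The final score is the average over all validation folds.
    return sum(scores) / len(scores)


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
avg_score = cross_val_mae(y, k=3)
print(avg_score)
```

Each instance is used for validation exactly once, and the model is always evaluated on data it was not trained on.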

Multiclass (multinomial) classifiers

distinguish between more than two classes.

We have a dataset called course_enrollments that has the following columns: course, instructor, student, room. We would like to know how many enrollments each course has. Use a DataFrame method to get the number of students in each course.

course_enrollments['course'].value_counts()
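
A small runnable sketch of the same call, assuming pandas is installed and that the table has one row per enrollment. The rows below are made up for illustration.

```python
import pandas as pd

# Hypothetical enrollment table with the four columns named in the question.
course_enrollments = pd.DataFrame({
    "course": ["ML", "ML", "DB", "ML", "DB"],
    "instructor": ["Kim", "Kim", "Lee", "Kim", "Lee"],
    "student": ["s1", "s2", "s1", "s3", "s4"],
    "room": ["R101", "R101", "R202", "R101", "R202"],
})

# value_counts() counts how many rows carry each distinct course value.
counts = course_enrollments["course"].value_counts()
print(counts)
```

The result is a Series indexed by course name, sorted from the most to the least frequent course.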

You have a multi-class classification problem with k classes, using one vs rest method, how many different logistic regression classifiers will you end up training?

k

For the following group of data: 200, 400, 800, 1000, 2000, 2200, scale them with min-max.

0, 0.1, 0.3, 0.4, 0.9, 1.0, using (x - min) / (max - min) = (x - 200) / 2000.
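
Min-max scaling maps each value x to (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. A minimal sketch on the six values from the question (the helper name is an assumption):

```python
def min_max_scale(values):
    """Rescale values linearly so that min -> 0 and max -> 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


data = [200, 400, 800, 1000, 2000, 2200]
scaled = min_max_scale(data)
print(scaled)
```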

There are two types of performance functions

1) Utility (or fitness) function: Measures how good your model is. 2) Cost function: Measures how bad it is.

Name 5 different sub tasks you need to perform while you are exploring the data.

1. Create a copy of the data for exploration (sampling it down to a manageable size if necessary). 2. Create a Jupyter notebook to keep a record of your data exploration. 3. Study each attribute and its characteristics (% missing values, noise, distribution, etc.). 4. For supervised learning tasks, identify the target attribute(s). 5. Visualize the data. 6. Study the correlations between attributes. 7. Study how you would solve the problem manually. 8. Identify the promising transformations you may want to apply. 9. Identify extra data that would be useful (go back to "Get the Data"). 10. Document what you have learned.

Name 3 sub tasks of data preparation step of machine learning.

1. Data cleaning: • Fix or remove outliers (optional). • Fill in missing values (e.g., with zero, mean, median...) or drop their rows (or columns). 2. Feature selection (optional): • Drop the attributes that provide no useful information for the task. 3. Feature engineering, where appropriate: • Discretize continuous features. • Decompose features (e.g., categorical, date/time, etc.). • Add promising transformations of features (e.g., log(x), sqrt(x), x^2, etc.). • Aggregate features into promising new features. 4. Feature scaling: • Standardize or normalize features.

Select the multiclass classification: - Assigning a tag to an email from one of the following: Promotion, Social, Primary - Assigning a patient one of these: not ill, cold, flu - Assigning the weather as one of these: sunny, rain, snow, cloudy - Analyzing a picture and assigning both young/old and male/female options

- Assigning a tag to an email from one of the following: Promotion, Social, Primary - Assigning a patient one of these: not ill, cold, flu - Assigning the weather as one of these: sunny, rain, snow, cloudy

Name 5 of the main steps in a machine learning project:

1. Frame the problem and look at the big picture. 2. Get the data. 3. Discover and visualize the data to gain insights. 4. Prepare the data to better expose the underlying data patterns to Machine Learning algorithms. 5. Explore many different models and shortlist the best ones. (Select a model and train it.) 6. Fine-tune your models and combine them into a great solution. 7. Present your solution. 8. Launch, monitor, and maintain your system.

Name 4 main challenges of machine learning and briefly explain:

1. Insufficient Quantity of Training Data 2. Nonrepresentative Training Data 3. Poor-Quality Data 4. Irrelevant Features 5. Overfitting the Training Data 6. Underfitting the Training Data

Name 5 sub tasks of shortlisting the promising models step of a machine learning project.

1. Train many quick-and-dirty models from different categories (e.g., linear, naïve Bayes, SVM, Random Forest, neural net, etc.) using standard parameters. 2. Measure and compare their performance. • For each model, use N-fold cross-validation and compute the mean and standard deviation of the performance measure on the N folds. 3. Analyze the most significant variables for each algorithm. 4. Analyze the types of errors the models make. • What data would a human have used to avoid these errors? 5. Perform a quick round of feature selection and engineering. 6. Perform one or two more quick iterations of the five previous steps. 7. Shortlist the top three to five most promising models, preferring models that make different types of errors.

Given the following confusion matrix for a two-class problem, calculate the following measures: a. Precision b. Recall c. F1 measure d. False positives
            Predicted +   Predicted -
True +          100            40
True -           60           300

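Reading rows as actual classes and columns as predicted classes: TP = 100, FN = 40, FP = 60, TN = 300. A. Precision = TP / (TP + FP) = 100 / (100 + 60) = 0.625 B. Recall = TP / (TP + FN) = 100 / (100 + 40) = 0.714 C. F1 measure = 2 x (Precision x Recall) / (Precision + Recall) = 2 x 0.4464 / 1.3393 = 0.667 D. False positives = 60.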
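
The arithmetic can be checked with a short sketch. It assumes the matrix is read with rows as actual classes and columns as predicted classes, so TP = 100, FN = 40, FP = 60. The function name is illustrative.

```python
def binary_metrics(tp, fn, fp):
    """Return (precision, recall, f1) from confusion-matrix counts."""
    precision = tp / (tp + fp)   # of everything predicted +, how much is truly +
    recall = tp / (tp + fn)      # of everything truly +, how much was found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = binary_metrics(tp=100, fn=40, fp=60)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

F1 is the harmonic mean of precision and recall, which is why it equals 2TP / (2TP + FP + FN) = 200 / 300 here.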

In a medical trial, we train a model with weight, age, and race features, and we get predictions for variables blood pressure and cholesterol with the same model. Which of the following is true? a. It is a multivariate multivariable (multiple) regression b. It is a univariate multivariable (multiple) regression c. It is a multivariate univariable regression d. None of the options listed.

A. it is a multivariate multivariable (multiple) regression.

What are the types of the Gradient Descent technique? Briefly explain the differences.

Batch, Stochastic, and Mini-batch.
Data used per step: - Batch uses the whole training set - Mini-batch uses a small random subset of the training set - Stochastic uses one random instance
Speed per step: - Stochastic > Mini-batch > Batch
Behavior near the minimum: - Batch reaches the minimum and then stops - Stochastic and Mini-batch keep walking around the minimum
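
The three variants differ only in how many instances feed each gradient step, which a single batch_size parameter can express. The sketch below fits y = w * x on a tiny noise-free dataset; the function name, learning rate, and data are assumptions chosen for illustration. Because the data has no noise, all three variants settle on the same w, so the "walking around the minimum" effect does not appear here.

```python
import random


def gradient_descent(xs, ys, batch_size, lr=0.05, epochs=200, seed=0):
    """Fit y = w * x by minimizing MSE; batch_size selects the GD variant."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over the current batch.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated with w = 2

w_batch = gradient_descent(xs, ys, batch_size=len(xs))  # Batch GD
w_mini = gradient_descent(xs, ys, batch_size=2)         # Mini-batch GD
w_sgd = gradient_descent(xs, ys, batch_size=1)          # Stochastic GD
print(w_batch, w_mini, w_sgd)
```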

If you are creating a classifier to filter bad videos for kids (4-6 years) and your classifier predicts the bad videos, would you be willing to tolerate a high number of false positives or a high number of false negatives?

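A high number of false positives. Blocking some harmless videos is an acceptable cost; letting a bad video through (a false negative) is not.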

We would like to predict the grade that students would get in Machine Learning class based on the following features: GPA, study_hours, math_grade, programing_grade We notice that the minimum GPA is 1.5 and max GPA is 4.0 and the mean value is 3.1. How would you apply standardization and normalization for scaling the GPA feature?

Normalization (min-max): (GPA - 1.5) / (4.0 - 1.5), which rescales GPA to the range 0 to 1. Standardization: (GPA - 3.1) / (standard deviation of GPA).
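
Both scalings can be sketched with the standard library. The GPA list is a hypothetical sample chosen to have minimum 1.5, maximum 4.0, and mean 3.1; the question does not give the standard deviation, so it is computed from this sample.

```python
from statistics import mean, pstdev

# Hypothetical GPA sample: min 1.5, max 4.0, mean 3.1.
gpas = [1.5, 2.9, 3.2, 3.4, 3.6, 4.0]

gpa_min, gpa_max = min(gpas), max(gpas)
mu, sigma = mean(gpas), pstdev(gpas)

# Normalization (min-max): values end up between 0 and 1.
normalized = [(g - gpa_min) / (gpa_max - gpa_min) for g in gpas]

# Standardization: zero mean and unit variance, with no fixed range.
standardized = [(g - mu) / sigma for g in gpas]

print(normalized)
print(standardized)
```

Standardization is less affected by outliers than min-max scaling, because a single extreme value does not compress all other values into a narrow band.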

__________ is the Generalization of Logistic Regression to support multiple classes directly without having to train and combine multiple binary classifiers.

Softmax Regression
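
The softmax function at the heart of the model turns one score per class into probabilities that sum to 1. A minimal sketch with made-up scores follows; subtracting the maximum score is a common numerical-stability trick and does not change the result.

```python
import math


def softmax(scores):
    """Convert per-class scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]  # hypothetical scores for three classes
probs = softmax(scores)
predicted_class = probs.index(max(probs))  # class with the highest probability
print(probs, predicted_class)
```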

univariate univariable regression

One outcome, one explanatory variable (often used as the introductory case).

We would like to use binary classifiers to detect a letter from the alphabet. If we use the OvR strategy, how many binary classifiers do we need to train? If we use the OvO strategy, how many binary classifiers do we need to train?

OvR strategy -> 26. OvO strategy -> (26 * 25) / 2 = 325.
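
The two counts can be verified by enumerating the classifiers. This sketch assumes the 26 uppercase English letters as the classes.

```python
from itertools import combinations
from string import ascii_uppercase

classes = list(ascii_uppercase)  # 26 letter classes

# OvR: one "this letter vs everything else" classifier per class.
n_ovr = len(classes)

# OvO: one classifier per unordered pair of classes.
ovo_pairs = list(combinations(classes, 2))
n_ovo = len(ovo_pairs)

print(n_ovr, n_ovo)
```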

What is happening in the following code?
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# CombinedAttributesAdder is a custom transformer defined elsewhere.
num_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])
housing_num_tr = num_pipeline.fit_transform(housing_num)

The code defines the steps of a pipeline. All but the last estimator must be transformers (i.e., they must have a fit_transform() method). In the above code, the transformers are SimpleImputer, CombinedAttributesAdder, and StandardScaler. SimpleImputer fills in missing values with the median. When you call the pipeline's fit() method, it calls fit_transform() sequentially on all transformers, passing the output of each call as the input to the next, until it reaches the final estimator, for which it calls fit(). Here the code calls the pipeline's fit_transform() instead, so the final StandardScaler is also fitted and applied, and housing_num_tr holds the imputed, augmented, and standardized data.
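
A reduced, runnable version of the same idea, assuming scikit-learn and NumPy are installed. The custom CombinedAttributesAdder step is left out and the input array is made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Tiny numeric dataset with one missing value in the second column.
X = np.array([[1.0, 10.0],
              [2.0, np.nan],
              [3.0, 30.0]])

num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),  # NaN -> column median (20.0)
    ("std_scaler", StandardScaler()),               # zero mean, unit variance
])

# fit_transform() fits each step in order and feeds its output to the next one.
X_tr = num_pipeline.fit_transform(X)
print(X_tr)
```

The imputed value equals the column median, which is also the column mean in this example, so it lands at 0 after standardization.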

A validation dataset is used to compare models a. True b. False

True

If a model performs great on the training data but generalizes poorly to new instances, the model is likely overfitting the training data (T/F)

True

Machine learning systems improve their performance in a given task with more and more experience or data. True/False?

True

Unless there are very few hyperparameter values to explore, prefer random search over grid search. True/False?

True

A pipeline is a. A sequence of data processing components b. A sequence that combines all datasets c. A sequence that samples a portion of a dataset d. A sequence that randomly processes a dataset

a. A sequence of data processing components

Which of the following is true for constraining weights for regularization? a. The regularization term is added to the cost function only during training. Once the model is trained, use the unregularized performance measure to evaluate it. b. The hyperparameter α controls how much you want to regularize the model. If α is very high, then all weights end up very close to zero and the result is a flat line going through the data's mean. c. Lasso Regression tends to eliminate the weights (set them to zero) of the least important features. d. All of the options listed

d. All of the options listed
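
The effect of α can be seen in a one-feature Ridge Regression, which has a closed-form solution. The sketch assumes the cost is the sum of squared errors plus α times the squared weight, with an unpenalized intercept; the scaling convention for α varies between textbooks and libraries, and the data is made up.

```python
def ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit of y = w * x + b with an unpenalized intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / (sxx + alpha)   # a larger alpha shrinks the weight toward zero
    b = y_mean - w * x_mean   # the fitted line always passes through the means
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 2x + 1

for alpha in (0.0, 10.0, 1e6):
    print(alpha, ridge_1d(xs, ys, alpha))
```

With α = 0 the fit is plain linear regression; with a very large α the slope is almost zero and the intercept approaches the mean of y, which is the flat line through the data's mean.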

Which of the following is correct for regularization of linear regression? a. We should avoid plain linear regression b. Ridge regression is a good default c. We should use Lasso or Elastic Net if we expect that only a few features are actually useful d. All of the options listed.

d. All of the options listed.

Linear regression predicts ______________ while logistic regression predicts ______________ a. Continuous values, classes b. Classes, continuous values c. Classes, Values close to the mean d. Classes, Outlier values away from the mean

a. Continuous values, classes

Gradient Descent will converge when training a Logistic Regression model because the cost function is a. Convex b. Complex c. Collocated d. Core optimized

a. Convex

Gradient Descent is used for the following purpose: a. Find the parameters that minimize the cost function b. Evaluate how good the predictions are. c. Split the dataset into training and test sets. d. Compute the recall

a. Find the parameters that minimize the cost function

To fine-tune your model and perform iterative evaluations, you may use a. Grid search b. Full text search c. Sample search d. Spot search

a. Grid search

This type of learning method is capable of adapting rapidly to both changing data and autonomous systems, and of training on very large quantities of data. a. Online learning b. Offline learning c. Reinforcement learning d. Semisupervised learning

a. Online learning

The code below: X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:] a. Partitions the data into training and test datasets b. Partitions the data into two training and two test datasets c. Partitions the data into four datasets d. Partitions the data into two datasets randomly

a. Partitions the data into training and test datasets

The best learning type to teach a robot to learn to walk in various unknown terrains is a. Reinforcement learning b. Supervised learning c. Semisupervised learning d. Other types of learning

a. Reinforcement learning

Which of the following is not correct for softmax regression? a. It is the generalization of Logistic Regression b. Softmax regression does not train and combine multiple binary classifiers c. Softmax regression should only be used for mutually exclusive classes d. Softmax is a multioutput classifier

d. Softmax is a multioutput classifier

What indicates underfitting?

a. The model performs poorly on the training data and also generalizes poorly. b. The training and validation learning curves reach a plateau, and they are close and fairly high. c. Adding more training data does not help improve the performance on the training data. d. We need a more complex model or better features.

What indicates overfitting?

a. The model performs well on the training data but generalizes poorly according to the cross-validation metrics. b. The error on the training data is low but considerably higher on the validation data. c. There is a gap between the learning curves for training and validation data.

A hyperparameter is a parameter of the learning algorithm itself, not of the model a. True b. False

a. True

An online learning system can learn incrementally a. True b. False

a. True

Test dataset is used to estimate the generalization error that a model will make on new instances, before the model is launched in production a. True b. False

a. True

To classify pictures as outdoor or indoor and daytime or nighttime we may train a. Two Logistic Regression classifiers b. Two Linear Regression classifiers c. Four Logistic Regression classifiers d. Four Linear Regression classifiers

a. Two Logistic Regression Classifiers

Clustering is a type of a. Unsupervised learning task b. Supervised learning task c. Regression learning task d. Batch learning task

a. Unsupervised learning task

Excessively simple models ________________ while excessively complex models _____________ a. underfit the training data, overfit the training data b. overfit the training data, underfit the training data c. underfit the training data, optimally fit the training data d. optimally fit the training data, underfit the training data

a. underfit the training data, overfit the training data

For linear models, regularization is typically achieved by _________ of the model.

constraining the weights

A labeled training set is a. A dataset that contains specific names b. A dataset that contains the desired solution c. A dataset that contains boolean instances d. A dataset that contains sufficient instances

b. A dataset that contains the desired solution

Customer segmentation into groups is a type of a. Classification task b. Clustering task c. Regression task d. Reinforcement task

b. Clustering task

Batch learning systems learn dynamically a. True b. False

b. False

A validation dataset is used to compare a. Datasets b. Models c. Features d. Labels

b. Models

A univariable regression uses a. Two features to predict the outcome b. One feature to predict the outcome c. One feature in its dataset d. At least two features in its dataset

b. One feature to predict the outcome

Given a training set with millions of features, the fastest algorithm to use to search for a global minimum is a. The Normal Equation b. Stochastic Gradient Descent c. Mini-batch Gradient Descent d. Batch Gradient Descent

b. Stochastic Gradient Descent

One-hot encoding is a technique for a. encoding continuous features b. encoding categorical features c. improving the convergence of gradient descent, similar to min-max normalization d. imputing (filling in) missing values

b. encoding categorical features
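
One-hot encoding replaces each category with a binary vector that has a single 1 in the position of that category. A minimal pure-Python sketch follows; the helper name and the weather values are illustrative.

```python
def one_hot_encode(values):
    """Return (categories, rows), where each row has a single 1."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column of this value's category
        rows.append(row)
    return categories, rows


categories, encoded = one_hot_encode(["sunny", "rain", "snow", "rain"])
print(categories)
print(encoded)
```

Unlike mapping categories to integers 0, 1, 2, this representation does not suggest to the model that nearby codes are more similar than distant ones.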

Root Mean Square Error (RMSE) is a measure of how much _______________ the system typically makes in its predictions a. confidence b. error c. bias d. variance

b. error

Gradient Descent cannot get stuck in a _______________ when training a Logistic Regression model, because the cost function is convex a. local minimum b. global minimum c. plateau d. summit

a. local minimum

The logistic regression approach is used for a. Regression b. Clustering c. Classification d. Data segmentation

c. Classification

The code below:
from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train)
a. Instantiates and validates a classifier b. Instantiates and regularizes a classifier c. Instantiates and trains a classifier d. Instantiates and optimizes a classifier

c. Instantiates and trains a classifier

Mean Absolute Error is a preferred performance measure for data with many a. Instances b. Features c. Outliers d. Classes

c. Outliers
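
The difference shows up when one prediction is far off: RMSE squares the errors, so a single large error dominates it, while MAE grows only linearly. A sketch with made-up values:

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [10.0, 12.0, 11.0, 13.0, 12.0]
y_pred = [11.0, 11.0, 12.0, 12.0, 32.0]  # the last prediction is an outlier

mae_value = mae(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
print(mae_value, rmse_value)
```

Four errors of 1 and one error of 20 give an MAE of 4.8 but an RMSE of about 8.99.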

A performance measure for regression is: a. recall b. precision c. Root Mean Square Error (RMSE) d. F1-score

c. Root Mean Square Error (RMSE)

If you are using a learning algorithm to estimate the price of houses in a city, you may want one of your features xi to capture age of the houses. In your training set, all the houses have an age between 10 to 35 with an average of 17. Which of the following would you use as features if you use normalization for feature scaling: a. xi = age of house b. xi = (age of house)/35 c. xi = (age of house - 10)/25 d. xi = (age of house - 17)/25

c. xi = (age of house - 10)/25

Which of the following is true for Normal Equation, Batch Gradient Descent (GD), Stochastic GD and Mini-Batch GD? a) After training, all these algorithms end up with very similar models and make predictions in exactly the same way. b) Batch GD's path actually stops at the minimum, while both Stochastic GD and Mini-batch GD continue to walk around the global minimum. c) Mini-batch GD will end up walking around a bit closer to the minimum than Stochastic GD, but it may be harder for it to escape from local minima. d) All of the options listed

d) All of the options listed

Which of the following is not a way of constraining weight for regularization? a. Ridge Regression b. Lasso Regression c. Elastic Net d. Softmax

d. Softmax

A learning algorithm tries to find optimal values for its model parameters such that a. The model generalizes well to training instances b. The model generalizes well to outlier instances c. The model generalizes well to large instances d. The model generalizes well to new instances

d. The model generalizes well to new instances

The validation set and the test set must be as representative as possible of the data you expect to use in production. We would have unexpected generalization errors due to ______

data mismatch.

To avoid ___________, we should not look at the test set. If we look, we may see an interesting pattern in the test data that leads us to select a particular kind of Machine Learning model. Because that choice was influenced by the test set, the measured test performance will be too optimistic, and the model may show an unexpected generalization error in production.

data snooping bias

If we would like to get the count, mean, std, etc. values of numeric fields of a dataframe, we can utilize the

describe() method.

During the normalization/min-max

feature scaling technique; values are shifted and rescaled so that they end up ranging from 0 to 1.

During standardization

feature scaling, we subtract the mean value and then divide by the standard deviation so that the resulting distribution has zero mean and unit variance.

The error rate on new cases is called the _________ (or out-of-sample error), and by evaluating your model on the test set, you get an estimate of this error.

generalization error

info() method of dataframe structure of Pandas library is useful to

get a quick description of the data, in particular the total number of rows, each attribute's type, and the number of nonnull values.

multiple dependent variables

multi-variate regression

multivariate multivariable regression

multiple outcomes, multiple explanatory variables.

multivariate univariable regression

multiple outcomes, single explanatory variable.

multiple independent variables indicate

multiple regression or multi-variable regression

Multioutput-multiclass classification

(or simply multioutput classification) is a generalization of multilabel classification where each label can be multiclass (i.e., it can have more than two possible values).

If you set the regularization hyperparameter to a very large value, you will get an almost flat model (a slope close to zero); the learning algorithm will almost certainly not _______ the training data, but it will be less likely to find a good solution.

overfit

To save your model, you can utilize ____ or ____ libraries.

pickle, joblib
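
A minimal save-and-reload round trip with pickle from the standard library. The dictionary stands in for a trained model, and the file name is arbitrary; joblib offers the analogous joblib.dump(model, path) and joblib.load(path) calls.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model: any picklable Python object works the same way.
model = {"theta0": 1.0, "theta1": 2.0}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Save the model to disk.
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load it back, for example in a later session.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.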

We perform ___________ to guarantee that the test set is representative of the overall population. During __________, the population is divided into homogeneous subgroups called strata, and the right number of instances are sampled from each stratum

stratified sampling
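
The idea can be sketched in plain Python: group the instances by stratum, then draw the same fraction from each group. The function name, the "low"/"high" strata, and the 20% fraction are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(items, key, fraction, seed=42):
    """Sample the same fraction from every stratum defined by key(item)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        n_pick = round(len(members) * fraction)
        sample.extend(rng.sample(members, n_pick))
    return sample


# Hypothetical population: 60% "low" and 40% "high" instances.
population = [("low", i) for i in range(60)] + [("high", i) for i in range(40)]
test_set = stratified_sample(population, key=lambda item: item[0], fraction=0.2)

n_low = sum(1 for stratum, _ in test_set if stratum == "low")
n_high = sum(1 for stratum, _ in test_set if stratum == "high")
print(n_low, n_high)
```

The test set keeps the 60/40 proportions of the population, which purely random sampling does not guarantee on small datasets.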

(One-variable regression) For the line h(x) = theta0 + theta1 * x, what are theta0 and theta1?

theta0 = y-intercept, theta1 = slope

When should we stop training to avoid overfitting?

When the validation error reaches its minimum and starts to rise again (early stopping).

Which one of these is not one of the feature engineering steps: • Discretize continuous features. • Decompose features (e.g., categorical, date/time, etc.). • Add promising transformations of features (e.g., log(x), sqrt(x), x^2, etc.). • Aggregate features into promising new features.

• Discretize continuous features. • Decompose features (e.g., categorical, date/time, etc.). • Add promising transformations of features (e.g., log(x), sqrt(x), x^2, etc.). • Aggregate features into promising new features.

One-versus-the-Rest (OvR) (one-versus-all)

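• Train one binary classifier per class (for digits: a 0-detector, a 1-detector, and so on). To classify an image, get the decision score from each classifier for that image and select the class whose classifier outputs the highest score.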

One-versus-One (OvO)

• Train a binary classifier for every pair of classes: one to distinguish 0s and 1s, another to distinguish 0s and 2s, another for 1s and 2s, and so on. If there are N classes, you need to train N × (N - 1) / 2 classifiers.

