Machine Learning CP 1


When is a Stochastic Gradient Descent Preferable to a Batch Gradient Descent?

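When the cost function is very irregular, since the randomness helps it jump out of local minima. It's also the better choice when the training set is huge, because each step only needs one instance.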

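What is the difference between min-max scaling (normalization) and standardization?

Min-max scaling (normalization) shifts and rescales values so they end up in a fixed range, usually 0 to 1. Standardization is z-score normalization: it subtracts the mean and divides by the standard deviation, so the result always has a mean of 0 and a standard deviation of 1. Standardization is much less affected by outliers, so it is better when there are lots of them. Normalization might be better if the data isn't bell shaped.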
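
A minimal sketch of the two scalers written out by hand with NumPy. The `values` array is made up for illustration; the single outlier shows how min-max scaling crushes the other values toward 0.

```python
import numpy as np

# Made-up feature column; 100.0 is a deliberate outlier.
values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Min-max scaling: shift and rescale into the range [0, 1].
min_max = (values - values.min()) / (values.max() - values.min())

# Standardization: subtract the mean, divide by the standard deviation.
standardized = (values - values.mean()) / values.std()

print(min_max)       # the four small values all land close to 0
print(standardized)  # mean 0, standard deviation 1
```

In sklearn the same operations are available as MinMaxScaler and StandardScaler.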

How does a confusion matrix help with classification?

A confusion matrix counts the number of times instances of class A are classified as class B

Elastic Net

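A middle ground between lasso and ridge regression. Its regularization term is a mix of both penalties, and you control the mix ratio (l1_ratio in sklearn's ElasticNet) as well as the overall regularization strength (alpha).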

Your data is underfitting. How do you fix this?

Most likely with a more powerful model. Feeding it better features or reducing the regularization can also help.

What can SVM do?

Linear and nonlinear classification, regression, and outlier detection.

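What is a hyperparameter?

A parameter of the learning algorithm rather than of the model. It is set before training and stays constant during it. A common example is the amount of regularization to apply during learning.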

What is unsupervised data mining?

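Digging through data for patterns without a hypothesis stated up front. Done carelessly, it turns into p-hacking.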

Make a series with the correlations between one column and all the others

Step one: make a new dataframe that calculates all the correlations: corr_matrix = dataframe.corr(). Step two: tease out the column you want: corr_matrix['column']
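
A small sketch of those two steps. The dataframe and its column names (`rooms`, `income`, `price`) are invented for illustration; only `DataFrame.corr()` and the column lookup come from the card.

```python
import pandas as pd

# Made-up numeric data; the column names are hypothetical.
df = pd.DataFrame({
    "rooms": [2, 3, 4, 5, 6],
    "income": [20, 35, 38, 52, 60],
    "price": [100, 150, 170, 230, 260],
})

# Step one: every pairwise (Pearson) correlation.
corr_matrix = df.corr()

# Step two: pull out one column as a Series and sort it.
price_corr = corr_matrix["price"].sort_values(ascending=False)
print(price_corr)
```

Keep in mind that corr() only measures linear correlation by default.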

What is the F1 score?

The F1 measure is the harmonic mean of the precision and recall scores. Also called the F-measure or the F-score, it is calculated using the following formula: F1 = 2PR / (P + R). Unlike a regular average, the harmonic mean gives much more weight to low values, so the F1 measure penalizes classifiers with imbalanced precision and recall scores, like the trivial classifier that always predicts the positive class. A model with perfect precision and recall scores will achieve an F1 score of one.
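
To see the penalty in numbers, here is a short sketch that computes F1 from raw counts. The helper name `f1_from_counts` and the counts are made up for illustration; sklearn ships its own f1_score in sklearn.metrics.

```python
def f1_from_counts(tp, fp, fn):
    """Hypothetical helper: F1 from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Balanced classifier: precision 0.8, recall 0.8.
balanced = f1_from_counts(tp=80, fp=20, fn=20)

# Lopsided classifier: precision 0.1, recall 1.0 (it flags almost everything).
lopsided = f1_from_counts(tp=10, fp=90, fn=0)

print(balanced, lopsided)
```

The lopsided classifier has a plain average of precision and recall of 0.55, but its F1 is far lower because the harmonic mean is dragged down by the weak precision.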

You're given ordered data, how should you treat it?

You need to make sure to always shuffle ordered data. 1. It guarantees that every fold in K-fold cross-validation contains each necessary class/category 2. You don't want a long run of the same or similar categories in a row, because some algorithms are sensitive to the order of the training instances

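What's a great pandas tool to see correlation, linear and nonlinear?

scatter_matrix (from pandas.plotting). It plots every numeric column against every other one, so nonlinear patterns show up visually.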

What is unsupervised learning good for?

- Clustering data into different groups, potentially like real reviews vs. fake reviews: feed it review length, product price, review frequency over time. - Anomaly detection

What is stratified sampling?

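Sampling so that each subgroup (stratum) shows up in the same proportion as in the overall population. If you're interviewing 1,000 people and the population you're trying to extrapolate to is 51.3% female and 48.7% male, make sure that 513 are female and 487 are male.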

In sklearn, you want to work with categories, how do?

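1. Make a dataframe only of the categories (like df_categories) 2. Call LabelBinarizer on it: encoder = LabelBinarizer() then housing_cat_1hot = encoder.fit_transform(housing_cat). Note that LabelBinarizer is meant for labels and handles one column at a time; OneHotEncoder is the encoder built for feature columns.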

What does a real data pipeline workflow look like?

1. Run a pipeline on your numbers 2. Run a pipeline on your categories 3. Run a FeatureUnion with the transformer_list of the two+ pipelines
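
A sketch of that three-step workflow, under assumptions: the dataframe, its column names, and the two selector functions are invented for illustration, and OneHotEncoder stands in for the categorical step. Current sklearn versions also offer ColumnTransformer for the same job.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Made-up data with one missing number and one category column.
df = pd.DataFrame({
    "rooms": [3.0, np.nan, 5.0, 4.0],
    "income": [30.0, 45.0, 80.0, 60.0],
    "region": ["north", "south", "north", "east"],
})

def select_numbers(frame):
    # Hypothetical selector: keep only the numeric columns.
    return frame[["rooms", "income"]]

def select_categories(frame):
    # Hypothetical selector: keep only the category column.
    return frame[["region"]]

# 1. Pipeline for the numbers: select, fill missing values, standardize.
num_pipeline = Pipeline([
    ("select", FunctionTransformer(select_numbers)),
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# 2. Pipeline for the categories: select, one-hot encode.
cat_pipeline = Pipeline([
    ("select", FunctionTransformer(select_categories)),
    ("one_hot", OneHotEncoder()),
])

# 3. FeatureUnion glues the outputs of both pipelines side by side.
full_pipeline = FeatureUnion(transformer_list=[
    ("num", num_pipeline),
    ("cat", cat_pipeline),
])

prepared = full_pipeline.fit_transform(df)
if hasattr(prepared, "toarray"):
    prepared = prepared.toarray()  # the one-hot step yields a sparse matrix
print(prepared.shape)
```

The result has two scaled numeric columns followed by one column per region.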

What are the two worst parts of decision trees?

1. They are sensitive to small variations in the training data 2. They are sensitive to training data rotation, because their decision boundaries are orthogonal (every split is perpendicular to an axis)

You want to create calculated fields to pass through to sklearn. How do you do this?

1. Use FunctionTransformer 2. Pass through the dataframe as an arg 3. Make the fields The reason we make this as a function is to make it extensible. We want to reuse this code as often as possible.

What does it mean to be ε-insensitive?

In SVM regression, adding more training instances within the margin does not affect the model's predictions.

What is reinforcement learning?

An agent observes an environment, performs actions, and gets rewards or penalties in return. It learns a policy: a strategy for picking the action that earns the most reward over time.

Why is CART a greedy algorithm?

CART, or Classification and Regression Tree, searches for the optimum split at the top level, then repeats the process at each level below. It picks the feature and threshold that produce the most "pure" subsets right away, without checking whether that split leads to the lowest possible impurity several levels down.

What is the difference between closed form and gradient descent (GD)?

Closed form directly computes the model parameters that best fit the model to the training set. GD is an iterative approach that gradually tweaks the model parameters to minimize the cost function over the training set.
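
Both routes can be compared side by side on a toy linear regression. This is a sketch under assumptions: the synthetic data (true intercept 4, slope 3), the learning rate `eta`, and the iteration count are all made up, and the closed form used here is the normal equation.

```python
import numpy as np

# Synthetic data: y = 4 + 3x plus a little noise.
rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(0, 0.1, 100)

# Add a column of ones so the first parameter is the intercept.
X_b = np.c_[np.ones(len(X)), X]

# Closed form: solve the normal equation (X^T X) theta = X^T y in one shot.
theta_closed = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)

# Batch gradient descent: start at zero and step downhill repeatedly.
eta = 0.1  # learning rate (assumed value)
theta_gd = np.zeros(2)
for _ in range(2000):
    gradients = 2 / len(X_b) * X_b.T @ (X_b @ theta_gd - y)
    theta_gd -= eta * gradients

print(theta_closed)
print(theta_gd)
```

With enough iterations the two parameter vectors agree, which is a handy sanity check when writing gradient descent by hand.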

What is non representative training data?

Data that doesn't match the new data you want to make predictions on

You're training a whole lot of models. How do you save them for later analysis?

Either pickle or joblib (older sklearn versions bundled it as sklearn.externals.joblib).
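
A minimal sketch of the joblib route. The tiny model, the file name, and the temporary directory are placeholders for illustration; in a real project you would write to a path you keep.

```python
import os
import tempfile

import joblib
from sklearn.linear_model import LinearRegression

# Tiny made-up training set: y = 2x + 1.
model = LinearRegression().fit([[0.0], [1.0], [2.0]], [1.0, 3.0, 5.0])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_lin_reg.pkl")  # hypothetical file name
    joblib.dump(model, path)       # save the fitted model to disk
    restored = joblib.load(path)   # load it back later

prediction = restored.predict([[3.0]])[0]
print(prediction)
```

Only load files you trust: both pickle and joblib can execute arbitrary code while loading.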

What is Bias and Variance Tradeoff?

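Bias: error from wrong assumptions, such as assuming the data is linear when it's quadratic. A high-bias model is likely to underfit the training data. Variance: error from a model being excessively sensitive to small variations in the training data. A high-variance model is likely to overfit. Making a model more complex typically raises its variance and lowers its bias, and simplifying it does the reverse. That is the tradeoff.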

Irreducible Error

Error that comes from the noisiness of the data itself. No model can remove it; the only way to reduce it is to clean up the data.

Association rule learning

Examining data to discover new and interesting relationships among attributes that can be stated as business rules.

You're using an SVM, which two kernels should you try first?

First: Linear Second: Gaussian RBF

Ridge Regression

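Adds a regularization term to the cost function that forces the model to keep the weights as small as possible. The term should only be added during training; evaluate the trained model with the unregularized performance measure. More useful for noisy data.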

What is a pure node in a decision tree (use the keyword, gini)

Gini measures impurity. If all the training instances that reach a node belong to the same class, the node is pure and its gini is 0.
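
A short sketch of the gini calculation for a single node, using the usual definition (one minus the sum of squared class proportions). The function name and the toy label lists are made up for illustration.

```python
from collections import Counter

def gini_impurity(labels):
    """Hypothetical helper: gini impurity of the labels that reach one node."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

pure_node = gini_impurity(["a", "a", "a", "a"])   # one class only
mixed_node = gini_impurity(["a", "a", "b", "b"])  # 50/50 split of two classes

print(pure_node, mixed_node)
```

The 50/50 node is the worst case for two classes; anything more lopsided scores closer to 0.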

What is logistic regression

Gives a probability that the instance belongs to a class

Hard margin vs soft margin classification

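Hard margin in SVM only works if the data is clearly (linearly) separated, and it is sensitive to outliers. It forces every data point to sit outside the margin and on the correct side of the hyperplane. Soft margin relaxes this: it keeps the margin as wide as possible while tolerating a limited number of margin violations.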

What is an optimal hyperplane?

In SVMs there are many ways to separate two clusters of data points. The optimal hyperplane is the one that sits in the middle of the widest possible margin (the gap between two parallel boundary lines) separating the two clusters.

What is an epoch?

In gradient descent, it's one full pass through the training set.

What is online learning?

Incrementally feed data so it continues to learn.

What is model based learning?

Learning that builds a model from the training examples (for instance, a line fitted through the data) and then uses that model to make predictions on new data.

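How do you decide how many iterations to run gradient descent for (for example, while grid searching the learning rate)?

Set a very large number of iterations, but interrupt the algorithm when the norm of the gradient vector drops below a tiny tolerance. This means that the gradient descent has almost reached a minimum.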

What is a kernel trick?

It gives an SVM the same result as if you had added many polynomial (or similarity) features, without actually having to add them.

What is a dot product used for?

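It measures how aligned two vectors are: a · b = |a| |b| cos(θ). For two non-zero vectors, you can recover the angle between them by dividing the dot product by the product of their norms and taking the arccosine.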
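
As a quick illustration, here is a sketch that recovers the angle between two made-up 2D vectors with NumPy.

```python
import numpy as np

# Two made-up vectors: one along the x axis, one along the diagonal.
a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])

# cos(theta) = (a . b) / (|a| |b|)
cos_theta = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Clip guards against tiny floating-point overshoots outside [-1, 1].
angle_degrees = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
print(angle_degrees)  # roughly 45 degrees
```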

What is an NP-complete problem?

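A problem for which no efficient (polynomial-time) algorithm is known, even though a proposed solution can be checked quickly. In practice, an exact answer for a large input comes down to something close to brute force, where a huge number of options need to be tried.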

What is the learning rate of a gradient descent?

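It's the size of each step, so it determines how many iterations it takes to get to the bottom of the cost function (e.g. the RMSE). Too small and it takes a very long time to converge. Too large and you could jump across the valley and end up higher than where you started.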

What is instance based learning

The system learns the examples by heart, then uses a measure of similarity to generalize: a new data point is classified by how similar it is to the known examples (for instance, the known "positives").

What is a Decision Tree good for?

It helps find complex nonlinear relationships in data

What is ensemble learning?

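It is a model built on other models, whose predictions are aggregated. A random forest is one example: many decision trees, each trained on a different random subset of the data, vote on the class or have their predictions averaged.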

What does logistic regression output?

A probability between 0 and 1 that the instance belongs to a class or category, which is why it is commonly used for classification. Comparing that probability to a decision threshold (typically 0.5) turns it into a class prediction.

What is a decision threshold?

It is just what it sounds like: the level of certainty (score or probability) at which you classify something as positive. Raising it usually increases precision and lowers recall.

How does a linear regression model get trained?

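It finds the parameter values that minimize the RMSE on the training set. In practice you minimize the MSE, which is simpler and leads to the same parameters.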

What is a cost function?

It measures how bad a model is.

What is a utility function?

It measures how good a model is.

What is a similarity function?

It measures how much an instance resembles a certain landmark. Adding these similarity features can turn a dataset into one that is linearly separable for an SVM.

What is an ROC curve?

Receiver operating characteristic curve. It plots the true positive rate (recall) against the false positive rate.

What is k-fold cross-validation?

It randomly splits the training data into k subsets (folds), then trains and evaluates the model k times, each time holding out a different fold for evaluation and training on the other k-1. That gives us k scores.

What is a standard scaler?

It standardizes each feature: it subtracts the mean and divides by the standard deviation, so the result has zero mean and unit variance.

What is dimensionality reduction?

It simplifies data without losing too much information. It will merge several correlated features into one.

Why is batch gradient descent slow?

It uses the whole batch of training data at every step

What is a stochastic gradient descent?

It's a gradient descent that picks a random instance at every step and computes the gradients based only on that instance. It's very fast but will always bounce around.

What is polynomial regression

It's a linear model trained on an extended set of features: powers of the original features are added as new columns. Just an equation with powers.

Tell me about the Predictor in the sklearn API

It's an estimator that can make predictions given a dataset. It has the .predict() method and the .score() method to measure the quality of the predictions.

Tell me about the Estimators in a sklearn API

It's any object that can estimate parameters based on a dataset, like an imputer, which learns the value to fill na with. The estimation is performed by the .fit() method.

How does a decision tree work?

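It's basically a bunch of nested IF statements: each node tests one feature against a threshold, and you follow the branches down to a leaf that gives the prediction.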

What's a graph to see if a model is underfitting or overfitting, based on how large the training set size is? Why use this?

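Learning curve. It plots the model's performance on the training set and on the validation set as a function of training set size, so you can see how much adding more training data still helps. Two curves that plateau close together at a high error suggest underfitting; a persistent gap between them suggests overfitting.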

Lasso Regression

Least Absolute Shrinkage and Selection Operator Regression. It tends to completely eliminate the weights of the least important features (it sets them to zero). That's the key feature that differentiates it from ridge regression.

Why do you need feature scaling?

Most ML algorithms don't do well when some features have big numbers and others don't, for example one column in the thousands and another between 0 and 1. Feature scaling puts them all on a comparable scale so the big-number features don't dominate.

You're about to do data exploration, what should you do?

Make a copy of the dataframe first, so you can play with it without harming the original.

What do you want to use if your dataset has a lot of outliers and you're using regression?

Mean Absolute Error

What is a softmax regression?

Multi category logistic regression

In order to keep results and testing consistent, even when you refresh your data, your test set...

Needs to remain the same always. Save a stable identifier for each instance (UUIDs or similar) and use it to decide which instances go in the test set.

What is batch learning?

Offline learning- the model is trained before put to production.

When creating categories, you often find that the arbitrarily assigned numbers to the categories wrongly influence machine learning algo's. How do you fix this with sklearn?

One Hot Encoding. Rather than having one column with numerical categories, you have multiple columns with a Boolean field for each category. To do this, use LabelBinarizer directly on a single column, or OneHotEncoder, which is the encoder built for feature columns.
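
A sketch of one-hot encoding with OneHotEncoder. The `ocean_proximity` column and its values are invented sample data; `.toarray()` is only there to turn the sparse result into a dense array for printing.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Made-up category column.
housing_cat = pd.DataFrame(
    {"ocean_proximity": ["INLAND", "NEAR BAY", "INLAND", "ISLAND"]}
)

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; densify it to inspect.
housing_cat_1hot = encoder.fit_transform(housing_cat).toarray()

print(encoder.categories_)  # the categories the encoder learned
print(housing_cat_1hot)     # one column per category, a single 1 per row
```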

How do you make your model better?

One way is through gridsearch, which typically looks for combinations of hyperparameter values. Import it via GridSearchCV

When should you prefer an ROC curve to a PR curve?

PR = Precision/Recall curve. Prefer the PR curve when the positive class is rare or when you care more about false positives than false negatives (you don't want to classify things as positive incorrectly). Otherwise, prefer the ROC curve.

How can you find if you want to choose between RMSE and MAE (plus, what are they?)

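RMSE = Root Mean Square Error. MAE = Mean Absolute Error. Both measure the typical distance between predictions and targets, but RMSE weighs large errors more heavily. If the errors are roughly bell shaped with few outliers, use RMSE; if there are lots of outliers, use MAE.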

What is the standard performance measure for regression problems?

Root Mean Square Error

How do you measure effectiveness of an algo with RMSE

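Root mean squared error. A sketch, assuming lin_reg is a fitted model and cleaned_data / answer_key are your prepared features and labels:
    import numpy as np
    from sklearn.metrics import mean_squared_error
    # Get your predicted values based off of the input
    prediction_set = lin_reg.predict(cleaned_data)
    # Compute the mean squared error against the known answers
    linear_mse = mean_squared_error(answer_key, prediction_set)
    # Take the square root to get the RMSE
    linear_rmse = np.sqrt(linear_mse)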

How do you see density in a scatter plot with overlapping values?

Set the alpha to 0.1

What's one way to make stochastic gradient descent better? What are the pitfalls of the method?

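Gradually reducing the learning rate, a process akin to simulated annealing. The steps start out large, which helps make quick progress and escape local minima, then get smaller and smaller so the algorithm can settle at the global minimum. The function that sets the learning rate at each iteration is called the learning schedule. Reduce the rate too quickly and you can get stuck in a local minimum, or end up frozen halfway down. Reduce it too slowly and you can jump around the minimum for a long time.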

What is early stopping?

Stopping training as soon as the validation error reaches its minimum. Past that point the validation error starts going back up, which means the model has begun to overfit the training data.

How do you do stratified based sampling in sklearn?

StratifiedShuffleSplit

What is supervised learning?

Supervised learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal).

You want to predict a target numeric value, or classify data, what kind of machine learning do you use?

Supervised learning.

Define SVM

Support Vector Machine

What is a sklearn data pipeline? Give an example

The idea is that in order to work with data you need to clean it in a certain way. Pipelines let you run a sequence of transformation steps in a fixed, predefined order. An example is: 1. Run an imputer with a strategy to fill missing values 2. Run FunctionTransformer to add calculated fields 3. Run StandardScaler() to standardize the data. mypipeline = Pipeline([(name, transformer), ...]) then my_data = mypipeline.fit_transform(old_data)
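
The three example steps can be sketched as a runnable pipeline. The toy array, its column meaning (rooms, households), and the `add_rooms_per_household` function are assumptions made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Made-up data. Columns: rooms, households. One value is missing.
old_data = np.array([
    [6.0, 2.0],
    [np.nan, 4.0],
    [9.0, 3.0],
    [12.0, 4.0],
])

def add_rooms_per_household(X):
    # Hypothetical calculated field: rooms divided by households.
    ratio = X[:, 0] / X[:, 1]
    return np.c_[X, ratio]

mypipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),                # 1. fill missing values
    ("add_fields", FunctionTransformer(add_rooms_per_household)), # 2. calculated fields
    ("scaler", StandardScaler()),                                 # 3. standardize
])

my_data = mypipeline.fit_transform(old_data)
print(my_data.shape)
```

Each step is a (name, transformer) pair, and the output of one step feeds the next.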

After you run something from the sklearn API, like a imputer function, how do you see what it returned?

Estimators are classes that learn things from the data. Their learned parameters are exposed as public attributes whose names end with an underscore, like: imputer.statistics_

What are support vectors?

They are the data points close to the hyperplane that determine the position of the hyperplane in an SVM algo

Tell me about Transformers in the sklearn API

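They are estimators that can change datasets, like an imputer. The transformation is performed by the .transform() method of the transformer itself, which returns the transformed dataset (typically an ndarray). fit_transform() does the fit and the transform in one call.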

Why are decision trees non-parametric models? Why can this be bad?

The number of parameters is not fixed before training, so the model's structure is free to adapt to the data, unlike a linear regression, whose parameters are built into the algorithm. Left unconstrained, that freedom can lead to overfitting.

What is recall?

True Positive / (True Positive + False Negative). A lower value means you're missing instances that should properly be classified as positive.

What is Precision?

True Positive / (True Positive + False Positive). A lower value means you are overclassifying positives.

What is mini-batch gradient descent

It computes the gradients on small random sets of instances called mini-batches. It's typically very fast because the matrix operations are hardware-optimized, especially when you can use your GPU.

What is unsupervised learning?

Unsupervised learning is the machine learning task of inferring a function to describe hidden structure from unlabeled data. Since the examples given to the learner are unlabeled, there is no error or reward signal to evaluate a potential solution. This distinguishes unsupervised learning from supervised learning and reinforcement learning.

You want to test 1,000 hyperparameter combinations in sklearn. How do?

Use RandomizedSearchCV

You want to pass through a dataframe to a sklearn function, how do?

Write a function to get the data you want returned as a np.array Typically you just take in the df, the type of data you want, and return the values

What's one way to ensure that your data is representative, given a column for salary? What pandas method should be used here?

You can break down your column into categories, and ensure that each category is represented as it should be. Make sure the categories are discrete. If there are a lot of outliers, let's say people between 100k-100,000,000, just assign those to one category. To do this in pandas, use pd.cut()
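
A sketch that combines pd.cut() with StratifiedShuffleSplit. The salary data is randomly generated and the bin edges are arbitrary choices for illustration; the wide-open last bin plays the role of the single category for outliers.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Randomly generated, skewed salaries (made-up data).
rng = np.random.default_rng(0)
df = pd.DataFrame({"salary": rng.lognormal(mean=11, sigma=0.6, size=1000)})

# Break the continuous column into discrete categories.
# The last bin catches every high outlier in one category.
df["salary_cat"] = pd.cut(
    df["salary"],
    bins=[0, 40_000, 80_000, 120_000, np.inf],
    labels=[1, 2, 3, 4],
)

# Split so each salary category keeps its share in the test set.
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(split.split(df, df["salary_cat"]))
train_set, test_set = df.iloc[train_idx], df.iloc[test_idx]

overall = df["salary_cat"].value_counts(normalize=True).sort_index()
in_test = test_set["salary_cat"].value_counts(normalize=True).sort_index()
print(pd.DataFrame({"overall": overall, "test": in_test}))
```

The two printed columns should be nearly identical, which is the point of stratifying.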

What is the sklearn API for cross-validation?

cross_val_score
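
A minimal sketch of cross_val_score in use. The dataset is synthetic (make_regression) and the model choice is arbitrary. Note that sklearn scoring functions follow a "greater is better" convention, so the MSE comes back negated.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression problem, for illustration only.
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)

# 5-fold cross-validation: five fits, five scores.
scores = cross_val_score(
    LinearRegression(), X, y,
    scoring="neg_mean_squared_error",
    cv=5,
)

# Flip the sign and take the square root to get one RMSE per fold.
rmse_scores = np.sqrt(-scores)
print(rmse_scores.mean(), rmse_scores.std())
```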

How do you find the sklearn most important features in a grid search?

grid_search.best_estimator_.feature_importances_

Make a new linear regression machine learning model with sklearn

    from sklearn.linear_model import LinearRegression
    my_lin_reg = LinearRegression()
    my_lin_reg.fit(cleaned_data_with_attributes, answer_key)

Make a simple prediction with sklearn after the data is cleaned

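    my_predictor_instance = ML_Model()  # placeholder for any sklearn predictor class
    my_predictor_instance.fit(cleaned_data, answer_key)  # it has to be fitted before it can predict
    my_predictions = my_predictor_instance.predict(cleaned_data)
    # Then calculate RMSE
    # Then look at test data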

Make a super simple histogram based on a series

series.hist() (for a true bar chart of the values, use series.plot(kind="bar"))

Can you use null values in a ML algo? What should you do?

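Most algorithms can't work with missing values. sklearn has an imputer (SimpleImputer in current versions) to fill them in. With the median strategy it only works on numbers, so drop all non-numerical columns first and then use it.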

How do you split out training data in sklearn?

train_test_split()

