Machine Learning: Basics

What is overfitting?

Overfitting is when a model makes much better predictions on known data (data included in the training set) than on unknown data (data not included in the training set).

How can you combat overfitting?

A few ways of combatting overfitting are: simplify the model (often done by changing its hyperparameters), select a different model, use more training data, or gather better-quality data.

What is cross validation and why is it useful?

Cross validation is a technique for more accurately training and validating models. It rotates which data is held out from model training to be used as the validation data. Several models are trained and evaluated, with every piece of data being held out from exactly one model. The average performance of all the models is then calculated. It is a more reliable way to validate models but is more computationally costly, e.g. 5-fold cross validation requires training and validating 5 models instead of 1.
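A minimal sketch of 5-fold cross validation with scikit-learn (the iris dataset and logistic regression model are placeholder choices for illustration, not prescribed by the card):

```python
# 5-fold cross validation: train and evaluate 5 models, each with a
# different fifth of the data held out, then average their scores.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)
print(scores)         # accuracy of each of the 5 models
print(scores.mean())  # average performance across the folds
```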

What is labeled data and what is it used for?

Labeled data is data that includes the value of the target variable for each instance. Labeled data allows us to train supervised machine learning algorithms.

What are some common applications of machine learning?

Machine learning algorithms are often used to learn and automate human processes, optimize outcomes, predict outcomes, model complex relationships, and learn patterns in data (among many other uses!)

What is machine learning?

Machine learning is the field of science that studies algorithms that approximate functions increasingly well as they are given more observations.

What is the difference between online and offline learning?

Online learning refers to updating models incrementally as they gain more information. Offline learning refers to learning by batch processing data: if new data comes in, an entirely new batch (including all the old and new data) must be fed into the algorithm to learn from the new data.
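A minimal sketch of online learning, assuming scikit-learn's SGDClassifier (the tiny dataset is made up; any model exposing partial_fit would behave similarly):

```python
# Online (incremental) learning: update the model on new data as it
# arrives, without retraining on the full batch.
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier()
classes = np.array([0, 1])

# First chunk of data arrives: fit incrementally.
X1, y1 = np.array([[0.0], [1.0], [2.0], [3.0]]), np.array([0, 0, 1, 1])
model.partial_fit(X1, y1, classes=classes)

# New data arrives later: update the existing model in place.
X2, y2 = np.array([[4.0], [5.0]]), np.array([1, 1])
model.partial_fit(X2, y2)

print(model.predict(np.array([[4.5]])))
```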

What is reinforcement learning?

Reinforcement learning describes a set of algorithms that learn from the outcome of each decision. For example, a robot could use reinforcement learning to learn that walking forward into a wall is bad, but turning away from a wall and walking is good.
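A toy tabular Q-learning sketch of the robot-and-wall example (the corridor environment, rewards, and learning constants are all illustrative assumptions, not from the card):

```python
# A robot in a 5-cell corridor learns from the outcome of each decision:
# moving into the wall (left at cell 0) is penalized, reaching the goal
# (cell 4) is rewarded.
import random

n_states, actions = 5, [0, 1]        # action 0 = left, 1 = right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1

for episode in range(500):
    s = 0
    while s != n_states - 1:         # episode ends at the goal cell
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda act: Q[s][act])
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = -1.0 if (a == 0 and s == 0) else (1.0 if s_next == n_states - 1 else 0.0)
        # Learn from the outcome of the decision (the Q-learning update).
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

print(Q[0])  # after training, "right" scores much higher than "left"
```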

What are the most common types of algorithms that use supervised learning?

The most common uses of supervised learning are regression and classification.

What are the most common types of algorithms that use unsupervised learning?

The most common uses of unsupervised machine learning are clustering, dimensionality reduction, and association-rule mining.

What is training data and what is it used for?

Training data is a set of examples that will be used to train the machine learning model. For supervised machine learning, this training data must have a labeled target, i.e. what you are trying to predict must be defined. For unsupervised machine learning, the training data will contain only features, with no labeled targets, i.e. what you are trying to predict is not defined.

What is the simplest way to describe unsupervised learning vs. supervised learning?

From Udacity's Introduction to Machine Learning course: supervised learning = approximation; unsupervised learning = description.

Explain how a ROC curve works.

https://en.wikipedia.org/wiki/Receiver_operating_characteristic The ROC curve is a graphical representation of the contrast between the true positive rate and the false positive rate at various thresholds. It's often used as a proxy for the trade-off between the sensitivity of the model (true positives) and the fall-out, or the probability that it will trigger a false alarm (false positives).
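A minimal sketch of how the curve is built: sweep a threshold over the model's scores and record the true positive rate and false positive rate at each one (the scores and labels below are made-up examples):

```python
import numpy as np

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.5])

# Each threshold yields one (FPR, TPR) point on the ROC curve.
for threshold in np.unique(scores)[::-1]:
    y_pred = (scores >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tpr = tp / np.sum(y_true == 1)   # sensitivity
    fpr = fp / np.sum(y_true == 0)   # fall-out / false-alarm rate
    print(f"threshold={threshold:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```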

What's a feature vector?

A feature vector is an n-dimensional vector that contains the essential information describing the characteristics of an object. For example, it can be an object's numerical features or a list of numbers taken from the output of a neural network layer. In AI and data science, feature vectors are used to represent the numeric or symbolic characteristics of an object in mathematical terms so they can be analyzed easily.

Let's break this down. A data set is usually organized into multiple examples, where each example has several features. A feature vector does not span multiple examples: each example corresponds to one feature vector, which contains all the numerical values for that example object, like [1, 2, 3, 5, 6, 3, 2, 0].

Feature vectors are often stacked into a design matrix. In this arrangement, each row is the feature vector for one example, and each column holds the values of one particular feature across all the examples. https://becominghuman.ai/extract-a-feature-vector-for-any-image-with-pytorch-9717561d1d4c
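A minimal sketch of feature vectors stacked into a design matrix (the values and feature names are made up for illustration):

```python
import numpy as np

# One feature vector per example (e.g. height, weight, age, score).
example_1 = np.array([1.0, 2.0, 3.0, 5.0])
example_2 = np.array([6.0, 3.0, 2.0, 0.0])
example_3 = np.array([4.0, 1.0, 7.0, 2.0])

# Stack the vectors: each row is an example, each column is a feature.
X = np.vstack([example_1, example_2, example_3])
print(X.shape)   # (3, 4): 3 examples, 4 features
print(X[0])      # the feature vector for the first example
print(X[:, 0])   # the first feature across all examples
```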

What is the difference between a model parameter and a learning hyperparameter?

A model parameter describes the final model itself, e.g. slope in a linear model. A learning hyperparameter describes the way in which a model parameter is learned, e.g. learning rate, penalty terms, number of features to include in a weak predictor.
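A minimal sketch of the distinction, assuming scikit-learn's Ridge regression (the tiny dataset is made up):

```python
import numpy as np
from sklearn.linear_model import Ridge

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 3.9, 6.2, 8.1])

# alpha is a hyperparameter: it controls the penalty term and is chosen
# before learning, not learned from the data.
model = Ridge(alpha=0.5)
model.fit(X, y)

# coef_ and intercept_ are model parameters: they describe the final
# model itself and are learned during fitting.
print(model.coef_)       # the learned slope
print(model.intercept_)  # the learned intercept
```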

What is a test set and why use one?

A test set is a set of data not used during training or validation. The model's performance is evaluated on the test set to predict how well it will generalize to new data.

What is a validation set and why use one?

A validation set is a set of data that is used to evaluate a model's performance during training/model selection. After models are trained, they are evaluated on the validation set to select the best possible model. It must never be used directly for training the model. It must also not be used as the test data set, because we've biased our model selection toward working well on this data, even though the model was not directly trained on it.

What is Bayes' Theorem? How is it useful in a machine learning context?

https://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/ Bayes' Theorem gives you the posterior probability of an event given what is known as prior knowledge. Mathematically, it's expressed as the true positive rate of a condition sample divided by the sum of the true positive rate of the condition sample and the false positive rate of the population.

Say a flu test detects the flu 60% of the time when a person actually has it (the true positive rate), comes back falsely positive 50% of the time for people who don't have it (the false positive rate), and only 5% of the overall population has the flu. Would you actually have a 60% chance of having the flu after a positive test? Bayes' Theorem says no. It says that you have a (0.6 × 0.05) / ((0.6 × 0.05) + (0.5 × 0.95)) = 0.0594, or 5.94%, chance of having the flu.

Bayes' Theorem is the basis behind a branch of machine learning that most notably includes the Naive Bayes classifier. That's something important to consider when you're faced with machine learning interview questions.
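The flu arithmetic as a small Python sketch (the function name and argument names are just for illustration):

```python
def posterior(prior, true_pos_rate, false_pos_rate):
    """P(condition | positive test) via Bayes' Theorem."""
    evidence = true_pos_rate * prior + false_pos_rate * (1 - prior)
    return true_pos_rate * prior / evidence

# 5% prevalence, 60% true positive rate, 50% false positive rate.
print(posterior(prior=0.05, true_pos_rate=0.6, false_pos_rate=0.5))
# ~0.0594: only about a 5.94% chance of flu after a positive test
```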

What's the trade-off between bias and variance?

https://en.wikipedia.org/wiki/Bias-variance_tradeoff Bias is error due to erroneous or overly simplistic assumptions in the learning algorithm you're using. This can lead to the model underfitting your data, making it hard for it to have high predictive accuracy and for you to generalize your knowledge from the training set to the test set.

Variance is error due to too much complexity in the learning algorithm you're using. This leads to the algorithm being highly sensitive to high degrees of variation in your training data, which can lead your model to overfit the data. You'll be carrying too much noise from your training data for your model to be very useful for your test data.

The bias-variance decomposition essentially decomposes the expected error of any algorithm into the sum of the squared bias, the variance, and a bit of irreducible error due to noise in the underlying dataset. Essentially, if you make the model more complex and add more variables, you'll reduce bias but gain some variance; in order to get the optimally reduced amount of error, you'll have to trade off bias and variance. You don't want either high bias or high variance in your model.
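Written out for squared error at a fixed input x (the standard decomposition, stated here for reference):

```latex
% Expected squared error at x, where y = f(x) + noise with variance sigma^2
% and \hat{f} is the learned model:
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \left(\mathrm{Bias}\!\left[\hat{f}(x)\right]\right)^2
  + \mathrm{Var}\!\left[\hat{f}(x)\right]
  + \sigma^2
```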

Define precision and recall.

https://en.wikipedia.org/wiki/Precision_and_recall Recall is also known as the true positive rate: the number of positives your model claims compared to the actual number of positives there are throughout the data. Precision is also known as the positive predictive value: a measure of how many of the positives your model claims are actually correct.

It can be easier to think of recall and precision with an example: suppose you labeled 15 pieces of fruit as apples in a basket that actually contains 10 apples and 5 oranges. You'd have perfect recall (there really are 10 apples, and you labeled all 10 as apples) but only 66.7% precision, because of the 15 items you labeled as apples, only 10 (the actual apples) are correct.
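The apples example as a small sketch, computing both metrics from true positives (TP), false positives (FP), and false negatives (FN):

```python
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)  # of everything claimed positive, how much is right
    recall = tp / (tp + fn)     # of everything actually positive, how much was found
    return precision, recall

# 10 real apples correctly labeled, 5 oranges wrongly labeled as apples,
# no apples missed.
p, r = precision_recall(tp=10, fp=5, fn=0)
print(f"precision={p:.3f}, recall={r:.3f}")  # precision=0.667, recall=1.000
```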

When should you use classification over regression?

https://math.stackexchange.com/questions/141381/regression-vs-classification Classification produces discrete values and maps your dataset into strict categories, while regression gives you continuous results that allow you to better distinguish differences between individual points. You would use classification over regression if you wanted your results to reflect the belongingness of data points in your dataset to certain explicit categories (e.g., if you wanted to know whether a name was male or female rather than just how correlated it was with male and female names).

How do you ensure you're not overfitting with a model?

https://www.quora.com/How-can-I-avoid-overfitting This is a simple restatement of a fundamental problem in machine learning: the possibility of overfitting training data and carrying the noise of that data through to the test set, thereby providing inaccurate generalizations. There are three main methods to avoid overfitting:

1. Keep the model simpler: reduce variance by taking into account fewer variables and parameters, thereby removing some of the noise in the training data.

2. Use cross-validation techniques such as k-fold cross-validation.

3. Use regularization techniques such as LASSO that penalize certain model parameters if they're likely to cause overfitting (see the sketch after this list).
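A minimal sketch of point 3, assuming scikit-learn's Lasso (the synthetic data is made up; only one of the ten features actually matters):

```python
# LASSO's L1 penalty shrinks some coefficients to exactly zero,
# pruning noisy features and reducing overfitting.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))            # 10 features, most irrelevant
y = 3.0 * X[:, 0] + rng.normal(size=100)  # only feature 0 matters

model = Lasso(alpha=0.1)  # regularization strength (a hyperparameter)
model.fit(X, y)
print(model.coef_)  # coefficients on irrelevant features driven to ~0
```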

What is the difference between supervised and unsupervised machine learning?

https://www.quora.com/What-is-the-difference-between-supervised-and-unsupervised-learning-algorithms Supervised learning requires labeled training data. For example, in order to do classification (a supervised learning task), you'll need to first label the data you'll use to train the model to classify data into your labeled groups. Unsupervised learning, in contrast, does not require labeling data explicitly.

Why is "Naive" Bayes naive?

https://www.quora.com/Why-is-naive-Bayes-naive?share=1 Despite its practical applications, especially in text mining, Naive Bayes is considered "naive" because it makes an assumption that is virtually impossible to see in real-life data: the conditional probability is calculated as the pure product of the individual probabilities of the components. This implies the absolute independence of features, a condition probably never met in real life. As a Quora commenter put it whimsically, a Naive Bayes classifier that figured out that you liked pickles and ice cream would probably naively recommend you a pickle ice cream.
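The "naive" independence assumption written out (the standard form of the model, stated here for reference):

```latex
% Naive Bayes assumes the features x_1, ..., x_n are conditionally
% independent given the class y, so the joint likelihood factors
% into a pure product of per-feature probabilities:
P(x_1, \ldots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)
```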

