Machine Learning Terms

Induction

A bottom-up approach to answering questions or solving problems. A logic technique that goes from observations to theory. E.g. We keep observing X, so we infer that Y must be true.

Observation

A data point, row, or sample in a dataset. Another term for instance.

Instance

A data point, row, or sample in a dataset. Another term for observation.

Model

A data structure that stores a learned representation of a dataset (e.g. weights and biases). It is created/learned when you train an algorithm on a dataset.

Feature Vector

A list of features describing an observation with multiple attributes. In Excel we call this a row.

Transfer Learning

A machine learning method where a model developed for one task is reused as the starting point for a model on a second task. In this style of learning, we take the pre-trained weights of an already trained model (e.g. one trained on millions of images belonging to thousands of classes, on several high-powered GPUs for several days) and use these already-learned features to predict new classes.

Algorithm

A method, function, or series of instructions used to generate a machine learning model. Examples include linear regression, decision trees, support vector machines, and neural networks.

Universal Approximation Theorem

A neural network with a single hidden layer (given enough hidden neurons) can approximate any continuous function, but only for inputs in a specific, bounded range. If you train a network on inputs between -2 and 2, it will work well for inputs in that same range, but you can't expect it to generalize to other inputs without retraining the model or adding more hidden neurons.

Receiver Operating Characteristic Curve

A plot of the true positive rate against the false positive rate at all classification thresholds. This is used to evaluate the performance of a classification model at different classification thresholds. The area under the ROC curve (AUC) can be interpreted as the probability that the model ranks a randomly chosen positive observation (e.g. "spam") higher than a randomly chosen negative observation (e.g. "not spam").
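
A minimal sketch of how the ROC points and the area under the curve could be computed, assuming binary labels encoded as 1/0 and scores where higher means "more likely positive". The function names and sample data are hypothetical. The AUC is computed two ways: with the trapezoidal rule over the ROC points, and as the fraction of (positive, negative) pairs the model ranks correctly, which is the probabilistic interpretation above.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, sweeping the threshold from strictest to loosest."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        # Predict positive whenever the score is at or above threshold t.
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_pairwise(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 1, 0, 0]  # 1 = "spam", 0 = "not spam"
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc_trapezoid(roc_points(labels, scores)), 3))  # 0.889
print(round(auc_pairwise(labels, scores), 3))               # 0.889
```

Both views agree: 8 of the 9 (spam, not-spam) pairs are ranked correctly.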

Attribute

A quality describing an observation (e.g. color, size, weight). In Excel terms, these are column headers.

Test Set

A set of observations used at the end of model training and validation to assess the predictive power of your model. How generalizable is your model to unseen data?

Validation Set

A set of observations used during model training to provide feedback on how well the current parameters generalize beyond the training set. If training error decreases but validation error increases, your model is likely overfitting, and you should consider stopping training.

Training Set

A set of observations used to generate machine learning models.
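
The training, validation, and test sets are usually carved out of one dataset. A minimal sketch, assuming a simple random split; the 70/15/15 proportions, the seed, and the function name are illustrative assumptions, not a fixed rule.

```python
import random


def train_val_test_split(observations, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, carve off test and validation sets, and keep the rest for training."""
    items = list(observations)
    random.Random(seed).shuffle(items)  # fixed seed so the split is reproducible
    n_test = round(len(items) * test_frac)
    n_val = round(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

The three sets never share an observation, so the test set stays unseen until the final evaluation.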

Convergence

A state reached during the training of a model when the loss changes very little between each iteration.

Deduction

A top-down approach to answering questions or solving problems. A logic technique that starts with a theory and tests that theory with observations to derive a conclusion. E.g. We suspect X, but we need to test our hypothesis before coming to any conclusions.

Bias term

Allows models to represent patterns that do not pass through the origin. For example, if all my features were 0, would my output also be zero? Is it possible there is some base value upon which my features have an effect? These typically accompany weights and are attached to neurons or filters.
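
A minimal sketch of a linear model with and without a bias term; the function name and weights are hypothetical. Without a bias, an all-zero input must produce 0; the bias supplies a base value that the features then shift.

```python
def predict(features, weights, bias=0.0):
    """Linear model: weighted sum of the features plus a bias (intercept) term."""
    return sum(w * x for w, x in zip(weights, features)) + bias


weights = [2.0, 0.5]
print(predict([0.0, 0.0], weights))            # 0.0: forced through the origin
print(predict([0.0, 0.0], weights, bias=3.0))  # 3.0: base value when all features are 0
print(predict([1.0, 2.0], weights, bias=3.0))  # 2.0 + 1.0 + 3.0 = 6.0
```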

Recall

Also called sensitivity. In the context of binary classification (Yes/No), this measures how "sensitive" the classifier is at detecting positive instances. In other words, of all the actual positive observations in our sample, how many did we "catch"? We could game this metric by always classifying observations as positive. $R = \frac{\text{TruePositives}}{\text{TruePositives} + \text{FalseNegatives}}$

Outlier

An observation that deviates significantly from other observations in the dataset.

True Positive Rate

Another term for recall, i.e. $TPR = \frac{\text{TruePositives}}{\text{TruePositives} + \text{FalseNegatives}}$. This rate forms the y-axis of the ROC curve.

Noise

Any irrelevant information or randomness in a dataset which obscures the underlying pattern.

Null Accuracy

Baseline accuracy that can be achieved by always predicting the most frequent class ("B has the highest frequency, so let's guess B every time").
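
A minimal sketch of computing null accuracy from a list of labels; the labels are made up for illustration. A useful classifier should beat this number.

```python
from collections import Counter


def null_accuracy(labels):
    """Accuracy of always predicting the most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


labels = ["A", "B", "B", "C", "B", "A", "B", "C", "B", "A"]
print(null_accuracy(labels))  # B appears 5 of 10 times -> 0.5
```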

Type 2 Error

False Negatives. The candidate was great but the company passed on him.

Type 1 Error

False Positives. Consider a company optimizing hiring practices to reduce false positives in job offers. This error occurs when a candidate seems good and the company hires him, but he is actually bad.

Segmentation

Grouping items by like characteristics.

Variance

How tightly packed are your predictions for a particular observation relative to each other?

Specificity

In the context of binary classification (Yes/No), this measures the model's performance at classifying negative observations (i.e. "No"). In other words, when the correct label is negative, how often is the prediction correct? We could game this metric if we predict everything as negative. $S = \frac{\text{TrueNegatives}}{\text{TrueNegatives} + \text{FalsePositives}}$

Precision

In the context of binary classification (Yes/No), this measures the model's performance at classifying positive observations (i.e. "Yes"). In other words, when a positive value is predicted, how often is the prediction correct? We could game this metric by only returning positive for the single observation we are most confident in. $P = \frac{\text{TruePositives}}{\text{TruePositives} + \text{FalsePositives}}$

Extrapolation

Making predictions outside the range of a dataset. E.g. My dog barks, so all dogs must bark. In machine learning, models often perform poorly when asked to predict outside the range of their training data.

Machine Learning

Mitchell (1997) provides a succinct definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." In simple language, this is a field in which algorithms designed by humans learn from data by themselves and make predictions on unseen data.

Accuracy

Percentage of correct predictions made by the model.

Classification

Predicting a categorical output.

Regression

Predicting a continuous output (e.g. price, sales).

Normalization

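Rescaling values (typically input features) to a common range, such as [0, 1], which can improve computation speed during training. The term is also used loosely for restricting the values of weights in regression to avoid overfitting, though that is more commonly called regularization.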

Recall vs Precision

Say we are analyzing brain scans and trying to predict whether a person has a tumor (True) or not (False). We feed the scans into our model, and our model starts guessing. Precision is the % of True guesses that were actually correct! Suppose there are 100 images, 10 of which show tumors. If we guess that 1 image is True and that image is actually True, then our precision is 100%! Our results aren't helpful, however, because we missed 9 of the 10 brain tumors! We were super precise when we tried, but we didn't try hard enough. Recall, or sensitivity, provides another lens through which to view how good our model is. With the same 100 images, 10 with brain tumors, and 1 correct tumor guess, precision is 100%, but recall is 10%. Perfect recall requires that we catch all 10 tumors!
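
The brain-scan arithmetic can be checked directly. A minimal sketch, assuming the model flags exactly one scan and that scan really has a tumor:

```python
# 100 scans: 10 have tumors (True), 90 do not (False).
actual = [True] * 10 + [False] * 90
# The model flags only the first scan, which really does have a tumor.
predicted = [True] + [False] * 99

tp = sum(a and p for a, p in zip(actual, predicted))        # caught tumors: 1
fp = sum((not a) and p for a, p in zip(actual, predicted))  # false alarms: 0
fn = sum(a and (not p) for a, p in zip(actual, predicted))  # missed tumors: 9

precision = tp / (tp + fp)  # 1 / 1  = 1.0
recall = tp / (tp + fn)     # 1 / 10 = 0.1
print(f"precision={precision:.0%} recall={recall:.0%}")  # precision=100% recall=10%
```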

Confusion Matrix

Table that describes the performance of a classification model by grouping predictions into 4 categories. For example, when predicting whether patients have diabetes: True Positives: we correctly predicted they do have diabetes; True Negatives: we correctly predicted they don't have diabetes; False Positives: we incorrectly predicted they do have diabetes (Type I error); False Negatives: we incorrectly predicted they don't have diabetes (Type II error).
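
A minimal sketch that tallies the four cells of the confusion matrix from actual and predicted labels, then derives accuracy, precision, recall, specificity, and false positive rate from those counts using the formulas given for each term. The function names and the ten sample patients are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count (TP, TN, FP, FN) for binary labels, where True = has diabetes."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1
        elif not a and not p:
            tn += 1
        elif p:  # predicted True but actually False: Type I error
            fp += 1
        else:    # predicted False but actually True: Type II error
            fn += 1
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Common metrics computed from the four confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),       # also sensitivity / true positive rate
        "specificity": tn / (tn + fp),
        "fpr": fp / (fp + tn),          # equals 1 - specificity
    }


actual = [True] * 4 + [False] * 6
predicted = [True, True, False, False, True, False, False, False, False, False]
counts = confusion_counts(actual, predicted)
print(counts)  # (2, 5, 1, 2)
print(metrics(*counts))
```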

Label

The "answer" portion of an observation in supervised learning. For example, in a dataset used to classify flowers into different species, the features might include the petal length and petal width, while this would be the flower's species.

Categorical Variables

Variables with a discrete set of possible values. Can be ordinal (order matters) or nominal (order doesn't matter).

Continuous Variables

Variables with a range of possible values defined by a number scale (e.g. sales, lifespan).

Bias metric

What is the average difference between your predictions and the correct value for that observation?

Feature

With respect to a dataset, this term represents an attribute and value combination. Color is an attribute. "Color is blue" is this term. In Excel terms, these are similar to cells. This term has other definitions in different contexts.

Dimension

This term means something different in machine learning and data science than in physics: with data, it refers to how many features your dataset has. E.g. in an object detection application, a flattened 28 × 28 image with 3 color channels has 28 × 28 × 3 = 2,352 features in the input set. In house price prediction, if house size is the only feature in the dataset, we would call it one-_____ data.

False Positive Rate

This term forms the x-axis of the ROC curve. Defined as: $FPR = 1 - \text{Specificity} = \frac{\text{FalsePositives}}{\text{FalsePositives} + \text{TrueNegatives}}$

High Variance

This variance (with low bias) suggests your model may be overfitting and reading too deeply into the noise found in every training set.

Low Variance

This variance suggests your model is internally consistent, with predictions varying little from each other (e.g. across models trained on different samples of the data).

Unsupervised Learning

Training a model to find patterns in an unlabeled dataset (e.g. clustering).

Reinforcement Learning

Training a model to maximize a reward via iterative trial and error.

Supervised Learning

Training a model using a labeled dataset.

Clustering

Unsupervised grouping of data into buckets.

Classification Threshold

The lowest probability value at which we're comfortable asserting a positive classification. For example, if the predicted probability of being diabetic is > 50%, return True, otherwise return False.
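
A minimal sketch of applying a classification threshold to predicted probabilities; the probabilities are made up. Lowering the threshold labels more observations as positive, which tends to raise recall at the cost of precision.

```python
def classify(probabilities, threshold=0.5):
    """Return True (positive) when the predicted probability exceeds the threshold."""
    return [p > threshold for p in probabilities]


probs = [0.10, 0.45, 0.50, 0.62, 0.90]
print(classify(probs))                 # [False, False, False, True, True]
print(classify(probs, threshold=0.3))  # [False, True, True, True, True]
```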

Learning Rate

The size of the update steps taken during optimization loops like Gradient Descent. When this is high, we cover more ground with each step, but we risk overshooting the lowest point, since the slope of the hill is constantly changing. When this is very low, we can confidently move in the direction of the negative gradient, since we recalculate it so frequently. A low value makes the process more precise, but calculating the gradient is time-consuming, so it will take us a very long time to reach the bottom.
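
A minimal sketch of gradient descent on a hypothetical one-parameter loss f(w) = (w − 3)², whose minimum is at w = 3. The starting point, step budget, and tolerance are illustrative assumptions; "converged" here means the update became smaller than the tolerance, in the spirit of the Convergence entry.

```python
def gradient_descent(learning_rate, w=0.0, steps=100, tol=1e-6):
    """Minimize f(w) = (w - 3)**2; its gradient is 2 * (w - 3)."""
    for step in range(1, steps + 1):
        grad = 2 * (w - 3)
        new_w = w - learning_rate * grad  # step against the gradient
        if abs(new_w - w) < tol:          # tiny update: treat as converged
            return new_w, step
        w = new_w
    return w, steps  # step budget exhausted without converging


for lr in (0.01, 0.1, 1.1):
    w, steps = gradient_descent(lr)
    print(f"lr={lr}: w={w:.4f} after {steps} steps")
```

With 0.01 the steps are too small to reach w = 3 within the budget, 0.1 converges, and 1.1 overshoots the minimum by more each step and diverges.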

Hyperparameters

These are higher-level properties of a model, such as how fast it can learn (learning rate) or the complexity of a model. The depth of trees in a Decision Tree or the number of hidden layers in a Neural Network are examples of these.

Neural Networks

These are mathematical algorithms modeled after the brain's architecture, designed to recognize patterns and relationships in data.

Parameters

These are properties of training data learned by training a machine learning model or classifier. They are adjusted using optimization algorithms and unique to each experiment. Examples include: weights in an artificial neural network; support vectors in a support vector machine; coefficients in a linear or logistic regression

Loss

This measures the error between the true value (from the dataset) and the predicted value (from the ML model), e.g. the squared difference (true value − predicted value)². The lower this is, the better the model (unless the model has overfitted to the training data). It is calculated on the training and validation sets, and it indicates how well the model is doing on each of them. Unlike accuracy, this is not a percentage: it is a sum (or average) of the errors made on each example in the training or validation set.
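
A minimal sketch of a loss, assuming squared error as the per-example measure (one common choice among many). Squaring keeps positive and negative errors from cancelling out when they are summed.

```python
def squared_error_loss(y_true, y_pred, reduction="sum"):
    """Sum (or mean) of squared differences between true and predicted values."""
    errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    total = sum(errors)
    return total / len(errors) if reduction == "mean" else total


y_true = [3.0, 5.0, 2.0]
y_pred = [2.5, 5.0, 4.0]
print(squared_error_loss(y_true, y_pred))          # 0.25 + 0.0 + 4.0 = 4.25
print(squared_error_loss(y_true, y_pred, "mean"))  # 4.25 / 3, about 1.4167
```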

High Bias

This bias (with low variance) suggests your model may be underfitting and you're using the wrong architecture for the job.

Low Bias

This bias could mean every prediction is correct. It could also mean half of your predictions are above their actual values and half are below, in equal proportion, resulting in low average difference.
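
A minimal sketch of the bias and variance metrics for a single observation, assuming we have several predictions for it (e.g. from hypothetical models trained on different samples of the data). Bias is the signed average difference from the true value, which is why predictions split evenly above and below the truth give zero bias.

```python
from statistics import mean, pvariance


def bias_and_variance(predictions, true_value):
    """Bias: mean prediction minus the true value. Variance: spread of predictions around their mean."""
    return mean(predictions) - true_value, pvariance(predictions)


true_value = 10.0
balanced = [8.0, 12.0, 7.0, 13.0]  # half above, half below the truth
shifted = [12.0, 12.5, 11.5, 12.0]  # tightly packed, but consistently too high
print(bias_and_variance(balanced, true_value))  # (0.0, 6.5): low bias, high variance
print(bias_and_variance(shifted, true_value))   # (2.0, 0.125): high bias, low variance
```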

Multi-Class Classification

This classification predicts one of multiple possible outcomes (e.g. is this a photo of a cat, dog, horse or human?)

Binary Classification

This classification predicts one of two possible outcomes (e.g. is the email spam or not spam?)

Regularization

This is a technique used to combat the overfitting problem. It works by adding a complexity term to the loss function that gives a bigger loss for more complex models.
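
A minimal sketch, assuming an L2 penalty (the sum of squared weights, as in ridge regression) as the complexity term and a hypothetical strength `lam`. The two features are identical copies, so two very different weight vectors fit the data perfectly; the penalty is what makes the loss prefer the simpler one.

```python
def mse(weights, bias, X, y):
    """Mean squared error of a linear model on rows X with targets y."""
    preds = [sum(w * x for w, x in zip(weights, row)) + bias for row in X]
    return sum((t - p) ** 2 for t, p in zip(y, preds)) / len(y)


def l2_regularized_loss(weights, bias, X, y, lam=0.1):
    """Data loss plus an L2 complexity term; larger weights mean a bigger penalty."""
    penalty = lam * sum(w ** 2 for w in weights)  # bias left unpenalized here
    return mse(weights, bias, X, y) + penalty


X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
y = [2.0, 4.0, 6.0]
simple, complex_ = [1.0, 1.0], [5.0, -3.0]
print(mse(simple, 0.0, X, y), mse(complex_, 0.0, X, y))  # 0.0 0.0
print(f"{l2_regularized_loss(simple, 0.0, X, y):.2f}")    # 0.20
print(f"{l2_regularized_loss(complex_, 0.0, X, y):.2f}")  # 3.40
```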

Deep Learning

This is derived from a machine learning algorithm called the perceptron, or multi-layer perceptron, and it is gaining more and more attention because of its success in fields ranging from computer vision to signal processing and from medical diagnosis to self-driving cars. Like other AI algorithms, deep learning has existed for decades, but today we have far more data and cheap computing power, which make this approach powerful enough to achieve state-of-the-art accuracy. In the modern world, this is known as an artificial neural network, although it is much more than a traditional artificial neural network; it was nonetheless heavily influenced by machine learning's neural networks and perceptron networks.

Feature Selection

This is the process of selecting relevant features from a data-set for creating a Machine Learning model.

Overfitting

This occurs when your model learns the training data too well and incorporates details and noise specific to your dataset. You can tell a model is doing this when it performs great on your training/validation set, but poorly on your test set (or new real-world data).

Underfitting

This occurs when your model over-generalizes and fails to incorporate relevant variations in your data that would give your model more predictive power. You can tell a model is ____________ when it performs poorly on both training and test sets.

Epoch

This term describes one complete pass of the algorithm through the entire dataset; the number of these is the number of times the algorithm sees the entire dataset.
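
A minimal sketch of epochs, assuming mini-batch training (which the definition doesn't require): one epoch is finished once every example has appeared in some batch. The batch size, data, and function names are illustrative.

```python
import math


def iterations_per_epoch(n_examples, batch_size):
    """Mini-batch updates needed for the algorithm to see every example once."""
    return math.ceil(n_examples / batch_size)


def batches(data, batch_size):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


data = list(range(10))
for epoch in range(2):  # 2 epochs = 2 full passes over the data
    for batch in batches(data, batch_size=4):
        print(epoch, batch)  # a training step on this batch would go here
print(iterations_per_epoch(len(data), 4))  # 3
```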

