Machine Learning
clustering algorithm (intro)
type of unsupervised learning algorithm that groups data together based on similarities
vectors (conventions)
Data science convention for lowercase bold font
model-based-learning
ML system: builds a model from the training examples, then uses that model to make predictions
RMSE term for the system's prediction function, also called a hypothesis
h in RMSE
features
supervised learning term for predictors
batch learning
ML system: the machine is incapable of learning incrementally; it must be trained using all the available data. First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned.
instance-based learning
ML system: the system learns the examples by heart, then generalizes to new cases using a similarity measure
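A minimal sketch of instance-based learning, assuming a 1-nearest-neighbor rule with Euclidean distance as the similarity measure. The function names and the toy car data are hypothetical and chosen only for illustration.

```python
import math

def euclidean_distance(a, b):
    """Similarity measure: a smaller distance means more similar."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def predict_nearest(train_points, train_labels, new_point):
    """Return the label of the stored example closest to new_point."""
    distances = [euclidean_distance(p, new_point) for p in train_points]
    nearest_index = distances.index(min(distances))
    return train_labels[nearest_index]

# Hypothetical toy data: (mileage in thousands, age in years) -> price band.
# "Learning by heart" here simply means keeping these examples in memory.
train_points = [(15, 1), (30, 3), (120, 10), (150, 12)]
train_labels = ["high", "high", "low", "low"]

print(predict_nearest(train_points, train_labels, (25, 2)))    # high
print(predict_nearest(train_points, train_labels, (130, 11)))  # low
```

No model parameters are fitted; every prediction compares the new case against the memorized examples.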
unsupervised learning
ML system: system tries to learn without labeled training data
online learning
ML system: train the system incrementally by feeding it data instances sequentially, either individually or by small groups called mini-batches. Each learning step is fast and cheap.
semisupervised learning
ML system: training data set with a lot of unlabeled data and a little bit of labeled data. Usually a combination of unsupervised and supervised learning algorithms
supervised learning
ML system: training data you feed to the algorithm includes the desired solutions
Mean Absolute Error (aka Average Absolute Deviation)
Machine learning performance measure substitute for RMSE when data sets have a lot of outliers. Measures distance between prediction and target value vectors
Root mean square error
Machine learning performance measure that typically is used for regression which gives an idea of how much error the system typically makes in its predictions, with higher weight for large errors. Measures distance between prediction and target value vectors
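The difference between the two measures can be sketched in a few lines of plain Python. The data below is made up: both prediction sets have the same total absolute error, but in the second one it is concentrated in a single large miss.

```python
import math

def rmse(predictions, targets):
    """Root mean square error: squares each error, so large errors weigh more."""
    m = len(targets)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / m)

def mae(predictions, targets):
    """Mean absolute error: every unit of error weighs the same."""
    m = len(targets)
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / m

targets = [10.0, 10.0, 10.0, 10.0]
small_errors = [11.0, 9.0, 11.0, 9.0]   # every prediction is off by 1
one_outlier = [10.0, 10.0, 10.0, 14.0]  # a single prediction is off by 4

print(mae(small_errors, targets), rmse(small_errors, targets))  # 1.0 1.0
print(mae(one_outlier, targets), rmse(one_outlier, targets))    # 1.0 2.0
```

MAE is identical in both cases, while RMSE doubles when the error sits in one outlier.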
no free lunch theorem (NFL)
Machine learning theorem demonstrated by David Wolpert: if you make absolutely no assumptions about the data, then there is no reason to prefer one model over any other.
machine learning (engineering definition)
Machine learning definition: a computer program is said to learn from experience E, with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E
hyperparameter
Machine learning term for a parameter of the learning algorithm and not the model itself
signal
Machine learning term for a piece of information fed to a machine learning system
data pipeline
Machine learning term for a sequence of data processing components
data mining
Machine learning term for applying ML techniques to dig into large amounts of data in order to discover patterns that were not immediately apparent
Data snooping
Machine learning term for looking at the test set, recognizing a pattern and choosing a model based on that
training instance
Machine learning term for one training example (or sample)
training data
Machine learning term for the data that the machine experiences to learn
training set
Machine learning term for the examples that a system uses to learn
feature engineering
Machine learning term for the process of coming up with a good set of features to train on; it involves feature selection, feature extraction, and creating new features by gathering new data
machine learning
Machine learning term for the science (and art) of programming computers so they can learn from data
overfitting
Machine learning term for when a model performs well on the training data set, but does not generalize well. Usually happens when the model is too complex relative to the amount and noisiness of the training data.
underfitting
Machine learning term for when your model is too simple to learn the underlying structure of the data (e.g. a linear model of life satisfaction underfits because reality is more complex than the model)
generalization error (aka out-of-sample error)
Machine learning term in testing for the error rate on new cases; it is estimated by evaluating the model on the test set, which is held out from training
supervised, unsupervised, semisupervised, or reinforcement learning
Machine learning terms for types of ML systems, categorized by whether or not they are trained with human supervision
instance-based or model-based learning
Machine learning terms for types of ML systems that either compare new data points to known data points or detect patterns in the training data and build a predictive model
online or batch learning
Machine learning terms for types of ML systems, categorized by whether or not they can learn incrementally on the fly
RMSE term for the matrix containing all feature values (excluding labels) of all instances in the dataset
X in RMSE
RMSE term for the number of instances in the dataset you are measuring the RMSE on (e.g. 2000 districts; m = 2000)
m in RMSE
cost function
model-based-learning term for type of performance measure that measures how bad the model is. (e.g. measuring distance between linear model predictions and training examples, try to minimize distance)
utility function (aka fitness function)
model-based-learning term for type of performance measure that measures how good the model is.
inference
model-based-learning term for when you apply the model to make predictions on new cases.
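The model-based workflow (select a model, measure it with a cost function, train, then run inference) can be sketched as below. This assumes a one-feature linear model and mean squared error as the cost function; the function names and data are hypothetical, and the closed-form least-squares fit is just one way to minimize that cost.

```python
def cost(model, xs, ys):
    """Cost function: mean squared distance between predictions and examples."""
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def predict(model, x):
    """Inference: apply the trained model to a case."""
    intercept, slope = model
    return intercept + slope * x

def fit_linear_model(xs, ys):
    """Training: pick the intercept and slope that minimize the cost."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up training examples that follow y = 1 + 2x exactly
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

model = fit_linear_model(xs, ys)
print(model)                # (1.0, 2.0)
print(cost(model, xs, ys))  # 0.0
print(predict(model, 5.0))  # 11.0, a prediction for a new case
```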
learning rate
online learning term for the parameter that controls how fast the system adapts to changing data
mini-batches
online learning term for the small groups of data instances fed to the system at each learning step
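A rough sketch of online learning with mini-batches and a learning rate, assuming a linear model y = w * x + b updated by gradient descent on squared error. The helper names, the synthetic stream, and the learning rate of 0.05 are all assumptions made for the example.

```python
def sgd_step(weights, batch, learning_rate):
    """One fast, cheap learning step on a single mini-batch."""
    w, b = weights
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    # The learning rate scales how far each step moves the parameters.
    return (w - learning_rate * grad_w, b - learning_rate * grad_b)

def mini_batches(stream, batch_size):
    """Group a sequence of instances into small batches, in arrival order."""
    batch = []
    for instance in stream:
        batch.append(instance)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# Hypothetical data stream following y = 2x + 1, with x cycling over 0..4
stream = [(float(i % 5), 2.0 * (i % 5) + 1.0) for i in range(2000)]

weights = (0.0, 0.0)
for batch in mini_batches(stream, batch_size=5):
    weights = sgd_step(weights, batch, learning_rate=0.05)

print(weights)  # close to (2.0, 1.0)
```

The system never holds more than one mini-batch at a time. A higher learning rate adapts faster but forgets old data sooner, and a lower one adapts more slowly.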
feature extraction (w/ feature engineering)
part of feature engineering where you combine existing features to produce a more useful one
feature selection (w/ feature engineering)
part of feature engineering where you select the most useful features to train on among existing features
policy
reinforcement learning term for the best strategy the system learns in order to get the most reward over time; it defines which action the agent should choose in a given situation.
agent
reinforcement learning term for the learning system.
model selection
step in model-based-learning process where you select the type of model for a system to generalize with (e.g. linear model)
regression
supervised learning term for a task to predict a target numeric value given a set of predictors (features) (e.g. value of a car given mileage, age, brand etc. and their values)
classification
supervised learning term for a task where the system must separate data into classes based on examples of those classes
predictors
supervised learning term for an attribute plus its value (e.g. Mileage = 15K)
labels
supervised learning term for the desired solutions fed into the training data (e.g. price of a car)
target
supervised learning term for the goal value you are trying to predict
attribute
supervised learning term for the part of the predictor that names the data type, as opposed to its value (e.g. Mileage)
offline learning
type of batch learning where, due to the long duration and large amount of computer resources needed, machines are not trained in production.
out-of-core-learning
type of online learning where systems are trained on huge datasets that cannot fit in one machine's main memory. The algorithm loads part of the data, runs a training step on that data, and repeats the process until it has run on all of the data. Note: the whole process is usually done offline (not on the live system), so the name "online learning" can be confusing; think of it as incremental learning.
logistic regression
type of regression algorithm that can be used for classification in supervised learning. It outputs a value that corresponds to the probability of belonging to a given class (e.g. 20% chance of an email being spam)
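How a regression-style score becomes a class probability can be sketched as follows. The sigmoid function is the standard way logistic regression maps a linear score to a value between 0 and 1; the weights, bias, and feature values here are hand-picked assumptions, not trained parameters.

```python
import math

def sigmoid(z):
    """Squash any real-valued score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(weights, bias, features):
    """Linear score (as in regression) passed through the sigmoid."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)

def classify(probability, threshold=0.5):
    """Turn the probability into a class using a decision threshold."""
    return "spam" if probability >= threshold else "not spam"

# Hypothetical, hand-picked parameters and one email's feature values
weights = [1.5, -2.0]
bias = -1.0
email = [0.0, 0.2]  # linear score = -1.0 + 0.0 - 0.4 = -1.4

p = predict_probability(weights, bias, email)
print(round(p, 2))  # 0.2, i.e. about a 20% chance of spam
print(classify(p))  # not spam
```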
deep belief networks (DBNs)
type of semisupervised learning system based on restricted Boltzmann machines (RBMs) stacked on top of one another. The RBMs are trained sequentially in an unsupervised manner, and then the whole system is fine-tuned using supervised learning techniques
reinforcement learning
type of ML system (distinct from supervised, unsupervised, and semisupervised learning) where an agent can observe the environment, select and perform actions, and get rewards or penalties in return. It then learns by itself the best policy to get the most reward over time.
dimensionality reduction algorithm
type of unsupervised learning algorithm in which the system simplifies the data by merging correlated features without losing too much information (e.g. merging a car's age and mileage into wear and tear)
visualization algorithm
type of unsupervised learning algorithm where you feed the system a lot of complex and unlabeled data, and it outputs a 2D or 3D representation of your data that can easily be plotted. These algorithms try to preserve as much structure as possible (e.g. keeping clusters that are separate in the input from overlapping in the plot).
hierarchical clustering algorithm (intro)
type of unsupervised learning algorithm, a subtype of the clustering algorithm, that groups similar data together and subdivides each group into smaller groups
- clustering ---- k-means ---- hierarchical cluster analysis (HCA) ---- expectation maximization - visualization and dimensionality reduction ---- principal component analysis (PCA) ---- kernel PCA ---- locally-linear embedding (LLE) ---- t-distributed stochastic neighbor embedding (t-SNE) - association rule learning ---- Apriori ---- Eclat
types(3) of unsupervised learning algorithms in book and their subtypes (3, 4, 2)
- k-nearest neighbors - linear regression - logistic regression - support vector machines (SVMs) - decision trees and random forests - neural networks
types(6) of supervised learning covered in book
anomaly detection
unsupervised learning task for comparing data points to find unusual data points or outliers (e.g. unusual credit card transactions for fraud cases)
association rule learning
unsupervised learning task for finding interesting relations in large amounts of data (e.g. a supermarket finds those who buy BBQ sauce and chips also tend to buy steak)
feature extraction
unsupervised learning term in dimensionality reduction algorithms for merging multiple features into one (e.g. merging a car's age and mileage into wear and tear)
RMSE term for the vector of all feature values (excluding the label) of the i-th instance in the dataset (the independent variables, IV)
x^(i) in RMSE
RMSE term for the label of x^(i). It is the desired output value for that instance (the dependent variable, DV)
y^(i) in RMSE
1) Simplify the model by selecting one with fewer parameters, by reducing the number of attributes in the training data, or by constraining the model (e.g. a linear model rather than a high-degree polynomial model) 2) Gather more training data 3) Reduce the noise in the training data (e.g. fix data errors)
3 solutions to overfitting
1) Select a more powerful model with more parameters 2) Feed better features to the learning algorithm (feature engineering) 3) Reduce the constraints on the model (e.g. reduce the regularization hyperparameter)
3 solutions to underfitting
scalar values and function names (conventions)
Data science convention for italic fonts
matrices (convention)
Data science convention for uppercase bold font
transpose operator
Data science operator for flipping a column vector into a row vector
RMSE(\mathbf{X}, h) = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( h(\mathbf{x}^{(i)}) - y^{(i)} \right)^2}
RMSE(equation)
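For comparison, the Mean Absolute Error from the earlier card can be written in the same notation (m instances, prediction function h, feature vector x^(i), label y^(i)). It replaces the square and square root with an absolute value, which is why large errors weigh less than in RMSE:

```latex
\mathrm{MAE}(\mathbf{X}, h) = \frac{1}{m} \sum_{i=1}^{m} \left| h\left(\mathbf{x}^{(i)}\right) - y^{(i)} \right|
```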
two degrees of freedom
Overfitting saying for a model that has two parameters to tune (e.g. a linear model with parameter 1 as the height, or intercept, and parameter 2 as the slope)
regularization
Overfitting term for constraining a model to make it simpler and reduce the risk of overfitting
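One way to picture this constraint is a penalty that pulls a model parameter toward zero. The sketch below assumes a one-parameter model y = w * x with an L2 (ridge-style) penalty, which is only one of several forms of regularization; `alpha` is a hypothetical name for the regularization hyperparameter.

```python
def fit_slope(xs, ys, alpha=0.0):
    """Minimize sum((w * x - y)^2) + alpha * w^2 for the model y = w * x.

    Setting the derivative to zero gives w = sum(x * y) / (sum(x^2) + alpha),
    so a larger alpha shrinks the slope toward zero (a simpler model).
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

# Made-up training data following y = 2x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

print(fit_slope(xs, ys, alpha=0.0))   # 2.0, no regularization
print(fit_slope(xs, ys, alpha=14.0))  # 1.0, slope constrained by the penalty
```

Note that `alpha` is set before training and is not learned from the data, which is what makes it a hyperparameter of the learning algorithm rather than a parameter of the model.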