Lectures and tutorials
training example
A row in a machine learning dataset
Gradient Descent
An iterative optimization method, often introduced as a step-by-step way of fitting linear regression: compute the gradient of the loss function at the current point, take a step in the negative gradient direction, and repeat until the step length is (basically) zero
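A minimal sketch of this loop in NumPy (the squared loss, learning rate `lr`, and stopping tolerance `tol` are illustrative assumptions, not fixed by the card):

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, tol=1e-8, max_steps=10_000):
    """Minimize the squared loss (1/n)||Xw - y||^2 by gradient descent."""
    n, m = X.shape
    w = np.zeros(m)
    for _ in range(max_steps):
        grad = (2.0 / n) * X.T @ (X @ w - y)  # gradient of the loss at the current w
        step = lr * grad
        w -= step                             # move in the negative gradient direction
        if np.linalg.norm(step) < tol:        # stop when the step length is (basically) zero
            break
    return w
```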
Requirements for least squares to work
1. the X matrix needs to have full column rank 2. the number of training examples needs to be at least as large as the number of features (n ≥ m; otherwise full column rank is impossible)
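Requirement 1 implies requirement 2, and both can be checked directly in NumPy; the helper name `least_squares_ok` below is made up for illustration:

```python
import numpy as np

def least_squares_ok(X):
    """True if the design matrix X (rows = examples, columns = features)
    has full column rank, which requires n >= m."""
    n, m = X.shape
    return n >= m and np.linalg.matrix_rank(X) == m
```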
The three axioms of probability
1. P(E) ≥ 0 for every event E 2. P(Ω) = 1 3. If E₁, E₂, … are disjoint, then P(⋃ᵢ Eᵢ) = Σᵢ P(Eᵢ)
Overfitting
Fitting the model too closely to the training data, leading to poor generalization to new data
Regression
Prediction of continuous variables, e.g. fitting a line to data measurements
Leave-one-out cross-validation
Similar to k-fold cross validation, but each validation set consists of only one instance (K equals the number of training examples).
X in machine learning
The input data: a matrix whose rows are training examples and whose columns are features, used to fit the prediction model
output variables/targets
The value we are trying to predict
Classification
Assigning objects to classes, like yes/no (binary classification) or banana/apple/orange (multi-class classification)
Sample space
Ω: the set of all possible outcomes of an experiment
Events
𝐸 ⊂ Ω: subsets of the sample space
Observations
𝜔 ∈ Ω: observed elements in the sample space
m and n (in machine learning)
m: number of features (columns); n: number of training examples (rows)
w in machine learning
the weights to be put on the different features when making predictions
Tensor
An array of numbers: zero dimensions is a scalar, one dimension a vector, two dimensions a matrix, and three or more a higher-order tensor
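In NumPy, for instance, the `ndim` attribute counts these dimensions (the values below are arbitrary illustrations):

```python
import numpy as np

scalar = np.array(3.14)               # zero dimensions
vector = np.array([1.0, 2.0])         # one dimension
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])       # two dimensions
tensor3 = np.zeros((2, 3, 4))         # three dimensions
print(scalar.ndim, vector.ndim, matrix.ndim, tensor3.ndim)  # 0 1 2 3
```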
features
Columns in machine learning datasets
y in machine learning
The output variables we are trying to predict
Test data
Used to test the performance of the final model
3 types of unsupervised learning
1. Generative modelling 2. Clustering 3. Anomaly detection
2 types of supervised learning
1. Regression 2. Classification
3 types of machine learning
1. Supervised learning 2. Unsupervised learning 3. Reinforcement learning
How to fit a polynomial to data using linear regression
Add features corresponding to the squared (or higher-order) values of the original feature, then fit the weights of the different orders using linear regression
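A sketch of this in NumPy, fitting a quadratic with ordinary least squares (the data and true coefficients are made up for illustration):

```python
import numpy as np

# Noisy samples of y = 2 + 3x - 5x^2 (illustrative data).
x = np.linspace(-1, 1, 50)
y = 2 + 3 * x - 5 * x**2 + 0.1 * np.random.randn(50)

# Design matrix with a bias column plus x and x^2 as features.
X = np.column_stack([np.ones_like(x), x, x**2])
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # close to [2, 3, -5]
```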
i.i.d. assumption
The assumption that the data points are independent and identically distributed
Weaknesses of least squares
Computationally costly (solving the normal equations is O(m^3) in the number of features), and it only works when X has full column rank
What are the advantages of gradient descent compared to closed-form linear regression (least squares)?
Each gradient step is cheap, so it is less costly than the O(m^3) closed-form solution when there are many features
k-fold cross validation
Non-test data is split into K disjoint subsets. The model is trained on K-1 of the subsets and validated on the remaining one; this is repeated K times so that each subset serves as the validation set once. The model with the best average validation result is chosen, trained on the entire non-test dataset, and finally evaluated on the test data.
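A rough NumPy sketch of the K-fold loop; the `fit` and `score` callables are an assumed interface, not something defined in the course material:

```python
import numpy as np

def kfold_score(X, y, fit, score, K=5):
    """Average validation score over K folds.
    fit(X, y) -> model; score(model, X, y) -> float."""
    idx = np.random.permutation(len(y))
    folds = np.array_split(idx, K)             # K disjoint subsets
    scores = []
    for k in range(K):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        model = fit(X[train], y[train])              # train on K-1 subsets
        scores.append(score(model, X[val], y[val]))  # validate on the K-th
    return np.mean(scores)
```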
Training data
The data the model is fitted to
Validation data
The data the model is tested on during development
Loss function
The function measuring how far the model's predictions are from the true target values; fitting the model means minimizing it
Three partitions of data usually used in machine learning
Training, validation and test datasets
least squares regression
Way of creating a model for data by minimizing a squared loss function. The loss is convex, and the closed-form solution works well when there are few features
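A minimal sketch via the normal equations, assuming X has full column rank (the function name is illustrative):

```python
import numpy as np

def least_squares(X, y):
    """Solve the normal equations X^T X w = X^T y.
    Forming and solving them costs O(n m^2 + m^3), which is why
    gradient descent is preferred when m is large."""
    return np.linalg.solve(X.T @ X, X.T @ y)
```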