Lectures and tutorials
training example

A row in a machine learning dataset

Gradient Descent

An iterative optimization method for minimizing a loss function (used, for example, to fit linear regression). You compute the gradient of the loss function at the current point, take a step in the negative gradient direction, and repeat until the step length is (nearly) zero.
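The loop above can be sketched in numpy. This is a minimal illustration for a squared-error loss; the function name, learning rate, and toy data are made up for the example.

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, tol=1e-8, max_iter=10000):
    """Minimize the mean squared error ||Xw - y||^2 / n by gradient descent."""
    n, m = X.shape
    w = np.zeros(m)
    for _ in range(max_iter):
        grad = 2.0 / n * X.T @ (X @ w - y)  # gradient of the loss at the current w
        step = lr * grad
        w -= step                           # step in the negative gradient direction
        if np.linalg.norm(step) < tol:      # stop when the step length is ~zero
            break
    return w

# Toy data generated from y = 2x + 1 (first column of X is the bias term)
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = 2 * np.arange(5.0) + 1
w = gradient_descent(X, y)  # w approaches [1, 2]
```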

Requirements for least squares to work

1. the X matrix needs to have full column rank 2. the number of training examples needs to be at least as large as the number of features

The three axioms of probability

1. P(E) ≥ 0 for every event E 2. P(Ω) = 1 3. If E1, E2, ... are disjoint then P(∪Ei) = ∑P(Ei)

Overfitting

Fitting the model too closely to the training data, leading to poor generalization to new data

Regression

Prediction of continuous variables, e.g. fitting a line to data measurements

Leave-one-out cross-validation

Similar to k-fold cross-validation, but each validation set contains only one instance.

X in machine learning

The input data, a matrix of the features and training examples used to create the prediction model

output variables/targets

The value we are trying to predict

Classification

Assigning objects to classes, e.g. yes/no (binary classification) or banana/apple/orange (multi-class classification)

Sample space

Ω: the set of all possible outcomes of an experiment

Events

𝐸 ⊂ Ω: subsets of the sample space

Observations

𝜔 ∈ Ω: observed elements in the sample space

m and n (in machine learning)

m: number of features (columns) n: number of training examples (rows)

w in machine learning

The weights put on the different features when making predictions

Tensor

An array of numbers. Zero dimensions: scalar; one dimension: vector; two dimensions: matrix; three or more dimensions: higher-order tensor
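A quick way to see this hierarchy, using numpy arrays as tensors (the `ndim` attribute counts the dimensions; the example values are arbitrary):

```python
import numpy as np

scalar = np.array(3.0)           # zero dimensions
vector = np.array([1.0, 2.0])    # one dimension
matrix = np.eye(2)               # two dimensions
cube = np.zeros((2, 2, 2))       # three dimensions: a higher-order tensor
print(scalar.ndim, vector.ndim, matrix.ndim, cube.ndim)  # 0 1 2 3
```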

features

Columns in machine learning datasets

y in machine learning

The output variables we are trying to predict

Test data

Used to test the performance of the final model

3 types of unsupervised learning

1. Generative modelling 2. Clustering 3. Anomaly detection

2 types of supervised learning

1. Regression 2. Classification

3 types of machine learning

1. Supervised learning 2. Unsupervised learning 3. Reinforcement learning

How to fit a polynomial to data using linear regression

Add features corresponding to the squared (or higher-order) values of the original feature, then fit the weights of the different orders using linear regression
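The trick above can be sketched in a few lines of numpy: build a design matrix with a squared column and solve an ordinary linear regression on it. The toy data (points from y = x²) is made up for the example.

```python
import numpy as np

x = np.linspace(-2, 2, 9)
y = x ** 2                                          # data on a parabola

# Expanded design matrix: bias column, x, and the added x^2 feature
X = np.column_stack([np.ones_like(x), x, x ** 2])

# Plain linear regression on the expanded features recovers the polynomial
w, *_ = np.linalg.lstsq(X, y, rcond=None)           # w is close to [0, 0, 1]
```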

i.i.d assumption

Assumption that the data is independently and identically distributed

Weaknesses of least squares

Computationally costly, O(m^3) in the number of features, and only works when X has full column rank

What are the advantages of gradient descent compared to the closed-form least-squares solution?

Less costly to compute when there are many features

k-fold cross validation

The non-test data is split into K disjoint subsets. The model is trained on K-1 subsets and validated on the remaining one, repeating K times so that each subset serves as the validation set exactly once. The model with the best average validation result is chosen, trained on the entire non-test dataset, and then evaluated on the test data.
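The splitting step can be sketched by hand (no library needed). The function name and the toy sizes are illustrative; each yielded pair is one train/validation split.

```python
import numpy as np

def kfold_indices(n, k):
    """Yield (train, validation) index arrays for k-fold cross-validation."""
    folds = np.array_split(np.arange(n), k)  # k disjoint subsets of the indices
    for i in range(k):
        val = folds[i]                                                  # fold i validates
        train = np.concatenate([folds[j] for j in range(k) if j != i])  # the rest train
        yield train, val

# 10 examples, 5 folds: 5 splits, each with 8 training and 2 validation indices
splits = list(kfold_indices(10, 5))
```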

Training data

The data the model is fitted to

Validation data

The data the model is tested on during development

Loss function

A function quantifying how far the current fitted function is from a perfectly fitting one, e.g. the squared difference between predictions and targets

Three partitions of data usually used in machine learning

Training, validation and test datasets

least squares regression

A way of creating a model by minimizing a squared loss function, which has a closed-form solution. Works well when the number of features is small
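The closed-form solution comes from the normal equations, w = (XᵀX)⁻¹Xᵀy, which is where the full-column-rank requirement and the O(m³) cost come in. A minimal sketch on made-up data (solving the linear system rather than forming the inverse explicitly, which is more stable):

```python
import numpy as np

# Toy data lying exactly on y = 2x + 1; first column of X is the bias term
X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Normal equations: (X^T X) w = X^T y; requires X to have full column rank
w = np.linalg.solve(X.T @ X, X.T @ y)  # w is close to [1, 2]
```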
