O'Reilly: Chapter 1
Validation Set
*A second holdout set, in addition to the test set*. It is used to identify the best hyperparameters and model for the given data.
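A minimal sketch of carving out a validation set, assuming scikit-learn and a synthetic dataset (any feature matrix X and labels y work the same way):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)

# Hold out the test set first, then carve a validation set out of what remains.
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=42)

# Candidate models/hyperparameters are compared on (X_val, y_val);
# (X_test, y_test) is used only once, for the final generalization estimate.
```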
What are *causes of bad data*?
1. *Insufficient Quantity of Training Data* - current ML models need *a lot* of data in order to generalise properly.
2. *Non-representative Training Data* - the training data must be representative of the new cases it needs to generalise to. Be wary of:
> *Sampling Noise* - non-representative data due to *chance*
> *Sampling Bias* - non-representative data due to a *flawed sampling method*
3. *Poor Quality Data* - outliers to remove, noise to clean up, instances with missing features to deal with
4. *Irrelevant Features* - having too many irrelevant features can hinder performance.
> *Feature engineering* - the task of selecting, extracting, and creating features for the task
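A hedged sketch of two "poor quality data" fixes, assuming pandas and scikit-learn; the column names ("age", "income") and values are made up for illustration:

```python
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"age": [25, 31, 47, 230, 38],
                   "income": [40_000, None, 52_000, 61_000, 58_000]})

# Drop rows whose "age" is an implausible outlier.
df = df[df["age"] < 120].copy()

# Fill missing "income" values with the column median.
imputer = SimpleImputer(strategy="median")
df[["income"]] = imputer.fit_transform(df[["income"]])
print(df)
```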
Why use an ML algorithm?
1. *Replacing* long lists of *rules*
2. When there is *no good solution* with traditional approaches
3. *Fluctuating environments*, since an ML program can adapt
4. To *gain insight* into a complex problem or large amounts of data.
Trade-off of Various Models
1. Speed
2. Accuracy
3. Complexity
No Free Lunch Theorem (Supervised Learning)
> David Hume pointed out that 'even after the observation of the frequent or constant conjunction of objects, *we have no reason to draw any inference concerning any object beyond those of which we have had experience*'
> i.e. if we remove all assumptions, there is no one model that works best for every problem.
> Connection to: *the problem of induction*
What is the distinction between *supervised*, *semisupervised*, and *unsupervised* learning?
> In supervised learning, the training data you feed to the algorithm *includes the desired solutions* (labels). In unsupervised learning, the training data is *unlabelled*.
> Semisupervised learning is *a mix of both*, usually *a lot of unlabelled data and a little labelled data*.
Supervised Learning Tasks:
> classification
> predicting target values (*regression*)
Unsupervised Learning Tasks:
> dimensionality reduction
> anomaly detection
> visualization
> *association rule learning* - finds interesting relations between attributes
Semisupervised Learning Tasks:
> These algorithms are usually made up of unsupervised and supervised learning algorithms, ex.:
>> Deep Belief Networks (DBNs)
>> restricted Boltzmann machines (RBMs)
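A minimal sketch contrasting the two settings with scikit-learn; the choice of algorithms (LogisticRegression, KMeans) and the synthetic data are my own illustration, not prescribed by the notes:

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = make_blobs(n_samples=300, centers=3, random_state=42)

# Supervised: the labels y (the "desired solutions") are given to the algorithm.
clf = LogisticRegression().fit(X, y)

# Unsupervised: only X is given; KMeans finds structure without any labels.
clusters = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
```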
Reinforcement learning
> The learning system is referred to as an *agent*: it can *observe* the environment, *select* actions, and gets *rewards/penalties* in return.
> From this the agent learns a strategy, or *policy*, to follow.
> ex. Google's DeepMind and AlphaZero
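A schematic sketch of the agent/environment loop described above. The toy Environment class and its reward logic are invented for illustration; real work would use a proper RL environment library:

```python
import random

class Environment:
    """A made-up environment with a trivial reward signal."""
    def reset(self):
        return 0  # initial observation

    def step(self, action):
        reward = 1 if action == 1 else -1          # reward/penalty for the action
        observation, done = 0, random.random() < 0.1
        return observation, reward, done

env = Environment()
obs, done, total_reward = env.reset(), False, 0
while not done:
    action = random.choice([0, 1])                 # a (random) policy
    obs, reward, done = env.step(action)           # observe + receive reward/penalty
    total_reward += reward
```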
What is *Machine Learning*?
> the *science* and art of programming computers to *learn from data* > a computer program is said to learn from *experience E* with respect to *some task T* and some *performance measure P*, if its performance on T, as measured by P, improves with experience E.
Epoch
A full training pass over the entire data set such that each example has been seen once. Thus, an epoch represents N/batch size training iterations, where N is the total number of examples. [Google]
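A worked example of the iterations-per-epoch arithmetic, with made-up numbers:

```python
n_examples = 10_000                               # N, total training examples
batch_size = 50
iterations_per_epoch = n_examples // batch_size   # N / batch size = 200 iterations
print(iterations_per_epoch)
```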
Hyper parameter
A parameter of *the learning algorithm* rather than the model. It is set before training and remains constant during training. It can be used to set *constraints* on the model.
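A small sketch, assuming scikit-learn: `alpha` below belongs to the learning algorithm (Ridge regression's regularization strength), not to the learned model, and it constrains how large the model's weights may grow. The data is synthetic:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=42)

model = Ridge(alpha=1.0)   # hyperparameter: set before training, constant throughout
model.fit(X, y)            # the coefficients learned here are model parameters, not hyperparameters
```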
Batch learning
A system that is *incapable of learning incrementally* and thus must be trained *using all available data*.
Features:
> Not adaptable
> Cannot handle huge amounts of data
> ex. Naive Bayes
Online learning
A system that learns *incrementally*, by feeding it data instances sequentially, either individually or in mini-batches. The speed at which it learns and adapts is governed by the *learning rate*. Thus:
> High learning rate → adapts rapidly BUT quickly forgets old data
> Low learning rate → learns slowly BUT is less sensitive to noise
Features:
> Good for on-the-fly learning
> Can handle continuous data flows
> Can discard data once it has learned from it
> Sensitive to bad data
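A minimal sketch of online (incremental) learning with scikit-learn's SGDClassifier; the synthetic data, number of mini-batches, and learning rate are arbitrary choices:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=1000, random_state=42)
model = SGDClassifier(learning_rate="constant", eta0=0.01)  # eta0 is the learning rate

# Feed the data sequentially in mini-batches; each batch could be discarded afterwards.
for X_batch, y_batch in zip(np.array_split(X, 20), np.array_split(y, 20)):
    model.partial_fit(X_batch, y_batch, classes=np.unique(y))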
Linear Regression Model
A type of regression model that outputs a continuous value from a linear combination of input features. [Google]
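A short sketch, assuming scikit-learn and synthetic data: the prediction is an intercept plus a weighted sum (linear combination) of the input features:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=100, n_features=3, noise=5, random_state=42)
model = LinearRegression().fit(X, y)

# prediction = intercept_ + coef_[0]*x1 + coef_[1]*x2 + coef_[2]*x3
print(model.intercept_, model.coef_)
```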
Overfitting
Creating a model that *matches the training data too closely*, causing it to fail to make correct predictions on new data. Occurs when the model is too complex relative to the amount and noisiness of the training data.
Solutions:
> Simplify the model with fewer parameters or by constraining it
> Gather more training data
> Reduce the noise in the training data
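A hedged sketch of overfitting: a high-degree polynomial fit to a few noisy points matches the training data almost perfectly but generalizes poorly; constraining the model (here via Ridge regularization, my choice) illustrates the first fix above:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(42)
X = np.sort(rng.uniform(-3, 3, 20)).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 + rng.normal(0, 1, 20)     # noisy quadratic data

# Unconstrained high-degree polynomial: likely to overfit the 20 noisy points.
overfit = make_pipeline(PolynomialFeatures(degree=15), LinearRegression()).fit(X, y)

# Same model family, but constrained with regularization.
constrained = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=10)).fit(X, y)
```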
What is the difference between *instance based learning* and *model-based learning*?
In *instance-based learning*, the system *commits the examples to memory*, then generalizes to new cases using a *similarity measure*.
Example Similarity Measures:
> Cosine similarity
> Euclidean distance
In *model-based learning*, the system *builds a model* from the examples and *uses that model* to make predictions. This is the general flow:
1. *Model Selection*: Pick a model
2. *Performance Measure*: Select either a:
> *utility function* to measure how good it is doing
> *cost function* to measure how bad it is doing
3. Training
4. Deploying
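A sketch of the two approaches side by side, assuming scikit-learn and synthetic data: k-nearest neighbors (instance-based, predicts from stored examples via a distance measure) versus linear regression (model-based, predicts from fitted parameters):

```python
from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=1, noise=10, random_state=42)

instance_based = KNeighborsRegressor(n_neighbors=3).fit(X, y)   # memorizes the examples
model_based = LinearRegression().fit(X, y)                      # learns model parameters
print(instance_based.predict(X[:1]), model_based.predict(X[:1]))
```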
What was the conclusion of *Peter Norvig et al.*'s paper titled "The Unreasonable Effectiveness of Data"?
In it they found that with very large amounts of data (on the order of millions of data points), very different machine learning algorithms *tend to perform about the same*. Thus "we may want to reconsider the trade-off between spending time and money on *algorithm development versus spending it on corpus development*"
Generalization Error (out-of-sample error)
The rate of error the algorithm has on the test set. If the training error is low (i.e. few mistakes on the training set) but the generalization error is high, it means the model is *overfitting*.
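A minimal sketch, assuming scikit-learn and synthetic data, of estimating the generalization error on a held-out test set and comparing it with the training error (a large gap suggests overfitting):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
train_error = 1 - model.score(X_train, y_train)   # error on the training set
test_error = 1 - model.score(X_test, y_test)      # estimate of the generalization error
print(train_error, test_error)
```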
No Free Lunch Theorem (Search/Optimization)
When an increase in search/optimization performance on one class of problems is found, it *comes at a cost* to performance on another class of problems.
Underfitting
When the model is *too simple* to learn the underlying structure of the data.
Solutions:
1. Select a more powerful model with more parameters
2. Feed it better features
3. Reduce the constraints on the model
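A hedged sketch of the first fix, assuming scikit-learn and synthetic, clearly nonlinear data: a plain linear model underfits, while adding polynomial features gives a more powerful model:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, (200, 1))
y = X.ravel() ** 2 + rng.normal(0, 0.5, 200)   # quadratic relationship

underfit = LinearRegression().fit(X, y)        # too simple for this data
better = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
print(underfit.score(X, y), better.score(X, y))
```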