Machine Learning
how to fix an underfitting model
Try a more advanced model; increase model hyperparameters; reduce the number of features; train longer
What is the "gold standard" validation strategy?
Try on new real-world data
What is the 3rd step in ML framework
Type of Evaluation to consider
What is the 2nd step in ML framework
Types of Data / Data classification
What does transfer learning mean in the context of medical imaging?
Weights of convolutional layers learned from ImageNet transfer to medical images, so we only need to learn new parameters at the top of the network.
classification metrics
accuracy, precision, recall
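A minimal sketch of how these three metrics could be computed by hand for binary labels. The function name `classification_metrics` and the sample labels are made up for illustration, not taken from any library.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: predicted negative but actually positive.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

Accuracy counts all correct predictions, precision asks how many predicted positives were right, and recall asks how many actual positives were found.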
what does data science do
analyse data
what is modeling step
based on our problem and data, what model should we use?
types of evaluation metrics
classification: binary or multiclass; regression: try to predict a number (sale price, how many will); recommendation
fix an overfitting test result set
collect more data; try a less advanced model
what is structured data
rows and columns: Excel, CSV, or a similar format
create an environment from a yml file
conda env create --file environment.yml --name env_from_file
conda export the environment to a yml file
conda env export --prefix PATH > environment.yml
conda list environments
conda env list
conda install new package
conda install jupyter
what is streaming data
data that DOES change over time. Stock data, news headlines
what is static data
data that doesn't change over time; a subset of structured/unstructured data. You want a lot of these examples. The more data, the better!
what is reinforcement learning
a game of punishment and reward: the model tries to maximise its score
what is general ai
good at multiple tasks. Still far from being achieved
what is narrow ai
good at one specific, specialized task. Maybe even better than a human
what is an overfitting test result set
great performance on training data but poor performance on test data. The model was trained too much and too precisely, so it doesn't generalize well.
what is unstructured data
images, audio, email, video
what is a normal algorithm
input -> instructions -> output
what is a machine learning algorithm
inputs -> output: best estimations, find the "best pattern"
two types of translators
interpreter: line by line (PHP); compiler: all at once (C#, binary)
feature engineering
looking at different features of data and creating new ones, altering existing ones
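A small sketch of both ideas, creating a new feature and altering an existing one. The record fields (`price`, `rooms`, `has_garden`) and the helper `engineer_features` are hypothetical and only serve as an illustration.

```python
def engineer_features(record):
    """Return a copy of `record` with one new and one altered feature."""
    out = dict(record)
    # New feature created from two existing numerical features.
    out["price_per_room"] = record["price"] / record["rooms"]
    # Existing categorical feature altered into a numerical one.
    out["has_garden"] = 1 if record["has_garden"] == "yes" else 0
    return out


# Hypothetical record, for illustration only.
house = {"price": 300000, "rooms": 4, "has_garden": "yes"}
print(engineer_features(house))
```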
regression metrics
mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE)
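A minimal sketch of the three regression metrics, using only the standard library. The function name `regression_metrics` and the sample values are assumptions chosen for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equal-length lists of numbers."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
    mse = sum(e * e for e in errors) / len(errors)    # mean squared error
    rmse = math.sqrt(mse)                             # root mean squared error
    return mae, mse, rmse


# Hypothetical targets and predictions, for illustration only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]
mae, mse, rmse = regression_metrics(y_true, y_pred)
print(mae, mse, rmse)
```

MSE and RMSE weight large errors more heavily than MAE because each error is squared before averaging.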
what is the goal of a training model/set
minimise time between experiments in this phase. It's an iterative process. Add complexity as you need it. Aim for practical results
what is the test model/set
model comparison (10-15% of data); shows how our model will perform in the real world. Keep the test set/data separate at ALL costs
what is an underfitting test result set
not accurate. Poor performance - the model hasn't learned properly
different features of data
numerical features (int, float) or categorical features (bool, string)
recommendation metrics
precision at k
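A short sketch of precision at k, assuming the common definition: the fraction of the top k recommended items that are actually relevant. The function name and the item lists are hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Hypothetical ranked recommendations and ground-truth relevant items.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
print(precision_at_k(recommended, relevant, k=3))
```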
step 1 framework when working with ML
problem definition. What problem are we trying to solve? Will a simple hand-coded system work?
what are the framework steps
1) problem definition 2) types of data 3) types of evaluation 4) features 5) modeling 6) experimentation
what is deep learning
technique for implementing ML
what is generalization
the ability of a machine learning model to perform well on data it hasn't seen before
machine learning is, what?
using an algorithm to learn about different patterns in data and make future predictions from them
feature coverage, how much coverage do you want?
want > 10% coverage. Ideally every sample has the same features
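A rough sketch of how feature coverage could be measured, assuming each sample is a dict and a missing key or `None` value means the feature is absent. The helper name `feature_coverage` and the sample data are hypothetical.

```python
def feature_coverage(samples, feature):
    """Fraction of samples that have a non-missing value for `feature`."""
    present = sum(1 for s in samples if s.get(feature) is not None)
    return present / len(samples)


# Hypothetical samples with some missing values.
samples = [
    {"age": 30, "height": 170},
    {"age": 41},
    {"age": None, "height": 180},
    {"height": 165},
]

for name in ("age", "height"):
    coverage = feature_coverage(samples, name)
    # Keep a feature only if more than 10% of samples have it.
    print(name, coverage, "keep" if coverage > 0.10 else "drop")
```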
step 4 what are features
what do we already know about the data? Different forms of data within structured or unstructured data; different features of data
what is done during experimentation
what have we tried, and what else can we try? E.g. try a different model or change the inputs
Main types of ML
1) supervised learning 2) unsupervised learning 3) transfer learning 4) reinforcement learning. Most common are 1-3
what is the tuning model/set
10-15% of data. ML models have hyperparameters you can adjust. A model's first result is NOT its last (iterative process). Tuning can take place on training or validation data
what is the training model/set
70-80% of data for training is standard. There are many prebuilt training models
machine learning is a subset of
AI
what is the 4th step in ML framework
Features
How do we learn our network?
Gradient descent
describe the hierarchical structure of images, listed from most complex to simplest?
High-level motifs, sub-motifs, and atomic elements
what is supervised learning
I know my input and output
what is transfer learning
I think my problem might be similar to something else. Can I leverage what I already have? (e.g. images)
what is unsupervised learning
I'm not sure of the output, but I have the inputs. Patterns are there. You apply the label
What is necessary for supervised machine learning? (3 things)
A model, labeled training data, learning from data
Why is gradient descent computationally expensive for large data sets?
Calculating the gradient requires looking at every single data point.
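A minimal sketch of full-batch gradient descent on a one-parameter model `y = w * x` with mean squared error. The model, data, and function names are assumptions chosen for illustration. Note that every single gradient evaluation loops over all data points, which is the cost this card describes.

```python
def full_batch_gradient(w, xs, ys):
    """Gradient of mean squared error for the model y = w * x.

    The sum runs over every data point, so one gradient costs O(n).
    """
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def gradient_descent(xs, ys, lr=0.01, steps=200):
    """Repeatedly step against the full-batch gradient, starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        w -= lr * full_batch_gradient(w, xs, ys)
    return w


# Hypothetical data generated by y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = gradient_descent(xs, ys)
print(w)
```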
Which model, when used for image classification, can exceed the performance of humans?
Convolutional neural network (CNN)
What are the two main benefits of early stopping?
It helps save computation cost. It performs better in the real world.
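A sketch of one common form of early stopping, assuming a patience rule: training stops once the validation loss has failed to improve for `patience` consecutive epochs. The function name, the `patience` parameter, and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, stopped_at) for a sequence of validation losses.

    Training stops once the loss has not improved for `patience` epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    stopped_at = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        stopped_at = epoch
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # Remaining epochs are never run, saving computation.
    return best_epoch, stopped_at


# Hypothetical validation losses: improving at first, then getting worse.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
print(early_stopping_epoch(val_losses))
```

The model from the best epoch is the one to keep, since later epochs fit the training data more closely while validation loss rises.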
What best describes transfer learning in the context of document analysis?
Parameters at the bottom of the model are transferable across all people and documents, while the parameters at the top are different between individuals.
In the polynomial fitting example, which one of the following is an example of overfitting?
Eighth order polynomial
what is the 6th step in ML framework
Experimentation
What is convolved with layer 2 features, or sub-motifs?
Layer 1 feature map
In the CNN explained in this lesson with 3 layers, which of the following allows a classification decision to be made?
Layer 3 feature map
What decision boundary can logistic regression provide?
Linear
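A small sketch of why the boundary is linear: logistic regression predicts probability 0.5 exactly where `w . x + b = 0`, which is a straight line in two dimensions. The weights and points below are hypothetical.

```python
import math


def predict_proba(weights, bias, x):
    """Logistic regression probability: sigmoid of a linear function of x."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical model whose decision boundary is the line x1 = x2.
weights = [1.0, -1.0]
bias = 0.0

print(predict_proba(weights, bias, [2.0, 2.0]))  # on the boundary
print(predict_proba(weights, bias, [3.0, 1.0]))  # one side of the line
print(predict_proba(weights, bias, [1.0, 3.0]))  # the other side
```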
What is overfitting?
The model is complex enough to fit the training data too closely, so it will not generalize to the real world.
what is the 5th step in ML framework
Modeling
Why should the test set only be used once?
More than one use can lead to bias.
How is the loss function defined?
Negative log-likelihood
What does the equation for the loss function do conceptually?
Penalize overconfidence
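A minimal sketch of the negative log-likelihood for binary labels, showing how a confident wrong prediction is penalized far more than an unsure one. The function name and the probabilities are chosen for illustration.

```python
import math


def negative_log_likelihood(y_true, p_pred):
    """Mean negative log-likelihood for binary labels (0 or 1).

    `p_pred` holds the predicted probability of label 1 for each sample.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += -math.log(p) if y == 1 else -math.log(1.0 - p)
    return total / len(y_true)


# The true label is 1 in all three cases; only the confidence differs.
confident_right = negative_log_likelihood([1], [0.99])
unsure = negative_log_likelihood([1], [0.6])
confident_wrong = negative_log_likelihood([1], [0.01])
print(confident_right, unsure, confident_wrong)
```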
What is the conceptual meaning of convolution?
Shifting a filter to every location in an image.
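A plain-Python sketch of shifting a filter across an image. As in most CNN libraries, the filter is not flipped, so strictly speaking this is cross-correlation. The function name, the tiny image, and the edge filter are hypothetical.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over every valid location in `image` (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the filter with the image patch at (i, j) and sum.
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


# Hypothetical image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, vertical_edge)
print(feature_map)
```

The feature map is largest where the image patch matches the filter, here at the column where the edge sits.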
What technique is used to minimize loss for a large data set?
Stochastic gradient descent
Which of the following are benefits of stochastic gradient descent?
Stochastic gradient descent gets near the solution quickly. Stochastic gradient descent can update many more times than gradient descent.
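A minimal sketch of stochastic gradient descent on a one-parameter model `y = w * x`, assuming one data point per update. The function name, learning rate, and data are hypothetical. Each pass over the data produces as many updates as there are data points, where full-batch gradient descent would produce only one.

```python
import random


def sgd(xs, ys, lr=0.01, epochs=50, seed=0):
    """Fit y = w * x by updating w after every single data point."""
    rng = random.Random(seed)
    w = 0.0
    updates = 0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # Visit the points in a random order each epoch.
        for i in indices:
            # Gradient of the squared error for one data point only.
            grad = 2 * (w * xs[i] - ys[i]) * xs[i]
            w -= lr * grad
            updates += 1
    return w, updates


# Hypothetical data generated by y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w, updates = sgd(xs, ys)
print(w, updates)
```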
What is the primary advantage of having a deep architecture?
The model shares knowledge between motifs through their shared substructures.
What is the primary advantage of using multiple filters?
This allows the model to look for subtypes of the classification.
What is the purpose of a loss function?
To define a penalty for poor predictions
Which two of the following describe the purpose of a validation set?
To pick the best performing model. ..
3 modeling sets
Train, Validation, Test
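A sketch of splitting data into the three sets, assuming a 70/15/15 split in line with the percentages on the cards above. The function name, default fractions, and seed are assumptions.

```python
import random


def train_val_test_split(samples, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle `samples` and split them into (train, validation, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical dataset of 100 sample ids.
train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

The three lists never share a sample, which keeps the test set separate from the data used for training and tuning.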