Lecture 7 - Fundamentals of Machine Learning & Intro to Keras
Three things to consider when choosing an evaluation protocol
1) Data Representativeness; 2) Arrow of Time; 3) Redundancy in the data
Two reasons why feature engineering is important.
1) Good features let you solve problems more elegantly and efficiently, sparing the model work it would otherwise have to learn on its own. 2) Good features are critical when data is limited, because deep-learning models can only learn useful features on their own when they have lots of training data.
Four most common ways to prevent overfitting?
1) More training data; 2) reduce capacity of the network; 3) add weight regularization; 4) add dropout
Four key aspects of Data-Preprocessing
1) Vectorization; 2) Normalization; 3) Dealing with Missing Values; 4) Feature Extraction
The purpose of Normalization is to ___
Help machine learning models learn more effectively and efficiently, particularly when working with data that has a wide range of values.
How is the model's capacity set?
By the number of learnable parameters in the model (the number of layers and number of units per layer).
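As a rough illustration of how layers and units translate into learnable parameters, the sketch below counts weights and biases for a stack of fully connected layers. The function name and the layer sizes are made up for the example, and it assumes plain dense layers with one bias per unit.

```python
def dense_param_count(input_dim, layer_units):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    fan_in = input_dim
    for units in layer_units:
        # One weight per (input, unit) pair, plus one bias per unit.
        total += fan_in * units + units
        fan_in = units
    return total


# Two hypothetical networks on a 10,000-dimensional input.
small_capacity = dense_param_count(10000, [4, 4, 1])
large_capacity = dense_param_count(10000, [16, 16, 1])
```

Shrinking each hidden layer from 16 units to 4 cuts the parameter count roughly fourfold here, which is the kind of capacity reduction used to fight overfitting.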
What is the current state of Reinforcement learning?
Reinforcement learning is currently mostly limited to research use cases because it has not yet proven reliable enough for widespread practical use.
Why is dealing with Missing Values important?
Dealing with missing values is an important part of data preprocessing and analysis, as missing values can affect the accuracy and validity of statistical analyses and models. NOTE - It's important to choose the appropriate method for dealing with missing values based on the characteristics of the data and the specific analysis being conducted.
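A minimal sketch of two simple ways to fill missing values, assuming missing entries are represented as `None`. The function name and the two strategies are illustrative only; which method is appropriate depends on the data and the analysis. Filling with zero only makes sense when 0 is not already a meaningful value in the data.

```python
def fill_missing(values, strategy="mean"):
    """Replace None entries using a simple, hard-coded strategy."""
    present = [v for v in values if v is not None]
    if strategy == "mean":
        # Use the average of the observed values as the fill value.
        fill = sum(present) / len(present)
    elif strategy == "zero":
        fill = 0.0
    else:
        raise ValueError("unknown strategy: " + strategy)
    return [fill if v is None else v for v in values]
```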
Define Feature Engineering
Feature engineering makes algorithms work more effectively by applying hard-coded (non-learned) transformations to the data before it goes into the model.
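One way to picture a hard-coded transformation: suppose the task is reading the time from a clock, and the raw data gives the coordinates of the tip of a clock hand. Converting those coordinates to an angle gives the model a far simpler input. This is a sketch with an assumed coordinate convention (origin at the clock center, y pointing up) and a hypothetical function name.

```python
import math


def hand_angle_degrees(x, y):
    """Angle of a clock hand, measured clockwise from 12 o'clock.

    Assumes (x, y) is the tip of the hand, with the origin at the
    clock center and the y axis pointing up.
    """
    # atan2(x, y) measures the angle from the positive y axis, clockwise.
    return math.degrees(math.atan2(x, y)) % 360.0
```

A hand pointing at 3 o'clock, with its tip at (1, 0), maps to 90 degrees; no learning is involved in the transformation.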
Purpose of Feature Extraction
The purpose of feature extraction is to improve performance by transforming raw data (for example, raw image data) into more meaningful features that the model can use to make accurate predictions.
Define Generalization
Generalization is how well the trained model performs on unseen data.
Key goal of machine learning is to achieve a model that can____
Generalize. Generalization occurs when a model is capable of performing on unseen data
What is hold-out validation?
Hold-out validation intentionally sets aside a fraction of the data from training and uses it to evaluate the model. Keeping the evaluation data out of training helps prevent information leaks.
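A minimal hold-out split might look like the sketch below. The function name, the 20% default, and the fixed seed are assumptions for illustration. Shuffling before splitting is only appropriate when the order of the samples carries no meaning; for time-ordered data the arrow of time has to be respected instead.

```python
import random


def hold_out_split(samples, validation_fraction=0.2, seed=0):
    """Shuffle the samples and set aside a fraction for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    num_validation = int(len(shuffled) * validation_fraction)
    validation = shuffled[:num_validation]
    training = shuffled[num_validation:]
    return training, validation


training, validation = hold_out_split(range(100))
```

The model is then trained on `training` only and evaluated on `validation`.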
What is meant by the hyperparameters of a model?
Hyperparameters are the settings chosen before training rather than learned from the data, such as the number of layers or the size of each layer.
Describe hyperparameter leaks.
Information leaks. Every time the user tunes a hyperparameter of the model based on the model's performance on the validation set, some information about the validation data leaks into the model.
Describe K-Fold Validation
K-fold validation splits the data into K partitions (folds) of roughly equal size. For each fold, the model is trained on the remaining K-1 folds and evaluated on that fold; the final score is the average of the K scores.
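The K-fold procedure can be sketched as follows. The helper names are hypothetical, and `train_and_evaluate` stands in for whatever routine builds a fresh model, trains it on the training indices, and returns a score on the validation indices.

```python
def k_fold_indices(num_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    fold_size = num_samples // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any leftover samples.
        stop = start + fold_size if fold < k - 1 else num_samples
        validation_indices = list(range(start, stop))
        train_indices = list(range(0, start)) + list(range(stop, num_samples))
        yield train_indices, validation_indices


def k_fold_score(num_samples, k, train_and_evaluate):
    """Average the validation score over all k folds."""
    scores = [
        train_and_evaluate(train_indices, validation_indices)
        for train_indices, validation_indices in k_fold_indices(num_samples, k)
    ]
    return sum(scores) / len(scores)
```

Every sample is used for validation exactly once, which is why the estimate tends to be more stable than a single hold-out split, at the cost of training K models.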
What is the key feature of self-supervised learning?
No humans in the loop. Models map input data to targets, given a set of examples whose tags are machine-generated from the data itself rather than annotated by humans.
Define Normalization
Normalization is the process of transforming input data to ensure that it falls within a similar range or scale, typically with a mean of 0 and a standard deviation of 1.
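A small sketch of feature-wise standardization for a single feature, using hypothetical helper names. It assumes the feature is not constant (a standard deviation of zero would cause a division by zero). The statistics are computed on the training data and then reused for any other data, so that nothing about the evaluation data leaks into preprocessing.

```python
import statistics


def fit_standardizer(train_values):
    """Compute the mean and standard deviation from training data only."""
    mean = statistics.mean(train_values)
    std = statistics.pstdev(train_values)
    return mean, std


def standardize(values, mean, std):
    """Shift and scale values using previously fitted statistics."""
    return [(v - mean) / std for v in values]


train_feature = [10.0, 20.0, 30.0, 40.0]
mean, std = fit_standardizer(train_feature)
scaled_train = standardize(train_feature, mean, std)
```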
Define Optimization
Optimization is the process of adjusting a model to get the best performance possible on the training data
Define Overfitting
Overfitting occurs when a machine learning model is trained too well on the training data and starts to fit the noise in the data rather than the underlying pattern. In other words, the model becomes too complex and starts to memorize the training data instead of learning the general patterns that can be applied to new, unseen data.
What are the parameters of a model?
Parameters are the network's weights.
What is dropout (overfitting context)?
Randomly setting to zero (dropping out) a number of output features of a layer during training. The dropout rate is the fraction of features that are zeroed out, usually set between 0.2 and 0.5.
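Dropout at training time can be sketched in a few lines. The function below is illustrative, not a library API. It also scales the surviving features up by 1 / (1 - rate), a common convention (sometimes called inverted dropout) that keeps the expected output the same, so nothing has to change at test time; that scaling step is an addition to the definition above.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the rest."""
    keep_probability = 1.0 - rate
    return [
        a / keep_probability if rng.random() >= rate else 0.0
        for a in activations
    ]


rng = random.Random(0)
layer_output = [1.0] * 1000
dropped = dropout(layer_output, rate=0.5, rng=rng)
```

With a rate of 0.5, roughly half of the entries in `dropped` are zero and the rest are doubled.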
What is weight regularization?
Putting constraints on the complexity of a network by forcing the model's weights to take only small values, thereby making the distribution of weight values more regular.
Simplest way to prevent overfitting?
Reduce the network size by limiting the number of parameters in the model.
Define Regularization
Regularization is the process of fighting overfitting by constraining or shrinking the coefficient estimates towards zero. Regularization discourages learning a more complex or flexible model.
Define Reinforcement Learning
Reinforcement learning is a type of ML in which an agent receives information about its environment and learns to choose actions that maximize a reward.
Why are hyperparameter leaks a problem?
Repeatedly tuning hyperparameters (running an experiment, evaluating on the validation set, and modifying the model as a result) leaks an increasingly large amount of information about the validation set into the model. The model can end up overfitting to the validation set even though it was never trained on it directly.
List 3 theoretical examples of reinforcement learning?
Self-driving cars, robotics, resource management
Three data sets used to assess a model's accuracy (hint: data splitting)
Split the data into three sets: 1) training, 2) validation, and 3) testing. Train the model on the training set, evaluate and tune it on the validation set, and measure the final accuracy once on the test set.
What is the purpose of supervised learning?
Supervised learning models map input data to known targets, when given a set of tagged examples (often tagged by humans)
Purpose of Vectorization
The purpose of vectorization is to transform the data to be processed into tensors of floating-point data.
Describe the purpose of weight regularization
To avoid overfitting, weight regularization adds to the network's loss function a cost associated with having large weights.
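As a sketch of what "a cost associated with large weights" can look like, the example below adds an L2 penalty (proportional to the sum of squared weights) to a loss value. The function names and the coefficient are assumptions for illustration; an L1 penalty would use absolute values instead of squares.

```python
def l2_penalty(weights, coefficient):
    """Cost proportional to the sum of the squared weight values."""
    return coefficient * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, coefficient=0.001):
    """Total loss: the original loss plus the weight penalty."""
    return data_loss + l2_penalty(weights, coefficient)


total = regularized_loss(0.5, [1.0, -2.0, 3.0], coefficient=0.01)
```

Because larger weights raise the total loss, minimizing it pushes the optimizer towards small weight values.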
What is the cause of Underfitting?
Underfitting is usually caused by a model that is too simple or not complex enough to capture the relationships between the input and output variables
Define Underfitting
Underfitting is when a model is not able to capture the underlying patterns in the data and performs poorly on both the training data and the test data.
What is the purpose of unsupervised learning?
Unsupervised learning models look for structure and correlations in the input data without the help of labeled targets.
Define Vectorization
Vectorization is the process of transforming data that needs to be processed into tensors
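One simple form of vectorization, sketched under the assumption that each sample is a list of integer word indices: turn it into a fixed-length vector of floats with a 1.0 at every index that occurs. The function name and the vocabulary size are hypothetical.

```python
def multi_hot(sequence, dimension):
    """Encode a list of integer indices as a fixed-length float vector."""
    vector = [0.0] * dimension
    for index in sequence:
        vector[index] = 1.0
    return vector


# Each inner list is one sample; the vocabulary has 5 entries here.
samples = [[0, 3, 3], [1, 2]]
tensor = [multi_hot(sample, dimension=5) for sample in samples]
```

The result is a 2D structure of shape (samples, dimension), ready to be converted into a floating-point tensor.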
Distinguish Hold-Out from K-Fold Validation
Hold-out validation is simpler and faster, but its estimate can have high variance; K-fold cross-validation provides a more reliable estimate but can be computationally expensive. The choice of technique depends on the size of the dataset, the computational resources available, and the desired accuracy of the performance estimate.
Describe Feature Extraction
Feature extraction is the process of selecting and transforming raw data features to improve the performance of machine learning models. It involves creating new features, selecting relevant features, and transforming features to improve the quality of the data.