AWS Machine Learning Engineer Nanodegree
A categorical label
A categorical label has a discrete set of possible values, such as "is a cat" and "is not a cat."
Regression
A common task in supervised machine learning in which the model predicts a continuous numeric value rather than a category.
A continuous (regression) label
A continuous (regression) label does not have a discrete set of possible values, which means it can take a potentially unlimited number of values.
Stop words
A list of words removed by natural language processing tools when building your dataset. There is no single universal list of stop words used by all natural language processing tools.
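As a minimal sketch of stop-word removal, the example below filters a sentence against a small word list. The `STOP_WORDS` set and the `remove_stop_words` helper are hypothetical names made up for illustration; real tools each ship their own lists.

```python
# Hypothetical stop-word list; every NLP tool defines its own.
STOP_WORDS = {"a", "an", "the", "is", "on", "of", "and"}

def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]

tokens = remove_stop_words("The cat is on the mat")
print(tokens)  # ['cat', 'mat']
```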
Plane
A mathematical term for a flat surface (like a piece of paper) on which two points can be joined by a straight line.
Hyperplane
A mathematical term for the generalization of a plane to higher dimensions: a flat surface with one dimension fewer than the space that contains it.
Transformer
A more modern replacement for RNN/LSTMs, the transformer architecture enables training over larger datasets involving sequences of data.
Data vectorization
A process that converts non-numeric data into a numerical format so that it can be used by a machine learning model.
Silhouette coefficient
A score from -1 to 1 describing the clusters found during modeling. A score near zero indicates overlapping clusters, and scores less than zero indicate data points assigned to incorrect clusters.
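To make the score concrete, here is a rough sketch that computes a mean silhouette coefficient for one-dimensional points using only the standard library. The helper names are hypothetical, and the code assumes at least two clusters; in practice a library routine would normally be used instead.

```python
def mean_distance(point, others):
    """Average absolute distance from one point to a list of points."""
    return sum(abs(point - q) for q in others) / len(others)

def silhouette_coefficient(points, labels):
    """Mean silhouette score for 1-D points (assumes two or more clusters)."""
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        same = [q for j, q in enumerate(points) if labels[j] == label and j != i]
        if not same:
            scores.append(0.0)  # a single-point cluster scores zero
            continue
        a = mean_distance(p, same)  # cohesion: distance within own cluster
        b = min(                    # separation: nearest other cluster
            mean_distance(p, [q for q, l in zip(points, labels) if l == other])
            for other in set(labels) if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

points = [0.0, 1.0, 10.0, 11.0]
good = silhouette_coefficient(points, [0, 0, 1, 1])  # well separated, near 1
bad = silhouette_coefficient(points, [0, 1, 0, 1])   # mixed up, below zero
print(good, bad)
```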
Bag of words
A technique used to extract features from text. It counts how many times each word appears in a document, and then transforms those counts into a dataset.
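The counting step can be sketched in a few lines. This example, which also illustrates data vectorization, turns two short documents into count vectors; the sample documents and the `bag_of_words` helper are made up for illustration.

```python
from collections import Counter

documents = ["the cat sat", "the cat saw the dog"]

# Build a sorted vocabulary of every distinct word across all documents.
vocabulary = sorted({word for doc in documents for word in doc.split()})

def bag_of_words(doc):
    """Return one count per vocabulary word for a single document."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]

vectors = [bag_of_words(doc) for doc in documents]
print(vocabulary)  # ['cat', 'dog', 'sat', 'saw', 'the']
print(vectors)     # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Each row is now purely numeric, so it can be fed to a machine learning model.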
Discrete
A term taken from statistics referring to an outcome taking on only a finite number of values (such as days of the week).
CNN
Convolutional Neural Networks (CNN) represent nested filters over grid-organized data. They are by far the most commonly used type of model when processing images.
Hyperparameters
Hyperparameters are settings on the model which are not changed during training but can affect how quickly or how reliably the model trains, such as the number of clusters the model should identify.
Impute
Impute is a common term referring to different statistical tools which can be used to calculate missing values from your dataset.
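One of the simplest imputation tools is mean imputation, sketched below. It assumes missing values are marked with `None` and that at least one value is present; `impute_mean` is a hypothetical helper, and other strategies (median, model-based) may suit real data better.

```python
def impute_mean(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

ages = [20, None, 40, None, 30]
filled = impute_mean(ages)
print(filled)  # [20, 30.0, 40, 30.0, 30]
```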
Label
Label refers to data that already contains the solution.
Log loss
Log loss is used to calculate how uncertain your model is about the predictions it is generating.
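For a binary classifier, log loss can be sketched as the average negative log-probability assigned to the true class. The function below is an illustrative standard-library version; the `eps` clipping value is an assumption added to avoid taking the log of zero.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary log loss; y_prob is the predicted probability of class 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

confident = log_loss([1, 0], [0.9, 0.1])  # confident and correct: low loss
uncertain = log_loss([1, 0], [0.5, 0.5])  # maximally uncertain: ln(2)
print(confident, uncertain)
```

A model that hedges at 0.5 on every prediction scores worse than one that is confidently right, which is the sense in which log loss reflects uncertainty.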
RNNs
RNNs are based on the same principles as those behind FFNNs, including the feedforward and backpropagation steps that are used in the training phase. There are two main differences between FFNNs and RNNs. The Recurrent Neural Network uses (1) sequences as inputs in the training phase, and (2) memory elements. Memory is defined as the output of hidden layer neurons, which serves as additional input to the network during the next training step. The basic three-layer neural network with feedback that serves as memory input is called the Elman Network. The original Elman Network publication, from 1990, is a significant milestone in the world of RNNs.
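The memory mechanism described above can be sketched as a single recurrent step in which the previous hidden-layer output is fed back in alongside the current input. The weights below are arbitrary, hypothetical values and the `tanh` activation is an assumption; this is not the exact network from the Elman publication.

```python
import math

def elman_step(x, h_prev, w_in, w_rec, bias):
    """One time step: the new hidden state depends on the current input
    and on the previous hidden state (the memory)."""
    h_new = []
    for j in range(len(h_prev)):
        total = w_in[j] * x + bias[j]
        total += sum(w_rec[j][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new

# Hypothetical fixed weights for two hidden neurons and a scalar input.
w_in = [0.5, -0.3]
w_rec = [[0.1, 0.2], [0.0, 0.4]]
bias = [0.0, 0.1]

memory = [0.0, 0.0]          # memory starts empty
for x in [1.0, 0.5, -1.0]:   # the sequence is fed one element at a time
    memory = elman_step(x, memory, w_in, w_rec, bias)
print(memory)
```

Feeding the same input with a different memory produces a different hidden state, which is what distinguishes this step from a plain feedforward layer.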
RNN/LSTM
Recurrent Neural Networks (RNN) and the related Long Short-Term Memory (LSTM) model types are structured to effectively represent for loops in traditional computing, collecting state while iterating over some object. They can be used for processing sequences of data.
Training dataset
The data on which the model will be trained. Most of your data will be here.
Test dataset
The data withheld from the model during training, which is used to test how well your model will generalize to new data.
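A training/test split can be sketched as a shuffle followed by a slice. The `split_dataset` helper, the 20% test fraction, and the fixed seed are all assumptions chosen for illustration.

```python
import random

def split_dataset(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows, then hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(10))
train_rows, test_rows = split_dataset(data)
print(len(train_rows), len(test_rows))  # 8 2
```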
FFNN
The most straightforward way of structuring a neural network, the Feed Forward Neural Network (FFNN) structures neurons in a series of layers, with each neuron in a layer containing weights to all neurons in the previous layer.
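The layer structure can be illustrated with a tiny forward pass in which every neuron keeps one weight per neuron in the previous layer. The weights, the sigmoid activation, and the `dense_layer` helper are illustrative assumptions.

```python
import math

def dense_layer(inputs, weights, biases):
    """Each row of weights belongs to one neuron and has one entry
    per neuron (or input) in the previous layer."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(1 / (1 + math.exp(-total)))  # sigmoid activation
    return outputs

# Two inputs -> two hidden neurons -> one output neuron.
hidden_out = dense_layer([1.0, 2.0], [[0.5, -0.25], [0.0, 0.0]], [0.0, 0.0])
final_out = dense_layer(hidden_out, [[1.0, -1.0]], [0.0])
print(hidden_out, final_out)  # [0.5, 0.5] [0.5]
```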
Keras Optimizers
There are many optimizers in Keras that we encourage you to explore further. Some of the most common are SGD, Adam, and RMSProp. SGD (Stochastic Gradient Descent) uses the following parameters: learning rate; momentum (this takes the weighted average of the previous steps, in order to get a bit of momentum and go over bumps, as a way to not get stuck in local minima); and Nesterov momentum (this slows down the gradient when it's close to the solution). Adam (Adaptive Moment Estimation) uses a more complicated exponential decay that considers not just the average (first moment) but also the variance (second moment) of the previous steps. RMSProp (RMS stands for Root Mean Square) decreases the learning rate by dividing it by the square root of an exponentially decaying average of squared gradients.
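To show what the momentum parameter does, here is a standard-library sketch of gradient descent with classic momentum on a one-dimensional function. It is not Keras code; `sgd_momentum` and its default values are hypothetical. In Keras itself the corresponding settings are passed to `keras.optimizers.SGD` as `learning_rate`, `momentum`, and `nesterov`.

```python
def sgd_momentum(grad_fn, w, learning_rate=0.1, momentum=0.9, steps=200):
    """Gradient descent where each step blends in the previous steps."""
    velocity = 0.0
    for _ in range(steps):
        # The velocity is a decaying accumulation of past gradient steps.
        velocity = momentum * velocity - learning_rate * grad_fn(w)
        w = w + velocity
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_final = sgd_momentum(lambda w: 2 * (w - 3), w=0.0)
print(w_final)  # close to 3
```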
Clustering
Unsupervised learning task that helps to determine if there are any naturally occurring groupings in the data.
Neural networks
are collections of very simple models connected together. These simple models are called neurons. The connections between these models are trainable model parameters called weights.
Outliers
are data points that are significantly different from others in the same sample.
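One common way to flag such points is a z-score rule, sketched below. The threshold of 2 standard deviations and the `find_outliers` helper are assumptions for illustration; other rules (for example, ones based on the interquartile range) are also widely used.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]

sample = [10, 11, 9, 10, 12, 50]
flagged = find_outliers(sample)
print(flagged)  # [50]
```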
Model parameters
are settings or configurations the training algorithm can update to change how the model behaves.
In supervised learning
every training sample from the dataset has a corresponding label or output value associated with it. As a result, the algorithm learns to predict labels or output values.
Machine learning, or ML
is a modern software development technique that enables computers to solve problems by using examples of real-world data.
A model
is an extremely generic program, made specific by the data used to train it.
Model accuracy
is the fraction of predictions a model gets right.
Continuous
Floating-point values with an infinite range of possible values. The opposite of categorical or discrete values, which take on a limited number of possible values.
Model inference
is when the trained model is used to generate predictions.
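Once inference has produced predictions, model accuracy (the fraction of predictions a model gets right) can be computed by comparing them with the known labels. The sample predictions and the `accuracy` helper below are made up for illustration.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

predicted = ["cat", "dog", "cat", "cat"]
actual = ["cat", "dog", "dog", "cat"]
score = accuracy(predicted, actual)
print(score)  # 0.75
```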
Loss function
is used to codify the model's distance from its goal, such as how far its predictions are from the correct labels.
In reinforcement learning
the algorithm figures out which actions to take in a situation to maximize a reward (in the form of a number) on the way to reaching a specific goal.
In unsupervised learning
there are no labels for the training data. A machine learning algorithm tries to learn the underlying patterns or distributions that govern the data.
Model training algorithms
work through an iterative process where the current model iteration is analyzed to determine what changes can be made to get closer to the goal. Those changes are made and the iteration continues until the model is evaluated to meet the goals.
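The loop described above can be sketched with plain gradient descent fitting a one-parameter model `y = w * x`. The data, the mean-squared-error loss, the learning rate, and the stopping threshold are all assumptions chosen to keep the example small.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2 * x

def mse_loss(w):
    """Loss function: mean squared distance between predictions and labels."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_gradient(w):
    """Derivative of the loss with respect to the model parameter w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0               # model parameter, updated during training
learning_rate = 0.01  # hyperparameter, fixed during training
for step in range(1000):
    if mse_loss(w) < 1e-8:  # goal met, stop iterating
        break
    w -= learning_rate * mse_gradient(w)  # change that moves toward the goal

print(w)  # close to 2
```

Here `w` plays the role of a model parameter and `learning_rate` that of a hyperparameter, matching the definitions elsewhere in this glossary.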
With unlabeled data
you don't need to provide the model with any kind of label or solution while the model is being trained.
