AWS ML Specialty
Image classification/Object detection
Image classification determines whether a given object class is present in an image (one label for the whole image). Object detection identifies and localizes (with bounding boxes) one or many objects within an image.
Sagemaker Early Stopping
During training, SageMaker can stop a training job early once the model's performance stops improving, saving training time and cost.
Loss Function
Quantifies how far a model's predictions are from the true values; training works by minimizing this value.
CPU vs GPU
GPU: a massively *parallel* architecture consisting of thousands of smaller, more efficient cores designed for handling *multiple tasks simultaneously*. CPU: a few cores optimized for *sequential*, serial processing.
XGBoost
Gradient-boosted decision trees (an ensemble of decision trees). Supervised learning model for classification (binary or multi-class), regression, and ranking problems. Hyperparameters: subsample, eta (learning rate), gamma (minimum information gain required to create a split), alpha (L1 regularization), lambda (L2 regularization), eval_metric, max_depth. Data: CSV, libsvm. Instances: a single GPU or CPU instance; use CPU instances if multiple instances are needed.
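A minimal sketch of training the built-in XGBoost algorithm with these hyperparameters via the SageMaker Python SDK; the role ARN, S3 paths, and chosen values are placeholders, not recommendations.

```python
# Sketch: built-in XGBoost with the hyperparameters listed above.
# Role ARN and S3 paths are placeholders.
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
container = image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

xgb = Estimator(
    image_uri=container,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",                          # CPU is fine for a single instance
    output_path="s3://my-bucket/xgb-output",               # placeholder
)
xgb.set_hyperparameters(
    objective="binary:logistic",
    num_round=100,
    eta=0.2,            # learning rate
    gamma=4,            # minimum gain required to make a split
    alpha=0.5,          # L1 regularization
    max_depth=6,
    subsample=0.8,
    eval_metric="auc",
)
xgb.fit({"train": TrainingInput("s3://my-bucket/train.csv", content_type="text/csv")})
```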
Amazon Augmented AI (A2I)
Lets you build ML workflows that integrate human intervention into your models. Good for sending low-confidence predictions for review/correction.
LSTM
Long Short-Term Memory networks were invented to prevent the vanishing gradient problem in Recurrent Neural Networks by using a memory gating mechanism. LSTMs are very good at mapping both long + short term dependencies within data.
Amazon Comprehend
NLP service to derive meaningful insights from text. Custom entity recognition can be trained with annotations plus training documents OR an entity list plus training documents.
Can you attach an EBS volume to a sagemaker training job?
No. SageMaker provisions its own storage for training jobs; you only specify the volume size, you cannot attach an existing EBS volume.
Underfitting
Occurs when a machine learning model has poor predictive ability because it did not learn the complexity of the training data. Remedies: increase the number of domain-specific or relevant features (the input data may not carry enough information for the model), and decrease the amount of regularization.
Hyperbolic Tangent (tanh)
Similar to the sigmoid function, but centered at 0 (output range -1 to 1). It converges faster than the sigmoid function, so it is generally preferred.
Sequence2Sequence (seq2seq)
Supervised learning algorithm where the input is a sequence of tokens (for example, text or audio) and the output is another sequence of tokens. Examples: machine translation, text-to-speech, text summarization.
Blazing Text
Two modes: supervised text classification for sentences (not full documents), and an optimized Word2Vec implementation for word embeddings. Word2Vec mode can use the skip-gram, batch skip-gram, or continuous bag of words (CBOW) architectures. Instances: CPU can be used for any of the above modes; GPU can be used, but not for batch skip-gram. Input: labeled text sentences (for classification).
Factorization Machines
Supervised model used for classification or regression with sparse data; often used for recommendation engines or click-through-rate prediction. Input: must be recordIO-protobuf with Float32 data. Instances: CPU or GPU (CPU recommended).
k-means
Unsupervised algorithm that groups points into k clusters based on their distance to the cluster centroids.
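A minimal scikit-learn sketch of the same idea (the toy points are made up for illustration):

```python
# Sketch: clustering 2-D points into k=2 groups with scikit-learn's KMeans.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment for each point
print(kmeans.cluster_centers_)  # the 2 learned centroids
```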
Latent Dirichlet Allocation (LDA)
Unsupervised topic modeling for text documents without a neural network (similar to Neural Topic Model but with less tendency to overfit). You specify the number of topics it should find.
k-fold cross validation
Used to compare models. Split the data into K folds (subsets). Train each model on all but one fold (K-1 folds) and evaluate it on the held-out fold. Repeat K times so every fold is used for evaluation exactly once. Great for finding the model that generalizes best.
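A minimal scikit-learn sketch of comparing two models with 5-fold cross-validation; the dataset and models are arbitrary examples.

```python
# Sketch: 5-fold cross-validation to compare two candidate models.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
for model in (LogisticRegression(max_iter=5000), RandomForestClassifier()):
    scores = cross_val_score(model, X, y, cv=5)  # train on 4 folds, evaluate on the 5th, 5 times
    print(type(model).__name__, scores.mean())
```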
Sagemaker IP Insights
uses statistical modeling and neural networks to capture associations between online resources (for example, online bank accounts) and IPv4 addresses
Word2Vec
Word2Vec is an algorithm and tool that learns word embeddings (dense vectors in which related words such as "man" and "boy" end up close together) by trying to predict the context of words in a document.
NLP Preprocessing
* Converting to lower case * Stop word removal is the process of removing words that do not add meaning to a sentence * Word tokenization is the process of splitting a sentence into individual words so it can be turned into vectors (for example with word2vec)
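A minimal sketch of these three steps in plain Python; the stop-word set is a hand-picked subset for illustration (a real pipeline would use a library list such as NLTK's or spaCy's).

```python
# Sketch: lower-casing, tokenization, and stop word removal.
sentence = "The quick brown fox jumps over the lazy dog"
stop_words = {"the", "a", "an", "over", "is"}        # illustrative subset only

tokens = sentence.lower().split()                    # lower-casing + word tokenization
tokens = [t for t in tokens if t not in stop_words]  # stop word removal
print(tokens)  # ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
```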
Sagemaker Hyperparameter tuning techniques
1) Grid search 2) Random search (can outperform grid search) 3) Bayesian optimization (treats tuning as a regression problem, using the results of previous training jobs to choose the next hyperparameter values to try)
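A minimal sketch of Bayesian tuning with the SageMaker Python SDK, assuming the `xgb` estimator from the XGBoost sketch above; the metric, ranges, and S3 paths are illustrative assumptions.

```python
# Sketch: Bayesian hyperparameter tuning for the built-in XGBoost estimator `xgb`.
from sagemaker.inputs import TrainingInput
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

train_input = TrainingInput("s3://my-bucket/train.csv", content_type="text/csv")            # placeholder
validation_input = TrainingInput("s3://my-bucket/validation.csv", content_type="text/csv")  # placeholder

tuner = HyperparameterTuner(
    estimator=xgb,                              # the estimator defined in the XGBoost sketch above
    objective_metric_name="validation:auc",
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.01, 0.5),
        "max_depth": IntegerParameter(3, 10),
    },
    strategy="Bayesian",                        # "Random" is also supported
    max_jobs=20,
    max_parallel_jobs=2,
)
tuner.fit({"train": train_input, "validation": validation_input})
```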
F1 Score
Harmonic mean of precision and recall: 2 * precision * recall / (precision + recall)
sagemaker: inference pipelines
A single SageMaker endpoint (or batch transform job) made of 2-5 containers chained together, where the output of one container becomes the input of the next (e.g. preprocessing -> prediction -> post-processing).
Amazon Translate
A neural machine translation service that delivers fast, high-quality, and affordable language translation.
Neural Networks
A set of nodes, organized in layers. Hidden layers are weighted sums of previous layers. To go from one layer to the next, we apply an 'activation function' which provides the non-linear transform. Works well on problems where the data cannot be separated well by a linear boundary.
Accuracy v.s. Precision v.s. Recall
Accuracy: how often did the model predict the right thing (correct predictions / total predictions)? Precision: what percentage of positive identifications were actually correct? TP / (TP + FP). Recall: what percentage of actual positive records were classified correctly? TP / (TP + FN).
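A minimal scikit-learn sketch with made-up labels, showing all three metrics plus F1:

```python
# Sketch: accuracy, precision, recall, and F1 on a toy set of binary predictions.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

print(accuracy_score(y_true, y_pred))   # (TP + TN) / total
print(precision_score(y_true, y_pred))  # TP / (TP + FP)
print(recall_score(y_true, y_pred))     # TP / (TP + FN)
print(f1_score(y_true, y_pred))         # 2 * precision * recall / (precision + recall)
```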
ReLU
Activation function defined as the piecewise function max(0, x), with output range [0, ∞). Its outputs are not a probability distribution. Overcomes the vanishing gradient problem (unlike tanh and sigmoid).
Amazon Lex
Amazon Lex is a service for building conversational interfaces into any application using voice and text. Conversational chat bots with both voice and text input. Uses NLU + ASR. Select the language, build the bot, and create the intent (goals of the user). Then attach Lex to your application.
Amazon Transcribe
An automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capability to their applications. Has features to remove PII (automatic content redaction) and filter out words you don't want (vocabulary filtering).
Multinomial logistic regression
An extension of Binary Logistic Regression -> Multiclass Classification Model for Supervised learning
AUC
Area under the ROC curve; used for measuring the success of a classification model. The ROC curve is a graph of the true positive rate vs. the false positive rate across classification thresholds. AUC can be interpreted as the probability that a random positive example is ranked higher than a random negative example. A perfect model has an AUC of 1; 0.5 is no better than random guessing.
Help a CNN converge
Batch normalization, increase learning rate, normalize the images
Sagemaker Model Endpoint + API gateway
How to expose a model to the public. SageMaker model endpoints are private (requests must be signed with AWS credentials), so to expose them publicly, put them behind API Gateway (typically via a Lambda function that invokes the endpoint).
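A minimal sketch of the Lambda function that sits between API Gateway and the endpoint; the endpoint name and the CSV payload format are assumptions.

```python
# Sketch: Lambda handler (behind API Gateway) that forwards a request to a private
# SageMaker endpoint and returns the prediction.
import boto3

runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    response = runtime.invoke_endpoint(
        EndpointName="my-model-endpoint",   # placeholder
        ContentType="text/csv",             # assumed payload format
        Body=event["body"],                 # request body passed through from API Gateway
    )
    return {"statusCode": 200, "body": response["Body"].read().decode("utf-8")}
```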
How to update an endpoint with a new model without downtime or change to code.
If autoscaling is turned on: de-register the endpoint as a scalable target, update the endpoint with a new endpoint configuration that points at the latest S3 model path, then re-register the endpoint as a scalable target. If autoscaling is not turned on: simply update the endpoint with a new endpoint configuration that points at the new S3 path.
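A minimal boto3 sketch of the autoscaling-enabled path; the endpoint, variant, config name, and capacity values are placeholders.

```python
# Sketch: deregister -> update -> re-register for a zero-downtime model swap.
import boto3

sm = boto3.client("sagemaker")
autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/my-endpoint/variant/AllTraffic"  # placeholder

# 1) De-register the endpoint variant as a scalable target
autoscaling.deregister_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
)

# 2) Point the endpoint at a new endpoint config referencing the new model artifact in S3
sm.update_endpoint(EndpointName="my-endpoint", EndpointConfigName="my-endpoint-config-v2")

# 3) Re-register the scalable target once the update completes
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)
```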
MICE (Multiple Imputation by Chained Equations)
Imputation technique that uses multivariate predictive models to fill in missing values. Better than naive approaches such as imputing with the mean.
Model Pruning
Removing weights that contribute little to the model. Reduces deep learning model size and inference time. Different from dropout regularization, which randomly drops nodes during training to reduce overfitting.
FSx for Lustre on Sagemaker
Speed up training and startup times for file-based input data by using FSx for Lustre. This way, SageMaker doesn't need to download the full dataset from S3 onto the training instance's EBS volume before training starts.
Scaling preprocessing techniques
StandardScaler: standardizes every column by scaling and shifting it to zero mean and unit variance. Normalizer: rescales each individual row (sample) to unit norm. MaxAbsScaler: scales each column by its maximum absolute value, but does not shift/center the data.
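A minimal scikit-learn sketch of the three transformers on the same toy matrix:

```python
# Sketch: StandardScaler, Normalizer, and MaxAbsScaler on the same toy data.
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, Normalizer, StandardScaler

X = np.array([[1.0, -2.0], [2.0, 0.0], [3.0, 2.0]])

print(StandardScaler().fit_transform(X))  # each column: zero mean, unit variance
print(Normalizer().fit_transform(X))      # each row rescaled to unit norm
print(MaxAbsScaler().fit_transform(X))    # each column divided by its max absolute value
```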
Improve Sagemaker Startup Times
Store the dataset in protobuf recordIO format in S3. This enables SageMaker Pipe mode, where data is streamed into the training instance rather than downloaded first. Reduces startup time and improves throughput for faster training.
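A minimal sketch of turning on Pipe mode in the SageMaker Python SDK; Linear Learner is used only as an example built-in algorithm, and the role ARN, S3 path, and hyperparameter values are placeholders.

```python
# Sketch: Pipe mode streams recordIO-protobuf data from S3 instead of copying it
# onto the training instance first.
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
container = image_uris.retrieve("linear-learner", session.boto_region_name, version="1")

est = Estimator(
    image_uri=container,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    input_mode="Pipe",  # stream records instead of downloading the whole dataset up front
)
est.set_hyperparameters(predictor_type="binary_classifier", feature_dim=10)  # placeholder values
est.fit({"train": TrainingInput(
    "s3://my-bucket/train.rec",  # placeholder recordIO-protobuf object
    content_type="application/x-recordio-protobuf",
)})
```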
KNN
Supervised algorithm (classification or regression) that predicts from a point's K nearest neighbors, e.g. a majority vote of the neighbors' labels for classification.
SVM (Support Vector Machines)
Supervised classification algorithm that finds the decision boundary (hyperplane) with the maximum margin between classes.
Factorization Machines (FMs)
Supervised learning model for classification and regression
TF-IDF
Term Frequency (how many times a word appears in a document) multiplied by Inverse Document Frequency, where IDF = log(total # of documents / # of documents that contain the word). * Any word that appears in every document automatically has a TF-IDF of 0, because log(1) = 0
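A minimal hand-rolled sketch using this definition (raw counts for TF and a natural log for IDF; real libraries such as scikit-learn apply extra smoothing):

```python
# Sketch: TF-IDF for three tiny documents.
import math

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "dog", "barked"]]
N = len(docs)

def tf_idf(word, doc):
    tf = doc.count(word)                    # term frequency in this document
    df = sum(1 for d in docs if word in d)  # number of documents containing the word
    return tf * math.log(N / df)            # TF * IDF

print(tf_idf("the", docs[0]))  # 0.0 -> "the" is in every document, log(3/3) = 0
print(tf_idf("cat", docs[0]))  # 1 * log(3/1) ≈ 1.10
```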
Softmax
The softmax function is typically used to convert a vector of raw scores into class probabilities at the output layer of a Neural Network used for classification. Output may be (.1, .2, .7) for a 3-classification problem
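A minimal NumPy sketch producing the kind of 3-class probability vector described above:

```python
# Sketch: softmax turns raw scores (logits) into probabilities that sum to 1.
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))  # ≈ [0.09, 0.24, 0.67]
```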
Semantic Segmentation & Instance Segmentation
This algorithm is used in computer vision to classify each pixel. Example - separate a virtual background from a person. Instance Segmentation takes semantic one step further, by identifying potentially more than one person from the virtual background (i.e. person 1, person 2, etc).
Object2Vec
Turns full objects (sentences, documents, etc.) into embeddings and finds the nearest neighbors of a document; this can be used for recommendations. Can use average pooling, CNNs, or LSTMs to embed documents. Input: must be tokenized to integers. Instances: start with an xlarge CPU or GPU for training; an xlarge GPU is recommended for inference.
Neural Topic Modeling
Unsupervised clustering of documents into similar topics (clusters). Input: recordIO-wrapped protobuf or CSV for training; inference can be text/csv, recordIO-protobuf, application/json, or application/jsonlines. Hyperparameters: vocab_size, num_topics, plus the usual neural network hyperparameters. Instances: GPU recommended, CPU acceptable.
Learning Rate
A value between 0 and 1 that controls how large each update step is during training (how much newly learned information overrides what was learned before, after each trial). Too large a learning rate, and the model may never converge on the optimal solution (the loss vs. epoch curve drops quickly then plateaus above the optimum, like 1/x, or can even start increasing). Too small a learning rate, and training takes longer and may never reach the optimum within the allotted epochs (the loss vs. epoch curve decreases slowly and nearly linearly, like -x).
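A minimal sketch of gradient descent on f(w) = w^2 illustrating the effect of step size; the specific learning rates are arbitrary examples.

```python
# Sketch: the same gradient-descent loop with three learning rates.
def descend(lr, steps=10, w=5.0):
    for _ in range(steps):
        w -= lr * 2 * w  # gradient of w**2 is 2w
    return w

print(descend(lr=0.01))  # too small: after 10 steps still far from the minimum at 0
print(descend(lr=0.1))   # reasonable: converges steadily toward 0
print(descend(lr=1.1))   # too large: each update overshoots and the loss diverges
```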
Sigmoid
Activation function that squashes inputs into the range (0, 1): outputs approach 0 as the input becomes more negative and approach 1 as the input becomes more positive.
Collaborative Filtering
algorithm based on (user, item, rating) tuples. good for identifying patterns/recommendations amongst users, to serendipitously recommend products.
Amazon Elastic Inference
allows you to attach low-cost GPU-powered acceleration to Amazon EC2 and Sagemaker instances or Amazon ECS tasks, to reduce the cost of running deep learning inference by up to 75%. Amazon Elastic Inference supports TensorFlow, Apache MXNet, PyTorch, and ONNX models
t-Distributed Stochastic Neighbor Embedding (t-SNE)
non-linear dimensionality reduction algorithm used for exploring high-dimensional data
collaborative filtering v.s. content-based filtering
collaborative filtering: algorithm that groups customers with similar buying intentions, preferences, and behaviors and serendipitously predicts future purchases content-based filtering: algorithm that recommends products based on the product features of items the customer has interacted with in the past
SageMaker Ground Truth
Helps you build highly accurate training datasets for machine learning quickly. Ground Truth Plus: labels data using an expert labeling workforce and pre-labeling ML models - you don't need to set up workflows or manage your own labeling. Ground Truth: build your own labeling workflows and choose a labeling workforce (Mechanical Turk, your own private workforce, or third-party vendors).
AWS Panorama
machine learning device and software development kit (SDK) that allows you to bring computer vision to on-premises cameras to make predictions locally with high accuracy and low latency.