AWS ML Specialty


Image classification/Object detection

Image classification determines whether a given object exists within an image (one label per image). Object detection identifies and localizes one or many objects within an image (labels plus bounding boxes).

SageMaker Early Stopping

During training, SageMaker can stop a job early once the model's performance stops improving, saving training time and cost.
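A hedged sketch of one way to do this: SageMaker's built-in XGBoost supports an early_stopping_rounds hyperparameter (the Estimator xgb and the S3 channel variables here are assumed to already exist).

```python
# Assumes an existing built-in XGBoost Estimator `xgb` and S3 input channels.
xgb.set_hyperparameters(
    num_round=500,
    eval_metric="auc",
    early_stopping_rounds=10,  # stop when validation AUC hasn't improved for 10 rounds
)
# A validation channel is required so early stopping has a metric to monitor.
xgb.fit({"train": train_s3_uri, "validation": validation_s3_uri})
```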

Loss Function

Evaluates how effective an algorithm is at modeling the data; training works by minimizing this value (e.g. mean squared error for regression, cross-entropy for classification).
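A toy, framework-agnostic illustration using mean squared error as the loss:

```python
# Mean squared error: average of squared differences between truth and prediction.
import numpy as np

y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 5.5, 2.0])
mse = np.mean((y_true - y_pred) ** 2)  # lower means the model fits better
print(mse)  # ~0.167
```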

CPU vs GPU

GPU: a massively *parallel* architecture consisting of thousands of smaller, more efficient cores designed for handling *multiple tasks simultaneously*. CPU: a few cores optimized for *sequential*, serial processing.

XGBoost

Gradient boosted decision trees (an ensemble of decision trees). Supervised learning model for classification (binary or multi-class), regression, and ranking problems. Hyperparameters: subsample, eta (learning rate), gamma (the information gain required to create a split), alpha (L1 regularization), lambda (L2 regularization), eval_metric, max_depth. Data: CSV, libsvm. Instances: single-instance GPU is supported; use CPU instances when multiple instances are needed.
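A hedged sketch of setting these hyperparameters on SageMaker's built-in XGBoost via the sagemaker Python SDK (the role ARN and S3 paths are placeholders):

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

region = sagemaker.Session().boto_region_name
container = image_uris.retrieve("xgboost", region, version="1.5-1")

xgb = Estimator(
    image_uri=container,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/xgb-output/",             # placeholder bucket
)
xgb.set_hyperparameters(
    objective="binary:logistic",
    num_round=100,
    max_depth=5,
    eta=0.2,           # learning rate
    gamma=4,           # minimum gain required to create a split
    subsample=0.8,
    alpha=0,           # L1 regularization
    eval_metric="auc",
    **{"lambda": 1},   # L2 regularization ('lambda' is a Python keyword)
)
xgb.fit({"train": "s3://my-bucket/train/", "validation": "s3://my-bucket/val/"})
```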

Amazon Augmented AI (A2I)

Lets you build ML workflows that integrate human intervention into your models. Good for sending low-confidence predictions for review/correction.

LSTM

Long Short-Term Memory networks were invented to mitigate the vanishing gradient problem in Recurrent Neural Networks by using a memory gating mechanism. LSTMs are very good at capturing both long- and short-term dependencies in sequence data.

Amazon Comprehend

NLP service for deriving meaningful insights from text (entities, key phrases, sentiment, language). Custom entity recognition can be trained with either annotations plus training documents or an entity list plus training documents.
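A hedged boto3 sketch of calling Comprehend on a short piece of text:

```python
# Detect named entities in a sentence with Amazon Comprehend.
import boto3

comprehend = boto3.client("comprehend")

resp = comprehend.detect_entities(
    Text="Jeff moved to Seattle in 2020.",
    LanguageCode="en",
)
# Each entity carries a Type (PERSON, LOCATION, DATE, ...) and a confidence Score.
print([(e["Type"], e["Text"]) for e in resp["Entities"]])
```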

Can you attach an EBS volume to a sagemaker training job?

No. SageMaker manages the storage attached to training instances itself; you can only specify the size of the ML storage volume for the job, not attach an existing EBS volume.

Underfitting

Occurs when a machine learning model has poor predictive ability because it did not learn the complexity in the training data. Remedies:
- Increase the number of domain-specific or relevant features (the input data may not carry enough information for the model).
- Decrease the amount of regularization.

Hyperbolic Tangent (tanh)

Similar to the sigmoid function, but centered at 0 with outputs in (−1, 1): tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)). Converges faster than the sigmoid function, so it is generally preferred.

Sequence2Sequence (seq2seq)

Supervised learning algorithm where the input is a sequence of tokens (for example, text or audio) and the output is another sequence of tokens. Examples: machine translation, text-to-speech, text summarization.

BlazingText

Supervised text-classification mode for labeling sentences (not full documents), plus an unsupervised mode that trains optimized Word2Vec embeddings. The Word2Vec mode can use the skip-gram, batch skip-gram, or continuous bag-of-words (CBOW) architecture. Instances: CPU works for all of the above modes; GPU works for all except batch skip-gram (which is meant for distributed CPU training). Input: labeled text sentences (for the supervised mode).

Factorization Machines

Supervised model used for classification or regression with sparse data; often used for recommendation engines or click-through-rate prediction. Input: must be recordIO-protobuf with Float32 tensors. Instances: CPU or GPU (usually CPU, since the data is sparse).

k-means

Unsupervised clustering algorithm that groups points into k clusters based on their distance to the cluster centroids.
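A minimal scikit-learn sketch of k-means with k = 3 on random 2-D points:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 2)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])  # cluster index assigned to each of the first 10 points
```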

Latent Dirichlet Allocation (LDA)

Unsupervised topic modeling for text documents without a neural network (similar to Neural Topic Model but with less tendency to overfit). You specify the number of topics for it to find.

k-fold cross validation

Used to compare and evaluate models. Split the data into k folds (subsets). Train the model on k − 1 folds and evaluate it on the held-out fold. Repeat k times so each fold serves as the evaluation set exactly once, then average the scores. Great for finding the model that generalizes best.
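A minimal scikit-learn sketch of 5-fold cross-validation:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores, scores.mean())  # one accuracy per fold, then the average
```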

SageMaker IP Insights

uses statistical modeling and neural networks to capture associations between online resources (for example, online bank accounts) and IPv4 addresses

Word2Vec

word2vec is an algorithm and tool that learns word embeddings (dense vectors in which related words such as "man" and "boy" end up close together) by trying to predict the context of words in a document.

NLP Preprocessing

* Converting to lower case
* Stop word removal: removing words that do not add meaning to a sentence
* Word tokenization: splitting a sentence into individual words so they can be turned into vectors (e.g. by word2vec)
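A toy sketch of the three steps in plain Python (the stop word list here is illustrative, not a standard one):

```python
stop_words = {"the", "is", "a", "an", "of"}

sentence = "The cat is on a mat"
tokens = sentence.lower().split()                      # lower-casing + tokenization
filtered = [t for t in tokens if t not in stop_words]  # stop word removal
print(filtered)  # ['cat', 'on', 'mat']
```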

SageMaker hyperparameter tuning techniques

1) Grid search
2) Random search (can outperform grid search)
3) Bayesian optimization (fits a model of the objective metric and uses it to pick promising hyperparameters to try next)
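A hedged sketch of choosing the search strategy for SageMaker automatic model tuning (the Estimator xgb, metric name, and S3 channels are assumed/illustrative):

```python
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner

tuner = HyperparameterTuner(
    estimator=xgb,                      # an existing Estimator (assumed)
    objective_metric_name="validation:auc",
    hyperparameter_ranges={"eta": ContinuousParameter(0.01, 0.3)},
    strategy="Random",                  # or "Bayesian" (default); newer SDKs also accept "Grid"
    max_jobs=30,
    max_parallel_jobs=3,
    early_stopping_type="Auto",         # let SageMaker stop unpromising jobs
)
tuner.fit({"train": train_s3_uri, "validation": validation_s3_uri})
```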

F1 Score

F1 = 2 × precision × recall / (precision + recall), the harmonic mean of precision and recall. For example, precision = 0.8 and recall = 0.5 gives F1 = 0.8 / 1.3 ≈ 0.62.

SageMaker: inference pipelines

2-5 containers strung together behind a single SageMaker endpoint; each container's output is passed as the next container's input (e.g. preprocessing → inference → postprocessing).

Amazon Translate

A neural machine translation service that delivers fast, high-quality, and affordable language translation.

Neural Networks

A set of nodes organized in layers. Hidden layers are weighted sums of previous layers. To go from one layer to the next, we apply an 'Activation Function', which provides the non-linear transform. Works well on problems where a straight line cannot separate the data well.

Accuracy vs. Precision vs. Recall

Accuracy: how often did the model predict the right thing? (TP + TN) / total. Precision: what percentage of positive identifications were actually correct? TP / (TP + FP). Recall: what percentage of actual positive records were classified correctly? TP / (TP + FN).
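Toy confusion-matrix arithmetic for the three metrics (numbers are made up):

```python
tp, fp, fn, tn = 80, 20, 10, 90

accuracy  = (tp + tn) / (tp + fp + fn + tn)  # 0.85
precision = tp / (tp + fp)                   # 0.80
recall    = tp / (tp + fn)                   # ~0.89
print(accuracy, precision, recall)
```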

ReLU

Activation function defined piecewise as f(x) = max(0, x), with range [0, ∞). Its outputs are not a probability distribution. Overcomes the vanishing gradient problem (unlike tanh and sigmoid).

Amazon Lex

A service for building conversational interfaces (chat bots) into any application, with both voice and text input. Uses NLU (natural language understanding) + ASR (automatic speech recognition). Select the language, build the bot, and create the intents (the goals of the user); then attach Lex to your application.

Amazon Transcribe

An automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capability to their applications. Has features to remove PII (automatic content redaction) and to filter out words you don't want (vocabulary filtering).

multinomial logistic regression

An extension of binary logistic regression to multiclass classification; a supervised learning model.

AUC

Area under the ROC curve, where the ROC curve is a graph of the true positive rate vs. the false positive rate. Used for measuring the success of a classification model. AUC can be interpreted as the probability that a randomly chosen positive example is ranked higher than a randomly chosen negative example. A perfect model has an AUC of 1; random guessing scores 0.5.

Help a CNN converge

Batch normalization, increase learning rate, normalize the images

SageMaker Model Endpoint + API Gateway

How to expose a model to the public. SageMaker model endpoints are private (callers need AWS permissions), so without an API Gateway in front of them they cannot be used by the general public.

How to update an endpoint with a new model without downtime or code changes.

If autoscaling is turned on: de-register the endpoint as a scalable target, update the endpoint with a new endpoint configuration pointing at the latest S3 model path, then re-register the endpoint as a scalable target. If autoscaling is not turned on: simply update the endpoint with a new endpoint configuration pointing at the new S3 path.
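A hedged boto3 sketch of the "autoscaling on" path (endpoint, config, and capacity names are placeholders):

```python
import boto3

sm = boto3.client("sagemaker")
aas = boto3.client("application-autoscaling")

resource_id = "endpoint/my-endpoint/variant/AllTraffic"  # placeholder

# 1) De-register the endpoint variant as a scalable target.
aas.deregister_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
)

# 2) Point the endpoint at a new endpoint configuration (new model / S3 path).
sm.update_endpoint(EndpointName="my-endpoint", EndpointConfigName="my-config-v2")

# 3) Re-register the scalable target once the update completes.
aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)
```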

MICE (Multiple Imputation by Chained Equations)

Imputation technique that uses multivariate predictive models to fill in missing data. Better than naive approaches such as mean imputation.

Model Pruning

Removing weights that contribute little to training. Reduces deep learning model size and inference time. Different from dropout regularization, which randomly drops nodes during training to reduce overfitting.

FSx for Lustre on SageMaker

Speed up training and startup times for file-based input data by using FSx for Lustre. This way, SageMaker doesn't need to download the entire dataset from S3 onto the training instance's EBS volume before training starts.

Scaling preprocessing techniques

Standard Scaler: scales, shifts, and centers each column (zero mean, unit variance). Normalizer: rescales each row (sample) to unit norm rather than operating on columns. Max absolute scaler: scales each column by its maximum absolute value, but does not shift/center.
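A small scikit-learn sketch contrasting the three transformers:

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, Normalizer, StandardScaler

X = np.array([[1.0, -2.0], [3.0, 4.0]])

print(StandardScaler().fit_transform(X))  # per column: zero mean, unit variance
print(Normalizer().fit_transform(X))      # per row: each sample scaled to unit norm
print(MaxAbsScaler().fit_transform(X))    # per column: divide by max |value|, no shift
```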

Improve SageMaker Startup Times

Store the dataset in protobuf recordIO format in S3. This enables SageMaker Pipe mode, where data is streamed into SageMaker instead of being copied to the instance first. Reduces startup time and improves throughput for faster training.
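A hedged sketch of enabling Pipe mode on an Estimator (image URI and role are placeholders):

```python
from sagemaker.estimator import Estimator

est = Estimator(
    image_uri="<built-in-algorithm-image-uri>",  # placeholder
    role="<execution-role-arn>",                 # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    input_mode="Pipe",  # stream data from S3 instead of downloading it first
)
```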

KNN

Supervised algorithm that classifies a point by the majority label among its k nearest neighbors (averaging them instead for regression).

SVM (Support Vector Machines)

Supervised classification algorithm that finds the decision boundary (maximum-margin hyperplane) that best separates the classes.

Factorization Machines (FMs)

Supervised learning model for classification and regression

TF-IDF

Term Frequency (how many times a word appears in a document) × Inverse Document Frequency, where IDF = log(total # of documents / # of documents that contain the word). Any word that appears in every document automatically has a TF-IDF of 0, because log(1) = 0.
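A toy computation matching the formula above (no smoothing; real implementations such as scikit-learn's add smoothing terms):

```python
import math

docs = [["cat", "sat"], ["cat", "ran"], ["dog", "ran"]]

def tf_idf(term, doc):
    tf = doc.count(term) / len(doc)
    df = sum(term in d for d in docs)  # number of docs containing the term
    return tf * math.log(len(docs) / df)

print(tf_idf("cat", docs[0]))  # ~0.20: 'cat' appears in 2 of 3 docs
```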

Softmax

The softmax function is typically used to convert a vector of raw scores into class probabilities at the output layer of a neural network used for classification. Output might be (0.1, 0.2, 0.7) for a 3-class problem.
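A minimal numpy sketch that roughly reproduces the example output above:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([1.0, 1.7, 2.9])))  # ~[0.10, 0.21, 0.69]
```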

Semantic Segmentation & Instance Segmentation

Semantic segmentation is used in computer vision to classify each pixel of an image. Example: separating a person from a virtual background. Instance segmentation takes this one step further by distinguishing individual instances of the same class (i.e. person 1, person 2, etc.).

Object2Vec

Turns full objects (sentences, documents, etc.) into embeddings and finds the nearest neighbors of a document; this can be used for recommendations. Can use average pooling, CNNs, or LSTMs to embed documents. Input: must be tokenized into integers. Instances: start with an xlarge CPU or GPU instance for training; an xlarge GPU is recommended for inference.

Neural Topic Modeling

Unsupervised clustering of documents into similar topics (clusters). Input: recordIO-wrapped protobuf or CSV for training; inference accepts text/csv, recordIO-protobuf, application/json, or application/jsonlines. Hyperparameters: vocab_size, num_topics, plus the usual neural network hyperparameters. Instances: GPU recommended, CPU acceptable.

Learning Rate

A value between 0 and 1 that controls how big each learning step is (exploration vs. exploitation) after each trial. Too large a learning rate, and the loss drops quickly but plateaus above the optimum (loss vs. epoch looks like 1/x) or, in extreme cases, diverges, so the model never converges on the optimal solution. Too small a learning rate, and training is slow (loss vs. epoch decreases almost linearly, like −x) and may never reach the optimum in the available epochs.

Sigmoid

Activation function that squashes inputs into the range (0, 1): σ(x) = 1 / (1 + e^(−x)). Outputs approach 0 as inputs become more negative and approach 1 as inputs become more positive.

Collaborative Filtering

Algorithm based on (user, item, rating) tuples. Good for identifying patterns among users in order to serendipitously recommend products.

Amazon Elastic Inference

allows you to attach low-cost GPU-powered acceleration to Amazon EC2 and Sagemaker instances or Amazon ECS tasks, to reduce the cost of running deep learning inference by up to 75%. Amazon Elastic Inference supports TensorFlow, Apache MXNet, PyTorch, and ONNX models

t-Distributed Stochastic Neighbor Embedding (t-SNE)

non-linear dimensionality reduction algorithm used for exploring high-dimensional data

Collaborative filtering vs. content-based filtering

Collaborative filtering: algorithm that groups customers with similar buying intentions, preferences, and behaviors and serendipitously predicts future purchases.
Content-based filtering: algorithm that recommends products based on the features of items the customer has interacted with in the past.

SageMaker Ground Truth

Helps you build highly accurate training datasets for machine learning quickly. Ground Truth Plus: labels data using an expert labeling workforce and pre-labeling ML models; you don't need to set up workflows or manage your own labeling. Ground Truth: build your own workflows and labeling workforce (Mechanical Turk, your own private workforce, or third-party vendors).

AWS Panorama

A machine learning appliance and software development kit (SDK) that lets you bring computer vision to on-premises cameras and make predictions locally with high accuracy and low latency.

