GCP - ML Engineer


ai platform training CLI flag for standard distributed training

--scale-tier BASIC_GPU, BASIC_TPU

ai platform training default worker configuration

--scale-tier BASIC single worker node

ai platform training CLI flag for custom machine types?

--scale-tier CUSTOM then add in parameters as flags (--master-machine-type n1-highcpu-16, etc.) or config.yaml

AUTO_CLASS_WEIGHTS

-BQML parameter -used if need to balance the classes

Early Stopping

-Form of regularization -Enact when validation error begins to increase -Indicates that overfitting is beginning
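
A minimal Keras sketch of early stopping (synthetic data and made-up layer sizes; the callback halts training once validation loss stops improving):

import numpy as np
import tensorflow as tf

# synthetic data, just to make the sketch runnable
x = np.random.rand(200, 4).astype("float32")
y = (x.sum(axis=1) > 2).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# stop when validation error stops improving; patience gives a small grace period
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

model.fit(x, y, validation_split=0.2, epochs=100, callbacks=[early_stop])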

L1 Regularization

-Goal: make unimportant weights exactly 0 -helps deal with sparse features/feature crosses -helpful w/ feature selection

L2 Regularization

-Goal: make weights close to 0 (not exactly 0)

Dropout

-Regularization for neural networks -randomly drop out neurons / unit activations for a single gradient step
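
A minimal Keras sketch combining the L1, L2, and Dropout cards above (layer sizes and rates are illustrative, not prescribed):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        64, activation="relu",
        kernel_regularizer=tf.keras.regularizers.l1(0.01)),  # L1: pushes unimportant weights to exactly 0
    tf.keras.layers.Dropout(0.3),                            # drop 30% of unit activations per gradient step
    tf.keras.layers.Dense(
        64, activation="relu",
        kernel_regularizer=tf.keras.regularizers.l2(0.01)),  # L2: keeps weights small but nonzero
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")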

Embeddings

-allow for lower dimensionality representation of feature crosses to help w/ sparsity
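
A minimal sketch of an embedding layer (vocabulary size and embedding dimension are made up) that replaces a huge sparse one-hot input with a dense 16-dim vector:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10_000, output_dim=16),  # 10,000 sparse ids -> dense 16-dim vectors
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")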

TFDV: purpose and parts

-analyze data/validate it is correct -parts: StatisticsGen, SchemaGen, ExampleValidator

what is "placement" for recommendations AI?

-area on website where to locate the recommendation

parameterserver worker strategy

-asynchronous distributed training -some machines are workers and some are parameter servers -workers calculate gradients; parameter server updates the weights and passes that to the workers

Accuracy

-bad for imbalanced class sets -(TP + TN) / (TP + FP + TN + FN)

recommendations AI - what is rejoining?

-best practice: ensure product catalog is up-to-date and if you are importing catalog while recording events, you will need to rejoin on product ID -events that can't be associated w/ an ID are not used during training

Recommendations AI

-configure to set up A/B testing -integrate with Google Tag Manager to record events (like clicks, etc.) -integrate with Merchant Center to upload product catalog

rolling average

-dataprep preprocessing function -smooths out noise -preferred over daily min/max

how to enable continuous evaluation w/ ai platform prediction?

-establish ground truth as either yourself or use data labeling service -must already have a model version deployed -then you can run a daily evaluation job. this job will compare online prediction results by storing them in BQ and comparing to existing ground truth -you can then analyze evaluation metrics in console

Relu

-example of an activation function -used between hidden layers -outputs 0 for negative inputs and passes positive inputs through unchanged: f(x) = max(0, x)
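
Quick TensorFlow illustration:

import tensorflow as tf

# f(x) = max(0, x): negative inputs become 0, positive inputs pass through
print(tf.nn.relu([-3.0, -0.5, 0.0, 2.0]).numpy())  # [0. 0. 0. 2.]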

Clipping

-fix for outliers -handles extreme outliers by capping them at a max value -ex: if housing data shows a house w/ 500 rooms, clip the 500 down to the dataset max (such as 10)
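
Sketch of the rooms example with tf.clip_by_value (the housing numbers are the card's made-up example):

import tensorflow as tf

rooms = tf.constant([2.0, 4.0, 500.0])
# cap extreme outliers at the chosen max (10 here)
print(tf.clip_by_value(rooms, clip_value_min=0.0, clip_value_max=10.0).numpy())  # [ 2.  4. 10.]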

when to use tf.data?

-if dataset can't fit in memory -if you need preprocessing -need access to different hardware/batches

3 ways to record events in Recommendations AI

-javascript pixel -API: eventStores.userEvents.write -google tag manager (creates a trigger that will fire whenever the event occurs)

When to use TPUs?

-large batches -sharded data -large models -use tf.data -DNN on tf.keras

what is tf.data?

-library for reading TFRecords as datasets -lets you significantly reduce latency by enabling prefetching: training happens on the accelerator while the CPU transforms the next batch (reduces idle time)
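
A minimal tf.data input pipeline sketch (the GCS path and feature spec are placeholders, not from the card) showing TFRecord parsing plus prefetching:

import tensorflow as tf

feature_spec = {
    "feature": tf.io.FixedLenFeature([4], tf.float32),
    "label": tf.io.FixedLenFeature([], tf.float32),
}

def parse(record):
    example = tf.io.parse_single_example(record, feature_spec)
    return example["feature"], example["label"]

dataset = (
    tf.data.TFRecordDataset(tf.io.gfile.glob("gs://my-bucket/data/*.tfrecord"))
    .map(parse, num_parallel_calls=tf.data.AUTOTUNE)  # preprocessing on CPU
    .shuffle(10_000)
    .batch(128)
    .prefetch(tf.data.AUTOTUNE)  # prepare batch N+1 while the accelerator trains on batch N
)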

Logistic Regression

-linear classification -predicts the probability of something happening -good for low-latency

when to use parameterserver worker strategy? (3 reasons)

-low latency -want to continue if a machine crashes (such as using preemptible machines) -machines all have different performance

storage transfer service

-moves data to GCS (from S3, URL, or other GCS bucket) - data > 1 TB

When to use GPUs?

-need lots of parallelization -lots of math ops

in continuous training, if you have model drift, what should be done?

-need to retrain model, redeploy new model -retrigger whole CI/CD pipeline

what is schemagen?

-part of TFDV -takes raw data and infers schema -this is stored as metadata and used later in pipeline to ensure consistency (such as during tf transform)

what is examplevalidator?

-part of TFDV -validates data/schema to make sure data conforms (such as making sure it is an int, etc.) -also used by tf.transform to look for training/serving skew since it knows previous shape of data

statisticsgen

-part of TFDV -visual report/graphical distribution of data -can detect outliers, anomalies, skews, missing data
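
A minimal TFX sketch wiring the three TFDV-related components from these cards (the data path is a placeholder and exact constructor arguments can vary by TFX version):

from tfx.components import CsvExampleGen, StatisticsGen, SchemaGen, ExampleValidator

example_gen = CsvExampleGen(input_base="gs://my-bucket/raw-data")

# StatisticsGen: summary statistics / distributions for outlier and skew detection
statistics_gen = StatisticsGen(examples=example_gen.outputs["examples"])

# SchemaGen: infers a schema (types, ranges) stored as metadata for later pipeline steps
schema_gen = SchemaGen(statistics=statistics_gen.outputs["statistics"])

# ExampleValidator: checks new data against the schema (anomalies, missing values, skew)
example_validator = ExampleValidator(
    statistics=statistics_gen.outputs["statistics"],
    schema=schema_gen.outputs["schema"])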

AUC ROC

-plots TPR vs. FPR -tells you the probability that a randomly chosen positive example is ranked higher than a randomly chosen negative example -good default

AUC PR

-plots precision vs. recall -use this if you care more about the positive class than the negative class / dataset is imbalanced -ex: fraud detection
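
Quick sketch of both AUC variants with the Keras metric (the labels and scores are made up):

import tensorflow as tf

y_true = [0, 0, 1, 1]
y_pred = [0.1, 0.4, 0.35, 0.8]

roc_auc = tf.keras.metrics.AUC(curve="ROC")  # TPR vs FPR (good default)
pr_auc = tf.keras.metrics.AUC(curve="PR")    # precision vs recall (imbalanced data, e.g. fraud)

roc_auc.update_state(y_true, y_pred)
pr_auc.update_state(y_true, y_pred)
print(roc_auc.result().numpy(), pr_auc.result().numpy())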

what happens during "trainer" phase in ci/cd?

-produces serialized "SavedModel" to be stored in GCS -keras/estimator phase

what is the data labeling service?

-provide dataset, instructions, list of labels -assigns humans to give labels to data -part of continuous evaluation strategy -assigns ground truth to data

how to optimize online prediction?

-scale out with GKE, GAE, or CAIP prediction -make each type of prediction its own microservice

SavedModel

-serialized model artifact -allows for model-agnostic deployment (CPU/TPU/GPU)

TFRecords

-stores examples as serialized protocol buffers -improves read efficiency during training

central storage distributed training & when to use?

-synchronous -1 machine/worker attached to multiple GPUs -each GPU calculates gradients and sends them to the machine's CPU -the CPU updates the weights and sends them back to the GPUs for the next step -good for large embeddings that don't fit on a single GPU

multi-worker mirror strategy

-synchronous -multiple machines each with multiple GPUs

mirror strategy

-synchronous -one machine attached to multiple GPUs/TPUs -each GPU/TPU has a copy of the model and computes gradients on its slice of the batch -gradients are shared across GPUs/TPUs and aggregated together (usually a mean / all-reduce) -requires good connection between GPUs/TPUs
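
A minimal sketch of choosing a tf.distribute strategy for the synchronous cases above (the model is a placeholder; building it inside strategy.scope() places a replica on each device):

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # one machine, multiple GPUs
# strategy = tf.distribute.MultiWorkerMirroredStrategy()              # multiple machines, multiple GPUs each
# strategy = tf.distribute.experimental.CentralStorageStrategy()      # variables on CPU, compute on GPUs

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")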

what happens if you do feature engineering during "train model"?

-this would mean using tf -this is not helpful if you need to compute averages or any aggregations over multiple inputs

what happens if you do feature engineering during "feature creation" with BQML?

-training/serving skew -you can fix this by adding TRANSFORM clause and putting all SELECT logic inside of it -this bakes it into the prediction graph

what happens if you do feature engineering during "feature creation" phase with beam?

-training/serving skew -you will need to run the same pipeline at prediction time to compute the same aggregations

what happens during transformation phase in ci/cd?

-transform data (such as string -> int, bucketizing, etc.) in dataflow -important to use tf.transform to reduce training/serving skew

3 ways to fix class imbalance

-upsampling (SMOTE) -downsampling -weighted classes (give more attention to minority class)
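
A minimal sketch of the "weighted classes" option using Keras class_weight (synthetic data with roughly 5% positives; the 19x weight is just the approximate inverse class ratio):

import numpy as np
import tensorflow as tf

x = np.random.rand(1000, 4).astype("float32")
y = (np.random.rand(1000) < 0.05).astype("int32")   # ~5% positives: imbalanced

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# errors on the minority class (1) count roughly 19x more than on the majority class (0)
model.fit(x, y, epochs=5, class_weight={0: 1.0, 1: 19.0})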

what happens during model evaluation phase in ci/cd?

-use TFMA in dataflow -can compare two models and see how performance differs -can also slice data by certain metrics (such as comparing dates or features)

phases of tf.transform

1. analysis -done during training -for numeric features: find min/max over the whole dataset -for categorical features: find all unique values -uses Beam
2. transform -done during prediction (and applied to the training data) -scales each individual input by the min/max for numeric features; converts categorical values to one-hot encodings -uses TensorFlow
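
A minimal tf.transform preprocessing_fn sketch (feature names are made up): the analyzers (min/max, vocabulary) run over the full dataset in Beam, and the resulting transform is applied per example in TensorFlow at training and serving time, avoiding training/serving skew:

import tensorflow_transform as tft

def preprocessing_fn(inputs):
    return {
        # analysis phase finds the dataset-wide min/max; transform phase scales each example
        "income_scaled": tft.scale_by_min_max(inputs["income"]),
        # analysis phase builds the vocabulary; transform phase maps strings to integer ids
        "city_id": tft.compute_and_apply_vocabulary(inputs["city"]),
        "label": inputs["label"],
    }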

how does evaluation job work if using own groundtruth?

1. data labeling service creates an ai platform dataset w/ all of the new rows in BQ since the last run 2. you must have already added in groundtruth labels in the column BEFORE the evaluation job runs (evaluation job will skip any rows w/o groundtruth label) 3. data labeling service will then calculate evaluation metrics

how does evaluation job work if using data labeling service?

1. data labeling service creates an ai platform dataset w/ all of the new rows in BQ since the last run --> both input/output of model 2. data labeling service sends labeling request on this new data to generate groundtruth 3. data labeling service will calculate evaluation metrics for the day before it ran (so parallel evaluation jobs will always sample day before's data to ensure different samples)

steps to train a custom model

1. develop tf model/code 2. create dockerfile with model code 3. build the image 4. upload the image to GCR 5. start training job

Recommendations AI - model types

1. others you may like 2. frequently bought together 3. recommended for you 4. recently viewed -each has a default placement & optimization objective (such as CTR, revenue per order, conversion rate)

ai platform training steps

1. train locally 2. upload to gcs 3. submit to ai platform training to run on cloud

which products have explainability built-in?

AutoML & AI Platform *look for keyword "trust"

Regularization

Avoids overfitting; helps generalize

if customer can't move data outside of EDW for compliance what should they choose?

BQML

if customer wants model ASAP/cheapest, which should they choose?

BQML

Linear Regression: loss function

RMSE

Recall

TP / (TP + FN) -out of all actual positives, how many did the model catch? -if you want to minimize false negatives, then maximize recall

Precision

TP / (TP + FP) -out of all positive predictions, how many were correct? -if you want to minimize false positives, then maximize precision
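
Worked example of the three formulas with made-up confusion-matrix counts:

tp, fp, tn, fn = 40, 10, 940, 10

accuracy = (tp + tn) / (tp + fp + tn + fn)   # 0.98, misleadingly high on this imbalanced set
precision = tp / (tp + fp)                   # of predicted positives, how many were right: 0.8
recall = tp / (tp + fn)                      # of actual positives, how many were caught: 0.8
print(accuracy, precision, recall)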

How to fix sparsity?

Use L1 regularization

which performance metric to use if class is balanced and each class is equally important?

accuracy

how to optimize offline prediction?

add more machines

asynchronous distributed training

all workers are independently training over the input data and updating variables asynchronously

synchronous distributed training

all workers train over different slices of the input data in sync, aggregating gradients at each step

what is prefetching?

allows for more efficient use of the CPU alongside an accelerator: while the CPU prepares batch 2, the GPU/TPU trains on batch 1, so the accelerator has no idle time waiting for batches

main difference between automated ML pipeline and full CI/CD pipelining

automatically deploying the model via Cloud Build triggers vs. manually deploying new version

what happens under-the-hood for ai platform training

bayesian optimization for hyperparameter tuning

which distributed training service to use to optimize wall-time?

central storage -> each GPU computes its gradients w/o waiting for the others

Logistic Regression: loss function

cross entropy/log loss

When to use tf.transform

during preprocessing

ai platform training - cloud job submit CLI

gcloud ai-platform jobs submit training $JOB_NAME \
  --job-dir $OUTPUT_PATH --runtime-version 1.13 \
  --module-name trainer.task --package-path trainer/ \
  --region $REGION \
  -- \
  --train-files $TRAIN_DATA --eval-files $EVAL_DATA \
  --num-epochs 1000 --learning-rate 0.01
(args after the bare -- are passed through to the trainer module)

ai platform CLI for creating job with custom model

gcloud ai-platform jobs submit training my-job \
  --region $REGION \
  --master-image-uri gcr.io/my-project/my-repo:my-image \
  -- \
  --lr=0.01

ai platform training local CLI

gcloud ai-platform local train \
  --module-name trainer.task --package-path trainer/ \
  --job-dir $OUTPUT_DIR \
  -- \
  --train-files $TRAIN_DATA --eval-files $EVAL_DATA

how to send prediction input to ai platform prediction?

gcloud ai-platform predict --model $NAME --version $VERSION \
  --json-instances='data.txt'
where data.txt is newline-delimited JSON

CLI to use explainability in prediction call?

gcloud beta ai-platform versions create $VERSION \
  --model $NAME --explanation-method 'integrated-gradients'
gcloud beta ai-platform explain --model $NAME \
  --version $VERSION --json-instances='data.txt'

when to use normalization?

if the range of values is really large (such as age, income, city population, etc.)

xrai: type of data

images

if you want to do your own hyperparameter tuning in ai platform training how do you do it?

include --config flag & a config.yaml. use trainingInput in yaml and then specify maxTrials, enableEarlyStopping, metric, etc.

if you increase the classification threshold, what will happen to precision?

it will probably increase b/c the number of false positives will decrease

if you increase the classification threshold, what will happen to recall?

it will stay the same or decrease b/c true positives will decrease or stay the same (some actual positives fall below the higher threshold and become false negatives)

what metric do we want to optimize for spam detection?

minimize FP; optimize precision

differential models: type of model & explainability framework

neural nets can use integrated gradients or xrai

ai platform prediction - batch - frameworks available?

only tf

what is smote?

oversampling the minority class by synthesizing new examples to make the classes more balanced

Transfer Appliance

physical device you connect to your network to upload data to GCS

Which metric is: Did the boy cry wolf too often?

precision

of the things that the system predicted, how correct was it?

precision

Which metric is: Did the boy miss any wolves?

recall

did the system miss anything?

recall

in continuous training, if you find you have data drift, what should be done?

retrain model only

sampled shapley: type of data

tabular

integrated gradients: type of data

tabular, low-resolution images (such as x-rays), text

which frameworks use explainability in ai platform prediction?

tf

ai platform prediction - online - which frameworks available?

tf, xgboost, scikit-learn, etc.

why would you use tf.keras.layers.lambda?

to help with training/serving skew
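
A minimal sketch of a Lambda layer baking a preprocessing step into the model itself, so serving runs the same transform as training (log1p is just an illustrative transform):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Lambda(lambda x: tf.math.log1p(x)),  # same code path at train and predict time
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")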

if you don't know how to code and want to submit ai platform training job, what should you do?

use the UI and choose "prebuilt algorithms" ex: linear learner, xgboost, wide and deep, object detection, image classification, etc.

BQ DTS

used for ingesting Google Ad Data to BQ

when to use scaling for normalization?

when the range is evenly distributed & you know the lower/upper bound, e.g. age, NOT income

non-differential models: type of model & explainability framework

-xgboost, decision trees -use sampled Shapley

