I - 10 End-to-End Machine Learning with TensorFlow on GCP

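Method to select a percentage of the dataset

Add "ABS(FARM_FINGERPRINT(CONCAT(CAST(YEAR AS STRING), CAST(month AS STRING)))) AS hashmonth" to the query to create a new hash field. Then, in an outer query, use "WHERE MOD(abs(hashmonth), 10) < 8" to keep roughly 80% of the data. This split is repeatable, because the same year and month always hash to the same value. You can then add "AND RAND() < 0.01" to the WHERE clause to keep about 1/100 of those rows. That extra sampling step is not repeatable, because RAND() returns different values on every run.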

What are the benefits of using the hashing and modulo operators for creating ML datasets? 1. It allows you to create datasets in a repeatable manner. 2. It is more computationally efficient than using the rand() function. 3. It provides the best performing split for training and evaluation. 4. None of the above.

1. It allows you to create datasets in a repeatable manner.
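
The hash-and-modulo split can be sketched in plain Python. This is a minimal illustration, not the BigQuery implementation: `hashlib.sha256` stands in for FARM_FINGERPRINT (an assumption made so the example runs without BigQuery), and the names `hash_bucket` and `is_training` are hypothetical.

```python
import hashlib


def hash_bucket(year: int, month: int, num_buckets: int = 10) -> int:
    """Map a (year, month) pair to a stable bucket in [0, num_buckets)."""
    # Mirrors CONCAT(CAST(year AS STRING), CAST(month AS STRING)).
    key = f"{year}{month}".encode("utf-8")
    # Assumption: SHA-256 is used here as a stand-in for FARM_FINGERPRINT.
    digest = hashlib.sha256(key).digest()
    value = int.from_bytes(digest[:8], "big")  # non-negative, so no ABS needed
    return value % num_buckets


def is_training(year: int, month: int) -> bool:
    """Buckets 0-7 go to training, buckets 8-9 to evaluation."""
    return hash_bucket(year, month) < 8


rows = [(y, m) for y in range(2000, 2009) for m in range(1, 13)]
train = [r for r in rows if is_training(*r)]
evaluation = [r for r in rows if not is_training(*r)]
```

Because the bucket depends only on the key, rerunning the split always sends a given year and month to the same side. Every row sharing a key lands in the same dataset, so the actual proportion is close to 80% only when the keys are numerous and evenly spread.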

What makes a feature 'good'?

1. Should be related to the objective 2. Known at prediction time 3. Should be numeric with meaningful magnitude 4. Have enough examples 5. Bring human insight to the problem

What does the scale-tier represent in CMLE training jobs? 1. The type of hardware you want your model to run on 2. The number of steps your model should scale to 3. The number of distributed machines your model should use

1 and 3. The type of hardware you want your model to run on and the number of distributed machines your model should use

What is the purpose of task.py when creating a Python package for ML Engine? 1. To parse command-line parameters and to call the model training logic. 2. It defines the task that will be submitted to ML Engine. 3. It contains the hyperparameters that will be used for model training. 4. None of the above.

1. To parse command-line parameters and to call the model training logic.
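
A task.py along these lines might look like the sketch below. The flag names, the default values, and the `train_and_evaluate` stub are all hypothetical; in a real package the training logic would normally live in a separate module rather than in task.py itself.

```python
import argparse


def train_and_evaluate(params: dict) -> dict:
    """Hypothetical stand-in for the model training logic."""
    # A real package would build and train the model here.
    return dict(params)


def parse_args(argv=None) -> dict:
    """Parse command-line parameters into a plain dict."""
    parser = argparse.ArgumentParser()
    # All flag names below are illustrative assumptions.
    parser.add_argument("--train_data_paths", default="train.csv",
                        help="Path or pattern for the training data")
    parser.add_argument("--output_dir", default="trained_model",
                        help="Where to write checkpoints and exports")
    parser.add_argument("--train_steps", type=int, default=1000,
                        help="Number of training steps to run")
    # parse_known_args ignores flags this script does not define.
    args, _unknown = parser.parse_known_args(argv)
    return vars(args)


def main(argv=None) -> dict:
    """Entry point: parse parameters, then hand them to the training logic."""
    return train_and_evaluate(parse_args(argv))


if __name__ == "__main__":
    main()
```

Calling `main(["--train_steps", "50"])` parses the flag and passes the resulting dict to the training function, which keeps parameter handling separate from the model code.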

Three keys to a successful ML model:

1. Training the model on as much data as you can 2. Feature engineering (human insights) 3. Using appropriate model architecture

What are the benefits of the TensorFlow Estimator API?

1. You can train both locally and in a distributed model training environment. 2. It provides a high-level API, simplifying model development. 3. It automatically saves summaries to TensorBoard.

True or False: In ML, you could train using all your data and decide not to hold out a test set and still get a good model

True

What is Google datalab?

Cloud Datalab is a powerful interactive tool created to explore, analyze, transform and visualize data and build machine learning models on Google Cloud Platform. It runs on Google Compute Engine and connects to multiple cloud services easily so you can focus on your data science tasks.

Datalab vs. Data Studio

Cloud Datalab is intended for data science and machine learning. Data Studio is focused on reports, whereas Datalab is focused on notebooks.

tf.estimator

High-level API for distributed training

True or False: Apache Beam can only be executed on a local environment or using Cloud Dataflow.

False. Beam pipelines can also run on other runners, such as Apache Flink and Apache Spark.

True or False: Cloud Dataflow and ML Engine require you to manage servers

False. Cloud Dataflow and ML Engine are both serverless technologies.

A useful BigQuery hashing function used to create training, test, and validation datasets

FARM_FINGERPRINT

TensorFlow is a framework for:

Numeric processing

For productionalizing a model, a machine learning framework needs to be _________, _______, and _______.

Repeatable, scalable, and tuned

Describe Cloud ML Engine

The Google Cloud ML Engine is a hosted platform to run machine learning training jobs and predictions at scale. The service treats these two processes (training and predictions) independently. It is possible to use Google Cloud ML Engine just to train a complex model by leveraging the GPU and TPU infrastructure. The outcome from this step — a fully-trained machine learning model — can be hosted in other environments including on-prem infrastructure and public cloud. The service can also be used to deploy a model that is trained in external environments. Cloud ML Engine automates all resource provisioning and monitoring for running the jobs. It can also manage the lifecycle of deployed models and their versions.

Training-Serving Skew

The difference between the data a model was trained on and the data it is presented with at prediction time
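
One common way to reduce this kind of skew is to route training and prediction inputs through the same preprocessing code. The sketch below illustrates that idea only; the field names (`distance_km`, `is_weekend`, `label`) and function names are hypothetical.

```python
import math


def preprocess(raw: dict) -> list:
    """Single shared definition of the feature transformations."""
    return [
        math.log1p(raw["distance_km"]),       # same scaling in both paths
        1.0 if raw["is_weekend"] else 0.0,    # same encoding in both paths
    ]


def make_training_example(raw: dict) -> tuple:
    """Training path: features plus the label."""
    return preprocess(raw), raw["label"]


def make_serving_input(raw: dict) -> list:
    """Prediction path: the label is unknown, features are built identically."""
    return preprocess(raw)
```

Because both paths call `preprocess`, a change to a transformation cannot be applied to one side and forgotten on the other.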

What does the scale-tier represent in CMLE training jobs?

The type of hardware you want your model to run on and the number of distributed machines your model should use

When do you use TensorFlow, other than for deep learning?

When your datasets won't fit in memory. As your data size increases, batching and distribution become important. This is better than sampling (which loses data) or relying on a single very large machine.
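
The batching idea can be shown without TensorFlow. The sketch below streams CSV rows in fixed-size batches so that only one batch is held in memory at a time; `read_batches` is a hypothetical helper, and the in-memory `io.StringIO` source stands in for a large file.

```python
import csv
import io
from itertools import islice


def read_batches(lines, batch_size: int):
    """Yield lists of up to batch_size parsed CSV rows from an iterable of lines."""
    reader = csv.reader(lines)
    while True:
        batch = list(islice(reader, batch_size))  # pulls at most batch_size rows
        if not batch:
            return
        yield batch


# Stand-in for a large file opened with open(path, newline="").
source = io.StringIO("a,1\nb,2\nc,3\nd,4\ne,5\n")
batch_sizes = [len(batch) for batch in read_batches(source, 2)]
```

With five rows and a batch size of 2, `batch_sizes` is `[2, 2, 1]`. A training loop would consume one batch per step instead of loading the full dataset.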

