Google Cloud: Certification Learning Path: Professional Machine Learning Engineer

True

(T/F) One of the goals of tf.Transform is to provide a TensorFlow graph for preprocessing that can be incorporated into the serving graph (and, optionally, the training graph).

feature

A value that is passed as an input to a model

Vertex Vizier

Black-box optimization service

Supervised Model

Machine learning model trained on labeled data

tf.losses, tf.metrics, tf.optimizers

The most useful components when building custom NNs

True

(T/F) Batch prediction is useful for making several prediction requests at the same time and is optimized to handle a high volume of instances in a job

False

(T/F) Hyperparameter tuning happens before model training and is the task responsible for assigning initial weights to the variables (or parameters) which allow the model to find patterns on the data.

Recall

A hospital uses Google's machine learning technology to help pre-diagnose cancer by feeding historical patient medical data to the model. The goal is to identify as many potential cases as possible. Which metric should the model focus on?

Use the last few digits of a hash function on the field that you're using to split or bucketize your data

Allows you to create repeatable samples of your data.

Accounting and summarizing, anomaly detection, statistical analysis and clustering

Components of EDA

Apache Beam is the API (programming model and SDKs) for building data pipelines in Java or Python, and Cloud Dataflow is a managed service that executes Beam pipelines.

Describe the relationship between Apache Beam and Cloud Dataflow.

A data source constructs a Dataset from data stored in memory or in one or more files, and a data transformation constructs a dataset from one or more tf.data.Dataset objects.

Two distinct ways to create a dataset with tf.data

task.py, model.py

Fill in the blanks. When sending training jobs to Vertex AI, it is common to split most of the logic into a _________ and a ___________ file.

By using the outputs of a component as an input of another component

How can you define the pipeline's workflow as a graph?

By adding dropout layers to a neural network (NN)

How does regularization help build generalizable models?

The arguments of the Python function we are wrapping into a Kubeflow task

In a lightweight Python component, the run parameters are taken from...

Labels

In a supervised ML model, what provides historical data that can be used to predict future data?

Tailoring the results of a search engine to a specific user.

In which of the following use cases is it recommended to go with a contextual bandit system?

When you have optimization or control problems where simulation trial and error is possible.

In which scenarios is reinforcement learning preferable over supervised learning?

A task to be performed and a docker container to be executed

Kubeflow tasks are organized into a dependency graph where each node represents

Linear regression metrics

MAE, MAPE, RMSE, RMSLE, and R² are all available as test examples in the Evaluate section of Vertex AI and are common examples of what type of metric?
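A rough plain-Python sketch of how these regression metrics are defined follows. It is not Vertex AI's implementation; it assumes non-zero targets (MAPE divides by the true value) and values greater than -1 (RMSLE takes log(1 + value)).

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MAPE (%), RMSE, RMSLE, and R^2 for paired lists of numbers."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    # Assumes no true value is 0, otherwise MAPE is undefined.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Assumes all values are greater than -1 so log1p is defined.
    rmsle = math.sqrt(sum((math.log1p(p) - math.log1p(t)) ** 2
                          for t, p in zip(y_true, y_pred)) / n)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "RMSLE": rmsle, "R2": r2}

metrics = regression_metrics([3.0, 5.0, 2.0, 7.0], [2.5, 5.0, 3.0, 8.0])
print(metrics)
```

RMSE penalizes large errors more heavily than MAE because each error is squared before averaging.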

Data preparation, model training, model serving

Machine learning workflow

Model drift, model performance, model outliers and data quality

Moving from experimentation to production requires packaging, deploying and monitoring your model - which can give you confidence that your model is making useful predictions in production. Monitoring measures key model performance metrics and includes:

Explainable AI

Offers feature attributions to provide insights into why models generate predictions

To provide history and versions of your ML model

One major benefit of the Lineage tracking feature of AI Platform pipelines is:

Translation API

Pre-built ML API used for language translations

Ingestion and Process

Pub/Sub, Dataflow, Dataproc, and Cloud Data Fusion align to which stage of the data-to-AI workflow?

Data parallelism in distributed training

Run the same model & computation on every device, but train each of them using different training samples

Make a huge difference in model quality

Setting hyperparameters to their optimal values for a given data set can __________

Regression

To predict the continuous value of our label, what algorithm should be used?

Large language models

Transformers and BERT are examples of _________. Large in _______________ refers to both huge training datasets and many parameters. ___________ can be pre-trained for general purpose and then fine-tuned for specific tasks

tf.keras.layers.TextVectorization

Turns raw strings into an encoded representation that can be read by an embedding layer or dense layer

Keras Functional API

Unlike the Keras Sequential API, we have to provide the shape of the input to the model for this API

Grid Search

Useful if you want to specify a quantity of trials that is greater than the number of points in the feasible space

A Google Cloud Storage bucket that acts as an input for both AutoML and custom training jobs

Vertex AI has a unified data preparation tool that supports image, tabular, text, and video content. Where are uploaded datasets stored in Vertex AI?

AutoML, which is a no-code solution, and custom training, which is a code-based solution

Vertex AI provides two solutions to build an NLP project. Which of the following is correct about these two solutions?

FARM_FINGERPRINT, an open-source hashing algorithm that is implemented in BigQuery SQL

What allows you to split the dataset based upon a field in your data?

Measure the cosine similarity between the two items in an embedding space. Compute the inner product between the two items in an embedding space. Count how many features the two items have in common.

What are some potential techniques to determine how similar two items are? There could be more than one answer.
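The three techniques can be sketched in a few lines of plain Python. The embedding values and feature tags below are made up for illustration.

```python
import math

def dot(a, b):
    """Inner product of two equal-length embedding vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 orthogonal."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def shared_feature_count(features_a, features_b):
    """Number of features the two items have in common."""
    return len(set(features_a) & set(features_b))

movie_a = [1.0, 0.5, 0.0]  # hypothetical 3-dimensional embeddings
movie_b = [2.0, 1.0, 0.0]
print(cosine_similarity(movie_a, movie_b), dot(movie_a, movie_b))
print(shared_feature_count(["comedy", "1990s", "english"], ["comedy", "english", "french"]))
```

Cosine similarity ignores vector length while the inner product grows with it, so the two measures can rank the same items differently.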

1. Entity extraction 2. text classification 3. sentiment analysis

What are the NLP tasks solved by AutoML?

Compared to basic vectorization, which converts text to sparse vectors, word embedding converts text to dense vectors. Compared to basic vectorization, which converts text to vectors without semantic meaning, word embeddings represent words in a vector space where the distance between them indicates semantic similarity and difference. You can use pre-trained word embeddings to represent text.

What are the benefits of using word embedding (such as word2vec) compared to basic vectorization (such as one-hot encoding) when you convert text to vectors?

An RNN uses a mechanism called hidden state to carry the previous information to the next learning iteration.

What is the key feature to enable a "memory" of an RNN (recurrent neural network)?
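A toy scalar RNN shows how the hidden state carries information from one step to the next. The weights `w_x`, `w_h`, and `b` are arbitrary illustrative values, not learned ones, and real RNN layers use vectors and matrices rather than scalars.

```python
import math

def simple_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Scalar Elman-style RNN: each hidden state mixes the current input with the previous state."""
    h = 0.0  # initial hidden state: no memory yet
    states = []
    for x in inputs:
        # The previous h feeds into the new h, which is how earlier inputs are "remembered".
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

print(simple_rnn([1.0, 0.0, 0.0]))  # the first input still influences later states
```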

Video object tracking model

Which AutoML model type analyzes your video data and returns a list of shots and segments where objects are detected?

good value functions

You have a movie recommender system. The reward is the count of clicks. You want to train an agent to win a race. The reward is the negative value of the total time taken to run the race.

The knowledge-based component

You want to create a hybrid recommendation system to suggest music for new users on your music streaming app that just launched. New users are asked to rate a few bands they like. You have reliable data for artist name, song name, album name, etc. Each song is labeled for genre at a coarse level (rock, pop, etc.). Which component of your recommendation system will likely perform the best?

strides

_______ with a value greater than 1 will reduce the shape produced by the convolutional layer, ______ are the size of the step by which the filter slides across the input image
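The effect of the stride on output size can be checked with the commonly used formula floor((n + 2p - k) / s) + 1, where n is the input width, k the kernel width, p the padding on each side, and s the stride. A small sketch (the 28-pixel input and 3-pixel kernel are illustrative):

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension:
    floor((n + 2p - k) / s) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

for s in (1, 2, 3):
    print(f"stride={s}: output width {conv_output_size(28, 3, stride=s)}")
```

Doubling the stride roughly halves the output width, because the filter visits only every second position.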

Analytics Hub

_________ efficiently and securely exchanges data analytics assets across organizations to address challenges of data reliability and cost. You can create and access a curated library of internal and external assets, including unique datasets like Google Trends, backed by the power of BigQuery. There are three roles in _____________: a Data Publisher, an Exchange Administrator, and a Data Subscriber.

convolution

__________ is the process of sliding a kernel across an image
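A naive plain-Python version of that sliding operation (stride 1, no padding) is sketched below. Like deep-learning libraries, it computes cross-correlation, meaning the kernel is not flipped. The image and the vertical-edge kernel are illustrative.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` and sum the elementwise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Dark on the left, bright on the right; the kernel responds to that vertical edge.
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 1], [-1, 1]]
print(convolve2d(image, kernel))
```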

Kubeflow

Can be used out of the box to operationalize an XGBoost model

Adam is passed as the optimizer when compiling the model: it updates network weights iteratively based on training data, and its updates are invariant to diagonal rescaling of the gradients

How does Adam help in compiling the Keras model?

Vision API

Assigns labels to images and quickly classifies them into millions of predefined categories. It detects objects and faces, reads printed and handwritten text, and builds valuable metadata into the image catalog

performance can be measured as a function of adjustable parameters.

Black-box optimization algorithms find the best operating parameters for any system whose ______________?

Preprocessing function

Fill in the blank: The ______________ _______________ is the most important concept of tf.Transform. The ______________ _______________ is a logical description of a transformation of the dataset. The ______________ _______________ accepts and returns a dictionary of tensors, where a tensor means Tensor or 2D SparseTensor.

adds the sum of the squared parameter weights term to the loss function

The L2 regularization provides:
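In formula form, the L2-regularized loss is the data loss plus λ times the sum of squared weights. A minimal sketch, where `lam` (λ) is a hypothetical regularization strength you would tune:

```python
def l2_regularized_loss(data_loss, weights, lam):
    """Data loss plus lam times the sum of squared weights (the L2 penalty)."""
    return data_loss + lam * sum(w * w for w in weights)

print(l2_regularized_loss(0.5, [1.0, -2.0, 0.5], lam=0.01))
```

Because large weights increase the penalty quadratically, minimizing this loss pushes weights toward small values, which discourages overfitting.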

Cross entropy

loss function used for classification problems
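Cross-entropy can be sketched in plain Python for the categorical (one-hot label) and binary cases. The `eps` clamp is an assumption added to avoid taking log(0).

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted class probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

def binary_cross_entropy(label, prob, eps=1e-12):
    """Cross-entropy for a single 0/1 label and the predicted probability of class 1."""
    prob = min(max(prob, eps), 1 - eps)  # keep log() arguments away from 0
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))

print(categorical_cross_entropy([0, 1, 0], [0.1, 0.8, 0.1]))
print(binary_cross_entropy(1, 0.9), binary_cross_entropy(1, 0.1))
```

A confident wrong prediction (probability 0.1 for the true class) costs much more than a confident correct one.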

Fast experimentation, accelerated deployment and simplified model management

What does Vertex AI offer to achieve your ML goals?

Provides a TensorFlow Graph for preprocessing

What does tf.Transform do during the training and serving phase?

How much each feature impacts the model, expressed as a percentage

What does the Feature importance attribution in Vertex AI display?

model.predict()

What function can be used for a model to make predictions?

II, III, I, IV

What is the order of steps to push a trained model to AI Platform for serving? I - Run the command gcloud ai-platform versions create {model_version} to create a version for the model. II - Train and save the model. III - Run the command gcloud ai-platform models create to create a model object. IV - Run the command gcloud ai-platform predict to get predictions.

Trains the model for a fixed number of epochs (iterations over the dataset)

What is the significance of the fit method when training a Keras model?

Model Training

Which stage of the ML workflow includes model evaluation?

True

(T/F) In TensorFlow Playground, orange and blue are used throughout the visualization in slightly different ways, but in general orange shows negative values while blue shows positive values.

True

(T/F) Larger batch sizes require smaller learning rates

True

(T/F) The way you deploy a TensorFlow model is different from how you deploy a PyTorch model, and even TensorFlow models might differ based on whether they were created using AutoML or by means of code. In the unified set of APIs that Vertex AI provides, you can treat all these models in the same way.

True

(T/F) In TensorFlow Playground, the data points (represented by small circles) are initially colored orange or blue, which correspond to negative one and positive one, respectively.

True

(T/F) Use BQ to process tabular data and Dataflow to process unstructured data

False

(T/F) When building a content-based recommender system, it's not important to express both your items and users using the same embedding space (that is, the same dimensions and features).

Database and storage

Cloud Storage, Cloud Bigtable, Cloud SQL, Cloud Spanner, and Firestore represent which type of services?

OCR analyzes the patterns of light and dark that make up the letters and numbers to turn the scanned image into text.

How does OCR (optical character recognition) transform images into an electronic form?

Zero

How many learnable parameters does a pooling layer have?

Classification model

What model would you use if your problem required a discrete number of values or classes?

Online prediction

What prediction method do you use for synchronous or real-time prediction that quickly returns a prediction but only accepts one prediction request per API call?

Entity

Which of the following is an instance of an entity type?

The kfp.dsl package

Which package is used to define and interact with pipelines and components?

Model Prototyping

Which process covers algorithm selection, model training, hyperparameter tuning, and model evaluation in the experimentation and prototyping activity?

Experimentation and training operationalization

Which two activities are involved in ML development?

Container Logging

Which type of logging should be enabled in the online prediction that logs the stderr and stdout streams from your prediction nodes to cloud logging and can be useful for debugging?

Static training

Which type of training do you use if your data set doesn't change over time?

Avoids overfitting

Why is regularization important in logistic regression?

Dropout

__________ is a technique that is used to prevent a model from overfitting
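A minimal sketch of inverted dropout, the variant that Keras's Dropout layer describes (surviving inputs are scaled by 1 / (1 - rate) during training). The function name and `seed` parameter are illustrative.

```python
import random

def dropout(activations, rate, training=True, seed=None):
    """Zero each activation with probability `rate` during training and scale the
    survivors by 1 / (1 - rate) so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return list(activations)  # no-op at inference time
    rng = random.Random(seed)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, seed=42))
print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, training=False))
```

Because a different random subset of units is dropped on each step, the network cannot rely on any single unit, which reduces overfitting.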

Batch Normalization

_______________ applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1
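A single-feature sketch of the training-time computation follows. `gamma` and `beta` stand in for the learnable scale and shift, and `eps` is a small constant for numerical stability. At inference time, batch normalization layers typically use moving averages of the mean and variance instead of the current batch statistics, which this sketch omits.

```python
import math

def batch_normalize(batch, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one feature across a batch to mean ~0 and std ~1, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

print(batch_normalize([2.0, 4.0, 6.0, 8.0]))
```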

ML.FEATURE_CROSS

________________ generates a STRUCT feature with all combinations of crossed categorical features except for 1-degree items.

preprocessing layers

image data augmentation, image preprocessing, numerical features preprocessing

A large learning rate value may result in the model learning a sub-optimal set of weights too fast or an unstable training process

The learning rate is a hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated. Choosing the learning rate is challenging. What can happen if the value is too large?

Training may take a long time
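A toy gradient-descent run on f(w) = w² makes the learning-rate trade-off concrete. Each step multiplies the weight by (1 - 2·lr), so a very small rate converges slowly, a moderate rate converges quickly, and a rate above 1.0 overshoots further on every step and diverges. The learning rates below are arbitrary illustrative values.

```python
def gradient_descent(learning_rate, start=1.0, steps=20):
    """Minimize f(w) = w**2 (gradient 2*w) and return the final weight."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2 * w
    return w

for lr in (0.01, 0.1, 0.9, 1.1):
    print(f"learning_rate={lr}: final w = {gradient_descent(lr):.4g}")
```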

NumPy array of predictions

The predict function in the tf.keras API returns what?

Use the AI Platform Training pre-built Kubeflow component

The simplest way to launch a training task on AI Platform from a Kubeflow task is

CNN

Which of the following networks is used in identifying faces, objects, and traffic signs?

Unreliable info, incomplete data, duplicated data

Features of low data quality

K-Means Clustering

Where labels are not available, for example where customer segmentation is required, which of the following BigQuery supported models is useful?

Vertex AI

Which Google Cloud product lets users create, deploy, and manage ML models in one unified platform?

Geospatial analysis

Which BQ feature leverages geography data types and standard SQL geography functions to analyze a dataset?

Compute

Compute Engine, Google Kubernetes Engine, App Engine, and Cloud Functions represent which type of services?

Evaluating performance in ML, understanding inclusion, and how to introduce inclusion across different subgroups within your data

Confusion matrix helps ...

Managed dataset in Vertex AI

Data loaded into Vertex AI, whether from Google Cloud Storage or BigQuery. This means, for example, that it can be linked to a model.

Use non-saturating, nonlinear activation functions such as ReLUs.

During the training process, each additional layer in your network can successively reduce signal vs. noise. How can we fix this?

Streaming (Pub/Sub), structured batch (BigQuery), unstructured batch (Cloud Storage)

Match the three types of data ingest with an appropriate source of training data.

Create a dataset and upload data, train an ML model, Deploy trained model to an endpoint for serving predictions

Stages of the ML workflow that can be managed with Vertex AI

The number of times the user hiked that trail

Suppose you want to build a collaborative filter to suggest new hiking trails for users. The problem is you don't have any good explicit user ratings for trails. What feature might be useful for creating an implicit measure of a user's rating for a trail instead?

ML.EVALUATE

The _____________ function can be used with linear regression, logistic regression, k-means, matrix factorization, and ARIMA-based time series models. The _____________ function evaluates the predicted values against the actual data. You can use the _____________ function to evaluate model metrics.

First, upload data to Google Cloud Storage. Next, move code into a trainer Python package. Then submit your training job with gcloud to train on Vertex AI.

To make your code compatible with Vertex AI, there are three basic steps that must be completed in a specific order. Choose the answer that best describes those steps.

Runner

To run a pipeline you need something called a ________________.

GANs (Generative Adversarial Networks)

Typically, ML practitioners train models using different architectures, input datasets, hyperparameters, and hardware. What architectural type would you use for cybersecurity, pattern recognition, self-driving cars, and reinforcement learning?

Ask the user for some basic preferences; rely on content-based methods instead for new users

What are some ways you can address the cold-start problem that can occur for new users of a collaborative filter recommendation system? There could be more than one answer.

Feature Store, Data Catalog, Dataplex

What are the parts of Google's enterprise data management and governance tool?

False Positive

when the label says something doesn't exist but the model says it exists
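Counting the four confusion-matrix cells for a binary classifier makes the false-positive case explicit. A plain-Python sketch with illustrative labels and predictions:

```python
def confusion_counts(labels, predictions, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for y, y_hat in zip(labels, predictions):
        if y_hat == positive:
            # Model says "exists": correct if the label agrees, otherwise a false positive.
            counts["TP" if y == positive else "FP"] += 1
        else:
            # Model says "doesn't exist": a false negative if the label says it exists.
            counts["FN" if y == positive else "TN"] += 1
    return counts

print(confusion_counts([1, 0, 1, 0, 0], [1, 1, 0, 0, 1]))
```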

One Hot Encoding

process by which categorical variables are converted into a form that could be provided to neural networks to do a better job in prediction.

Private Endpoints

promises to improve privacy and reduce latency for online prediction tasks by eliminating the need for data to go through any public networks before making it back into VPCs.

Feature Cross

synthetic feature formed by multiplying (crossing) two or more features. Crossing combinations of features can provide predictive abilities beyond what those features can provide individually
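For categorical inputs, "crossing" means combining values rather than multiplying numbers: each pair of values becomes one new categorical feature, so a model can learn a separate weight for, say, "Monday at 08:00". The sketch below generates all pairwise (degree-2) crosses, similar in spirit to BigQuery's ML.FEATURE_CROSS; the function name, the `_x_` naming scheme, and the example row are hypothetical.

```python
from itertools import combinations

def feature_cross(example, features):
    """Return one crossed categorical value for every pair of the given features."""
    crosses = {}
    for a, b in combinations(features, 2):
        crosses[f"{a}_x_{b}"] = f"{example[a]}_x_{example[b]}"
    return crosses

row = {"day_of_week": "Mon", "hour": "08", "weather": "rain"}
print(feature_cross(row, ["day_of_week", "hour"]))
print(feature_cross(row, list(row)))  # three features give three pairwise crosses
```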

the min, median and max values for each feature

What do the aggregation values contain for any feature?

tf.feature_column.bucketized_column

What function do you use to discretize floating-point values into a smaller number of categorical bins?

same padding and valid padding

What kinds of padding methods are available in Keras?
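The two modes differ in output size. The sketch below follows TensorFlow's documented convention for one spatial dimension: "valid" adds no padding, so the kernel must fit entirely inside the input, while "same" zero-pads so that with stride 1 the output length equals the input length. Check your framework version's documentation before relying on it.

```python
import math

def keras_style_output_length(n, kernel_size, stride, padding):
    """Output length along one dimension for 'valid' or 'same' padding."""
    if padding == "valid":
        return math.ceil((n - kernel_size + 1) / stride)
    if padding == "same":
        return math.ceil(n / stride)
    raise ValueError(f"unknown padding: {padding}")

for mode in ("valid", "same"):
    print(mode, keras_style_output_length(28, 3, 1, mode))
```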

True

(T/F) Different problems in the same domain may need different features

True

(T/F) In TensorFlow Playground, in the output layer, the dots are colored orange or blue depending on their original values. The background color shows what the network is predicting for a particular area. The intensity of the color shows how confident that prediction is.

True

(T/F) In the featurestore, the timestamps are an attribute of the feature values, not a separate resource type

True

(T/F) MLOps, besides testing and validating code and components, also tests and validates data, data schemas, and models.

False

(T/F) Non-linearity helps in training your model at a much faster rate and with more accuracy without the loss of your important information.

True

(T/F) One-hot encoding encodes the word to a vector where one corresponds to its position in the vocabulary and zeros to the rest.
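A minimal one-hot encoder over a fixed vocabulary, with an illustrative three-word vocabulary:

```python
def one_hot(word, vocabulary):
    """Vector with 1 at the word's vocabulary position and 0 everywhere else."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1  # raises ValueError for out-of-vocabulary words
    return vector

vocab = ["cat", "dog", "fish"]
print(one_hot("dog", vocab))
```

Each vector is as long as the vocabulary and contains a single 1, which is why one-hot encodings become large and sparse for real vocabularies.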

True, Anything in Map or FlatMap can be parallelized by the Beam execution framework.

(T/F) The Filter method can be carried out in parallel and autoscaled by the execution framework.

Numeric with a meaningful magnitude, known at prediction time, and related to the objective

What characteristics does a good feature have?

Proportional to k, the number of dimensions in your embedding space. Proportional to the number of users

ALS and WALS create embedding tables for both users and items. Because these are held in memory, it's important to plan for their size. How big would you expect the embedding table for the users to be?

Region, display-name, worker-pool-spec

Fill in the blanks. You can use either pre-built containers or custom containers to run training jobs. Both containers require you specify settings that Vertex AI needs to run your training code, including __________, ____________, and ________.

Hidden, weights, positive, output, negative

Fill in the blanks: In the ____ layers, the lines are colored by the _____ of the connections between neurons. Blue shows a _____ weight, which means the network is using that ____ of the neuron as given. An orange line shows that the network is assigning a _____ weight.

kernel

Filter that is used to extract the features from the images

Drift detection

For which type of monitoring is the baseline the statistical distribution of the feature values seen in production in the recent past?

Training operationalization

If the model needs to be repeatedly retrained in the future, an automated training pipeline is also developed. Which task do we use for this?

Continuous training

In addition to CI/CD practiced by DevOps teams, MLOps introduces:

After turning your raw data into a useful feature vector

In what form can raw data be used inside ML models?

III, II, I, VI, V, IV

In what order are the following phases executed in a machine learning project? I - Selection of ML algorithm II - Data Exploration III - Definition of the business use case IV - Model monitoring V - Model operationalization VI - Model Development

Feature Cross

The process of combining features into a single feature; it enables a model to learn separate weights for each combination of features.

Multiple inputs and outputs and models with shared layers.

The Keras Functional API can be characterized by having

0.0 and 1.0

The learning rate is a configurable hyperparameter used in the training of neural networks that has a small positive value, often in the range between _______

A standard LSTM cell includes three gates: the forget gate to forget irrelevant information, the input gate to remember relevant information, and the output gate to decide which part of the cell state to output as the new hidden state.

What are the major gates in a standard LSTM (long short-term memory) cell?

Data prep, model training, model serving

What are the major stages of an end-to-end workflow to build an NLP project with Vertex AI?

It may lead to performance issues such as insufficient computing power, and it increases the input size, which lengthens training time for an ML model

What are the possible consequences for an ML model being trained with high resolution photos with high color depth?

Intent (the topic), entity (the details), and context (the flow of the conversation).

What are the three major components that the Dialogflow API helps to identify in a conversation?

Batch serving and online serving

What are the two methods Feature Store offers for serving features?

kfp.v2.compiler.Compiler

What can you use to compile a pipeline?

Vertex AI Python client

What can you use to create a pipeline run on Vertex AI Pipelines?

It returns the maximum value out of all the input data values passed to a kernel

What does the max-pooling operation do in a CNN?
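A plain-Python sketch of 2x2 max pooling with stride 2. It only takes a maximum over each window, so there are no weights to learn. The feature map values are illustrative.

```python
def max_pool2d(feature_map, pool_size=2, stride=2):
    """Return the maximum of each pool_size x pool_size window."""
    out = []
    for i in range(0, len(feature_map) - pool_size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - pool_size + 1, stride):
            window = [feature_map[i + di][j + dj]
                      for di in range(pool_size) for dj in range(pool_size)]
            row.append(max(window))
        out.append(row)
    return out

fm = [[1, 3, 2, 1],
      [4, 6, 5, 0],
      [7, 2, 9, 8],
      [1, 0, 3, 4]]
print(max_pool2d(fm))
```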

Apache Beam and TensorFlow

What is TensorFlow Transform a hybrid of?

The same code you use to preprocess features in training and evaluation can also be used in serving

What is one key advantage of preprocessing your features using Apache Beam?

GRU(units)

What is the Keras code to build the hidden layer of a GRU (gated recurrent unit) model?

Problem definition > Data selection > Data exploration > Feature engineering > Model prototyping > Model validation

What is the correct process that data scientists use to develop the models on experimentation platform?

CBOW uses surrounding words to predict a center word, whereas skip-gram uses a center word to predict surrounding words.

What is the difference between continuous bag-of-words (CBOW) and skip-gram, the two primary techniques of word2vec?
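The difference shows up in how training examples are built from a sentence. A sketch with a window of one word on each side (the sentence and function names are illustrative):

```python
def cbow_pairs(tokens, window=1):
    """(context words, center word) pairs: CBOW predicts the center from its context."""
    pairs = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, center))
    return pairs

def skipgram_pairs(tokens, window=1):
    """(center word, context word) pairs: skip-gram predicts each context word from the center."""
    pairs = []
    for i, center in enumerate(tokens):
        for context in tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]:
            pairs.append((center, context))
    return pairs

sentence = ["the", "cat", "sat"]
print(cbow_pairs(sentence))
print(skipgram_pairs(sentence))
```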

Each data source file must not be larger than 10 GB. You can include multiple files, up to a maximum amount of 100 GB.

What is the maximum size of a CSV file during batch prediction?

Sequence-to-sequence problems such as machine translation where you translate sentences to another language

What is the problem that an encoder-decoder mainly solves?

A large number of parameters comes from the dense layers at the end, and the convolutional layers contain far fewer parameters

How are the parameters distributed across the layers of a CNN?
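Parameter counts for a hypothetical small CNN show why the dense layers dominate. A 3x3 convolution with 32 filters on a 28x28x1 image produces 26x26x32 (no padding), 2x2 max pooling reduces that to 13x13x32, and a 128-unit dense layer then connects to every one of those values. The architecture is an assumption chosen for illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter (kh * kw * in_channels) plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input-unit connection plus one bias per unit."""
    return (inputs + 1) * units

POOLING_PARAMS = 0  # pooling only takes a max or average, so it learns nothing

conv = conv2d_params(3, 3, 1, 32)
dense = dense_params(13 * 13 * 32, 128)
print(f"conv: {conv}, pooling: {POOLING_PARAMS}, dense: {dense}")
```

The convolution's weights are shared across all image positions, while the dense layer needs a separate weight for every input value.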

Connectors allow you to output the results of a pipeline to a specific data sink like Bigtable, Google Cloud Storage, flat file, BigQuery, and more.

What is the purpose of a Cloud Dataflow connector? .apply(TextIO.write().to("gs://..."));

To ensure that the models are good before moving them into a production/staging env

What is the responsibility of model evaluation and validation components?

AutoML

What method do you use to create and train a model with minimal technical effort to quickly prototype models and explore new datasets before investing in development?

What percent of system code does the ML model account for?

You should provide the BigQuery Data Editor role to the Vertex AI service account in that project.

What should be done if the source table is in a different project?

negative transfer learning

When knowledge is transferred from a less related source, the target performance might be degraded

__init__.py

When you package up a TensorFlow model as a Python package, what file should every folder of the package contain?

Transformation

When you use the data to train a model, Vertex AI examines the source data type and feature values and infers how it will use that feature in model training. This is called the ________________for that feature

Feature Registry

Where are the features registered?

Padding

Which CNN model parameter helps to maintain the same size across the input and the output of the convolutional layer

MOD(ABS(FARM_FINGERPRINT(field)),10) < 7

Which command allows you to split your dataset to get 70% of it for training in a repeatable fashion?
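Python's standard library has no FARM_FINGERPRINT, so this sketch substitutes an MD5 hash from `hashlib` to show the same idea: hash a field, take the result modulo 10, and keep buckets 0 to 6 (about 70%) for training. Because the hash is deterministic, the same key always lands in the same split. The function name and the date-string key are assumptions for illustration.

```python
import hashlib

def is_training_row(key, train_buckets=7, num_buckets=10):
    """Deterministically assign a row to the training split based on a hash of `key`."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    # Python ints from hex are non-negative, so no ABS() is needed here.
    return int(digest, 16) % num_buckets < train_buckets

print(is_training_row("2023-01-05"))
```

Hashing a field such as a date keeps all rows that share that value on the same side of the split, which helps avoid leakage between training and evaluation data.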

Dataflow

Which cloud data processing option can be used for transforming large unstructured data in Google Cloud?

Pixel randomization

Which factor does not affect the accuracy of the deep neural network?

task.py

Which file is the entry point to your code that Vertex AI will start and contains details such as how to parse command-line arguments and where to write model outputs?

Ability to scale to a large dataset, find good features, and preprocess with Vertex AI

Which of the following are the requirements to build an effective machine learning model?

Feature ingestion

Which of the following is the process of importing feature values computed by your feature engineering jobs into a featurestore?

F1 Score

Which of the following metrics can be used to find a suitable balance between precision and recall in a model?
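Precision, recall, and F1 (their harmonic mean) can be computed directly from confusion-matrix counts. A sketch with illustrative counts; returning 0.0 when a denominator is zero is a convention chosen here, not a universal rule.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(precision_recall_f1(tp=8, fp=2, fn=8))
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high.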

REINFORCE, proximal policy optimization (PPO), Deep deterministic policy gradient (DDPG)

You would like to train an agent to drive a car. The action space consists of the following variables: the acceleration (between 0 and 300), the angular degree of turn or tilt (between 0 and 180 degrees), and the direction (either forward or reverse). Select the three algorithms which are appropriate. Credit is given for selecting the correct three.

hashing

______ layer is not trainable

sequential model

__________ is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor

Loss function

__________ measures how accurate the model is during training

Embedding

___________ is a weighted sum of the feature crossed values. ________ is a handy adapter that allows a network to incorporate sparse or categorical data. The number of _______ is a hyperparameter of your ML model.

Online serving

____________ is for low-latency data retrieval of small batches of data for real time processing

Preprocessing

____________ suppresses unwanted distortions and enhances the required features that are essential for the application

ML.BUCKETIZE

________________ is a preprocessing function that creates buckets by returning a STRING as the bucket name after numerical_expression is split into buckets by array_split_points. It bucketizes a continuous numerical feature into a string feature with bucket names as the value.
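The same bucketing idea can be sketched in Python with `bisect`: n sorted split points define n + 1 buckets, and each value maps to a string bucket name. The `bin_` naming loosely resembles ML.BUCKETIZE output, but treat the exact names and the boundary handling (a value equal to a split point goes to the bucket above it) as assumptions of this sketch; check the BigQuery reference for the real behavior.

```python
import bisect

def bucketize(value, split_points):
    """Return a string bucket name for `value`, given sorted split points."""
    return f"bin_{bisect.bisect_right(split_points, value) + 1}"

for v in (5, 10, 25, 99):
    print(v, bucketize(v, [10, 20, 30]))
```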

Velocity

refers to how quickly data is generated and how quickly that data moves

Volume

refers to the amount of data that exists

Veracity

refers to the quality and accuracy of data

Data Augmentation

set of techniques to artificially increase the amount of data by generating new data points from existing data

Choose an existing processor created for a specialized task, choose an existing processor created for general purposes, create a custom processor and build it on your own

What are the options to create a processor for Document AI?

Cleaning tools, monitoring tools

Categories of data quality tools

Confusion matrix

One of the key tools to help in understanding inclusion and how to introduce inclusion across different kinds of groups across your data is by understanding _______.

XGBoost

For classification or regression problems with decision trees, which of the following models is most relevant?

Prepare training data in BigQuery, train a recommendation system with BigQuery ML, use the predicted recommendations in production

3 key steps for creating a recommendation system with BQ ML

Avoid training-serving skew, avoid target leakage, provide a time signal

What are the best practices for data prep?

It reduces the time it takes to develop trained models and assess their performance

What is the main benefit of using an automated ML workflow?

Precision

A farm uses Google's machine learning technology to detect defective apples in their crop, such as those that are irregular in size or have scratches. The goal is to identify only the apples that are actually bad so that no good apples are wasted. Which metric should the model focus on?

Performance metrics are easier to understand and are directly connected to business goals

benefits of performance metrics over loss functions

Feature columns

describes how the model should use raw input data from your features dictionary

Variety

refers to diversity of data types

Recall

The fraction of all relevant instances that were retrieved

Create ML model inside BQ

Data has been loaded into BQ, and the features have been selected and preprocessed. What should happen next when you use BQML to develop a machine learning model?

Facets

Datasets can contain hundreds of millions of data points, each consisting of hundreds (or even thousands) of features, making it nearly impossible to understand an entire dataset in an intuitive fashion. The key here is to utilize visualizations that help unlock nuances and insights in large datasets. Which tool would be most appropriate?

Custom training

Select the correct word below to fill in the blank: Vertex AI is flexible. You choose your training method. _____________ lets you create a training application optimized for your targeted outcome. You have complete control over training application functionality; you can target any objective, use any algorithm, develop your own loss functions or metrics, or do any other customization.

Speech API

The most efficient way to transcribe speech

precision

the fraction of relevant instances among all retrieved instances

Machine Learning

AutoML, Vertex AI Workbench and TF align to which stage of the data-to-AI workflow?

BQ manages the underlying infrastructure

BQ is a fully managed data warehouse, what does fully managed refer to?

Logistic regression

Which BigQuery-supported classification model is most relevant for predicting binary results, such as True/False?

Directed Acyclic Graph (DAG)

How does TensorFlow represent numeric computations?

When the validation loss starts to increase

How to decide when to stop training a model
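A sketch of early stopping with a patience window: stop once validation loss has failed to improve for `patience` consecutive epochs. In Keras, the `tf.keras.callbacks.EarlyStopping` callback with `monitor="val_loss"` and a `patience` argument does this for you; the plain function below only illustrates the logic, and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch index at which training would stop, or the last epoch
    if validation loss never stalls for `patience` consecutive epochs."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1

print(early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.7, 0.8]))
```

A patience greater than one keeps a single noisy uptick in validation loss from ending training too early.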

80-10-10

Default setting in AutoML for the data split in model evaluation

Pre-Built APIs

You work for a video production company and want to use machine learning to categorize event footage, but don't want to train your own ML model. Which option can help you get started?

AutoML

Your company has a lot of data, and you want to train your own machine learning model to see what insights ML can provide. Due to resource constraints, you require a codeless solution. Which option is best?

For small datasets, train the model within the notebook instance

Your dataset is considered small, less than 5,000 rows and around 10MB. You are not using AutoML but a Jupyter Notebook instance. Which of the following is a Best Practice for Training a model with a small dataset?

True

(T/F) TensorFlow is a scalable, multiplatform programming interface for implementing and running machine learning algorithms, including convenience wrappers for deep learning

Veracity

Due to several data types and sources, big data often has many data dimensions. This can introduce data inconsistencies and uncertainties. Which type of challenge might this present to data engineers?

Univariate, Bivariate

EDA is mainly performed using these methods

BigQuery ML

For a user who can use SQL, has little Machine Learning experience and wants a 'Low-Code' solution, which Machine Learning framework should they use?

In Cloud Storage

The data used to train a model can originate from any number of systems, for example, logs from an online service system, images from a local device, or documents scraped from the web. Which of the following is a Best Practice for Preparing and Storing unstructured data such as images, audio, and video?

Tabular

If a dataset is presented in a Comma Separated Values (CSV) file, which is the correct data type to choose in Vertex AI?

Data preparation

Which stage of the ML workflow includes feature engineering?

The first form is human bias that already exists in data, because data found in "the world" carries existing biases with regard to properties like gender, race, and sexual orientation. For example, there may be reporting bias by our subjects because they choose to reveal only certain aspects of themselves or their opinions. The second form is human bias that arises as part of our data collection and labeling procedures.

Human biases lead to bias in machine learning models. Unconscious biases exist in our data and exist in two forms. What are the two forms of unconscious biases in data?

Regression/Classification

If the business case is fraud detection, which is the correct objective to choose in Vertex AI?

Framing the problem

In ML development which phase identifies your use case?

Both Labeled and Unlabeled data

Refers to the type of data used in ML models

1. Ingest streaming data, 2. Process data, 3. Visualize results

Streaming data workflow steps

More complex ways of connecting layers, a Cambrian explosion of computing power to train, automatic feature extraction

What differentiates deep learning networks from other multilayered networks?

tf.data.Dataset

Which API is used to build performant, complex input pipelines from simple, reusable pieces that feed your model's training or evaluation loops?

TPU (Tensor Processing Units)

Which Google hardware innovation tailors its architecture to the computation needs of a domain, such as the matrix multiplication in ML?

Vertex AI Pipelines

Which Vertex AI tool automates, monitors, and governs machine learning systems by orchestrating the workflow in a serverless manner?

Workbench

Which Vertex AI service lets you access data, process data in a Dataproc cluster, train a model, share results, and more, all without leaving the JupyterLab interface?

Custom Training

Which code-based solution offered with Vertex AI gives data scientists full control over the development environment and process?

Archive storage

Which data storage class is best for storing data that needs to be accessed less than once a year, such as online backups and disaster recovery?

Pub/Sub

Which Google Cloud product is a distributed messaging service designed to ingest messages from multiple device streams, such as gaming events, IoT devices, and application streams?

Dataflow

Which Google Cloud product acts as an execution engine to process and implement data processing pipelines?

Batch Load

Which pattern describes source data that is moved into a BQ table in a single operation?

Supervised learning, logistic regression

You want to use ML to identify whether an email is spam. Which should you use?

Unsupervised learning, cluster analysis

You want to use machine learning to group random photos into similar groups. What should you use?
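A minimal sketch of cluster analysis using k-means (Lloyd's algorithm), one common clustering technique, in plain Python. It assumes each photo has already been reduced to a 2-D feature vector; the points below are invented for illustration. No labels are used, which is what makes this unsupervised.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters without using any labels."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical photo embeddings forming two visually distinct groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```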

Value

Refers to the value that big data can provide; it relates directly to what organizations can do with the data they collect

Observing how well a model performs against a new dataset that it hasn't seen before.

What is the best way to assess the quality of a model?

mean squared error as their loss function

What is the most essential metric a regression model uses?
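As a minimal sketch, mean squared error is the average of the squared differences between targets and predictions, so large errors are penalized more than small ones. In Keras the equivalent built-in loss is `tf.keras.losses.MeanSquaredError`; the toy values here are invented.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Errors are 1, 0, -2, -1, so squared errors are 1, 0, 4, 1.
print(mean_squared_error([3.0, 5.0, 2.0, 7.0], [2.0, 5.0, 4.0, 8.0]))  # → 1.5
```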

They can be both reshaped and sliced

What operations can be performed on tensors?
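In TensorFlow these operations look like `tf.reshape(t, [2, 3])` and `t[:, 1]`. The stdlib-only sketch below (the `reshape` helper is hypothetical) shows what they do to the elements: reshaping keeps the same values in row-major order under a new shape, and slicing selects a sub-block.

```python
def reshape(flat, rows, cols):
    """Reshape a flat list into a rows x cols nested list (row-major order)."""
    if rows * cols != len(flat):
        raise ValueError(f"cannot reshape {len(flat)} elements into ({rows}, {cols})")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

values = list(range(6))              # shape (6,)
matrix = reshape(values, 2, 3)       # shape (2, 3)
column = [row[1] for row in matrix]  # like t[:, 1]
first_row = matrix[0][:2]            # like t[0, :2]
print(matrix, column, first_row)     # → [[0, 1, 2], [3, 4, 5]] [1, 4] [0, 1]
```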

Storage and Analytics

What two services does BQ provide?

Pre-trained models

What would you use to replace user input by ML?

Velocity

When you build scalable and reliable pipelines, data often needs to be processed in near-real time, as soon as it reaches the system. Which type of challenge might this present to data engineers?

BQ ML

You work for a global hotel chain that has recently loaded some guest data into BigQuery. You have experience writing SQL and want to leverage machine learning to help predict guest trends for the next few months. Which option is best?

Categorical features preprocessing

tf.keras.layers.CategoryEncoding, tf.keras.layers.Hashing, tf.keras.layers.IntegerLookup
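A plain-Python sketch of the ideas behind these layers, with hypothetical helper names: a vocabulary lookup maps each category to an integer index (similar in spirit to `StringLookup`/`IntegerLookup`), one-hot encoding turns that index into a vector (similar to `CategoryEncoding`), and the hashing trick maps arbitrary strings into a fixed number of bins (similar to `Hashing`). Unlike the Keras lookup layers, this sketch reserves no out-of-vocabulary index, and `zlib.crc32` is used only because it is deterministic; it is not the hash Keras uses, so bin numbers will differ.

```python
import zlib

def vocabulary_lookup(values):
    """Map each distinct category to an integer index (no OOV index reserved)."""
    vocab = {v: i for i, v in enumerate(sorted(set(values)))}
    return [vocab[v] for v in values], vocab

def one_hot(index, depth):
    """One-hot encode a single integer index."""
    return [1 if i == index else 0 for i in range(depth)]

def hash_bucket(value, num_bins):
    """Hashing trick: map an arbitrary string to one of num_bins buckets."""
    return zlib.crc32(value.encode("utf-8")) % num_bins

colors = ["red", "green", "blue", "green"]
indices, vocab = vocabulary_lookup(colors)  # vocab: {'blue': 0, 'green': 1, 'red': 2}
encoded = [one_hot(i, len(vocab)) for i in indices]
print(encoded)  # → [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
print(hash_bucket("purple", 10))  # some bucket in 0..9, stable across runs
```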

Check for missing data and other mistakes, gain maximum insight into the data set and its underlying structure, uncover a parsimonious model, one which explains the data with a minimum number of predictor variables

What are the objectives of EDA?
