Google Cloud: Certification Learning Path: Professional Machine Learning Engineer
True
(T/F) One of the goals of tf.Transform is to provide a TensorFlow graph for preprocessing that can be incorporated into the serving graph (and, optionally, the training graph).
feature
A value that is passed as an input to a model
Vertex Vizier
Black-box optimization service?
Supervised Model
Machine learning model that has labels
tf.losses, tf.metrics, tf.optimizers
The most useful components when building custom NN's
True
(T/F) Batch prediction is useful for making several prediction requests at the same time and is optimized to handle a high volume of instances in a job
False
(T/F) Hyperparameter tuning happens before model training and is the task responsible for assigning initial weights to the variables (or parameters) which allow the model to find patterns on the data.
Recall
A hospital uses Google's machine learning technology to help pre-diagnose cancer by feeding historical patient medical data to the model. The goal is to identify as many potential cases as possible. Which metric should the model focus on?
Use the last few digits of a hash function on the field that you're using to split or bucketize your data
Allows you to create repeatable samples of your data.
Accounting and summarizing, anomaly detection, statistical analysis and clustering
Components of EDA
Apache Beam is the API for building data pipelines in Java or Python, and Cloud Dataflow is the implementation and execution framework.
Describe the relationship between Apache Beam and Cloud Dataflow?
A data source constructs a Dataset from data stored in memory or in one or more files and a data transformation constructs a dataset from one or more tf.data.Dataset objects.
Distinct way to create a dataset
task.py, model.py
Fill in the blanks. When sending training jobs to Vertex AI, it is common to split most of the logic into a _________ and a ___________ file.
By using the outputs of a component as an input of another component
How can you define the pipelines workflow as a graph?
Adding dropout layers to NN
How does regularization help build generalizable models
The arguments of the Python function we are wrapping into a Kubeflow task
In a lightweight Python component the run parameters are taken from....
Labels
In a supervised ML model, what provides historical data that can be used to predict future data
Tailoring the results of a search engine to a specific user.
In which of the following use cases is it recommended to go with a contextual bandit system?
When you have optimization or control problems where simulation trial and error is possible.
In which scenarios is reinforcement learning preferable over supervised learning?
A task to be performed and a docker container to be executed
Kubeflow tasks are organized into a dependency graph where each node represents
Linear regression metrics
MAE, MAPE, RMSE, RMSLE, and R² are all available as test examples in the Evaluate section of Vertex AI and are common examples of what type of metric?
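The metrics named in this card can be computed by hand. The following is a minimal sketch using only the Python standard library; the function name `regression_metrics` is hypothetical and is not part of Vertex AI, and the sketch assumes no true value is zero (for MAPE) and that all values are non-negative (for RMSLE).

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MAPE, RMSE, RMSLE and R^2 for two equal-length lists."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    # Mean absolute error.
    mae = sum(abs(e) for e in errors) / n
    # Mean absolute percentage error (assumes no true value is 0).
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    # Root mean squared error.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Root mean squared logarithmic error (assumes non-negative values).
    rmsle = math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)) / n
    )
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "RMSLE": rmsle, "R2": r2}
```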
Data preparation, model training, model serving
Machine learning workflow
Model drift, model performance, model outliers and data quality
Moving from experimentation to production requires packaging, deploying and monitoring your model - which can give you confidence that your model is making useful predictions in production. Monitoring measures key model performance metrics and includes:
Explainable AI
Offers feature attributions to provide insights into why models generate predictions
To provide history and versions of your ML model
One major benefit of the Lineage tracking feature of AI Platform pipelines is:
translation API
Pre-built ML API used for language translations
Ingestion and Process
Pub/Sub, Dataflow, Dataproc and Cloud Data Fusion align to which stage of the data-to-AI workflow?
Data parallelism in distributed training
Run the same model & computation on every device, but train each of them using different training samples
Make a huge difference in model quality
Setting hyperparameters to their optimal values for a given data set can __________
Regression
To predict the continuous value of our label what algorithm should be used?
Large language models
Transformers and BERT are examples of _________. Large in _______________ refers to both huge training datasets and many parameters. ___________ can be pre-trained for general purpose and then fine-tuned for specific tasks
tf.keras.layers.TextVectorization
Turns raw strings into an encoded representation that can be read by an embedding layer or dense layer
Keras Functional API
Unlike the Keras Sequential API, we have to provide the shape of the input to the model for this API
Grid Search
Useful, if you want to specify a quantity of trials that is greater than the number of points in the feasible space?
A Google Cloud Storage bucket that acts as an input for both AutoML and custom training jobs
Vertex AI has a unified data preparation tool that supports image, tabular, text and video content. Where are uploaded datasets stored in Vertex AI?
AutoML, which is a no-code solution, and custom training, which is a code-based solution
Vertex AI provides two solutions to build an NLP project. Which of the following is correct about these two solutions?
FARM_FINGERPRINT, an open-source hashing algorithm that is implemented in BigQuery SQL
What allows you to split the dataset based upon a field in your data?
Measure the cosine similarity between the two items in an embedding space. Compute the inner product between the two items in an embedding space. Count how many features the two items have in common.
What are some potential techniques to determine how similar two items are? There could be more than one answer.
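The three similarity techniques in this card can be sketched in a few lines of plain Python. The function names are hypothetical, and the embeddings are assumed to be equal-length lists of floats with non-zero norm.

```python
import math

def inner_product(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Inner product divided by the product of the vector norms."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)

def shared_feature_count(features_a, features_b):
    """Number of features two items have in common."""
    return len(set(features_a) & set(features_b))
```

Cosine similarity ignores vector length, while the inner product grows with it, so the two can rank the same pair of items differently.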
1. Entity extraction 2. text classification 3. sentiment analysis
What are the NLP tasks solved by AutoML?
Compared to basic vectorization, which converts text to sparse vectors, word embedding converts text to dense vectors. Compared to basic vectorization, which converts text to vectors without semantic meaning, word embeddings represent words in a vector space where the distance between them indicates semantic similarity and difference. You can use pre-trained word-embedding to represent text.
What are the benefits of using word embedding (such as word2vec) compared to basic vectorization (such as one-hot encoding) when you convert text to vectors?
An RNN uses a mechanism called hidden state to carry the previous information to the next learning iteration.
What is the key feature to enable a "memory" of an RNN (recurrent neural network)?
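As an illustration of how a hidden state carries earlier information forward, here is a minimal scalar sketch. The weights `w_x` and `w_h` are arbitrary hypothetical values; a real RNN layer learns weight matrices and usually a bias term.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, h0=0.0):
    """Run a one-unit RNN and return the hidden state after each step."""
    h = h0
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state.
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states
```

With inputs `[1.0, 0.0]`, the second state is non-zero even though the second input is zero, because the first input is still present in the hidden state.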
Video object tracking model
Which AutoML model type analyzes your video data and returns a list of shots and segments where objects are detected
good value functions
You have a movie recommender system. The reward is the count of clicks. You want to train an agent to win a race. The reward is the negative value of the total time taken to run the race.
The knowledge-based component
You want to create a hybrid recommendation system to suggest music for new users on your music streaming app that just launched. New users are asked to rate a few bands they like. You have reliable data for artist name, song name, album name, etc. Each song is labeled for genre at a coarse level (rock, pop, etc.). Which component of your recommendation system will likely perform the best?
strides
_______ with a value greater than 1 will reduce the shape produced by the convolutional layer, ______ are the size of the step by which the filter slides across the input image
Analytics Hub
_________- efficiently and securely exchanges data analytics assets across organizations to address challenges of data reliability and cost. You can create and access a curated library of internal and external assets, including unique datasets like Google Trends, backed by the power of BigQuery. There are three roles in _____________ - A Data Publisher, Exchange Administrator, and a Data Subscriber.
convolution
__________ is the process of sliding a kernel across an image
Kubeflow
can be used out-of-the box to operationalize xgboost model
Both updating network weights iteratively based on training data by diagonal rescaling of the gradients
how does Adam help in compiling the Keras model
Vision API
Assigns labels to images and quickly classifies them into millions of predefined categories. It detects objects and faces, reads printed and handwritten text, and builds valuable metadata into the image catalog
performance can be measured as a function of adjustable parameters.
Black box optimization algorithms find the best operating parameters for any system whose ______________?
Preprocessing function
Fill in the blank: The ______________ _______________ is the most important concept of tf.Transform. The ______________ _______________ is a logical description of a transformation of the dataset. The ______________ _______________ accepts and returns a dictionary of tensors, where a tensor means Tensor or 2D SparseTensor.
adds the sum of the squared parameter weights term to the loss function
The L2 regularization provides:
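A minimal sketch of the L2 term described in this card, assuming a hypothetical regularization strength `lam` and a flat list of weights:

```python
def l2_penalty(weights, lam):
    """Regularization strength times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam):
    """Data loss plus the L2 penalty; larger weights are penalized more."""
    return data_loss + l2_penalty(weights, lam)
```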
Cross entropy
loss function used for classification problems
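For the binary case, cross entropy can be sketched as follows. The function name and the clipping constant `eps` are assumptions added here to avoid taking the logarithm of zero.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip the probability so math.log never receives 0.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)
```

A confident correct prediction yields a small loss, and a confident wrong prediction yields a large one.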
Fast experimentation, accelerated deployment and simplified model management
What does Vertex AI offer to achieve your ML goals?
Provides a TensorFlow Graph for preprocessing
What does tf.Transform do during the training and serving phase?
How much each feature impacts the model, expressed as a percentage
What does the Feature importance attribution in Vertex AI display?
model.predict()
What function can be used for a model to do prediction
II, III, I, IV
What is the order of steps to push a trained model to AI Platform for serving? I - Run the command gcloud ai-platform versions create {model_version} to create a version for the model. II - Train and save the model. III - Run the command gcloud ai-platform models create to create a model object. IV - Run the command gcloud ai-platform predict to get predictions.
Defines the number of epochs
What is the significance of the fit method while training a Keras model?
Model Training
Which stage of the ML workflow includes model evaluation
True
(T/F) In TensorFlow Playground, orange and blue are used throughout the visualization in slightly different ways, but in general orange shows negative values while blue shows positive values.
True
(T/F) Larger batch sizes require smaller learning rates
True
(T/F) The way you deploy a TensorFlow model is different from how you deploy a PyTorch model, and even TensorFlow models might differ based on whether they were created using AutoML or by means of code. True or False: In the unified set of APIs that Vertex AI provides, you can treat all these models in the same way.
True
(T/F) In TensorFlow Playground, the data points (represented by small circles) are initially colored orange or blue, which correspond to positive one and negative one
True
(T/F) Use BQ to process tabular data and Dataflow to process unstructured data
False
(T/F) When building a content-based recommender system, it's not important to express both your items and users using the same embedding space (that is, the same dimensions and features).
Database and storage
Cloud Storage, Cloud Bigtable, Cloud SQL, Cloud Spanner and Firestore represent which type of services?
OCR analyzes the patterns of light and dark that make up the letters and numbers to turn the scanned image into text.
How does OCR (optical character recognition) transform images into an electronic form?
Zero
How many learnable parameters does a pooling layer have
Classification model
What model would you use if your problem required a discrete number of values or classes?
Online prediction
What prediction method do you use for synchronous or real-time prediction that quickly returns a prediction but only accepts one prediction request per API call
Entity
Which of the following is an instance of an entity type
kfp.dsl package
Which package is used to define and interact with pipelines and components?
Model Prototyping
Which process covers algorithm selection, model training, hyperparameter tuning and model evaluation in the experimentation and prototyping activity
Experimentation and training operationalization
Which two activities are involved in ML development
Container Logging
Which type of logging should be enabled in the online prediction that logs the stderr and stdout streams from your prediction nodes to cloud logging and can be useful for debugging?
Static training
Which type of training do you use if your data set doesn't change over time?
Avoids overfitting
Why is regularization important in logistic regression?
Dropout
__________ is a technique that is used to prevent a model from overfitting
Batch Normalization
_______________ applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1
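The core normalization step can be sketched on a single feature across a batch. This simplified sketch omits the learnable scale and shift parameters and the moving averages that a real batch normalization layer maintains; `eps` is a small assumed constant for numerical stability.

```python
import math

def batch_normalize(values, eps=1e-5):
    """Shift a batch to mean 0 and rescale it to standard deviation near 1."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return [(v - mean) / math.sqrt(variance + eps) for v in values]
```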
ML.FEATURE_CROSS
________________ generates a STRUCT feature with all combinations of crossed categorical features except for 1-degree items.
preprocessing layers
image data augmentation, image preprocessing, numerical features preprocessing
A large learning rate value may result in the model learning a sub-optimal set of weights too fast or an unstable training process
The learning rate is a hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated. Choosing the learning rate is challenging. What can happen if the value is too large?
Training may take a long time
The learning rate is a hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated. Choosing the learning rate is challenging. What can happen if the value is too small?
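Both learning-rate cards can be illustrated with gradient descent on the toy function f(w) = w², whose gradient is 2w and whose minimum is at w = 0. The function name and the specific rates below are illustrative assumptions.

```python
def gradient_descent(learning_rate, steps=20, w=1.0):
    """Minimize f(w) = w**2 from the starting point w and return the final w."""
    for _ in range(steps):
        gradient = 2.0 * w
        w -= learning_rate * gradient
    return w

too_large = gradient_descent(1.1)    # each step overshoots; |w| grows
too_small = gradient_descent(0.001)  # barely moves in 20 steps
moderate = gradient_descent(0.1)     # approaches the minimum at 0
```

With a rate of 1.1, each update multiplies w by -1.2, so training is unstable. With a rate of 0.001, each update multiplies w by 0.998, so training takes a long time.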
Numpy array of predictions
The predict function in the tf.keras API returns what?
Use the AI platform training pre-built Kubeflow component
The simplest way to launch a training task on AI platform from a Kubeflow task is
CNN
Which of the following networks is used in identifying faces, objects, and traffic signs?
Unreliable info, incomplete data, duplicated data
Features of low data quality
K-Means Clustering
Where labels are not available, for example where customer segmentation is required, which of the following BigQuery supported models is useful?
Vertex AI
Which Google Cloud product lets users create, deploy and manage ML models in one unified platform?
Geospatial analysis
Which BQ feature leverages geography data types and standard SQL geography functions to analyze a data set
Compute
Compute engine, Google Kubernetes Engine, App Engine and Cloud Functions represent which type of services
Evaluating performance in ML, understanding inclusion and how to introduce inclusion across different subgroups within your data
Confusion matrix helps ...
Managed Dataset in VertexAI
Data loaded into Vertex AI, whether from Google Cloud Storage or BigQuery. This means, for example, that it can be linked to a model.
Use non-saturating, nonlinear activation functions such as ReLUs.
During the training process, each additional layer in your network can successively reduce signal vs. noise. How can we fix this?
Streaming (Pub/Sub), structured batch (BigQuery), unstructured batch (Cloud Storage)
Match the three types of data ingest with an appropriate source of training data.
Create a dataset and upload data, train an ML model, Deploy trained model to an endpoint for serving predictions
Stages of the ML workflow that can be managed with Vertex AI
The number of times the user hiked that trail
Suppose you want to build a collaborative filter to suggest new hiking trails for users. The problem is you don't have any good explicit user ratings for trails. What feature might be useful for creating an implicit measure of a user's rating for a trail instead?
ML.EVALUATE
The _____________ function can be used with linear regression, logistic regression, k-means, matrix factorization, and ARIMA-based time series models. The _____________ function evaluates the predicted values against the actual data. You can use the _____________ function to evaluate model metrics.
First, upload data to Google Cloud Storage. Next, move code into a trainer Python package. Then submit your training job with gcloud to train on Vertex AI.
To make your code compatible with Vertex AI, there are three basic steps that must be completed in a specific order. Choose the answer that best describes those steps.
Runner
To run a pipeline you need something called a ________________.
GANs (Generative Adversarial Networks)
Typically, ML practitioners train models using different architectures, input data sets, hyperparameters, and hardware. What architectural type would you use for cyber-security, pattern recognition, self-driving cars, and reinforcement learning?
Ask the user for some basic preferences. Rely on content-based method instead for new users
What are some ways you can address the cold-start problem that can occur for new users of a collaborative filter recommendation system? There could be more than one answer.
Feature Store, Data Catalog, Dataplex
What are the parts of Google's enterprise data management and governance tool?
False Positive
when the label says something doesn't exist but the model says it exists
One Hot Encoding
process by which categorical variables are converted into a form that could be provided to neural networks to do a better job in prediction.
Endpoints
promises to improve privacy and reduce latency for online prediction tasks by eliminating the need for data to go through any public networks before making it back into VPCs.
Feature Cross
synthetic feature formed by multiplying (crossing) two or more features. Crossing combinations of features can provide predictive abilities beyond what those features can provide individually
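For categorical features, a cross can be sketched as the Cartesian product of the two vocabularies, with one one-hot slot per combination. The helper names and the `_x_` separator are hypothetical choices for illustration.

```python
from itertools import product

def cross_vocabulary(vocab_a, vocab_b):
    """Every combination of the two vocabularies, as a single crossed vocabulary."""
    return [f"{a}_x_{b}" for a, b in product(vocab_a, vocab_b)]

def one_hot_cross(value_a, value_b, vocab_a, vocab_b):
    """One-hot vector over the crossed vocabulary for one pair of values."""
    key = f"{value_a}_x_{value_b}"
    return [1 if entry == key else 0 for entry in cross_vocabulary(vocab_a, vocab_b)]
```

Because each combination gets its own slot, a linear model can learn a separate weight for, say, Tuesday morning versus Tuesday afternoon.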
the min, median and max values for each feature
what does the aggregation values contain in any feature
tf.feature_column.bucketized_column
what function do you use to discretize floating point values into a smaller number of categorical bins
same padding and valid padding
what kind of padding methods are available in Keras?
True
(T/F) Different problems in the same domain may need different features
True
(T/F) In TensorFlow Playground, in the output layer, the dots are colored orange or blue depending on their original values. The background color shows what the network is predicting for a particular area. The intensity of the color shows how confident that prediction is.
True
(T/F) In the featurestore, the timestamps are an attribute of the feature values, not a separate resource type
True
(T/F) MLOps, besides testing and validating code and components, also tests and validates data, data schemas, and models.
False
(T/F) Non-linearity helps in training your model at a much faster rate and with more accuracy without the loss of your important information?
True
(T/F) One-hot encoding encodes the word to a vector where one corresponds to its position in the vocabulary and zeros to the rest.
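A minimal sketch of the encoding described in this card, assuming a hypothetical fixed vocabulary list in which every word appears once:

```python
def one_hot(word, vocabulary):
    """Vector with a 1 at the word's vocabulary position and 0 elsewhere."""
    index = vocabulary.index(word)  # raises ValueError for out-of-vocabulary words
    return [1 if i == index else 0 for i in range(len(vocabulary))]
```

The vector length equals the vocabulary size, which is why one-hot vectors become very sparse for large vocabularies.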
True, Anything in Map or FlatMap can be parallelized by the Beam execution framework.
(T/F) The Filter method can be carried out in parallel and autoscaled by the execution framework?
Numeric with meaningful magnitude, it should be known at prediction time, should be related to the objective
A good feature has what characteristics
Proportional to k, the number of dimensions in your embedding space. Proportional to the number of users
ALS and WALS create embedding tables for both users and items. Because these are held in memory, it's important to plan for their size. How big would you expect the embedding table for the users to be?
Region, display-name, worker-pool-spec
Fill in the blanks. You can use either pre-built containers or custom containers to run training jobs. Both containers require you specify settings that Vertex AI needs to run your training code, including __________, ____________, and ________.
Hidden, weights, positive, output, negative
Fill in the blanks: In the ____ layers, the lines are colored by the _____ of the connections between neurons. Blue shows a _____ weight, which means the network is using that ____ of the neuron as given. An orange line shows that the network is assigning a _____ weight.
kernel
Filter that is used to extract the features from the images
Drift detection
For which, the baseline is the statistical distribution of the features values seen in production in the recent past?
Training operationalization
If the model needs to be repeatedly retrained in the future, an automated training pipeline is also developed. Which task do we use for this?
Continuous training
In addition to CI/CD practiced by DevOps teams, MLOps introduces:
after turning your raw data into a useful feature vector
In what form can raw data be used inside ML models?
III, II, I, VI, V, IV
In what order are the following phases executed in a machine learning project? I - Selection of ML algorithm II - Data Exploration III - Definition of the business use case IV - Model monitoring V - Model operationalization VI - Model Development
Feature Cross
It is a process of combining features into a single feature, enables a model to learn separate weights for each combination of features.
Multiple inputs and outputs and models with shared layers.
The Keras Functional API can be characterized by having
0.0 and 1.0
The learning rate is a configurable hyperparameter used in the training of neural networks that has a small positive value, often in the range between _______
A standard LSTM cell includes three gates: the forget gate to forget irrelevant information, the input gate to remember relevant information, and the output gate to control what is passed on as the new hidden state.
What are the major gates in a standard LSTM (long short-term memory) cell?
Data prep, model training, model serving
What are the major stages of an end-to-end workflow to build an NLP project with vertex AI
May lead to performance issues like insufficient computing power, it will increase the input size with longer training time for an ML model
What are the possible consequences for an ML model being trained with high resolution photos with high color depth?
Intent (the topic), entity (the details), and context (the flow of the conversation).
What are the three major components that the Dialogflow API helps to identify in a conversation?
Batch serving and online serving
What are the two methods feature store offers for serving features?
kfp.v2.compiler.Compiler
What can you use to compile a pipeline?
Vertex AI Python client
What can you use to create a pipeline run on Vertex AI Pipelines?
It returns the maximum value out of all the input data values passed to a kernel
What does the max-pooling operation do in a CNN
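Max pooling can be sketched on a small 2D grid without any framework. The function name, the default 2x2 window, and the stride of 2 are assumptions chosen to match a common configuration; the sketch applies no padding.

```python
def max_pool_2d(image, pool=2, stride=2):
    """Slide a pool x pool window over a 2D list and keep the maximum of each window."""
    rows = (len(image) - pool) // stride + 1
    cols = (len(image[0]) - pool) // stride + 1
    return [
        [
            max(
                image[r * stride + i][c * stride + j]
                for i in range(pool)
                for j in range(pool)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]
```

The operation has no weights to learn; it only selects values from its input.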
Apache Beam and TensorFlow
What is TensorFlow Transform a hybrid of?
The same code you use to preprocess features in training and evaluation can also be used in serving
What is one key advantage of preprocessing your features using Apache Beam?
GRU(units)
What is the coding in Keras to build the hidden layer of a GRU (gated recurrent unit) model?
Problem definition > Data selection > Data exploration > Feature engineering > Model prototyping > Model validation
What is the correct process that data scientists use to develop the models on experimentation platform?
CBOW uses surrounding words to predict a center word, whereas skip-gram uses a center word to predict surrounding words.
What is the difference between continuous bag-of-words (CBOW) and skip-gram, the two primary techniques of word2vec?
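The difference between the two techniques shows up in how training examples are built from the same text. The sketch below only generates (input, target) pairs and does not train a model; the helper names and the window size are hypothetical.

```python
def context_windows(tokens, window=1):
    """Yield each center word with the words up to `window` positions around it."""
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        yield center, context

def cbow_examples(tokens, window=1):
    """CBOW: the surrounding words are the input, the center word is the target."""
    return [(context, center) for center, context in context_windows(tokens, window)]

def skipgram_examples(tokens, window=1):
    """Skip-gram: the center word is the input, each surrounding word is a target."""
    return [
        (center, neighbor)
        for center, context in context_windows(tokens, window)
        for neighbor in context
    ]
```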
Each data source file must not be larger than 10 GB. You can include multiple files, up to a maximum amount of 100 GB.
What is the maximum size of a CSV during batch prediction
Sequence-to-sequence problems such as machine translation where you translate sentences to another language
What is the problem that an encoder-decoder mainly solves?
A large number of parameters comes from the dense layers at the end, and the convolutional layers contain far fewer parameters
What is the proportion of the number of parameters in the entire network while computing
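The claim that dense layers hold most of the parameters can be checked with the standard parameter-count formulas. The small network in the usage note is a hypothetical example, not one taken from the course.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter (kernel area times input channels) plus one bias, per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input plus one bias, per unit."""
    return (inputs + 1) * units

def pooling_params():
    """Pooling layers have no learnable parameters."""
    return 0
```

For a 28x28x1 input, a 3x3 convolution with 32 filters has `conv2d_params(3, 3, 1, 32)` = 320 parameters. After unpadded convolution and 2x2 pooling the feature map is 13x13x32 = 5408 values, so a following dense layer with 128 units has `dense_params(5408, 128)` = 692352 parameters.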
Connectors allow you to output the results of a pipeline to a specific data sink like Bigtable, Google Cloud Storage, flat file, BigQuery, and more.
What is the purpose of a Cloud Dataflow connector? .apply(TextIO.write().to("gs://..."));
To ensure that the models are good before moving them into a production/staging env
What is the responsibility of model evaluation and validation components
AutoML
What method do you use to create and train a model with minimal technical effort to quickly prototype models and explore new datasets before investing in development?
5%
What percent of system code does the ML model account for?
You should provide the BigQuery Data Editor role to the Vertex AI service account in that project.
What should be done if the source table is in a different project?
negative transfer learning
When knowledge is transferred from a less related source, the target performance might be degraded
__init__.py
When you package up a TensorFlow model as a Python Package, what statement should every Python module contain in every folder?
Transformation
When you use the data to train a model, Vertex AI examines the source data type and feature values and infers how it will use that feature in model training. This is called the ________________ for that feature
Feature Registry
Where are the features registered?
Padding
Which CNN model parameter helps to maintain the same size across the input and the output of the convolutional layer
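The effect of padding and stride on the output size along one spatial dimension can be sketched with the formulas Keras uses for its "same" and "valid" modes. The function name is hypothetical, and dilation is assumed to be 1.

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Output length along one dimension of a convolutional layer."""
    if padding == "same":
        # Input is padded so that the output covers every stride position.
        return math.ceil(input_size / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (input_size - kernel_size) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")
```

With stride 1, "same" padding keeps a 28-pixel input at 28, while "valid" padding with a 3-wide kernel shrinks it to 26. A stride of 2 roughly halves either result.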
MOD(ABS(FARM_FINGERPRINT(field)),10) < 7
Which command allows you to split your dataset to get 70% of it for training in a repeatable fashion?
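The same repeatable-split idea can be sketched outside BigQuery. This sketch uses SHA-256 from the standard library as a stand-in hash, which is an assumption; it is not FARM_FINGERPRINT and will not assign rows to the same buckets as the SQL expression above.

```python
import hashlib

def split_bucket(value, buckets=10):
    """Map a field value to a stable bucket number in [0, buckets)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets

def assign_split(value):
    """Buckets 0-6 (about 70% of distinct values) go to training, the rest to evaluation."""
    return "train" if split_bucket(value) < 7 else "eval"
```

Because the bucket depends only on the field value, the same row lands in the same split on every run, and all rows sharing that value stay together.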
Dataflow
Which data cloud processing option can be used for transforming large unstructured data in Google Cloud?
Pixel randomization
Which factor does not affect the accuracy of the deep neural network?
task.py
Which file is the entry point to your code that vertex AI will start and contains details such as how to parse command line arguments and where to write model outputs
Ability to scale to a large dataset, find good features, it should be able to preprocess with vertex AI
Which of the following are the requirements to build an effective machine learning model?
Feature ingestion
Which of the following is the process of importing feature values computed by your feature engineering jobs into a featurestore?
F1 Score
Which of the following metrics can be used to find a suitable balance between precision and recall in a model?
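Precision, recall, and the F1 score can all be computed from confusion-matrix counts. The function name is hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    # Precision: share of predicted positives that are correct.
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    # Recall: share of actual positives that were found.
    recall = true_pos / actual_pos if actual_pos else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

With 8 true positives, 2 false positives, and 8 false negatives, precision is 0.8 and recall is 0.5; the F1 score sits between them, closer to the lower value.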
REINFORCE, proximal policy optimization (PPO), Deep deterministic policy gradient (DDPG)
You would like to train an agent to drive a car. The action space consists of the following variables: the acceleration (between 0 and 300), the angular degree of turn or tilt (between 0 and 180 degrees), and the direction (either forward or reverse). Select the three algorithms which are appropriate. Credit is given for selecting the correct three.
hashing
______ layer is not trainable
sequential model
__________ is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor
Loss function
__________ measures how accurate the model is during training
Embedding
___________ is a weighted sum of the feature crossed values, ________ is a handy adapter that allows a network to incorporate sparse or categorical data, the number of _______ is the hyperparameter to your ML model
Online serving
____________ is for low-latency data retrieval of small batches of data for real time processing
Preprocessing
____________ suppresses unwanted distortions and enhances the required features that are essential for the application
ML.BUCKETIZE
________________ is a pre-processing function that creates buckets by returning a STRING as the bucket name after numerical_expression is split into buckets by array_split_points. It bucketizes a continuous numerical feature into a string feature with bucket names as the value
Velocity
refers to how quickly data is generated and how quickly that data moves
Volume
refers to the amount of data that exists
Veracity
refers to the quality and accuracy of data
Data Augmentation
set of techniques to artificially increase the amount of data by generating new data points from existing data
Choose an existing processor created for a specialized task, choose an existing processor created for general purposes, create a custom processor and build it on your own
what are the options to create a processor for document AI
Cleaning tools, monitoring tools
Categories of data quality tools
Confusion matrix
One of the key tools to help in understanding inclusion and how to introduce inclusion across different kinds of groups across your data is by understanding _______.
XGBoost
For classification or regression problems with decision trees, which of the following models is most relevant?
Prepare training data in BigQuery, train a recommendation system with BigQuery ML, use the predicted recommendations in production
3 key steps for creating a recommendation system with BQ ML
Avoid training-serving skew, avoid target leakage, provide a time signal
What are the best practices for data prep?
It reduces the time it takes to develop trained models and assess their performance
What is the main benefit of using an automated ML workflow?
Precision
A farm uses Google's machine learning technology to detect defective apples in their crop, such as those that are irregular in size or have scratches. The goal is to identify only the apples that are actually bad so that no good apples are wasted. Which metric should the model focus on?
Performance metrics are easier to understand and are directly connected to business goals
benefits of performance metrics over loss functions
Feature columns
describes how the model should use raw input data from your features dictionary
Variety
refers to diversity of data types
Recall
the fraction of retrieved instances among all relevant instances
Create ML model inside BQ
Data has been loaded into BQ, and the features have been selected and preprocessed. What should happen next when you use BQML to develop a machine learning model?
Facets
Datasets can contain hundreds of millions of data points, each consisting of hundreds (or even thousands) of features, making it nearly impossible to understand an entire dataset in an intuitive fashion. The key here is to utilize visualizations that help unlock nuances and insights in large datasets. Which tool would be most appropriate?
Custom training
Select the correct word below to fill in the blank: Vertex AI is flexible. You choose your training method. _____________ lets you create a training application optimized for your targeted outcome. You have complete control over training application functionality; you can target any objective, use any algorithm, develop your own loss functions or metrics, or do any other customization.
Speech API
The most efficient way to transcribe speech
precision
the fraction of relevant instances among all retrieved instances
Machine Learning
AutoML, Vertex AI Workbench and TF align to which stage of the data-to-AI workflow?
BQ manages the underlying structure
BQ is a fully managed data warehouse, what does fully managed refer to?
Logistic regression
Which BigQuery-supported classification model is most relevant for predicting binary results, such as True/False?
Directed Acyclic Graph (DAG)
How does TensorFlow represent numeric computations?
when loss metrics start to increase
How to decide when to stop training a model
80-10-10
Default setting in AutoML for the data split in model evaluation
Pre-Built APIs
You work for a video production company and want to use machine learning to categorize event footage, but don't want to train your own ML model. Which option can help you get started?
AutoML
Your company has a lot of data, and you want to train your own machine learning model to see what insights ML can provide. Due to resource constraints, you require a codeless solution. Which option is best?
For small datasets, train the model within the notebook instance
Your dataset is considered small, less than 5,000 rows and around 10MB. You are not using AutoML but a Jupyter Notebook instance. Which of the following is a Best Practice for Training a model with a small dataset?
True
(T/F) TensorFlow is a scalable and multi platform programming interface for implementing and running machine learning algorithms, including convenience wrappers for deep learning
Veracity
Due to several data types and sources, big data often has many data dimensions. This can introduce data inconsistencies and uncertainties. Which type of challenge might this present to data engineers?
Univariate, Bivariate
EDA is mainly performed using these methods
BigQuery ML
For a user who can use SQL, has little Machine Learning experience and wants a 'Low-Code' solution, which Machine Learning framework should they use?
In Cloud Storage
The data used to train a model can originate from any number of systems, for example, logs from an online service system, images from a local device, or documents scraped from the web. Which of the following is a Best Practice for Preparing and Storing unstructured data such as images, audio, and video?
More complex ways of connecting layers, a Cambrian explosion of computing power for training, automatic feature extraction
What differentiates deep learning networks from multilayered networks?
Observing how well a model performs against a new dataset that it hasn't seen before.
What is the best way to assess the quality of a model
mean squared error as their loss function
What is the most essential metric a regression model uses?
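As a quick illustration, mean squared error averages the squared differences between the true and predicted values. The function name and sample numbers below are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Errors are 1, 0 and -2, so the squared errors are 1, 0 and 4.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(round(mse, 4))
```

Squaring penalizes large errors more heavily than small ones, which is why one badly wrong prediction can dominate the loss.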
They can be both reshaped and sliced
What operations can be performed on tensors
Storage and Analytics
What two services does BQ provide?
Pre-trained models
What would you use to replace user input with ML?
Velocity
When you build scalable and reliable pipelines, data often needs to be processed in near-real time, as soon as it reaches the system. Which type of challenge might this present to data engineers?
tf.data.Dataset
Which API is used to build performant, complex input pipelines from simple, reusable pieces that will feed your model's training or evaluation loops?
TPU (Tensor Processing Units)
Which Google hardware innovation tailors its architecture to meet the computation needs of a domain, such as the matrix multiplication in ML?
Vertex AI Pipelines
Which Vertex AI tool automates, monitors, and governs machine learning systems by orchestrating the workflow in a serverless manner?
Workbench
Which Vertex AI service lets you access data, process data in a Dataproc cluster, train a model, share results, and more, all without leaving the JupyterLab interface?
Custom Training
Which code-based solution offered with Vertex AI gives data scientists full control over the development environment and process?
Archive storage
Which data storage class is best for storing data that needs to be accessed less than once a year, such as online backups and disaster recovery?
Pub/Sub
Which Google Cloud product is a distributed messaging service designed to ingest messages from multiple device streams, such as gaming events, IoT devices, and application streams?
Dataflow
Which Google Cloud product acts as an execution engine to process and implement data processing pipelines?
Batch Load
Which pattern describes source data that is moved into a BQ table in a single operation?
Supervised learning, logistic regression
You want to use ML to identify whether an email is spam. Which should you use?
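A rough sketch of how a logistic regression classifier turns features into a spam/not-spam decision: a weighted sum passes through a sigmoid and the resulting probability is compared with a threshold. The features, weights, bias, and threshold below are invented for illustration; in practice they would be learned from labeled emails.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_spam(features, weights, bias, threshold=0.5):
    """Return (probability of spam, boolean decision) for one email."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    prob = sigmoid(z)
    return prob, prob >= threshold


# Hypothetical features: [contains "free", contains a suspicious link]
weights = [1.5, 2.0]  # assumed values, not learned from real data
bias = -2.0

prob_spam, is_spam = predict_spam([1, 1], weights, bias)     # z = 1.5
prob_ham, is_flagged = predict_spam([0, 0], weights, bias)   # z = -2.0
print(round(prob_spam, 3), is_spam, round(prob_ham, 3), is_flagged)
```

Because the labels (spam or not spam) are known for the training emails, this is a supervised learning setup.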
Unsupervised learning, cluster analysis
You want to use machine learning to group random photos into similar groups. What do you use?
BQ ML
You work for a global hotel chain that has recently loaded some guest data into BigQuery. You have experience writing SQL and want to leverage machine learning to help predict guest trends for the next few months. Which option is best?
Value
refers to the value that big data can provide and relates directly to what organizations can do with that collected data
Categorical features preprocessing
tf.keras.layers.CategoryEncoding, tf.keras.layers.Hashing, tf.keras.layers.IntegerLookup
Check for missing data and other mistakes, gain maximum insight into the data set and its underlying structure, uncover a parsimonious model, one which explains the data with a minimum number of predictor variables
What are the objectives of EDA?
Tabular
If a dataset is presented in a Comma Separated Values (CSV) file, which is the correct data type to choose in Vertex AI?
Data preparation
Which stage of the ML workflow includes feature engineering?
There are the human biases that exist in data because data found in "the world" has existing biases with regard to properties like gender, race, and sexual orientation. For example, there may be reporting bias by our subjects because they only choose to reveal certain aspects about themselves or their opinions. We can also run into human biases which arise as part of our data collection and labeling procedures.
Human biases lead to bias in machine learning models. Unconscious biases exist in our data and exist in two forms. What are the two forms of unconscious biases in data?
Regression/Classification
If the business case is to detect fraud, which is the correct objective to choose in Vertex AI?
Framing the problem
In ML development which phase identifies your use case?
Both Labeled and Unlabeled data
Refers to the type of data used in ML models
1. Ingest streaming data, 2. Process data, 3. Visualize results
Streaming data workflow steps