SIMULATED TEST QUESTIONS - PRACTICE EXAM 3 - (please feel free to submit edits/corrections to Mike!)

You have a customer ranking ML model in production for an e-commerce site; the model used to work very well. You use GCP managed services, specifically AI Platform and Vertex AI. Suddenly, there is a noticeable degradation in the quality of the inferences. You perform various checks, but the model seems to be perfectly fine. Finally, you check the input data and notice that the frequency distributions have changed for a specific feature. Which GCP service can be helpful for you to manage features in a more organized way? A. Regularization against overfitting B. Feature Store C. Hyperparameters tuning D. Model Monitoring

B. Feature Store SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. Feature engineering means transforming input data, often strings, into a feature vector. A lot of effort goes into mapping categorical values in the best way: we have to convert strings to numeric values and define a vocabulary of possible values, usually mapped to integers. Remember that in an ML model everything must be translated into numbers, so it is easy to run into problems of this type. Vertex Feature Store is a service to organize and store ML features through a central store. It lets you share and optimize ML features important for your specific environment and reuse them at any time. All this translates into faster creation of ML services. It also helps minimize problems such as training-serving skew, which occurs when the distribution of data in production is different from that in training, often due to errors in how the features are organized. For example, training-serving skew may happen when your training data uses a different unit of measure than prediction requests, or, more generally, when you generate your training data differently than you generate the data you use to request predictions: for example, if you use an average value, and for training purposes you average over 10 days, but you average over the last month when you request a prediction. A and C are wrong because the model itself is fine, so neither regularization against overfitting nor hyperparameter tuning addresses the issue. D is wrong because Model Monitoring is suitable for detecting training-serving skew, not for organizing features. For any further detail: https://developers.google.com/machine-learning/crash-course/representation/feature-engineering https://cloud.google.com/architecture/ml-on-gcp-best-practices#use-vertex-feature-store-with-structured-data https://cloud.google.com/blog/topics/developers-practitioners/kickstart-your-organizations-ml-application-development-flywheel-vertex-feature-store

You are consulting a CIO of a big firm regarding organization and cost optimization for his company's ML projects in GCP. He asked: "How can I get the most from ML services and the least costs?" What are the best practices recommended by Google in this regard? A. Use Notebooks as ephemeral instances B. Set up an automatic shutdown routine C. Use Preemptible VMs for long-running interruptible tasks D. Get monitoring alerts about GPU usage E. All of the above

E. All of the above A is OK because Notebooks are used for a limited time, but they reserve VMs and other resources, so you have to treat them as ephemeral instances, not as long-living ones. B is OK because you can configure an automatic shutdown routine when your instance is idle, saving money. C is OK because Preemptible VMs are far cheaper than normal instances and are suitable for large, long-running (batch) experiments that can tolerate interruption. D is OK because you can set up the GPU metrics reporting script; this is important because GPUs are expensive. For any further detail: Best practices for performance and cost optimization for machine learning

With your team, you have to decide the strategy for implementing an online forecasting model in production. This model needs to work with both a web interface and Dialogflow / Google Assistant. A lot of requests are expected. You are concerned that the final system will not be efficient and scalable enough. You are looking for the simplest and most managed GCP solution. Which of these can be the solution? A. AI Platform Prediction B. GKE and TensorFlow C. VMs and Autoscaling Groups with Application LB D. Kubeflow

A. AI Platform Prediction SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. The AI Platform Prediction service is fully managed and automatically scales machine learning models in the cloud. The service supports both online prediction and batch prediction. B and C are wrong because they are not managed services. D is wrong because Kubeflow is not a managed service. It is used in AI platforms and lets you deploy ML systems in various environments. For any further detail: https://cloud.google.com/blog/products/ai-machine-learning/scaling-machine-learning-predictions https://cloud.google.com/ai-platform/prediction/docs/overview https://cloud.google.com/blog/topics/developers-practitioners/cook-your-own-ml-recipes-ai-platform

Your client has a large e-commerce Website that sells sports goods and especially scuba diving equipment. It has a seasonal business and has collected a lot of sales data from its structured ERP and market trend databases. It wants to predict the demand of its customers both to increase business and improve logistics processes. What managed and fast-to-use GCP products can be used for these types of models (pick 2)? A. Auto ML B. BigQuery ML C. KubeFlow D. TFX

A. Auto ML B. BigQuery ML This is clearly a forecasting problem. Obviously, GCP offers a large number of models and platforms, but the fastest and most immediate options are AutoML and BigQuery ML; both support quick creation and fine-tuning of models. C and D are wrong because Kubeflow and TFX are open-source libraries that work with TensorFlow, so they are neither managed nor as simple. Moreover, they can work in environments outside GCP, which can be a big advantage, but it is not among our requirements. Kubeflow is a system for deploying, scaling and managing complex TensorFlow systems on Kubernetes. TFX is a platform that allows you to create scalable production ML pipelines for TensorFlow projects. For any further detail: https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create#model_type https://ai.googleblog.com/2020/12/using-automl-for-time-series-forecasting.html
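
As a quick illustration of how immediate BigQuery ML is for this kind of demand forecasting, here is a minimal sketch run through the BigQuery Python client; the project, dataset, table and column names (my-project, sales_ds, daily_sales, order_date, units_sold) are hypothetical.

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project id
sql = """
CREATE OR REPLACE MODEL `sales_ds.demand_forecast`
OPTIONS(model_type = 'ARIMA_PLUS',
        time_series_timestamp_col = 'order_date',
        time_series_data_col = 'units_sold') AS
SELECT order_date, units_sold
FROM `sales_ds.daily_sales`
"""
client.query(sql).result()  # waits for the training job to finish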

Your team is designing a financial analysis model for a major Bank. The requirements are: various banking applications will send transactions to the new system, both in real time and in batch, in a standard/normalized format; the data will be stored in a repository; structured data will be trained and retrained; labels are drawn from the data. You need to prepare the model quickly and decide to use AutoML for structured data. Which GCP Services could you use (pick 3)? A. AutoML Tables B. AI Platform C. BigQuery ML D. Vertex AI

A. AutoML Tables C. BigQuery ML D. Vertex AI SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. AutoML Tables aims to automatically build and deploy models on your data in the fastest way possible. It is integrated within BigQuery ML and is now available in the unified Vertex AI. Vertex AI includes an AI Platform, too. But AI Platform alone doesn't have any AutoML services, so B is wrong. For any further detail: https://cloud.google.com/automl-tables/docs/beginners-guide https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-automl https://cloud.google.com/vertex-ai/docs/beginner/beginners-guide#text

You are working with Vertex AI, the managed ML Platform in GCP. You want to leverage Explainable AI to understand which are the most essential features and how they influence the model. For what kind of model may you use Vertex Explainable AI (pick 3)? A. AutoML tables B. Image Classification C. DNN D. Decision Tree

A. AutoML tables B. Image Classification C. DNN SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. Deep Learning models are known to give little insight into how they work in detail. Vertex Explainable AI helps explain how a model reaches its predictions, both for classification and regression tasks. So these functions are useful for testing, tuning, finding biases and thus improving the process. You can get explanations from Vertex Explainable AI both for online and batch inference, but only for these ML models: structured data models (AutoML, classification and regression) and custom-trained models with tabular data and images. In the Evaluate section of the Google Cloud Console, you can find these insights (feature importance graph). D is wrong because Decision Tree models are explainable without any sophisticated tool. Vertex Explainable AI uses three methods for feature attributions: sampled Shapley (uses scores for each feature and their permutations), integrated gradients, and XRAI (an extension of the integrated gradients method that creates a saliency map with overlapping regions of the image, like in the picture). For any further detail: https://cloud.google.com/resources/mlops-whitepaper https://cloud.google.com/vertex-ai/docs/explainable-ai/overview

Your team is working on a great number of ML projects. You need to appropriately collect and transform data and then create and tune your ML models. Later, these procedures will be inserted into an MLOps flow and therefore will have to be automated and as simple as possible. What are the methodologies / services recommended by Google (pick 3)? A. Dataflow B. BigQuery C. Tensorflow D. Cloud Fusion E. Dataprep

A. Dataflow B. BigQuery C. Tensorflow SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. Dataflow is an optimal solution for compute-intensive preprocessing operations because it is a fully managed autoscaling service for batch and streaming data processing. BigQuery is a strategic tool for GCP: big data at scale, machine learning and preprocessing with plain SQL are all important factors. TensorFlow has many tools for data preprocessing and transformation operations. The main techniques are aimed at feature engineering (crossed_column, embedding_column, bucketized_column) and data transformation (the tf.Transform library). D is wrong because Cloud Data Fusion is for ETL with a GUI, so with little or no programming. E is wrong because Dataprep is a tool for visual data cleaning and preparation. For any further detail: Preparing data and managing datasets | Vertex AI https://cloud.google.com/architecture/data-preprocessing-for-ml-with-tf-transform-pt1 https://cloud.google.com/architecture/data-preprocessing-for-ml-with-tf-transform-pt1#where_to_do_preprocessing https://cloud.google.com/blog/topics/developers-practitioners/architect-your-data-lake-google-cloud-data-fusion-and-composer
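
To make the TensorFlow feature engineering functions named above concrete, here is a minimal sketch using the tf.feature_column API; the feature names and vocabularies are hypothetical.

import tensorflow as tf

country = tf.feature_column.categorical_column_with_vocabulary_list(
    "country", ["US", "IT", "FR"])
language = tf.feature_column.categorical_column_with_vocabulary_list(
    "language", ["en", "it", "fr"])
age = tf.feature_column.numeric_column("age")

# The three techniques mentioned in the answer:
age_buckets = tf.feature_column.bucketized_column(age, boundaries=[18, 35, 60])
country_x_language = tf.feature_column.crossed_column(
    [country, language], hash_bucket_size=100)
country_embedding = tf.feature_column.embedding_column(country, dimension=8)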

You work as a junior Data Scientist in a consulting company, and you work with several ML projects. You need to properly collect and transform data and then work on your ML models. You want to identify the services for data transformation that are most suitable for your needs. You need automatic procedures triggered before training. What are the methodologies / services recommended by Google (pick 3)? A. Dataflow B. BigQuery C. Tensorflow D. Cloud Composer

A. Dataflow B. BigQuery C. Tensorflow SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. Google primarily recommends BigQuery, because this service allows you to efficiently perform both data and feature engineering operations with standard SQL. In other words, it is suitable both to correct, divide and aggregate the data, and to process the features (fields) by merging, normalizing and categorizing them in an easy way. For more advanced transformations, for example window-aggregation feature transformations in streaming mode, the solution is Dataflow. It is also possible to perform transformations on the data with TensorFlow (tf.Transform), such as creating new features: crossed_column, embedding_column, bucketized_column. It is important to note that with TensorFlow these transformations become part of the model and will be integrated into the graph produced when the SavedModel is created. Look at the summary table at this link for a complete overview. D is wrong because Cloud Composer is often used in ML processes, but as a workflow tool, not for data transformation. For any further detail: Preparing data and managing datasets | Vertex AI https://cloud.google.com/composer https://cloud.google.com/architecture/setting-up-mlops-with-composer-and-mlflow
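
For the tf.Transform part, a minimal sketch of a preprocessing_fn is shown below; the feature names (income, country) are hypothetical, and the transformations become part of the exported graph, which is exactly why they stay consistent between training and serving.

import tensorflow_transform as tft

def preprocessing_fn(inputs):
    # Analyzers (mean, stddev, vocabulary) run over the whole dataset once,
    # then the transformation is embedded in the serving graph.
    outputs = {}
    outputs["income_scaled"] = tft.scale_to_z_score(inputs["income"])
    outputs["country_id"] = tft.compute_and_apply_vocabulary(inputs["country"])
    return outputs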

You are a junior Data Scientist, and you work in a Governmental Institution. You are preparing data for a linear regression model for Demographic research. You need to choose and manage the correct features. Your input data is in BigQuery. You know very well that you have to avoid multicollinearity and optimize categories. So you need to group some features together and create macro categories. In particular, you have to join country and language into one variable and divide the data into 5 income classes. Which of the following options can you use (pick 2)? A. FEATURE_CROSS B. ARRAY_CONCAT C. QUANTILE_BUCKETIZE D. ST_AREA

A. FEATURE_CROSS C. QUANTILE_BUCKETIZE A feature cross is a new feature that joins two or more input features together. (The term cross comes from cross product.) Usually, new numeric features are created by multiplying two or more other features. QUANTILE_BUCKETIZE groups a continuous numerical feature into categories, with the bucket name as the value, based on quantiles. Example: ML.FEATURE_CROSS(STRUCT(country, language)) AS origin, and ML.QUANTILE_BUCKETIZE over income → income_class. B is wrong because ARRAY_CONCAT joins one or more arrays (numbers or strings) into a single array. D is wrong because ST_AREA returns the number of square meters covered by a GEOGRAPHY area. For any further detail: https://towardsdatascience.com/assumptions-of-linear-regression-fdb71ebeaa8b https://cloud.google.com/bigquery-ml/docs/bigqueryml-transform
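
A minimal sketch of how the two functions could appear inside a BigQuery ML TRANSFORM clause, submitted through the Python client; the dataset, table and column names are hypothetical.

from google.cloud import bigquery

client = bigquery.Client()
sql = """
CREATE OR REPLACE MODEL `demo_ds.demographic_model`
TRANSFORM(
  ML.FEATURE_CROSS(STRUCT(country, language)) AS origin,
  ML.QUANTILE_BUCKETIZE(income, 5) OVER() AS income_class,
  label
)
OPTIONS(model_type = 'linear_reg', input_label_cols = ['label']) AS
SELECT country, language, income, label
FROM `demo_ds.population`
"""
client.query(sql).result()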

Your team has prepared a Multiclass logistic regression model with tabular data in the Vertex AI with Auto ML environment. Everything went very well. You appreciated the convenience of the platform and AutoML. What other types of models can you implement with AutoML (Pick 3)? A. Image Data B. Text Data C. Cluster Data D. Video Data

A. Image Data B. Text Data D. Video Data SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. AutoML on Vertex AI lets you build a code-free model; you have to provide the training data. The types of models that AutoML on Vertex AI can build use image data, tabular data, text data, and video data. All the detailed information is at the links below. C is wrong because Cluster Data relates to unsupervised learning, which is not supported by AutoML. For any further detail: https://cloud.google.com/vision/automl/docs/beginners-guide https://cloud.google.com/vertex-ai/docs/start/automl-model-types

Your team is working with a great number of ML projects, especially with Tensorflow. You recently prepared a DNN model for image recognition that works well and is about to be rolled out in production. Your manager asked you to demonstrate the inner workings of the model. This is a problem for you: you know that the model works well, but you don't have its explainability. Which of these techniques could help you? A. Integrated Gradient B. LIT C. WIT D. PCA

A. Integrated Gradient SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. Integrated Gradients is an explainability technique for deep neural networks that gives information about the model's predictions. It works by highlighting feature importance: it computes the gradient of the model's prediction output with respect to its input features, without any modification to the original model. In the picture, you can see that it perturbs the inputs and computes attributions, so it can derive the feature importances for the input image. You can use tf.GradientTape to compute the gradients. B is wrong because LIT is only for NLP models. C is wrong because the What-If Tool is only for classification and regression models with structured data. D is wrong because Principal Component Analysis (PCA) transforms and reduces the number of features by creating new variables from linear combinations of the original variables; the new features will all be independent of each other. For any further detail: TensorFlow Core https://towardsdatascience.com/understanding-deep-learning-models-with-integrated-gradients-24ddce643dbf
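
A simplified sketch of integrated gradients with tf.GradientTape is shown below; it assumes an already-trained Keras model that outputs class scores and a single image tensor of shape (H, W, C), and it illustrates the idea only, not Vertex Explainable AI's implementation.

import tensorflow as tf

def integrated_gradients(model, baseline, image, steps=50):
    # Class predicted for the actual image.
    target_class = int(tf.argmax(model(image[None])[0]))
    # Interpolate between a baseline (e.g. an all-black image) and the input.
    alphas = tf.reshape(tf.linspace(0.0, 1.0, steps + 1), (-1, 1, 1, 1))
    interpolated = baseline[None] + alphas * (image - baseline)[None]
    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        scores = model(interpolated)[:, target_class]
    grads = tape.gradient(scores, interpolated)
    # Riemann approximation of the integral of the gradients.
    avg_grads = tf.reduce_mean((grads[:-1] + grads[1:]) / 2.0, axis=0)
    return (image - baseline) * avg_grads  # per-pixel attributions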

You are working with Vertex AI, the managed ML Platform in GCP. You are dealing with custom training. You are studying how the job progresses during the training service lifecycle. Which of the following states is not correct? A. JOB_STATE_ACTIVE B. JOB_STATE_RUNNING C. JOB_STATE_QUEUED D. JOB_STATE_ENDED

A. JOB_STATE_ACTIVE Queueing a new job: when you create a CustomJob or HyperparameterTuningJob, the job is in the JOB_STATE_QUEUED state. When a training job starts, Vertex AI schedules workers according to the configuration and starts running code as soon as a worker becomes available. When all the workers are available, the job state becomes JOB_STATE_RUNNING. A training job ends successfully when its primary replica exits with exit code 0; all the other workers are then stopped and the state becomes JOB_STATE_ENDED. So A is the answer simply because this state doesn't exist; all the other states are real. Each replica in the training cluster is given a single role or task in distributed training. For example: Primary replica: only one replica, whose main task is to manage the workers. Worker(s): replicas that do part of the work. Parameter server(s): replicas that store model parameters (optional). Evaluator(s): replicas that evaluate your model (optional). For any further detail: https://cloud.google.com/vertex-ai/docs/training/custom-training https://cloud.google.com/vertex-ai/docs/training/distributed-training

You are a junior data scientist working on a logistic regression model to break down customer text messages into important/urgent and important/not urgent. You want to choose the best loss function to evaluate your model's performance. Which of the following is the optimal methodology? A. Log Loss B. Mean Square Error C. Mean Absolute Error D. Mean Bias Error E. Softmax

A. Log Loss SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. With a logistic regression model, the optimal loss function is the log loss. The intuitive explanation is that when you want to emphasize the loss of bigger mistakes, you need a way to penalize such differences. The square loss is often used for this, but in the case of probabilistic values (between 0 and 1), squaring decreases the values; it does not make them bigger. With a logarithmic transformation, the process is reversed: decimal values get bigger. In addition, logarithmic transformations do not modify the minimum and maximum characteristics (they are monotonic functions). These are some of the reasons why they are widely used in ML. Pay attention to the difference between a loss function and ROC/AUC, which is useful as a measure of how well the model can discriminate between two categories: you may have two models with the same AUC but different losses. B is wrong because Mean Square Error, as explained, would not penalize higher errors enough with probabilistic values. C is wrong because Mean Absolute Error takes the absolute value of the difference between predictions and actual outcomes, so it would not emphasize higher errors. D is wrong because Mean Bias Error takes just the difference between predictions and actual outcomes, so positive and negative differences cancel out; it is used to calculate the average bias in the model. E is wrong because softmax is used in multi-class classification models, which is clearly not suitable for a binary-class logarithmic loss. For any further detail: https://www.kaggle.com/dansbecker/what-is-log-loss https://developers.google.com/machine-learning/crash-course/logistic-regression/model-training https://en.wikipedia.org/wiki/Monotonic_function https://datawookie.dev/blog/2015/12/making-sense-of-logarithmic-loss/
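
A minimal numeric sketch of the log loss itself, to show how confident wrong predictions are penalized much more than with a squared error; the labels and predictions are made up.

import numpy as np

def log_loss(y_true, y_pred, eps=1e-15):
    p = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1])
y_pred = np.array([0.9, 0.2, 0.6])
print(log_loss(y_true, y_pred))  # about 0.28; a confident mistake would raise it sharply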

You are starting to operate as a Data Scientist and speaking with your mentor who asked you to prepare a simple model with a lazy learning algorithm. The problem is that you don't know the meaning of lazy learning; so you looked for it. Which of the following methods uses lazy learning? A. Naive Bayes B. K-Nearest Neighbors C. Logistic Regression D. Simple Neural Networks E. Semi-supervised learning

A. Naive Bayes B. K-Nearest Neighbors Lazy learning means that the algorithm only stores the training data without learning a function; the stored data is then used to evaluate a new query point. K-nearest neighbors is a simple supervised algorithm for both classification and regression problems. You begin with data that is already classified, and a new example is classified by looking at its k nearest classified points. The number k is the most important hyperparameter. Naive Bayes is a classification algorithm; the features have to be independent, and it requires a small amount of training data. C and D are wrong because for both Neural Networks and Logistic Regression you have to train the model and figure out the parameters of a specific function that best fits the data before inference. E is wrong because semi-supervised learning is a family of classification methods that use labeled and unlabeled data and organize examples based on similarities and clustering; they have to set up a model and find parameters with training jobs. For any further detail: https://towardsdatascience.com/all-machine-learning-algorithms-you-should-know-in-2021-2e357dd494c7 https://towardsdatascience.com/k-nearest-neighbors-knn-algorithm-23832490e3f4 https://machinelearningmastery.com/parametric-and-nonparametric-machine-learning-algorithms/ https://en.wikipedia.org/wiki/Lazy_learning
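
A minimal scikit-learn sketch that makes the "lazy" behaviour visible: fit() only stores the data, and the neighbors vote at prediction time (the toy data is made up).

from sklearn.neighbors import KNeighborsClassifier

X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_train = [0, 0, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)          # no function is learned, the data is just stored
print(knn.predict([[0.9, 0.8]]))   # the 3 nearest stored points vote now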

You work in a large company that produces luxury cars. The upcoming models will have a control unit capable of collecting data on mileage and technical status to allow intelligent management of maintenance by both the customer and the service centers. Every day a small batch of data will be sent, collected and processed in order to provide customers with vehicle-health management and push notifications in case of important messages. Which GCP products are the most suitable for this project (pick 3)? A. Pub/Sub B. DataFlow C. Dataproc D. Firebase Messaging

A. Pub/Sub B. DataFlow D. Firebase Messaging SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. The best products are: Pub/Sub for technical data messages, Dataflow for data management both in streaming and in batch mode, and Firebase Messaging for push notifications. Dataflow manages data pipelines as directed acyclic graphs (DAGs) of transformations (PTransforms) on data (PCollections). The same pipeline can activate multiple PTransforms, and all the processing can be performed both in batch and in streaming mode. So, in our case of streaming data, Dataflow can: serialize input data, preprocess and transform data, call the inference function, and get the results and postprocess them. C is wrong because Dataproc is the managed Apache Hadoop environment for big data analysis, usually for batch processing. For any further detail: https://cloud.google.com/architecture/processing-streaming-time-series-data-overview https://cloud.google.com/blog/products/data-analytics/ml-inference-in-dataflow-pipelines https://github.com/GoogleCloudPlatform/dataflow-sample-applications/tree/master/timeseries-streaming

As a Data Scientist, you are involved in various projects in an important retail company. You prefer to use, whenever possible, simple and easily explained algorithms. Where you can't get satisfactory results, you adopt more complex and sophisticated methods. Your manager told you that you should try ensemble methods. Intrigued, you did some research. Which of the following are ensemble-type algorithms (pick 3)? A. Random Forests B. DCN C. Decision Tree D. XGBoost E. Gradient Boost

A. Random Forests D. XGBoost E. Gradient Boost SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. Ensemble learning is performed by multiple learning algorithms working together for higher predictive performance. Examples of ensemble learning are: random forests, AdaBoost, gradient boost, and XGBoost. Two main concepts for combining algorithms: bootstrap sampling uses random samples and selects the best of them; bagging puts together selected random samples to achieve a better result. Random forests are made of multiple decision trees, with random sampling, a subset of variables and optimization techniques at each step (voting for the best models). AdaBoost is built with multiple decision trees, too, with the following differences: it creates stumps, that is, trees with only one node and two leaves; stumps with less error win; ordering is built in such a way as to reduce errors. Gradient Boost is also built with multiple decision trees, with the following differences from AdaBoost: trees instead of stumps; it uses a loss function to minimize errors; trees are selected to predict the difference from actual values. XGBoost is currently very popular. It is similar to Gradient Boost with the following differences: leaf node pruning, that is, regularization in order to keep the best nodes for generalization; Newton boosting instead of gradient descent, so math-based and faster; reduction of the correlation between trees with an additional randomization parameter; an optimized algorithm for tree penalization. B and C are wrong because Deep & Cross Networks are a kind of neural network and Decision Trees are flowchart-like structures with a series of tests on the nodes; each of them is a single method, not an ensemble. For any further detail: https://towardsdatascience.com/all-machine-learning-algorithms-you-should-know-in-2021-2e357dd494c7
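
A minimal scikit-learn sketch contrasting the two ensemble families mentioned above (bagging of trees vs boosting); the synthetic dataset is only for illustration.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Bagging: many trees on random samples / random feature subsets, then voting.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Boosting: each new tree corrects the errors of the previous ones.
gb = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X, y)

print(rf.score(X, y), gb.score(X, y))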

You work as a Data Scientist for a major banking institution that recently completed the first phase of migration in GCP. You now have to work in the GCP Managed Platform for ML. You need to deploy a custom model with Vertex AI so that it will be available for online predictions. Which is the correct procedure (pick 2)? A. Save the model in a Docker container B. Set a VM with a GPU processor C. Use TensorFlow Serving D. Create an endpoint and deploy to that endpoint

A. Save the model in a Docker container D. Create an endpoint and deploy to that endpoint SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. AI Platform / Vertex Prediction is a managed serving platform that supports both CPU and, optionally, GPU. Its main functions cover infrastructure setup, maintenance and management. Its main elements are the base image and the custom container image of the model (answer A). AI Platform Prediction uses an architectural paradigm that is based on immutable instances of models and model versions. The model server can be direct (direct access to the model server) or with a listener between the service and the model server. Machine types may be configured with a different number of virtual CPUs (vCPUs) per node, the desired amount of memory per node, and support for GPUs, which you can add to some machine types. B is wrong because you don't need to set up any specific VM; you just point out the configuration and Vertex manages everything. C is wrong because TensorFlow Serving is used under the hood, but you don't need to call its functions explicitly. For any further detail: Vertex Prediction AI Platform Prediction: Custom container concepts https://www.tensorflow.org/tfx/guide/serving https://cloud.google.com/vertex-ai/docs/general/deployment https://cloud.google.com/architecture/ai-platform-prediction-custom-container-concepts
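
A minimal sketch of the two steps with the Vertex AI Python SDK, assuming a custom serving container image has already been pushed to a registry; the project, region and image URI are hypothetical.

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Step A: register the model packaged in its Docker serving container.
model = aiplatform.Model.upload(
    display_name="custom-model",
    serving_container_image_uri="us-docker.pkg.dev/my-project/repo/serve:latest",
)

# Step D: create an endpoint and deploy the model to it for online predictions.
endpoint = aiplatform.Endpoint.create(display_name="custom-endpoint")
model.deploy(endpoint=endpoint, machine_type="n1-standard-4")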

You work for an industrial company that wants to improve its quality system. It has developed its own deep neural network model with Tensorflow to identify the semi-finished products to be discarded, using images taken from the production lines in the various production phases. You need to monitor the performance of your models and make them faster. Which is the best solution that you can adopt? A. TFProfiler B. TF function C. TF Trace D. TF Debugger E. TF Checkpoint

A. TFProfiler SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. TensorFlow Profiler is a tool for checking the performance of your TensorFlow models and helping you obtain an optimized version. In TensorFlow 2, the default is eager execution, so one-off operations are faster, but recurring ones may be slower; hence you need to profile and optimize the model. B is wrong because tf.function is a transformation tool used to make graphs out of your programs; it helps create performant and portable models but is not a profiling tool. C is wrong because TF tracing lets you record TensorFlow Python operations in a graph. D is wrong because TF debugging refers to Debugger V2 and creates a log of debug information. E is wrong because Checkpoints catch the value of all parameters in a serialized SavedModel format. For any further detail: https://www.tensorflow.org/guide/profiler https://cloud.google.com/tpu/docs/cloud-tpu-tools#capture_profile https://www.tensorflow.org/tensorboard/debugger_v2 https://www.tensorflow.org/guide/checkpoint
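
A minimal sketch of the two common ways to capture a profile, assuming TensorFlow 2.x and a hypothetical log directory.

import tensorflow as tf

# Programmatic capture around a few training steps.
tf.profiler.experimental.start("gs://my-bucket/profiler-logs")  # hypothetical path
# ... run some training steps here ...
tf.profiler.experimental.stop()

# Or let the TensorBoard callback profile batches 10 to 20 during model.fit().
tb_callback = tf.keras.callbacks.TensorBoard(log_dir="logs", profile_batch=(10, 20))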

Your company runs an e-commerce site. You produced static deep learning models with Tensorflow that process Analytics-360 data. They have been in production for some time. Initially, they gave you excellent results, but then gradually, the accuracy has decreased. You are using Compute Engine and GKE. You decided to use a library that lets you have more control over all processes, from development up to production. Which tool is the best one for your needs? A. TFX B. Vertex AI C. SageMaker D. Kubeflow

A. TFX SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. TensorFlow Extended (TFX) is a set of open-source libraries to build and execute ML pipelines in production. Its main functions are: metadata management, model validation, deployment, and production execution. The libraries can also be used individually. B is wrong because Vertex AI / AI Platform is an integrated suite of ML managed products, and you are looking for a library. Its main functions are: train an ML model, evaluate and tune the model, deploy models, manage predictions (batch, online and monitoring), and manage model versions (workflows and retraining). C is wrong because SageMaker is a managed product in AWS, not GCP. D is wrong because Kubeflow Pipelines doesn't deal with production control. Kubeflow Pipelines is an open-source platform designed specifically for creating and deploying ML workflows based on Docker containers. Its main features are: using packaged templates in Docker images in a K8s environment, managing your various tests / experiments, simplifying the orchestration of ML pipelines, and reusing components and pipelines. For any further detail: https://www.tensorflow.org/tfx

Your team works for an international company with Google Cloud. You develop, train and deploy different ML models. You use a lot of tools and techniques and you want to make your work leaner, faster and more efficient. Now you have the problem that you have to create a model for recognizing photographic images related to collaborators and consultants. You have to do it quickly, and it has to be an R-CNN model. You don't want to start from scratch. So you are looking for something that can help you and that can be optimal for the GCP platform. Which of these tools do you think can help you? A. TensorFlow-hub B. GitHub C. GCP Marketplace Solutions D. BigQueryML Open

A. TensorFlow-hub SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. TensorFlow Hub is a ready-to-use repository of trained machine learning models. It lets you reuse advanced trained models with minimal code, and the ML models are optimized for GCP. B is wrong because GitHub is public and hosts any kind of code. C is wrong because GCP Marketplace Solutions lets you select and deploy software packages from vendors. D is wrong because BigQueryML Open is related to Open Data. For any further detail: TensorFlow Hub: https://www.tensorflow.org/hub
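
A minimal sketch of reusing a pre-trained image model from TensorFlow Hub (a simple feature-extractor classifier here for brevity rather than a full R-CNN); the module handle is one public example, and the three output classes are hypothetical.

import tensorflow as tf
import tensorflow_hub as hub

feature_extractor = hub.KerasLayer(
    "https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/5",
    trainable=False)  # reuse the pre-trained weights as-is

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(224, 224, 3)),
    feature_extractor,
    tf.keras.layers.Dense(3, activation="softmax"),  # hypothetical classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")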

You are starting to operate as a Data Scientist and are working on a deep neural network model with Tensorflow to optimize the level of customer satisfaction for after-sales services with the goal of creating greater client loyalty. You have to follow the entire lifecycle: model development, design, and training, testing, deployment, and retraining. You are looking for UI tools that can help you work and solve all issues faster. Which solutions can you adopt (pick 3)? A. Tensorboard B. Jupyter notebooks C. KFServing D. Kubeflow UI E. Vertex AI

A. Tensorboard B. Jupyter notebooks D. Kubeflow UI SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. TensorBoard is aimed at model creation and experimentation: profiling, monitoring metrics, weights and biases, examining the model graph, and working with embeddings. Jupyter notebooks are a wonderful tool to develop, experiment, and deploy, and give you access to the latest data science and machine learning frameworks. The Kubeflow UIs are for ML pipelines and include visual tools for: pipelines dashboards, hyperparameter tuning, the artifact store, and Jupyter notebooks. C is incorrect because KFServing is an open-source library for Kubernetes that enables serverless inferencing. It works with TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX to solve issues linked to production model serving; so, no UI. E is incorrect because Vertex AI is a suite of services that combines AutoML and AI Platform - you can use both AutoML training and custom training in the same environment. For any further detail: https://www.tensorflow.org/tensorboard https://www.kubeflow.org/docs/components/kfserving/kfserving/ https://cloud.google.com/vertex-ai/docs/pipelines/visualize-pipeline https://www.kubeflow.org/docs/components/central-dash/overview/

Your team is working on a great number of ML projects for an international consulting firm. The management has decided to store most of the data to be used for ML models in BigQuery. The motivation is that BigQuery allows for easy preprocessing and transformations with standard SQL; it is highly structured, so it offers efficiency, integration and security. Your team must create and modify code to directly access BigQuery data for building models in different environments. What are the tools you can use (pick 3)? A. Tf.data.dataset B. BigQuery Omni C. BigQuery Python client library D. BigQuery I/O Connector

A. Tf.data.dataset C. BigQuery Python client library D. BigQuery I/O Connector The tf.data.dataset reader for BigQuery is the way to connect directly to BigQuery from TensorFlow or Keras. The BigQuery I/O Connector is the way to connect directly to BigQuery from Dataflow. For any other framework, you can use the BigQuery Python client library. B is wrong because BigQuery Omni is a multi-cloud analytics solution: with it you can access, from BigQuery, data across Google Cloud, Amazon Web Services (AWS), and Azure. For any further detail: https://cloud.google.com/vertex-ai/docs/training/using-managed-datasets https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas https://beam.apache.org/documentation/io/built-in/google-bigquery/
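
A minimal sketch of the BigQuery Python client library path (the other two options are TensorFlow- and Dataflow-specific readers); project, dataset and table names are hypothetical.

from google.cloud import bigquery

client = bigquery.Client(project="my-project")
df = client.query(
    "SELECT * FROM `my_ds.training_data` LIMIT 10000"
).to_dataframe()  # pandas DataFrame ready for feature engineering / training
print(df.head())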

You work for an important organization. Your manager tasked you with a new classification model with lots of data drawn from the company Data Lake. The big problem is that you don't have the labels for all the data, but you have very little time to complete the task for only a subset of it. Which of the following services could help you? A. Vertex Data Labeling B. Mechanical Turk C. GitLab ML D. Tag Manager

A. Vertex Data Labeling In supervised learning, the correctness of label data, together with the quality of all your training data, is utterly important for the resulting model and the quality of future predictions. If you cannot get your data correctly labeled, you may ask professional labelers to complete your training data. GCP has a service for this: Vertex AI data labeling. Human labelers will prepare correct labels following your directions. You have to set up a data labeling job with: the dataset, a list (vocabulary) of the possible labels, and an instructions document for the labelers. B is wrong because Mechanical Turk is an Amazon service. C is wrong because GitLab is a DevOps lifecycle tool. D is wrong because Tag Manager is in the Google Analytics ecosystem. For any further detail: https://cloud.google.com/vertex-ai/docs/datasets/data-labeling-job

You work as a junior Data Scientist in a Startup and work with several projects with Python and Tensorflow in Vertex AI. You deployed a new model in the test environment and detected some problems that are puzzling you. An experienced colleague of yours asked for the logs. You found out that there is no logging information available. What kind of logs do you need and how do you get them (pick 2)? A. You need to Use Container logging B. You need to Use Access logging C. You can enable logs dynamically D. You have to undeploy and redeploy

A. You need to Use Container logging D. You have to undeploy and redeploy In Vertex AI, you may enable or disable logs for prediction; when you want to change this setting, you must undeploy and redeploy the model. There are two types of logs: container logging, which logs data from the containers hosting your model, so these logs are essential for problem solving and debugging; and access logging, which logs access and latency information. Therefore, you need container logging, and you have to undeploy and redeploy, so answers B and C are wrong. For any further detail: https://cloud.google.com/vertex-ai/docs/predictions/online-prediction-logging

Your team is preparing a Deep Neural Network custom model with Tensorflow in AI Platform that predicts medical diagnoses from diagnostic images. It is a complex and demanding job. You want to get help from GCP for hyperparameter tuning. What are the parameters that you must indicate (pick 2)? A. learning_rate B. parameterServerType C. scaleTier D. num_hidden_layers

A. learning_rate D. num_hidden_layers With AI Platform / Vertex, it is possible to create a hyperparameter tuning job for LINEAR_REGRESSION and DNN. You can choose many parameters, but in the case of a DNN you have to indicate the learning_rate hyperparameter and the number of hidden layers, that is, num_hidden_layers. The ConditionalParameterSpec object lets you add hyperparameters to a trial when the value of their parent hyperparameter matches a condition that you specify. B and C are wrong because scaleTier and parameterServerType are parameters for the infrastructure setup of a training job. For any further detail: https://cloud.google.com/ai-platform/training/docs/using-hyperparameter-tuning https://cloud.google.com/vertex-ai/docs/training/hyperparameter-tuning-overview
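
A hedged sketch of how the two hyperparameters could be declared in a Vertex AI hyperparameter tuning job with the Python SDK; it assumes an already-defined CustomJob (custom_job) whose training code reports an "accuracy" metric, and all names and ranges are hypothetical.

from google.cloud import aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

hp_job = aiplatform.HyperparameterTuningJob(
    display_name="dnn-tuning",
    custom_job=custom_job,  # assumed to be built beforehand
    metric_spec={"accuracy": "maximize"},
    parameter_spec={
        "learning_rate": hpt.DoubleParameterSpec(min=1e-4, max=1e-1, scale="log"),
        "num_hidden_layers": hpt.IntegerParameterSpec(min=1, max=5, scale="linear"),
    },
    max_trial_count=20,
    parallel_trial_count=4,
)
hp_job.run()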

You are working with Vertex AI, the managed ML Platform in GCP. You want to leverage Vertex Explainable AI to understand the most important features and how they influence the model. Which three methods does Vertex AI leverage for feature attributions? A. sampled Shapley B. integrated gradients C. Maximum Likelihood D. XRAI

A. sampled Shapley B. integrated gradients D. XRAI SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. Deep Learning models are known to give little insight into how they work in detail. Vertex Explainable AI helps explain their behavior, both for classification and regression tasks, so these functions are useful for testing, tuning, finding biases and thus improving the process. It uses three methods for feature attributions: sampled Shapley (uses scores for each feature and their permutations), integrated gradients (computes the gradients of the features at different points, integrates them and computes the relative weights), and XRAI (an optimization of the integrated gradients method). C is wrong because Maximum Likelihood is a probabilistic method for determining the parameters of a statistical distribution. For any further detail: https://cloud.google.com/vertex-ai/docs/explainable-ai/overview https://storage.googleapis.com/cloud-ai-whitepapers/AI%20Explainability%20Whitepaper.pdf

You work as a Data Scientist in a Startup and work with several projects with Python and Tensorflow. You need to increase the performance of the training sessions, and you already use caching and prefetching. So now you want to use GPUs, but on a single machine, for cost reduction and experimentation. Which of the following is the correct strategy? A. tf.distribute.MirroredStrategy B. tf.distribute.TPUStrategy C. tf.distribute.MultiWorkerMirroredStrategy D. tf.distribute.OneDeviceStrategy

A. tf.distribute.MirroredStrategy SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. tf.distribute.Strategy is an API explicitly designed for distributing training among different processors and machines. tf.distribute.MirroredStrategy lets you use multiple GPUs in a single VM, with one replica for each GPU. B is wrong because tf.distribute.TPUStrategy lets you use TPUs, not GPUs. C is wrong because tf.distribute.MultiWorkerMirroredStrategy is for multiple machines. D is wrong because tf.distribute.OneDeviceStrategy, like the default strategy, uses a single device, so it cannot exploit multiple GPUs. For any further detail: https://www.tensorflow.org/guide/distributed_training https://www.tensorflow.org/guide/intro_to_graphs https://blog.tensorflow.org/2019/09/tensorflow-20-is-now-available.html
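
A minimal sketch of using MirroredStrategy on a single multi-GPU VM; everything created inside the scope is replicated on each local GPU.

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()   # one replica per local GPU
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")
# model.fit(...) now splits each batch across the GPUs automatically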

You just started working as a junior Data Scientist in a consulting Company. You are in a project team that is building a new model and you are experimenting. But the results are absolutely unsatisfactory because your data is dirty and needs to be modified. In particular, you have various fields that have no value or report NaN. Your expert colleague told you that you need to carry out a procedure that modifies them at the time of acquisition. What kind of functionalities do you need to provide (pick 3)? A. Delete all records that have a null/NaN value in any field B. Compute Mean / Median for numeric measures C. Replace Categories with the most frequent one D. Use another ML model for missing values guess

B. Compute Mean / Median for numeric measures C. Replace Categories with the most frequent one D. Use another ML model for missing values guess The most frequent methodologies have been listed. In the case of numerical values, substituting the mean generally does not distort the model (it depends on the underlying statistical distribution). In the case of categories, the most common method is to replace them with the most frequent value. There are often multiple categories in the data, so in this way the effect of the missing category is minimized, while the other values of the example are still used. A is wrong because the common practice is to delete only records/examples that are completely wrong or completely lacking information (all null values); in all other cases, it is better to extract all the possible information from them. For any further detail: https://towardsdatascience.com/7-ways-to-handle-missing-values-in-machine-learning-1a6326adf79e Data preprocessing for machine learning: options and recommendations
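
A minimal pandas sketch of answers B and C (plus the only deletion that the discussion of answer A would justify); the toy DataFrame is made up.

import pandas as pd

df = pd.DataFrame({"income": [40.0, None, 55.0],
                   "country": ["IT", None, "IT"]})

df["income"] = df["income"].fillna(df["income"].mean())        # B: mean (or median)
df["country"] = df["country"].fillna(df["country"].mode()[0])  # C: most frequent category
df = df.dropna(how="all")                                      # drop only all-null rows
print(df)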

Your company produces and sells a lot of different products. You work as a Data Scientist. You train and deploy several ML models. Your manager just asked you to find a simple method to determine affinities between different products and categories to give sellers and applications a wider range of suitable offerings for customers. The method should give good results even without a great amount of data. Which of the following techniques may help you better? A. One-hot encoding B. Cosine Similarity C. Matrix Factorization D. PCA

B. Cosine Similarity SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. In a recommendation system (like with Netflix movies) it is important to discover similarities between products, so that you may recommend a movie to a user because other users with similar tastes liked it. So the first step is to find similar products. Cosine similarity is a method to do so: you take two products and their characteristics (all transformed into numbers), so you have two vectors, and you measure the angle between them in the feature space; the smaller the angle, the more similar the products, regardless of the vectors' lengths. A is wrong because one-hot encoding is a feature-engineering method that converts categorical values into binary vectors. C is wrong because Matrix Factorization is indeed used in recommender systems, but it needs a significant amount of data and raises the problem of reducing dimensionality; so, for us, Cosine Similarity is a better solution. D is wrong because Principal Component Analysis is a technique to reduce the number of features by creating new variables. For any further detail: https://wikipedia.org/wiki/Principal_component_analysis https://en.wikipedia.org/wiki/Cosine_similarity https://cloud.google.com/architecture/recommendations-using-machine-learning-on-compute-engine
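
A minimal sketch of cosine similarity between two hypothetical product feature vectors:

import numpy as np

def cosine_similarity(a, b):
    # 1.0 = same direction (very similar), 0 = unrelated, -1 = opposite.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

scuba_mask = np.array([1.0, 0.0, 3.0, 5.0])  # made-up feature vectors
snorkel    = np.array([1.0, 0.0, 2.0, 4.0])
print(cosine_similarity(scuba_mask, snorkel))  # close to 1 -> related products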

You just started working as a junior Data Scientist in a consulting Company. The job they gave you is to perform data cleaning and correction so that the data can later be used in the best possible way for creating and updating ML models. Data is stored in files of different formats. Which GCP service is best to help you with this task? A. BigQuery B. Dataprep C. Cloud Composer D. Dataproc

B. Dataprep SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. Dataprep is an end-user service that allows you to explore, clean and prepare structured and unstructured data for many purposes, especially for machine learning. It is completely serverless, and you don't need to write code or procedures. A is wrong because BigQuery can obviously query and update data, but you would need to preprocess the data and prepare queries and procedures yourself. C is wrong because Cloud Composer is for workflow management, not for data preparation. D is wrong because Dataproc is a fully managed service for the Apache Hadoop environment. For any further detail: https://cloud.google.com/tensorflow-enterprise/docs/overview https://cloud.google.com/blog/products/gcp/google-cloud-platform-adds-new-tools-for-easy-data-preparation-and-integration

You work in a medium-sized company as a developer and data scientist and use the managed ML platform, Vertex AI / AI Platform. You have updated an Auto ML model and want to deploy it to production. But you want to maintain both the old and the new version at the same time. The new version should only serve a small portion of the traffic. What can you do (pick 2)? A. Save the model in a Docker container image B. Deploy on the same endpoint C. Update the Traffic split percentage D. Create a Canary Deployment with Cloud Build

B. Deploy on the same endpoint C. Update the Traffic split percentage The correct procedure is: deploy your model to the existing endpoint, then update the traffic split percentage in such a way that all of the percentages add up to 100%. A is wrong because you don't have to create a Docker container image with AutoML. D is wrong because a Canary Deployment with Cloud Build is a procedure used in CI/CD pipelines; there is no need for it in such a managed environment. For any further detail: https://cloud.google.com/vertex-ai/docs/predictions/deploy-model-console
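
A hedged sketch of the same procedure with the Vertex AI Python SDK, assuming the endpoint already serves the old version; the IDs and machine type are hypothetical.

from google.cloud import aiplatform

endpoint = aiplatform.Endpoint("1234567890")   # existing endpoint (old version live)
new_model = aiplatform.Model("0987654321")     # newly trained model version

# Route only 10% of requests to the new version; the remaining 90% stays on the old one.
new_model.deploy(endpoint=endpoint,
                 traffic_percentage=10,
                 machine_type="n1-standard-2")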

Your company runs a big retail website. You develop many ML models for all the business activities. You migrated to Google Cloud. Your models are developed with PyTorch, TensorFlow and BigQueryML. You are now working on an international project with other partners. You need to let a different organization use your Vertex AI dataset stored in Cloud Storage. What can you do (pick 2)? A. Let them use your GCP Account B. Exporting metadata and annotations in a JSONL file C. Exporting metadata and annotations in a CSV file D. Give access (Service account or signed URL) to the Cloud Storage file E. Copy the data in a removable storage

B. Exporting metadata and annotations in a JSONL file D. Give access (Service account or signed URL) to the Cloud Storage file SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. You can export a dataset; when you do that, no additional copies of the data are generated. The result is only JSONL files with all the useful information, including the Cloud Storage file URIs. But you have to grant access to these Cloud Storage files with a service account or a signed URL if they are to be used outside GCP. A and E are wrong mainly for security reasons. C is wrong because annotations are written in JSON files. For any further detail: https://cloud.google.com/vertex-ai/docs/datasets/export-metadata-annotations https://cloud.google.com/vertex-ai/docs/datasets/datasets https://codelabs.developers.google.com/codelabs/vertex-ai-custom-code-training

You are starting to operate as a Data Scientist. You speak with your mentor who asked you to prepare a simple model with a nonparametric Machine Learning algorithm of your choice. The problem is that you don't know the difference between parametric and nonparametric algorithms. So you looked for it. Which of the following methods are nonparametric? A. Simple Neural Networks B. K-Nearest Neighbors C. Decision Trees D. Logistic Regression

B. K-Nearest Neighbors C. Decision Trees A nonparametric method does not assume that the data follow a distribution described by a fixed set of parameters. K-nearest neighbors is a simple supervised algorithm for both classification and regression problems: you begin with data that is already classified, and a new example is classified by looking at its k nearest classified points; the number k is the most important hyperparameter. A decision tree has a series of tests inside a flowchart-like structure, so there are no mathematical functions to solve. In the case of both Neural Networks and Logistic Regression, you have to figure out the parameters of a specific function that best fits the data, so A and D are wrong. For any further detail: https://towardsdatascience.com/all-machine-learning-algorithms-you-should-know-in-2021-2e357dd494c7 https://towardsdatascience.com/k-nearest-neighbors-knn-algorithm-23832490e3f4 https://machinelearningmastery.com/parametric-and-nonparametric-machine-learning-algorithms/

You are a junior Data Scientist and you need to create a multi-class classification Machine Learning model with the Keras Sequential model API. You have been asked which activation function to use. Which of the following do you choose? A. ReLU B. Softmax C. SIGMOID D. TANH

B. Softmax SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. Softmax is for multi-class classification what sigmoid is for logistic regression. Softmax assigns decimal probabilities to each class so that their sum is 1. A is wrong because ReLU (Rectified Linear Unit) is half rectified: f(z) is zero when z is less than zero and equal to z when z is greater than or equal to zero; it returns a single value, not class probabilities. C is wrong because sigmoid is for logistic regression and therefore returns one value from 0 to 1. D is wrong because tanh (hyperbolic tangent) is like sigmoid but returns one value from -1 to 1. For any further detail: https://developers.google.com/machine-learning/crash-course/multi-class-neural-networks/softmax
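
A minimal Keras Sequential sketch with a softmax output layer; the input size and the four classes are hypothetical.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(4, activation="softmax"),  # 4 class probabilities summing to 1
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])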

Your team works for a startup company with Google Cloud. You develop, train and deploy several ML models with Tensorflow. You use data in Parquet format and need to manage it both in input and output. You want the smoothest solution without adding infrastructure and keeping costs down. Which one of the following options do you follow? A. Cloud Dataproc B. TensorFlow I/O C. Dataflow Flex Template D. BigQuery to TFRecords

B. TensorFlow I/O TensorFlow I/O is a library that extends TensorFlow's built-in support with additional file formats, datasets, streaming, and file systems, including Parquet. So the integration is immediate, without any further costs or data transformations. Apache Parquet is an open-source column-oriented data storage format born in the Apache Hadoop environment but supported by many tools and used for data analysis. A is wrong because Cloud Dataproc is the managed Hadoop service in GCP: it uses Parquet but not TensorFlow out of the box, and it would be an additional cost. C and D are wrong because they would mean additional costs and additional data transformations. For any further detail: https://www.tensorflow.org/io https://towardsdatascience.com/data-formats-for-training-in-tensorflow-parquet-petastorm-feather-and-more-e55179eeeb72
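
A minimal sketch of reading Parquet directly into the tf.data world with TensorFlow I/O, assuming the tfio.IODataset.from_parquet reader; the Cloud Storage path is hypothetical.

import tensorflow_io as tfio

dataset = tfio.IODataset.from_parquet("gs://my-bucket/train.parquet")
dataset = dataset.batch(32)  # behaves like any other tf.data dataset from here on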

Your company traditionally deals with the statistical analysis of data. The services have been integrated with ML models for forecasting for some years, but analyses and simulations of all kinds are carried out, so you are using two types of tools. You have been told that it is possible to have more levels of integration between traditional statistical methodologies and those more related to AI / ML processes. Which tool is the best one for your needs? A. TensorFlow Hub B. TensorFlow Probability C. TensorFlow Enterprise D. TensorFlow Statistics

B. TensorFlow Probability TensorFlow Probability is a Python library for statistical analysis and probability that can run on TPUs and GPUs. Its main features are: probability distributions and differentiable and injective (one-to-one) functions; tools for building deep probabilistic models; support for inference and simulation methods such as Markov chain Monte Carlo; and optimizers such as Nelder-Mead, BFGS, and SGLD. All the other answers are wrong because they don't deal with traditional statistical methodologies. For any further detail: https://www.tensorflow.org/probability
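
A minimal TensorFlow Probability sketch mixing classical statistics (a Normal distribution) with sampling for simulation; the parameters are made up.

import tensorflow_probability as tfp

tfd = tfp.distributions
sales = tfd.Normal(loc=100.0, scale=15.0)   # hypothetical daily sales distribution

print(sales.mean(), sales.stddev())
print(sales.log_prob(120.0))                # log-likelihood of an observation
samples = sales.sample(1000)                # Monte Carlo simulation, GPU/TPU friendly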

Your company is a Financial Institution. You develop many ML models for all the business activities. You migrated to Google Cloud. Your models are developed with PyTorch, TensorFlow and BigQueryML. You are now working on an international project with other partners. You need to use Vertex AI and are asking experts what the capabilities of this managed suite of services are. Which elements are integrated into Vertex AI? A. Training environments and MLOps B. Training Pipelines, Datasets, Models Management and inference environments (endpoints) C. Deployment environments D. Training Pipelines and Datasets for data sources

B. Training Pipelines, Datasets, Models Management and inference environments (endpoints) SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. Vertex AI covers all the activities and functions listed: from Training Pipelines (so MLOps), to Data Management (Datasets), custom models and Auto ML models management, deployment and monitoring. So, all the other answers are wrong because they cover only a subset of Vertex functionalities. For any further detail: https://cloud.google.com/vertex-ai https://codelabs.developers.google.com/codelabs/vertex-ai-custom-code-training

You work as a junior Data Scientist in a consulting company and work with several projects with Tensorflow. You prepared and tested a new model, and you are optimizing it before deploying it in production. You asked for advice from an experienced colleague of yours. He said that it is not advisable to deploy the model in eager mode. What can you do (pick 3)? A. Configure eager_execution=no B. Use graphs C. Use the tf.function decorator D. Create a new tf.Graph

B. Use graphs C. Use the tf.function decorator D. Create a new tf.Graph When you develop and test a model, eager mode is really useful because it lets you execute operations one by one and facilitates debugging. But in production it is better to use graphs, which are data structures containing tensors and the computations that operate on them, independent of Python. In this way, they can be deployed on different devices (such as mobile) and optimized. To do that, you use the tf.function decorator, which traces your Python function and creates a new tf.Graph. So, A is wrong because there is no such parameter as eager_execution = no. Using graphs instead of eager execution is more complex than that. For any further detail: https://www.tensorflow.org/guide/function https://colab.research.google.com/github/zaidalyafeai/Notebooks/blob/master/Eager_Execution_Enabled.ipynb
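A minimal sketch of graph tracing with tf.function (standard TensorFlow 2 API; the function and shapes are arbitrary examples):

    import tensorflow as tf

    @tf.function  # traces the Python function into a tf.Graph on first call
    def dense_relu(x, w, b):
        return tf.nn.relu(tf.matmul(x, w) + b)

    x = tf.random.normal([4, 3])
    w = tf.random.normal([3, 2])
    b = tf.zeros([2])

    y = dense_relu(x, w, b)  # first call builds and caches the graph
    graph = dense_relu.get_concrete_function(x, w, b).graph  # the underlying tf.Graph
    print(type(graph))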

Your company runs a big retail website. You develop many ML models for all the business activities. You migrated to Google Cloud. Your models are developed with PyTorch, TensorFlow, and BigQueryML. You also use BigTable and CloudSQL, and Cloud Storage, of course. You need to use input tabular data in CSV format. You are working with Vertex AI. How do you manage them in the best way (pick 2)? A. Vertex AI manages any CSV automatically, no operations needed B. You have to set up a header, and column names may contain only alphanumeric characters and underscores C. Vertex AI cannot handle CSV files D. The delimiter must be a comma E. You can import only one file of max 10GB

B. You have to set up a header, and column names may contain only alphanumeric characters and underscores D. The delimiter must be a comma Vertex AI manages CSV files automatically, but you need a header row whose column names contain only alphanumeric characters and underscores (for example, customer_id,order_total,label is valid, while customer id,order-total is not), and the delimiter must be a comma. So, A and C are wrong. You can import multiple files, each one max 10GB. So, E is wrong. For any further detail: https://cloud.google.com/vertex-ai/docs/datasets/prepare-tabular#csv https://cloud.google.com/vertex-ai/docs/datasets/datasets

In your company, you train and deploy several ML models with Tensorflow. You use on-prem servers, but you often find it challenging to manage the most expensive training. Checking and updating models creates additional difficulties. You are undecided whether to use Vertex Pipelines or Kubeflow Pipelines. You wonder if, starting from Kubeflow, you can later switch to a more automated and managed system like Vertex AI. Which of these answers are correct (pick 4)? A. Kubeflow Pipelines and Vertex Pipelines are incompatible B. You may use Kubeflow Pipelines written with DSL in Vertex AI C. Kubeflow Pipelines work only in GCP D. Kubeflow Pipelines may work in any environment E. Kubeflow Pipelines may use Kubernetes persistent volume claims (PVC) F. Vertex Pipelines can use Cloud Storage FUSE

B. You may use Kubeflow Pipelines written with DSL in Vertex AI D. Kubeflow Pipelines may work in any environment E. Kubeflow Pipelines may use Kubernetes persistent volume claims (PVC) F. Vertex Pipelines can use Cloud Storage FUSE SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. Vertex AI Pipelines is a managed service in GCP. Kubeflow Pipelines is an open-source tool based on Kubernetes and Tensorflow for any environment. So C is wrong. Vertex AI supports code written with the Kubeflow Pipelines SDK v2 domain-specific language (DSL). So A is wrong. Like any workflow in Kubernetes, access to persistent data is performed with Volumes and Volume Claims. Vertex Pipelines can use Cloud Storage FUSE, so Vertex AI can leverage Cloud Storage buckets like file systems on Linux or macOS. For any further detail: https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline#compare https://cloud.google.com/storage/docs/gcs-fuse https://cloud.google.com/vertex-ai
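A minimal sketch of a pipeline written with the KFP v2 DSL, compiled to a spec that can run on either Kubeflow Pipelines or Vertex AI Pipelines; it assumes the kfp package with the v2 namespace plus the google-cloud-aiplatform SDK, and all display names, file paths and bucket names are placeholders:

    from kfp.v2 import dsl, compiler

    @dsl.component
    def say_hello(name: str) -> str:
        return f"Hello, {name}"

    @dsl.pipeline(name="hello-pipeline")
    def pipeline(name: str = "Vertex"):
        say_hello(name=name)

    # Compile once; the same JSON spec can be submitted to Vertex AI Pipelines.
    compiler.Compiler().compile(pipeline_func=pipeline, package_path="pipeline.json")

    from google.cloud import aiplatform
    aiplatform.PipelineJob(
        display_name="hello-pipeline",
        template_path="pipeline.json",
        pipeline_root="gs://my-bucket/pipeline-root",  # placeholder bucket
    ).run()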

You are a Data Scientist working on a project with PyTorch. You need to save the model you are working on because you have to deal with an urgent matter and will need to resume your work later. What command will you use for this operation? A. callbacks.ModelCheckpoint (keras) B. save C. model.fit D. train.Checkpoint TF

B. save SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. PyTorch is a popular library for deep learning that you can leverage using GPUs and CPUs. When you have to save a model for resuming training, you have to record both the model and the updated buffers and parameters in a checkpoint. A checkpoint is an intermediate dump of a model's entire internal state (its weights, current learning rate, etc.) so that the framework can resume the training from that very point. In other words, you train for a few iterations, then evaluate the model, checkpoint it, then fit some more. When you are done, save the model and deploy it as normal. To save checkpoints, you use torch.save() to serialize the dictionary of all your state data; to reload, the command is torch.load(). A is wrong because ModelCheckpoint is used with Keras. C is wrong because model.fit is used to fit a model (for example in Keras or scikit-learn), not to save it. D is wrong because train.Checkpoint is used with Tensorflow. For any further detail: https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_a_general_checkpoint.html https://towardsdatascience.com/ml-design-pattern-2-checkpoints-e6ca25a4c5fe
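A minimal sketch of a PyTorch checkpoint for resuming training; the toy model, file name and dictionary keys are arbitrary:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # ...train for a while, then dump everything needed to resume later
    torch.save({
        "epoch": 5,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
    }, "checkpoint.pt")

    # Later: rebuild the objects, reload the state, and continue training
    model = nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    checkpoint = torch.load("checkpoint.pt")
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    model.train()  # back into training mode before resuming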

Your team needs to create a model for managing security in restricted areas of campus. Everything that happens in these areas is filmed. Instead of having a physical surveillance service, the videos must be managed by a model capable of intercepting unauthorized people and vehicles, especially at particular times. What are the GCP services that allow you to achieve all this with minimal effort? A. AI Infrastructure B. Cloud Video Intelligence AI C. AutoML Video Intelligence Classification D. Vision AI

C. AutoML Video Intelligence Classification SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. AutoML Video Intelligence is a service that allows you to customize the pre-trained Video Intelligence GCP system according to your specific needs. In particular, AutoML Video Intelligence Object Tracking allows you to identify and locate particular entities of interest to you with your specific tags. A is wrong because AI Infrastructure allows you to manage hardware configurations for ML systems and, in particular, the processors used to accelerate machine learning workloads. B is wrong because Cloud Video Intelligence AI is a pre-configured and ready-to-use service, therefore not configurable for specific needs. D is wrong because Vision AI is for images and not video. For any further detail: https://cloud.google.com/video-intelligence/automl/object-tracking/docs/index-object-tracking https://cloud.google.com/video-intelligence/automl/docs/beginners-guide

You are working on a new model together with your client, a large financial institution. The data you are dealing with contains PII (Personally Identifiable Information) contents. You face 2 different sets of problems: (1) transform data to hide personal information you don't need; (2) protect your work environment, because certain combinations of personal data are useful for your model and you need to keep them. What are the solutions offered by GCP that it is advisable to use (choose 2)? A. Cloud Armor security policies B. Cloud HSM C. Cloud Data Loss Prevention D. Network firewall rules E. VPC service-controls

C. Cloud Data Loss Prevention E. VPC service-controls Cloud Data Loss Prevention is a service that can discover, conceal and mask personal information in data. VPC Service Controls is a service that lets you build a security perimeter that is not accessible from outside; in this way data exfiltration dangers are greatly mitigated. It is a network security service that helps protect data in a Virtual Private Cloud (VPC) in a multi-tenant environment. Option A is wrong because Cloud Armor is a security service at the edge against attacks like DDoS. Option B is wrong because Cloud HSM is a service for cryptography based on special and certified hardware and software. Option D is wrong because network firewall rules are a set of rules that deny or block network traffic in a VPC, just network rules. VPC Service Controls lets you define control at a more granular level, with context-aware access, suitable for multi-tenant environments like this one. For any further detail: https://cloud.google.com/vpc-service-controls https://cloud.google.com/dlp

You are a Data Scientist. You are going to develop an ML model with Python. Your company adopted GCP and Vertex AI, but you need to work with your own development tools. What are you going to do (pick 2)? A. Use an Emulator B. Work with the Console C. Create a service account key D. Set the environment variable named GOOGLE_APPLICATION_CREDENTIALS

C. Create a service account key D. Set the environment variable named GOOGLE_APPLICATION_CREDENTIALS Client libraries are used by developers for calling the Vertex AI API in their code. The client libraries reduce effort and boilerplate code. The correct procedure is: (1) enable the Vertex AI API (or the AI Platform Training & Prediction and Compute Engine APIs); (2) create or reuse a service account and create a service account key; (3) set the environment variable named GOOGLE_APPLICATION_CREDENTIALS to point to the key file. A is wrong because there isn't a specific Emulator for using the SDK. B is wrong because the requirement is to create a local work environment. For any further detail: Installing the Vertex AI client libraries https://cloud.google.com/ai-platform/training/docs/python-client-library
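A minimal sketch of using the downloaded key from a local Python environment; the key path, project id and region are placeholders, and in practice the environment variable is usually exported in the shell rather than set in code:

    import os
    from google.cloud import aiplatform

    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/key.json"  # placeholder path

    aiplatform.init(project="my-project", location="us-central1")
    print(aiplatform.Model.list())  # simple call to check that the credentials work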

You are working on a linear regression model with data stored in BigQuery. You have a view with many columns. You want to make some simplifications for your work and avoid overfitting. You are planning to use regularization. You are working with Bigquery ML and preparing the query for model training. You need an SQL statement that allows you to have all fields in the view apart from the label. Which one do you choose? A. ROLLUP B. UNNEST C. EXCEPT D. LAG

C. EXCEPT SQL and BigQuery are powerful tools for querying and manipulating structured data. In BigQuery, SELECT * EXCEPT(column_list) returns all the fields except the listed ones (EXCEPT also exists as a set operator that returns the rows of the left query not present in the right one). Example: SELECT * EXCEPT(mylabel), myvalue AS label. A is wrong because ROLLUP is a group function for subtotals. B is wrong because UNNEST expands the elements of an array into rows. D is wrong because LAG returns the field value on a preceding row. For any further detail: https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-hyperparameter-tuning https://cloud.google.com/bigquery-ml/docs/hyperparameter-tuning-tutorial
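A hedged sketch of how this fits into a BigQuery ML training statement with L2 regularization, run from Python; the dataset, view, model and column names are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client()

    query = """
    CREATE OR REPLACE MODEL `mydataset.my_linear_model`
    OPTIONS (model_type = 'LINEAR_REG', input_label_cols = ['label'], l2_reg = 0.1) AS
    SELECT * EXCEPT(mylabel), mylabel AS label
    FROM `mydataset.myview`
    """

    client.query(query).result()  # blocks until the training job completes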

You and your team are working for a large consulting firm. You are preparing an NLP ML model to classify customer support needs and to assess the degree of satisfaction. The texts of the various communications are stored in different storage systems. What types of storage should you avoid in the managed environment of GCP ML, such as Vertex AI and AI Platform (pick 2)? A. Cloud Storage B. BigQuery C. Filestore D. Block Storage

C. Filestore D. Block Storage Google advises avoiding data storage for ML in block storage, like persistent disks, or in NAS services like Filestore; they are more difficult to manage than Cloud Storage or BigQuery. Therefore A and B are wrong. Likewise, it is strongly discouraged to read data directly from databases such as Cloud SQL: the recommendation is to store data in BigQuery and Cloud Storage. For any further detail: https://cloud.google.com/architecture/ml-on-gcp-best-practices#avoid-storing-data-in-block-storage Cloud Storage documentation https://cloud.google.com/bigquery/docs/loading-data https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-launches-vertex-ai-unified-platform-for-mlops

You are a junior Data Scientist working on a deep neural network model with Tensorflow to optimize the level of customer satisfaction for after-sales services to create greater client loyalty. You are struggling with your model (learning rates, hidden layers and nodes selection) for optimizing processing and letting it converge in the fastest way. What is this problem called in ML terminology? A. Cross Validation B. Regularization C. Hyperparameter tuning D. drift detection management

C. Hyperparameter tuning ML training manages three main categories of data and variables. Training data, also called examples or records, is the main input for model configuration and, in supervised learning, includes labels, that is, the correct answers based on past experience; input data is used to build the model but will not be part of the model. Parameters are instead the variables to be found to solve the riddle; they are part of the final model and they make the difference among similar models of the same type. Hyperparameters are configuration variables that influence the training process itself: learning rate, number of hidden layers, number of epochs, regularization and batch size are all examples of hyperparameters. Hyperparameter tuning happens across training jobs and used to be a manual and tedious process, done by running multiple trials with different values. The time required to train and test a model can depend upon the choice of its hyperparameters. With Vertex AI you just need to prepare a simple YAML configuration (or a few SDK calls) without coding the search itself. A is wrong because Cross Validation is related to the organization of input data for training, test and validation. B is wrong because Regularization is related to feature management and overfitting. D is wrong because drift management is when data distribution changes and you have to adjust the model. For any further detail: https://cloud.google.com/vertex-ai/docs/training/hyperparameter-tuning-overview https://cloud.google.com/blog/products/ai-machine-learning/hyperparameter-tuning-cloud-machine-learning-engine-using-bayesian-optimization
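A heavily hedged sketch of a Vertex AI hyperparameter tuning job via the Python SDK; it assumes a training script that reports a metric named "accuracy" (for example through the cloudml-hypertune helper), and the project, bucket, script path, container image and parameter names are all placeholders:

    from google.cloud import aiplatform
    from google.cloud.aiplatform import hyperparameter_tuning as hpt

    aiplatform.init(project="my-project", location="us-central1",
                    staging_bucket="gs://my-bucket")

    custom_job = aiplatform.CustomJob.from_local_script(
        display_name="train-job",
        script_path="trainer/task.py",
        container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-11:latest",
    )

    tuning_job = aiplatform.HyperparameterTuningJob(
        display_name="tune-job",
        custom_job=custom_job,
        metric_spec={"accuracy": "maximize"},
        parameter_spec={
            "learning_rate": hpt.DoubleParameterSpec(min=1e-4, max=1e-1, scale="log"),
            "hidden_units": hpt.IntegerParameterSpec(min=16, max=256, scale="linear"),
        },
        max_trial_count=20,
        parallel_trial_count=4,
    )
    tuning_job.run()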

You are a junior Data Scientist working on a logistic regression model to break down customer text messages into two categories: important / urgent and unimportant / non-urgent. You want to find a metric that allows you to evaluate your model for how well it separates the two classes. You are interested in finding a method that is scale invariant and classification threshold invariant. Which of the following is the optimal methodology? A. Log Loss B. One-hot encoding C. ROC- AUC D. Mean Square Error E. Mean Absolute Error

C. ROC- AUC SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. The ROC curve (receiver operating characteristic curve) is a graph showing the behavior of the model with positive guesses at different classification thresholds. It plots and relates two different values: TPR: true positives / all actual positives; FPR: false positives / all actual negatives. The AUC (Area Under the Curve) index is the area under the ROC curve and indicates the capability of a binary classifier to discriminate between two categories. Being a probability, it is always a value between 0 and 1; hence it is scale invariant. It measures the separability between classes independently of the chosen threshold value; in other words, it is threshold-invariant. When it is equal to 0.5, the model separates the two classes no better than random guessing, similar to what happens with heads and tails when tossing coins. A is wrong because Log Loss is a loss function used especially for logistic regression; it measures loss, so it is highly dependent on threshold values. B is wrong because One-hot encoding is a method used in feature engineering for obtaining better regularization and independence. D is wrong because Mean Square Error is the most frequently used loss function for linear regression; it takes the square of the difference between predictions and real values. E is wrong because Mean Absolute Error is a loss function, too; it takes the absolute value of the difference between predictions and actual outcomes. For any further detail: https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
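A small illustration of computing ROC-AUC with scikit-learn on synthetic labels and scores (not data from the question):

    from sklearn.metrics import roc_auc_score

    y_true = [0, 0, 1, 1, 1, 0, 1, 0]                      # actual classes
    y_score = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.3]    # predicted probabilities

    # 1.0 = perfect separation, 0.5 = random guessing; no threshold is needed.
    print(roc_auc_score(y_true, y_score))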

Your team works for an international company with Google Cloud. You develop, train and deploy several ML models with Tensorflow. You use many tools and techniques and want to make your work leaner, faster, and more efficient. You would like engineer-to-engineer assistance from both Google Cloud and Google's TensorFlow teams. How is that possible, and with which service? A. AI Platform B. Kubeflow C. Tensorflow Enterprise D. TFX

C. Tensorflow Enterprise TensorFlow Enterprise is a distribution of the open-source ML platform, linked to specific long-term supported versions of TensorFlow and tailored for enterprise customers. It is available at no additional charge on Google Cloud, with premium (engineer-to-engineer) support available to enterprise customers. It is prepackaged and optimized for usage with containers and VMs. It works in Google Cloud, from VM images to managed services like GKE and Vertex AI. The TensorFlow Enterprise library is integrated into the following products: Deep Learning VM Images, Deep Learning Containers, Notebooks, AI Platform / Vertex AI Training. It is ready for automatic provisioning and scaling with any kind of processor. A is wrong because AI Platform is a managed service without the kind of support required. B and D are wrong because they are open-source libraries with standard support from the community. For any further detail: https://cloud.google.com/tensorflow-enterprise/docs/overview

Your team is working with a great number of ML projects, especially with Tensorflow. You have to prepare a demo for the Manager and Stakeholders. You are certain that they will ask you about the understanding of the classification and regression mechanism. You'd like to show them an interactive demo with some cool inference. Which of these tools is best for all of this? A. Tensorboard B. Tableau C. What-If Tool D. Looker E. LIT

C. What-If Tool SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. The What-If Tool (WIT) is an open-source tool that lets you visually understand classification and regression ML models. It lets you see data point distributions with different shapes and colors and interactively try new inferences. Moreover, it shows which features affect your model the most, together with many other characteristics. All without code. A is wrong because Tensorboard provides visualization and tooling needed for experiments, not for explaining inference; you can access the What-If Tool from Tensorboard. B and D are wrong because Tableau and Looker are graphical tools for data reporting. E is wrong because LIT is for NLP models. For any further detail: https://www.tensorflow.org/tensorboard/what_if_tool

You work as a Data Scientist in a Startup. You want to create an optimized input pipeline to increase the performance of training sessions, avoiding GPUs and TPUs as much as possible because they are expensive. Which technique or algorithm do you think is best to use? A. Caching B. Prefetching C. Parallelizing data D. All of the above

D. All of the above GPUs and TPUs can greatly increase the performance of training sessions, but an optimized input pipeline is likewise important. The tf.data API provides these functions: Prefetching (tf.data.Dataset.prefetch): while a training step executes, the data for the next step is read. Parallelizing data transformation: the tf.data API offers the tf.data.Dataset.map transformation, which can be parallelized across multiple cores with the num_parallel_calls option. Sequential and parallel interleave: tf.data.Dataset.interleave offers the possibility of interleaving and allows multiple datasets to execute in parallel (num_parallel_calls). Caching: tf.data.Dataset.cache allows you to cache a dataset, increasing performance. For any further detail: https://www.tensorflow.org/guide/data_performance
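A minimal sketch combining these tf.data optimizations on a synthetic dataset (the preprocessing function and sizes are arbitrary):

    import tensorflow as tf

    AUTOTUNE = tf.data.AUTOTUNE

    def preprocess(x):
        return tf.cast(x, tf.float32) / 255.0

    dataset = (
        tf.data.Dataset.range(10_000)
        .map(preprocess, num_parallel_calls=AUTOTUNE)  # parallelize the transformation
        .cache()                                       # reuse results after the first pass
        .shuffle(1_000)
        .batch(32)
        .prefetch(AUTOTUNE)                            # overlap input prep with training
    )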

Your team is preparing a multiclass logistic regression model with tabular data. The environment is Vertex AI with AutoML, and your data is stored in a CSV file in Cloud Storage. AutoML can perform transformations on the data to make the most of it. Which of the following types of transformations are you not allowed to use, based on your requirements? A. Categorical B. Text C. Timestamp D. Array E. Number

D. Array With complex data like Arrays and Structs, transformations are available only by using BigQuery, which supports them natively. All the other kinds of data are also supported for CSV files, as stated in the referred documentation. For any further detail: https://cloud.google.com/vertex-ai/docs/datasets/data-types-tabular https://cloud.google.com/vertex-ai/docs/datasets/data-types-tabular#compound_data_types

You have just started working as a junior Data Scientist in a Startup. You are involved in several projects with Python and Tensorflow in Vertex AI. You are starting to get interested in MLOps and are trying to understand the different processes involved. You have prepared a checklist, but inside there is a service that has nothing to do with MLOps. Which one? A. CI/CD B. Source Control Tools C. Data Pipelines D. CDN E. Artifact Registry, Container Registry

D. CDN SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. Cloud CDN is the service that caches and delivers static content from the closest locations (edge locations) to customers to accelerate web and mobile applications. This is a very important service for the Cloud but out of scope for MLOps. MLOps covers all processes related to ML models: experimentation, preparation, testing, deployment and, above all, continuous integration and delivery. The MLOps environment is designed to provide (some of) the following: an environment for testing and experimentation; source control, like GitHub; CI/CD (continuous integration / continuous delivery); a container registry for custom Docker image management; Feature Stores; training services; a metadata repository; an artifacts repository; ML pipeline orchestrators; data warehouse / storage and scalable data processing for batch and streaming data; prediction services, both batch and online. So, all the other answers describe MLOps functionalities. For any further detail: https://cloud.google.com/architecture/setting-up-mlops-with-composer-and-mlflow https://mlflow.org/ https://cloud.google.com/composer/docs

Your company does not have much ML experience. Therefore they want to start with a service that is as smooth, simple and managed as possible. The idea is to use BigQuery ML, so you are considering whether it can cover all the functionality you need. Various projects in your company are starting to design and set up models using various techniques and algorithms. Which of these techniques/algorithms is not supported by BigQuery ML? A. Wide-and-Deep DNN models B. ARIMA C. Ensemble Boosted Model D. CNN

D. CNN The convolutional neural network (CNN) is a type of artificial neural network used extensively for image recognition and classification. It uses convolutional layers, that is, the reworking of sets of pixels by running filters on the input pixels. It is not supported because it is specialized for images. The other answers are wrong because they are all supported by BigQuery ML. The currently supported models and techniques are: linear regression, binary logistic regression, multiclass logistic regression, K-means clustering, matrix factorization, time series (ARIMA_PLUS), boosted trees, deep neural networks (DNN), wide-and-deep models, AutoML Tables, TensorFlow model importing, and autoencoders. MODEL_TYPE = { 'LINEAR_REG' | 'LOGISTIC_REG' | 'KMEANS' | 'PCA' | 'MATRIX_FACTORIZATION' | 'AUTOENCODER' | 'TENSORFLOW' | 'AUTOML_REGRESSOR' | 'AUTOML_CLASSIFIER' | 'BOOSTED_TREE_CLASSIFIER' | 'BOOSTED_TREE_REGRESSOR' | 'DNN_CLASSIFIER' | 'DNN_REGRESSOR' | 'DNN_LINEAR_COMBINED_CLASSIFIER' | 'DNN_LINEAR_COMBINED_REGRESSOR' | 'ARIMA_PLUS' } For any further detail: https://cloud.google.com/bigquery-ml/docs/introduction

Your company runs a big retail website. You develop many ML models for all the business activities. You migrated to Google Cloud and are using Vertex AI. Your models are developed with PyTorch, TensorFlow and BigQueryML. You also use BigTable and CloudSQL, and of course Cloud Storage. In many cases, the same data is used for multiple models and projects, and your data is continuously updated, sometimes in streaming mode. Which is the best way to organize the input data? A. Dataflow for data transformation, both streaming and batch B. CSV C. BigQuery D. Datasets E. BigTable

D. Datasets SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. Vertex AI integrates the following elements: Datasets (data, metadata and annotations, structured or unstructured, for all kinds of libraries), training pipelines to build an ML model, ML models imported or created in the environment, and endpoints for inference. Because Datasets are suitable for all kinds of libraries, they are a useful abstraction for this requirement. A is wrong because Dataflow deals with data pipelines and is not a way to access and organize data. B is wrong because CSV is just a data format, and an ML Dataset is made of data and metadata dealing with many different formats. C and E are wrong because BigQuery and BigTable are just some of the ways in which you can store data; moreover, BigTable is not currently supported as a data source for Vertex datasets. For any further detail: https://cloud.google.com/vertex-ai/docs/datasets/datasets https://cloud.google.com/vertex-ai/docs/training/using-managed-datasets https://codelabs.developers.google.com/codelabs/vertex-ai-custom-code-training

You work for an important Banking group. The purpose of your current project is the automatic and smart acquisition of data from documents and forms of different types. You work on big datasets with a lot of private information that cannot be distributed and disclosed. You are asked to replace sensitive data with specific surrogate characters. Which of the following techniques do you think is best to use? A. Format-preserving encryption B. k-anonymity C. Replacement D. Masking

D. Masking SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. Masking replaces sensitive values with a given surrogate character, like hash (#) or asterisk (*). Format-preserving encryption (FPE) encrypts in the same format as the plaintext data. For example, a 16-digit credit card number becomes another 16-digit number. k-anonymity is a way to anonymize data in such a way that it is impossible to identify person-specific information. Still, you maintain all the information contained in the record. Replacement just substitutes a sensitive element with a specified value. For any further detail: https://en.wikipedia.org/wiki/Data_masking https://en.wikipedia.org/wiki/K-anonymity https://www.mysql.com/it/products/enterprise/masking.html NOTE FROM MIKE: from the example picture it looks like the ID field is actually
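A minimal sketch of character masking with Cloud Data Loss Prevention from Python; the project id and the sample text are placeholders, and the snippet masks detected email addresses with "#" characters:

    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()

    item = {"value": "Contact me at jane.doe@example.com"}
    inspect_config = {"info_types": [{"name": "EMAIL_ADDRESS"}]}
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [{
                "primitive_transformation": {
                    "character_mask_config": {"masking_character": "#"}
                }
            }]
        }
    }

    response = client.deidentify_content(
        request={
            "parent": "projects/my-project",  # placeholder project
            "inspect_config": inspect_config,
            "deidentify_config": deidentify_config,
            "item": item,
        }
    )
    print(response.item.value)  # e.g. "Contact me at ####################"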

You have a customer ranking ML model in production for an e-commerce site; the model used to work very well. You use GCP managed services, specifically AI Platform and Vertex AI. Suddenly there is a noticeable degradation in the quality of the inferences. You perform various checks, but the model seems to be perfectly fine. Which of the following methods could you use to avoid such problems? A. Regularization against overfitting B. Feature Store C. Hyperparameter tuning D. Model Monitoring

D. Model Monitoring SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. Input data to ML models may change over time. This can be a serious problem, as performance will obviously degrade. To avoid this, it is necessary to monitor the quality of the predictions continuously. Vertex Model Monitoring has been designed just for this. Its main goal is to cope with feature skew and drift detection. For skew detection, it compares the distribution of feature values in production with the distribution in the training data. For drift detection, it compares the distribution of feature values in production over time. It uses two main methods: Jensen-Shannon divergence for numerical features and L-infinity distance for categorical features. Options A and C are wrong because the model is OK, so both regularization against overfitting and the hyperparameters are already tuned. Option B is wrong because Feature Store is suitable for feature organization, not for data skew prevention. For any further detail: https://cloud.google.com/vertex-ai/docs/model-monitoring/overview

Your company runs an e-commerce site. You manage several deep learning models with Tensorflow that process Analytics-360 data, and they have been in production for some time. The modeling essentially uses customer and order data. You need to classify many business outcomes. Your Manager realized that different teams in different projects used to handle the same features, based on the same data, differently. The problem arose when models drifted unexpectedly over time. You have to advise your Manager on the best strategy. Which of the following do you choose (pick 2)? A. Each group classifies their features and sends them to the other teams B. For each model, store the different features in Cloud Storage C. Search for features in Cloud Storage and reuse them D. Search the Vertex Feature Store for features that are the same E. Insert or update the features in Vertex Feature Store accordingly

D. Search the Vertex Feature Store for features that are the same E. Insert or update the features in Vertex Feature Store accordingly SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. The best strategy is to use the Vertex Feature Store. Vertex Feature Store is a service to organize and store ML features through a central store. This allows you to share and optimize ML features important for the specific environment and to reuse them at any time. Here is the typical procedure for using the Feature Store: check the Vertex Feature Store for features that you can reuse or use as a template; if you don't find a feature that fits perfectly, create a new one or modify an existing one; update or insert the features of your work in the Vertex Feature Store; use them in your training work; set up a periodic job to generate feature vocabulary data and, optionally, update the Vertex Feature Store. A is wrong because it creates confusion and doesn't solve the problem. B and C are wrong because they will not avoid overlapping feature definitions; Cloud Storage is not enough for identifying different features. For any further detail: https://developers.google.com/machine-learning/crash-course/representation/feature-engineering https://cloud.google.com/architecture/ml-on-gcp-best-practices#use-vertex-feature-store-with-structured-data https://cloud.google.com/blog/topics/developers-practitioners/kickstart-your-organizations-ml-application-development-flywheel-vertex-feature-store
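A heavily hedged sketch of this flow with the google-cloud-aiplatform SDK's (legacy) Feature Store classes; all ids, feature names and node counts are placeholders, and newer Feature Store releases expose a different, BigQuery-based API:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")

    # Central store shared across teams and projects
    fs = aiplatform.Featurestore.create(
        featurestore_id="ecommerce_features", online_store_fixed_node_count=1)

    # An entity type groups features describing the same business object
    customer = fs.create_entity_type(entity_type_id="customer")
    customer.create_feature(feature_id="lifetime_value", value_type="DOUBLE")
    customer.create_feature(feature_id="orders_last_30d", value_type="INT64")

    # Before defining new features, search what already exists and reuse it
    for feature in aiplatform.Feature.search(query="lifetime_value"):
        print(feature.resource_name)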

You work for an industrial company that wants to improve its quality system. It has developed its own deep neural network model with Tensorflow to identify the semi-finished products to be discarded, using images taken from the production lines in the various production phases. You work on this project. You need to deal with input data that is binary (images) together with CSV files. You are looking for the most convenient way to import and manage this type of data. Which is the best solution that you can adopt? A. tf.RaggedTensor B. Tf.quantization C. tf.train.Feature D. tf.TFRecordReader

D. tf.TFRecordReader SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. The TFRecord format is efficient for storing a sequence of binary and non-binary records, using protocol buffers for serialization of structured data (in TensorFlow 2 the records are usually read back with tf.data.TFRecordDataset). A is wrong because RaggedTensor is a tensor with ragged dimensions, that is, with rows of different lengths, like this: [[6, 4, 7, 4], [], [8, 12, 5], [9], []]. B is wrong because quantization is aimed at reducing CPU and TPU latency, processing, and power. C is wrong because tf.train.Feature is only the protocol buffer message used inside a TFRecord example, not a way to import and manage the files. For any further detail: https://www.tensorflow.org/tutorials/load_data/tfrecord
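A small sketch of writing and reading TFRecord files: each record serializes raw image bytes plus a label via tf.train.Example protocol buffers (the bytes, file name and feature names are arbitrary placeholders):

    import tensorflow as tf

    def serialize(image_bytes, label):
        features = tf.train.Features(feature={
            "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
        })
        return tf.train.Example(features=features).SerializeToString()

    with tf.io.TFRecordWriter("train.tfrecord") as writer:
        writer.write(serialize(b"\x00\x01\x02", 1))  # dummy bytes standing in for an image

    feature_spec = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    dataset = tf.data.TFRecordDataset("train.tfrecord").map(
        lambda record: tf.io.parse_single_example(record, feature_spec)
    )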

Your team is working with a great number of ML projects, especially with Tensorflow. You recently prepared an NLP model that works well and is about to be rolled out in production. You have to prepare a demo for the Manager and Stakeholders for your new system of text and sentiment interpretation. You are certain that they will ask you for explanations and understanding about how a software may capture human feelings. You'd like to show them an interactive demo with some cool inference. Which of these tools is best for all of this? A. Tensorboard B. Tableau C. What-If Tool D. Looker E. LIT

E. LIT SEE IMAGE IN "Images - PRACTICE EXAM 3- Web-found ML Eng Questions" doc in the "Images for Quizlet Questions" folder. The Language Interpretability Tool (LIT) is an open-source tool developed specifically to explain and visualize natural language processing (NLP) models. It is similar to the What-If Tool, which instead targets classification and regression models with structured data. It offers visual explanations of the model's predictions and analysis with metrics, tests and validations. A is wrong because Tensorboard provides visualization and tooling needed for experiments, not for explaining inference; you can access the What-If Tool from Tensorboard. B and D are wrong because Tableau and Looker are graphical tools for data reporting. C is wrong because the What-If Tool is for classification and regression models with structured data. For any further detail: https://pair-code.github.io/lit/ https://www.tensorflow.org/tensorboard/what_if_tool

