Azure Data Scientist Associate Knowledge Check

When a data scientist enables MLflow autologging, where can all model assets be found? a) In the model folder under Outputs + logs. b) In the outputs folder under Outputs + logs. c) In the model folder under Metrics.

In the model folder under Outputs + logs. - Model assets like the model pickle file will be stored in the model folder under Outputs + logs.
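
As a minimal sketch, autologging only needs a single call before training; the framework flavor (scikit-learn here) and toy dataset are illustrative:

```python
import mlflow
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

# One call enables autologging; parameters, metrics, and model assets
# (MLmodel, the model pickle file, environment files) are captured
# automatically and appear in the model folder under Outputs + logs.
mlflow.autolog()

X, y = load_diabetes(return_X_y=True)
model = LinearRegression().fit(X, y)
```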

You plan to use hyperparameter tuning to find optimal discrete values for a set of hyperparameters. You want to try every possible combination of a set of specified discrete values. Which kind of sampling should you use? a) Random sampling b) Grid sampling c) Bayesian sampling

Grid sampling - You should use grid sampling to try every combination of discrete hyperparameter values.
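
A sketch with the Python SDK v2, assuming an existing command job object named job that exposes batch_size and learning_rate as inputs; grid sampling exhausts every combination of the Choice values:

```python
from azure.ai.ml.sweep import Choice

# Bind a discrete search space to the command job's hyperparameter inputs.
job_for_sweep = job(
    batch_size=Choice(values=[16, 32, 64]),
    learning_rate=Choice(values=[0.01, 0.1, 1.0]),
)

# Grid sampling tries all 3 x 3 = 9 combinations of the values above.
sweep_job = job_for_sweep.sweep(
    compute="cpu-cluster",
    sampling_algorithm="grid",
    primary_metric="AUC",
    goal="Maximize",
)
```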

A data scientist wants to experiment by training a machine learning model and tracking it with Azure Machine Learning. Which tool should be used to train the model by running a script from their preferred environment? a) The Azure Machine Learning studio b) The Python SDK c) The Azure CLI

The Python SDK - The data scientist is likely already familiar with Python and can easily use the Python SDK to run the training script from their preferred environment.
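
A sketch of that workflow; the workspace details, environment, and compute names are placeholders:

```python
from azure.ai.ml import MLClient, command
from azure.identity import DefaultAzureCredential

# Connect to the workspace from any environment with the SDK installed.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Submit train.py as a command job.
job = command(
    code="./src",
    command="python train.py",
    environment="<environment-name>@latest",
    compute="cpu-cluster",
)
ml_client.create_or_update(job)
```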

A data scientist trains a regression model and wants to track the model's performance by storing the Root Mean Squared Error (RMSE) with the experiment run. Which method can be used to log the RMSE? a) mlflow.log_param() b) mlflow.log_artifact() c) mlflow.log_metric()

mlflow.log_metric() - Use mlflow.log_metric() to log a metric like the RMSE.
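
A minimal sketch; the label and prediction values are toy stand-ins for a real validation set:

```python
import mlflow
from sklearn.metrics import mean_squared_error

# Toy values standing in for real validation labels and predictions.
y_test = [3.0, 5.0, 2.5]
y_pred = [2.8, 5.2, 2.9]

# Log the RMSE so it is stored with the experiment run.
rmse = mean_squared_error(y_test, y_pred) ** 0.5
mlflow.log_metric("rmse", rmse)
```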

A data scientist wants to use automated machine learning to find the model with the best AUC_weighted metric. Which parameter of the classification function should be configured? a) task='AUC_weighted' b) target_column_name='AUC_weighted' c) primary_metric='AUC_weighted'

primary_metric='AUC_weighted' - Set the primary metric to the performance score for which you want to optimize the model.
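
A sketch with the Python SDK v2; the compute, experiment, data asset, and target column names are assumptions:

```python
from azure.ai.ml import automl, Input
from azure.ai.ml.constants import AssetTypes

# Configure an AutoML classification job that optimizes for AUC_weighted.
classification_job = automl.classification(
    compute="cpu-cluster",
    experiment_name="automl-classification",
    training_data=Input(type=AssetTypes.MLTABLE, path="azureml:training-data:1"),
    target_column_name="label",
    primary_metric="AUC_weighted",
)
```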

You have multiple models deployed to a batch endpoint. You invoke the endpoint without indicating which model you want to use. Which deployed model will do the actual batch scoring? a) The latest version of the deployed model. b) The latest deployed model. c) The default deployed model.

The default deployed model. - The default deployment will be used to do the actual batch scoring when the endpoint is invoked.
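
A sketch of setting that default, assuming an MLClient named ml_client and hypothetical endpoint and deployment names:

```python
# Mark one deployment as the endpoint's default; invoking the endpoint
# without naming a deployment then routes scoring to this one.
endpoint = ml_client.batch_endpoints.get("sales-batch-endpoint")
endpoint.defaults.deployment_name = "model-v2-deployment"
ml_client.batch_endpoints.begin_create_or_update(endpoint)
```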

You're using hyperparameter tuning to train an optimal model based on a target metric named "AUC". What should you do in your training script? a) Import the logging package and use a logging.info() statement to log the AUC. b) Include a print() statement to write the AUC value to the standard output stream. c) Use an mlflow.log_metric() statement to log the AUC value.

Use an mlflow.log_metric() statement to log the AUC value. - Your script needs to use MLflow to log the primary metric to the run, using the same name as specified in the sweep job.
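
A sketch of the script side; the labels and scores are toy stand-ins, and "AUC" must match the sweep job's primary_metric exactly:

```python
import mlflow
from sklearn.metrics import roc_auc_score

# Toy labels and predicted scores standing in for a real evaluation step.
y_test = [0, 1, 1, 0]
y_scores = [0.2, 0.8, 0.6, 0.3]

# Log under the exact name the sweep job watches.
mlflow.log_metric("AUC", roc_auc_score(y_test, y_scores))
```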

You are creating a batch endpoint that you want to use to predict new values for a large volume of data files. You want the pipeline to run the scoring script on multiple nodes and collate the results. What output action should you choose for the deployment? a) summary_only b) append_row c) concurrency

append_row - You should use append_row to append each prediction to one output file.
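
A sketch of such a deployment with the Python SDK v2; the model object and resource names are assumptions:

```python
from azure.ai.ml.entities import BatchDeployment
from azure.ai.ml.constants import BatchDeploymentOutputAction

# Score on multiple nodes and append every prediction to one output file.
deployment = BatchDeployment(
    name="forecast-batch",
    endpoint_name="sales-batch-endpoint",
    model=model,                      # a previously registered model object
    compute="batch-cluster",
    instance_count=4,                 # spread scoring across multiple nodes
    mini_batch_size=10,
    output_action=BatchDeploymentOutputAction.APPEND_ROW,
    output_file_name="predictions.csv",
)
ml_client.batch_deployments.begin_create_or_update(deployment)
```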

A data scientist has preprocessed the training data and wants to use automated machine learning to quickly iterate through various algorithms. The data shouldn't be changed. Which featurization mode trains a model without letting automated machine learning make changes to the data? a) auto b) custom c) off

off - If you don't want the data to be preprocessed at all, set the featurization mode to off.
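
Continuing the AutoML classification sketch from earlier, featurization is switched off on the configured job:

```python
# Train on the preprocessed data exactly as provided, with no featurization.
classification_job.set_featurization(mode="off")
```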

You're creating a pipeline that includes two steps. Step 1 prepares some data, and step 2 uses the preprocessed data to train a model. Which option should you use as input to the second step to train the model? a) pipeline_job_input b) prep_data.outputs.output_data c) train_model.outputs.model_output

prep_data.outputs.output_data - prep_data.outputs.output_data is the output of the step that prepares the data.
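
A sketch of the two-step pipeline, assuming prep_component and train_component are already-loaded components:

```python
from azure.ai.ml.dsl import pipeline

@pipeline()
def train_pipeline(pipeline_job_input):
    # Step 1: prepare the data.
    prep_data = prep_component(input_data=pipeline_job_input)
    # Step 2: train on step 1's output by wiring it in as the input.
    train_model = train_component(training_data=prep_data.outputs.output_data)
    return {"model_output": train_model.outputs.model_output}
```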

A data scientist wants to read data stored in a publicly available GitHub repository. The data will be read in a Jupyter notebook in the Azure Machine Learning workspace for some quick experimentation. Which protocol should be used to read the data in the notebook? a) azureml b) http(s) c) abfs(s)

http(s) - This protocol is used when accessing data stored in a publicly available http(s) location.
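
A sketch for the notebook; the raw.githubusercontent.com URL is a hypothetical placeholder:

```python
import pandas as pd

# Read a CSV straight from a public GitHub repo over http(s).
url = "https://raw.githubusercontent.com/<org>/<repo>/main/data/sales.csv"
df = pd.read_csv(url)
df.head()
```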

You're deploying a model as a real-time inferencing service. What functions must the scoring script for the deployment include? a) main() and score() b) base() and train() c) init() and run()

init() and run() - You must implement init and run functions in the entry (scoring) script.
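
A sketch of a typical scoring script; the model file name and input format are assumptions:

```python
import os
import json
import joblib

def init():
    # Runs once when the service starts: load the registered model.
    global model
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model.pkl")
    model = joblib.load(model_path)

def run(raw_data):
    # Runs for every scoring request: parse input, predict, return results.
    data = json.loads(raw_data)["data"]
    predictions = model.predict(data)
    return predictions.tolist()
```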

You've trained a model using the Python SDK for Azure Machine Learning. You want to deploy the model to get real-time predictions. You want to manage the underlying infrastructure used by the endpoint. What kind of endpoint should you create? a) A managed online endpoint. b) A batch endpoint. c) A Kubernetes online endpoint.

A Kubernetes online endpoint. - Use a Kubernetes online endpoint when you want to manage the underlying Kubernetes cluster yourself.

A data scientist wants to run a script as a command job to train a PyTorch model, setting the batch size and learning rate hyperparameters to specified values each time the job runs. What should be done by the data scientist? a) Create multiple script files - one for each batch size and learning rate combination you want to use. b) Set the batch size and learning rate properties of the command job before submitting the job. c) Add arguments for batch size and learning rate to the script, and set them in the command property of the command job.

Add arguments for batch size and learning rate to the script, and set them in the command property of the command job. - To use different values each time, define arguments in the script and pass them using the arguments parameter of the command job.
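
A sketch of that command job, assuming train.py parses --batch_size and --learning_rate with argparse; the environment and compute names are placeholders:

```python
from azure.ai.ml import command

# The hyperparameters are wired into the command string as script arguments,
# so each run can use different values without editing train.py.
job = command(
    code="./src",
    command=(
        "python train.py "
        "--batch_size ${{inputs.batch_size}} "
        "--learning_rate ${{inputs.learning_rate}}"
    ),
    inputs={"batch_size": 32, "learning_rate": 0.01},
    environment="<environment-name>@latest",
    compute="gpu-cluster",
)
```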

A data scientist needs access to the Azure Machine Learning workspace to run a script as a job. Which role should be used to give the data scientist the necessary access to the workspace? a) Reader b) AzureML Data Scientist c) AzureML Compute Operator

AzureML Data Scientist - An AzureML Data Scientist is allowed to submit a job.

A data scientist wants to run a single script to train a model. What type of job is the best fit to run a single script? a) Command b) Pipeline c) Sweep

Command - A command job is used to run a single script.

A data scientist has trained a model in a notebook. The model should be retrained every week on new data. What should the data scientist do to make the code production-ready? a) Copy and paste the code from each cell to a script. b) Convert the code to one function in a script that reads the data and trains the model. c) Convert the code to multiple functions in a script that read the data and train the model.

Convert the code to multiple functions in a script that read the data and train the model. - A script consisting of multiple functions is best to use for production workloads.
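
A sketch of what that refactoring might look like; the file path and column names are assumptions:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def get_data(path):
    # Read and return the training data.
    return pd.read_csv(path)

def split_features(df):
    # Separate the features from the label column.
    return df.drop(columns=["label"]), df["label"]

def train_model(X, y):
    # Fit and return the model.
    return LogisticRegression().fit(X, y)

def main():
    df = get_data("data/train.csv")
    X, y = split_features(df)
    model = train_model(X, y)

if __name__ == "__main__":
    main()
```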

What type of data asset should someone create when the schema changes frequently and the data is used in many different jobs? a) URI file b) URI folder c) MLTable

MLTable - MLTable is ideal when the schema changes frequently: you only need to update the definition in one location instead of in every job that uses the data.
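
A sketch of registering an MLTable data asset, assuming an MLClient named ml_client and a local ./data folder containing an MLTable definition file:

```python
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Jobs reference the asset by name, so a schema change is made only here.
sales_table = Data(
    name="sales-data",
    type=AssetTypes.MLTABLE,
    path="./data",  # folder containing the MLTable file
)
ml_client.data.create_or_update(sales_table)
```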

You've built a pipeline that you want to run every week. You want to take a simple approach to creating a schedule. Which class can you use to create the schedule that runs once per week? a) RecurrencePattern b) JobSchedule c) RecurrenceTrigger

RecurrenceTrigger - You need the RecurrenceTrigger class to create a schedule that runs at a regular interval.
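
A sketch of the weekly schedule, assuming an existing pipeline job object named pipeline_job and an MLClient named ml_client:

```python
from azure.ai.ml.entities import JobSchedule, RecurrenceTrigger

# Fire the pipeline once per week.
weekly_trigger = RecurrenceTrigger(frequency="week", interval=1)
schedule = JobSchedule(
    name="weekly-training",
    trigger=weekly_trigger,
    create_job=pipeline_job,
)
ml_client.schedules.begin_create_or_update(schedule)
```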

A machine learning model to predict the sales forecast has been developed. Every week, new sales data is ingested, and the model needs to be retrained on the newest data before generating the new forecast. Which tool should be used to retrain the model every week? a) The Azure Machine Learning studio b) The Python SDK c) The Azure CLI

The Azure CLI - The Azure CLI is designed for automating tasks. By using YAML files to define how the model must be trained, the machine learning tasks are repeatable, consistent, and reliable.
