Azure extra study


Specifying the Primary Metric

- primary_metric is one of the most important settings to specify: it is the target performance metric for which the optimal model will be determined.
- Azure ML supports a set of named metrics for each kind of task. To get a list of the metrics available for a certain task type, you can use the get_primary_metrics function shown below:

  from azureml.train.automl.utilities import get_primary_metrics

  get_primary_metrics('classification')

Query Logs in Application Insights

- To analyze captured log data, you can use the Log Analytics query interface for Application Insights in the Azure portal.
- The interface supports a SQL-like query syntax that you can use to extract fields from logged data, including custom dimensions created by your Azure ML service.
- The following query returns the timestamp and customDimensions.Content fields from log traces that have a message field value of "STDOUT" (indicating the data is in the standard output log) and a customDimensions.["Service Name"] field value of 'my-svc':

  traces
  | where message == "STDOUT"
    and customDimensions.["Service Name"] == "my-svc"
  | project timestamp, customDimensions.Content

- This query returns the logged data as a table.

Submitting an Auto ML Experiment

- Submit an automated ML experiment like any other SDK experiment:

  from azureml.core.experiment import Experiment

  automl_experiment = Experiment(ws, 'automl_experiment')
  automl_run = automl_experiment.submit(automl_config)

Retrieving the Best Model & Its Run

- You can easily identify the best run in Azure ML Studio and download or deploy the model it generated.
- To do this with the SDK:

  best_run, fitted_model = automl_run.get_output()
  best_run_metrics = best_run.get_metrics()
  for metric_name in best_run_metrics:
      metric = best_run_metrics[metric_name]
      print(metric_name, metric)

While creating a linear regression model, what should you use to determine how closely the data fits the regression line?

- Coefficient of determination (R^2) - This represents the predictive power of the model as a value between 0 and 1.
- 0 means the model is random; 1 means it is a perfect fit.
- Use caution when relying on this metric, though: low values can be entirely normal and high values can be suspect.
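- A minimal sketch (assuming scikit-learn is available; the y_true/y_pred values are hypothetical) of computing R^2 for a set of regression predictions:

  from sklearn.metrics import r2_score

  # Hypothetical observed values and the values predicted by the regression line
  y_true = [3.0, 5.0, 7.0, 9.0]
  y_pred = [2.8, 5.1, 7.2, 8.9]

  # R^2 = 1 - (sum of squared residuals / total sum of squares)
  print(r2_score(y_true, y_pred))  # close to 1.0 => the data fits the line closely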

Batch Inference

- Generates predictions on a high volume of instances WITHOUT the need for instant responses.
- Predictions are stored and available for further use.
- The process of generating predictions on a batch of observations.
- A batch inference pipeline accepts data through a Dataset.
- Generally used for long-running tasks on large volumes of data; it is used to apply a predictive model to multiple cases asynchronously (without waiting on each individual response).
- In Azure ML, you can implement batch inferencing solutions by creating a pipeline that includes a step to read the input data, load a registered model, predict labels, and write the results as its output (see the sketch below).
- Use ParallelRunStep to read batches of data and write output to a PipelineData reference.
- Can also use the "append_row" output action to ensure the results from all instances of this step are written to a single output file called "parallel_run_step.txt".
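- A minimal sketch of such a pipeline, assuming an Environment (batch_env), an AmlCompute cluster (aml_cluster), and a registered dataset (batch_dataset) are defined elsewhere; the folder and script names are hypothetical:

  from azureml.core import Workspace
  from azureml.pipeline.core import Pipeline, PipelineData
  from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep

  ws = Workspace.from_config()

  # Output location for the combined predictions file (parallel_run_step.txt)
  output_dir = PipelineData(name='inferences', datastore=ws.get_default_datastore())

  parallel_run_config = ParallelRunConfig(
      source_directory='batch_scripts',        # hypothetical folder with the entry script
      entry_script='batch_scoring_script.py',  # hypothetical script with init() and run(mini_batch)
      mini_batch_size='5',
      error_threshold=10,
      output_action='append_row',              # write all results to parallel_run_step.txt
      environment=batch_env,                   # assumed Environment
      compute_target=aml_cluster,              # assumed AmlCompute cluster
      node_count=4)

  parallel_step = ParallelRunStep(
      name='batch-score',
      parallel_run_config=parallel_run_config,
      inputs=[batch_dataset.as_named_input('batch_data')],  # assumed registered dataset
      output=output_dir,
      allow_reuse=True)

  pipeline = Pipeline(workspace=ws, steps=[parallel_step])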

K-means algorithm

- The K-means algorithm is a clustering algorithm.
- It tries to divide the dataset into K pre-defined, non-overlapping, distinct subgroups (clusters) where each data point belongs to only one group.
- It tries to make the data points within a cluster as similar as possible while also keeping the clusters separate and as far apart as possible.
- It assigns data points to a cluster so that the sum of the squared differences between the data points and the cluster's centroid is at a minimum.
- The less variation within a cluster, the more similar the data points in that cluster are.
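- A minimal scikit-learn sketch (hypothetical data) showing K-means assigning points to K=2 clusters:

  import numpy as np
  from sklearn.cluster import KMeans

  # Hypothetical 2-D data points
  X = np.array([[1, 2], [1, 4], [1, 0],
                [10, 2], [10, 4], [10, 0]])

  # Fit K-means with K=2 pre-defined clusters
  kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

  print(kmeans.labels_)           # cluster assignment for each data point
  print(kmeans.cluster_centers_)  # centroid of each cluster
  print(kmeans.inertia_)          # sum of squared distances to the nearest centroid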

Local Feature Importance

- Local feature importance measures the influence of each feature value for a specific individual prediction. - Local importance can vary from global importance and won't always be the same

Grid Sampling

- Grid sampling can only be used when all hyperparameters are discrete.
- Grid sampling is used to try every possible combination of parameters in the search space.
- EX: In this code example, grid sampling is used to try every possible combination of discrete batch_size and learning_rate values:

  from azureml.train.hyperdrive import GridParameterSampling, choice

  param_space = {
      '--batch_size': choice(16, 32, 64),
      '--learning_rate': choice(0.01, 0.1, 1.0)
  }

  param_sampling = GridParameterSampling(param_space)

Running Auto ML Experiments

- To run an automated ML experiment, you can use either the SDK or the user interface in Azure ML studio.
- The SDK gives you greater flexibility, and you can set experiment options using AutoMLConfig like so:

  from azureml.core.runconfig import RunConfiguration
  from azureml.train.automl import AutoMLConfig

  automl_run_config = RunConfiguration(framework='python')
  automl_config = AutoMLConfig(name='Automated ML Experiment',
                               task='classification',
                               primary_metric='AUC_weighted',
                               compute_target=aml_compute,
                               training_data=train_dataset,
                               validation_data=test_dataset,
                               label_column_name='Label',
                               featurization='auto',
                               iterations=12,
                               max_concurrent_iterations=4)

You create a binary classification model and need to evaluate its performance. Which two metrics should you use?

- Use precision and accuracy.
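- A minimal scikit-learn sketch (hypothetical labels and predictions) showing how both metrics are computed:

  from sklearn.metrics import accuracy_score, precision_score

  # Hypothetical ground-truth labels and model predictions for a binary classifier
  y_true = [1, 0, 1, 1, 0, 1, 0, 0]
  y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

  print('Accuracy: ', accuracy_score(y_true, y_pred))   # correct predictions / all predictions
  print('Precision:', precision_score(y_true, y_pred))  # true positives / predicted positives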

CPU vs GPU

- GPU compute should be used over CPU when its significant speed advantage matters; however, GPU costs more.
- For less demanding work such as inferencing, CPU is usually sufficient and more cost efficient; if inference speed becomes a bottleneck (a point of congestion that slows everything down), use GPU.
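- A minimal sketch, assuming the Azure ML SDK and a workspace config file, of provisioning a CPU and a GPU compute cluster; the cluster names and VM sizes are illustrative:

  from azureml.core import Workspace
  from azureml.core.compute import AmlCompute, ComputeTarget

  ws = Workspace.from_config()

  # CPU cluster for lighter workloads such as inferencing
  cpu_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS3_V2',
                                                     max_nodes=2)
  cpu_cluster = ComputeTarget.create(ws, 'cpu-cluster', cpu_config)

  # GPU cluster for training workloads where speed justifies the extra cost
  gpu_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                     max_nodes=2)
  gpu_cluster = ComputeTarget.create(ws, 'gpu-cluster', gpu_config)

  cpu_cluster.wait_for_completion(show_output=True)
  gpu_cluster.wait_for_completion(show_output=True)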

MimicExplainer

- A MimicExplainer creates a global surrogate model that approximates your trained model and can be used to generate explanations. This explainable model MUST have the same kind of architecture as your trained model (ex: linear or tree-based).
- The Mimic explainer is based on the idea of training global surrogate models to mimic black-box models. A global surrogate model is an intrinsically interpretable model that is trained to approximate the predictions of any black-box model as accurately as possible. Data scientists can interpret the surrogate model to draw conclusions about the black-box model.
- MimicExplainer code:

  # MimicExplainer
  from interpret.ext.blackbox import MimicExplainer
  from interpret.ext.glassbox import DecisionTreeExplainableModel

  mim_explainer = MimicExplainer(model=loan_model,
                                 initialization_examples=X_test,
                                 explainable_model=DecisionTreeExplainableModel,
                                 features=['loan_amount','income','age','marital_status'],
                                 classes=['reject', 'approve'])

PFIExplainer

- A Permutation Feature Importance explainer analyzes feature importance by shuffling feature values and measuring the impact on prediction performance.
- EX for a hypothetical model named 'loan_model':

  # PFIExplainer
  from interpret.ext.blackbox import PFIExplainer

  pfi_explainer = PFIExplainer(model=loan_model,
                               features=['loan_amount','income','age','marital_status'],
                               classes=['reject', 'approve'])

- Global feature importance for a PFIExplainer:

  # PFIExplainer
  global_pfi_explanation = pfi_explainer.explain_global(X_train, y_train)
  global_pfi_feature_importance = global_pfi_explanation.get_feature_importance_dict()

TabularExplainer

- A TabularExplainer is used with tabular datasets.
- It automatically chooses the SHAP explainer algorithm that is most appropriate for your model to use for explanation.
- TabularExplainer is a meta-explainer.
- Code to create one:

  # TabularExplainer
  from interpret.ext.blackbox import TabularExplainer

  tab_explainer = TabularExplainer(model=loan_model,
                                   initialization_examples=X_test,
                                   features=['loan_amount','income','age','marital_status'],
                                   classes=['reject', 'approve'])

Model Explainer

- A model explainer uses statistical techniques to calculate feature importance. This allows you to quantify the relative influence that each feature in the training dataset has on the label prediction.
- Explainers work by evaluating a test dataset of feature cases and the labels the model predicts for them.
- 3 kinds:
  1. PFIExplainer (Permutation Feature Importance)
  2. MimicExplainer
  3. TabularExplainer

Preprocessing and Featurization

- Along with trying a selection of algorithms, automated ML can apply preprocessing transformations to your data, which can improve the performance of the model.
- Scaling and normalization: Automated ML applies scaling and normalization to numeric data automatically, which helps prevent any large-scale features from dominating training. During an automated ML experiment, multiple scaling and normalization techniques will be applied.
- Optional featurization: You can choose to have automated machine learning apply preprocessing transformations such as:
  *Missing value imputation to eliminate nulls in the training dataset.
  *Categorical encoding to convert categorical features to numeric indicators.
  *Dropping high-cardinality features, such as record IDs.
  *Feature engineering (for example, deriving individual date parts from DateTime features)
  *Others...

Application Insights

- Application Insights is an application performance management service in Microsoft Azure that lets you capture, store, and analyze telemetry (remotely collected) data.
- To log telemetry data in Application Insights from an Azure ML service, you must have an Application Insights resource associated with your Azure ML workspace, and you need to configure your service to use it for telemetry logging. When you create a workspace you can choose an Azure Application Insights resource to go with it; if you don't select an existing one, a new one will be created in the same resource group as your workspace.
- You can determine the Application Insights resource associated with your workspace by viewing the Overview page of the workspace in the Azure portal, or by using the 'get_details()' method of a Workspace object like this:

  from azureml.core import Workspace

  ws = Workspace.from_config()
  ws.get_details()['applicationInsights']

- When deploying a new real-time service, you can enable Application Insights in the deployment configuration like this:

  dep_config = AciWebservice.deploy_configuration(cpu_cores=1,
                                                  memory_gb=1,
                                                  enable_app_insights=True)

- If you want to enable Application Insights for an already deployed service, you can change the deployment configuration for AKS-based services in the Azure portal. You can also update any web service by using the SDK like this:

  service = ws.webservices['my-svc']
  service.update(enable_app_insights=True)

- Application Insights automatically captures any information written to the standard output and error logs, and provides a query capability to view data in these logs.

Specifying Data for Auto ML Experiment

- Automated ML is designed to let you simply bring your data and have Azure ML figure out how best to train a model from it.
- When using Azure ML Studio, you can create and select an Azure ML dataset to be used in your experiment.
- When using the SDK, you can submit data by:
  1. Specifying a dataset or dataframe of training data that includes features and the label to be predicted.
  2. Optionally, specifying a second validation dataset or dataframe that will be used to validate the trained model. If this is not provided, Azure ML will apply cross-validation using the training data.
- Alternatively:
  1. Specify a dataset, dataframe, or numpy array of X values containing the training features, with a corresponding y array of label values.
  2. Optionally, specify X_valid and y_valid datasets, dataframes, or numpy arrays of values to be used for validation.
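- A minimal sketch (the dataset name 'loan-data' is hypothetical) of retrieving a registered tabular dataset and splitting it into the train_dataset and test_dataset used by the AutoMLConfig example above:

  from azureml.core import Dataset, Workspace

  ws = Workspace.from_config()

  # Hypothetical registered tabular dataset containing the features and the 'Label' column
  full_dataset = Dataset.get_by_name(ws, name='loan-data')

  # Split into training and validation datasets for AutoMLConfig
  train_dataset, test_dataset = full_dataset.random_split(percentage=0.7, seed=123)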

Exploring Preprocessing Steps

- Automated ML uses scikit-learn pipelines to encapsulate preprocessing steps within the model.
- You can view the steps in the fitted model you retrieved for the best run using this code:

  for step in fitted_model.named_steps:
      print(step)

Automated ML

- Automated machine learning allows you to try multiple algorithms and preprocessing transformations with your data.
- This, when combined with scalable cloud-based compute, makes it possible to find the best performing model for your data without the large amount of time that a manual trial-and-error process would take.
- You can use automated ML in Azure ML to train models for classification, regression, and time series forecasting.

Bayesian Sampling

- Bayesian sampling chooses hyperparameter values based on the Bayesian optimization algorithm.
- The Bayesian optimization algorithm tries to select parameter values that will result in improved performance compared with the previous selection.
- Example:

  from azureml.train.hyperdrive import BayesianParameterSampling, choice, uniform

  param_space = {
      '--batch_size': choice(16, 32, 64),
      '--learning_rate': uniform(0.05, 0.1)
  }

  param_sampling = BayesianParameterSampling(param_space)

- NOTE: You can only use Bayesian sampling with choice, uniform, and quniform parameter expressions, and you CAN'T combine it with an early termination policy.

Troubleshooting Service Deployment

- Lots of things can go wrong and need troubleshooting: the scoring script, runtime configuration, trained model, runtime environment, container image, and container host.
- You can start by checking the status of a service by examining its state:

  from azureml.core.webservice import AksWebservice

  # Get the deployed service
  service = AksWebservice(name='classifier-service', workspace=ws)

  # Check its state
  print(service.state)

- For an operational service, the state should be 'Healthy'.
- If a service is not healthy, or you are getting errors when using it, you can review its logs like this:

  print(service.get_logs())

- The logs have detailed information about the provisioning of the service and the requests it has processed, and they can often give you some insight into unexpected errors.
- Deployment and runtime errors can be easier to diagnose by deploying the service as a container in a local Docker instance like so:

  from azureml.core.model import Model
  from azureml.core.webservice import LocalWebservice

  deployment_config = LocalWebservice.deploy_configuration(port=8090)
  service = Model.deploy(ws, 'test-svc', [model], inference_config, deployment_config)

- You can then test the locally deployed model using the SDK like this:

  print(service.run(input_data = json_data))

- You can then troubleshoot any runtime issues by changing the scoring file referenced in the inference config and reloading the service without redeploying it (something you can only do with a local service), like this:

  service.reload()
  print(service.run(input_data = json_data))

Global Feature Importance

- Quantifies the relative importance of each feature in the test dataset as a whole. Provides a general comparison of the extent to which each feature in the dataset influences predictions.

Random Sampling

- Random sampling is used to select a random value for each hyperparameter, which can be a mix of discrete and continuous values.
- This is shown below:

  from azureml.train.hyperdrive import RandomParameterSampling, choice, normal

  param_space = {
      '--batch_size': choice(16, 32, 64),
      '--learning_rate': normal(10, 3)
  }

  param_sampling = RandomParameterSampling(param_space)

Hyperparameters

- The values used to configure the training process itself are referred to as hyperparameters in machine learning.
- Parameters are values learned from the dataset itself; hyperparameters are defined externally from the data.
- A hyperparameter is a parameter that is set before the learning process begins. These parameters are tunable and can directly affect how well a model trains.
- Some examples of hyperparameters in machine learning:
  *Learning rate
  *Number of epochs
  *Momentum
  *Regularization constant
  *Number of branches in a decision tree
  *Number of clusters in a clustering algorithm (like k-means)

Continuous Hyperparameters

- Some hyperparameters are continuous, which means you can use any value along a scale.
- To define a search space for these kinds of values, you can use any of the following distribution types:
  *normal
  *uniform
  *lognormal
  *loguniform
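- A minimal sketch (the parameter names are illustrative) showing each of these continuous distribution expressions in a search space:

  from azureml.train.hyperdrive import normal, uniform, lognormal, loguniform

  param_space = {
      '--learning_rate': normal(10, 3),      # mean 10, standard deviation 3
      '--dropout': uniform(0.05, 0.5),       # any value between 0.05 and 0.5
      '--momentum': lognormal(0, 1),         # exp of a normally distributed value
      '--weight_decay': loguniform(-6, -1)   # exp of a uniformly distributed value
  }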

Discrete Hyperparameters

- Some hyperparameters require 'discrete' values, which means that you must select the value from a particular set of possibilities.
- You can define a search space for a discrete parameter using a 'choice' from a list of explicit values, which you can define as a list, a range, or an arbitrary set of comma-separated values.
- You can also select discrete values from any of the following discrete distributions:
  *qnormal
  *quniform
  *qlognormal
  *qloguniform

Search Space

- The set of hyperparameter values tried during hyperparameter tuning is known as the search space.
- The definition of the range of possible values that can be chosen depends on the type of hyperparameter.
- Defining a search space: to define a search space, create a dictionary with the appropriate parameter expression for each named hyperparameter.
- For example, the following search space indicates that the batch_size hyperparameter can have the value 16, 32, or 64, and the learning_rate hyperparameter can have any value from a normal distribution with a mean of 10 and a standard deviation of 3:

  from azureml.train.hyperdrive import choice, normal

  param_space = {
      '--batch_size': choice(16, 32, 64),
      '--learning_rate': normal(10, 3)
  }
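- A minimal sketch (not covered in these notes) of wiring a search space and sampling strategy into a HyperDrive tuning run; ws is the workspace and script_config is assumed to be a ScriptRunConfig for a training script that accepts these arguments and logs an 'Accuracy' metric:

  from azureml.core import Experiment
  from azureml.train.hyperdrive import (HyperDriveConfig, RandomParameterSampling,
                                        PrimaryMetricGoal, choice, normal)

  param_sampling = RandomParameterSampling({
      '--batch_size': choice(16, 32, 64),
      '--learning_rate': normal(10, 3)
  })

  hyperdrive_config = HyperDriveConfig(run_config=script_config,        # assumed ScriptRunConfig
                                       hyperparameter_sampling=param_sampling,
                                       policy=None,                     # no early termination
                                       primary_metric_name='Accuracy',  # assumed logged metric
                                       primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                       max_total_runs=20,
                                       max_concurrent_runs=4)

  experiment = Experiment(workspace=ws, name='hyperdrive-tuning')
  hyperdrive_run = experiment.submit(config=hyperdrive_config)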

Write Log Data

- To capture telemetry data for Application Insights, you can write any values to the standard output log in the scoring script for your service by using a 'print' statement like this:

  import json
  import joblib
  from azureml.core.model import Model

  def init():
      global model
      model = joblib.load(Model.get_model_path('my_model'))

  def run(raw_data):
      data = json.loads(raw_data)['data']
      predictions = model.predict(data)
      log_txt = 'Data:' + str(data) + ' - Predictions:' + str(predictions)
      print(log_txt)
      return predictions.tolist()

- Azure ML creates a custom dimension in the Application Insights data model for the output you write.

AKS (Azure Kubernetes Services)

- To deploy to AKS (see the sketch after this list):
  *You must first create or attach an AKS cluster.
  *Create a deployment configuration that describes the compute resources needed.
  *Also need an inference configuration that describes the environment needed to host the model and web service.
- You can create the cluster in the Azure portal, Azure ML studio, the SDK, or the Azure CLI (Command-Line Interface) with the ML extension.
- Good for high-scale production services; use it if you need the following functionality:
  1. fast response time
  2. autoscaling of the deployed service
  3. logging
  4. model data collection
  5. authentication
  6. TLS termination
  7. hardware acceleration options such as GPUs and field-programmable gate arrays (FPGA)
- When deploying to AKS, you deploy to an AKS cluster connected to your workspace.
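- A minimal sketch of the full AKS path, assuming ws, a registered model, and an Environment named service_env exist; the entry script, cluster, service names, and VM size are hypothetical:

  from azureml.core.compute import AksCompute, ComputeTarget
  from azureml.core.model import InferenceConfig, Model
  from azureml.core.webservice import AksWebservice

  # Create an AKS cluster in the workspace (VM size is illustrative)
  prov_config = AksCompute.provisioning_configuration(vm_size='Standard_DS3_v2')
  aks_target = ComputeTarget.create(ws, 'aks-cluster', prov_config)
  aks_target.wait_for_completion(show_output=True)

  # Inference configuration: scoring script plus environment
  inference_config = InferenceConfig(entry_script='score.py', environment=service_env)

  # Deployment configuration describing the compute resources for the service
  deploy_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=2,
                                                     auth_enabled=True)

  service = Model.deploy(ws, 'classifier-service', [model],
                         inference_config, deploy_config,
                         deployment_target=aks_target)
  service.wait_for_deployment(show_output=True)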

Explaining Global Feature Importance

- To get global importance values for the features in your model, call the 'explain_global()' method of your explainer to get a global explanation, and then use the 'get_feature_importance_dict()' method to get a dictionary of the feature importance values.
- Here's an example for the Mimic, Tabular, and PFI explainers:

  # MimicExplainer
  global_mim_explanation = mim_explainer.explain_global(X_train)
  global_mim_feature_importance = global_mim_explanation.get_feature_importance_dict()

  # TabularExplainer
  global_tab_explanation = tab_explainer.explain_global(X_train)
  global_tab_feature_importance = global_tab_explanation.get_feature_importance_dict()

  # PFIExplainer
  global_pfi_explanation = pfi_explainer.explain_global(X_train, y_train)
  global_pfi_feature_importance = global_pfi_explanation.get_feature_importance_dict()

Explaining Local Feature Importance

- To get local feature importance from a MimicExplainer or a TabularExplainer, call the 'explain_local()' method of your explainer and specify the subset of cases you want to explain.
- You can then use the 'get_ranked_local_names()' and 'get_ranked_local_values()' methods to get dictionaries of the feature names and importance values, ranked by importance.
- EX:

  # MimicExplainer
  local_mim_explanation = mim_explainer.explain_local(X_test[0:5])
  local_mim_features = local_mim_explanation.get_ranked_local_names()
  local_mim_importance = local_mim_explanation.get_ranked_local_values()

  # TabularExplainer
  local_tab_explanation = tab_explainer.explain_local(X_test[0:5])
  local_tab_features = local_tab_explanation.get_ranked_local_names()
  local_tab_importance = local_tab_explanation.get_ranked_local_values()

While creating a binary classification model using a two-class logistic regression model, what should you use to evaluate the model results for imbalance?

- Use the ROC curve and its AUC value.
- You can inspect the true positive rate vs. the false positive rate in the ROC curve (remember: the more the curve hugs the upper left corner of the chart, the better the model) and the corresponding area under the curve (AUC) value.
- The closer the curve is to the upper left-hand corner, the better the model (maximizing the true positive rate while minimizing the false positive rate).
- Curves that are closer to the diagonal line across the chart indicate that the model's predictions more closely resemble random guessing.
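- A minimal scikit-learn sketch (hypothetical labels and predicted probabilities) showing how the ROC curve points and the AUC value are obtained:

  from sklearn.metrics import roc_auc_score, roc_curve

  # Hypothetical true labels and predicted probabilities from a two-class model
  y_true = [0, 0, 1, 1, 0, 1, 1, 0]
  y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.3]

  fpr, tpr, thresholds = roc_curve(y_true, y_scores)  # points along the ROC curve
  auc = roc_auc_score(y_true, y_scores)               # area under that curve

  print('AUC:', auc)  # ~0.5 resembles random guessing, 1.0 = perfect separation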

ACI (Azure Container Instance)

- Used to deploy a web service if:
  1. You need to quickly deploy and validate your model (you don't need to create ACI containers ahead of time; they're made as part of the deployment process).
  2. You're testing a model that's still in development.
- To deploy (see the sketch below):
  *Make a deployment configuration that describes the compute resources needed (number of cores, memory, etc.).
  *Also need an inference configuration that describes the environment needed to host the model and web service.
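- A minimal ACI deployment sketch, assuming ws, a registered model, and an Environment named service_env exist; the entry script and service names are hypothetical:

  from azureml.core.model import InferenceConfig, Model
  from azureml.core.webservice import AciWebservice

  # Inference configuration: scoring script plus environment
  inference_config = InferenceConfig(entry_script='score.py', environment=service_env)

  # Deployment configuration describing the ACI compute resources
  aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

  # Deploy a registered model for quick validation/testing
  service = Model.deploy(ws, 'aci-test-svc', [model], inference_config, aci_config)
  service.wait_for_deployment(show_output=True)
  print(service.scoring_uri)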

Real-time inferencing

- Used to predict labels for new data instantly / in real time.
- Usually used to predict smaller numbers of observations from smaller amounts of data.
- In Azure Machine Learning, you can create real-time inferencing solutions by deploying a model as a service, hosted in a containerized platform such as Azure Kubernetes Service (AKS).
- Can deploy with several kinds of compute: Azure ML compute, Azure Container Instances (ACI), an Azure Kubernetes Service (AKS) cluster, an Azure Function, or an Internet of Things (IoT) service.
- To deploy, you must first convert the training pipeline into a real-time inference pipeline.
- When you create one, several things happen:
  1. The trained model is stored as a Dataset module in the module palette and can be accessed under My Datasets.
  2. Training modules like Train Model and Split Data are removed.
  3. The saved trained model is added back into the pipeline.
  4. Web Service Input and Web Service Output modules are added; they mark where the user submits data and where the output is returned.
- Use an AKS compute cluster for this.
- Viewing real-time endpoints (a sketch of consuming one follows this list):
  *View the Endpoints page to see the endpoint you deployed.
  *The Details tab shows tags, status, and the REST URI.
  *The Consume tab shows security keys, and you can set authorization there.
  *The Deployment logs tab shows detailed deployment logs of your real-time endpoint.
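- A minimal sketch of consuming a deployed real-time endpoint over REST, assuming the service accepts a JSON 'data' payload; the URI, key, and feature values are hypothetical placeholders for the values shown on the endpoint's Details and Consume tabs:

  import json
  import requests

  # Hypothetical endpoint URI and primary key from the Details/Consume tabs
  scoring_uri = 'https://<your-endpoint>/score'
  key = '<primary-key>'

  headers = {'Content-Type': 'application/json',
             'Authorization': 'Bearer ' + key}

  # Hypothetical feature values for two observations
  payload = json.dumps({'data': [[0.1, 2.3, 4.1], [0.2, 1.8, 3.9]]})

  response = requests.post(scoring_uri, data=payload, headers=headers)
  print(response.json())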

Creating an Explanation in the Experiment Script

- When you use an estimator or a script to train a model in an Azure ML experiment, you can create an explainer and upload the explanation it generates to the run for later analysis.
- To create an explanation in the experiment script, make sure the azureml-interpret and azureml-contrib-interpret packages are installed in the environment. Then you can use them to generate an explanation from your trained model and upload it to the run outputs.
- Example:

  # Import Azure ML run library
  from azureml.core.run import Run
  from azureml.contrib.interpret.explanation.explanation_client import ExplanationClient
  from interpret.ext.blackbox import TabularExplainer
  # other imports as required

  # Get the experiment run context
  run = Run.get_context()

  # code to train model goes here

  # Get explanation
  explainer = TabularExplainer(model, X_train, features=features, classes=labels)
  explanation = explainer.explain_global(X_test)

  # Get an Explanation Client and upload the explanation
  explain_client = ExplanationClient.from_run(run)
  explain_client.upload_model_explanation(explanation, comment='Tabular Explanation')

  # Complete the run
  run.complete()

Viewing an Explanation

- You can view an explanation that you created for your model in the Explanations tab for the run in Azure ML studio.
- You can also use the ExplanationClient object to download the explanation in Python like so:

  from azureml.contrib.interpret.explanation.explanation_client import ExplanationClient

  client = ExplanationClient.from_run_id(workspace=ws,
                                         experiment_name=experiment.experiment_name,
                                         run_id=run.id)
  explanation = client.download_model_explanation()
  feature_importances = explanation.get_feature_importance_dict()

