DP-100 Data Science Questions Topic 4


HOTSPOT - You are developing a linear regression model in Azure Machine Learning Studio. You run an experiment to compare different algorithms. The following image displays the results dataset output. [Image: a table with columns Algorithm, Mean Absolute Error, Root Mean Squared Error, Relative Absolute Error, and Relative Squared Error; rows for Bayesian Linear, Neural Network, Boosted Decision Tree (the smallest values in every column), Linear, and Decision Forest.] Use the drop-down menus to select the answer choice that answers each question based on the information presented in the image. NOTE: Each correct selection is worth one point. Hot Area: Which algorithm minimizes differences between actual and predicted values? - Bayesian Linear Regression - Neural Network Regression - Boosted Decision Tree Regression - Linear Regression - Decision Forest Regression Which approach should you use to find the best parameters for a Linear Regression model for the Online Gradient Descent method? - Set the Decrease learning rate option to True - Set the Decrease learning rate option to False - Set the Create trainer mode option to Parameter Range - Increase the number of epochs - Decrease the number of epochs

Correct Answer: - Boosted Decision Tree Regression - Set the Create trainer mode option to Parameter Range Box 1: Boosted Decision Tree Regression - Mean absolute error (MAE) measures how close the predictions are to the actual outcomes; thus, a lower score is better, and Boosted Decision Tree has the lowest values in every error column. Box 2: Set the Create trainer mode option to Parameter Range (the original explanation labels this box "Online Gradient Descent", which is not one of the options). If you want the algorithm to find the best parameters for you, set the Create trainer mode option to Parameter Range. You can then specify multiple values for the algorithm to try. Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-model https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/linear-regression

HOTSPOT - You are using a decision tree algorithm. You have trained a model that generalizes well at a tree depth equal to 10. You need to select the bias and variance properties of the model with varying tree depth values. Which properties should you select for each tree depth? To answer, select the appropriate options in the answer area. Hot Area: a table with columns Tree Depth, Bias, and Variance; rows for depths 5 and 15, each with a drop-down for Bias and for Variance. Options for every drop-down: High, Low, Identical

Correct Answer: - 5: High, Low - 15: Low, High In decision trees, the depth of the tree determines the variance. A complicated (e.g. deep) decision tree has low bias and high variance. Note: In statistics and machine learning, the bias-variance tradeoff is the property of a set of predictive models whereby models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples, and vice versa. Increasing the bias will decrease the variance. Increasing the variance will decrease the bias. Reference: https://machinelearningmastery.com/gentle-introduction-to-the-bias-variance-trade-off-in-machine-learning/
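
To see the trade-off concretely, here is a minimal scikit-learn sketch (not part of the exam question; the dataset and depth values are made up) that scores a decision tree at several depths with cross-validation:

```python
# Hypothetical illustration of the bias-variance trade-off with tree depth.
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

for depth in (5, 10, 15):
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    # Cross-validated MAE: shallow trees underfit (high bias, low variance),
    # deep trees overfit (low bias, high variance).
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    print(f"depth={depth:2d}  mean MAE={-scores.mean():.2f}  std={scores.std():.2f}")
```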

DRAG DROP - You have a model with a large difference between the training and validation error values. You must create a new model and perform cross-validation. You need to identify a parameter set for the new model using Azure Machine Learning Studio. Which module should you use for each step? To answer, drag the appropriate modules to the correct steps. Each module may be used once or more than once, or not at all. You may need to drag the split bar between panes or scroll to view content. NOTE: Each correct selection is worth one point. Select and Place: Modules: Two-Class Boosted Decision Tree, Partition and Sample, Tune Model Hyperparameters, Split Data Steps: Define the parameter scope: ____ Define the cross-validation settings: ____ Define the metric: ____ Train, evaluate, and compare: ____

Correct Answer: - Split Data - Partition and Sample - Two-Class Boosted Decision Tree - Tune Model Hyperparameters Alternate Answer: YES - Two-Class Boosted Decision Tree - Partition and Sample - Tune Model Hyperparameters - Tune Model Hyperparameters (https://docs.microsoft.com/en-us/azure/machine-learning/studio/algorithm-parameters-optimize) Box 1: Split Data Box 2: Partition and Sample Box 3: Two-Class Boosted Decision Tree Box 4: Tune Model Hyperparameters Integrated train and tune: you configure a set of parameters to use, and then let the module iterate over multiple combinations, measuring accuracy until it finds a "best" model. With most learner modules, you can choose which parameters should be changed during the training process, and which should remain fixed. We recommend that you use Cross-Validate Model to establish the goodness of the model given the specified parameters. Use Tune Model Hyperparameters to identify the optimal parameters. Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-and-sample
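
These are drag-and-drop Studio modules rather than code, but the workflow has a rough scikit-learn analogue. A minimal sketch (all parameter values hypothetical; GridSearchCV stands in for Tune Model Hyperparameters):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)  # ~ Split Data

param_grid = {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]}     # ~ parameter scope
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),  # ~ Two-Class Boosted Decision Tree
    param_grid,
    cv=5,                # ~ cross-validation settings (Partition and Sample folds)
    scoring="accuracy",  # ~ the metric to optimize
)
search.fit(X_train, y_train)  # ~ train, evaluate, and compare the combinations
print(search.best_params_, search.best_score_)
```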

HOTSPOT - You are performing feature scaling by using the scikit-learn Python library for the x1, x2, and x3 features. Original and scaled data is shown in the following image. >>image<< (https://www.examtopics.com/exams/microsoft/dp-100/view/40/) Original: three distinct bell curves. A: all bell curves on top of each other with minimal space. B: all bell curves on top of each other with more space. C: one bell to the left, one bell to the right, one tall bell to the right. Use the drop-down menus to select the answer choice that answers each question based on the information presented in the graphic. NOTE: Each correct selection is worth one point. Hot Area: Which scaler is used in graph A? B? C? Answers for all: Standard Scaler, Min Max Scaler, Normalizer

Correct Answer: Standard Scaler, Min Max Scaler, Normalizer Box 1: StandardScaler - The StandardScaler assumes your data is normally distributed within each feature and will scale them such that the distribution is now centered around 0, with a standard deviation of 1. All features are now on the same scale relative to one another. Box 2: Min Max Scaler - Notice that the skewness of the distribution is maintained but the 3 distributions are brought into the same scale so that they overlap. Box 3: Normalizer - The Normalizer rescales each sample (row) independently of the other samples so that its norm equals one. Reference: http://benalexkeen.com/feature-scaling-with-scikit-learn/
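
A minimal sketch of the three scalers; the feature distributions are hypothetical stand-ins for x1, x2, and x3:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer, StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(0, 1, 1000),   # x1
    rng.normal(5, 3, 1000),   # x2
    rng.normal(-5, 5, 1000),  # x3
])

X_std = StandardScaler().fit_transform(X)  # per feature: mean 0, std 1 (graph A)
X_mm = MinMaxScaler().fit_transform(X)     # per feature: rescaled into [0, 1] (graph B)
X_nrm = Normalizer().fit_transform(X)      # per ROW: each sample scaled to unit norm (graph C)
```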

DRAG DROP - You are producing a multiple linear regression model in Azure Machine Learning Studio. Several independent variables are highly correlated. You need to select appropriate methods for conducting effective feature engineering on all the data. Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order. Select and Place: Evaluate the probability function Remove duplicate rows Use the Filter Based Feature Selection module Test the hypothesis using t-Test Compute linear correlation Build a counting transform

Correct Answer: Use the Filter Based Feature Selection Module, Build a counting transform, Test the hypothesis using t-Test Step 1: Use the Filter Based Feature Selection module Filter Based Feature Selection identifies the features in a dataset with the greatest predictive power. The module outputs a dataset that contains the best feature columns, as ranked by predictive power. It also outputs the names of the features and their scores from the selected metric. Step 2: Build a counting transform A counting transform creates a transformation that turns count tables into features, so that you can apply the transformation to multiple datasets. Step 3: Test the hypothesis using t-Test Reference: https://docs.microsoft.com/bs-latn-ba/azure/machine-learning/studio-module-reference/filter-based-feature-selection https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/build-counting-transform
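
The steps above are Studio modules, not code, but Step 1 has a rough scikit-learn counterpart. A minimal sketch (hypothetical data; SelectKBest standing in for Filter Based Feature Selection):

```python
# Score features by a filter statistic and keep the strongest ones.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

X, y = make_regression(n_samples=200, n_features=10, n_informative=3, random_state=0)

selector = SelectKBest(score_func=f_regression, k=3).fit(X, y)
print(selector.scores_)         # per-feature predictive-power scores
X_best = selector.transform(X)  # dataset reduced to the top-k columns
```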

DRAG DROP - You have several machine learning models registered in an Azure Machine Learning workspace. You must use the Fairlearn dashboard to assess fairness in a selected model. Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order. Select and Place: - Select a binary classification or regression model - Select a metric to be measured - Select a multiclass classification model - Select a model feature to be evaluated - Select a clustering model

Correct Answer: - Select a model feature to be evaluated - Select a binary classification or regression model - Select a metric to be measured Alternative Answer: ? - binary classification or regression model - feature - metric Step 1: Select a model feature to be evaluated. Step 2: Select a binary classification or regression model. Register your models within Azure Machine Learning. For convenience, store the results in a dictionary, which maps the id of the registered model (a string in name:version format) to the predictor itself. Example: model_dict = {} lr_reg_id = register_model("fairness_logistic_regression", lr_predictor) model_dict[lr_reg_id] = lr_predictor svm_reg_id = register_model("fairness_svm", svm_predictor) model_dict[svm_reg_id] = svm_predictor Step 3: Select a metric to be measured. Precompute fairness metrics and create a dashboard dictionary using Fairlearn's metrics package. Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-fairness-aml
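
For the metric step, Fairlearn's metrics package can compute disaggregated metrics directly. A minimal sketch (the model, data, and sensitive feature are all made up):

```python
import numpy as np
from fairlearn.metrics import MetricFrame
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score

X, y = make_classification(n_samples=500, random_state=0)
group = np.random.default_rng(0).choice(["group_a", "group_b"], size=500)  # hypothetical feature
model = LogisticRegression().fit(X, y)  # a binary classification model

mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "recall": recall_score},  # metrics to be measured
    y_true=y,
    y_pred=model.predict(X),
    sensitive_features=group,  # the model feature being evaluated
)
print(mf.overall)   # each metric over the whole dataset
print(mf.by_group)  # each metric broken out per group
```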

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You train a classification model by using a logistic regression algorithm. You must be able to explain the model's predictions by calculating the importance of each feature, both as an overall global relative importance value and as a measure of local importance for a specific set of predictions. You need to create an explainer that you can use to retrieve the required global and local feature importance values. Solution: Create a PFIExplainer. Does the solution meet the goal? A. Yes B. No

Correct Answer: A Alternate Answer: B (all comments) (Answer is Yes; Mimic explains both local and global feature importance. https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability-automl) Permutation Feature Importance Explainer (PFI): Permutation Feature Importance is a technique used to explain classification and regression models. At a high level, the way it works is by randomly shuffling data one feature at a time for the entire dataset and calculating how much the performance metric of interest changes. The larger the change, the more important that feature is. PFI can explain the overall behavior of any underlying model but does not explain individual predictions. Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability
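
A sketch of what PFI does offer, assuming the azureml-interpret package is installed (the iris data and logistic model are stand-ins): it produces global importances but exposes no local-explanation method, which is the crux of this question.

```python
from interpret.ext.blackbox import PFIExplainer
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

data = load_iris()
model = LogisticRegression(max_iter=1000).fit(data.data, data.target)

pfi = PFIExplainer(model, features=list(data.feature_names), classes=list(data.target_names))
global_explanation = pfi.explain_global(data.data, true_labels=data.target)
print(global_explanation.get_feature_importance_dict())  # global importances only
# PFIExplainer has no per-prediction (local) explanations.
```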

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You are creating a model to predict the price of a student's artwork depending on the following variables: the student's length of education, degree type, and art form. You start by creating a linear regression model. You need to evaluate the linear regression model. Solution: Use the following metrics: Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error, Relative Squared Error, and the Coefficient of Determination. Does the solution meet the goal? A. Yes B. No

Correct Answer: A The following metrics are reported for evaluating regression models. When you compare models, they are ranked by the metric you select for evaluation. Mean absolute error (MAE) measures how close the predictions are to the actual outcomes; thus, a lower score is better. Root mean squared error (RMSE) creates a single value that summarizes the error in the model. By squaring the difference, the metric disregards the difference between over-prediction and under-prediction. Relative absolute error (RAE) is the relative absolute difference between expected and actual values; relative because the mean difference is divided by the arithmetic mean. Relative squared error (RSE) similarly normalizes the total squared error of the predicted values by dividing by the total squared error of the actual values. Mean Zero One Error (MZOE) indicates whether the prediction was correct or not. In other words: ZeroOneLoss(x,y) = 1 when x!=y; otherwise 0. Coefficient of determination, often referred to as R2, represents the predictive power of the model as a value between 0 and 1. Zero means the model is random (explains nothing); 1 means there is a perfect fit. However, caution should be used in interpreting R2 values, as low values can be entirely normal and high values can be suspect. Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-model
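
For concreteness, a small worked sketch of those regression metrics on toy numbers (the values are made up; RAE and RSE are computed by hand since scikit-learn has no built-ins for them):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([2.5, 0.0, 2.1, 7.8])
y_pred = np.array([3.0, -0.5, 2.0, 8.0])

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
# RAE/RSE normalize the error by the error of always predicting the mean.
rae = np.abs(y_true - y_pred).sum() / np.abs(y_true - y_true.mean()).sum()
rse = ((y_true - y_pred) ** 2).sum() / ((y_true - y_true.mean()) ** 2).sum()
r2 = r2_score(y_true, y_pred)  # coefficient of determination
print(mae, rmse, rae, rse, r2)
```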

You are building a binary classification model by using a supplied training set. The training set is imbalanced between two classes. You need to resolve the data imbalance. What are three possible ways to achieve this goal? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point. A. Penalize the classification B. Resample the dataset using undersampling or oversampling C. Normalize the training feature set D. Generate synthetic samples in the minority class E. Use accuracy as the evaluation metric of the model

Correct Answer: ABD A: Try Penalized Models - You can use the same algorithms but give them a different perspective on the problem. Penalized classification imposes an additional cost on the model for making classification mistakes on the minority class during training. These penalties can bias the model to pay more attention to the minority class. B: You can change the dataset that you use to build your predictive model to have more balanced data. This change is called sampling your dataset and there are two main methods that you can use to even up the classes: - Consider testing under-sampling when you have a lot of data (tens or hundreds of thousands of instances or more). - Consider testing over-sampling when you don't have a lot of data (tens of thousands of records or less). D: Try Generate Synthetic Samples - A simple way to generate synthetic samples is to randomly sample the attributes from instances in the minority class. Reference: https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
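
A minimal sketch of options A, B, and D, assuming the third-party imbalanced-learn package is available (all data made up):

```python
from collections import Counter
from imblearn.over_sampling import RandomOverSampler, SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
print(Counter(y))  # heavily imbalanced classes

X_over, y_over = RandomOverSampler(random_state=0).fit_resample(X, y)     # B: oversampling
X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X, y)  # B: undersampling
X_syn, y_syn = SMOTE(random_state=0).fit_resample(X, y)                   # D: synthetic minority samples

clf = LogisticRegression(class_weight="balanced").fit(X, y)               # A: penalized model
```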

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You are creating a model to predict the price of a student's artwork depending on the following variables: the student's length of education, degree type, and art form. You start by creating a linear regression model. You need to evaluate the linear regression model. Solution: Use the following metrics: Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error, Accuracy, Precision, Recall, F1 score, and AUC. Does the solution meet the goal? A. Yes B. No

Correct Answer: B Accuracy, Precision, Recall, F1 score, and AUC are metrics for evaluating classification models. Note: Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error are OK for the linear regression model. Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-model

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You train a classification model by using a logistic regression algorithm. You must be able to explain the model's predictions by calculating the importance of each feature, both as an overall global relative importance value and as a measure of local importance for a specific set of predictions. You need to create an explainer that you can use to retrieve the required global and local feature importance values. Solution: Create a TabularExplainer. Does the solution meet the goal? A. Yes B. No

Correct Answer: B Alternate Answer: A (all comments) (The answer should be Yes: the Permutation Feature Importance (PFI) model explainer can only be used to explain how strongly the features contribute to the prediction at the dataset level and doesn't support evaluation of local importances, whereas Mimic Explainer and Tabular Explainer can both be used for interpreting the global and the local importance of features.) Instead use Permutation Feature Importance Explainer (PFI). Note: Permutation Feature Importance Explainer (PFI): Permutation Feature Importance is a technique used to explain classification and regression models. At a high level, the way it works is by randomly shuffling data one feature at a time for the entire dataset and calculating how much the performance metric of interest changes. The larger the change, the more important that feature is. PFI can explain the overall behavior of any underlying model but does not explain individual predictions. Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability
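
A sketch supporting the alternate answer, assuming the azureml-interpret package (the iris data and logistic model are stand-ins): TabularExplainer exposes both global and local importances.

```python
from interpret.ext.blackbox import TabularExplainer
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

data = load_iris()
model = LogisticRegression(max_iter=1000).fit(data.data, data.target)

explainer = TabularExplainer(model, data.data,
                             features=list(data.feature_names),
                             classes=list(data.target_names))
global_explanation = explainer.explain_global(data.data)    # overall relative importances
local_explanation = explainer.explain_local(data.data[:5])  # per-prediction importances
print(global_explanation.get_feature_importance_dict())
```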

You are a data scientist creating a linear regression model. You need to determine how closely the data fits the regression line. Which metric should you review? A. Root Mean Square Error B. Coefficient of determination C. Recall D. Precision E. Mean absolute error

Correct Answer: B Coefficient of determination, often referred to as R2, represents the predictive power of the model as a value between 0 and 1. Zero means the model is random (explains nothing); 1 means there is a perfect fit. However, caution should be used in interpreting R2 values, as low values can be entirely normal and high values can be suspect. Incorrect Answers: A: Root mean squared error (RMSE) creates a single value that summarizes the error in the model. By squaring the difference, the metric disregards the difference between over-prediction and under-prediction. C: Recall is the fraction of all correct results returned by the model. D: Precision is the proportion of true results over all positive results. E: Mean absolute error (MAE) measures how close the predictions are to the actual outcomes; thus, a lower score is better. Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-model

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You train a classification model by using a logistic regression algorithm. You must be able to explain the model's predictions by calculating the importance of each feature, both as an overall global relative importance value and as a measure of local importance for a specific set of predictions. You need to create an explainer that you can use to retrieve the required global and local feature importance values. Solution: Create a MimicExplainer. Does the solution meet the goal? A. Yes B. No

Correct Answer: B Instead use Permutation Feature Importance Explainer (PFI). Note 1: Mimic explainer is based on the idea of training global surrogate models to mimic blackbox models. A global surrogate model is an intrinsically interpretable model that is trained to approximate the predictions of any black box model as accurately as possible. Data scientists can interpret the surrogate model to draw conclusions about the black box model. Note 2: Permutation Feature Importance Explainer (PFI): Permutation Feature Importance is a technique used to explain classification and regression models. At a high level, the way it works is by randomly shuffling data one feature at a time for the entire dataset and calculating how much the performance metric of interest changes. The larger the change, the more important that feature is. PFI can explain the overall behavior of any underlying model but does not explain individual predictions. Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability https://docs.microsoft.com/en-us/learn/modules/explain-machine-learning-models-with-azure-machine-learning/3-explainers
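
For reference, a MimicExplainer usage sketch, assuming the azureml-interpret package with lightgbm available (the iris data and logistic model are stand-ins). It trains an interpretable LightGBM surrogate of the black-box model and, per the package docs, exposes both global and local explanations:

```python
from interpret.ext.blackbox import MimicExplainer
from interpret.ext.glassbox import LGBMExplainableModel
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

data = load_iris()
model = LogisticRegression(max_iter=1000).fit(data.data, data.target)

# Train an interpretable surrogate that mimics the black-box model.
explainer = MimicExplainer(model, data.data, LGBMExplainableModel,
                           features=list(data.feature_names),
                           classes=list(data.target_names))
global_explanation = explainer.explain_global(data.data)    # surrogate-based global importances
local_explanation = explainer.explain_local(data.data[:5])  # local importances are supported too
```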

You are creating a binary classification model by using a two-class logistic regression model. You need to evaluate the model results for imbalance. Which evaluation metric should you use? A. Relative Absolute Error B. AUC Curve C. Mean Absolute Error D. Relative Squared Error E. Accuracy F. Root Mean Square Error

Correct Answer: B One can inspect the true positive rate vs. the false positive rate in the Receiver Operating Characteristic (ROC) curve and the corresponding Area Under the Curve (AUC) value. The closer this curve is to the upper left corner, the better the classifier's performance is (that is, maximizing the true positive rate while minimizing the false positive rate). Curves that are close to the diagonal of the plot result from classifiers that tend to make predictions that are close to random guessing. Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio/evaluate-model-performance#evaluating-a-binary-classification-model
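
A minimal sketch of computing AUC on an imbalanced test set (the data are made up):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)  # imbalanced
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

probs = LogisticRegression().fit(X_train, y_train).predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, probs))      # rank-based, so robust to class imbalance
fpr, tpr, thresholds = roc_curve(y_test, probs)  # points along the ROC curve
```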

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You are creating a model to predict the price of a student's artwork depending on the following variables: the student's length of education, degree type, and art form. You start by creating a linear regression model. You need to evaluate the linear regression model. Solution: Use the following metrics: Relative Squared Error, Coefficient of Determination, Accuracy, Precision, Recall, F1 score, and AUC. Does the solution meet the goal? A. Yes B. No

Correct Answer: B Relative Squared Error and Coefficient of Determination are good metrics for evaluating the linear regression model, but the others are metrics for classification models. Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-model

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You are creating a model to predict the price of a student's artwork depending on the following variables: the student's length of education, degree type, and art form. You start by creating a linear regression model. You need to evaluate the linear regression model. Solution: Use the following metrics: Accuracy, Precision, Recall, F1 score, and AUC. Does the solution meet the goal? A. Yes B. No

Correct Answer: B Those are metrics for evaluating classification models; instead use Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error, Relative Squared Error, and the Coefficient of Determination. Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-model

You are performing feature engineering on a dataset. You must add a feature named CityName and populate the column value with the text London. You need to add the new feature to the dataset. Which Azure Machine Learning Studio module should you use? A. Extract N-Gram Features from Text B. Edit Metadata C. Preprocess Text D. Apply SQL Transformation

Correct Answer: B Typical metadata changes might include marking columns as features. Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/edit-metadata

You are a data scientist building a deep convolutional neural network (CNN) for image classification. The CNN model you build shows signs of overfitting. You need to reduce overfitting and converge the model to an optimal fit. Which two actions should you perform? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point. A. Add an additional dense layer with 512 input units. B. Add L1/L2 regularization. C. Use training data augmentation. D. Reduce the amount of training data. E. Add an additional dense layer with 64 input units.

Correct Answer: BD Alternate Answer: BC (all comments) YES (During dropout, we are not actually reducing the training data but rather dropping neurons to help the network memorize less and not overfit; https://www.kdnuggets.com/2019/12/5-techniques-prevent-overfitting-neural-networks.html, so the answer is B and C.) B: Weight regularization provides an approach to reduce the overfitting of a deep learning neural network model on the training data and improve the performance of the model on new data, such as the holdout test set. Keras provides a weight regularization API that allows you to add a penalty for weight size to the loss function. Three different regularizer instances are provided; they are: - L1: sum of the absolute weights. - L2: sum of the squared weights. - L1L2: sum of the absolute and the squared weights. D: Because a fully connected layer occupies most of the parameters, it is prone to overfitting. One method to reduce overfitting is dropout. At each training stage, individual nodes are either "dropped out" of the net with probability 1-p or kept with probability p, so that a reduced network is left; incoming and outgoing edges to a dropped-out node are also removed. By avoiding training all nodes on all training data, dropout decreases overfitting. Reference: https://machinelearningmastery.com/how-to-reduce-overfitting-in-deep-learning-with-weight-regularization/ https://en.wikipedia.org/wiki/Convolutional_neural_network
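
A minimal Keras sketch of these techniques (the question names no framework, so Keras and all layer sizes are assumptions):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.RandomFlip("horizontal"),  # C: data augmentation
    tf.keras.layers.RandomRotation(0.1),       # C: data augmentation
    tf.keras.layers.Conv2D(32, 3, activation="relu",
                           kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # B: L2 penalty
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),  # dropout, the mechanism the D rationale actually describes
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```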

You are determining if two sets of data are significantly different from one another by using Azure Machine Learning Studio. Estimated values in one set of data may be more than or less than reference values in the other set of data. You must produce a distribution that has a constant Type I error as a function of the correlation. You need to produce the distribution. Which type of distribution should you produce? A. Unpaired t-test with a two-tail option B. Unpaired t-test with a one-tail option C. Paired t-test with a one-tail option D. Paired t-test with a two-tail option

Correct Answer: D Choose a one-tail or two-tail test. The default is a two-tailed test. This is the most common type of test, in which the expected distribution is symmetric around zero. Example: Type I error of unpaired and paired two-sample t-tests as a function of the correlation. The simulated random numbers originate from a bivariate normal distribution with a variance of 1. Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/test-hypothesis-using-t-test https://en.wikipedia.org/wiki/Student%27s_t-test
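
A SciPy sketch with made-up paired measurements: paired because each estimated value has a matching reference value, two-tailed because deviations in either direction matter.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
reference = rng.normal(10.0, 2.0, 50)
estimated = reference + rng.normal(0.3, 1.0, 50)  # estimates of the same items

t_stat, p_value = stats.ttest_rel(estimated, reference)  # two-tailed by default
print(t_stat, p_value)
```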

You are a data scientist working for a bank and have used Azure ML to train and register a machine learning model that predicts whether a customer is likely to repay a loan. You want to understand how your model is making selections and must be sure that the model does not violate government regulations such as denying loans based on where an applicant lives. You need to determine the extent to which each feature in the customer data is influencing predictions. What should you do? A. Enable data drift monitoring for the model and its training dataset. B. Score the model against some test data with known label values and use the results to calculate a confusion matrix. C. Use the Hyperdrive library to test the model with multiple hyperparameter values. D. Use the interpretability package to generate an explainer for the model. E. Add tags to the model registration indicating the names of the features in the training dataset.

Correct Answer: D When you compute model explanations and visualize them, you're not limited to an existing model explanation for an automated ML model. You can also get an explanation for your model with different test data. The steps in this section show you how to compute and visualize engineered feature importance based on your test data. Incorrect Answers: A: In the context of machine learning, data drift is the change in model input data that leads to model performance degradation. It is one of the top reasons where model accuracy degrades over time, thus monitoring data drift helps detect model performance issues. B: A confusion matrix is used to describe the performance of a classification model. Each row displays the instances of the true, or actual class in your dataset, and each column represents the instances of the class that was predicted by the model. C: Hyperparameters are adjustable parameters you choose for model training that guide the training process. The HyperDrive package helps you automate choosing these parameters. Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability-automl

HOTSPOT - A biomedical research company plans to enroll people in an experimental medical treatment trial. You create and train a binary classification model to support selection and admission of patients to the trial. The model includes the following features: Age, Gender, and Ethnicity. The model returns different performance metrics for people from different ethnic groups. You need to use Fairlearn to mitigate and minimize disparities for each category in the Ethnicity feature. Which technique and constraint should you use? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area: - Technique: Grid search, Outlier detection, Dimensionality reduction - Constraint: Demographic parity, False-positive rate parity

Correct Answer: Grid Search, Demographic parity Box 1: Grid Search - The Fairlearn open-source package provides postprocessing and reduction unfairness mitigation algorithms: ExponentiatedGradient, GridSearch, and ThresholdOptimizer. Note: The Fairlearn open-source package provides two types of unfairness mitigation algorithms: - Reduction: these algorithms take a standard black-box machine learning estimator (e.g., a LightGBM model) and generate a set of retrained models using a sequence of re-weighted training datasets. - Post-processing: these algorithms take an existing classifier and the sensitive feature as input. Box 2: Demographic parity - The Fairlearn open-source package supports the following types of parity constraints: Demographic parity, Equalized odds, Equal opportunity, and Bounded group loss. Reference: https://docs.microsoft.com/en-us/azure/machine-learning/concept-fairness-ml
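
A minimal Fairlearn reduction sketch (the data and the two-group sensitive feature are made up; the real Ethnicity column would be used in its place):

```python
import numpy as np
from fairlearn.reductions import DemographicParity, GridSearch
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
group = np.random.default_rng(0).choice(["a", "b"], size=500)  # hypothetical sensitive feature

sweep = GridSearch(LogisticRegression(),
                   constraints=DemographicParity(),  # the parity constraint
                   grid_size=20)
sweep.fit(X, y, sensitive_features=group)  # retrains a set of mitigated candidate models
candidates = sweep.predictors_  # compare these on the accuracy/disparity trade-off
```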

HOTSPOT - You are analyzing the asymmetry in a statistical distribution. The following image contains two density curves that show the probability distribution of two datasets. >>image<< (https://www.examtopics.com/exams/microsoft/dp-100/view/42/) Graph 1: Parabolic curve leaning to the right Graph 2: Parabolic curve leaning to the left Use the drop-down menus to select the answer choice that answers each question based on the information presented in the graphic. NOTE: Each correct selection is worth one point. Hot Area: - Which type of distribution is shown for the dataset density curve of Graph 1? - Which type of distribution is shown for the dataset density curve of Graph 2? Answers for both questions: Negative skew, Positive skew, Normal distribution, Bimodal distribution

Correct Answer: Positive skew, Negative skew Alternative Answer: (all comments) YES Negative skew, Positive Skew (A left-skewed distribution has a long left tail. Left-skewed distributions are also called negatively-skewed distributions. That's because there is a long tail in the negative direction on the number line. The mean is also to the left of the peak. A right-skewed distribution has a long right tail. Right-skewed distributions are also called positive-skew distributions. That's because there is a long tail in the positive direction on the number line. The mean is also to the right of the peak. https://www.statisticshowto.com/probability-and-statistics/skewed-distribution/) Box 1: Positive skew - Positive skew values means the distribution is skewed to the right. Box 2: Negative skew - Negative skewness values mean the distribution is skewed to the left. Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/compute-elementary-statistics
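
A quick numeric check of the two skew directions on synthetic samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
right_tailed = rng.exponential(2.0, 10_000)  # long right tail
left_tailed = -right_tailed                  # mirrored: long left tail

print(stats.skew(right_tailed))  # > 0: positive skew
print(stats.skew(left_tailed))   # < 0: negative skew
```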

HOTSPOT - You train a classification model by using a decision tree algorithm. You create an estimator by running the following Python code. The variable feature_names is a list of all feature names, and class_names is a list of all class names. from interpret.ext.blackbox import TabularExplainer explainer = TabularExplainer(model, x_train, features=feature_names, classes=class_names) You need to explain the predictions made by the model for all classes by determining the importance of all features. For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE: Each correct selection is worth one point. Hot Area: YES or NO - the SHAP TreeExplainer will be used to interpret the model - If you omit the features and classes parameters in the TabularExplainer instantiation, the explainer still works as expected - You could interpret the model by using a MimicExplainer instead of a TabularExplainer

Correct Answer: Yes, Yes, No Alternate Answer: Yes, Yes, Yes YES (You can use one of the following interpretable models as your surrogate model: LightGBM (LGBMExplainableModel), Linear Regression (LinearExplainableModel), Stochastic Gradient Descent explainable model (SGDExplainableModel), and Decision Tree (DecisionTreeExplainableModel). So a MimicExplainer can also be used with Decision Tree. https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability) Box 1: Yes - TabularExplainer calls one of the three SHAP explainers underneath (TreeExplainer, DeepExplainer, or KernelExplainer). Box 2: Yes - To make your explanations and visualizations more informative, you can choose to pass in feature names and output class names if doing classification. Box 3: No - TabularExplainer automatically selects the most appropriate one for your use case, but you can call each of its three underlying explainers (TreeExplainer, DeepExplainer, or KernelExplainer) directly. Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability-aml

HOTSPOT - You write code to retrieve an experiment that is run from your Azure Machine Learning workspace. The run used the model interpretation support in Azure Machine Learning to generate and upload a model explanation. Business managers in your organization want to see the importance of the features in the model. You need to print out the model features and their relative importance in an output that looks similar to the following. Feature, Importance 0, 1.5627345610083558 2, 0.6077689312583112 4, 0.5574002432900718 3, 0.42858759955671777 1, 0.3501361539771977 How should you complete the code? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area: # Assume required modules are imported ws = Workspace.from_config() client = ExplanationClient.>>1<<(workspace=ws, experiment_name='train_and_explain', run_id='train_and_explain_12345') explanation = client.>>2<<() feature_importances = explanation.>>3<<() for key, value in feature_importances.items(): print(key, "\t", value) >>1<<: from_run; list_model_explanations; from_run_id; download_model_explanation >>2<<: upload_model_explanation; list_model_explanations; run; download_model_explanation >>3<<: explanation; explanation_client; get_feature_importance; download_model_explanation

Correct Answer: from_run_id; list_model_explanations; explanation Alternate Answer: YES (from_run_id; download_model_explanation; get_feature_importance) lots of discussion: https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/machine-learning/how-to-machine-learning-interpretability-automl.md https://docs.microsoft.com/en-us/learn/modules/explain-machine-learning-models-with-azure-machine-learning/4-create-explanations Box 1: from_run_id - from_run_id(workspace, experiment_name, run_id) creates the client with a factory method, given a run ID, and returns an instance of the ExplanationClient. Parameters: workspace (Workspace) - an object that represents a workspace; experiment_name (str) - the name of an experiment; run_id (str) - a GUID that represents a run. Box 2: list_model_explanations - list_model_explanations returns a dictionary of metadata for all available model explanations, such as id, data type, explanation method, model type, and upload time, sorted by upload time. Box 3: explanation Reference: https://docs.microsoft.com/en-us/python/api/azureml-contrib-interpret/azureml.contrib.interpret.explanation.explanation_client.explanationclient?view=azure-ml-py
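
A sketch of the alternate-answer sequence, assuming the newer azureml-interpret package (the older contrib package exposes the same client under azureml.contrib.interpret) and an existing run named as in the question:

```python
from azureml.core import Workspace
from azureml.interpret import ExplanationClient

ws = Workspace.from_config()
client = ExplanationClient.from_run_id(workspace=ws,
                                       experiment_name='train_and_explain',
                                       run_id='train_and_explain_12345')
explanation = client.download_model_explanation()  # fetch the uploaded explanation
feature_importances = explanation.get_feature_importance_dict()
for key, value in feature_importances.items():
    print(key, "\t", value)
```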

