DP100


Optimization Algorithms

1. Stochastic Gradient Descent 2. Adaptive Learning Rate (ADADELTA) 3. Adaptive Moment Estimation (Adam) 4. Various others

Tree-based Algorithms

Algorithms that build a decision tree to reach a prediction

Deploy a Predictive Service

After you've created and tested an inference pipeline for real-time inferencing, you can publish it as a service for client applications to use.
Note: In this exercise, you'll deploy the web service to an Azure Container Instance (ACI). This type of compute is created dynamically and is useful for development and testing. For production, you should create an inference cluster, which provides an Azure Kubernetes Service (AKS) cluster with better scalability and security.
Deploy a service
1. View the Predict Auto Price inference pipeline you created in the previous unit.
2. At the top right, select Deploy, and deploy a new real-time endpoint, using the following settings: Name: predict-auto-price; Description: Auto price regression; Compute type: Azure Container Instance.
3. Wait for the web service to be deployed; this can take several minutes. The deployment status is shown at the top left of the designer interface.
Test the service
Now you can test your deployed service from a client application. In this case, you'll use code in a notebook cell to simulate a client application.
1. On the Endpoints page, open the predict-auto-price real-time endpoint.
2. When the predict-auto-price endpoint opens, view the Consume tab and note the REST endpoint and the Primary Key for your service. You need these to connect to your deployed service from a client application.
3. Observe that you can use the ⧉ link next to these values to copy them to the clipboard.
4. With the Consume page for the predict-auto-price service open in your browser, open a new browser tab and open a second instance of Azure Machine Learning studio. Then in the new tab, view the Notebooks page (under Author).
5. In the Notebooks page, under My files, use the 🗋 button to create a new file with the following settings: File location: Users/your user name; File name: Test-Autos; File type: Notebook; Overwrite if already exists: Selected.
6. When the new notebook has been created, ensure that the compute instance you created previously is selected in the Compute box, and that it has a status of Running.
7. Use the ≪ button to collapse the file explorer pane and give you more room to focus on the Test-Autos.ipynb notebook tab.
8. In the rectangular cell that has been created in the notebook, paste the following code (only the final line of the listing is preserved here): print(json.loads(error.read().decode("utf8", 'ignore')))
Note: Don't worry too much about the details of the code. It just submits details of a car and uses the predict-auto-price service you created to get a predicted price.
9. Switch to the browser tab containing the Consume page for the predict-auto-price service, and copy the REST endpoint for your service. Then switch back to the tab containing the notebook and paste the endpoint into the code, replacing YOUR_ENDPOINT.
10. Switch to the browser tab containing the Consume page for the predict-auto-price service, and copy the Primary Key for your service. Then switch back to the tab containing the notebook and paste the key into the code, replacing YOUR_KEY.
11. Save the notebook. Then use the ▷ button next to the cell to run the code.
12. Verify that a predicted price is returned.

Ensemble Algorithms

Algorithms that combine the outputs of multiple base algorithms to improve generalizability.

F1-Score

The harmonic mean of precision and recall: 2 × (precision × recall) / (precision + recall). A single metric that takes both precision and recall into account.
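A minimal sketch of how precision, recall, and F1 relate, computed from confusion-matrix counts. The function name and the counts are made-up illustration values, not taken from any course dataset.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because the harmonic mean is pulled toward the smaller value, a model cannot reach a high F1 by maximizing only one of the two metrics.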

Support

How many instances of this class are there in the test dataset?

TensorFlow

Open-source ML library. TensorFlow is a free, open-source library with a Python API. It is mainly used for developing deep learning applications in machine learning (ML) and artificial intelligence (AI).

Pandas

Popular Python library for data analysis and manipulation. Think of it as Excel for Python.

Confusion Matrix

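A table of predicted vs. actual class labels, showing the counts of true positives, false positives, true negatives, and false negatives.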

Indexing rows and cols of a pandas dataframe using iloc, loc

df_students.iloc[0,[1,2]] returns StudyHours: 10.0, Grade: 50.0
OR df_students.loc[0,'Grade']
OR df_students.loc[df_students['Name']=='Aisha']
OR df_students[df_students['Name']=='Aisha']
OR df_students.query('Name=="Aisha"')
OR df_students[df_students.Name == 'Aisha']

Sum of missing values

df_students.isnull().sum()

Filter the dataframe to include only rows where any of the columns (axis 1 of the df) are null

df_students[df_students.isnull().any(axis=1)] ** missing values will show up as NaN (not a number)

Optimize Hyperparameters

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer, r2_score, mean_squared_error
import numpy as np
import matplotlib.pyplot as plt
# Use a Gradient Boosting algorithm
alg = GradientBoostingRegressor()
# Try these hyperparameter values
params = {
    'learning_rate': [0.1, 0.5, 1.0],
    'n_estimators' : [50, 100, 150]
}
# Find the best hyperparameter combination to optimize the R2 metric
score = make_scorer(r2_score)
gridsearch = GridSearchCV(alg, params, scoring=score, cv=3, return_train_score=True)
gridsearch.fit(X_train, y_train)
print("Best parameter combination:", gridsearch.best_params_, "\n")
# Get the best model
model = gridsearch.best_estimator_
print(model, "\n")
# Evaluate the model using the test data
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print("MSE:", mse)
rmse = np.sqrt(mse)
print("RMSE:", rmse)
r2 = r2_score(y_test, predictions)
print("R2:", r2)
# Plot predicted vs actual
plt.scatter(y_test, predictions)
plt.xlabel('Actual Labels')
plt.ylabel('Predicted Labels')
plt.title('Daily Bike Share Predictions')
# Overlay the regression line
z = np.polyfit(y_test, predictions, 1)
p = np.poly1d(z)
plt.plot(y_test, p(y_test), color='magenta')
plt.show()

Using MinMaxScaler

from sklearn.preprocessing import MinMaxScaler
# Get a scaler object
scaler = MinMaxScaler()
# Create a new dataframe for the scaled values
df_normalized = df_sample[['Name', 'Grade', 'StudyHours']].copy()
# Normalize the numeric columns
df_normalized[['Grade','StudyHours']] = scaler.fit_transform(df_normalized[['Grade','StudyHours']])
# Plot the normalized values
df_normalized.plot(x='Name', y=['Grade','StudyHours'], kind='bar', figsize=(8,5))

steps for SVM

from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
# Define preprocessing for numeric columns (scale them)
feature_columns = [0,1,2,3]
feature_transformer = Pipeline(steps=[
    ('scaler', StandardScaler())
])
# Create preprocessing steps
preprocessor = ColumnTransformer(
    transformers=[
        ('preprocess', feature_transformer, feature_columns)])
# Create training pipeline (the final step is an SVM classifier, not a regressor)
pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('classifier', SVC(probability=True))])
# Fit the pipeline to train a support vector machine classifier on the training set
multi_model = pipeline.fit(x_penguin_train, y_penguin_train)
print(multi_model)

NumPy

functionality comparable to MATLAB and R

array mean

grades.mean()

Creating pandas dataframe

import pandas as pd df_students = pd.DataFrame({'Name': ['Dan', 'Joann', 'Pedro', 'Rosie', 'Ethan', 'Vicky', 'Frederic', 'Jimmie', 'Rhonda', 'Giovanni', 'Francesca', 'Rajab', 'Naiyana', 'Kian', 'Jenny', 'Jakeem','Helena','Ismat','Anila','Skye','Daniel','Aisha'], 'StudyHours':student_data[0], 'Grade':student_data[1]}) df_students

One vs One (OVO) Multiclass Classification models

An approach in which a classifier is created for each possible pair of classes. A classification problem with four shape classes would require the following binary classifiers: square or circle; square or triangle; square or hexagon; circle or triangle; circle or hexagon; triangle or hexagon. In both approaches (One vs Rest and One vs One), the overall model must take into account all of these predictions to determine which single category the item belongs to.
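The pairing can be sketched with the standard library. This only enumerates which binary classifiers a One vs One scheme would need for the four shape classes named above; it does not train anything.

```python
from itertools import combinations

classes = ["square", "circle", "triangle", "hexagon"]

# One binary classifier is needed for every unordered pair of classes
pairs = list(combinations(classes, 2))
for first, second in pairs:
    print(f"{first} or {second}")

# For n classes the number of pairwise classifiers is n * (n - 1) / 2
n = len(classes)
expected_count = n * (n - 1) // 2
print("classifiers needed:", len(pairs))
```

The count grows quadratically with the number of classes, which is the main cost of the One vs One approach.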

variables we get from linear regression

m, b, r, p, se (slope, intercept, correlation coefficient, p-value, standard error)
from scipy import stats
m, b, r, p, se = stats.linregress(df_regression['StudyHours'], df_regression['Grade'])

List vs. Array

A Python list * 2 makes a list twice the length of the original (the elements are repeated). A NumPy array * 2 multiplies each element by 2.
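A short sketch of the difference, assuming NumPy is installed. The three grade values are made up for illustration.

```python
import numpy as np

data = [50, 50, 47]       # plain Python list
grades = np.array(data)   # NumPy array built from the same values

doubled_list = data * 2     # repetition: the list is concatenated with itself
doubled_array = grades * 2  # element-wise arithmetic: each value is multiplied

print(doubled_list)
print(doubled_array)
```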

ROC

Shows the curve of the true positive and false positive rates for different threshold values between 0 and 1. A perfect classifier would have a curve that goes straight up the left side and straight across the top. The diagonal line across the chart represents the probability of predicting correctly with a 50/50 random prediction, so you want the curve to be higher than that (or your model is no better than simply guessing). The area under the curve (AUC) is a value between 0 and 1 that quantifies the overall performance of the model. The closer to 1 this value is, the better the model. scikit-learn includes a function to calculate this metric (roc_auc_score).
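One way to build intuition for AUC: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way in plain Python. The helper name, labels, and scores are hypothetical illustration values, and this is not how scikit-learn implements the metric.

```python
def auc_by_pairs(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_by_pairs(y_true, scores)
print("AUC:", auc)
```

A model that ranks every positive above every negative scores 1.0, and random scoring averages about 0.5, matching the diagonal line on the chart.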

Clustering (unsupervised learning)

Unsupervised learning is used when there is no "ground truth" from which to train and validate label predictions. The most common form of unsupervised learning is clustering, which is conceptually similar to classification, except that the training data does not include known values for the class label to be predicted.

Create a density plot

var_data.plot.density()

Loading a DataFrame from a file

!wget https://raw.githubusercontent.com/MicrosoftDocs/mslearn-introduction-to-machine-learning/main/Data/ml-basics/grades.csv
df_students = pd.read_csv('grades.csv',delimiter=',',header='infer')
df_students.head()

Visual specs such as title, xlabels, ylabels, xticks (explicit figure size)

# Create a Figure
fig = plt.figure(figsize=(8,3))
# Create a bar plot of name vs grade
plt.bar(x=df_students.Name, height=df_students.Grade, color='orange')
# Customize the chart
plt.title('Student Grades')
plt.xlabel('Student')
plt.ylabel('Grade')
plt.grid(color='#95a5a6', linestyle='--', linewidth=2, axis='y', alpha=0.7)
plt.xticks(rotation=90)
# Show the figure
plt.show()

Visual specs such as title, xlabels, ylabels, xticks (implicit figure size)

# Create a bar plot of name vs grade
plt.bar(x=df_students.Name, height=df_students.Grade, color='orange')
# Customize the chart
plt.title('Student Grades')
plt.xlabel('Student')
plt.ylabel('Grade')
plt.grid(color='#95a5a6', linestyle='--', linewidth=2, axis='y', alpha=0.7)
plt.xticks(rotation=90)
# Display the plot
plt.show()

using regression coefficients for prediction

# Define a function based on our regression coefficients
def f(x):
    m = 6.3134
    b = -17.9164
    return m*x + b
study_time = 14
# Get f(x) for study time
prediction = f(study_time)
# Grade can't be less than 0 or more than 100
expected_grade = max(0, min(100, prediction))
# Print the estimated grade
print('Studying for {} hours per week may result in a grade of {:.0f}'.format(study_time, expected_grade))

compare means of multi-dimensional arrays

# Define an array of study hours
study_hours = [10.0,11.5,9.0,16.0,9.25,1.0,11.5,9.0,8.5,14.5,15.5,
               13.75,9.0,8.0,15.5,8.0,9.0,6.0,10.0,12.0,12.5,12.0]
# Create a 2D array (an array of arrays)
student_data = np.array([study_hours, grades])
avg_study = student_data[0].mean()
avg_grade = student_data[1].mean()

Visualizing Data with Matplotlib

# Ensure plots are displayed inline in the notebook
%matplotlib inline
from matplotlib import pyplot as plt
# Create a bar plot of name vs grade
plt.bar(x=df_students.Name, height=df_students.Grade)
# Display the plot
plt.show()

Indexing rows of a pandas dataframe using iloc

# Get data in the first five rows
Input: df_students.iloc[0:5]
Output: gives 5 rows (positions 0 to 4; the end position is excluded)

Filtering by mean

# Get students who studied for more than the mean number of hours
df_students[df_students.StudyHours > mean_study]

Get the mean of a dataframe

# Get the mean study hours using the column name as an index
mean_study = df_students['StudyHours'].mean()
# Get the mean grade using the column name as a property (just to make the point!)
mean_grade = df_students.Grade.mean()
# Print the mean study hours and mean grade
print('Average weekly study hours: {:.2f}\nAverage grade: {:.2f}'.format(mean_study, mean_grade))

Indexing rows of a pandas dataframe using loc

# Get the rows with index values from 0 to 5
Input: df_students.loc[0:5]
Output: gives 6 rows (labels 0 to 5; loc includes the end label)
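The difference between position-based and label-based slicing can be checked on a small frame. This sketch assumes pandas is installed and uses a made-up eight-row DataFrame with the default integer index.

```python
import pandas as pd

# Hypothetical data; the default RangeIndex gives labels 0..7
df = pd.DataFrame({
    "Name": list("ABCDEFGH"),
    "Grade": [50, 47, 97, 49, 3, 53, 42, 26],
})

by_position = df.iloc[0:5]  # positions 0-4, end excluded (like Python slicing)
by_label = df.loc[0:5]      # labels 0-5, end included

print(len(by_position), "rows from iloc")
print(len(by_label), "rows from loc")
```

The two only line up this neatly because the index labels happen to be consecutive integers starting at 0; with any other index, loc and iloc select by entirely different criteria.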

Box Plot

# Get the variable to examine
var = df_students['Grade']
# Create a Figure
fig = plt.figure(figsize=(10,4))
# Plot a box plot
plt.boxplot(var)
# Add titles and labels
plt.title('Data Distribution')
# Show the figure
fig.show()

Measures of central tendency in the graphs

# Get the variable to examine
var = df_students['Grade']
# Get statistics
min_val = var.min()
max_val = var.max()
mean_val = var.mean()
med_val = var.median()
mod_val = var.mode()[0]
print('Minimum:{:.2f}\nMean:{:.2f}\nMedian:{:.2f}\nMode:{:.2f}\nMaximum:{:.2f}\n'.format(min_val, mean_val, med_val, mod_val, max_val))
# Create a Figure
fig = plt.figure(figsize=(10,4))
# Plot a histogram
plt.hist(var)
# Add lines for the statistics
plt.axvline(x=min_val, color = 'gray', linestyle='dashed', linewidth = 2)
plt.axvline(x=mean_val, color = 'cyan', linestyle='dashed', linewidth = 2)
plt.axvline(x=med_val, color = 'red', linestyle='dashed', linewidth = 2)
plt.axvline(x=mod_val, color = 'yellow', linestyle='dashed', linewidth = 2)
plt.axvline(x=max_val, color = 'gray', linestyle='dashed', linewidth = 2)
# Add titles and labels
plt.title('Data Distribution')
plt.xlabel('Value')
plt.ylabel('Frequency')
# Show the figure
fig.show()

Plotting a histogram

# Get the variable to examine
var_data = df_students['Grade']
# Create a Figure
fig = plt.figure(figsize=(10,4))
# Plot a histogram
plt.hist(var_data)
# Add titles and labels
plt.title('Data Distribution')
plt.xlabel('Value')
plt.ylabel('Frequency')
# Show the figure
fig.show()

Optimize and Save Models

# Import modules we'll need for this notebook
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# Load the training dataset
!wget https://raw.githubusercontent.com/MicrosoftDocs/mslearn-introduction-to-machine-learning/main/Data/ml-basics/daily-bike-share.csv
bike_data = pd.read_csv('daily-bike-share.csv')
bike_data['day'] = pd.DatetimeIndex(bike_data['dteday']).day
numeric_features = ['temp', 'atemp', 'hum', 'windspeed']
categorical_features = ['season','mnth','holiday','weekday','workingday','weathersit', 'day']
bike_data[numeric_features + ['rentals']].describe()
print(bike_data.head())
# Separate features and labels
# After separating the dataset, we now have numpy arrays named X containing the features, and y containing the labels.
X, y = bike_data[['season','mnth', 'holiday','weekday','workingday','weathersit','temp', 'atemp', 'hum', 'windspeed']].values, bike_data['rentals'].values
# Split data 70%-30% into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
print('Training Set: %d rows\nTest Set: %d rows' % (X_train.shape[0], X_test.shape[0]))

Train and evaluate a classification model (Binary Classification)

# Train the model
from sklearn.linear_model import LogisticRegression
# Set regularization rate
reg = 0.01
# Train a logistic regression model on the training set
model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)
print(model)

Mean grade for students who studied more than the average time

# What was their mean grade?
df_students[df_students.StudyHours > mean_study].Grade.mean()

Accuracy

(TP + TN) / (TP + TN + FP + FN). Of all the predictions, how many are correct?
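A minimal sketch of the accuracy calculation from the four confusion-matrix counts. Note that the denominator is every prediction made: true positives, true negatives, false positives, and false negatives. The counts are hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical counts for 100 predictions
acc = accuracy(tp=40, tn=45, fp=5, fn=10)
print("Accuracy:", acc)
```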

RMSE (root mean squared error)

-The square root of the mean of the squared differences between predicted and actual values -A common error metric for regression models, expressed in the same units as the label; the lower the value, the better the model's predictions
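In a regression setting, RMSE is the square root of the mean squared difference between predicted and actual values. A minimal plain-Python sketch, with made-up numbers:

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical labels and predictions: errors are 1, 0, -2, 0
value = rmse([3, 5, 7, 9], [2, 5, 9, 9])
print("RMSE:", value)
```

Squaring before averaging means large errors weigh more heavily than small ones, and taking the root returns the result to the units of the label.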

Steps for Using Azure ML with Classification

1. Create a Machine Learning Workspace 2. Create Compute Resources 3. Explore Data 4. Create and Run a Training Pipeline 5. Evaluate a Classification Model 6. Create an Inference Pipeline 7. Deploy a predictive service

Create an Inference Pipeline

1. In Microsoft Azure Machine Learning studio, click the Designer page to view all of the pipelines you have created. Then open the Auto Price Training pipeline you created previously.
2. In the Create inference pipeline drop-down list, click Real-time inference pipeline. After a few seconds, a new version of your pipeline named Auto Price Training-real time inference will be opened. If the pipeline does not include Web Service Input and Web Service Output modules, go back to the Designer page and then re-open the Auto Price Training-real time inference pipeline.
3. Rename the new pipeline to Predict Auto Price, and then review the new pipeline. It contains a web service input for new data to be submitted, and a web service output to return results. Some of the transformations and training steps have been encapsulated in this pipeline so that the statistics from your training data will be used to normalize any new data values, and the trained model will be used to score the new data. You are going to make the following changes to the inference pipeline:
- Replace the Automobile price data (Raw) dataset with an Enter Data Manually module that does not include the label column (price).
- Modify the Select Columns in Dataset module to remove any reference to the (now absent) price column.
- Remove the Evaluate Model module.
- Insert an Execute Python Script module before the web service output to return only the predicted label.
Follow the remaining steps below as you modify the pipeline.
4. The inference pipeline assumes that new data will match the schema of the original training data, so the Automobile price data (Raw) dataset from the training pipeline is included. However, this input data includes the price label that the model predicts, which does not belong in new car data for which a price prediction has not yet been made. Delete this module and replace it with an Enter Data Manually module from the Data Input and Output section, containing CSV data with feature values, without labels, for three cars.
5. Connect the new Enter Data Manually module to the same dataset input of the Select Columns in Dataset module as the Web Service Input.
6. Now that you've changed the schema of the incoming data to exclude the price field, you need to remove any explicit uses of this field in the remaining modules. Select the Select Columns in Dataset module and then in the settings pane, edit the columns to remove the price field.
7. The inference pipeline includes the Evaluate Model module, which is not useful when predicting from new data, so delete this module.
8. The output from the Score Model module includes all of the input features as well as the predicted label. To modify the output to include only the prediction: Delete the connection between the Score Model module and the Web Service Output. Add an Execute Python Script module from the Python Language section, replacing all of the default Python script with the following code, which selects only the Scored Labels column and renames it to predicted_price (only the final line of the listing is preserved here): return scored_results. Connect the output from the Score Model module to the Dataset1 (left-most) input of the Execute Python Script module, and connect the output of the Execute Python Script module to the Web Service Output.
9. Verify that your pipeline reflects the changes described above.
10. Submit the pipeline as a new experiment named mslearn-auto-inference on your compute cluster. This may take a while!
11. When the pipeline has completed, select the Execute Python Script module, and in the settings pane, on the Output + logs tab, visualize the Result dataset to see the predicted prices for the three cars in the input data.
12. Close the visualization window. Your inference pipeline predicts prices for cars based on their features. Now you're ready to publish the pipeline so that client applications can use it.

Azure Machine Learning Experiments-Logging Metrics

1. log: record a single named value
2. log_list: record a named list of values
3. log_row: record a row with multiple columns
4. log_table: record a dictionary as a table
5. log_image: record an image file or a plot

Improve with hyperparameter tuning

1. Preprocessing data: Preprocessing refers to changes you make to your data before it is passed to the model. We have previously read that preprocessing can involve cleaning your dataset. While this is important, preprocessing can also include changing the format of your data so that it's easier for the model to use. For example, data described as 'red', 'orange', 'yellow', 'lime', and 'green' may work better if converted into a format more native to computers, such as numbers stating the amount of red and the amount of green.
2. Scaling features: The most common preprocessing step is to scale features so they fall between zero and one. For example, the weight of a bike and the distance a person travels on a bike may be two very different numbers, but scaling both to between zero and one allows models to learn more effectively from the data.
3. Using categories as features: In machine learning, you can also use categorical features such as 'bicycle', 'skateboard', or 'car'. These features are represented by 0 or 1 values in one-hot vectors: vectors that have a 0 or 1 for each possible value. For example, bicycle, skateboard, and car might respectively be (1,0,0), (0,1,0), and (0,0,1).
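The scaling step described above can be sketched in plain Python as min-max scaling. The bike weights and distances are made-up values chosen only to show two features on very different scales; in practice a library scaler such as scikit-learn's MinMaxScaler does the same job.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All values identical: there is no spread to scale
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical features on very different scales
bike_weights_kg = [8.0, 12.0, 16.0]
distances_km = [0.0, 250.0, 1000.0]

scaled_weights = min_max_scale(bike_weights_kg)
scaled_distances = min_max_scale(distances_km)
print(scaled_weights)
print(scaled_distances)
```

After scaling, both features span the same 0 to 1 range, so neither dominates simply because its raw numbers are larger.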

Steps to train a linear regression model

1. Separate features and labels
X, y = bike_data[['season','mnth', 'holiday','weekday','workingday','weathersit','temp', 'atemp', 'hum', 'windspeed']].values, bike_data['rentals'].values
print('Features:',X[:10], '\nLabels:', y[:10], sep='\n')
2. Split the dataset
from sklearn.model_selection import train_test_split
# Split data 70%-30% into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
print('Training Set: %d rows\nTest Set: %d rows' % (X_train.shape[0], X_test.shape[0]))
3. Train the model
from sklearn.linear_model import LinearRegression
# Fit a linear regression model on the training set
model = LinearRegression().fit(X_train, y_train)
print(model)
4. Evaluate the trained model
import numpy as np
predictions = model.predict(X_test)
np.set_printoptions(suppress=True)
print('Predicted labels: ', np.round(predictions)[:10])
print('Actual labels   : ', y_test[:10])
5. Calculate evaluation metrics (MSE, RMSE, R2)
from sklearn.metrics import mean_squared_error, r2_score
mse = mean_squared_error(y_test, predictions)
print("MSE:", mse)
rmse = np.sqrt(mse)
print("RMSE:", rmse)
r2 = r2_score(y_test, predictions)
print("R2:", r2)

Understanding Correlation statistic

1. Values above 0 indicate a positive correlation (high values of one variable coincide with high values of the other).
2. Values below 0 indicate a negative correlation (high values of one variable coincide with low values of the other).
Correlation is best represented by a scatter plot:
df_sample.plot.scatter(title='Study Time vs Grade', x='StudyHours', y='Grade')
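A small plain-Python sketch of the Pearson correlation coefficient, showing the sign for a rising and a falling relationship. The data are made up so that both relationships are perfectly linear.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


hours = [1, 2, 3, 4, 5]
rising = [10, 20, 30, 40, 50]   # grows with hours: positive correlation
falling = [50, 40, 30, 20, 10]  # shrinks as hours grow: negative correlation

r_pos = pearson_r(hours, rising)
r_neg = pearson_r(hours, falling)
print(round(r_pos, 3), round(r_neg, 3))
```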

Remember when working with real-world data to check for:

1. missing values and badly recorded data 2. consider removing obvious outliers 3. consider what real-world factors might affect your analysis and consider if your dataset is large enough to handle this 4. check for biased raw data and consider your options to fix this

Different types of regression (Tree-Based Algorithms)

Algorithms that build a decision tree to reach a prediction.
# Evaluate the model using the test data
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print("MSE:", mse)
rmse = np.sqrt(mse)
print("RMSE:", rmse)
r2 = r2_score(y_test, predictions)
print("R2:", r2)
# Plot predicted vs actual
plt.scatter(y_test, predictions)
plt.xlabel('Actual Labels')
plt.ylabel('Predicted Labels')
plt.title('Daily Bike Share Predictions')
# Overlay the regression line
z = np.polyfit(y_test, predictions, 1)
p = np.poly1d(z)
plt.plot(y_test, p(y_test), color='magenta')
plt.show()

Different types of regression (Ensemble Algorithms)

Algorithms that combine the outputs of multiple base algorithms to improve generalizability.
from sklearn.ensemble import RandomForestRegressor
# Train a random forest model
model = RandomForestRegressor().fit(X_train, y_train)
print(model, "\n")
# Evaluate the model using the test data
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print("MSE:", mse)
rmse = np.sqrt(mse)
print("RMSE:", rmse)
r2 = r2_score(y_test, predictions)
print("R2:", r2)
# Plot predicted vs actual
plt.scatter(y_test, predictions)
plt.xlabel('Actual Labels')
plt.ylabel('Predicted Labels')
plt.title('Daily Bike Share Predictions')
# Overlay the regression line
z = np.polyfit(y_test, predictions, 1)
p = np.poly1d(z)
plt.plot(y_test, p(y_test), color='magenta')
plt.show()
# Train the model
from sklearn.ensemble import GradientBoostingRegressor
# Fit a gradient boosting model on the training set
model = GradientBoostingRegressor().fit(X_train, y_train)
print(model, "\n")
# Evaluate the model using the test data
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print("MSE:", mse)
rmse = np.sqrt(mse)
print("RMSE:", rmse)
r2 = r2_score(y_test, predictions)
print("R2:", r2)
# Plot predicted vs actual
plt.scatter(y_test, predictions)
plt.xlabel('Actual Labels')
plt.ylabel('Predicted Labels')
plt.title('Daily Bike Share Predictions')
# Overlay the regression line
z = np.polyfit(y_test, predictions, 1)
p = np.poly1d(z)
plt.plot(y_test, p(y_test), color='magenta')
plt.show()

azure machine learning experiments

An experiment is a named process (running a script or pipeline that can generate metrics).
CODE:
from azureml.core import Experiment
# Create an experiment variable
experiment = Experiment(workspace = ws, name = "my-experiment")
# Start the experiment
run = experiment.start_logging()
# Experiment code goes here
# End the experiment
run.complete()

Computer Vision

Area of machine learning concerned with image recognition. Deep learning is well suited to these predictions because images are very large arrays of pixel values, which deep learning techniques can handle.

Create a dataset in Azure ML Workspace

Basic Info:
Name: bike-rentals
Dataset type: Tabular
Description: Bicycle rental data
Datastore and file selection:
Select or create a datastore: Currently selected datastore
Select files for your dataset: Browse to the daily-bike-share.csv file you downloaded.
Upload path: Leave the default selection
Skip data validation: Not selected
Settings and preview:
File format: Delimited
Delimiter: Comma
Encoding: UTF-8
Column headers: Only first file has headers
Skip rows: None

Support Vector Machine (SVM)

Classification Algorithms that define a hyperplane that separates classes.

Inference Clusters

Deployment targets for predictive services that use your trained models.

Compute Instances

Development workstations that data scientists can use to work with data and models.

Residual

Difference between the observed value and the estimated value (aka the sample mean)

Error

Difference between the observed value and the true value (aka from the population mean)

NumPy array

Ex. import numpy as np grades = np.array(data) print(grades)

Encoding categorical features

For example, by using a one hot encoding technique you can create individual binary (true/false) features for each possible category value.
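A minimal one-hot encoding sketch in plain Python. The category names are illustrative assumptions; libraries such as pandas and scikit-learn provide their own encoders for real work.

```python
def one_hot(value, categories):
    """Return a tuple with 1 in the position of `value` and 0 everywhere else."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return tuple(1 if value == category else 0 for category in categories)


# Hypothetical categorical feature with three possible values
categories = ["bicycle", "skateboard", "car"]
encoded = {category: one_hot(category, categories) for category in categories}

for name, vector in encoded.items():
    print(name, vector)
```

Each category gets its own binary feature, so the model is not misled into treating the categories as ordered numbers.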

Hierarchical Clustering

Hierarchical clustering methods make fewer distributional assumptions than K-means methods. However, K-means methods are generally more scalable, sometimes very much so.
Hierarchical clustering creates clusters by either a divisive method or an agglomerative method. The divisive method is a "top down" approach that starts with the entire dataset and then finds partitions in a stepwise manner. Agglomerative clustering is a "bottom up" approach. In this lab you will work with agglomerative clustering, which roughly works as follows:
1. The linkage distances between each of the data points are computed.
2. Points are clustered pairwise with their nearest neighbor.
3. Linkage distances between the clusters are computed.
4. Clusters are combined pairwise into larger clusters.
5. Steps 3 and 4 are repeated until all data points are in a single cluster.
The linkage function can be computed in a number of ways:
- Ward linkage measures the increase in variance for the clusters being linked.
- Average linkage uses the mean pairwise distance between the members of the two clusters.
- Complete or Maximal linkage uses the maximum distance between the members of the two clusters.
Several different distance metrics are used to compute linkage functions:
- Euclidean or l2 distance is the most widely used. This metric is the only choice for the Ward linkage method.
- Manhattan or l1 distance is robust to outliers and has other interesting properties.
- Cosine similarity is the dot product between the location vectors divided by the magnitudes of the vectors. Notice that this metric is a measure of similarity, whereas the other two metrics are measures of difference. Similarity can be quite useful when working with data such as images or text documents.
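The three metrics named above can be sketched in a few lines of plain Python. The two points are arbitrary illustration values; clustering libraries compute these internally.

```python
import math


def euclidean(a, b):
    """l2 distance: straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """l1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


point_a = (1.0, 0.0)
point_b = (4.0, 4.0)

d_euclid = euclidean(point_a, point_b)
d_manhattan = manhattan(point_a, point_b)
similarity = cosine_similarity(point_a, point_b)
print(d_euclid, d_manhattan, round(similarity, 4))
```

Note the direction of each measure: the two distances grow as points move apart, while cosine similarity grows as the vectors point in more similar directions.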

Deploy a model as a service

In Azure Machine Learning, you can deploy a service to Azure Container Instances (ACI) or to an Azure Kubernetes Service (AKS) cluster. For production scenarios, an AKS deployment is recommended, for which you must create an inference cluster compute target. In this exercise, you'll use an ACI service, which is a suitable deployment target for testing, and does not require you to create an inference cluster.
1. In Azure Machine Learning studio, on the Automated ML page, select the run for your automated machine learning experiment and view the Details tab.
2. Select the algorithm name for the best model. Then, on the Model tab, use the Deploy button to deploy the model with the following settings: Name: predict-rentals; Description: Predict cycle rentals; Compute type: Azure Container Instance; Enable authentication: Selected.
3. Wait for the deployment to start; this may take a few seconds. Then, in the Model summary section, observe the Deploy status for the predict-rentals service, which should be Running. Wait for this status to change to Successful. You may need to select ↻ Refresh periodically.
4. In Azure Machine Learning studio, view the Endpoints page and select the predict-rentals real-time endpoint. Then select the Consume tab and note the REST endpoint and the Primary Key for your service. You need this information to connect to your deployed service from a client application.
5. Note that you can use the ⧉ link next to these values to copy them to the clipboard.
Test the deployed service
Now that you've deployed a service, you can test it using some simple code.
1. With the Consume page for the predict-rentals service open in your browser, open a new browser tab and open a second instance of Azure Machine Learning studio. Then in the new tab, view the Notebooks page (under Author).
2. In the Notebooks page, under My files, use the "Create" button to create a new file with the following settings: File location: Users/your user name; File name: Test-Bikes; File type: Notebook; Overwrite if already exists: Selected.
3. When the new notebook has been created, ensure that the compute instance you created previously is selected in the Compute box, and that it has a status of Running.
4. Use the ≪ button to collapse the file explorer pane and give you more room to focus on the Test-Bikes.ipynb notebook tab.

Train and evaluate deep learning models

In each epoch, the full set of training data is passed forward through the network. There are four features for each observation, and four corresponding nodes in the input layer - so the features for each observation are passed as a vector of four values to that layer. However, for efficiency, the feature vectors are grouped into batches; so actually a matrix of multiple feature vectors is fed in each time. The matrix of feature values is processed by a function that performs a weighted sum using initialized weights and bias values. The result of this function is then processed by the activation function for the input layer to constrain the values passed to the nodes in the next layer. The weighted sum and activation functions are repeated in each layer. Note that the functions operate on vectors and matrices rather than individual scalar values. In other words, the forward pass is essentially a series of nested linear algebra functions. This is the reason data scientists prefer to use computers with graphical processing units (GPUs), since these are optimized for matrix and vector calculations. In the final layer of the network, the output vectors contain a calculated value for each possible class (in this case, classes 0, 1, and 2). This vector is processed by a loss function that converts these values to probabilities and determines how far they are from the expected values based on the actual classes - so for example, suppose the output for a Gentoo penguin (class 1) observation is [0.3, 0.4, 0.3]. The correct prediction would be [0.0, 1.0, 0.0], so the variance between the predicted and actual values (how far away each predicted value is from what it should be) is [0.3, 0.6, 0.3]. This variance is aggregated for each batch and maintained as a running aggregate to calculate the overall level of error (loss) incurred by the training data for the epoch. 
At the end of each epoch, the validation data is passed through the network, and its loss and accuracy (proportion of correct predictions based on the highest probability value in the output vector) are also calculated. It's important to do this because it enables us to compare the performance of the model using data on which it was not trained, helping us determine if it will generalize well for new data or if it's overfitted to the training data. After all the data has been passed forward through the network, the output of the loss function for the training data (but not the validation data) is passed to the optimizer. The precise details of how the optimizer processes the loss vary depending on the specific optimization algorithm being used; but fundamentally you can think of the entire network, from the input layer to the loss function, as being one big nested (composite) function. The optimizer applies some differential calculus to calculate partial derivatives for the function with respect to each weight and bias value that was used in the network. It's possible to do this efficiently for a nested function due to something called the chain rule, which enables you to determine the derivative of a composite function from the derivatives of its inner and outer functions. You don't really need to worry about the details of the math here (the optimizer does it for you), but the end result is that the partial derivatives tell us about the slope (or gradient) of the loss function with respect to each weight and bias value - in other words, we can determine whether to increase or decrease the weight and bias values in order to decrease the loss. Having determined in which direction to adjust the weights and biases, the optimizer uses the learning rate to determine by how much to adjust them; and then works backwards through the network in a process called backpropagation to assign new values to the weights and biases in each layer. 
Now the next epoch repeats the whole training, validation, and backpropagation process starting with the revised weights and biases from the previous epoch - which hopefully will result in a lower level of loss. The process continues like this for 50 epochs.
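The loss step described above (convert the final-layer values to probabilities, then measure how far they are from the actual class) can be sketched as follows. The card does not name the loss function, so softmax with cross-entropy is an assumption here, and the raw output values are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw output values to probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probabilities, true_class):
    """Loss for one observation: negative log of the probability given to the true class."""
    return -math.log(probabilities[true_class])

# Hypothetical raw outputs for one observation whose actual class is 1
raw_outputs = [1.0, 2.0, 0.5]
probs = softmax(raw_outputs)
loss = cross_entropy(probs, true_class=1)
print("Probabilities:", probs)
print("Loss:", loss)

# The probabilities from the penguin example in the text, with actual class 1
example_loss = cross_entropy([0.3, 0.4, 0.3], true_class=1)
print("Loss for [0.3, 0.4, 0.3]:", example_loss)
```

A confident correct prediction (probability near 1.0 for the true class) gives a loss near 0, while a low probability for the true class gives a large loss.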

Handling missing values

Input: df_students.isnull() Output: returns a dataframe indicating False where there is not a null value, and True where there is a null value

array shape

Input: grades.shape Output: (22,), meaning the array has a single dimension containing 22 elements

Scikit-Learn

Library used for predictive analytics

Attached Compute

Links to existing Azure compute resources, such as Virtual Machines or Azure Databricks clusters.

Learning Rate

Low learning rate: results in small adjustments, so more epochs are needed to reach the minimum. High learning rate: results in large adjustments, so fewer epochs are needed, but the optimizer could miss the minimum altogether.
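A small sketch of this trade-off, assuming a toy loss function loss(w) = w**2 (minimum at w = 0) that is not part of the course material. The function name and the three learning rates are hypothetical.

```python
def gradient_descent(learning_rate, epochs, start=10.0):
    """Minimise the toy loss w**2, whose derivative with respect to w is 2*w."""
    w = start
    for _ in range(epochs):
        gradient = 2 * w
        w = w - learning_rate * gradient  # step against the gradient
    return w

low = gradient_descent(learning_rate=0.01, epochs=20)       # small steps, still far from 0
moderate = gradient_descent(learning_rate=0.3, epochs=20)   # reaches the minimum quickly
too_high = gradient_descent(learning_rate=1.1, epochs=20)   # overshoots and moves away

print("low learning rate:     ", low)
print("moderate learning rate:", moderate)
print("too-high learning rate:", too_high)
```

With the same number of epochs, the low rate has not yet reached the minimum, and the too-high rate ends further from the minimum than where it started.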

Evaluate a Regression Model

Mean Absolute Error (MAE): The average absolute difference between predicted values and true values. This value is based on the same units as the label, in this case dollars. The lower this value is, the better the model is predicting.
Root Mean Squared Error (RMSE): The square root of the mean squared difference between predicted and true values. The result is a metric based on the same unit as the label (dollars). When compared to the MAE (above), a larger difference indicates greater variance in the individual errors (for example, with some errors being very small, while others are large).
Relative Squared Error (RSE): A relative metric between 0 and 1 based on the square of the differences between predicted and true values. The closer to 0 this metric is, the better the model is performing. Because this metric is relative, it can be used to compare models where the labels are in different units.
Relative Absolute Error (RAE): A relative metric between 0 and 1 based on the absolute differences between predicted and true values. The closer to 0 this metric is, the better the model is performing. Like RSE, this metric can be used to compare models where the labels are in different units.
Coefficient of Determination (R2): This metric is more commonly referred to as R-Squared, and summarizes how much of the variance between predicted and true values is explained by the model. The closer to 1 this value is, the better the model is performing.
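As a worked illustration, the sketch below computes MAE, MSE, RMSE and R2 by hand for a few made-up values. The function name and the data are hypothetical; in practice a library such as scikit-learn provides these metrics.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute common regression metrics from lists of true and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # squared error of the model
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # squared error of always predicting the mean
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Hypothetical true and predicted labels
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(name, round(value, 4))
```

Here the RMSE comes out slightly higher than the MAE because squaring gives the larger errors more weight.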

Different types of regression (Linear Algorithms)

Not just the Linear Regression algorithm we used above (which is technically an Ordinary Least Squares algorithm), but other variants such as Lasso and Ridge.
# Note: the snippet on this card trains a decision tree regressor, not a linear variant
# X_train and y_train are assumed to come from an earlier train/test split
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import export_text
# Train the model
model = DecisionTreeRegressor().fit(X_train, y_train)
print(model, "\n")
# Visualize the model tree
tree = export_text(model)
print(tree)

Precision

TP/(TP+FP): of all the cases predicted to be positive, how many are actually positive?

How do you know how many clusters to separate your data into...? --> One way to find out is through within cluster sum of squares (WCSS)

One way we can try to find out is to use a data sample to create a series of clustering models with an incrementing number of clusters, and measure how tightly the data points are grouped within each cluster. A metric often used to measure this tightness is the within cluster sum of squares (WCSS), with lower values meaning that the data points are closer. You can then plot the WCSS for each model. The point where the curve bends (the "elbow") is a good indicator of the optimal number of clusters.
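To make the metric concrete, here is a minimal sketch of how WCSS could be computed for a given clustering. The function name and the sample points are hypothetical. scikit-learn's KMeans reports the same quantity as its `inertia_` attribute.

```python
def wcss(clusters):
    """Sum of squared distances from each point to the centroid of its own cluster."""
    total = 0.0
    for points in clusters:
        dims = len(points[0])
        centroid = [sum(p[d] for p in points) / len(points) for d in range(dims)]
        total += sum((p[d] - centroid[d]) ** 2 for p in points for d in range(dims))
    return total

# Hypothetical data: the same four points grouped into two clusters, then into one
two_clusters = [[(1.0, 1.0), (1.0, 2.0)], [(8.0, 8.0), (9.0, 8.0)]]
one_cluster = [[(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]]

print("WCSS with 2 clusters:", wcss(two_clusters))
print("WCSS with 1 cluster: ", wcss(one_cluster))
```

The sharp drop in WCSS when moving from one cluster to two is the kind of bend the elbow plot is meant to reveal.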

PyTorch

A free, open-source ML framework, based on the Torch library, that accelerates the path from research prototyping to production deployment. It offers flexibility and speed for deep neural network implementation.

Batch

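A group of training feature vectors that is passed through the network together as a matrix, rather than one observation at a time, for efficiency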

Compute Clusters

Scalable clusters of virtual machines for on-demand processing of experiment code.

Run an automated machine learning experiment

Select dataset: Dataset: bike-rentals Configure run: New experiment name: mslearn-bike-rental Target column: rentals (this is the label the model will be trained to predict) Training compute target: the compute cluster you created previously Task type and settings: Task type: Regression (the model will predict a numeric value) Additional configuration settings: Primary metric: Select Normalized root mean square error (more about this metric later!) Explain best model: Selected - this option causes automated machine learning to calculate feature importance for the best model; making it possible to determine the influence of each feature on the predicted label. Blocked algorithms: Block all other than RandomForest and LightGBM - normally you'd want to try as many as possible, but doing so can take a long time! Exit criterion: Training job time (hours): 0.25 - this causes the experiment to end after a maximum of 15 minutes. Metric score threshold: 0.08 - this causes the experiment to end if a model achieves a normalized root mean square error metric score of 0.08 or less. Featurization settings: Enable featurization: Selected - this causes Azure Machine Learning to automatically preprocess the features before training. 3. When you finish submi

MSE

The average of the squared differences between the predicted and actual values. The closer it is to 0, the better the model fits.

Recall

TP/(TP+FN): of all the actual positive cases, how many does the model identify?
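Recall and precision (TP/(TP+FP)) can be computed directly from confusion-matrix counts. The counts below are hypothetical and only illustrate the arithmetic.

```python
def precision(tp, fp):
    """Of all cases predicted positive, the fraction that are actually positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of all actual positive cases, the fraction the model identified."""
    return tp / (tp + fn)

# Hypothetical counts: 40 true positives, 10 false positives, 20 false negatives
tp, fp, fn = 40, 10, 20

print("Precision:", precision(tp, fp))
print("Recall:   ", recall(tp, fn))
```

In this example the model is right about most of its positive predictions, but it misses a third of the actual positive cases.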

Calculating Loss

Take the correct output, compare it to the produced output, and calculate the absolute variance between them. We usually deal with multiple observations, so we aggregate the variance and calculate the mean to get a single, average loss value. Plotting weight against loss shows where the weighting of a feature reduces the loss. A positive derivative means that increasing the weight will increase the loss, so the weight should be decreased. A negative derivative means that increasing the weight will decrease the loss, so the weight should be increased.
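The relationship between the sign of the derivative and the direction of adjustment can be checked numerically. The sketch assumes a hypothetical one-weight model (prediction = weight * x) and made-up data. The slope is estimated with a finite difference rather than calculus.

```python
def mean_absolute_loss(weight, xs, ys):
    """Average absolute difference between the produced output (weight * x) and the correct output."""
    return sum(abs(weight * x - y) for x, y in zip(xs, ys)) / len(xs)

def slope(weight, xs, ys, step=1e-6):
    """Finite-difference estimate of the derivative of the loss with respect to the weight."""
    higher = mean_absolute_loss(weight + step, xs, ys)
    lower = mean_absolute_loss(weight - step, xs, ys)
    return (higher - lower) / (2 * step)

# Hypothetical data where the correct relationship is y = 2x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

for w in (1.0, 3.0):
    print("weight", w, "loss", mean_absolute_loss(w, xs, ys), "slope", slope(w, xs, ys))
```

At weight 1.0 the slope is negative, so increasing the weight reduces the loss. At weight 3.0 the slope is positive, so the weight needs to decrease.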

K-Means Clustering

The algorithm we used to create our test clusters is K-Means. This is a commonly used clustering algorithm that separates a dataset into K clusters of equal variance. The number of clusters, K, is user defined. The basic algorithm has the following steps: 1. A set of K centroids is randomly chosen. 2. Clusters are formed by assigning the data points to their closest centroid. 3. The mean of each cluster is computed and the centroid is moved to the mean. 4. Steps 2 and 3 are repeated until a stopping criterion is met. Typically, the algorithm terminates when each new iteration results in negligible movement of centroids and the clusters become static. When the clusters stop changing, the algorithm has converged, defining the locations of the clusters. Note that the random starting point for the centroids means that re-running the algorithm could result in slightly different clusters, so training usually involves multiple iterations, reinitializing the centroids each time, and the model with the best WCSS is selected. Let's try using K-Means on our seeds data with a K value of 3.
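The steps of the basic algorithm can be sketched in plain Python as below. This is a simplified illustration, not the scikit-learn implementation used in the lab: it runs a single initialization, and the function name, the seed, and the sample points are all hypothetical.

```python
import random

def k_means(points, k, iterations=100, seed=0):
    """Basic K-Means: returns the final centroids and the points assigned to each."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # randomly choose K starting centroids
    for _ in range(iterations):
        # Assign every point to its closest centroid (squared Euclidean distance)
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(old)
        # Stop when the centroids no longer move
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups of three points
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, k=2)
print("Centroids:", centroids)
print("Clusters: ", clusters)
```

A fuller version would repeat this with several random initializations and keep the result with the lowest WCSS, as the card describes.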

Epoch

One complete pass of the full set of training data through the network. Training a deep neural network consists of multiple epochs.

Scaling numeric features

This prevents features with large values from producing coefficients that disproportionately affect the predictions.
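One common way to do this is min-max scaling, which maps every feature onto the range 0 to 1. The card does not say which scaling method is meant, so this choice is an assumption, and the feature values are made up.

```python
def min_max_scale(values):
    """Rescale a list of numbers so the smallest becomes 0.0 and the largest 1.0."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

# Hypothetical features on very different scales
incomes = [20000.0, 50000.0, 80000.0]
ages = [20.0, 35.0, 50.0]

scaled_incomes = min_max_scale(incomes)
scaled_ages = min_max_scale(ages)
print("Scaled incomes:", scaled_incomes)
print("Scaled ages:   ", scaled_ages)
```

After scaling, both features span the same range, so neither dominates simply because its raw values are larger.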

Compute Instances Features:

Virtual Machine type: CPU Virtual Machine size: Standard_DS11_v2 Compute name: enter a unique name Enable SSH access: Unselected

Compute Cluster Features:

While the compute instance is being created, switch to the Compute Clusters tab, and add a new compute cluster with the following settings. You'll use this to train a machine learning model: Virtual Machine priority: Dedicated Virtual Machine type: CPU Virtual Machine size: Standard_DS11_v2 (Choose Select from all options to search for and select this machine size) Compute name: enter a unique name Minimum number of nodes: 0 Maximum number of nodes: 2 Idle seconds before scale down: 120 Enable SSH access: Unselected

R^2

The square of the correlation between x and y. The closer it is to 1, the better the model is predicting (that is, the more closely y follows x).

NumPy List

data = [50,50,47,97,49,3,53,42,26,74,82,62,37,15,70,27,36,35,48,52,63,64]
print(data)

Left skew data

data has a long tail to the left meaning the mean is being pulled to the left

Right skew data

data has a long tail to the right meaning the mean is being pulled to the right
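Comparing the mean with the median is a quick way to see the pull of a long tail in either direction. The two small datasets below are hypothetical.

```python
import statistics

# Hypothetical samples: one with a long tail to the right, one with a long tail to the left
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
left_skewed = [1, 17, 18, 18, 19, 19, 19, 20]

for name, data in (("right skew", right_skewed), ("left skew", left_skewed)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(name, "mean:", mean, "median:", median)
```

The median stays near the bulk of the data, while the mean moves toward the tail: above the median for right skew, below it for left skew.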

drop rows with null values

df_students = df_students.dropna(axis=0, how='any')

impute and fill missing values with the mean

df_students.StudyHours = df_students.StudyHours.fillna(df_students.StudyHours.mean())

Code to record the number of observations

from azureml.core import Experiment
import pandas as pd
# Create an Azure ML experiment in your workspace
# (ws is assumed to be an existing Workspace object)
experiment = Experiment(workspace=ws, name='my-experiment')
# Start logging data from the experiment
run = experiment.start_logging()
# Load the dataset and count the rows
data = pd.read_csv('data.csv')
row_count = len(data)
# Log the row count
run.log('observations', row_count)
# Complete the experiment
run.complete()

scipy package for linear regression

import matplotlib.pyplot as plt
from scipy import stats
# Copy the two columns used for the regression (df_sample is assumed to exist already)
df_regression = df_sample[['Grade', 'StudyHours']].copy()
# Get the regression slope and intercept
m, b, r, p, se = stats.linregress(df_regression['StudyHours'], df_regression['Grade'])
print('slope: {:.4f}\ny-intercept: {:.4f}'.format(m, b))
print('so...\n f(x) = {:.4f}x + {:.4f}'.format(m, b))
# Use the function (mx + b) to calculate f(x) for each x (StudyHours) value
df_regression['fx'] = (m * df_regression['StudyHours']) + b
# Calculate the error between f(x) and the actual y (Grade) value
df_regression['error'] = df_regression['fx'] - df_regression['Grade']
# Create a scatter plot of Grade vs StudyHours
df_regression.plot.scatter(x='StudyHours', y='Grade')
# Plot the regression line
plt.plot(df_regression['StudyHours'], df_regression['fx'], color='cyan')
# Display the plot
plt.show()

Agglomerative Clustering

from sklearn.cluster import AgglomerativeClustering
agg_model = AgglomerativeClustering(n_clusters=3)
agg_clusters = agg_model.fit_predict(features.values)
agg_clusters

Calculate the ROC

from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt
%matplotlib inline
# y_test and y_scores are assumed to come from a trained classifier (predict_proba output)
# Calculate the ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_scores[:,1])
# Plot the ROC curve
fig = plt.figure(figsize=(6, 6))
# Plot the diagonal 50% line
plt.plot([0, 1], [0, 1], 'k--')
# Plot the FPR and TPR achieved by our model
plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.show()

RMSE

Tells us the typical distance between the values predicted by the model and the actual values in the dataset. For example, if RMSE = 3, the predictions are wrong by about 3 rentals on average.

Get metrics from Azure ML

import json
# Get logged metrics
metrics = run.get_metrics()
print(json.dumps(metrics, indent=2))

Classification Metrics

import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load the training dataset
!wget https://raw.githubusercontent.com/MicrosoftDocs/mslearn-introduction-to-machine-learning/main/Data/ml-basics/diabetes.csv
diabetes = pd.read_csv('diabetes.csv')
# Separate features and labels
features = ['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']
label = 'Diabetic'
X, y = diabetes[features].values, diabetes[label].values
# Split data 70%-30% into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
print('Training cases: %d\nTest cases: %d' % (X_train.shape[0], X_test.shape[0]))
# Train the model
from sklearn.linear_model import LogisticRegression
# Set regularization rate
reg = 0.01
# Train a logistic regression model on the training set
model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)
predictions = model.predict(X_test)
print('Predicted labels: ', predictions)
print('Actual labels: ', y_test)
print('Accuracy: ', accuracy_score(y_test, predictions))
from sklearn.metrics import classification_report
print(classification_report(y_test, predictions))
from sklearn.metrics import precision_score, recall_score
print("Overall Precision:", precision_score(y_test, predictions))
print("Overall Recall:", recall_score(y_test, predictions))
from sklearn.metrics import confusion_matrix
# Print the confusion matrix
cm = confusion_matrix(y_test, predictions)
print(cm)
# Get the predicted class probabilities
y_scores = model.predict_proba(X_test)
print(y_scores)

One vs Rest (OVR) Multiclass Classification models

An approach in which a classifier is created for each possible class value, with a positive outcome for cases where the prediction is this class, and negative predictions for cases where the prediction is any other class. For example, a classification problem with four possible shape classes (square, circle, triangle, hexagon) would require four classifiers that predict: square or not; circle or not; triangle or not; hexagon or not.
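The way a multiclass label is split into one binary problem per class can be sketched as follows. The helper name and the sample labels are hypothetical, and the sketch covers only the label preparation, not the training of the classifiers themselves.

```python
def one_vs_rest_labels(labels):
    """Build one binary label list per class: 1 for 'this class', 0 for 'any other class'."""
    classes = sorted(set(labels))
    return {c: [1 if label == c else 0 for label in labels] for c in classes}

# Hypothetical observations using the four shape classes from the example
shapes = ["square", "circle", "triangle", "hexagon", "circle"]
binary_targets = one_vs_rest_labels(shapes)

for shape_class, target in binary_targets.items():
    print(shape_class, "or not:", target)
```

Each of the four binary label lists would be used to train its own classifier. At prediction time, the class whose classifier is most confident is usually chosen.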

Concatenating data in a new column

passes = pd.Series(df_students['Grade'] >= 60)
df_students = pd.concat([df_students, passes.rename("Pass")], axis=1)
df_students

