Cosc 3337 Week 8
Regression Metrics
1. Mean Absolute Error
2. Mean Squared Error
3. R^2
Classification Metrics
1. Classification Accuracy
2. Logarithmic Loss
3. Area Under ROC Curve
4. Confusion Matrix
5. Classification Report
CROSS VALIDATION - THE IDEAL PROCEDURE
1. Divide the data into three sets: training, validation, and test.
2. Find the optimal model on the training set, using the validation set to check its predictive capability.
3. See how well the chosen model can predict the test set.
4. The test error then gives an unbiased estimate of the predictive power of the model.
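A minimal sketch of this three-way split using scikit-learn's train_test_split; the 60/20/20 proportions and the synthetic data are illustrative assumptions, not part of the slides:

# Sketch: divide data into training / validation / test sets (60/20/20 split here).
# make_regression only produces illustrative data; the proportions are assumptions.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)

# First carve off 40% of the data, then split that 40% in half.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Fit candidate models on the training set, choose among them on the validation set,
# and report performance on the test set only once, at the very end.
print(len(X_train), len(X_val), len(X_test))  # 600 200 200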
Best Practice for Reporting Model Fit
1. Use cross-validation to find the best model.
2. Report the RMSE (root mean squared error) and MAPE (mean absolute percent error) statistics from the cross-validation procedure.
3. Report the R^2 from the model as you normally would.
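A hedged sketch of this reporting with scikit-learn's cross_validate; the model, the synthetic data, and the built-in MAPE scorer (available in scikit-learn 0.24+) are assumptions for illustration:

# Sketch: report cross-validated RMSE and MAPE, plus the model's R^2 as usual.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

X, y = make_regression(n_samples=500, n_features=5, noise=15.0, random_state=0)
y = y - y.min() + 1.0   # shift targets away from zero so MAPE is meaningful

model = LinearRegression()
scores = cross_validate(
    model, X, y, cv=5,
    scoring=["neg_root_mean_squared_error", "neg_mean_absolute_percentage_error"],
)
print("CV RMSE:", -scores["test_neg_root_mean_squared_error"].mean())
print("CV MAPE:", -scores["test_neg_mean_absolute_percentage_error"].mean())

# R^2 reported from the model fit, as you normally would.
print("R^2:", model.fit(X, y).score(X, y))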
4. Confusion Matrix
A confusion matrix is an N x N matrix, where N is the number of classes being predicted. Here are a few definitions you need to remember for a confusion matrix:
•Accuracy: the proportion of the total number of predictions that were correct.
•Positive Predictive Value or Precision: the proportion of predicted positive cases that were correctly identified.
•Negative Predictive Value: the proportion of predicted negative cases that were correctly identified.
•Sensitivity or Recall: the proportion of actual positive cases which are correctly identified.
•Specificity: the proportion of actual negative cases which are correctly identified.
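The sketch below shows how these quantities can be derived from a scikit-learn confusion matrix; the label vectors are made up purely for illustration:

# Sketch: derive the confusion-matrix metrics defined above for a binary problem.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy    = (tp + tn) / (tp + tn + fp + fn)   # proportion of all predictions that were correct
precision   = tp / (tp + fp)                    # positive predictive value
npv         = tn / (tn + fn)                    # negative predictive value
sensitivity = tp / (tp + fn)                    # recall / true positive rate
specificity = tn / (tn + fp)                    # true negative rate

print(accuracy, precision, npv, sensitivity, specificity)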
3. Area Under ROC Curve (Receiver Operating Characteristic)
Area under the ROC Curve (or AUC for short) is a performance metric for binary classification problems. The AUC represents a model's ability to discriminate between positive and negative classes. An area of 1.0 represents a model that made all predictions perfectly; an area of 0.5 represents a model no better than random. ROC can be broken down into sensitivity and specificity, and a binary classification problem is really a trade-off between the two. Sensitivity is the true positive rate, also called the recall: the proportion of instances from the positive (first) class that were predicted correctly. Specificity is the true negative rate: the proportion of instances from the negative (second) class that were predicted correctly.
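A small illustrative example of computing AUC from predicted probabilities with scikit-learn; the toy scores are assumptions, not real model output:

# Sketch: AUC from predicted probabilities of the positive class.
from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]   # P(class = 1) from some model

auc = roc_auc_score(y_true, y_score)
print("AUC:", auc)   # 1.0 = perfect discrimination, 0.5 = no better than random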
Classification Accuracy
Classification accuracy is the number of correct predictions made as a ratio of all predictions made.
Gain and Lift charts
Lift is a measure of the effectiveness of a predictive model, calculated as the ratio between the results obtained with and without the predictive model. Cumulative gains and lift charts are visual aids for measuring model performance. Both charts consist of a lift curve and a baseline. The greater the area between the lift curve and the baseline, the better the model.
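A rough NumPy sketch of the numbers behind cumulative gains and lift charts, computed by decile; the simulated scores and labels are assumptions, not output from a real model:

# Sketch: cumulative gains and lift by decile.
import numpy as np

rng = np.random.default_rng(0)
y_score = rng.random(1000)                          # stand-in model scores
y_true = (rng.random(1000) < y_score).astype(int)   # labels loosely correlated with the scores

order = np.argsort(-y_score)          # sort cases from highest to lowest score
y_sorted = y_true[order]

deciles = np.array_split(y_sorted, 10)
total_positives = y_true.sum()
overall_rate = y_true.mean()

cum_positives = 0
for i, d in enumerate(deciles, start=1):
    cum_positives += d.sum()
    gain = cum_positives / total_positives   # cumulative share of positives captured so far
    lift = d.mean() / overall_rate           # decile response rate vs. the baseline rate
    print(f"decile {i}: cumulative gain = {gain:.2f}, lift = {lift:.2f}")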
2. Logarithmic Loss
Logarithmic loss (or log loss) is a performance metric for evaluating predicted probabilities of membership in a given class. Log loss is a measure of accuracy that incorporates the idea of probabilistic confidence; for the binary case it is given by

LogLoss = -(1/N) * Σ_i [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ]

where y_i is the actual label and p_i is the predicted probability of the positive class. It takes into account the uncertainty of your prediction based on how much it varies from the actual label. As a reference point, suppose you predicted 0.5 for all the observations; the log loss would then be -log(0.5) ≈ 0.69. Hence, we can say that anything above about 0.6 is a very poor model considering the actual probabilities.
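A short scikit-learn sketch illustrating log loss, including the predict-0.5-everywhere baseline mentioned above; the labels and probabilities are made up:

# Sketch: log loss for binary predictions, with the uninformative 0.5 baseline.
from sklearn.metrics import log_loss

y_true = [0, 1, 1, 0, 1, 1, 0, 1]

confident = [0.1, 0.9, 0.8, 0.2, 0.95, 0.85, 0.15, 0.9]   # well-calibrated, confident model
uninformative = [0.5] * len(y_true)                        # carries no information at all

print("confident model :", log_loss(y_true, confident))
print("predicting 0.5  :", log_loss(y_true, uninformative))   # -log(0.5) ≈ 0.693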
•What are the consequences of overfitting?
Overfitted models will have high R^2 values, but will perform poorly in predicting out-of-sample cases.
Root Mean Squared Error (RMSE)
RMSE is the most popular evaluation metric used in regression problems. It assumes that errors are unbiased and follow a normal distribution.
•RMSE is highly affected by outlier values, so make sure you have removed outliers from your data set prior to using this metric.
•Compared to the mean absolute error, RMSE gives higher weight to large errors and punishes them more heavily.
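A small sketch comparing MAE and RMSE on made-up residuals, showing how a single large error dominates RMSE; the values are purely illustrative:

# Sketch: RMSE penalizes large errors much more than MAE, so one outlier dominates it.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
y_pred_clean = np.array([11.0, 11.0, 12.0, 12.0, 13.0])     # errors of 1 everywhere
y_pred_outlier = np.array([11.0, 11.0, 12.0, 12.0, 22.0])   # one error of 10

for name, pred in [("clean", y_pred_clean), ("with outlier", y_pred_outlier)]:
    mae = mean_absolute_error(y_true, pred)
    rmse = np.sqrt(mean_squared_error(y_true, pred))
    print(f"{name:12s}  MAE = {mae:.2f}   RMSE = {rmse:.2f}")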
TRAINING/TEST DATA SPLIT
Talked about splitting data into training/test sets:
• training data is used to fit parameters
• test data is used to assess how the classifier generalizes to new data
What if the classifier has "non-tunable" parameters?
• a parameter is "non-tunable" if tuning (or training) it on the training data leads to overfitting
Which kind of cross-validation?
Test set
• Disadvantage: may give an unreliable estimate of future performance
• Advantage: cheap
Leave-one-out
• Disadvantage: expensive
• Advantage: doesn't waste data
A sketch contrasting the two follows below.
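A sketch contrasting the two options with scikit-learn; the iris data and logistic regression model are illustrative choices only:

# Sketch: a single held-out test set is cheap but noisy; leave-one-out is expensive
# (one model fit per observation) but uses every row for testing.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, LeaveOneOut, train_test_split

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Single test-set estimate: the score depends heavily on which rows land in the split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
print("hold-out accuracy :", model.fit(X_tr, y_tr).score(X_te, y_te))

# Leave-one-out: 150 fits here, but no data is wasted.
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print("LOO accuracy      :", loo_scores.mean())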
1. Mean Absolute Error
The Mean Absolute Error (or MAE) is the sum of the absolute differences between predictions and actual values. It gives an idea of how wrong the predictions were.
2. Mean Squared Error
The Mean Squared Error (or MSE) is much like the mean absolute error in that it provides a gross idea of the magnitude of error.
ROC curve
The ROC curve is the plot of sensitivity against (1 - specificity). (1 - specificity) is also known as the false positive rate, and sensitivity is also known as the true positive rate.
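A sketch showing that scikit-learn's roc_curve returns exactly these (1 - specificity, sensitivity) pairs, one per decision threshold; the toy scores reuse the AUC example above:

# Sketch: the points of the ROC curve are (false positive rate, true positive rate) pairs.
from sklearn.metrics import roc_curve

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold {th:.2f}:  FPR (1 - specificity) = {f:.2f},  TPR (sensitivity) = {t:.2f}")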
3. R^2 Metric
The R^2 (or R Squared) metric, also known as the coefficient of determination, provides an indication of the goodness of fit of a set of predictions to the actual values.
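A minimal sketch computing all three regression metrics with scikit-learn; the fitted model and synthetic data are illustrative assumptions:

# Sketch: MAE, MSE, and R^2 for an illustrative linear model.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=6, noise=20.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

y_pred = LinearRegression().fit(X_tr, y_tr).predict(X_te)

print("MAE :", mean_absolute_error(y_te, y_pred))
print("MSE :", mean_squared_error(y_te, y_pred))
print("R^2 :", r2_score(y_te, y_pred))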
Gini Coefficient
•The Gini coefficient is sometimes used in classification problems. It can be derived directly from the AUC-ROC number.
•AUC itself is the ratio of the area under the ROC curve to the total area.
•Gini is the ratio between the area enclosed by the ROC curve and the diagonal line and the area of the upper triangle. The formula used is:
•Gini = 2*AUC - 1
•A Gini above 60% indicates a good model.
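A tiny sketch deriving Gini from AUC, reusing the toy scores from the AUC example above:

# Sketch: Gini derived directly from AUC via Gini = 2*AUC - 1.
from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

auc = roc_auc_score(y_true, y_score)
gini = 2 * auc - 1
print(f"AUC = {auc:.3f}, Gini = {gini:.3f}")   # a Gini above ~0.6 is considered a good model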
CROSS-VALIDATION
•In cross-validation the original sample is split into two parts. One part is called the training (or derivation) sample, and the other part is called the validation (or validation + testing) sample.
1. What portion of the sample should be in each part? If the sample size is very large, it is often best to split the sample in half. For smaller samples, it is more conventional to split the sample such that 2/3 of the observations are in the derivation sample and 1/3 are in the validation sample.
2. How should the sample be split? The most common approach is to divide the sample randomly, thus theoretically eliminating any systematic differences. One alternative is to define matched pairs of subjects in the original sample and to assign one member of each pair to the derivation sample and the other to the validation sample.
•Modeling of the data uses one part only. The model selected for this part is then used to predict the values in the other part of the data. A valid model should show good predictive accuracy.
•One thing that R-squared offers no protection against is overfitting. On the other hand, cross-validation, by allowing us to have cases in our testing set that are different from the cases in our training set, inherently offers protection against overfitting.
Why is cross-validation needed?
•One way to address this issue is to literally obtain a new sample of observations. That is, after the MLR equation is developed from the original sample, the investigator conducts a new study, replicating the original one as closely as possible, and uses the new data to assess the predictive validity of the MLR equation.
•This procedure is usually viewed as impractical because of the requirement to conduct a new study to obtain validation data, as well as the difficulty in truly replicating the original study.
•An alternative, more practical procedure is cross-validation.
WHY DO WE NEED CROSS-VALIDATION?
•R^2, also known as the coefficient of determination, is a popular measure of quality of fit in regression. However, it does not offer any significant insight into how well our regression model can predict future values.
•When an MLR equation is to be used for prediction purposes, it is useful to obtain empirical evidence as to its generalizability, or its capacity to make accurate predictions for new samples of data. This process is sometimes referred to as "validating" the regression equation.
How to check if a model fit is good?
•The R^2 statistic has become the almost universally standard measure of model fit in linear models.
•What is R^2?
•R^2 = 1 - SSE/SST
•It is based on the ratio of the error in the model (SSE) to the total variance in the dependent variable (SST).
•Hence the lower the error, the higher the R^2 value.
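A brief sketch computing R^2 by hand as 1 - SSE/SST and checking it against scikit-learn's r2_score; the actual and predicted values are illustrative:

# Sketch: R^2 computed manually and via scikit-learn.
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([2.8, 5.3, 7.1, 8.6, 11.4])

sse = np.sum((y_true - y_pred) ** 2)          # error left in the model
sst = np.sum((y_true - y_true.mean()) ** 2)   # total variance in the dependent variable
r2_manual = 1 - sse / sst

print(r2_manual, r2_score(y_true, y_pred))    # the two values agree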