Model Evaluation & Metrics (Week 8)

sum of the weights

A regularizer built from the sum of the (absolute) weights, i.e. an L1 penalty, penalizes small values relatively more than a squared penalty does

What if a classifier has "non-tunable" parameters?

A parameter is "non-tunable" if tuning (or training) it on the training data leads to overfitting

Recall μ (micro-averaged)

The effectiveness of a classifier at identifying class labels, calculated from sums of per-text decisions

sum of the squared weights

A regularizer built from the sum of the squared weights, i.e. an L2 penalty, penalizes large values more
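
A minimal sketch in Python (NumPy) contrasting the two penalties; the weight vector `w` is an invented example, not from the course:

```python
import numpy as np

def l1_penalty(w):
    # L1 regularizer: sum of absolute weights
    return np.sum(np.abs(w))

def l2_penalty(w):
    # L2 regularizer: sum of squared weights
    return np.sum(w ** 2)

w = np.array([0.1, 0.1, 3.0])
# Small weights (0.1): L1 charges 0.1 each, L2 only 0.01 each,
# so the L1 penalty weighs small values relatively more.
# Large weight (3.0): L1 charges 3.0, L2 charges 9.0,
# so squared weights punish large values more.
print(l1_penalty(w), l2_penalty(w))  # 3.2  9.02
```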

There are two basic methods to overcome overfitting ____

1. Reduce the model complexity (e.g., PCA)
2. Regularization

Regularizer

- A regularizer is an additional criterion added to the loss function to avoid overfitting

Root Mean Squared Error (RMSE)

- RMSE is the most popular evaluation metric used in regression problems. It assumes that errors are unbiased and follow a normal distribution.
- RMSE is highly affected by outlier values. Hence, make sure you've removed outliers from your data set prior to using this metric.
- Compared to mean absolute error, RMSE gives higher weight to, and thus punishes, large errors.
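
A small illustration of the last point, assuming a single outlying prediction (the arrays are invented):

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([2.0, 3.0, 4.0, 5.0])
y_pred = np.array([2.1, 3.1, 4.1, 15.0])  # last prediction is an outlier

print(mae(y_true, y_pred))   # 2.575 -- the outlier contributes linearly
print(rmse(y_true, y_pred))  # ~5.00 -- squaring inflates the outlier's effect
```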

Bias and Variance Tradeoff

- Complex models (many parameters) usually have lower bias, but higher variance.
- Simple models (few parameters) have higher bias, but lower variance.

When do we use regularization?

- When there is multicollinearity; first find out whether the independent variables are highly correlated

Gain and Lift charts

- Lift is a measure of the effectiveness of a predictive model, calculated as the ratio between the results obtained with and without the predictive model.
- The greater the area between the lift curve and the baseline, the better the model.

A binary classification problem is really a trade-off between sensitivity and specificity.

- Sensitivity is the true positive rate, also called recall. It is the number of instances from the positive (first) class that were actually predicted correctly.
- Specificity is also called the true negative rate. It is the number of instances from the negative (second) class that were actually predicted correctly.

Shrinkage

- This shrinkage, a.k.a. regularization, has the effect of reducing variance.
- Depending on what type of shrinkage is performed, some of the coefficients may be estimated to be exactly zero.
- This approach fits a model involving all p predictors; however, the estimated coefficients are shrunken towards zero relative to the least squares estimates.

K-FOLD CROSS VALIDATION

1. Split the sample into k subsets of equal size.
2. For each fold, estimate a model on all the subsets except one.
3. Use the left-out subset to test the model, calculating a CV metric of choice.
4. Average the CV metric across subsets to get the CV error.
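
One way the four steps might look in plain NumPy; the `fit` and `metric` callables are placeholders for any model and any CV metric of choice, and the example data are invented:

```python
import numpy as np

def k_fold_cv_error(X, y, k, fit, metric):
    """fit(X_tr, y_tr) -> model; metric(model, X_te, y_te) -> scalar."""
    idx = np.random.permutation(len(y))
    folds = np.array_split(idx, k)                      # 1. k subsets of (roughly) equal size
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])                 # 2. estimate on all subsets but one
        scores.append(metric(model, X[test], y[test]))  # 3. CV metric on the left-out subset
    return np.mean(scores)                              # 4. average across subsets

# Invented example: least-squares fit scored by test MSE.
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
metric = lambda w, X, y: np.mean((X @ w - y) ** 2)
X = np.random.randn(100, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * np.random.randn(100)
print(k_fold_cv_error(X, y, k=5, fit=fit, metric=metric))
```

With k equal to the sample size, this reduces to leave-one-out cross-validation.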

Regression Metrics

1. Mean Absolute Error
2. Mean Squared Error
3. R²

Classification Metrics

1. Classification Accuracy
2. Logarithmic Loss
3. Area Under ROC Curve
4. Confusion Matrix
5. Classification Report

Confusion Matrix

A confusion matrix is an N × N matrix, where N is the number of classes being predicted.
- Accuracy: the proportion of the total number of predictions that were correct.
- Positive Predictive Value or Precision: the proportion of predicted positive cases that were correctly identified.
- Negative Predictive Value: the proportion of predicted negative cases that were correctly identified.
- Sensitivity or Recall: the proportion of actual positive cases which are correctly identified.
- Specificity: the proportion of actual negative cases which are correctly identified.
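
A sketch of these five quantities for the binary (2 × 2) case; `binary_confusion_stats` is a hypothetical helper and, for brevity, omits guards against zero denominators:

```python
import numpy as np

def binary_confusion_stats(y_true, y_pred):
    # Counts for a 2x2 confusion matrix (positive class = 1)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),  # correct / all predictions
        "precision":   tp / (tp + fp),   # positive predictive value
        "npv":         tn / (tn + fn),   # negative predictive value
        "recall":      tp / (tp + fn),   # sensitivity, true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

y_true = np.array([1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0])
print(binary_confusion_stats(y_true, y_pred))
```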

What if adding predictors is not actually improving the model's fit

Use adjusted R-squared (the proportion of total variance explained by the model, adjusted for the number of predictors)
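
A minimal sketch using the standard adjustment formula (a general fact, not from the cards above): adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where p is the number of predictors:

```python
import numpy as np

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)       # unexplained error
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # total variance around the mean
    return 1 - ss_res / ss_tot

def adjusted_r_squared(y, y_hat, p):
    # A useless extra predictor raises p but barely lowers ss_res,
    # so adjusted R^2 falls even though plain R^2 creeps up.
    n = len(y)
    return 1 - (1 - r_squared(y, y_hat)) * (n - 1) / (n - p - 1)
```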

Precision μ (micro-averaged)

Agreement of the data class labels with those of a classifier, calculated from sums of per-text decisions

Precision M (macro-averaged)

The average per-class agreement of the data class labels with those of a classifier

Recall M (macro-averaged)

The average per-class effectiveness of a classifier at identifying class labels
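
A sketch of the micro (μ) versus macro (M) distinction, assuming the per-text decisions are summarized in a confusion matrix `cm`; the matrix values are invented:

```python
import numpy as np

def micro_macro(cm):
    """cm[i, j] = number of texts with true class i predicted as class j."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp   # predicted as class j but wrong
    fn = cm.sum(axis=1) - tp   # true class i but missed

    # mu (micro): pool all per-text decisions, then compute the metric once
    precision_mu = tp.sum() / (tp.sum() + fp.sum())
    recall_mu    = tp.sum() / (tp.sum() + fn.sum())

    # M (macro): compute the metric per class, then average across classes
    precision_m = np.mean(tp / (tp + fp))
    recall_m    = np.mean(tp / (tp + fn))
    return precision_mu, recall_mu, precision_m, recall_m

cm = np.array([[50,  5,  5],
               [10, 20,  0],
               [ 5,  0,  5]])
print(micro_macro(cm))
```

Micro-averaging lets frequent classes dominate; macro-averaging counts rare classes as much as common ones.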

Area Under ROC Curve

Area under ROC Curve (or AUC for short) is a performance metric for binary classification problems.

How do metrics influence the performance of a machine learning algorithm?

The choice of metrics influences how the performance of machine learning algorithms is measured and compared

Classification Accuracy

Classification accuracy is the number of correct predictions made as a ratio of all predictions made.

Leave one out

Downside: expensive. Upside: doesn't waste data.

Test set

Downside: may give an unreliable estimate of future performance. Upside: cheap.

What is a good model percentage for Gini coefficient?

A Gini above 60% indicates a good model

Gini Coefficient

Gini is the ratio of the area between the ROC curve and the diagonal line to the area of the upper triangle; equivalently, Gini = 2 × AUC − 1
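
A sketch using the rank-sum formulation of AUC, from which Gini = 2 × AUC − 1 follows; ties in the scores are ignored for simplicity, and the labels and scores are invented:

```python
import numpy as np

def auc_rank(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation; ignores score ties."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)   # ranks 1..n, lowest score = rank 1
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

y = np.array([0, 0, 1, 0, 1, 1])
s = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.9])
auc = auc_rank(y, s)
gini = 2 * auc - 1      # area between ROC curve and diagonal,
print(auc, gini)        # normalized by the area of the upper triangle
```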

Why do models lose stability?

Model fit ranges from high error on the training data (under-fitting), through low training error with good generalization of the relationship (the ideal), to zero training error (over-fitting)

What is cross validation?

In cross-validation the original sample is split into two parts. One part is called the training (or derivation) sample, and the other part is called the validation (or validation + testing) sample.

Model Interpretability

Irrelevant variables lead to unnecessary complexity in the resulting model. By removing them (setting their coefficients = 0) we obtain a more easily interpretable model. However, using OLS makes it very unlikely that the coefficients will be exactly zero.

What is R^2?

It is one minus the ratio of a model's error to the total variance in the dependent variable (the lower the error, the higher the R²)

What does a large coefficient signify?

It means putting a lot of emphasis on that feature, i.e. the particular feature is a good predictor for the outcome.

What portion of the sample should be in each part? (Cross validation)

Large sample: split it 50/50. Small sample: 2/3 training, 1/3 testing & validation.

Logarithmic Loss

Logarithmic loss (or logloss) is a performance metric for evaluating the predictions of probabilities of membership to a given class.
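
A minimal sketch of the usual binary log loss formula, with clipping to keep the logarithm finite; the labels and probabilities are invented:

```python
import numpy as np

def log_loss(y_true, p, eps=1e-15):
    p = np.clip(p, eps, 1 - eps)   # avoid log(0) at p = 0 or 1
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.1, 0.8, 0.35])
print(log_loss(y, p))   # confident correct predictions give low loss;
                        # a confident wrong one would blow the loss up
```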

What are the type of machine learning problem metrics demonstrated?

Metrics are demonstrated for both classification and regression type machine learning problems.

What are the consequences of overfitting?

Overfitted models will have high R² values, but will perform poorly in predicting out-of-sample or test cases

High variance is associated with what kind of fitting ____

Overfitting

What is RMSE good for?

RMSE is a good measure of how accurately the model predicts the response, and it is the most important criterion for fit if the main purpose of the model is prediction.

Advantage Of ROC

The ROC curve is almost independent of the response rate

F-score M (macro-averaged)

The relation between the data's positive labels and those given by a classifier, based on a per-class average

F-score μ (micro-averaged)

The relation between the data's positive labels and those given by a classifier, based on sums of per-text decisions

RIDGE REGRESSION

Ridge Regression is a technique used when the data suffers from multicollinearity (independent variables are highly correlated).

How do we determine if one model is predicting better than another model?

Take the difference between observed (y) and predicted values (f), when applying the model to unseen data

Which kinds of cross-validation are there?

Test set (holdout), leave-one-out

Mean Absolute Error

The Mean Absolute Error (or MAE) is the sum of the absolute differences between predictions and actual values. It gives an idea of how wrong the predictions were.

How to check if a model fit is good?

The R² statistic has become the almost universally standard measure for model fit in linear models.

R^2 Metric

The R² (or R Squared) metric provides an indication of the goodness of fit of a set of predictions to the actual values (the coefficient of determination)

Why is ridge regression better than least squares?

- The advantage is apparent in the bias-variance trade-off. As λ increases, the flexibility of the ridge regression fit decreases. This leads to decreased variance, with a smaller increase in bias.
- In ridge, by properly tuning λ and acquiring less variance at the cost of a small amount of bias, we can find a lower potential MSE.
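
A sketch of closed-form ridge regression on invented multicollinear data; as a simplification, the intercept is ignored and λ penalizes all coefficients, which a careful implementation would not do:

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed form: minimize ||y - Xw||^2 + lam * ||w||^2
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[:, 2] = X[:, 1] + 0.01 * rng.normal(size=50)  # near-duplicate column -> multicollinearity
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=50)

for lam in (0.0, 1.0, 10.0):             # lam = 0 recovers least squares
    print(lam, ridge_fit(X, y, lam))     # coefficients shrink toward zero as lam grows
```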

Mean error (ME)

The average dollar amount or percentage points by which forecasts differ from outcomes

Mean Absolute Error (MAE)

The average of the absolute dollar amounts or percentage points by which a forecast differs from an outcome

Mean Absolute Percentage Error (MAPE)

The average of the absolute percentage amounts by which forecasts differ from outcomes

Mean Percentage Error (MPE)

The average of percentage errors by which forecasts differ from outcomes

Mean Squared Error (MSE)

The average of squared errors over the sample period
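
A sketch computing the five forecast metrics above on invented data; note the sign convention (forecast minus outcome) is an assumption, since sources differ:

```python
import numpy as np

def forecast_errors(actual, forecast):
    e = forecast - actual                # raw errors (sign convention assumed)
    pe = 100.0 * e / actual              # percentage errors
    return {
        "ME":   np.mean(e),              # signed average error
        "MAE":  np.mean(np.abs(e)),      # average absolute error
        "MPE":  np.mean(pe),             # signed average percentage error
        "MAPE": np.mean(np.abs(pe)),     # average absolute percentage error
        "MSE":  np.mean(e ** 2),         # average squared error
    }

actual   = np.array([100.0, 120.0, 90.0, 110.0])
forecast = np.array([ 98.0, 125.0, 95.0, 105.0])
print(forecast_errors(actual, forecast))
```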

Error Rate

The average per-class classification error

Average Accuracy

The average per-class effectiveness of a classifier

What is the impact of model complexity on the magnitude of coefficients?

The size of the coefficients increases exponentially with increasing model complexity

Shrinkage also performs variable selection

The two best-known techniques for shrinking the coefficient estimates towards zero are ridge regression and the lasso.

Subset Selection

This approach identifies a subset of the p predictors that we believe to be related to the response

High bias is associated with what kind of fitting ____

Underfitting

What is Maximum likelihood estimation (MLE)?

MLE is a method of estimating parameters by maximizing a likelihood function, so that, under the assumed statistical model, the observed data are most probable
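
A small numeric illustration for a Bernoulli sample, where the closed-form MLE is the sample mean; the data are invented, and the grid search merely confirms the maximizer:

```python
import numpy as np

# Log-likelihood of n Bernoulli(p) observations with k ones:
#   log L(p) = k * log(p) + (n - k) * log(1 - p)
# Maximizing gives the closed-form MLE p_hat = k / n.
data = np.array([1, 0, 1, 1, 0, 1, 1, 0])
k, n = data.sum(), len(data)

grid = np.linspace(0.01, 0.99, 99)
log_lik = k * np.log(grid) + (n - k) * np.log(1 - grid)
print(grid[np.argmax(log_lik)])   # ~0.62 on this grid, near the true maximizer
print(k / n)                      # 0.625, the closed-form MLE
```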

Ridge regression

Ridge regression is similar to least squares, except that the coefficients are estimated by minimizing a slightly different quantity.

Disadvantage of RIDGE REGRESSION

It includes all p predictors in the final model. The penalty term will set many of them close to zero, but never exactly to zero. This isn't generally a problem for prediction accuracy, but it can make the model's results more difficult to interpret.

What does F-Test determine?

The F-test determines whether the proposed relationship between the response variable and the set of predictors is statistically reliable.

