Logistic Regression

Accuracy

(TP + TN) / (TP + TN + FP + FN). Can be misleading on imbalanced data: labeling everyone as "not a terrorist" still scores a high accuracy.
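
A minimal sketch of the accuracy formula and of why it can mislead on imbalanced data. The counts are made up for illustration (1,000 people, 10 actual positives); they are not from the source.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts: a "model" that labels everyone negative gets
# all 990 negatives right and misses all 10 positives.
always_negative = accuracy(tp=0, tn=990, fp=0, fn=10)
print(always_negative)  # 0.99
```

The score is 99% even though not a single positive was found.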

Range of sigmoid function is ______.

0 to 1

Confusion Matrix

Class 1 is the positive (+) class; class 0 is the negative (-) class.

Types of sigmoid curve

The two most common are the logit (logistic) and probit functions; the logistic is the most popular.

Which of the following scenarios/models might not have a 0.5 as a threshold?

A model predicting if an accused is guilty. A model predicting if a person has COVID-19. A model predicting if a person should be given a loan.

Metric to use when you want both precision and recall to be high

F1
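
F1 is the harmonic mean of precision and recall, so it is high only when both are high. A small sketch, assuming the counts are large enough that no denominator is zero; the example counts are hypothetical.

```python
def precision(tp, fp):
    """Of everything predicted positive, the fraction that was truly positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of everything truly positive, the fraction that was predicted positive."""
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# Hypothetical counts: precision is 0.8 but recall is only 0.5,
# and F1 lands between the two, closer to the lower value.
score = f1_score(tp=8, fp=2, fn=8)
print(round(score, 3))
```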

example of regression

House Price prediction based on area, number of rooms, lawn, pool, etc.

The odds ratio can be defined as (where P is the probability of an event occurring):

Odds = P / (1 - P)
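
The odds formula, and the log of the odds that logistic regression models, can be sketched as below. The function names are illustrative, and p is assumed to lie strictly between 0 and 1.

```python
import math

def odds(p):
    """Odds of an event with probability p (assumes 0 < p < 1)."""
    return p / (1 - p)

def log_odds(p):
    """Natural log of the odds, also called the logit of p."""
    return math.log(odds(p))

print(odds(0.5))      # 1.0  (even odds)
print(odds(0.75))     # 3.0  (3 to 1 in favor)
print(log_odds(0.5))  # 0.0
```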

example of classification

Predicting whether a person will be diabetic in the future or not based on BP, glucose, insulin, etc.

True negative rate (specificity)

TN / (TN + FP): the proportion of total negatives that were correctly identified. Ideally high.

Precision

TP / (TP + FP): out of everyone predicted to be a terrorist, what fraction really were terrorists.

The log of the odds ratio is a linear function

True

The output of the sigmoid function is between 0 and 1 which can be interpreted as the probability of target being equal to 1.

True

Recall

TP / (TP + FN). Labeling everyone as a terrorist trivially gives a recall of 1.

Example of confusion matrix

True Positives (TP): A person has diabetes and the model predicted that person is diabetic.
True Negatives (TN): A person doesn't have diabetes and the model predicted that person doesn't have diabetes.
False Positives (FP): A person doesn't have diabetes but the model predicted that person has diabetes.
False Negatives (FN): A person has diabetes but the model predicted that person doesn't have diabetes.
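
The four cells can be counted directly from actual and predicted labels. A minimal sketch, assuming labels are coded 1 for diabetic and 0 for not diabetic; the six example labels are made up.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP and FN for binary labels (1 = positive, 0 = negative)."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1      # has diabetes, predicted diabetic
        elif a == 0 and p == 0:
            tn += 1      # no diabetes, predicted not diabetic
        elif a == 0 and p == 1:
            fp += 1      # no diabetes, predicted diabetic
        else:
            fn += 1      # has diabetes, predicted not diabetic
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

actual    = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```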

When do we label encode and create dummy variables for categorical levels?

We generally prefer label encoding when there is a sense of order in the values. For example, if a variable has the values bad, good, very good, we know there is an order and can encode them as 1, 2, 3 respectively. If the values are red, blue, green, there is no definite order, so creating dummy variables is the better choice.
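
A plain-Python sketch of the two encodings, using the values from the card. The helper names and the `is_` column prefix are illustrative choices, not a library API.

```python
# Ordered levels: the mapping preserves bad < good < very good.
ORDER = {"bad": 1, "good": 2, "very good": 3}

def label_encode(values, order=ORDER):
    """Replace each ordered level with its rank."""
    return [order[v] for v in values]

def dummy_encode(values):
    """One 0/1 indicator per level, for levels with no natural order."""
    levels = sorted(set(values))
    return [{f"is_{level}": int(v == level) for level in levels} for v in values]

print(label_encode(["good", "bad", "very good"]))  # [2, 1, 3]
print(dummy_encode(["red", "blue"]))
# [{'is_blue': 0, 'is_red': 1}, {'is_blue': 1, 'is_red': 0}]
```

Label-encoding the colors instead would tell the model that, say, green > blue > red, an ordering that does not exist.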

Logistic Regression Pros

A classification model that gives probabilities. Easily extended to multiple classes (multinomial regression). Quick to train and very fast at classifying unknown records.

True positive rate (also called recall or sensitivity)

Of all who had the disease, what fraction did I catch? TP / (TP + FN): the proportion of total positives that were correctly identified. Ideally high.

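Gini coefficient

Gini coefficient = A / (A + B) = (AUC - 0.5) / 0.5, where A is the area between the ROC curve and the diagonal and A + B = 0.5 is the total area above the diagonal.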
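
A small sketch of the Gini calculation. The AUC here is computed from its ranking interpretation (the share of positive/negative pairs in which the positive gets the higher score, ties counting half) rather than by integrating a plotted curve; the labels and scores are made up.

```python
def auc(actual, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def gini(auc_value):
    """Gini coefficient from AUC: (AUC - 0.5) / 0.5."""
    return (auc_value - 0.5) / 0.5

actual = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
a = auc(actual, scores)
print(a)        # 0.75
print(gini(a))  # 0.5
```

A random model (AUC = 0.5) has a Gini of 0 and a perfect ranking (AUC = 1) has a Gini of 1.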

Logistic Regression Cons

Constructs linear boundaries. Assumes that variables are independent (does not include interaction terms). Interpretation of coefficients is difficult.

Which of the following is minimized in logistic regression?

Cross-entropy (log loss)
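
A minimal sketch of binary cross-entropy, assuming labels are 0 or 1 and every predicted probability lies strictly between 0 and 1 (a probability of exactly 0 or 1 would make the logarithm undefined).

```python
import math

def log_loss(actual, probs):
    """Mean binary cross-entropy over all observations."""
    total = 0.0
    for y, p in zip(actual, probs):
        # Only one of the two terms is non-zero for a given label.
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(actual)

# Same true label (1), two different predicted probabilities.
print(log_loss([1], [0.9]))  # about 0.105: confident and right
print(log_loss([1], [0.1]))  # about 2.303: confident and wrong
```

The confidently wrong prediction costs roughly twenty times more, which is why misclassified points add so much to the log loss.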

Classification example (black and white)

Either/or: each observation belongs to one category, like pass or fail. Did a student get a passing or failing grade? Trying to predict whether a student passes or fails is a classification problem.

Aloev Hospitals is using a machine learning algorithm for the first round of cancer detection. For this algorithm, a False Positive (classifying a fit (normal) person as having cancer) is more expensive than a False Negative (classifying a patient who has cancer as fit).

False

In a classification model that gives probabilities as output, you get only one confusion matrix for different thresholds.

False
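
A short sketch of why the statement is false: the same probabilities produce a different confusion matrix at each threshold. Treating `p >= threshold` as the positive class is an assumed convention, and the data are made up.

```python
def confusion_at(actual, probs, threshold):
    """Confusion-matrix counts after cutting probabilities at a threshold."""
    predicted = [1 if p >= threshold else 0 for p in probs]
    pairs = list(zip(actual, predicted))  # (actual, predicted) per observation
    return {
        "TP": pairs.count((1, 1)),
        "TN": pairs.count((0, 0)),
        "FP": pairs.count((0, 1)),
        "FN": pairs.count((1, 0)),
    }

actual = [1, 0, 1, 0]
probs = [0.9, 0.6, 0.4, 0.2]
at_half = confusion_at(actual, probs, 0.5)
at_low = confusion_at(actual, probs, 0.3)
print(at_half)  # {'TP': 1, 'TN': 1, 'FP': 1, 'FN': 1}
print(at_low)   # {'TP': 2, 'TN': 1, 'FP': 1, 'FN': 0}
```

Lowering the threshold from 0.5 to 0.3 turns the one false negative into a true positive.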

ROC curves

For classification problems with probability outputs, a threshold can convert probabilities into classes.

Sigmoid Curve (S-Curve)

Functions whose graph always looks like an S. Instead of y = a + bx (linear regression), we use y = f(a + bx).

Logistic Regression ___ a misnomer because it is used for __________ tasks.

is ; classification

Logistic Regression

Look for the best logit function that fits our data (used for classification).

Logistic regression

Minimizes log loss; outputs a probability, e.g. of pass/fail.

Logit function

multiple dimensions

What is misclassification?

Misclassification occurs when values are predicted incorrectly, i.e. the model assigns an observation to a different class from the one it belongs to. For example, an observation's actual label is class 0 but the model predicts class 1.

Threshold is the value which is used to convert ______ to ________.

probabilities, classes

What are the two types of supervised learning?

regression and classification

When should we use Recall as model performance evaluation criteria?

Recall should be used when you want to minimize False Negatives, i.e. when actual positives must not be predicted as negatives. It also suits cases where the loss of opportunity is high.

When should we use Precision as model performance evaluation criteria?

Precision should be used when you want to minimize False Positives, i.e. when actual negatives must not be predicted as positives. It also suits cases where the loss of resources is high.

Classification error rate

Sum of Type I (FP) and Type II (FN) errors, divided by the total number of predictions.

Misclassified points add significantly to the log loss.

True

Regression example 2

Trying to predict what score a student might get out of a hundred is a regression problem.

False positive

type I errors

False negative

type II errors

Supervised Learning definition

We already know the target variable.

Logistic Formula

y = 1 / (1 + e^(-(a + bx))). The denominator ranges between 1 and infinity, so y ranges between 0 and 1.
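
A minimal sketch of the logistic formula. The default values a = 0 and b = 1 are placeholders for illustration; in a fitted model they would be estimated from data.

```python
import math

def logistic(x, a=0.0, b=1.0):
    """y = 1 / (1 + e^(-(a + b*x))).

    math.exp overflows for very large negative a + b*x; this sketch
    does not guard against that.
    """
    return 1 / (1 + math.exp(-(a + b * x)))

print(logistic(0))  # 0.5
# The output stays strictly between 0 and 1 and rises with x.
print(logistic(-5) < logistic(0) < logistic(5))  # True
```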

