Binary Logistic Regression
Why can't a logistic regression coefficient be interpreted in the same manner as an OLS coefficient?
-The coefficient of a continuous IV cannot be directly interpreted in probability terms. e.g., "For every one-year increase in age, the log of the odds of using safe sex increases by .13 units." (doesn't make intuitive sense!!) -Also, because the relationship between the IVs and the DV in a logistic regression analysis is nonlinear, the logistic regression coefficient cannot be interpreted in the same manner as an OLS coefficient.
Problems with using an OLS regression model on binary dependent variables
1. It may lead to predicted probabilities of the DV that lie outside the limiting values of 0.0 and 1.0 (i.e., less than 0.0 or greater than 1.0). **This is because the OLS model assumes the line that best fits the data is a straight line; it assumes the relationship b/w the IV and the probability of the DV is linear across ALL POSSIBLE VALUES of the IV. Outliers can also exist.
2. The relationship between the IV and the DV may be nonlinear. **Means that the expected change in the DV for a 1-unit change in the IV is not constant for each value of the IV (as the linear model assumes)
3. The error terms for a binary variable will not be normally distributed across all values of x, and they do not have a constant variance (heteroscedastic). **This is because the variance of a binomial distribution is greatest where p = .50 and declines as p approaches both 0 and 1. Using OLS anyway would result in biased hypothesis tests.
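The out-of-range problem in point 1 can be seen with a small numeric sketch. The age values and 0/1 outcomes below are made up for illustration; a straight OLS line fit to them predicts "probabilities" below 0 and above 1 at extreme ages.

```python
# Hypothetical data: age (IV) and a 0/1 outcome (DV).
age = [18, 20, 25, 30, 35, 40, 45, 50, 55, 60]
y   = [ 0,  0,  0,  0,  1,  1,  1,  1,  1,  1]

# Ordinary least squares by hand: slope = Sxy / Sxx, intercept = ybar - b*xbar.
n = len(age)
mx, my = sum(age) / n, sum(y) / n
b = sum((x - mx) * (v - my) for x, v in zip(age, y)) / sum((x - mx) ** 2 for x in age)
a = my - b * mx

# The straight line predicts "probabilities" outside [0, 1] at extreme ages.
print(round(a + b * 15, 3))   # negative (below 0.0)
print(round(a + b * 70, 3))   # greater than 1.0
```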
What assumptions do OLS and Binary Logistic Regression share?
1. The DV is a function of one or more IVs 2. The data are randomly selected and the observations are independent 3. There is no multicollinearity
Compare and contrast OLS model and logistic regression model.
Both can be used to predict the values of a dichotomous dependent variable using one or more independent variables, and in both we can test whether a regression coefficient is significantly different from zero. However, problems arise when using an OLS model (linear probability model) for a binary DV, which makes the logistic regression model more appropriate when the DV is binary or dichotomous.
How can we examine the relative strength of our independent variables?
By comparing the magnitude of the t or Wald statistics. The one with the largest absolute value has the greater explanatory "punch."
How can you tell if an independent variable is useful in explaining the dependent variable?
By looking at the fit of the model (-2LL). It should be better than the baseline or constant-only model. A good model, one wherein the probability of the observed results is high, has a small value of -2LL, so small values of -2LL are better than large ones. Or by looking at the model chi-square statistic, which tests whether the fitted model's -2LL is significantly smaller than the baseline model's. If you can reject the null, you can conclude that the model is significantly predicting the probability of the DV = 1.
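A small sketch of where -2LL comes from: for a constant-only baseline model, the predicted probability for every case is just the sample proportion, and -2LL follows directly from the likelihood of the observed 0/1 outcomes. The data and the fitted model's -2LL value below are made up for illustration.

```python
import math

# Hypothetical 0/1 outcomes.
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

# Baseline (constant-only) model: predicted p is just the sample proportion.
p = sum(y) / len(y)
neg2ll_baseline = -2 * sum(v * math.log(p) + (1 - v) * math.log(1 - p) for v in y)
print(round(neg2ll_baseline, 2))        # 13.46 for this data

# Suppose the fitted model (with IVs) reports -2LL = 7.91 (made-up value).
neg2ll_model = 7.91
model_chi_square = neg2ll_baseline - neg2ll_model   # df = number of IVs
print(round(model_chi_square, 2))
```

The model chi-square is simply the drop in -2LL from the baseline to the fitted model, tested against a chi-square distribution with df equal to the number of IVs.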
Example of interpretation of odds ratio or Exp(b) Predicting Whether Homicide Defendants were Convicted (0=no, 1=yes) by Victim Provocation (0=No provocation, 1=provocation claimed). Exp(b)=.388.
Exp(b)=.388 (.388-1) X 100=61.2% When homicide defendants claim victim provocation, the odds of being convicted decreased by 61.2%.
Null hypothesis for b
Just like the null for an OLS slope, but instead of using the t sampling distribution, the Wald statistic is used. Ho: no relationship between x and y, OR b = 0.
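The Wald statistic itself is just the squared ratio of the coefficient to its standard error, compared to a chi-square distribution with 1 df. A minimal sketch with made-up numbers:

```python
def wald_statistic(b, se):
    """Wald = (b / SE)^2, tested against chi-square with 1 df."""
    return (b / se) ** 2

# Hypothetical coefficient b = .13 with standard error .05.
print(round(wald_statistic(0.13, 0.05), 2))   # 6.76 > 3.84 (the .05 critical value, 1 df), so p < .05
```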
Heteroscedasticity
Occurs when the assumption of homoscedasticity is violated and indicates that the error terms are not constant across all values of x.
Positive vs. negative coefficients.
Odds ratios above 1 (positive coefficient) indicate an increase in the odds of Y = 1, and odds ratios below 1 (negative coefficient) indicate a decrease. For both, we can use this formula to find the percent change in the odds: Exp(b) = 1.14: (1.14 - 1) X 100 = 14%; as x increases by 1 unit, the ODDS of y equaling 1 increase by 14%. Exp(b) = .72: (.72 - 1) X 100 = -28%; as x increases by 1 unit, the ODDS of y equaling 1 DECREASE by 28%.
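The percent-change calculation above can be wrapped in a one-line helper; the Exp(b) values are the made-up ones from the examples.

```python
def pct_change_in_odds(exp_b):
    """Percent change in the odds of Y = 1 for a 1-unit increase in X."""
    return round((exp_b - 1) * 100, 1)

print(pct_change_in_odds(1.14))   # 14.0  -> odds increase by 14%
print(pct_change_in_odds(0.72))   # -28.0 -> odds decrease by 28%
```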
Model fit?
One way to test the goodness of fit of the model is to determine the likelihood of a given model, which is the probability of the observed results given the sign and magnitude of the estimated regression parameters. Basically, you can test the null hypothesis that all the IV coefficients in the model are equal to 0 by using a chi-square statistic.
What is the maximum likelihood principle?
The coefficients are estimated so as to maximize the probability, or likelihood, of obtaining the observed data. This is how the logistic model fits the S-shaped curve of the data.
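Maximum likelihood can be sketched by climbing the log-likelihood surface directly. This is a toy gradient-ascent version with made-up data for one IV, not the iterative routine statistical packages actually use, but it illustrates the principle: keep nudging the coefficients in whatever direction makes the observed 0/1 outcomes more probable.

```python
import math

# Made-up data: one IV (e.g., number of prior arrests) and a 0/1 DV.
x = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6]
y = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]

def log_likelihood(a, b):
    """Log of the probability of the observed 0/1 outcomes given a and b."""
    ll = 0.0
    for xi, yi in zip(x, y):
        p = 1 / (1 + math.exp(-(a + b * xi)))          # logistic S-curve
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll

# Gradient ascent: repeatedly nudge a and b in the direction that
# increases the likelihood of the observed data.
a = b = 0.0
for _ in range(5000):
    resid = [yi - 1 / (1 + math.exp(-(a + b * xi))) for xi, yi in zip(x, y)]
    a += 0.01 * sum(resid)
    b += 0.01 * sum(xi * r for xi, r in zip(x, resid))

print(round(a, 2), round(b, 2), round(-2 * log_likelihood(a, b), 2))
```

The slope that comes out is positive (higher x makes DV = 1 more likely in this toy data), and the final -2LL is smaller than the constant-only model's, exactly the fit comparison described above.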
In 6 sentences or less, describe what the Logistic Regression is actually doing statistically and why we can't use OLS for a dichotomous dependent variable.
The logistic regression model, which is based on a cumulative probability distribution, can be used to estimate the probability of a binary event occurring (or, equivalently, the log of the odds of the dependent variable occurring). Essentially, it is used to estimate the probability of a binary response based on one or more independent variables. Unlike an OLS coefficient, the logistic regression coefficient cannot be interpreted directly in probability terms, but it can be used to estimate the predicted probability of the dependent variable at different values of the independent variable. The reason OLS cannot be used for a dichotomous dependent variable is that it may lead to predicted probabilities of the DV that lie outside the limiting values of 0.0 and 1.0. Additionally, the relationship between the IV and DV may be nonlinear. Also, the error terms do not have a constant variance and will not be normally distributed across all values of x, indicating heteroscedasticity.
What is the best way to interpret b?
The most understandable way to interpret b is to calculate predicted probabilities from the regression equation, which has an exponential term in both the numerator and denominator: p = e^(a + bX) / (1 + e^(a + bX)).
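That equation can be computed directly at different values of X. The constant and slope below are made-up illustration values, not from any real model.

```python
import math

def predicted_probability(a, b, x):
    """p(DV = 1) = e^(a + b*x) / (1 + e^(a + b*x))."""
    z = a + b * x
    return math.exp(z) / (1 + math.exp(z))

# Hypothetical coefficients: constant a = -2.0, slope b = 0.13.
for x_value in (10, 20, 30):
    print(x_value, round(predicted_probability(-2.0, 0.13, x_value), 3))
```

Notice the predicted probabilities always stay between 0 and 1, unlike the linear probability model's.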
Dichotomous variable
Two category variable that is typically coded "0" or "1". E.g., gender=male or female.
Logistic regression model
Used to predict a dependent variable with two categories (0, 1), called a binary or dichotomous variable. It is used to estimate the probability of a binary response based on one or more independent variables. e.g., the probability of a victim of violent crime reporting to the police.
Interpretation of b1
We are predicting the natural logarithm of the ratio of the probability of an event occurring (DV = 1) to the probability of the event NOT occurring - this ratio is referred to as the odds of the event occurring. In other words, we are predicting the log of the odds of an event occurring: ln(p / (1 - p)) = a + b1X1.
What information is obtained from a logistic regression SPSS output?
- Constant and estimated logistic regression coefficient for each IV in the model
- Estimated standard error of each coefficient
- A Wald or t test for the statistical significance of each estimated regression coefficient (whether it's significantly different from 0)
- Various indicators of how well the model fits the data