Multiple Ordinary Least Square Regression
In 5 sentences of less, describe what the Ordinary-Least Regression (OLS) Multivariate Regression is actually doing statistically?
The OLS regression model is a very good and general tool for analyzing the relationship between independent variables and a continuous dependent variable (i.e., measured at the interval/ratio level). The purpose of this model is to estimate the effect of each of these independent variables simultaneously. Specifically, it allows for the same kind of statistical control that exists in the partial cross-tabulation tables. That is, we can separate the effect of one independent variable from the effects of other independent variables in the regression model. Each independent variable has a corresponding partial slope coefficient , and it is this partial slope coefficient that measures the effect of one independent variable on the dependent variable when the effects of all the other independent variables in the model on the dependent variable have been considered (e.g., statistically controlled for).
Homoscedasticity
The assumption that the error term is independent of and therefore uncorrelated with each of the independent or x variables, that it is normally distributed, and has an expected value of 0, and constant variance across all levels of x.
What is the major difference between the bivariate and multiple regression models?
The multiple regression model has more than one independent variable. --Also, interpretation is somewhat different from the bivariate case.
T obtained value and significance in an output
The output provided by SPSS gives the exact probability of each t statistic (2-tailed) under the assumption that the null hypothesis is true. The exact probability is displayed under the column labeled Sig (for the significance of t) for each slope. Only the significance for the slope coefficients (b) matters, not the slope for the intercept/constant (a).
Example for an interpretation of a dichotomous dependent variable (DV: murder rate), (IV: region-south or non-south, poverty).
When states reside in the South, the murder rate increases by 1.812 units after controlling for poverty.
Caution of comparing partial slope coefficients
You cannot compare the relative strength of a relationship between x and y based on the magnitude of the partial slope coefficient.
What are the similarities between a bivariate and multivariate regression model?
1. The a and B terms are population parameters that are estimated by sample data 2. For both, the OLS multivariate rehression equation estimates the "best-fitting" regression line to the data. 3. It's best-fitting according to the same principle of least squares--it minimizes the sum of the squared deviations between the predicted y values and the observed y values. 4. Goal is the same: to provide the best-fitting line between a continuous dependent variable and several independent variables. 5. You CANNOT determine which independent variable has the strongest effect on the dependent variable by comparing unstandardized partial slope coefficients.
Assumptions of the Multivariate Regression Model
1. The observations are independent (were randomly selected). 2. It is assumed that all populations are normally distributed. All values of y are normally distributed at each value of x. 3. The dependent variable is measured at the interval/ratio level (data is continuous). 4. The relationship between the dependent variable and each of the independent variables is linear. 5. The assumption of homoscedasticity. 6. The assumption that the independent variables are not highly correlated among themselves or multicollinear.
What are the 3 criteria for causation?
1. There must be an empirical association between the independent and dependent variables (Association) 2. The independent variable must precede the dependent variable in time (Time Order) 3. The relationship between IV and DV must be nonspurious, which means it is not caused by a third variable.
Multivariate Regression Model
A regression model predicting one dependent variable with 2 or more independent variables.
Partial correlation coefficient (or Multiple R)
Correlation between two variables after controlling for a third variable. e.g., multiple r=.700 for DV: murder rate per 100k IV: region, poverty There is a moderate-to-strong positive relationship between the murder rates in states and independent variables of poverty and regional location.
Interpretation of slope coefficient in a multivariate regression equation
Each slope coefficient indicates the expected change in the y variable (dependent variable) associated with a 1-unit change in a given independent variable, when all other independent variable in the model are held constant. e.g., For every one-unit increase in poverty in states, the murder rate increases by .475 units even after holding constant regional location. 1 DV (murder rate), 2 IVs (poverty, region)
Partial slope coefficient (b) (aka partial regression coefficient)
Effect of an independent variable on the dependent variable after controlling for or more other independent variables. **There is one slope coefficient for each independent variable that is in the model.
What is the multiple OLS regression equation?
Equation estimated with two or more independent variables predicting one dependent variable. y=a+b1x1+b2x2+b3x3.......bkxk+E y=dependent variable x=independent variable a=intercept b=slope coefficient k=the # of independent variables E=error term
Example of a null hypothesis and alternative hypothesis. (DV: delinquency) (IV: gender, perceptions of risk)
H0: No relationship between gender and delinquency after perceptions of risk are controlled, or B=0. H0: No relationship between perceptions of risk and delinquency after gender is controlled, or B=0. H1: There is a relationship between gender and delinquency after perceptions of risk are controlled, or B≠0. H1: There is a relationship between perceptions of risk and delinquency after gender is controlled, or B≠0.
Decision for null hypothesis
If the reported significance for the partial slope coefficient is less than or equal to your chosen alpha level, your decision is to reject the null hypothesis. If the reported significance is greater, your decision is to fail to reject the null.
What does good regression analysis consist of?
Including in the model those explanatory variables that are most strongly related to the dependent variable--and unrelated to the other independent variables included in the model.
Multiple Coefficient of Determination (R^2)
Indicates the proportion of variance in the dependent variable that is explained by all the independent variables combined. e.g., muliple R^2=.430 for DV: murder rate per 100k IV: region, poverty Together, these IVs explain 43% (.430) of the variance in murder rates.
What is the purpose of the multivariate regression model?
It allows us to estimate the effect of each of the independent variables simultaneously.
What is (a) in a multivariate equation
It is the predicted value of y when ALL independent variables are equal to 0.
What does "holding constant" mean?
It means looking at the relationship between two variables (e.g., boot camp attendance and recidivism) within separate levels of a third variable (gender).
What is the error term, E, in a multiple regression equation?
It reflects those explanatory variables that are not included in the model.
Multicollinearity
Problem occurs whenever the independent variables are highly correlated with one another.
