Belle's Ecom - Multiple Linear Regression - Estimation
Effect of additional regressors on R-squared
R-squared ALWAYS rises when regressors are added to the regression, unless the estimated coefficient on the new regressor is exactly 0 (rare). In other words, R-squared will either go up or stay unchanged with additional regressors. We often work with adjusted R-squared instead, since it penalises extra regressors and therefore does not necessarily increase when a new regressor is added.
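For reference, the standard adjusted R-squared formula is

\bar{R}^2 = 1 - \frac{n-1}{n-k-1} \cdot \frac{SSR}{TSS}

where SSR is the sum of squared residuals and TSS is the total sum of squares. The factor (n-1)/(n-k-1) grows as regressors are added, so adjusted R-squared can fall when a new regressor adds little explanatory power.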
Imperfect multicollinearity
Arises when one of the regressors is highly correlated, but not perfectly correlated, with the other regressors. This results in regression coefficients being estimated imprecisely and having large standard errors, and therefore in statistically insignificant regression coefficients. This makes it hard to disentangle the regressors' individual impacts on the dependent variable in the regression.
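A minimal simulation sketch of this effect, assuming numpy and statsmodels are available (all names and numbers are illustrative):

```python
# Imperfect multicollinearity inflates OLS standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)

for rho in (0.0, 0.95):  # low vs. high correlation between regressors
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    u = rng.normal(size=n)
    y = 1.0 + 2.0 * x1 + 2.0 * x2 + u  # true beta1 = beta2 = 2

    X = sm.add_constant(np.column_stack([x1, x2]))
    res = sm.OLS(y, X).fit()
    # Standard errors on x1 and x2 are much larger when rho = 0.95,
    # even though the true coefficients are unchanged.
    print(f"rho={rho}: se(beta1_hat)={res.bse[1]:.3f}, se(beta2_hat)={res.bse[2]:.3f}")
```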
Implications of OVB
Under OVB, beta1_hat is biased and inconsistent: even as n gets large, beta1_hat does not get close to the true beta1 with high probability.
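Concretely, the standard large-sample result is

\hat{\beta}_1 \xrightarrow{p} \beta_1 + \rho_{Xu} \frac{\sigma_u}{\sigma_X}

where rho_Xu is the correlation between the regressor and the error term (which absorbs the omitted variable). Because this correlation does not shrink with n, the bias persists no matter how large the sample.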
Signing OVB
The sign of the bias is determined by two correlations: the correlation between X and the omitted variable (OV), and the direction in which the OV affects Y. If the two have the same sign, beta1_hat is biased upward; if they have opposite signs, it is biased downward.
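Enumerating the cases (these follow from the plim formula above, since the OV enters the error term):

corr(X, OV) > 0 and OV raises Y  ->  upward bias in beta1_hat
corr(X, OV) > 0 and OV lowers Y  ->  downward bias
corr(X, OV) < 0 and OV raises Y  ->  downward bias
corr(X, OV) < 0 and OV lowers Y  ->  upward bias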
Distribution of OLS estimators in multiple linear regression
Different samples produce different values for the OLS estimators. These estimators are random variables with a distribution. The OLS estimators are unbiased and consistent estimators of their true population values.
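In large samples, under the least squares assumptions, each OLS estimator is in addition approximately normally distributed,

\hat{\beta}_j \overset{a}{\sim} N(\beta_j, \sigma^2_{\hat{\beta}_j})

which is what justifies the usual t-statistics and confidence intervals.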
Heteroskedasticity
The error term in the regression is homoskedastic if the variance of u_i conditional on all of the regressors, var(u_i | X_1i, ..., X_ki), is constant. Otherwise, the error term is heteroskedastic.
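A minimal sketch of handling heteroskedasticity with robust standard errors, assuming statsmodels (data and numbers are illustrative; HC1 is one of the standard robust covariance options):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(0, 10, size=n)
u = rng.normal(scale=0.5 + 0.5 * x)   # error variance grows with x: heteroskedastic
y = 1.0 + 2.0 * x + u

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                    # default (homoskedasticity-only) SEs
robust = sm.OLS(y, X).fit(cov_type="HC1")   # heteroskedasticity-robust SEs
print("homoskedasticity-only se:", ols.bse[1])
print("robust (HC1) se:        ", robust.bse[1])
```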
Standard error of the regression (SER)
Estimates the standard deviation of the error term u_i: SER = s_u_hat = sqrt( SSR / (n - k - 1) ), where SSR is the sum of squared residuals and the divisor n - k - 1 adjusts for the k + 1 estimated coefficients.
Magnitude of the OLS estimate
If beta1_hat > beta1: the estimate of the relationship is too large relative to the true value of beta1 (upward bias). If beta1_hat < beta1: the estimate is too small (downward bias).
Omitted Variable Bias (OVB)
If the regressor is correlated with a variable that has been omitted from the analysis AND that determines, in part, the dependent variable, the OLS estimator of the effect of interest will suffer from OVB.
Dummy variable trap/perfect multicollinearity (3)
Perfect multicollinearity can arise when multiple dummy variables are used as regressors. The trap occurs when a group of dummy variables always adds up to equal another dummy variable or the constant regressor (the classic case: a full set of mutually exclusive, exhaustive category dummies always sums to 1, the constant). It can be avoided by dropping one of the dummy variables or the constant.
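A minimal pandas sketch of avoiding the trap (illustrative data; pd.get_dummies is the standard encoder):

```python
import pandas as pd

df = pd.DataFrame({"region": ["north", "south", "east", "north", "east"]})

full = pd.get_dummies(df["region"])  # one column per category
# Together with a constant, these columns are perfectly collinear:
# east + north + south = 1 for every row (the dummy variable trap).

safe = pd.get_dummies(df["region"], drop_first=True)  # drops one category
# The dropped category becomes the baseline absorbed by the constant.
print(full.sum(axis=1).unique())   # [1] -> the dummies always sum to the constant
print(safe.columns.tolist())
```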
Multiple linear regression
The multiple linear regression model extends the single linear regression model to include additional variables as regressors. The model allows us to estimate the effect on Y of changing one variable while holding the other regressors constant (or fixed).
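A minimal sketch of fitting such a model by OLS, assuming numpy and statsmodels (variable names and data are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))  # adds the intercept column
results = sm.OLS(y, X).fit()
print(results.summary())  # the coefficient on x1 holds x2 fixed, and vice versa
```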
OVB and OLS Assumption #1
OVB means the first least squares assumption, E(u_i | X_1i, ..., X_ki) = 0, fails: the omitted variable sits in the error term, which is then correlated with the included regressor.
When does OVB occur (2)
Occurs when the omitted variable satisfies two conditions:
1. It is correlated with the included regressor
2. It helps determine the dependent variable
If OVB exists, the estimate of beta1 is biased.
Population regression model with k regressors
Y_i = beta0 + beta1*X_1i + beta2*X_2i + ... + betak*X_ki + u_i, for i = 1, ..., n. The regressors may be any combination of continuous or dummy variables.
Impacts of adding additional regressors
Rich and richer regressions allow us to control for many other variables that predict Y, avoiding OVB and isolating the relationship of interest. More variables on the RHS means more is taken out of the error term, leaving the estimate less subject to OVB.
Fixing OVB
Take a sub-sample over which the suspected omitted variable is roughly held fixed, and plot the relationship for this sub-sample. This can help determine whether the relationship is driven by a direct effect or by an indirect relationship through the omitted variable.
OLS estimation with multiple linear regression
The OLS estimator aims to find the regression coefficients that together minimise the sum of squared mistakes the model makes in predicting the dependent variable Y given the k regressors.
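Formally, the OLS estimators beta0_hat, beta1_hat, ..., betak_hat solve

\min_{b_0, b_1, \dots, b_k} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - \dots - b_k X_{ki})^2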
Perfect multicollinearity
Regressors exhibit perfect multicollinearity if one regressor is a perfect linear combination of the other regressors. Least squares assumption 4 (no perfect multicollinearity) requires that no regressors exhibit perfect multicollinearity; when they do, OLS cannot separate the coefficients and the estimator cannot be computed.
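A minimal numpy sketch, with illustrative data, of why perfect multicollinearity breaks OLS (the matrix X'X in the normal equations becomes singular):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
x1 = rng.normal(size=n)
x2 = 3.0 * x1                      # x2 is an exact linear function of x1
X = np.column_stack([np.ones(n), x1, x2])

print(np.linalg.matrix_rank(X))    # 2, not 3: the columns are linearly dependent
XtX = X.T @ X
print(np.linalg.cond(XtX))         # enormous condition number: X'X is singular
# np.linalg.solve(XtX, X.T @ y) would raise LinAlgError or return meaningless numbers
```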
Control variables
Variables that are held fixed in order to estimate the effect on Y of changing another variable (this helps eliminate OVB).
Coefficient interpretation and partial effect
When interpreting a coefficient, we imagine changing only that one regressor at a time, leaving the others fixed; the coefficient is the partial effect of its regressor on Y.
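In symbols, under the model above,

\beta_j = \frac{\partial E[Y | X_1, \dots, X_k]}{\partial X_j}

the expected change in Y from a one-unit change in X_j, holding the other regressors fixed.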