Quantitative Methods: Multiple Regression

Adjusted R^2

- Used as a measure of model goodness of fit since it does not automatically increase as independent variables are added to the model
- Adjusts for degrees of freedom by incorporating the number of independent variables
- Increases if the added variable's coefficient has a t-statistic with an absolute value greater than 1
- Decreases if the added variable's coefficient has a t-statistic with an absolute value less than 1
- The higher the better

Adjusted R^2 = 1 - [(n - 1) / (n - k - 1)] x (1 - R^2), where R^2 = SSR / SST
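
A minimal Python sketch of the formula above (the numbers in the example are hypothetical):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - [(n - 1) / (n - k - 1)] * (1 - R^2)."""
    return 1 - ((n - 1) / (n - k - 1)) * (1 - r2)

# Hypothetical model: R^2 = 0.60, n = 50 observations, k = 4 independent variables
print(adjusted_r2(0.60, 50, 4))  # ~0.564, lower than R^2 because of the k penalty
```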

Studentized Residuals (ti*)

- Used to identify outliers

Steps (see the sketch below):
1. Estimate the regression model using the original sample of n observations. Then delete one observation and re-estimate the regression using n - 1 observations, performing this sequentially for all observations, deleting one at a time.
2. Compare the actual Y value of the deleted observation i to the predicted Y value from the model estimated with that observation deleted (ei* = Yi - Yi*).
3. The studentized residual is the residual from step 2 divided by its standard deviation (ti* = ei* / s*).
4. Compare the absolute value of the studentized residual against critical values from a two-tailed t-distribution with n - k - 2 degrees of freedom to determine whether the observation is influential.
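
A minimal numpy/scipy sketch of the leave-one-out procedure above, run on synthetic data (all names and data are illustrative):

```python
import numpy as np
from scipy.stats import t as t_dist

def studentized_residuals(X, y):
    """Leave-one-out studentized residuals: refit the regression n times,
    each time deleting one observation (steps 1-3 above)."""
    n, k = X.shape
    Xd = np.column_stack([np.ones(n), X])            # add intercept column
    t_star = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                     # delete observation i
        Xi, yi = Xd[keep], y[keep]
        b, *_ = np.linalg.lstsq(Xi, yi, rcond=None)  # re-estimate on n - 1 rows
        resid = yi - Xi @ b
        s2 = resid @ resid / (len(yi) - k - 1)       # s*^2 from the reduced fit
        xi = Xd[i]
        # standard error of the prediction error for the deleted observation
        se = np.sqrt(s2 * (1 + xi @ np.linalg.inv(Xi.T @ Xi) @ xi))
        t_star[i] = (y[i] - xi @ b) / se             # ti* = ei* / s*
    return t_star

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = 1 + X @ np.array([0.5, -0.3]) + rng.normal(scale=0.2, size=30)
t_star = studentized_residuals(X, y)
crit = t_dist.ppf(0.975, df=30 - 2 - 2)              # two-tailed, n - k - 2 df
print(np.where(np.abs(t_star) > crit)[0])            # flagged observations, if any
```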

Homoskedasticity

- Variance of regression residuals is the same for all observations
- Best detected with a scatter plot that compares regression residuals against predicted values, as in the sketch below
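
A sketch of that diagnostic plot on synthetic heteroskedastic data (matplotlib assumed available):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 2 + 0.5 * x + rng.normal(scale=1 + 0.3 * x)      # error variance grows with x

Xd = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)           # simple OLS fit
predicted = Xd @ b
residuals = y - predicted

plt.scatter(predicted, residuals, s=10)
plt.axhline(0, color="grey")
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.title("A funnel shape suggests heteroskedasticity")
plt.show()
```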

Outliers

Extreme observations of the dependent (Y) variable

Influential Data Points

Extreme observations that, when excluded, cause a significant change to model coefficients

Changes in Slope Coefficient

For every one-unit change in the independent variable, the dependent variable changes by the slope amount, holding the other independent variables constant

Likelihood Ratio

- A method to assess the fit of logistic regression models, based on the log-likelihood metric that describes the model's fit to the data
- Log-likelihood is always a negative number, so higher values (closer to 0) indicate a better-fitting model
- Test statistic: LR = -2 x (Log-Likelihood of Restricted Model - Log-Likelihood of Unrestricted Model), which is evaluated against a chi-square distribution
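
A minimal sketch of the test with hypothetical log-likelihood values (scipy assumed):

```python
from scipy.stats import chi2

ll_restricted = -112.4    # hypothetical log-likelihood, model with q variables dropped
ll_unrestricted = -105.1  # hypothetical log-likelihood, full model
q = 2                     # number of omitted variables (restrictions)

lr = -2 * (ll_restricted - ll_unrestricted)          # likelihood ratio statistic
p_value = chi2.sf(lr, df=q)                          # compare against chi-square(q)
print(lr, p_value)  # reject the restricted model if p < significance level
```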

Normal Q-Q Plot

- A visual used to compare the distribution of the residuals from a regression to a theoretical normal distribution - The residuals should lie along a diagonal if they are normally distributed

Dummy Variable

- An independent variable that takes on a value of either 1 or 0, depending on a specified condition
- Used to quantify the impact of qualitative events
- Can affect the intercept or the slope (see Intercept Dummy and Slope Dummy)
- AKA: Indicator Variable

High-Leverage Points

- Extreme observations of the independent (X) variables
- Identified using leverage (Lij)
- Leverage measures the distance between the jth observation of independent variable i and that variable's sample mean

Leverage (Lij)

- Measures the distance between the jth observation of independent variable i and that variable's sample mean
- Takes a value between 0 and 1; the higher the value, the greater the distance, and hence the greater the potential influence of the observation on the estimated regression parameters
- If leverage is greater than 3 x [(k + 1) / n], the observation is potentially influential (see the sketch below)
- k = # of independent variables
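
A minimal numpy sketch that computes leverage as the diagonal of the hat matrix and applies the 3 x (k + 1) / n rule of thumb (data are synthetic):

```python
import numpy as np

def leverage(X):
    """Diagonal of the hat matrix H = X(X'X)^-1 X', with an intercept column added."""
    Xd = np.column_stack([np.ones(len(X)), X])
    H = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T
    return np.diag(H)

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))
X[0] = [6.0, -5.0, 7.0]                 # one deliberately extreme observation
h = leverage(X)
k = X.shape[1]
cutoff = 3 * (k + 1) / len(X)           # rule of thumb from above
print(np.where(h > cutoff)[0])          # flags observation 0 as potentially influential
```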

Nested Model

- Models in which one regression model has a subset of the independent variables of another regression model
- Compared using the joint F-statistic (see F-Statistic)

Coefficient of Determination (R^2)

- Percentage of the variation of the dependent variable that is explained by the independent variables (explanatory power)

Limitations:
- Cannot indicate whether the coefficients are statistically significant
- Cannot indicate whether there are biases in the estimated coefficients and predictions
- Cannot tell whether the model fit is good

R^2 = SSR / SST = (SST - SSE) / SST
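
A minimal sketch of the computation (the actual and fitted values are made up):

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = SSR / SST = (SST - SSE) / SST."""
    sst = np.sum((y - y.mean()) ** 2)    # total variation
    sse = np.sum((y - y_hat) ** 2)       # unexplained variation
    return (sst - sse) / sst

y = np.array([3.0, 5.0, 7.0, 9.0])       # hypothetical actual values
y_hat = np.array([3.2, 4.8, 7.1, 8.9])   # hypothetical fitted values
print(r_squared(y, y_hat))               # 0.995
```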

Logistic Regression (Logit) Model

- Regression model that uses an exponential function of the variables to estimate a response between 0 and 1
- Slopes are interpreted as the change in the log odds of the event occurring per one-unit change in the independent variable, holding all other variables constant
- Intercept is interpreted as the log odds of the response variable occurring when all independent (predictor) variables are 0

ln(p / (1 - p)) = b + mx + e
p = Odds / (1 + Odds) = 1 / (1 + e^-(linear model))
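
A minimal sketch of the link function and the log-odds interpretation, with hypothetical coefficients:

```python
import numpy as np

b0, b1 = -2.0, 0.8                       # hypothetical intercept and slope

def prob(x):
    """p = 1 / (1 + e^-(b0 + b1*x)), the inverse of the log-odds (logit) link."""
    return 1 / (1 + np.exp(-(b0 + b1 * x)))

p = prob(1.0)
print(p, np.log(p / (1 - p)))            # ~0.231 and -1.2 = b0 + b1 * 1
# a one-unit increase in x raises the log odds by exactly b1 = 0.8:
print(np.log(prob(2.0) / (1 - prob(2.0))) - np.log(p / (1 - p)))
```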

Restricted Model

- Regression model with a subset of the complete set of independent variables - In hypothesis testing, the model obtained after imposing all of the restrictions required under the null

Unrestricted Model

- Regression model with the complete set of independent variables - In hypothesis testing, the model that has no restrictions placed on its parameters.

Akaike Information Criterion (AIC)

- Statistic used to compare sets of independent variables for explaining a dependent variable
- Preferred for finding the model best suited for prediction
- The lower the better

AIC = n ln(SSE / n) + 2(k + 1)

Schwarz Bayesian Information Criterion (BIC or SBC)

- Statistic used to compare sets of independent variables for explaining a dependent variable
- Preferred for finding the model with the best goodness of fit
- The lower the better

BIC = n ln(SSE / n) + ln(n)(k + 1)
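
A minimal sketch of both criteria side by side (the SSE values are hypothetical):

```python
import numpy as np

def aic(n, k, sse):
    """AIC = n ln(SSE / n) + 2(k + 1)."""
    return n * np.log(sse / n) + 2 * (k + 1)

def bic(n, k, sse):
    """BIC = n ln(SSE / n) + ln(n)(k + 1)."""
    return n * np.log(sse / n) + np.log(n) * (k + 1)

# two hypothetical candidate models fit to the same n = 60 observations
print(aic(60, 3, 42.0), bic(60, 3, 42.0))
print(aic(60, 5, 39.5), bic(60, 5, 39.5))  # BIC penalizes the extra variables harder
```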

Partial Regression Coefficient

- The regression coefficient in a multiple regression equation - Describes the effect of a one unit change in the independent variable on the dependent variable, holding all other independent variables constant - AKA: Partial Slope Coefficient

F-Statistic

Overall test of significance: F = MSR / MSE

Joint F-test for nested models:
F = [(SSE Restricted - SSE Unrestricted) / q] / [SSE Unrestricted / (n - k - 1)]

Equivalent forms of the overall test:
F = [(SST - SSE) / k] / [SSE / (n - k - 1)] = [RSS / k] / [SSE / (n - k - 1)]

q = # of variables omitted in the restricted model
n = # of observations (n - k - 1 is the error degrees of freedom)
k = # of independent variables
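
A minimal sketch of the joint F-test for nested models, with hypothetical sums of squares (scipy assumed):

```python
from scipy.stats import f

def joint_f(sse_r, sse_u, q, n, k):
    """F = [(SSE_restricted - SSE_unrestricted) / q] / [SSE_unrestricted / (n - k - 1)]."""
    return ((sse_r - sse_u) / q) / (sse_u / (n - k - 1))

# hypothetical nested models: the restricted model drops q = 2 of k = 5 variables
F = joint_f(sse_r=120.0, sse_u=100.0, q=2, n=66, k=5)
p_value = f.sf(F, dfn=2, dfd=66 - 5 - 1)
print(F, p_value)                        # F = 6.0 on (2, 60) degrees of freedom
```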

Multiple Linear Regression

Modeling and estimation method that uses two or more independent variables to describe the variation of the dependent variable. Also referred to as multiple regression.

Y = b + M1X1 + M2X2 + ... + MkXk + e
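
A minimal numpy sketch that estimates the coefficients by ordinary least squares on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = rng.normal(size=(n, 2))                        # two independent variables
y = 4.0 + 1.5 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=0.5, size=n)

Xd = np.column_stack([np.ones(n), X])              # prepend the intercept column
coefs, *_ = np.linalg.lstsq(Xd, y, rcond=None)     # OLS estimates [b, M1, M2]
print(coefs)                                       # close to [4.0, 1.5, -0.7]
```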

Log-Likelihood Criteria

The higher the log-likelihood (i.e., the closer to 0), the better the fit

# of Independent Variables in a Model

k

Probability

1 / [1+e^(-Regression Model)]

Assumptions of Multiple Regression

1) Linearity 2) Homoskedasticity 3) Independence of errors 4) Independence of Independent Variables 5) Normality

Qualitative Dependent Variable

A categorical variable (usually binary) which takes on a value of either 1 or 0

Cooks Distance (Di)

A composite metric for evaluating whether a specific observation is influential (i.e., it takes into account both leverage and outliers)

Di = [ei^2 / ((k + 1) x MSE)] x [hii / (1 - hii)^2]

ei: residual of the ith observation
k: # of independent variables
MSE: mean square error of the regression model
hii: leverage value for the ith observation

- Values greater than (k / n)^0.5 indicate that the ith observation is highly likely to be an influential data point
- Values > 1 indicate a high likelihood of an influential observation
- Values > 0.5 indicate further research is required
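
A minimal numpy sketch of the formula on synthetic data with one planted high-leverage outlier:

```python
import numpy as np

def cooks_distance(X, y):
    """Di = [ei^2 / ((k + 1) * MSE)] * [hii / (1 - hii)^2]."""
    n, k = X.shape
    Xd = np.column_stack([np.ones(n), X])
    H = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T       # hat matrix
    h = np.diag(H)                                 # leverage values hii
    e = y - H @ y                                  # OLS residuals
    mse = e @ e / (n - k - 1)
    return e**2 / ((k + 1) * mse) * h / (1 - h) ** 2

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 2))
X[0] = [4.0, -4.0]                                 # high-leverage row...
y = 1 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.3, size=50)
y[0] += 5.0                                        # ...that is also an outlier in Y
D = cooks_distance(X, y)
print(np.where(D > 1.0)[0])                        # observation 0 is flagged
```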

Maximum Likelihood Estimation (MLE)

A method that estimates values for the intercept and slope coefficients in a logistic regression that make the data in the regression sample most likely.

Logistic Regression

A statistical analysis that determines an individual's risk of an outcome as a function of a risk factor, where the outcome of interest has two categories. Best suited when the dependent variable is not continuous.

Analysis of Variance (ANOVA)

A table that presents the sum of squares, degrees of freedom, mean squares, and F-Statistic for a regression model

Interaction Term

A term that combines two or more independent variables and represents their joint influence on the dependent variable

Influence Plot

A visual that shows, for all observations, studentized residuals on the y-axis, leverage on the x-axis, and Cook's D as circles whose size is proportional to the degree of influence of the given observation

Slope Dummy

Allows the slope of the relationship between the dependent variable and an independent variable to differ depending on whether the condition specified by a dummy variable is met

D = 0: Y = b + mx + e
D = 1: Y = b + (m + d)x + e, where d is the coefficient on the slope dummy

Intercept Dummy

Changes the constant or intercept term, depending on whether the qualitative condition is met (see the sketch below, which estimates both dummy types)

D = 0: Y = b + mx + e
D = 1: Y = (b + d) + mx + e, where d is the coefficient on the intercept dummy
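
A minimal sketch that estimates an intercept dummy and a slope dummy in one regression on synthetic data (the true shifts are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 80
x = rng.normal(size=n)
D = (rng.random(n) > 0.5).astype(float)            # dummy: 1 if the condition is met

# true model: the intercept shifts by 1.0 and the slope by 0.5 when D = 1
y = 2.0 + 1.0 * D + (0.8 + 0.5 * D) * x + rng.normal(scale=0.2, size=n)

# regress on [1, D, x, D*x]: D is the intercept dummy, D*x the slope dummy
Xd = np.column_stack([np.ones(n), D, x, D * x])
coefs, *_ = np.linalg.lstsq(Xd, y, rcond=None)
print(coefs)                                       # close to [2.0, 1.0, 0.8, 0.5]
```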

P-Value

The smallest level of significance for which the null hypothesis can be rejected
- P-value < level of significance: null is rejected
- P-value > level of significance: null cannot be rejected

