Research Methodology 6

Adjusted R^2

- Adding more independent variables to a multiple regression always leads to a change in R^2 >= 0, even if the added variables do not explain significantly more variance
- The adjusted R^2 measure takes into account the change in degrees of freedom that results from the introduction of additional IVs
- Adjusted R^2 is always <= R^2, and the difference between the two measures increases when nonsignificant independent variables are added to the regression equation
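For reference, the adjustment uses the degrees of freedom directly: Adjusted R^2 = 1 - (1 - R^2)(N - 1)/(N - k - 1), with N observations and k IVs. A minimal R sketch (simulated data, names invented for illustration) showing both behaviors:

```r
# Adding a pure-noise IV: R^2 cannot decrease, adjusted R^2 is penalized.
set.seed(1)
n  <- 100
x1 <- rnorm(n)
y  <- 2 + 0.5 * x1 + rnorm(n)
noise <- rnorm(n)                      # unrelated to y by construction

m1 <- lm(y ~ x1)
m2 <- lm(y ~ x1 + noise)

c(summary(m1)$r.squared,     summary(m2)$r.squared)      # R^2 never lower
c(summary(m1)$adj.r.squared, summary(m2)$adj.r.squared)  # adj. R^2 can drop
```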

Multicollinearity

- It is common for IVs in multiple regression to be correlated not only with the DV but also with each other
- When correlations between two or more IVs are very high, problems of multicollinearity may arise
- Multicollinearity leads to undesirable consequences, including:
  - inflated standard errors for the coefficient estimates
  - as a result, small t-values for the regression coefficients
  - regression coefficients with the wrong sign (opposite of what would be expected)
  - regression coefficients that "flip" (change their sign) when additional variables are introduced into the equation
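A small simulated illustration of the first two consequences (data and names invented for this sketch, not from the course material):

```r
# Two nearly identical IVs: the joint model shows inflated standard
# errors and small t-values compared to the single-IV model.
set.seed(2)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)         # x2 is almost a copy of x1
y  <- 1 + x1 + rnorm(n)

summary(lm(y ~ x1))$coefficients       # precise estimate, large t-value
summary(lm(y ~ x1 + x2))$coefficients  # inflated SEs, small t-values
```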

Multiple Coefficient of Determination (R^2)

- Describes the proportion of variance in the DV that is "explained" by the set of IVs in the multiple regression equation
- Ranges from 0 (no variance explained) to 1 (all variance explained)
- Computed as: R^2 = SSR/SSY = 1 - SSE/SSY
- In Excel, the values for SSR, SSE, and SSY appear in the ANOVA table of the regression output
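A quick check in R, using the built-in mtcars data as a stand-in example (not the course data):

```r
# Compute SSE, SSR, and SSY by hand and verify both forms of R^2.
m   <- lm(mpg ~ wt + hp, data = mtcars)
SSE <- sum(residuals(m)^2)                     # error sum of squares
SSY <- sum((mtcars$mpg - mean(mtcars$mpg))^2)  # total sum of squares
SSR <- SSY - SSE                               # regression sum of squares

SSR / SSY             # equals 1 - SSE/SSY and ...
summary(m)$r.squared  # ... matches the reported R^2
```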

Testing Overall Significance of the Model

- In multiple regression we can test the overall significance of the model with an F test:
  H0: β1 = β2 = β3 = ... = βk = 0
  Ha: at least one of the regression coefficients is ≠ 0
- Rejection of the null hypothesis implies that at least one of the IVs in the model explains a significant amount of variance in the DV
- To test the hypothesis, we compute the F value as F = (SSR/k) / (SSE/(N - k - 1)), i.e., the mean square due to regression divided by the mean square error
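A sketch verifying the F value by hand in R (again with mtcars as a stand-in):

```r
# F = (SSR/k) / (SSE/(N - k - 1)); compare with summary()'s F statistic.
m   <- lm(mpg ~ wt + hp, data = mtcars)
N   <- nrow(mtcars); k <- 2
SSE <- sum(residuals(m)^2)
SSY <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
SSR <- SSY - SSE

F_val <- (SSR / k) / (SSE / (N - k - 1))
F_val                                        # matches ...
summary(m)$fstatistic                        # ... the reported F value
pf(F_val, k, N - k - 1, lower.tail = FALSE)  # p-value for the overall test
```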

Standardized Regression Coefficients

- Occasionally, variables are measured on different scales (e.g., square meters, years, CHF)
- The magnitudes of unstandardized coefficients are hard to compare when the underlying variables are measured on different scales
- Standardized regression coefficients (beta coefficients) are a solution
- They result from a multiple regression equation in which all variables are standardized (transformed to standard scores)
- In Excel and R, we must standardize the variables ourselves before entering them into the regression
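A sketch of beta coefficients in R via scale() (mtcars as a stand-in data set):

```r
# Standardize all variables first, then fit: the slopes are the
# beta coefficients and the intercept is numerically zero.
z <- as.data.frame(scale(mtcars[, c("mpg", "wt", "hp")]))
m_std <- lm(mpg ~ wt + hp, data = z)
coef(m_std)  # standardized (beta) coefficients
```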

Standard Error of the Estimate

- As in simple regression, it provides a measure of overall regression error
- The residual in a multiple regression model is the difference between the actual Y value and the predicted Y value (Y - Ŷ)
- The sum of squared errors (SSE) is computed as in simple regression: SSE = Σ(Y - Ŷ)^2
- The standard error of the estimate is computed as s = sqrt(SSE / (N - k - 1)), where N = number of observations and k = number of IVs
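The same computation done by hand in R (mtcars as a stand-in):

```r
# Standard error of the estimate: s = sqrt(SSE / (N - k - 1)).
m   <- lm(mpg ~ wt + hp, data = mtcars)
SSE <- sum(residuals(m)^2)
N   <- nrow(mtcars); k <- 2

sqrt(SSE / (N - k - 1))  # matches ...
summary(m)$sigma         # ... the residual standard error reported by R
```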

Coefficients Table

- Shows the significance of the individual coefficients (estimates, standard errors, t-values, and p-values)

Output From Other Software Packages

- Some advanced software packages report unstandardized and standardized coefficients side by side

Standardizing Variables in Excel

- Unfortunately, the Data Analysis ToolPak has no option for standardizing variables
- You have to accomplish this by hand, applying the following steps:
  - Compute the mean and standard deviation for each variable that you want to standardize
  - Use the STANDARDIZE function with the following arguments: x = the value to be standardized, mean = the mean of the variable, standard_dev = the standard deviation of the variable (e.g., =STANDARDIZE(A2, AVERAGE(A:A), STDEV.S(A:A)))
- A frequently used convention is to name the standardized variables "z.VARIABLENAME", e.g., "z.AGE" or "z_Age"

Multicollinearity Diagnostics in Excel

- By hand: run k regressions (one for each of the k IVs)
- In each of the k models, one of the IVs is cast as the DV and is regressed on all other IVs
- The R^2 from each of these models allows for the computation of tolerance and VIF for the focal IV

Testing the Significance of the Individual Regression Coefficients

- The regression coefficients obtained in multiple regression are sample estimates
- Since we are interested in the population coefficients, we need to conduct a hypothesis test for each coefficient (H0: βi = 0), and we test these hypotheses using t-tests
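A sketch showing where these t-tests appear in R output (mtcars as a stand-in):

```r
# Each t-value is the estimate divided by its standard error;
# R reports the corresponding p-values in the coefficients table.
m  <- lm(mpg ~ wt + hp, data = mtcars)
tb <- summary(m)$coefficients          # Estimate, Std. Error, t value, Pr(>|t|)
tb
tb[, "Estimate"] / tb[, "Std. Error"]  # reproduces the "t value" column
```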

Standardized vs. Unstandardized Regression Coefficients: Interpretation of Significance

- The t-values and p-values for the unstandardized and standardized coefficients are exactly the same, since standardization is only a linear transformation of the variables and does not change the linear relationship between them
- Therefore, if an unstandardized coefficient is significant, so is the standardized coefficient, and vice versa

Detecting Multicollinearity

Two approaches:
1. Inspecting a matrix of bivariate correlations between all pairs of IVs and identifying correlations beyond a predefined cutoff value (e.g., 0.75 or 0.8)
   - not optimal, given that one IV may be a linear combination of several other IVs without necessarily being very highly correlated with any single one of them
2. Regressing every independent variable on all other IVs and examining the R^2 in each of the resulting multiple regression equations
   - an unusually high R^2 value in one of these equations indicates that the combined set of all other independent variables explains a large proportion of the variance in the focal IV
   - this approach is superior to the first option

Measures for Diagnosing Multicollinearity

Two commonly used measures:
1. Tolerance: Tolerance = 1 - R^2
2. Variance inflation factor: VIF = 1/Tolerance = 1/(1 - R^2)
- The R^2 in these equations is the value from a regression in which the focal IV is regressed on all the other IVs
- A frequently used rule of thumb: any VIF > 10 raises serious concerns about multicollinearity
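A sketch computing both measures by hand for one focal IV, assuming a hypothetical three-IV model built on mtcars:

```r
# Tolerance and VIF for the focal IV "wt" in a model whose other
# IVs are hp and disp: regress wt on the other IVs, transform R^2.
r2_wt     <- summary(lm(wt ~ hp + disp, data = mtcars))$r.squared
tolerance <- 1 - r2_wt
vif_wt    <- 1 / tolerance
c(tolerance = tolerance, VIF = vif_wt)  # VIF > 10 would signal trouble
```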

Standardized vs. Unstandardized Regression Coefficients: Interpretation of Magnitude

Unstandardized: a one-unit change in the IV is associated with an X-unit change in the DV, holding the other IVs constant.
Standardized: a one-standard-deviation change in the IV is associated with an X-standard-deviation change in the DV, holding the other IVs constant.
Here X represents the value of the unstandardized or standardized coefficient, respectively.

Estimating Regression Coefficients in Multiple Regression

As the population parameters β0, β1, β2, ..., βk are generally not known, they must be estimated with the sample statistics b0, b1, b2, ..., bk. Using the sample data, the values for b0, b1, b2, ..., bk are estimated by a process called ordinary least squares (OLS).
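In R this is a single call; lm() performs the OLS estimation (mtcars as a stand-in):

```r
# Estimate mpg = b0 + b1*wt + b2*hp + e by ordinary least squares.
m <- lm(mpg ~ wt + hp, data = mtcars)
coef(m)  # the OLS estimates b0, b1, b2
```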

Standardizing Variables in R

- Data -> Manage variables in active data set -> Standardize variables
- Select the variables you want to standardize -> OK
- The standardized variables will appear as additional columns at the end of your data set
- They will automatically be named Z.VARIABLENAME
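A script equivalent of the menu steps, using base R's scale() (column names here follow the Z.VARIABLENAME convention; mtcars is a stand-in):

```r
# scale() centers each variable and divides by its standard deviation,
# i.e., it transforms the values to standard scores.
mtcars$Z.wt <- as.numeric(scale(mtcars$wt))
mtcars$Z.hp <- as.numeric(scale(mtcars$hp))
head(mtcars[, c("wt", "Z.wt", "hp", "Z.hp")])
```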

The Multiple Regression Model

We extend the simple regression equation to the case of multiple regression by adding more independent variables: Y = β0 + β1X1 + β2X2 + ... + βkXk + ε

Simple vs. Multiple Regression: Differences in Coefficients

Simple: the coefficient b1 is a "full" regression coefficient (it represents the change in the dependent variable that is associated with a one-unit change in the IV).
Multiple: the coefficients b1, b2, ..., bk are "partial" regression coefficients (each coefficient represents the change in the DV that is associated with a one-unit change in the focal IV when all other independent variables are held constant). A partial regression coefficient will generally differ from the full coefficient that would be obtained from a simple regression on the focal IV.
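A quick illustration of the difference in R (mtcars as a stand-in):

```r
# The "full" coefficient for wt (simple regression) differs from the
# "partial" coefficient for wt once hp is held constant.
coef(lm(mpg ~ wt,      data = mtcars))["wt"]  # full coefficient
coef(lm(mpg ~ wt + hp, data = mtcars))["wt"]  # partial coefficient
```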

Simple vs. Multiple Regression: Differences in Shape of Regression Model

Simple: the regression model yields a regression line.
Multiple: the regression model yields a response surface whose dimension depends on the number of independent variables in the model. For the simplest case of two IVs, the response surface is a response plane that can be depicted in three-dimensional space (Y, X1, X2).

F Value & Significance in R & Excel

- The F value and its significance appear in the multiple regression output of both programs: in Excel in the ANOVA table ("F" and "Significance F"), and in R in the last line of the summary() output (F-statistic with its p-value)

Multicollinearity Diagnostics in R

- Preliminary step in RStudio: Tools -> Install Packages -> locate the VIF package and install it
- In R Commander, run your regression model (Statistics -> Fit models -> Linear regression)
- In the R Script window, type vif(NameOfRegressionModel), highlight the command line, and hit Submit
- NameOfRegressionModel is the name that R has assigned, or that you have assigned, to your regression analysis (e.g., RegModel.1)
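A script alternative to the menu route; this sketch assumes vif() from the car package, which is one common source of the function (the course setup may use a package of the same name instead):

```r
# install.packages("car")  # if not already installed
library(car)
RegModel.1 <- lm(mpg ~ wt + hp + disp, data = mtcars)  # example model
vif(RegModel.1)  # one VIF per IV; values > 10 flag multicollinearity
```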

