Exam 2 Inferential


The ________ correlation refers to a correlation that is false, i.e., a relationship that does not actually exist.

The spurious correlation refers to a correlation that is false, i.e., a relationship that does not actually exist.

Benefits of first entry

Benefits of first entry The first entry is more likely to be significant. The predictor that is entered first is given credit for its entire relationship with the DV when determining significance. Each predictor entered after that in a hierarchical regression must account for variability in the criterion (DV) above and beyond those predictors already entered in order to be significant.

Hierarchical Regression •Definition and purpose -Above and beyond... Hierarchical regression _____ terms to the regression model in ______. At each stage, an _____ term or terms are added to the model and the _____ in R2 is calculated. A hypothesis test is done to test whether the change in R2 is significantly different from ______.

Hierarchical Regression •Definition and purpose -Above and beyond... Hierarchical regression adds terms to the regression model in stages. At each stage, an additional term or terms are added to the model and the change in R2 is calculated. A hypothesis test is done to test whether the change in R2 is significantly different from zero.

In Cross Validation reports... The variable with the largest beta weight is the strongest predictor (contributing the most); it is not necessarily more _________

In Cross Validation reports... The variable with the largest beta weight is the strongest predictor (contributing the most); it is not necessarily more significant

Multiple Regression • How do ____________ variables in combination predict a ____________ variable? • Human behavior and other phenomena can be very complex - e.g., job performance

Multiple Regression • How do several variables in combination predict a dependent variable? • Human behavior and other phenomena can be very complex - e.g., job performance • Examples will be with 2 IVs for now

Multiple linear regression analysis makes several key assumptions: ____________ relationship ____________ normality No or little ____________ No ____________ -correlation ____________

Multiple linear regression analysis makes several key assumptions: Linear relationship Multivariate normality No or little multicollinearity No auto-correlation Homoscedasticity

The change in R2 is a way to evaluate how much _______ power was added to the model by the addition of another variable in step 2. In this case, the % of variability accounted for went up from 12.1% to 12.5% - not much of an increase.

The change in R2 is a way to evaluate how much predictive power was added to the model by the addition of another variable in step 2. In this case, the % of variability accounted for went up from 12.1% to 12.5% - not much of an increase.

Three methods to assess stability A. B. C. Or...

Three methods to assess stability A. Confidence intervals for predicted scores B. Adjusted R2 C. Cross-validation Or... shrinkage
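To illustrate method B, here is a small Python sketch (not from the original notes; the R2, n, and k values are made up) of the adjusted R2 formula, which shrinks the sample R2 based on sample size and number of predictors:

```python
# Hypothetical illustration of adjusted R^2 (method B above).
# Formula: adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
def adjusted_r2(r2, n, k):
    """r2: sample R^2, n: number of cases, k: number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(r2=0.39, n=62, k=2))  # about 0.37; a small drop suggests a stable model
```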

When you have more than one predictor variable, you cannot compare the contribution of each predictor variable by simply comparing the correlation coefficients. The ______________ regression coefficient is computed to allow you to make such comparisons and to assess the strength of the relationship between each predictor variable and the criterion variable.

When you have more than one predictor variable, you cannot compare the contribution of each predictor variable by simply comparing the correlation coefficients. The beta (B) regression coefficient is computed to allow you to make such comparisons and to assess the strength of the relationship between each predictor variable and the criterion variable.

• Squared multiple correlation The multiple correlation (R) is equal to the correlation between the ______ scores and the ______ scores. In this example, it is the correlation between UGPA' and UGPA, which turns out to be 0.79. That is, R = 0.79. Note that R will never be ______ since if there are ______ correlations between the predictor variables and the criterion, the regression weights will be ______ so that the correlation between the predicted and actual scores will be ______ . In statistics, the coefficient of determination, denoted R2 or r2 and pronounced R squared, is a number that indicates how well data ______ a statistical model - sometimes simply a line or a curve. An R2 of 1 indicates that the regression line ______ fits the data, while an R2 of 0 indicates that the line ______ ______ fit the data. This latter can be because the data is utterly ______ -linear, or because it is ______ . -- There are several definitions of R2 that are only sometimes equivalent. One class of such cases includes that of simple linear regression where r2 is used instead of R2. --- In this case, if an intercept is included, then r2 is simply the ______ of the sample correlation coefficient (i.e., r) between the outcomes and their ______ values. If additional explanators are included, R2 is the square of the coefficient of multiple correlation. In both such cases, the coefficient of determination ranges from 0 to 1.

• Squared multiple correlation The multiple correlation (R) is equal to the correlation between the predicted scores and the actual scores. In this example, it is the correlation between UGPA' and UGPA, which turns out to be 0.79. That is, R = 0.79. Note that R will never be negative since if there are negative correlations between the predictor variables and the criterion, the regression weights will be negative so that the correlation between the predicted and actual scores will be positive. In statistics, the coefficient of determination, denoted R2 or r2 and pronounced R squared, is a number that indicates how well data fit a statistical model - sometimes simply a line or a curve. An R2 of 1 indicates that the regression line perfectly fits the data, while an R2 of 0 indicates that the line does not fit the data at all. The latter can occur because the data are utterly non-linear or because they are random. -- There are several definitions of R2 that are only sometimes equivalent. One class of such cases includes that of simple linear regression where r2 is used instead of R2. --- In this case, if an intercept is included, then r2 is simply the square of the sample correlation coefficient (i.e., r) between the outcomes and their predicted values. If additional explanators are included, R2 is the square of the coefficient of multiple correlation. In both such cases, the coefficient of determination ranges from 0 to 1.

A similar technique of cross-validation uses split samples. Once the sample has been selected from the population, it is randomly divided into 2 subgroups. One subgroup becomes the "_______" group and the other is used as the "_______" group. Again, values for ___ are compared and model stability is assessed by calculating "_________."

A similar technique of cross-validation uses split samples. Once the sample has been selected from the population, it is randomly divided into 2 subgroups. One subgroup becomes the "exploratory" group and the other is used as the "validatory" group. Again, values for R2 are compared and model stability is assessed by calculating "shrinkage."

Singular Matrix

A square matrix that does not have an inverse. A matrix is singular if and only if its determinant is zero, which occurs when its rows (or columns) are linearly dependent.
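A quick NumPy illustration (the matrix is a made-up example): a matrix whose rows are linearly dependent has determinant zero and cannot be inverted.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # second row = 2 * first row -> linearly dependent

print(np.linalg.det(A))      # 0.0 (up to floating-point error): the matrix is singular
# np.linalg.inv(A) would raise numpy.linalg.LinAlgError: Singular matrix
```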

ASSESSING STABILITY OF THE MODEL FOR PREDICTION use of 2 independent samples: Using 2 independent samples involves random selection of 2 groups from the same population. One group becomes the "exploratory" group used for establishing the model of prediction. The second group, the "______" or "______" group is used to assess the model for stability. compares __ values from the 2 groups and assessment of "shrinkage," the difference between the two values for ___ , is used as an indicator of model stability. There is no rule of thumb for interpreting the differences, but it is suggested that "shrinkage" values of less than 0.10 indicate a stable model. While preferable, the use of independent samples is rarely used due to cost considerations.

ASSESSING STABILITY OF THE MODEL FOR PREDICTION Several methods can be employed for cross-validation, including the use of 2 independent samples: Using 2 independent samples involves random selection of 2 groups from the same population. One group becomes the "exploratory" group used for establishing the model of prediction. The second group, the "confirmatory" or "validatory" group, is used to assess the model for stability. The researcher compares R2 values from the 2 groups, and "shrinkage," the difference between the two values for R2, is used as an indicator of model stability. There is no rule of thumb for interpreting the differences, but it is suggested that "shrinkage" values of less than 0.10 indicate a stable model. While preferable, independent samples are rarely used due to cost considerations.

Additional Stability Information Consider statistical significance of regression coefficient across samples Do the same predictors remain statistically significant across _______? Consider rank order of beta values within each sample across samples Does the predictor that contributes the most remain consistent across _______? "Most stable," "best," and "strongest" do not always have the same interpretation when it comes to predictors Remember it is not unusual for b, beta, and t values to be _______ when it comes to rank order of magnitude

Additional Stability Information Consider the statistical significance of regression coefficients across samples: do the same predictors remain statistically significant across samples? Consider the rank order of beta values within each sample across samples: does the predictor that contributes the most remain consistent across samples? "Most stable," "best," and "strongest" do not always have the same interpretation when it comes to predictors. Remember it is not unusual for b, beta, and t values to be inconsistent when it comes to rank order of magnitude

All b coefficients are _______________, which means that the magnitude of their values is ........

All b coefficients are unstandardized, which means that the magnitude of their values is relative to the means and standard deviations of the independent and dependent variables in the equation.

Another method of determining the best model for prediction is to test the significance of adding one or more variables to the model using the partial F-test. This process allows for ______ of predictors that do not contribute significantly to the prediction, allowing determination of the most efficient model of prediction. In general, the partial F-test is similar to the F-test used in analysis of variance. It assesses the statistical significance of the ______ between values for R2 derived from 2 or more prediction models using a subset of the variables from the ______ equation.

Another method of determining the best model for prediction is to test the significance of adding one or more variables to the model using the partial F-test. This process allows for exclusion of predictors that do not contribute significantly to the prediction, allowing determination of the most efficient model of prediction. In general, the partial F-test is similar to the F-test used in analysis of variance. It assesses the statistical significance of the difference between values for R2 derived from 2 or more prediction models using a subset of the variables from the original equation.

Beta (______________ regression coefficients) ---

Beta (standardized regression coefficients) --- The beta value is a measure of how strongly each predictor variable influences the criterion (dependent) variable. The beta is measured in units of standard deviation. -Compare different variables within the same sample -In standardized, unit-free scores, the beta value of an IV gives the change in Y (in standard-deviation units) associated with a 1-unit (1 SD) change in X -Analogous to z-scores / correlation

Beta weights One answer to the issue of predictors explaining some of the same _______ of the criterion is standardized regression (β) weights. The utility of β weights lie squarely with their function in the standardized regression equation, which speaks to how much _______ each predictor variable is receiving in the equation for predicting the dependent variable, while holding all other independent variables constant. As such, a β weight coefficient informs us as to how much _______ (in standardized metric) in the criterion variable we might expect with a one-unit _______ (in standardized metric) in the predictor variable, again holding all other predictor variables constant. This interpretation of a β weight suggests that its computation must simultaneously take into account the predictor variable's _______ with the criterion as well as the predictor variable's _______ with all other predictors.

Beta weights One answer to the issue of predictors explaining some of the same variance of the criterion is standardized regression (β) weights. The utility of β weights lies squarely with their function in the standardized regression equation, which speaks to how much credit each predictor variable is receiving in the equation for predicting the dependent variable, while holding all other independent variables constant. As such, a β weight coefficient informs us as to how much change (in standardized metric) in the criterion variable we might expect with a one-unit change (in standardized metric) in the predictor variable, again holding all other predictor variables constant. This interpretation of a β weight suggests that its computation must simultaneously take into account the predictor variable's relationship with the criterion as well as the predictor variable's relationships with all other predictors.

Interpretation of beta weights (β = 0.47, t(1253)= 16.27, p = .000).

Beta = standardized regression coefficient -compare different variables within the same sample -In standardized, unit-free scores, what is the change in Y associated with a 1 unit change in X -Analogous to z-scores / correlation (β = 0.47, t(1253) = 16.27, p = .000). Since the statistically significant beta was positive, increases and decreases in the two variables mirror one another; for every standard-deviation-unit increase in Verbal Test scores, College GPA is predicted to increase by .47 standard deviation units.

CHOICE OF THE NUMBER OF VARIABLES in multiple regression You can include as many predictor variables as you can think of, and usually at least a few of them will come out significant. This is because you are ___________ on ___________ when simply including as many variables as you can think of as predictors of some other variable of interest. The problem is compounded when the number of observations is relatively low. For example, you can hardly draw conclusions from an analysis of 100 questionnaire items based on 10 respondents. Most authors recommend that you should have at least 10 to 20 times as many observations (cases, respondents) as you have variables; otherwise the estimates of the regression line are probably very ___________ and unlikely to replicate if you were to conduct the study again.

CHOICE OF THE NUMBER OF VARIABLES You can include as many predictor variables as you can think of, and usually at least a few of them will come out significant. This is because you are capitalizing on chance when simply including as many variables as you can think of as predictors of some other variable of interest. The problem is compounded when the number of observations is relatively low. Intuitively, it is clear that you can hardly draw conclusions from an analysis of 100 questionnaire items based on 10 respondents. Most authors recommend that you should have at least 10 to 20 times as many observations (cases, respondents) as you have variables; otherwise the estimates of the regression line are probably very unstable and unlikely to replicate if you were to conduct the study again.

Capital R2 (as opposed to r2) should generally be the multiple R2 in a multiple regression model. In ______ regression, there is - no multiple R, -and R2=r2. So one difference is applicability: -"multiple R" implies multiple regressors, -And "R2" doesn't necessarily. Another simple difference is interpretation. -- In multiple regression, the multiple R is the ______ of multiple correlation, whereas its ______ is the coefficient of determination. --- R can be interpreted somewhat like a ______ correlation coefficient, the main difference being that the multiple correlation is between the dependent variable and a linear combination of the predictors, not just any one of them, and not just the average of those ______ correlations. -- R2 can be interpreted as the percentage of variance in the dependent variable that can be explained by the predictors; as above, this is also true if there is only one predictor.

Capital R2 (as opposed to r2) should generally be the multiple R2 in a multiple regression model. In simple regression, there is - no multiple R, -and R2=r2. So one difference is applicability: -"multiple R" implies multiple regressors, -And "R2" doesn't necessarily. Another simple difference is interpretation. -- In multiple regression, the multiple R is the coefficient of multiple correlation, whereas its square is the coefficient of determination (R2). --- R can be interpreted somewhat like a simple correlation coefficient (Pearson's r), the main difference being that the multiple correlation is between the dependent variable and a linear combination of the predictors, not just any one of them, and not just the average of those simple correlations. -- R2 can be interpreted as the percentage of variance in the dependent variable that can be explained by the predictors; as above, this is also true if there is only one predictor.

Change in squared multiple correlation (How different than regular multiple correlation) Difference is that in hierarchical, we... In multiple regression, R² measure... In Hierarchical, a statistical test of the change in R² ...

Change in squared multiple correlation (How different than regular multiple correlation) The difference is that in hierarchical regression, we test the R2 significance after each entry of predictors (steps) and look at the changes in R2 between those entries. In multiple regression, R² measures the strength of the relationship between a single set of independent variables and the dependent variable. In hierarchical regression, a statistical test of the change in R² from the first stage is used to evaluate the importance of the variables entered in the second stage. R² always increases or stays the same when variables are added.
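A minimal Python/NumPy sketch of this idea, using simulated data and hypothetical variable names (not from the course materials): fit the step-1 model, fit the step-2 model with one added predictor, and test whether the change in R2 differs significantly from zero with a partial F-test.

```python
import numpy as np

def r_squared(X, y):
    """Fit OLS with an intercept and return R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ b
    return 1 - resid.var() / y.var()

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)                # step-1 predictor (e.g., a control variable)
x2 = 0.5 * x1 + rng.normal(size=n)     # step-2 predictor, correlated with x1
y = 0.4 * x1 + 0.3 * x2 + rng.normal(size=n)

r2_step1 = r_squared(x1.reshape(-1, 1), y)
r2_step2 = r_squared(np.column_stack([x1, x2]), y)
delta_r2 = r2_step2 - r2_step1

# F-test for the change in R^2 (1 predictor added, 2 predictors in the full model)
k_full, k_added = 2, 1
F = (delta_r2 / k_added) / ((1 - r2_step2) / (n - k_full - 1))
print(delta_r2, F)   # compare F to the F(1, n - k_full - 1) critical value
```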

Cross Validation Suppose we compute regression estimates on a sample of people. We then collect data on a second sample, but we do NOT compute ____ values of a and b for this sample. Instead, we compute predicted Y values using the coefficients from the _____ sample. We then find the correlation between Y and Y' and _____ it. This gives us a _____ -_____ R2. Because we have not estimated b weights, there will be no capitalization on _____ , and our estimate of R2 should be good on average. It tells us what the correlation to expect when we actually use the equation.

Cross Validation Suppose we compute regression estimates on a sample of people. We then collect data on a second sample, but we do NOT compute new values of a and b for this sample. Instead, we compute predicted Y values using the coefficients from the previous sample. We then find the correlation between Y and Y' and square it. This gives us a cross-validated R2. Because we have not estimated b weights, there will be no capitalization on chance, and our estimate of R2 should be good on average. It tells us what correlation to expect when we actually use the equation.

A model, or equation, is said to be "stable" if it can be applied to different samples from the same population without losing the accuracy of the prediction. - This is accomplished through...

Cross-validation of the model. Cross-validation determines how well the prediction model developed using one sample performs in another sample from the same population.

Cross-validation. To perform cross-validation, a researcher will either gather ___ large samples, or ___ very large sample which will be ____ into ____ samples via random selection procedures. The ____ equation is created in the first sample. That equation is then used to create predicted scores for the members of the second sample. The ____ scores are then correlated with the ____ scores on the dependent variable (ryy'). This is called the ____ -____ coefficient. The difference between the original R-squared and ryy'2 is the ____ . The smaller the ____ , the more confidence we can have in the generalizability of the equation.

Cross-validation. To perform cross-validation, a researcher will either gather two large samples, or one very large sample which will be split into two samples via random selection procedures. The prediction equation is created in the first sample. That equation is then used to create predicted scores for the members of the second sample. The predicted scores are then correlated with the observed scores on the dependent variable (ryy'). This is called the cross-validity coefficient. The difference between the original R-squared and ryy'2 is the shrinkage. The smaller the shrinkage, the more confidence we can have in the generalizability of the equation.
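A minimal NumPy sketch of this split-sample procedure (the data are simulated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 2))
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n)

# Randomly split one large sample into two halves
idx = rng.permutation(n)
first, second = idx[:n // 2], idx[n // 2:]

def fit(X, y):
    """OLS coefficients (intercept first) via least squares."""
    X1 = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return b

b = fit(X[first], y[first])                        # prediction equation from sample 1
pred_first = np.column_stack([np.ones(len(first)), X[first]]) @ b
r2_original = np.corrcoef(pred_first, y[first])[0, 1] ** 2

pred_second = np.column_stack([np.ones(len(second)), X[second]]) @ b  # apply to sample 2
r_yy = np.corrcoef(pred_second, y[second])[0, 1]   # cross-validity coefficient
shrinkage = r2_original - r_yy ** 2
print(r2_original, r_yy ** 2, shrinkage)           # shrinkage < .10 suggests a stable model
```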

Diagonal matrix and Identity matrix

Diagonal Matrix: A square matrix which has zeros everywhere other than the main diagonal. Identity matrix: A square matrix which has a 1 for each element on the main diagonal and 0 for all other elements.
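For example, in NumPy (a small illustration, not part of the original card):

```python
import numpy as np

D = np.diag([2.0, 5.0, 7.0])   # diagonal matrix: zeros everywhere off the main diagonal
I = np.eye(3)                  # identity matrix: 1s on the diagonal, 0s elsewhere

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
print(np.allclose(A @ I, A))   # True: multiplying by the identity leaves A unchanged
```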

Differences in solving hierarchical regression problems: R² change, i.e., the increase when the predictor variables are added to the analysis, is interpreted rather than the ______ R² for the model with all variables entered. In the interpretation of ______ relationships, the relationship between the predictors and the dependent variable is presented. Similarly, in the validation analysis, we are only concerned with verifying the ______ of the predictor variables. Differences in ______ variables are ignored.

Differences in solving hierarchical regression problems: R² change, i.e., the increase when the predictor variables are added to the analysis, is interpreted rather than the overall R² for the model with all variables entered. In the interpretation of individual relationships, the relationship between the predictors and the dependent variable is presented. Similarly, in the validation analysis, we are only concerned with verifying the significance of the predictor variables. Differences in control variables are ignored.

hierarchical regression The null hypothesis for the addition of each block of variables to the analysis is that the change in R² (contribution to the explanation of the variance in the dependent variable) is ____ . If the null hypothesis is rejected, then our interpretation indicates that the variables in block 2 had a relationship to the dependent variable, after controlling for the relationship of the block 1 variables to the dependent variable.

Differences in statistical results with hierarchical regression The null hypothesis for the addition of each block of variables to the analysis is that the change in R² (contribution to the explanation of the variance in the dependent variable) is zero. If the null hypothesis is rejected, then our interpretation indicates that the variables in block 2 had a relationship to the dependent variable, after controlling for the relationship of the block 1 variables to the dependent variable.

Hierarchical Regression • Specifying _____ of entry of IVs • _____ based on theory and/or logic • Answers the following question: Does IV #2 add to the prediction of the DV above and beyond IV #1? Or Does IV #3 add to the prediction of the DV above and beyond IVs #1 and #2 combined? Example • Criterion Y: Sales by reps • Predictor 1: tenure/job experience • Predictor 2: interview • Predictor 3: selection test • Does the selection test predict sales performance above and beyond job experience and the job interview? Example • Criterion Y: Number of owl offspring reaching adulthood per geographic area • Predictor 1: % land cultivated • Predictor 2: years in conservation program • Does the length of time in the program influence offspring above and beyond the % of land in cultivated?

Hierarchical Regression • Specifying order of entry of IVs • Order based on theory and/or logic • Answers the following question: Does IV #2 add to the prediction of the DV above and beyond IV #1? Or Does IV #3 add to the prediction of the DV above and beyond IVs #1 and #2 combined? Example • Criterion Y: Sales by reps • Predictor 1: tenure/job experience • Predictor 2: interview • Predictor 3: selection test • Does the selection test predict sales performance above and beyond job experience and the job interview? Example • Criterion Y: Number of owl offspring reaching adulthood per geographic area • Predictor 1: % land cultivated • Predictor 2: years in conservation program • Does the length of time in the program influence offspring above and beyond the % of land cultivated?

Hierarchical regression is the practice of building _____ linear regression models, each _____ _____ predictors. For example, one common practice is to start by adding only demographic control variables to the model in one step. In the next model, you can add predictors of interest, to see if they predict the DV above and beyond the effect of the controls. You're actually building _____ but related models in each step. But SPSS has a nice function where it will compare the models, and actually test if successive models fit better than previous ones. So hierarchical regression is really a series of regular old OLS (ordinary least squares) regression models-nothing fancy, really.

Hierarchical regression is the practice of building successive linear regression models, each adding more predictors. For example, one common practice is to start by adding only demographic control variables to the model in one step. In the next model, you can add predictors of interest, to see if they predict the DV above and beyond the effect of the controls. You're actually building separate but related models in each step. But SPSS has a nice function where it will compare the models, and actually test if successive models fit better than previous ones. So hierarchical regression is really a series of regular old OLS (ordinary least squares) regression models-nothing fancy, really.

How Are Prediction Equations Evaluated?

How Are Prediction Equations Evaluated? A regression equation is produced for the sample. Because this process capitalizes on chance and error in the sample, the equation produced in one sample will not generally fare as well in another sample (i.e., R-squared in a subsequent sample using the same equation will not be as large as R-squared from the original sample), a phenomenon called shrinkage. The most desirable outcome in this process is minimal shrinkage, indicating that the prediction equation will generalize well to new samples or individuals from the population examined. While there are equations that can estimate shrinkage, the best way to estimate shrinkage and test the prediction equation is through cross-validation or double cross-validation.

Hierarchical regression How and why regression weights change when predictors are added Regression Weights • What will happen to the regression weight of X1 when we add X2? • If X1 and X2 are correlated, it will ____ and may change in ____ • Thus, we must have a theoretical/research question reason to enter variables in a certain order • The ____ variable entered must account for variability in the DV above and beyond the all other IVs already entered to be significant • More likely to be significant if entered ____

How and why regression weights change when predictors are added Regression Weights • What will happen to the regression weight of X1 when we add X2? • If X1 and X2 are correlated, it will decrease and may change in significance • Thus, we must have a theoretical/research question reason to enter variables in a certain order • The last variable entered must account for variability in the DV above and beyond all other IVs already entered to be significant • More likely to be significant if entered first

In Cross Validation reports... Report significance of ________ correlations and ___________ •Report relevant correlations -those that influence interpretations -SEE significant r, insignificant weight in validation dataset Don't compare size of betas across samples OR equations-e.g., hgpa beta with nach beta -change as predictors change •Comparing size of b's across data sets really isn't telling us much -random error / sampling error -but not wrong -Typically, just talk shrinkage (which is b's) and betas •Remember that df for regression weights is residual df for that regression model -still too many df errors

In Cross Validation reports... Report significance of multiple correlations (R) and weights •Report relevant correlations -those that influence interpretations -SEE significant r, insignificant weight in validation dataset Don't compare size of betas across samples OR equations-e.g., hgpa beta with nach beta -change as predictors change •Comparing size of b's across data sets really isn't telling us much -random error / sampling error -but not wrong -Typically, just talk shrinkage (which is b's) and betas •Remember that df for regression weights is residual df for that regression model -still too many df errors

In hierarchical multiple regression: Adding a Predictor • Almost always an increase in __ • Is that increase meaningful? • Does it increase our _______ of the DV significantly? • Is adding the variable "worth it?" • Test for the increment/increase in R2 or change in R2

In hierarchical multiple regression: Adding a Predictor • Almost always an increase in R2 • Is that increase meaningful? • Does it increase our prediction of the DV significantly? • Is adding the variable "worth it?" • Test for the increment/increase in R2 or change in R2

In hierarchical regression, the independent variables are entered into the analysis in a _____ of blocks, or _____ that may contain one or more variables.

In hierarchical regression, the independent variables are entered into the analysis in a sequence of blocks, or groups that may contain one or more variables.

In multiple regression, to interpret the direction of the relationship between variables, look at the signs (plus or minus) of the _ (____) coefficients. If a _ coefficient is positive, then the relationship of this variable with the dependent variable is positive (e.g., the greater the IQ the better the grade point average); if the _ coefficient is negative then the relationship is negative (e.g., the lower the class size the better the average test scores). Of course, if the B coefficient is equal to 0 then there is no relationship between the variables.

In multiple regression, to interpret the direction of the relationship between variables, look at the signs (plus or minus) of the B (Beta) coefficients. If a B coefficient is positive, then the relationship of this variable with the dependent variable is positive (e.g., the greater the IQ the better the grade point average); if the B coefficient is negative then the relationship is negative (e.g., the lower the class size the better the average test scores). Of course, if the B coefficient is equal to 0 then there is no relationship between the variables.

In partial correlation... The control variables are ..... The order of correlation refers.....

In partial correlation... The control variables are the variables whose shared variance is removed (partialled out) from the initially correlated variables. The order of the correlation refers to the number of control variables; for example, a first-order partial correlation has a single control variable.

Inferences About the Parameters The inferences and interpretations made in multiple linear regression are similar to those made in simple linear regression with four major differences, one of which is: 1. The t−tests for the slope coefficients are conditional tests. That is, the tests are analyzing the _______ predictive value that a variable adds to the model when the other _______ are already included in the model

Inferences About the Parameters The inferences and interpretations made in multiple linear regression are similar to those made in simple linear regression with four major differences, one of which is: 1. The t−tests for the slope coefficients are conditional tests. That is, the tests are analyzing the significant predictive value that a variable adds to the model when the other variables are already included in the model

Inferences About the Parameters The inferences and interpretations made in multiple linear regression are similar to those made in simple linear regression with four major differences, one of which is: The ANOVA F−test is a test that all of the slopes in the model are equal to _____ (this is the null hypothesis, Ho, versus the alternative hypothesis, Ha, that the slopes are not all equal to _____ ; i.e., at least one slope does not equal zero.) This test is called the F−test for Overall ____________.

Inferences About the Parameters The inferences and interpretations made in multiple linear regression are similar to those made in simple linear regression with four major differences, one of which is: The ANOVA F−test is a test that all of the slopes in the model are equal to zero (this is the null hypothesis, Ho, versus the alternative hypothesis, Ha, that the slopes are not all equal to zero; i.e., at least one slope does not equal zero.) This test is called the F−test for Overall Significance.

Inferences About the Parameters The inferences and interpretations made in multiple linear regression are similar to those made in simple linear regression with four major differences, one of which is: The coefficient of determination, R2, still measures the amount of variation in the response variable Y that is explained by all of the predictor variables in the model. -- However, where before the _____ ____ of R2 could be interpreted as the correlation between X and Y, this result no longer holds true in multiple linear regression. ---Since we now have more than one X, this _____ ____ is no longer representative of a linear relationship between two variables which is what correlation measures.

Inferences About the Parameters The inferences and interpretations made in multiple linear regression are similar to those made in simple linear regression with four major differences, one of which is: The coefficient of determination, R2, still measures the amount of variation in the response variable Y that is explained by all of the predictor variables in the model. -- However, where before the square root of R2 could be interpreted as the correlation between X and Y, this result no longer holds true in multiple linear regression. ---Since we now have more than one X, this square root is no longer representative of a linear relationship between two variables which is what correlation measures.

Inferences About the Parameters The inferences and interpretations made in multiple linear regression are similar to those made in simple linear regression with four major differences, one of which is: The values of the ____ are interpreted as to how much of a unit change in Y will occur for a unit increase in a particular X predictor variable, given that the other variables are held constant

Inferences About the Parameters The inferences and interpretations made in multiple linear regression are similar to those made in simple linear regression with four major differences, one of which is: The values of the slopes are interpreted as to how much of a unit change in Y will occur for a unit increase in a particular X predictor variable, given that the other variables are held constant

Inferences About the Parameters The inferences and interpretations made in multiple linear regression are similar to those made in simple linear regression (in fact, the assumptions required for multiple linear regression are the same as those for simple linear regression except here they must hold for all predictors and not just the one used in simple linear regression) with four major differences: _−tests for the slope coefficients are conditional tests. The coefficient of _______, __, still measures the amount of variation in the response variable Y that is explained by all of the predictor variables in the model. -However, where before the square root of __ could be interpreted as the correlation between X and Y, this result no longer holds true in multiple linear regression. The _______ _−test is a test that all of the slopes in the model are equal to zero

Inferences About the Parameters The inferences and interpretations made in multiple linear regression are similar to those made in simple linear regression (in fact, the assumptions required for multiple linear regression are the same as those for simple linear regression except here they must hold for all predictors and not just the one used in simple linear regression) with four major differences: t−tests for the slope coefficients are conditional tests. The coefficient of determination, R2, still measures the amount of variation in the response variable Y that is explained by all of the predictor variables in the model. - However, where before the square root of R2 could be interpreted as the correlation between X and Y, this result no longer holds true in multiple linear regression. The ANOVA F−test is a test that all of the slopes in the model are equal to zero

Interpret: Multiple Regression Equation Y' = 50.17 + 1.20X1 + .23X2 Criterion Y: Salary Predictor 1: Years since Ph.D. Predictor 2: Number of Pubs

Interpret: Multiple Regression Equation Y' = 50.17 + 1.20X1 + .23X2 Criterion Y: Salary Predictor 1: Years since Ph.D. Predictor 2: Number of Pubs • Holding number of publications constant, for every additional year since getting a Ph.D., salary is predicted to increase by $1,200 (salary is in thousands of dollars). • Holding years since getting the Ph.D. constant, every unit increase in number of publications is predicted to increase salary by $230. Y' = 50.17 + 1.20X1 + .23X2 Simple regressions: Y' = 50.89 + 1.31X1 (years) Y' = 52.59 + 1.43X2 (pubs)
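A small sketch of using this prediction equation, assuming salary is expressed in thousands of dollars (as the $1,200 interpretation implies); the example inputs are made up:

```python
def predicted_salary(years_since_phd, num_pubs):
    """Y' = 50.17 + 1.20*X1 + 0.23*X2, with Y' in thousands of dollars (assumed)."""
    return 50.17 + 1.20 * years_since_phd + 0.23 * num_pubs

print(predicted_salary(5, 10))                              # 58.47 -> predicted salary of $58,470
print(predicted_salary(6, 10) - predicted_salary(5, 10))    # 1.20 -> +$1,200 per additional year
```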

LIMITATIONS The major conceptual limitation of all regression techniques is that you can only __________ relationships, but never be sure about underlying __________ mechanism.

LIMITATIONS The major conceptual limitation of all regression techniques is that you can only ascertain relationships, but never be sure about underlying causal mechanism.

___________ AND ___________ ILL-CONDITIONING What is correlation among predictors called? Why is it a problem? Commonly a problem because...

MULTICOLLINEARITY AND MATRIX ILL-CONDITIONING What is correlation among predictors called? Multicollinearity. Why is it a problem? The higher the correlation among the predictors, the higher the standard error of b. Commonly a problem because... some predictors are too redundant/similar. It interferes with determining the precise effect of each predictor, but... it doesn't affect the overall fit of the model or produce bad predictions. Depending on your goals, multicollinearity isn't always a problem. However, because of the difficulty in choosing the correct model when severe multicollinearity is present, it's always worth exploring. For example, imagine that you have two predictors (X variables) of a person's height: (1) weight in pounds and (2) weight in ounces. Obviously, our two predictors are completely redundant; weight is one and the same variable, regardless of whether it is measured in pounds or ounces. Trying to decide which one of the two measures is a better predictor of height would be rather silly; however, this is exactly what you would try to do if you were to perform a multiple regression analysis with height as the dependent (Y) variable and the two measures of weight as the independent (X) variables. When there are very many variables involved, it is often not immediately apparent that this problem exists, and it may only manifest itself after several variables have already been entered into the regression equation. Nevertheless, when this problem occurs it means that at least one of the predictor variables is (practically) completely redundant with other predictors. There are many statistical indicators of this type of redundancy (tolerances, semi-partial R, etc.), as well as some remedies (e.g., ridge regression).
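One of the indicators mentioned above, tolerance (and its reciprocal, the variance inflation factor), can be computed directly by regressing each predictor on the others. A hypothetical NumPy sketch using the pounds/ounces example:

```python
import numpy as np

def tolerance_and_vif(X):
    """For each column of X: tolerance = 1 - R^2 from regressing it on the other columns; VIF = 1/tolerance."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.delete(X, j, axis=1)
        X1 = np.column_stack([np.ones(n), others])
        b, *_ = np.linalg.lstsq(X1, X[:, j], rcond=None)
        resid = X[:, j] - X1 @ b
        r2 = 1 - resid.var() / X[:, j].var()
        tol = 1 - r2
        out.append((tol, 1 / tol))
    return out

rng = np.random.default_rng(2)
pounds = rng.normal(150, 20, size=300)
ounces = pounds * 16 + rng.normal(0, 1, size=300)            # nearly redundant predictor
print(tolerance_and_vif(np.column_stack([pounds, ounces])))  # tiny tolerances, huge VIFs
```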

Multiple Linear Regression In simple linear regression we only consider one predictor variable. When we include more than one predictor variable, we have what is now a multiple linear regression model. This new model is just an extension of the simple model where we now include parameter (i.e. _______) estimates for each predictor variable in the model. These coefficient values for each predictor are the _______ estimates. As with simple linear regression, we have one Y or response variable (also called the dependent variable), but now have more than one X variable, also called explanatory, independent, or predictor variables.

Multiple Linear Regression In simple linear regression we only consider one predictor variable. When we include more than one predictor variable, we have what is now a multiple linear regression model. This new model is just an extension of the simple model where we now include parameter (i.e. slope) estimates for each predictor variable in the model. These coefficient values for each predictor are the slope estimates. As with simple linear regression, we have one Y or response variable (also called the dependent variable), but now have more than one X variable, also called explanatory, independent, or predictor variables.

Multiple Regression Equation vs Simple Regression Equation • Goal is still to __________ _________ of prediction • Need to solve for all _'s simultaneously - Matrix Algebra • Idea is the same - Just not as easy

Multiple Regression Equation vs Simple Regression Equation • Goal is still to minimize errors of prediction • Need to solve for all b's simultaneously - Matrix Algebra • Idea is the same - Just not as easy

Multiple Regression vs Simple Regression Y = a + b1X1 + b2X2 + ... + bkXk + e Will become something like... And with both, - _________ are the same

Multiple Regression vs Simple Regression Y = a + b1X1 + b2X2 + ... + bkXk + e Will become something like... Y' = a + b1X1 + b2X2 + ... + bkXk And with both, - Assumptions are the same e is the error term; the error in predicting the value of Y, given the value of X (it is not displayed in most regression equations).

Multiple Regression: • Instead of just using one predictor variable as in __________ linear regression, use __________ predictor variables. • The probability distribution of Y depends on several predictor variables x1, x2, . . . , xp __________ . • Rather than perform p simple linear regressions on each predictor variable separately, use information from all p predictor variables __________ .

Multiple Regression: • Instead of just using one predictor variable as in simple linear regression, use several predictor variables. • The probability distribution of Y depends on several predictor variables x1, x2, . . . , xp simultaneously. • Rather than perform p simple linear regressions on each predictor variable separately, use information from all p predictor variables simultaneously.

Partial correlation •Definition and purpose Partial Correlation measures the correlation between X and Y controlling for Z Comparing the bivariate (simple) ("___-order") correlation to the partial ("___-order") correlation allows us to determine if the relationship between X and Y is _____, _____, or _____ Interaction ______ be determined with partial correlations

Partial correlation •Definition and purpose Partial correlation measures the correlation between X and Y controlling for Z. Comparing the bivariate ("zero-order") correlation to the partial ("first-order") correlation allows us to determine if the relationship between X and Y is direct, spurious, or intervening. Interaction cannot be determined with partial correlations.

Definition and purpose of Partial correlation

Partial correlation is the measure of association between two variables while controlling or adjusting for the effect of one or more additional variables. Partial correlations can be used in many cases that assess a relationship, such as whether or not the sale value of a particular commodity is related to the expenditure on advertising when the effect of price is controlled. Questions answered: What is the relationship between test scores and GPA after controlling for hours spent studying? After controlling for age, what is the relationship between Z drugs and XY symptoms?
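A minimal sketch of a first-order partial correlation computed by the residual method, with simulated data standing in for, e.g., test scores, GPA, and hours studied (the variable names and effect sizes are made up):

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after removing the (linear) influence of z from both."""
    def residuals(v, z):
        Z = np.column_stack([np.ones(len(z)), z])
        b, *_ = np.linalg.lstsq(Z, v, rcond=None)
        return v - Z @ b
    return np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]

rng = np.random.default_rng(3)
hours = rng.normal(size=500)                 # control variable (z)
test = 0.7 * hours + rng.normal(size=500)    # x
gpa = 0.7 * hours + rng.normal(size=500)     # y

print(np.corrcoef(test, gpa)[0, 1])          # zero-order r: inflated by the shared cause
print(partial_corr(test, gpa, hours))        # first-order partial r: near zero
```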

Partial correlation •Relation to multiple correlation and change in squared multiple correlation

Partial r - relationship between two variables while removing the influence of one or more other variables from both of those two variables of interest. Multiple correlation - the combined influence of one or more predictor variables on a criterion variable. Change in R squared - variance accounted for in the criterion by the predictor(s) above and beyond other predictors already entered. Partial r and change in R squared both reflect statistical control, but in the case of partial r, we are controlling a variable's influence on both variables - we want to get rid of that variable's influence. With change in R squared, we don't want to completely get rid of a predictor's influence; we want to see if another predictor variable predicts after taking that variable into account (but I won't get that technical on the exam since we haven't discussed this yet).

Partial regression plots attempt to show the effect of adding an additional variable to the model (given that one or more independent variables are already in the model). Partial regression plots are formed by: (1) Compute the residuals from regressing the response variable against the independent variables, omitting Xi. (2) Compute the residuals from regressing Xi against the remaining independent variables. (3) Plot the residuals from (1) against the residuals from (2).

Partial regression plots attempt to show the effect of adding an additional variable to the model (given that one or more independent variables are already in the model). Partial regression plots are formed by: (1) Compute the residuals from regressing the response variable against the independent variables, omitting Xi. (2) Compute the residuals from regressing Xi against the remaining independent variables. (3) Plot the residuals from (1) against the residuals from (2).
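Following the three steps above with NumPy and matplotlib on simulated data (a sketch, not the original example):

```python
import numpy as np
import matplotlib.pyplot as plt

def resid(y, X):
    """Residuals from an OLS regression of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return y - X1 @ b

rng = np.random.default_rng(4)
x1, x2 = rng.normal(size=300), rng.normal(size=300)
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=300)

e_y = resid(y, x1)    # step (1): response regressed on the other predictors, omitting x2
e_x = resid(x2, x1)   # step (2): x2 regressed on the remaining predictors
plt.scatter(e_x, e_y) # step (3): the partial regression (added-variable) plot for x2
plt.xlabel("x2 residuals")
plt.ylabel("y residuals")
plt.show()            # the slope of this cloud equals x2's coefficient in the full model
```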

Shrinkage What is it and what affects it?

Shrinkage & stability What are they and what affects them? Shrinkage is the reduction in R2 that occurs when we apply a regression equation to another sample. Recall: R = correlation between predicted Y and actual Y. As sample size increases, shrinkage decreases, R2 is more stable, and the sample is a better representation of the population. Rule of thumb: 10% shrinkage or less is acceptable (note: NOT a 10% decrease in variance accounted for, the R2 value). The less shrinkage, the more stability.

Shrinkage is the amount of reduction in ___ when a regression equation is applied to ______ sample -It is not the difference between two squared multiple correlations (R2) when the same regression is computed in two different samples

Shrinkage is the amount of reduction in R2 when a regression equation is applied to another sample -It is not the difference between two squared multiple correlations (R2) when the same regression is computed in two different samples

Simple Regression: The probability distribution of a random variable Y may depend on the value x of some predictor variable. Ingredients: • Y is a continuous response variable (_________ variable). • x is an explanatory or predictor variable (_________ variable). • Y is the variable we're mainly interested in understanding, and we want to make use of x for explanatory reasons.

Simple Regression: The probability distribution of a random variable Y may depend on the value x of some predictor variable. Ingredients: • Y is a continuous response variable (dependent variable). • x is an explanatory or predictor variable (independent variable). • Y is the variable we're mainly interested in understanding, and we want to make use of x for explanatory reasons.

Sole interpretation of β weights is troublesome because...

Sole interpretation of β weights is troublesome because... multicollinearity. Because they must account for all relationships among the variables, β weights are heavily affected by the variances and covariances of the variables. This sensitivity to covariance (i.e., multicollinear) relationships can result in very sample-specific weights which can dramatically change with slight changes in covariance relationships in future samples, thereby decreasing generalizability. For example, β weights can even change in sign as new variables are added or as old variables are deleted.

Square & symmetrical matrices

Square Matrix: # rows = # columns 1. Asymmetrical Matrix (not the same when transposed) 2. Symmetrical Matrix (Remains the same when transposed)

Stability is:

Stability is: the property of a model, or equation, that can be applied to different samples from the same population without losing the accuracy of the prediction.

The general purpose of multiple regression is to learn more about the relationship between several ____________ or ____________ variables and a ____________ or ____________ variable. For example, a real estate agent might record for each listing the size of the house (in square feet), the number of bedrooms, the average income in the respective neighborhood according to census data, and a subjective rating of appeal of the house. Once this information has been compiled for various houses it would be interesting to see whether and how these measures relate to the price for which a house is sold. For example, you might learn that the number of bedrooms is a better ____________ of the price for which a house sells in a particular neighborhood than how "pretty" the house is (subjective rating). You may also detect "outliers," that is, houses that should really sell for more, given their location and characteristics.

The general purpose of multiple regression is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. For example, a real estate agent might record for each listing the size of the house (in square feet), the number of bedrooms, the average income in the respective neighborhood according to census data, and a subjective rating of appeal of the house. Once this information has been compiled for various houses it would be interesting to see whether and how these measures relate to the price for which a house is sold. For example, you might learn that the number of bedrooms is a better predictor of the price for which a house sells in a particular neighborhood than how "pretty" the house is (subjective rating). You may also detect "outliers," that is, houses that should really sell for more, given their location and characteristics.

The manner in which regression weights are computed guarantees that they will provide an optimal fit with respect to the least-squares criterion for the existing set of data. If a statistician wishes to predict a different set of data, the regression weights are no longer optimal. There will be substantial ____ in the value of R2 if the weights estimated on one set of data are used on a second set of data. The amount of ____ can be estimated using a cross-validation procedure.

The manner in which regression weights are computed guarantees that they will provide an optimal fit with respect to the least-squares criterion for the existing set of data. If a statistician wishes to predict a different set of data, the regression weights are no longer optimal. There will be substantial shrinkage in the value of R2 if the weights estimated on one set of data are used on a second set of data. The amount of shrinkage can be estimated using a cross-validation procedure.

The values of _ (_1 and _2) are sometimes called "regression coefficients" and sometimes called "regression weights." These two terms are synonymous.

The values of b (b1 and b2) are sometimes called "regression coefficients" and sometimes called "regression weights." These two terms are synonymous.

Understanding and Interpretation of R2 R2 = .3906 DV: faculty salary IV's: years since obtaining a Ph.D and number of publications.

Understanding and Interpretation of R2 R2 = .3906 Approximately 39% of the variability in faculty salary can be accounted for by years since obtaining a Ph.D and number of publications.

Unstandardized relationships (b) say that for a one-____-unit increment on a predictor, the outcome variable increases (or if b is negative, decreases) by a number of its ____ units corresponding to what the b coefficient is.

Unstandardized relationships (b) say that for a one-raw-unit increment on a predictor, the outcome variable increases (or if b is negative, decreases) by a number of its raw units corresponding to what the b coefficient is.

Unstandardized relationships are expressed in terms of the variables' original, ____ units. Standardized results represent what happens after all of the variables (________ and ________) have initially been converted into z-scores (formula). recall that z scores convey information in ________-________ (__) units

Unstandardized relationships are expressed in terms of the variables' original, raw units. Standardized results represent what happens after all of the variables (predictors and outcome) have initially been converted into z-scores (formula). recall that z scores convey information in standard-deviation (SD) units

When predictors are correlated, the sum of the squared bivariate correlations no longer yields the R2 effect size. Instead, βs can be used to adjust the level of correlation credit a predictor gets in creating the effect: R2 = (β1)(rY,X1) + (β2)(rY,X2) + ... + (βk)(rY,Xk). (2) This equation highlights the fact that β weights are not _______ measures of relationship between predictors and outcomes. Instead, they simply reflect how much _______ is being given to predictors in the regression equation in a particular context. The accuracy of β weights is theoretically dependent upon having a perfectly specified model, since adding or removing predictor variables will inevitably _______ β values. The problem is that the true model is rarely, if ever, known.

When predictors are correlated, the sum of the squared bivariate correlations no longer yields the R2 effect size. Instead, βs can be used to adjust the level of correlation credit a predictor gets in creating the effect: R2 = (β1)(rY,X1) + (β2)(rY,X2) + ... + (βk)(rY,Xk). (2) This equation highlights the fact that β weights are not direct measures of relationship between predictors and outcomes. Instead, they simply reflect how much credit is being given to predictors in the regression equation in a particular context. The accuracy of β weights is theoretically dependent upon having a perfectly specified model, since adding or removing predictor variables will inevitably change β values. The problem is that the true model is rarely, if ever, known.
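Equation (2) can be checked numerically; a hypothetical NumPy sketch with two correlated predictors (the data are simulated):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 0.4 * x1 + 0.3 * x2 + rng.normal(size=n)

z = lambda v: (v - v.mean()) / v.std()             # convert to z-scores
Zx, zy = np.column_stack([z(x1), z(x2)]), z(y)

betas, *_ = np.linalg.lstsq(Zx, zy, rcond=None)    # standardized (beta) weights
r = np.array([np.corrcoef(x1, y)[0, 1], np.corrcoef(x2, y)[0, 1]])  # zero-order r's

r2_from_fit = 1 - ((zy - Zx @ betas) ** 2).mean()  # R^2 of the standardized model
print(r2_from_fit, np.sum(betas * r))              # the two values match, as Equation (2) states
```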

When predictors are multicollinear, ______ in the criterion that can be explained by multiple predictors is often not equally divided among the predictors. A predictor might have a large correlation with the outcome variable, but might have a near-zero β weight because another predictor is receiving the credit for the ______ explained. As such, β weights are ______ -specific to a given specified model. Due to the limitation of these standardized coefficients, some researchers have argued for the interpretation of structure coefficients in addition to β weights.

When predictors are multicollinear, variance in the criterion that can be explained by multiple predictors is often not equally divided among the predictors. A predictor might have a large correlation with the outcome variable, but might have a near-zero β weight because another predictor is receiving the credit for the variance explained. As such, β weights are context-specific to a given specified model. Due to the limitation of these standardized coefficients, some researchers have argued for the interpretation of structure coefficients in addition to β weights.
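
A minimal sketch of that pattern (simulated data; NumPy and statsmodels assumed): x2 correlates strongly with the outcome, yet its coefficient is near zero because the collinear x1 receives the credit.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    x1 = rng.normal(size=500)
    x2 = x1 + 0.2 * rng.normal(size=500)   # x2 is nearly a copy of x1 (multicollinear)
    y = x1 + rng.normal(size=500)          # only x1 actually generates y

    print(np.corrcoef(y, x2)[0, 1])        # x2's zero-order correlation with y is large

    fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    print(fit.params[1:])                  # x1 keeps most of the weight; x2's weight is near zero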

Within the same regression equation, the different predictor variables' unstandardized b coefficients are not directly comparable to each other, because the raw units for each are (usually) different. In other words, the largest b coefficient will not necessarily be the most significant, as it must be judged in connection with its standard error (_/SE = t, which is used to test for statistical significance). On the other hand, with standardized analyses, all variables have been converted to a common metric, namely standard-deviation (_-score) units, so the _ coefficients can meaningfully be compared in magnitude. In this case, whichever predictor variable has the largest _ (in absolute value) can be said to have the most potent relationship to the dependent variable, and this predictor will also have the greatest significance (smallest p value).

Within the same regression equation, the different predictor variables' unstandardized b coefficients are not directly comparable to each other, because the raw units for each are (usually) different. In other words, the largest b coefficient will not necessarily be the most significant, as it must be judged in connection with its standard error (b/SE = t, which is used to test for statistical significance). On the other hand, with standardized analyses, all variables have been converted to a common metric, namely standard-deviation (z-score) units, so the β coefficients can meaningfully be compared in magnitude. In this case, whichever predictor variable has the largest β (in absolute value) can be said to have the most potent relationship to the dependent variable, and this predictor will also have the greatest significance (smallest p value).
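
A minimal sketch of the b-versus-significance point (simulated data; NumPy and statsmodels assumed): a predictor measured on a large raw scale gets a tiny b, yet its t = b/SE can match that of a predictor with a much larger b.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    x1 = rng.normal(size=200)
    x2 = rng.normal(size=200) * 100        # same predictive strength, much larger raw scale
    y = 1.0 * x1 + 0.01 * x2 + rng.normal(size=200)

    fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    print(fit.params)                      # b for intercept, x1, x2 (x2's b is tiny)
    print(fit.params / fit.bse)            # b / SE for each term
    print(fit.tvalues)                     # the same t statistics reported by the model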

Definition of Regression Coefficients

b = unstandardized regression coefficient - Compare the same variable across samples - In raw-score units, what is the change in Y associated with a 1-unit change in X?

Multiple linear regression analysis requires that there is little or no autocorrelation in the data. Autocorrelation occurs when the residuals are not __________ from each other. In other words, when the value of y(x+1) is not __________ from the value of y(x). This typically occurs, for instance, in stock prices, where today's price is not __________ from yesterday's price.

Multiple linear regression analysis requires that there is little or no autocorrelation in the data. Autocorrelation occurs when the residuals are not independent from each other. In other words, when the value of y(x+1) is not independent from the value of y(x). This typically occurs, for instance, in stock prices, where today's price is not independent from yesterday's price.
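
One common check (not named on this card, so treat it as an illustration) is the Durbin-Watson statistic on the residuals; values near 2 suggest little autocorrelation. A minimal sketch with simulated autocorrelated errors (NumPy and statsmodels assumed):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(5)
    x = rng.normal(size=300)
    e = np.zeros(300)
    for t in range(1, 300):                # build autocorrelated errors: e_t depends on e_(t-1)
        e[t] = 0.8 * e[t - 1] + rng.normal()
    y = 2.0 * x + e

    fit = sm.OLS(y, sm.add_constant(x)).fit()
    print(durbin_watson(fit.resid))        # well below 2 here, flagging positive autocorrelation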

•Cross validation example In our example of predicting twelfth-grade achievement test scores from eighth-grade variables, a sample of 700 students (a subset of the larger National Education Longitudinal Survey of 1988) was randomly split into two groups. In the first group, analyses revealed that the following eighth-grade variables were significant predictors of twelfth-grade achievement: GPA, parent education level, race (white=0, nonwhite=1), and participation in school-based extracurricular activities (no=0, yes=1), producing the following equation: Y'= -2.45+1.83(GPA) -0.77(Race) +1.03(Participation) +0.38(Parent Ed) In the first group, this analysis produced an R-squared of .55. This equation was used in the second group to create predicted scores, and those predicted scores correlated ryy' = .73 with observed achievement scores. With ryy'2 = .53 (the cross-validity coefficient), shrinkage was 2% (.55 - .53 = .02), a good outcome.

•Cross validation example In our example of predicting twelfth-grade achievement test scores from eighth-grade variables, a sample of 700 students (a subset of the larger National Education Longitudinal Survey of 1988) was randomly split into two groups. In the first group, analyses revealed that the following eighth-grade variables were significant predictors of twelfth-grade achievement: GPA, parent education level, race (white=0, nonwhite=1), and participation in school-based extracurricular activities (no=0, yes=1), producing the following equation: Y'= -2.45+1.83(GPA) -0.77(Race) +1.03(Participation) +0.38(Parent Ed) In the first group, this analysis produced an R-squared of .55. This equation was used in the second group to create predicted scores, and those predicted scores correlated ryy' = .73 with observed achievement scores. With ryy'2 = .53 (the cross-validity coefficient), shrinkage was 2% (.55 - .53 = .02), a good outcome.
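
A minimal sketch of the same split-sample logic (simulated data, not the NELS:88 sample; NumPy and statsmodels assumed): fit in the first half, apply that equation to the second half, and compare R-squared with the squared cross-validity correlation to estimate shrinkage.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    n = 700
    X = rng.normal(size=(n, 4))                        # four predictors (illustrative)
    y = X @ np.array([1.8, 0.4, -0.8, 1.0]) + rng.normal(0, 2, n)

    half = n // 2
    fit = sm.OLS(y[:half], sm.add_constant(X[:half])).fit()
    r2 = fit.rsquared                                  # R-squared in the derivation sample

    pred = sm.add_constant(X[half:]) @ fit.params      # first group's equation applied to second group
    r_yy = np.corrcoef(y[half:], pred)[0, 1]           # cross-validity correlation
    print(r2, r_yy ** 2, r2 - r_yy ** 2)               # shrinkage = R-squared minus squared cross-validity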

Element (a)

Element (a) Each number in the matrix, denoted by aij (i = row, j = column). Element = each # in the matrix; e.g., element a11 = 4 (lower case a).

Future prediction: it is important to consider the __________ of prediction over time. The goal is usually to predict people/observations that do not have __________ scores (people not in the current sample). This is critically important if variable selection was data driven.

Future prediction: it is important to consider the stability of prediction over time. The goal is usually to predict people/observations that do not have criterion scores (people not in the current sample). This is critically important if variable selection was data driven.

Data matrix (Rows versus columns) (A)

Generic: n by k (n rows by k columns); each element is denoted a (lower case).

Prediction Selecting variables to enter... You'll want to use ____ or ______. Never select at random, because...

If you don't base variable selection on some sort of theoretical reasoning, you run the risk of finding an anomaly of the dataset itself, which could change from dataset to dataset. But by using theory, you know what you're looking for and your findings should support or contradict it.

Regression coefficient (b)- when the regression line is linear (y = ax + b) the regression coefficient is the constant (a) that represents the _____ of change of one variable (y) as a function of changes in the other (x); it is the _____ of the regression line

Regression coefficient (b)- when the regression line is linear (y = ax + b) the regression coefficient is the constant (a) that represents the rate of change of one variable (y) as a function of changes in the other (x); it is the slope of the regression line

Vectors

Series of numbers in a matrix: column vectors (n x 1); row vectors (1 x n).

Several methods can be employed for cross-validation, including ... [2 ways]

Several methods can be employed for cross-validation, including the use of 2 independent samples, or splitting a single sample into two subsamples.

Standardized relationships (β) say that for a one-standard-deviation increment on a predictor, the outcome variable increases (or decreases) by some number of SDs corresponding to what the β coefficient is.

Standardized relationships (β) say that for a one-standard-deviation increment on a predictor, the outcome variable increases (or decreases) by some number of SDs corresponding to what the β coefficient is.

•Suppression / suppressor variable (just conceptually)

Suppresses or controls for irrelevant variance that is shared with the predictor but not the criterion. A suppressor is an IV whose partial correlation (with the dependent variable) is larger than its zero-order correlation...
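
A minimal sketch of suppression (simulated data; NumPy and statsmodels assumed): x2 shares only the irrelevant part of x1, so it barely correlates with the criterion, yet adding it raises both R2 and x1's weight.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    signal = rng.normal(size=1000)
    noise = rng.normal(size=1000)
    x1 = signal + noise                    # predictor = relevant signal + irrelevant noise
    x2 = noise                             # suppressor: shares the noise, not the criterion
    y = signal + rng.normal(size=1000)

    print(np.corrcoef(y, x2)[0, 1])        # near zero: x2 alone predicts almost nothing

    alone = sm.OLS(y, sm.add_constant(x1)).fit()
    both = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    print(alone.rsquared, both.rsquared)   # R2 rises once the suppressor is added
    print(alone.params[1], both.params[1]) # x1's coefficient rises as well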

Principal diagonal

The elements of a matrix starting in the upper left corner and proceeding down and to the right.
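
A minimal NumPy sketch tying the matrix cards together (the matrix shown is illustrative, not one from the course slides): elements aij, column and row vectors, and the principal diagonal.

    import numpy as np

    A = np.array([[4, 2, 7],
                  [1, 5, 3],
                  [6, 0, 9]])     # a 3-by-3 data matrix (n rows by k columns)

    print(A.shape)                # (n, k)
    print(A[0, 0])                # element a11 (NumPy indexes from 0, so [0, 0]) -> 4
    print(A[:, 0].reshape(-1, 1)) # first column as an n-by-1 column vector
    print(A[0, :].reshape(1, -1)) # first row as a 1-by-n row vector
    print(np.diag(A))             # principal diagonal: upper left, proceeding down and to the right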

Weakness of hierarchical multiple regression:

• It is heavily reliant on the researcher knowing what he or she is doing!

