QMET352 Chapter 14
What is the range of values for multiple R^2?
0% to +100% inclusive
The best example of a null hypothesis for testing an individual regression coefficient is:
H0: B1 = 0
Which statistic is used to test hypotheses about individual regression coefficients?
t-statistic
In a regression analysis, three independent variables are used in the equation based on a sample of forty observations. What are the degrees of freedom associated with the F-statistic?
3 and 36. Numerator df = k = 3; denominator df = n - k - 1 = 40 - 3 - 1 = 36.
A manager at a local bank analyzed the relationship between monthly salary and three independent variables: length of service (measured in months), gender (0 = female, 1 = male), and job type (0 = clerical, 1 = technical). The following ANOVA summarizes the regression results:
ANOVA        df          SS         MS      F
Regression    3   1004346.8   334782.3   5.96
Residual     26   1461134.6    56197.5
Total        29   2465481.4
             Coefficients   Standard Error   t-Stat   P-value
Intercept          784.92           322.25     2.44      0.02
Service              9.19             3.20     2.87      0.01
Gender             222.78            89.00     2.50      0.02
Job                -28.21            89.61    -0.31      0.76
Based on the ANOVA, the multiple coefficient of determination is:
40.7%. R^2 = SSR / SS Total = 1004346.8 / 2465481.4 = 0.40736, or about 40.7%.
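As a quick arithmetic check of this answer, here is a minimal Python sketch. The sums of squares are copied from the bank salary ANOVA table; the helper name `r_squared` is hypothetical.

```python
def r_squared(ss_regression: float, ss_total: float) -> float:
    """Multiple coefficient of determination: R^2 = SSR / SS Total."""
    return ss_regression / ss_total


# Sums of squares from the bank salary ANOVA table
ssr = 1004346.8
ss_total = 2465481.4

r2 = r_squared(ssr, ss_total)
print(f"R^2 = {r2:.5f} ({r2:.1%})")
```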
Twenty-one executives in a large corporation were randomly selected to study the effect of several factors on annual salary (expressed in $000s). The factors selected were age, seniority, years of college, number of company divisions they had been exposed to, and the level of their responsibility. The results of the regression analysis follow:
Constant = 23.00371; standard error of the estimate = 2.91933; R^2 = 0.914; n = 21; degrees of freedom = 15
Variable            Coefficient   Std. Error
Age                      -0.031        0.183
Seniority                 0.381        0.158
Years of college          1.452        0.387
# Divisions              -0.089        0.541
Level                     3.554        0.833
What proportion of the total variation in salary is not accounted for by the set of independent variables?
8.6%. The R^2 statistic measures the proportion of the total variation in salary accounted for by the set of independent variables. From the output, R^2 = 0.914, so 91.4% of the total variation in salary is accounted for by these independent variables, and the proportion not accounted for is 1 - 0.914 = 0.086, or 8.6%.
Twenty-one executives in a large corporation were randomly selected to study the effect of several factors on annual salary (expressed in $000s). The factors selected were age, seniority, years of college, number of company divisions they had been exposed to, and the level of their responsibility. The results of the regression analysis follow:
Constant = 23.00371; standard error of the estimate = 2.91933; R^2 = 0.914; n = 21; degrees of freedom = 15
Variable            Coefficient   Std. Error
Age                      -0.031        0.183
Seniority                 0.381        0.158
Years of college          1.452        0.387
# Divisions              -0.089        0.541
Level                     3.554        0.833
What proportion of the total variation in salary is accounted for by the set of independent variables?
91.4%. The R^2 statistic measures the proportion of the total variation in salary accounted for by the set of independent variables. From the output, R^2 = 0.914, so 91.4% of the total variation in salary is accounted for by these independent variables.
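The split between explained and unexplained variation follows directly from R^2. A minimal Python sketch, using the R^2 reported for the executive salary regression; the helper name `variation_split` is hypothetical.

```python
def variation_split(r2: float) -> tuple:
    """Return (explained, unexplained) proportions of the total variation."""
    return r2, 1.0 - r2


# R^2 from the executive salary regression output
explained, unexplained = variation_split(0.914)
print(f"Explained: {explained:.1%}, not explained: {unexplained:.1%}")
```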
Twenty-one executives in a large corporation were randomly selected to study the effect of several factors on annual salary (expressed in $000s). The factors selected were age, seniority, years of college, number of company divisions they had been exposed to, and the level of their responsibility. The results of the regression analysis follow:
Constant = 23.00371; standard error of the estimate = 2.91933; R^2 = 0.914; n = 21; degrees of freedom = 15
Variable            Coefficient   Std. Error
Age                      -0.031        0.183
Seniority                 0.381        0.158
Years of college          1.452        0.387
# Divisions              -0.089        0.541
Level                     3.554        0.833
Which is the dependent variable?
Annual Salary
Consider the multiple regression model shown next between the dependent variable Y and four independent variables X1, X2, X3, and X4, which results in the following function: Ŷ = 33 + 8X1 - 6X2 + 16X3 + 18X4. For this model, there were 35 observations; SSR = 1400 and SSE = 600. The critical value of F at the 1% level of significance is:
F critical value = 4.02, with numerator df = k = 4 and denominator df = n - k - 1 = 35 - 4 - 1 = 30.
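The critical value 4.02 comes from an F table. The Python sketch below only computes the degrees of freedom and the F-statistic from the given SSR and SSE, so the statistic can be compared with that table value. The helper name `global_f_test` is hypothetical.

```python
def global_f_test(ssr: float, sse: float, n: int, k: int) -> tuple:
    """Return (numerator df, denominator df, F) for the global test."""
    df_num = k              # regression degrees of freedom
    df_den = n - k - 1      # error (residual) degrees of freedom
    msr = ssr / df_num      # regression mean square
    mse = sse / df_den      # mean square error
    return df_num, df_den, msr / mse


df_num, df_den, f_stat = global_f_test(ssr=1400, sse=600, n=35, k=4)
print(df_num, df_den, f_stat)
print("Reject H0 at 1%" if f_stat > 4.02 else "Do not reject H0 at 1%")
```

Here MSR = 1400 / 4 = 350 and MSE = 600 / 30 = 20, so F = 17.5, well above the 1% critical value of 4.02.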
It has been hypothesized that overall academic success for college freshmen as measured by grade point average (GPA) is a function of IQ scores, X1, hours spent studying each week, X2, and one's high school average, X3. Suppose the regression equation is Ŷ = 6.9 + 0.055*X1 + 0.107*X2 + 0.0853*X3. The multiple standard error is 6.313 and R^2 = 0.826. Which independent variable has the smallest effect on GPA?
IQ Scores (0.055*X1)
The following correlations were computed as part of a multiple regression analysis that used education, job, and age to predict income.
             Income   Education      Job      Age
Income        1.000
Education     0.677       1.000
Job           0.173      -0.181    1.000
Age           0.369       0.073    0.689    1.000
What is the dependent variable?
Income
A researcher is studying the effect of ten different variables on a critical measure of business performance. A multiple regression analysis including all ten variables is performed. What criterion could be used to eliminate one of the ten variables?
Largest p-value (equivalently, smallest absolute t-value). In the individual t-test, rejecting H0 means we should "consider keeping" the independent variable, so the variable with the largest p-value is the first candidate to eliminate.
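The elimination criterion can be sketched as a single step of backward elimination. This is a minimal Python sketch that assumes the p-values have already been produced by a fitted model; `variable_to_drop` is a hypothetical helper, and the sample p-values are the ones reported for the bank salary regression in this set.

```python
from typing import Dict, Optional


def variable_to_drop(p_values: Dict[str, float], alpha: float = 0.05) -> Optional[str]:
    """One backward-elimination step: return the variable with the largest
    p-value if it exceeds alpha; return None if every variable is significant."""
    name, p = max(p_values.items(), key=lambda item: item[1])
    return name if p > alpha else None


# p-values from the bank salary regression (intercept excluded)
bank_p_values = {"Service": 0.01, "Gender": 0.02, "Job": 0.76}
print(variable_to_drop(bank_p_values))
```

After a variable is dropped, the model would be refit and the step repeated, because the remaining p-values generally change once a variable is removed.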
Twenty-one executives in a large corporation were randomly selected to study the effect of several factors on annual salary (expressed in $000s). The factors selected were age, seniority, years of college, number of company divisions they had been exposed to, and the level of their responsibility. The results of the regression analysis follow:
Constant = 23.00371; standard error of the estimate = 2.91933; R^2 = 0.914; n = 21; degrees of freedom = 15
Variable            Coefficient   Std. Error
Age                      -0.031        0.183
Seniority                 0.381        0.158
Years of college          1.452        0.387
# Divisions              -0.089        0.541
Level                     3.554        0.833
Which independent variable has the most significant effect on annual salary?
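Level of their responsibility (X5). Its regression coefficient, 3.554, is larger than the other regression coefficients (-0.031, 0.381, 1.452, and -0.089), and it also has the largest t-statistic: 3.554 / 0.833 ≈ 4.27, versus about 3.75 for years of college and 2.41 for seniority.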
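Significance of an individual coefficient is judged by its t-statistic, t = coefficient / standard error. A minimal Python sketch computing these ratios from the executive salary output in this set:

```python
# (coefficient, standard error) pairs from the executive salary output
coefficients = {
    "Age": (-0.031, 0.183),
    "Seniority": (0.381, 0.158),
    "Years of college": (1.452, 0.387),
    "# Divisions": (-0.089, 0.541),
    "Level": (3.554, 0.833),
}

# t = coefficient / standard error for each independent variable
t_stats = {name: b / se for name, (b, se) in coefficients.items()}
for name, t in sorted(t_stats.items(), key=lambda item: abs(item[1]), reverse=True):
    print(f"{name:<17} t = {t:6.2f}")

most_significant = max(t_stats, key=lambda name: abs(t_stats[name]))
print("Largest |t|:", most_significant)
```

With 15 error degrees of freedom, the two-tailed critical t at the 0.05 level is about 2.131, so by that rule seniority, years of college, and level would be significant, while age and number of divisions would not.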
A manager at a local bank analyzed the relationship between monthly salary and three independent variables: length of service (measured in months), gender (0 = female, 1 = male), and job type (0 = clerical, 1 = technical). The following ANOVA summarizes the regression results:
ANOVA        df          SS         MS      F
Regression    3   1004346.8   334782.3   5.96
Residual     26   1461134.6    56197.5
Total        29   2465481.4
             Coefficients   Standard Error   t-Stat   P-value
Intercept          784.92           322.25     2.44      0.02
Service              9.19             3.20     2.87      0.01
Gender             222.78            89.00     2.50      0.02
Job                -28.21            89.61    -0.31      0.76
Based on the hypothesis tests for the individual regression coefficients, using alpha = 0.05:
Only months of service and gender are significantly related to monthly salary. Individual t-test decision rule: reject H0 if p-value < alpha = 0.05. Service (p = 0.01) and gender (p = 0.02) meet this rule. The p-value for job (0.76) falls in the do-not-reject region, so we cannot conclude that its coefficient differs from zero, and job is the variable to consider dropping.
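The decision rule can be applied mechanically to each coefficient. A minimal Python sketch using the p-values from the bank salary output:

```python
ALPHA = 0.05
p_values = {"Service": 0.01, "Gender": 0.02, "Job": 0.76}

# Decision rule for each individual t-test: reject H0 (beta = 0) if p-value < alpha
for name, p in p_values.items():
    decision = "reject H0" if p < ALPHA else "do not reject H0"
    print(f"{name}: p = {p:.2f} -> {decision}")

significant = [name for name, p in p_values.items() if p < ALPHA]
print("Significant at alpha = 0.05:", significant)
```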
In an ANOVA table for multiple regression analysis, the global test of significance is based on the:
Regression mean square divided by the mean square error. In an ANOVA table for a multiple regression analysis, the global test of significance is based on the F-statistic, computed as the regression mean square divided by the mean square error (F = MSR / MSE).
The strength of the association between a set of independent variables X and a dependent variable Y is measured by the:
All of the above: the standard error of the estimate, the coefficient of correlation, and the coefficient of determination.
In an ANOVA table for a multiple regression analysis, the regression mean square is:
The regression sum of squares divided by the regression degrees of freedom: MSR = SSR / k (see the table in the textbook, p. 483).
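The MS and F columns of a regression ANOVA table can be rebuilt from the SS and df columns. A minimal Python sketch, checked against the bank salary ANOVA table in this set; the helper name `anova_checks` is hypothetical.

```python
def anova_checks(ss_reg, ss_res, ss_total, df_reg, df_res, df_total):
    """Recompute the MS and F columns of a regression ANOVA table and
    confirm that the SS and df columns add up to their totals."""
    msr = ss_reg / df_reg   # regression mean square: SSR / k
    mse = ss_res / df_res   # mean square error: SSE / (n - k - 1)
    return {
        "ss_adds_up": abs(ss_reg + ss_res - ss_total) < 1e-6,
        "df_adds_up": df_reg + df_res == df_total,
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,
    }


# Values copied from the bank salary ANOVA table
result = anova_checks(1004346.8, 1461134.6, 2465481.4, 3, 26, 29)
for key, value in result.items():
    print(key, value)
```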
A manager at a local bank analyzed the relationship between monthly salary and three independent variables: length of service (measured in months), gender (0 = female, 1 = male), and job type (0 = clerical, 1 = technical). The following ANOVA summarizes the regression results:
ANOVA        df          SS         MS      F
Regression    3   1004346.8   334782.3   5.96
Residual     26   1461134.6    56197.5
Total        29   2465481.4
             Coefficients   Standard Error   t-Stat   P-value
Intercept          784.92           322.25     2.44      0.02
Service              9.19             3.20     2.87      0.01
Gender             222.78            89.00     2.50      0.02
Job                -28.21            89.61    -0.31      0.76
Based on the ANOVA and a 0.05 significance level, the global null hypothesis test of the multiple regression model:
Will be rejected, and we conclude that monthly salary is related to at least one of the independent variables. Global F test: df = k and n - k - 1. The total df is n - 1 = 29, so n = 30 and the df are 3 and 30 - 3 - 1 = 26 (matching the Residual row). F critical value (alpha = 0.05; 3, 26) ≈ 2.98. Decision rule: reject H0 if F > 2.98. Since F = 5.96 > 2.98, reject H0: at least one independent variable can be used to predict the dependent variable.
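The critical value is normally read from an F table, but it can also be computed as a cross-check. Below is a self-contained Python sketch that assumes the standard relationship between the F distribution and the regularized incomplete beta function, evaluated with the continued-fraction (modified Lentz) method described in Numerical Recipes. The helper names (`regularized_beta`, `f_cdf`, `f_critical`) are hypothetical; in practice a library routine such as SciPy's `scipy.stats.f.ppf` would usually be used instead.

```python
from math import exp, lgamma, log


def _beta_continued_fraction(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        m2 = 2 * m
        # Even term of the fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h


def regularized_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_cdf(f: float, df1: int, df2: int) -> float:
    """P(F <= f) for an F distribution with (df1, df2) degrees of freedom."""
    if f <= 0.0:
        return 0.0
    return regularized_beta(df1 / 2.0, df2 / 2.0, df1 * f / (df1 * f + df2))


def f_critical(alpha: float, df1: int, df2: int) -> float:
    """Upper-tail critical value: the f with P(F > f) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - f_cdf(hi, df1, df2) > alpha:  # widen the bracket
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 1.0 - f_cdf(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Bank salary model: total df = 29, so n = 30 and k = 3, giving df = (3, 26)
F_STAT = 5.96
crit = f_critical(0.05, 3, 26)
p_value = 1.0 - f_cdf(F_STAT, 3, 26)
print(f"F critical (alpha = 0.05; 3, 26) = {crit:.2f}")
print(f"p-value for F = {F_STAT}: {p_value:.4f}")
print("Reject H0" if F_STAT > crit else "Do not reject H0")
```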
Multiple regression analysis is applied when analyzing the relationship between:
a dependent variable and several independent variables
What can we conclude if the global test of regression rejects the null hypothesis?
at least one of the net regression coefficients is not equal to zero
The coefficient of determination measures the proportion of
explained variation relative to total variation
It has been hypothesized that overall academic success for college freshmen as measured by grade point average (GPA) is a function of IQ scores, X1, hours spent studying each week, X2, and one's high school average, X3. Suppose the regression equation is Ŷ = 6.9 + 0.055*X1 + 0.107*X2 + 0.0853*X3. The multiple standard error is 6.313 and R^2 = 0.826. Which independent variable has the greatest effect on GPA?
hours spent studying each week (0.107*X2)
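The answer keys for this card and the "smallest effect" card compare the sizes of the raw regression coefficients. A minimal Python sketch of that comparison follows. Note that each raw coefficient is measured in its own variable's units (one IQ point, one hour of study, one point of high school average), so this ranking assumes the coefficient-size comparison the answer key uses.

```python
# Coefficients from Ŷ = 6.9 + 0.055*X1 + 0.107*X2 + 0.0853*X3
coefficients = {
    "IQ score (X1)": 0.055,
    "Hours studying per week (X2)": 0.107,
    "High school average (X3)": 0.0853,
}

# Each coefficient is the change in predicted GPA per one-unit increase,
# holding the other independent variables constant
largest = max(coefficients, key=coefficients.get)
smallest = min(coefficients, key=coefficients.get)
print("Greatest effect:", largest)
print("Smallest effect:", smallest)
```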
What can we conclude if the global test of regression does not reject the null hypothesis (H0)?
no relationship exists between the dependent variable and any of the independent variables
Which of the following is a characteristic of the F distribution?
Positively skewed. Other characteristics: there is a family of F distributions; F values cannot be negative; the F distribution is continuous; and it is asymptotic (as X values increase, the F curve approaches the horizontal axis but never touches it).
If there are four independent variables in a multiple regression equation, there are also four:
regression coefficients
To evaluate the assumption of linearity, a multiple regression analysis should include:
scatter diagrams of the dependent variable plotted as a function of each independent variable. The evaluation of a multiple regression equation should always include a scatter diagram that plots the dependent variable against each independent variable. These graphs help us visualize the relationships and provide some initial information about the direction (positive or negative), linearity, and strength of the relationships.
A valid multiple regression analysis assumes or requires that:
the independent variables and the dependent variables have a linear relationship
When does multicollinearity occur in a multiple regression analysis?
the independent variables are highly correlated
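One way to screen for multicollinearity is to scan the correlations among the independent variables. The Python sketch below uses the correlation matrix from the income example in this set (education, job, and age as independent variables). The 0.70 cutoff is an assumed rule of thumb, not a value given in this set, and the helper name `r` is hypothetical.

```python
from itertools import combinations

# Correlations from the income example (lower triangle of the matrix)
corr = {
    ("Income", "Education"): 0.677,
    ("Income", "Job"): 0.173,
    ("Income", "Age"): 0.369,
    ("Education", "Job"): -0.181,
    ("Education", "Age"): 0.073,
    ("Job", "Age"): 0.689,
}
independent = ["Education", "Job", "Age"]
THRESHOLD = 0.70  # assumed rule-of-thumb cutoff for |r|


def r(a, b):
    """Look up the correlation for a pair in either order."""
    return corr.get((a, b), corr.get((b, a)))


# Only pairs of independent variables matter for multicollinearity
pairs = [(a, b, r(a, b)) for a, b in combinations(independent, 2)]
flagged = [pair for pair in pairs if abs(pair[2]) >= THRESHOLD]
strongest = max(pairs, key=lambda pair: abs(pair[2]))
print("Strongest pair:", strongest)
print("Flagged:", flagged)
```

With this cutoff no pair is flagged, although job and age (r = 0.689) come close.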
In an ANOVA table, for a multiple regression analysis, the variation of the dependent variable explained by the variation of the independent variables is represented by:
the regression sum of squares
In multiple regression, a dummy variable is significantly related to the dependent variable when:
the null hypothesis that the dummy variable's regression coefficient equals zero is rejected.
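To see what a dummy coefficient means, here is a small Python sketch using the fitted bank salary equation from this set. The employee profile (24 months of service, clerical job) is hypothetical and chosen only for illustration, and `predicted_salary` is a hypothetical helper. Holding service and job fixed, switching gender from 0 to 1 changes the predicted salary by exactly the gender coefficient, 222.78.

```python
# Fitted bank salary equation (coefficients from the regression output)
B0, B_SERVICE, B_GENDER, B_JOB = 784.92, 9.19, 222.78, -28.21
P_GENDER, ALPHA = 0.02, 0.05


def predicted_salary(service_months, gender, job):
    """gender: 0 = female, 1 = male; job: 0 = clerical, 1 = technical."""
    return B0 + B_SERVICE * service_months + B_GENDER * gender + B_JOB * job


# Hypothetical employee profile: 24 months of service, clerical job
female = predicted_salary(24, gender=0, job=0)
male = predicted_salary(24, gender=1, job=0)
print(f"Female: {female:.2f}, Male: {male:.2f}, difference: {male - female:.2f}")

# The dummy is significant if H0 (beta_gender = 0) is rejected: p-value < alpha
print("Gender significant:", P_GENDER < ALPHA)
```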