Cook's Distance

The distance between the fitted values of the model fitted with all the data and the fitted values of the model fitted after discarding the i-th observation.

The alternative hypothesis of ANOVA can be stated as: A. the means of all pairs of groups are different B. the means of all groups are equal C. the means of at least one pair of groups is different D. None of the above

The means of at least one pair of groups is different. See 2.4 Test for Equal Means: "Using the hypothesis testing procedure for equal means, we test: The null hypothesis, which is that the means are all equal ($\mu_1 = \mu_2 = \dots = \mu_k$) versus the alternative hypothesis, that some means are different. Not all means have to be different for the alternative hypothesis to be true -- at least one pair of the means needs to be different."

In logistic regression, the relationship between the probability of success and the predicting variables is nonlinear.

True. We model the probability of success given the predictors by linking the probability to the predicting variables through a nonlinear link function.

The pooled variance estimator, $s_{pooled}^2$, in ANOVA is synonymous with the variance estimator, $\hat{\sigma}^2$, in simple linear regression because they both use mean squared error (MSE) for their calculations. (T/F)

True. See 1.2 Estimation Method for simple linear regression and 2.2 Estimation Method for ANOVA. The pooled variance estimator is, in fact, the variance estimator.

The prediction interval of one member of the population will always be larger than the confidence interval of the mean response for all members of the population when using the same predicting values. (T/F)

True. See 1.7 Regression Line: Estimation & Prediction Examples "Just to wrap up the comparison, the confidence intervals under estimation are narrower than the prediction intervals because the prediction intervals have additional variance from the variation of a new measurement."

In evaluating a multiple linear model the coefficient of variation is interpreted as the percentage of variability in the response variable explained by the model.


In evaluating a multiple linear model, Residual analysis is used for goodness of fit assessment.


In evaluating a multiple linear model, the F test is used to evaluate the overall regression.


In evaluating a simple linear model residual analysis is used for goodness of fit assessment.


In evaluating a simple linear model the coefficient of variation is interpreted as the percentage of variability in the response variable explained by the model.


In evaluating a simple linear model there is a direct relationship between coefficient of variation and the correlation between the predicting and response variables.


In multiple linear regression, controlling variables are used to control for sample bias.


In simple linear regression, we can diagnose the assumption of constant-variance by plotting the residuals against fitted values.


It is possible to produce a model where the overall F-statistic is significant but all the regression coefficients have insignificant t-statistics.


Let $Y^*$ be the predicted response at $x^*$. The variance of $Y^*$ given $x^*$ depends on both the value of $x^*$ and the design matrix.


Multicollinearity in multiple linear regression means that the columns in the design matrix are (nearly) linearly dependent.


Multiple linear regression is a general model encompassing both ANOVA and simple linear regression.


Partial F-Test can also be defined as the hypothesis test for the scenario where a subset of regression coefficients are all equal to zero.


Residual analysis can only be used to assess uncorrelated errors.


Studying the relationship between a single response variable and more than one quantitative and/or qualitative predicting variable is termed multiple linear regression.


The ANOVA is a linear regression model with one or more qualitative predicting variables.


The ANOVA model with a qualitative predicting variable with $k$ levels/classes will have $k + 1$ parameters to estimate.


The $R^2$ value represents the percentage of variability in the response that can be explained by the linear regression on the predictors. Models with higher $R^2$ are always preferred over models with lower $R^2$.


The Partial F-Test can test whether a subset of regression coefficients are all equal to zero.


The equation to find the estimated variance of the error terms can be obtained by summing up the squared residuals and dividing that by n - p - 1, where n is the sample size and p is the number of predictors.


The estimators for the regression coefficients are unbiased regardless of the distribution of the data.


The estimators of the error term variance and of the regression coefficients are random variables.


The larger the coefficient of determination or R-squared, the higher the variability explained by the simple linear regression model.


The linear regression model with a qualitative predicting variable with k levels/classes will have k + 1 parameters to estimate


The mean sum of squared errors in ANOVA measures variability within groups.


The number of parameters to estimate in the case of a multiple linear regression model containing 5 predicting variables and no intercept is 6.


The one-way ANOVA is a linear regression model with one qualitative predicting variable.


The prediction intervals are centered at the predicted value.


The prediction intervals need to be corrected for simultaneous inference when multiple predictions are made jointly.


The prediction of the response variable has higher uncertainty than the estimation of the mean response.


The regression coefficients can be estimated only if the predicting variables are not linearly dependent.


The regression coefficients that are estimated serve as unbiased estimators.


The sampling distribution of the estimated regression coefficients is centered at the true regression parameters.


The sampling distribution of the estimated regression coefficients is dependent on the design matrix.


The sampling distribution of the estimated regression coefficients is the t-distribution, assuming that the variance of the error term is unknown and replaced by its estimate.


The sampling distribution of the prediction of a new response is a t-distribution.


Under the normality assumption, the estimator for $\beta_1$ is a linear combination of normally distributed random variables.


We assess the assumption of constant-variance by plotting the residuals against fitted values.


We can assess the assumption of constant-variance in multiple linear regression by plotting the standardized residuals against fitted values.


When estimating confidence values for the mean response for all instances of the predicting variables, we should use a critical point based on the F-distribution to correct for the simultaneous inference.


A negative value of $\beta_1$ is consistent with an inverse relationship between the predictor variable and the response variable. (T/F)

True. See 1.2 Estimation Method: "A negative value of $\beta_1$ is consistent with an inverse relationship."

Under the normality assumption, the estimator for $\beta_1$ is a linear combination of normally distributed random variables. (T/F)

True. See 1.4 Statistical Inference: "Under the normality assumption, $\hat{\beta}_1$ is thus a linear combination of normally distributed random variables... $\hat{\beta}_0$ is also linear combination of random variables."

If the model assumptions hold, then the estimator for the variance, $\hat{\sigma}^2$, is a random variable. (T/F)

True. See 1.8 Statistical Inference. We assume that the error terms are independent random variables. Therefore, the residuals are also random variables. Since $\hat{\sigma}^2$ is a combination of the residuals, it is also a random variable.

An ANOVA model with a single qualitative predicting variable containing k groups will have k + 1 parameters to estimate. (T/F)

True. See 2.2 Estimation Method. We have to estimate the means of the k groups and the variance, via the pooled variance estimator $s_{pooled}^2$.

If the constant variance assumption in ANOVA does not hold, the inference on the equality of the means will not be reliable. (T/F)

True. See 2.8 Data Example: "This is important since without a good fit, we cannot rely on the statistical inference." Only when the model is a good fit, i.e. all model assumptions hold, can we rely on the statistical inference.

If the pairwise comparison interval between groups in an ANOVA model includes zero, we conclude that the two means are plausibly equal. (T/F)

True. See 2.8 Data Example. If the comparison interval includes zero, then the two means are not statistically significantly different, and are thus plausibly equal.

The logit function is the log of the ratio of the probability of success to the probability of failure. It is also known as the log odds function.

True. The logit link function is also known as the log odds function.

A study was conducted to measure the effect of a fungicide treatment on the survival rate of botrytis blight. Botrytis blight samples were divided into 20 groups, each consisting of about 100 samples, and exposed to different levels of chemicals in a fungicide. The output of a logistic regression model is below, where concS represents the concentration of sulfur in the fungicide and concCu represents the concentration of copper in the fungicide. Use it to answer the following multiple-choice questions.
    Call:
    glm(formula = cbind(Survived, Died) ~ concS + concCu, family = "binomial", data = data)
    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -9.5366  -2.4594   0.1223   3.9710   6.3566
    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  3.58770    0.22958   15.63   <2e-16 ***
    concS       -4.32735    0.26518  -16.32   <2e-16 ***
    concCu      -0.27483    0.01784  -15.40   <2e-16 ***
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    (Dispersion parameter for binomial family taken to be 1)
    Null deviance: 718.76 on 19 degrees of freedom
    Residual deviance: 299.43 on 17 degrees of freedom
    AIC: 363.53
Construct an approximate 95% confidence interval for the coefficient of concCu. A. (-0.322, -0.249) B. (-4.931, -3.724) C. (-4.847, -3.808) D. (-0.310, -0.240)

(-0.310, -0.240), computed as [-0.27483 - 1.96*0.01784, -0.27483 + 1.96*0.01784].
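The interval arithmetic can be checked with a few lines of code. This is a minimal sketch in Python rather than R; the variable names are illustrative, and 1.96 is assumed as the usual normal critical value for an approximate 95% interval.

```python
# Approximate 95% Wald interval for the concCu coefficient.
estimate = -0.27483   # concCu estimate from the glm summary
std_error = 0.01784   # concCu standard error from the glm summary
z_crit = 1.96         # assumed normal critical value for ~95% coverage

lower = estimate - z_crit * std_error
upper = estimate + z_crit * std_error
print(f"({lower:.3f}, {upper:.3f})")
```

The same two lines apply to any coefficient in the table once its estimate and standard error are substituted.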

In the presence of near multicollinearity, the coefficient of variation decreases.


An experiment was conducted to determine the effect of gamma radiation on the numbers of chromosomal abnormalities observed in cells. A multiple linear regression model was fitted to estimate the effect of the number of cells, amount of the radiation dose (Grays), and the rate of the radiation dose (Grays/hour) on the number of chromosomal abnormalities observed. The data frame has 27 observations. Here is the model summary and Cook's Distance plot.
    Coefficient    Estimate        SE  t-value  Pr(>|t|)
    (Intercept)   -74.15392  42.24544   -1.755  0.092518
    cells           0.06871   0.02196    3.129  0.004709 **
    doseamt        41.33160   9.13907    4.523  0.000153 ***
    doserate       20.28402   8.29071    2.447  0.022482 *
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    Residual standard error: 54.05 on X degrees of freedom
    Multiple R-squared: 0.5213, Adjusted R-squared: 0.4588
    F-statistic: 8.348 on X and Y DF, p-value: 0.0006183
Suppose you wanted to test if the coefficient for doseamt is equal to 50. What t-value would you use for this test? A. 1.54 B. -0.948 C. 0.692 D. -0.882

-0.948. t-value = (41.33160 - 50) / 9.13907 = -8.6684 / 9.13907 = -0.9484991
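Testing a coefficient against a non-zero null value only changes the numerator of the usual statistic. The sketch below assumes a hypothetical helper named `wald_statistic` (not part of any library) and applies it to the doseamt card; the same formula gives the z-value for the concCu card that tests against -0.2.

```python
def wald_statistic(estimate, null_value, std_error):
    """Return (estimate - null_value) / std_error."""
    return (estimate - null_value) / std_error

# doseamt: H0 is beta = 50, values taken from the linear model summary
t_doseamt = wald_statistic(41.33160, 50.0, 9.13907)

# concCu: H0 is beta = -0.2, values taken from the logistic model summary
z_conc_cu = wald_statistic(-0.27483, -0.2, 0.01784)

print(round(t_doseamt, 3), round(z_conc_cu, 3))
```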

A study was conducted to measure the effect of a fungicide treatment on the survival rate of botrytis blight. Botrytis blight samples were divided into 20 groups, each consisting of about 100 samples, and exposed to different levels of chemicals in a fungicide. The output of a logistic regression model is below, where concS represents the concentration of sulfur in the fungicide and concCu represents the concentration of copper in the fungicide. Use it to answer the following multiple-choice questions.
    Call:
    glm(formula = cbind(Survived, Died) ~ concS + concCu, family = "binomial", data = data)
    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -9.5366  -2.4594   0.1223   3.9710   6.3566
    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  3.58770    0.22958   15.63   <2e-16 ***
    concS       -4.32735    0.26518  -16.32   <2e-16 ***
    concCu      -0.27483    0.01784  -15.40   <2e-16 ***
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    (Dispersion parameter for binomial family taken to be 1)
    Null deviance: 718.76 on 19 degrees of freedom
    Residual deviance: 299.43 on 17 degrees of freedom
    AIC: 363.53
Suppose you wanted to test if the coefficient for concCu is equal to -0.2. What z-value would you use for this test? A. 0.095 B. -0.073 C. -4.195 D. 1.411

-4.195. z-value = (-0.27483 - (-0.2)) / 0.01784 = -4.195

In the presence of near multicollinearity, the prediction will not be impacted.


In the presence of near multicollinearity, the regression coefficients will tend to be identified as statistically significant even if they are not.


A study was conducted to measure the effect of a fungicide treatment on the survival rate of botrytis blight. Botrytis blight samples were divided into 20 groups, each consisting of about 100 samples, and exposed to different levels of chemicals in a fungicide. The output of a logistic regression model is below, where concS represents the concentration of sulfur in the fungicide and concCu represents the concentration of copper in the fungicide. Use it to answer the following multiple-choice questions.
    Call:
    glm(formula = cbind(Survived, Died) ~ concS + concCu, family = "binomial", data = data)
    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -9.5366  -2.4594   0.1223   3.9710   6.3566
    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  3.58770    0.22958   15.63   <2e-16 ***
    concS       -4.32735    0.26518  -16.32   <2e-16 ***
    concCu      -0.27483    0.01784  -15.40   <2e-16 ***
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    (Dispersion parameter for binomial family taken to be 1)
    Null deviance: 718.76 on 19 degrees of freedom
    Residual deviance: 299.43 on 17 degrees of freedom
    AIC: 363.53
What is the probability of survival for a botrytis blight sample exposed to a sulfur concentration of 0.7 and a copper concentration of 0.9? A. 0.826 B. 0.674 C. 0.311 D. 0.577

0.577. exp(3.58770 - 4.32735*0.7 - 0.27483*0.9) / (1 + exp(3.58770 - 4.32735*0.7 - 0.27483*0.9)) = 0.577
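The probability is the inverse logit of the linear predictor. A minimal sketch, assuming the fitted coefficients from the summary are hard-coded and using an illustrative function name `survival_probability`:

```python
import math

def survival_probability(conc_s, conc_cu):
    """Inverse logit of the fitted linear predictor (coefficients from the glm summary)."""
    eta = 3.58770 - 4.32735 * conc_s - 0.27483 * conc_cu
    return 1.0 / (1.0 + math.exp(-eta))

p_survive = survival_probability(0.7, 0.9)
print(round(p_survive, 3))
```

Writing the result as 1 / (1 + exp(-eta)) is algebraically the same as exp(eta) / (1 + exp(eta)) used in the card.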

The following output was captured from the summary output of a simple linear regression model that relates the duration of an eruption with the waiting time since the previous eruption.
    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) -1.374016          A   -1.70 0.045141 *
    waiting      0.043714   0.011098       B 0.000052 ***
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    Residual standard error: 0.4965 on 270 degrees of freedom
    Multiple R-squared: 0.8115, Adjusted R-squared: 0.8108
    F-statistic: 1162 on 1 and 270 DF, p-value: < 2.2e-16
Using the table above, what is the standard error of the intercept, labeled A, and rounded to three decimal places? A. 2.336 B. 0.808 C. 0.806 D. -0.806 E. None of the above

0.808. See 1.4 Statistical Inference. Std.Err = Estimate / t-value = -1.374016 / -1.70 = 0.808
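Because t value = Estimate / Std. Error, any one of the three entries in a coefficient row can be recovered from the other two. A short sketch with illustrative variable names, covering both blanked cells (A and B) of the eruption table:

```python
# Row "(Intercept)": the standard error (A) is missing.
intercept_estimate = -1.374016
intercept_t = -1.70
std_error_a = intercept_estimate / intercept_t

# Row "waiting": the t value (B) is missing.
waiting_estimate = 0.043714
waiting_std_error = 0.011098
t_value_b = waiting_estimate / waiting_std_error

print(round(std_error_a, 3), round(t_value_b, 3))
```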


1 / (1 - R^2)

In the regression model, the variable of interest for study is the predicting variable.


Box-cox transformation

A common transformation is the power transformation $y^\lambda$, used to address departures from the normality and/or constant variance assumptions.

Influential points

A data point that is far from the mean of both x and y and changes the value of the estimated parameters significantly. It can change the statistical significance, the magnitude, and even the sign of the estimates.

A study was conducted to measure the effect of a fungicide treatment on the survival rate of botrytis blight. Botrytis blight samples were divided into 20 groups, each consisting of about 100 samples, and exposed to different levels of chemicals in a fungicide. The output of a logistic regression model is below, where concS represents the concentration of sulfur in the fungicide and concCu represents the concentration of copper in the fungicide. Use it to answer the following multiple-choice questions.
    Call:
    glm(formula = cbind(Survived, Died) ~ concS + concCu, family = "binomial", data = data)
    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -9.5366  -2.4594   0.1223   3.9710   6.3566
    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  3.58770    0.22958   15.63   <2e-16 ***
    concS       -4.32735    0.26518  -16.32   <2e-16 ***
    concCu      -0.27483    0.01784  -15.40   <2e-16 ***
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    (Dispersion parameter for binomial family taken to be 1)
    Null deviance: 718.76 on 19 degrees of freedom
    Residual deviance: 299.43 on 17 degrees of freedom
    AIC: 363.53
The p-value for testing the overall regression can be obtained from which of the following? A. 1-pchisq(718.76,19) B. 1-pchisq(419.33,2) C. 1-pchisq(363.53,3) D. 1-pchisq(299.43,17)

1-pchisq(419.33,2) The chi-squared test statistic is the difference between the null deviance (718.76) and the residual deviance (299.43), which is 419.33. The degrees of freedom is the difference between the null deviance degrees of freedom (19) and the residual deviance degrees of freedom (17), which is 2.
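The deviance test can be reproduced without R. The sketch below is in Python and relies on one fact specific to this case: for 2 degrees of freedom the chi-squared upper-tail probability has the closed form exp(-x/2), so no statistics library is needed. For other degrees of freedom a library routine such as `scipy.stats.chi2.sf` would be the usual choice; that is mentioned as an alternative, not used here.

```python
import math

null_deviance, null_df = 718.76, 19
residual_deviance, residual_df = 299.43, 17

# Test statistic and degrees of freedom for the overall regression test
statistic = null_deviance - residual_deviance
df = null_df - residual_df

# Closed form valid only for df == 2: P(chi2_2 > x) = exp(-x / 2)
if df != 2:
    raise ValueError("closed form below assumes exactly 2 degrees of freedom")
p_value = math.exp(-statistic / 2.0)

print(round(statistic, 2), df, p_value)
```

The p-value is vanishingly small, so the null hypothesis that both slope coefficients are zero is rejected.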

The objective of multiple linear regression is

1. To predict future new responses. 2. To model the association of explanatory variables to a response variable, accounting for controlling factors. 3. To test hypotheses using statistical inference on the model.

You were hired to consult on a study for the attendance behavior of high school students at two different schools. The data set you were given contains, for each of 316 students: the number of days he/she was absent in an academic year (daysabs), his/her math scores (math), his/her language arts scores (langarts), and whether the student is male or female (1 = male, 0 = female). A Poisson regression model was fitted to evaluate the relationship between the number of days of absence in an academic year and all the predictors. The R output for the model summary is as follows:
    Coefficient   Estimate        SE  z value  Pr(>|z|)
    (Intercept)   2.687666  0.072651   36.994    <2e-16
    math         -0.003523  0.001821   -1.934    0.0531
    langarts     -0.012152  0.001835   -6.623  3.52e-11
    male         -0.400921  0.048412   -8.281    <2e-16
Also, assume the average language arts score (across all students) is 50, and the average math score (across all students) is 45.5. For students with average math and language arts scores, how many more days on average are female students absent compared to their male counterparts? A. 4.8545 B. 3.5729 C. 2.2525 D. 0.6697

2.2525. The fitted rate is $\lambda(X_{math}, X_{langarts}, X_{male}) = e^{2.687666 - 0.003523 X_{math} - 0.012152 X_{langarts} - 0.400921 X_{male}}$.
Female: $\lambda(45.5, 50, 0) = e^{2.687666 - 0.003523(45.5) - 0.012152(50) - 0.400921(0)} = 6.819386$.
Male: $\lambda(45.5, 50, 1) = e^{2.687666 - 0.003523(45.5) - 0.012152(50) - 0.400921(1)} = 4.566963$.
Difference: $\lambda(45.5, 50, 0) - \lambda(45.5, 50, 1) = 2.252423$.
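The Poisson calculations follow one pattern: exponentiate the linear predictor, then compare. A minimal sketch with the fitted coefficients hard-coded; the function name `expected_absences` is illustrative.

```python
import math

def expected_absences(math_score, langarts, male):
    """Fitted Poisson mean: exp of the linear predictor from the model summary."""
    eta = (2.687666
           - 0.003523 * math_score
           - 0.012152 * langarts
           - 0.400921 * male)
    return math.exp(eta)

# Students with average scores (math = 45.5, langarts = 50)
female_rate = expected_absences(45.5, 50, 0)
male_rate = expected_absences(45.5, 50, 1)
difference = female_rate - male_rate

# Female student with math = 50 and langarts = 48 (the related card)
single_case = expected_absences(50, 48, 0)

print(round(female_rate, 4), round(male_rate, 4), round(difference, 4), round(single_case, 4))
```

Note that the gender effect is multiplicative on the rate (male rate = female rate times exp(-0.400921)), so the difference in days depends on the other predictors' values.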

An experiment was conducted to determine the effect of gamma radiation on the numbers of chromosomal abnormalities observed in cells. A multiple linear regression model was fitted to estimate the effect of the number of cells, amount of the radiation dose (Grays), and the rate of the radiation dose (Grays/hour) on the number of chromosomal abnormalities observed. The data frame has 27 observations. Here is the model summary and Cook's Distance plot.
    Coefficient    Estimate        SE  t-value  Pr(>|t|)
    (Intercept)   -74.15392  42.24544   -1.755  0.092518
    cells           0.06871   0.02196    3.129  0.004709 **
    doseamt        41.33160   9.13907    4.523  0.000153 ***
    doserate       20.28402   8.29071    2.447  0.022482 *
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    Residual standard error: 54.05 on X degrees of freedom
    Multiple R-squared: 0.5213, Adjusted R-squared: 0.4588
    F-statistic: 8.348 on X and Y DF, p-value: 0.0006183
For an F-test of overall significance of the regression model, what degrees of freedom would be used? A. 3, 24 B. 2, 27 C. 3, 23 D. 2, 23

3, 23 The numerator degrees of freedom (ndf) is equal to p and the denominator degrees of freedom (ddf) is equal to n-p-1, where n: number of observations and p: number of predictors. Hence, ndf = 3 and ddf = 27-3-1 = 23
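The degrees of freedom can be cross-checked against the reported adjusted R-squared, which depends on the same n - p - 1. A small sketch with illustrative variable names; the adjusted R-squared formula 1 - (1 - R^2)(n - 1)/(n - p - 1) is the standard one and is not quoted in the card itself.

```python
n = 27          # observations
p = 3           # predictors: cells, doseamt, doserate

ndf = p             # numerator degrees of freedom
ddf = n - p - 1     # denominator degrees of freedom

# Consistency check against the summary: adjusted R-squared uses the same ddf
r_squared = 0.5213
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / ddf

print(ndf, ddf, round(adj_r_squared, 4))
```

The recomputed value agrees with the reported 0.4588 up to rounding of the printed R-squared.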

Which one is correct? A. The prediction intervals need to be corrected for simultaneous inference when multiple predictions are made jointly. B. The prediction intervals are centered at the predicted value. C. The sampling distribution of the prediction of a new response is a t-distribution. D. All of the above.

3.2 - Knowledge Check 3

The following output was captured from the summary output of a simple linear regression model that relates the duration of an eruption with the waiting time since the previous eruption.
    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) -1.374016          A   -1.70 0.045141 *
    waiting      0.043714   0.011098       B 0.000052 ***
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    Residual standard error: 0.4965 on 270 degrees of freedom
    Multiple R-squared: 0.8115, Adjusted R-squared: 0.8108
    F-statistic: 1162 on 1 and 270 DF, p-value: < 2.2e-16
Using the table above, what is the t-value of the coefficient for waiting, labeled B, and rounded to three decimal places? A. 3.939 B. 3.931 C. 3.935 D. None of the above

3.939. See 1.4 Statistical Inference. t-value = Estimate / Std.Err = 0.043714 / 0.011098 = 3.939

You were hired to consult on a study for the attendance behavior of high school students at two different schools. The data set you were given contains, for each of 316 students: the number of days he/she was absent in an academic year (daysabs), his/her math scores (math), his/her language arts scores (langarts), and whether the student is male or female (1 = male, 0 = female). A Poisson regression model was fitted to evaluate the relationship between the number of days of absence in an academic year and all the predictors. The R output for the model summary is as follows:
    Coefficient   Estimate        SE  z value  Pr(>|z|)
    (Intercept)   2.687666  0.072651   36.994    <2e-16
    math         -0.003523  0.001821   -1.934    0.0531
    langarts     -0.012152  0.001835   -6.623  3.52e-11
    male         -0.400921  0.048412   -8.281    <2e-16
Also, assume the average language arts score (across all students) is 50, and the average math score (across all students) is 45.5. What is the expected number of days missed for a female student with a langarts of 48 and a math score of 50 based on the model? A. 6.8773 B. 1.9106 C. 6.6363 D. 4.5251

6.8773. The fitted rate is $\lambda(X_{math}, X_{langarts}, X_{male}) = e^{2.687666 - 0.003523 X_{math} - 0.012152 X_{langarts} - 0.400921 X_{male}}$, so $\lambda(50, 48, 0) = e^{2.687666 - 0.003523(50) - 0.012152(48) - 0.400921(0)} = 6.877258$.

An experiment was conducted to determine the effect of gamma radiation on the numbers of chromosomal abnormalities observed in cells. A multiple linear regression model was fitted to estimate the effect of the number of cells, amount of the radiation dose (Grays), and the rate of the radiation dose (Grays/hour) on the number of chromosomal abnormalities observed. The data frame has 27 observations. Here is the model summary and Cook's Distance plot.
    Coefficient    Estimate        SE  t-value  Pr(>|t|)
    (Intercept)   -74.15392  42.24544   -1.755  0.092518
    cells           0.06871   0.02196    3.129  0.004709 **
    doseamt        41.33160   9.13907    4.523  0.000153 ***
    doserate       20.28402   8.29071    2.447  0.022482 *
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    Residual standard error: 54.05 on X degrees of freedom
    Multiple R-squared: 0.5213, Adjusted R-squared: 0.4588
    F-statistic: 8.348 on X and Y DF, p-value: 0.0006183
Calculate the Sum of Squared Regression from the model summary. A. 17,484.25 B. 73,163.60 C. 67,181.18 D. 55,284.40

73,163.60. You could calculate this value in several ways; this is one possible way. $F_{stat} = MSReg / MSE = (SSReg / p) / MSE$, so $SSReg = F_{stat} \cdot MSE \cdot p = (8.348)(54.05^2)(3)$ = 73,163.60, where MSE is the square of the residual standard error.
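The calculation can be verified numerically, and the result cross-checked against the reported R-squared. This is a sketch with illustrative variable names; the check SSReg / (SSReg + SSE) = R^2 is a standard identity added here for verification and is not part of the original answer.

```python
f_stat = 8.348          # overall F-statistic from the summary
resid_std_error = 54.05 # residual standard error from the summary
n, p = 27, 3

mse = resid_std_error ** 2          # mean squared error
ss_reg = f_stat * mse * p           # SSReg = F * MSE * p

# Cross-check: SSE = MSE * (n - p - 1), and R^2 = SSReg / (SSReg + SSE)
sse = mse * (n - p - 1)
r_squared_check = ss_reg / (ss_reg + sse)

print(round(ss_reg, 2), round(r_squared_check, 4))
```

The recovered R-squared is close to the reported 0.5213, which supports the choice of 73,163.60.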

A study was conducted to measure the effect of a fungicide treatment on the survival rate of botrytis blight. Botrytis blight samples were divided into 20 groups, each consisting of about 100 samples, and exposed to different levels of chemicals in a fungicide. The output of a logistic regression model is below, where concS represents the concentration of sulfur in the fungicide and concCu represents the concentration of copper in the fungicide. Use it to answer the following multiple-choice questions.
    Call:
    glm(formula = cbind(Survived, Died) ~ concS + concCu, family = "binomial", data = data)
    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -9.5366  -2.4594   0.1223   3.9710   6.3566
    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  3.58770    0.22958   15.63   <2e-16 ***
    concS       -4.32735    0.26518  -16.32   <2e-16 ***
    concCu      -0.27483    0.01784  -15.40   <2e-16 ***
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    (Dispersion parameter for binomial family taken to be 1)
    Null deviance: 718.76 on 19 degrees of freedom
    Residual deviance: 299.43 on 17 degrees of freedom
    AIC: 363.53
Interpret the coefficient for concCu. A. A 1-unit increase in the concentration of copper decreases the odds of botrytis blight surviving by 0.27483, holding sulfur constant. B. A 1-unit increase in the concentration of copper decreases the number of samples of botrytis blight surviving by 0.27483, holding sulfur constant. C. A 1-unit increase in the concentration of copper decreases the log odds of botrytis blight surviving by 0.27483, holding sulfur constant. D. A 1-unit increase in the concentration of copper decreases the probability of botrytis blight surviving by 0.27483, holding sulfur constant.

A 1-unit increase in the concentration of copper decreases the log odds of botrytis blight surviving by 0.27483, holding sulfur constant.

Assuming that the data are normally distributed, the estimated variance has the following sampling distribution under the simple linear model: A. Chi-square with n-2 degrees of freedom B. T-distribution with n-2 degrees of freedom C. Chi-square with n degrees of freedom D. T-distribution with n degrees of freedom

A. Chi-square with n-2 degrees of freedom 1.1 - Knowledge Check 1

The estimators of the linear regression model are derived by: A. Minimizing the sum of squared differences between observed and expected values of the response variable. B. Maximizing the sum of squared differences between observed and expected values of the response variable. C. Minimizing the sum of absolute differences between observed and expected values of the response variable. D. Maximizing the sum of absolute differences between observed and expected values of the response variable.

A. Minimizing the sum of squared differences between observed and expected values of the response variable. 1.1 - Knowledge Check 1

Which one is correct? A. The regression coefficients can be estimated only if the predicting variables are not linearly dependent. B. The estimated regression coefficient $\hat{\beta}_i$ is interpreted as the change in the response variable associated with one unit of change in the i-th predicting variable. C. The estimated regression coefficients will be the same under marginal and conditional model, only their interpretation is not. D. Causality is the same as association in interpreting the relationship between the response and the predicting variables.

A. The regression coefficients can be estimated only if the predicting variables are not linearly dependent. 3.1 - Knowledge Check 1

The pooled variance estimator is: A. The variance estimator assuming equal variances. B. The variance estimator assuming equal means and equal variances. C. The sample variance estimator assuming equal means. D. None of the above.

A. The variance estimator assuming equal variances. 2.1 - Knowledge Check 1

The mean squared errors (MSE) measures: A. The within-treatment variability. B. The between-treatment variability. C. The sum of the within-treatment and between-treatment variability. D. None of the above.

A. The within-treatment variability. 2.1 - Knowledge Check 2

The objective of the residual analysis is A. To evaluate goodness of fit B. To evaluate whether the means are equal. C. To evaluate whether only the normality assumptions holds. D. None of the above.

A. To evaluate goodness of fit 2.2 - Knowledge Check 3

We detect departure from the assumption of constant variance A. When the residuals increase as the fitted values increase also. B. When the residuals vs fitted are scattered randomly around the zero line. C. When the histogram does not have a symmetric shape. D. All of the above.

A. When the residuals increase as the fitted values increase also. 1.3 - Knowledge Check 4

The sampling distribution of $\hat{\beta}_0$ is a: A. t-distribution B. chi-squared distribution C. normal distribution D. None of the above

A. t-distribution. See 1.4 Statistical Inference. Under the normality assumption, $\hat{\beta}_0$ is normally distributed when the error variance is known. Because the variance is unknown and replaced by its estimate, the sampling distribution used for inference on $\hat{\beta}_0$ is the t-distribution.

How can we diagnose multicollinearity?

Compute the Variance Inflation Factor (VIF) for each predicting variable; a large VIF indicates that the variable is (nearly) linearly dependent on the other predictors.
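As a rough illustration of how the VIF behaves, the sketch below evaluates 1 / (1 - R_j^2), where R_j^2 is assumed to be the R-squared obtained by regressing the j-th predictor on all the other predictors. The helper name `vif` and the sample R-squared values are illustrative, not taken from any data set in these cards.

```python
def vif(r_squared_j):
    """Variance inflation factor given R^2 from regressing predictor j on the others."""
    if not 0.0 <= r_squared_j < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared_j)

# Hypothetical R^2 values, from no collinearity to near-perfect collinearity
for r2 in (0.0, 0.5, 0.9, 0.99):
    print(r2, round(vif(r2), 2))
```

The VIF equals 1 when a predictor is uncorrelated with the others and grows without bound as R_j^2 approaches 1.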


Any data point that is far from the majority of the data in both x and y.

The estimated versus predicted regression line for a given x*: A. Have the same variance B. Have the same expectation C. Have the same variance and expectation D. None of the above

B. Have the same expectation 1.2 - Knowledge Check 3

The fitted values are defined as: A. The difference between observed and expected responses. B. The regression line with parameters replaced with the estimated regression coefficients. C. The regression line. D. The response values.

B. The regression line with parameters replaced with the estimated regression coefficients. 1.1 - Knowledge Check 1

The total sum of squares divided by N-1 is A. The mean sum of squared errors B. The sample variance estimator assuming equal means and equal variances C. The sample variance estimator assuming equal variances. D. None of the above.

B. The sample variance estimator assuming equal means and equal variances 2.1 - Knowledge Check 2

The objective of the pairwise comparison is A. To find which means are equal. B. To identify the statistically significantly different means. C. To find the estimated means which are greater or lower than other. D. None of the above.

B. To identify the statistically significantly different means. 2.2 - Knowledge Check 3

To test if a coefficient is less than a critical value, C, we conduct a one-sided test on the _________ tail of a ___________ distribution. A. left, normal B. left, t C. right, normal D. right, t E. None of the above

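B. left, t. See 1.4 Statistical Inference: "For $\beta_1$ greater than zero we're interested on the right tail of the distribution of the $\hat{\beta}_1$." By the same reasoning, testing whether a coefficient is less than C uses the left tail.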

The F-test is a _________ tailed test with ______ and ______ degrees of freedom. A. one, k, N-1 B. one, k-1, N-k C. two, k-1, N-k D. two, k, N-1 E. None of the above.

B. one, k-1, N-k See 2.4 Test for Equal Means The F-test is a one tailed test that has two degrees of freedom, namely k − 1 and N − k.

Which is correct? A. If we reject the test of equal means, we conclude that all treatment means are not equal. B. If we do not reject the test of equal means, we conclude that means are definitely all equal C. If we reject the test of equal means, we conclude that some treatment means are not equal. D. None of the above.

C. If we reject the test of equal means, we conclude that some treatment means are not equal. 2.1 - Knowledge Check 2

The assumption of normality: A. It is needed for deriving the estimators of the regression coefficients. B. It is not needed for linear regression modeling and inference. C. It is needed for the sampling distribution of the estimators of the regression coefficients and hence for inference. D. It is needed for deriving the expectation and variance of the estimators of the regression coefficients.

C. It is needed for the sampling distribution of the estimators of the regression coefficients and hence for inference. 1.2 - Knowledge Check 2

Which one is correct? A. A multiple linear regression model with p predicting variables but no intercept has p model parameters. B. The interpretation of the regression coefficients is the same whether or not interaction terms are included in the model. C. Multiple linear regression is a general model encompassing both ANOVA and simple linear regression. D. None of the above.

C. Multiple linear regression is a general model encompassing both ANOVA and simple linear regression. 3.1 - Knowledge Check 1

Which one is correct? A. Independence assumption can be assessed using the residuals vs fitted values. B. Independence assumption can be assessed using the normal probability plot. C. Residual analysis can be used to assess uncorrelated errors. D. None of the above

C. Residual analysis can be used to assess uncorrelated errors. 1.3 - Knowledge Check 4

The variability in the prediction comes from: A. The variability due to a new measurement. B. The variability due to estimation C. The variability due to a new measurement and due to estimation. D. None of the above.

C. The variability due to a new measurement and due to estimation. 1.2 - Knowledge Check 3
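
As a worked illustration for the simple linear regression case, the two sources of variability can be written out as below. This is a sketch under the usual model assumptions; the symbols x*, y*, and S_xx are notation introduced here, not taken from the card. The estimation variance appears in both expressions, and the prediction variance carries one extra σ² term for the new measurement.

```latex
\operatorname{Var}\left(\hat{y}^{*}\right)
  = \sigma^2\left(\frac{1}{n} + \frac{(x^{*}-\bar{x})^2}{S_{xx}}\right),
\qquad
\operatorname{Var}\left(y^{*}-\hat{y}^{*}\right)
  = \sigma^2\left(1 + \frac{1}{n} + \frac{(x^{*}-\bar{x})^2}{S_{xx}}\right),
\qquad
S_{xx} = \sum_{i=1}^{n}(x_i-\bar{x})^2
```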

The alternative hypothesis of ANOVA can be stated as, A. the means of all pairs of groups are different B. the means of all groups are equal C. the means of at least one pair of groups is different D. None of the above

C. the means of at least one pair of groups is different See 2.4 Test for Equal Means "Using the hypothesis testing procedure for equal means, we test: The null hypothesis, which that the means are all equal (mu 1 = mu 2...=mu k) versus the alternative hypothesis, that some means are different. Not all means have to be different for the alternative hypothesis to be true -- at least one pair of the means needs to be different."

When do we use transformations? A) If the linearity assumption with respect to one or more predictors does not hold, then we use transformations of the corresponding predictors to improve on this assumption. B) If the normality assumption does not hold, we transform the response variable, commonly using the Box-Cox transformation. C) If the constant variance assumption does not hold, we transform the response variable. D) All of the above.


Which one is correct? A) The residuals have constant variance for the multiple linear regression model. B) The residuals vs. fitted can be used to assess the assumption of independence. C) The residuals have a t-distribution if the error term is assumed to have a normal distribution. D) None of the above.


Assuming that the data are normally distributed, under the simple linear model, the estimated variance has the following sampling distribution:

Chi-square with n-2 degrees of freedom

You were hired to consult on a study of the attendance behavior of high school students at two different schools. The data set you were given contains, for each of 316 students: the number of days he/she was absent in an academic year (daysabs), his/her math score (math), his/her language arts score (langarts), and whether the student is male or female (1 = male, 0 = female). A Poisson regression model was fitted to evaluate the relationship between the number of days of absence in an academic year and all the predictors. The R output for the model summary is as follows:
Coefficient   Estimate    SE        z value  Pr(>|z|)
(Intercept)    2.687666   0.072651  36.994   <2e-16
math          -0.003523   0.001821  -1.934   0.0531
langarts      -0.012152   0.001835  -6.623   3.52e-11
male          -0.400921   0.048412  -8.281   <2e-16
Also, assume the average language arts score (across all students) is 50, and the average math score (across all students) is 45.5. The approximated distribution of the residual deviance is ____ with ____ degrees of freedom.
Normal, 315
Chi-squared, 312
Chi-squared, 315
t, 312

Chi-squared, 312 The approximated distribution of the residual deviance is Chi-square with n-p-1 degrees of freedom. In this example n = 316 and p = 3 ; Hence df= 312.

Plotting the residuals versus fitted values checks for which assumption?

Constant variance and uncorrelated errors (the plot can reveal correlated errors, but it cannot establish independence)

What can we use to identify outliers?

Cook's Distance

In evaluating a multiple linear model: A) The F test is used to evaluate the overall regression. B) The coefficient of variation is interpreted as the percentage of variability in the response variable explained by the model. C) Residual analysis is used for goodness of fit assessment. D) All of the above.


In the presence of near multicollinearity: A) The coefficient of variation decreases. B) The regression coefficients will tend to be identified as statistically significant even if they are not. C) The prediction will not be impacted. D) None of the above.


In evaluating a simple linear model A. There is a direct relationship between the coefficient of determination and the correlation between the predicting and response variables. B. The coefficient of determination is interpreted as the percentage of variability in the response variable explained by the model. C. Residual analysis is used for goodness of fit assessment. D. All of the above.

D. All of the Above 1.3 - Knowledge Check 4

In evaluating a multiple linear model, A. The F test is used to evaluate the overall regression. B. The coefficient of determination is interpreted as the percentage of variability in the response variable explained by the model. C. Residual analysis is used for goodness of fit assessment. D. All of the above.

D. All of the Above 3.3 - Knowledge Check 4

When do we use transformations? A. If the linearity assumption with respect to one or more predictors does not hold, then we use transformations of the corresponding predictors to improve on this assumption. B. If the normality assumption does not hold, we transform the response variable, commonly using the Box-Cox transformation. C. If the constant variance assumption does not hold, we transform the response variable. D. All of the above.

D. All of the Above 3.3 - Knowledge Check 4

The objective of multiple linear regression is A. To predict future new responses B. To model the association of explanatory variables to a response variable accounting for controlling factors. C. To test hypothesis using statistical inference on the model. D. All of the above.

D. All of the above. 3.1 - Knowledge Check 1

The sampling distribution of the estimated regression coefficients is A. Centered at the true regression parameters. B. The t-distribution assuming that the variance of the error term is unknown and replaced by its estimate. C. Dependent on the design matrix. D. All of the above.

D. All of the above. 3.2 - Knowledge Check 2

In the presence of near multicollinearity, A. The coefficient of determination decreases. B. The regression coefficients will tend to be identified as statistically significant even if they are not. C. The prediction will not be impacted. D. None of the above.

D. None of the Above 3.3 - Knowledge Check 4

Which one is correct? A. The residuals have constant variance for the multiple linear regression model. B. The residuals vs fitted can be used to assess the assumption of independence. C. The residuals have a t-distribution if the error term is assumed to have a normal distribution. D. None of the above.

D. None of the Above 3.3 - Knowledge Check 4

Which one is correct? A. If a departure from normality is detected, we transform the predicting variable to improve upon the normality assumption. B. If a departure from the independence assumption is detected, we transform the response variable to improve upon this assumption. C. The Box-Cox transformation is commonly used to improve upon the linearity assumption. D. None of the above

D. None of the above. 1.3 - Knowledge Check 4

Which are all the model parameters in ANOVA? A. The means of the k populations. B. The sample means of the k populations. C. The sample means of the k samples. D. None of the above.

D. None of the above. 2.1 - Knowledge Check 1

Which one correctly characterizes the sampling distribution of the estimated variance? A. The estimated variance of the error term has a χ 2 distribution regardless of the distribution assumption of the error terms. B. The number of degrees of freedom for the χ 2 distribution of the estimated variance is n-p-1 for a model without intercept. C. The sampling distribution of the mean squared error is different from that of the estimated variance. D. None of the above.

D. None of the above. 3.1 - Knowledge Check 1

We can test for a subset of regression coefficients A. Using the F statistic test of the overall regression. B. Only if we are interested whether additional explanatory variables should be considered in addition to the controlling variables. C. To evaluate whether all regression coefficients corresponding to the predicting variables excluded from the reduced model are statistically significant. D. None of the above.

D. None of the above. 3.2 - Knowledge Check 2

The estimators for the regression coefficients are: A. Biased but with small variance B. Biased with large variance C. Unbiased under normality assumptions but biased otherwise. D. Unbiased regardless of the distribution of the data.

D. Unbiased regardless of the distribution of the data. 1.2 - Knowledge Check 2

The estimators for the regression coefficients are: A. Biased but with small variance B. Unbiased under normality assumptions but biased otherwise. C. Biased regardless of the distribution of the data. D. Unbiased regardless of the distribution of the data.

D. Unbiased regardless of the distribution of the data. 3.2 - Knowledge Check 2

Leverage points

Data points that are far from the mean of the x's

You were hired to consult on a study of the attendance behavior of high school students at two different schools. The data set you were given contains, for each of 316 students: the number of days he/she was absent in an academic year (daysabs), his/her math score (math), his/her language arts score (langarts), and whether the student is male or female (1 = male, 0 = female). A Poisson regression model was fitted to evaluate the relationship between the number of days of absence in an academic year and all the predictors. The R output for the model summary is as follows:
Coefficient   Estimate    SE        z value  Pr(>|z|)
(Intercept)    2.687666   0.072651  36.994   <2e-16
math          -0.003523   0.001821  -1.934   0.0531
langarts      -0.012152   0.001835  -6.623   3.52e-11
male          -0.400921   0.048412  -8.281   <2e-16
Also, assume the average language arts score (across all students) is 50, and the average math score (across all students) is 45.5. How does an increase of 1 unit in langarts affect the expected number of days missed, given that the other predictors in the model are held constant?
Increase by 0.012152 days
Increase by 0.9879 days
Increase by 1.22%
Decrease by 1.21%

Decrease by 1.21% The estimated coefficient for langarts is -0.012152. A one unit increase in langarts multiplies the expected number of days missed by e^(−0.012152) = 0.9879215. In percentage terms, the expected number of days missed decreases by 1.21% (1 − 0.9879215 = 0.0121), given that the other predictors in the model are held constant.
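
The arithmetic behind this answer can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the coefficient is copied from the model summary in the question.

```python
import math

# Estimated coefficient for langarts, copied from the model summary.
beta_langarts = -0.012152

# With a log link, a one-unit increase in a predictor multiplies the
# expected count by exp(beta), holding the other predictors fixed.
rate_ratio = math.exp(beta_langarts)

# Percentage change in the expected number of days absent.
percent_change = (rate_ratio - 1.0) * 100.0

print(f"rate ratio: {rate_ratio:.7f}")
print(f"percent change: {percent_change:.2f}%")
```

A negative percent change corresponds to a decrease in the expected count.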

A data point far from the mean of the x's and y's is always: A. an influential point and an outlier B. a leverage point but not an outlier C. an outlier and a leverage point D. an outlier but not a leverage point E. None of the above

E. None of the above. See 1.9 Outliers and Model Evaluation. We only know that the data point is far from the mean of the x's and y's. Being far from the mean of the x's makes it a leverage point, so we can eliminate the answers that do not include a leverage point. That leaves "a leverage point but not an outlier" and "an outlier and a leverage point". We do not have enough information to know whether the point is an outlier, so neither of these is always true, and none of the answers applies.

T/F: If the VIF for each predicting variable is smaller than a certain threshold, then we can say that multicollinearity does not exist in this model.


T/F: In multiple linear regression, a VIF value of 6 for a predictor means that 80% of the variation in that predictor can be modeled by the other predictors.


T/F: In multiple linear regression, if the coefficient of a quantitative predicting variable is negative, that means the response variable will decrease as this predicting variable increases.


T/F: In multiple linear regression, we need the linearity assumption to hold for at least one of the predicting variables


T/F: Multicollinearity in multiple linear regression means that the rows in the design matrix are (nearly) linearly dependent.


T/F: The coefficient of variation is used to evaluate goodness-of-fit.


T/F: The prediction of the response variable and the estimation of the mean response have the same interpretation.


T/F: The prediction of the response variable has the same levels of uncertainty compared with the estimation of the mean response.


T/F: The sampling distribution of the prediction of the response variable is a χ 2(chi-squared) distribution.


T/F: A linear regression model has high predictive power if the coefficient of determination is close to 1.


A multiple linear regression model with p predicting variables but no intercept has p model parameters.


Causality is the same as association in interpreting the relationship between the response and the predicting variables.


For a multiple regression model, both the true errors ϵ and the estimated residuals ϵ ^ have a constant mean and a constant variance.


For estimating confidence intervals for the regression coefficients, the sampling distribution used is a normal distribution.


For testing if a regression coefficient is zero, the normal test can be used.


For the model y = β0 + β1 x1 + ... + βp xp + ϵ, where ϵ ∼ N(0, σ^2), there are p+1 parameters to be estimated.


Given a categorical predictor with 4 categories in a linear regression model with intercept, 4 dummy variables need to be included in the model.


If a departure from normality is detected, we transform the predicting variable to improve upon the normality assumption.


If a departure from the independence assumption is detected, we transform the response variable to improve upon this assumption.


If one confidence interval in the pairwise comparison does not include zero, we conclude that the two means are plausibly equal.


If the confidence interval for a regression coefficient contains the value zero, we interpret that the regression coefficient is definitely equal to zero.


If the non-constant variance assumption does not hold in multiple linear regression, we apply a transformation to the predicting variables.


If the p-value of the overall F-test is close to 0, we can conclude all the predicting variable coefficients are significantly nonzero.


If we do not reject the test of equal means, we conclude that means are definitely all equal


If we reject the test of equal means, we conclude that all treatment means are not equal.


In a multiple linear regression model with quantitative predictors, the coefficient corresponding to one predictor is interpreted as the estimated expected change in the response variable when there is a one unit change in that predictor.


In linear regression, outliers do not impact the estimation of the regression coefficients.


In the ANOVA, the number of degrees of freedom of the chi-squared distribution for the variance estimator is N-k-1 where k is the number of groups.



In the simple linear regression model, we lose three degrees of freedom because of the estimation of the three model parameters β 0 , β 1 , σ 2 .


Independence assumption can be assessed using the normal probability plot.


Independence assumption can be assessed using the residuals vs fitted values.



Observational studies allow us to make causal inference.


One-way ANOVA is a linear regression model with more than one qualitative predicting variables.


Only the log-transformation of the response variable can be used when the normality assumption does not hold.


Only the log-transformation of the response variable should be used when the normality assumption does not hold.


Prediction is the only objective of multiple linear regression.


Suppose x1 was not found to be significant in the model specified with lm(y ~ x1 + x2 + x3). Then x1 will also not be significant in the model lm(y ~ x1 + x2).


The Box-Cox transformation is commonly used to improve upon the linearity assumption.


The F-test can be used to evaluate the relationship between two qualitative variables.


The causation effect of a predicting variable to the response variable can be captured using multiple linear regression, conditional of other predicting variables in the model.


The causation of a predicting variable to the response variable can be captured using Multiple linear regression, conditional of other predicting variables in the model.


The constant variance assumption is diagnosed by plotting the predicting variable vs. the response variable.


The constant variance assumption is diagnosed using the quantile-quantile normal plot.


The estimated regression coefficient β ^ i is interpreted as the change in the response variable associated with one unit of change in the i-th predicting variable.


The estimated regression coefficients will be the same under marginal and conditional model, only their interpretation is not.


The estimated variance of the error term has a \chi^2 distribution regardless of the distribution assumption of the error terms.



The estimator σ ^ 2 is a fixed variable.


The interpretation of the regression coefficients is the same whether or not interaction terms are included in the model.


The means of the k populations is a model parameter in ANOVA.


The number of degrees of freedom for the \chi^2 distribution of the estimated variance is n-p-1 for a model without intercept.


The number of degrees of freedom of the χ 2 (chi-square) distribution for the pooled variance estimator is N − k + 1 where k is the number of samples.


The number of degrees of freedom of the χ 2 (chi-square) distribution for the variance estimator is N − k + 1 where k is the number of samples.


The only assumptions for a linear regression model are linearity, constant variance, and normality.


The only assumptions for a simple linear regression model are linearity, constant variance, and normality.


The regression coefficient corresponding to one predictor is interpreted in a multiple regression in terms of the estimated expected change in the response variable when there is a change of one unit in the corresponding predicting variable.


The regression coefficient is used to measure the linear dependence between two variables.


The residuals have a t-distribution if the error term is assumed to have a normal distribution.


The residuals have constant variance for the multiple linear regression model.


The residuals vs fitted can be used to assess the assumption of independence.


The sample means of the k populations is a model parameter in ANOVA.


The sample means of the k samples is a model parameter in ANOVA.


The sampling distribution for the variance estimator in ANOVA is χ 2 (chi-square) with N - k degrees of freedom.


The sampling distribution for the variance estimator in ANOVA is χ 2 (chi-square) regardless of the assumptions of the data.


The sampling distribution of the mean squared error is different from that of the estimated variance.


The statistical inference for linear regression under normality relies on large size of sample data.


There are four assumptions needed for estimation with multiple linear regression: mean zero, constant variance, independence, and normality.


We can test for a subset of regression coefficients only if we are interested whether additional explanatory variables should be considered in addition to the controlling variables.


We can test for a subset of regression coefficients to evaluate whether all regression coefficients corresponding to the predicting variables excluded from the reduced model are statistically significant.


We can test for a subset of regression coefficients using the F statistic test of the overall regression.


We cannot estimate a multiple linear regression model if the predicting variables are linearly independent.


We do not need to assume normality of the response variable for making inference on the regression coefficients.


β 1 is an unbiased estimator for β 0 .


The log-likelihood function is a linear function with a closed-form solution.

False Maximizing the log-likelihood function with respect to the coefficients in closed form expression is not possible because the log-likelihood function is non-linear.

The sampling distribution for the variance estimator in simple linear regression is χ 2 (chi-squared) regardless of the assumptions of the data. (T/F)

False See 1.2 Estimation Method "The sampling distribution of the estimator of the variance is chi-squared, with n - 2 degrees of freedom (more on this in a moment). This is under the assumption of normality of the error terms."
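
Written out, the statement quoted above reads as follows. This is a sketch in standard notation, and it holds under the normality assumption on the error terms; strictly speaking, it is the scaled estimator that follows the chi-squared distribution.

```latex
\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^{n}\hat{\epsilon}_i^{\,2},
\qquad
\frac{(n-2)\,\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2}
```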

We assess the constant variance assumption by plotting the error terms, ϵ i, against fitted values. (T/F)

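False See 1.2 Estimation Method "We use ϵ ^ i as proxies for the deviances or the error terms. We don't have the deviances because we don't have β 0 and β 1." The error terms ϵ i are unobserved, so we plot the residuals ϵ ^ i against the fitted values instead.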

The simple linear regression coefficient, β ^ 0, is used to measure the linear relationship between the predicting and response variables. (T/F)

False See 1.2 Estimation Method β ^ 0 is the intercept and does not tell us about the relationship between the predicting and response variables.

β ^ 1 is an unbiased estimator for β 0.

False See 1.4 Statistical Inference "What that means is that β ^ 1 is an unbiased estimator for β 1." It is not an unbiased estimator for β 0.

β ^ 1 is an unbiased estimator for β 0. (T/F)

False See 1.4 Statistical Inference "What that means is that β ^ 1 is an unbiased estimator for β 1." It is not an unbiased estimator for β 0.

The p-value is a measure of the probability of rejecting the null hypothesis. (T/F)

False See 1.5 Statistical Inference Data Example "p-value is a measure of how rejectable the null hypothesis is... It's not the probability of rejecting the null hypothesis, nor is it the probability that the null hypothesis is true."

For a multiple linear regression model to be a good fit, we need the linearity assumption to hold for only one of the predicting variables. (T/F)

False See Lesson 3.11: Assumptions and diagnostics In multiple linear regression, we need the linearity assumption to hold for all of the predicting variables, for the model to be a good fit. "For example, if the linearity does not hold with one or more predicting variables, then we could transform the predicting variables to improve the linearity assumption."

Given a quantitative predicting variable and a qualitative predicting variable with 7 categories in a linear regression model with intercept, 7 dummy variables need to be included in the model.

False See Lesson 3.2: Basic Concepts We only need 6 dummy variables. "When we have qualitative variables with k levels, we only include k-1 dummy variables if the regression model has an intercept."
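
The k-1 rule quoted above can be sketched in plain Python. The helper `dummy_encode` and the category labels are hypothetical and exist only for illustration; the first level (in sorted order) is treated as the baseline absorbed by the intercept.

```python
def dummy_encode(values, drop_first=True):
    """Return (kept_levels, rows) for a categorical variable.

    With drop_first=True the first sorted level becomes the baseline,
    so a variable with k levels yields k - 1 dummy columns.
    """
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


# Hypothetical qualitative predictor with 7 categories.
values = list("ABCDEFGAB")
kept_levels, rows = dummy_encode(values)
print(len(kept_levels))  # number of dummy columns when an intercept is present
print(rows[0])           # baseline category: all dummy columns are zero
```

Calling `dummy_encode(values, drop_first=False)` keeps all k columns, which matches the no-intercept case.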

In simple linear regression models, we lose three degrees of freedom when estimating the variance because of the estimation of the three model parameters β 0 , β 1 , σ^2. (T/F)

False. See 1.2 Estimation Method "The estimator for σ 2 is σ ^ 2, and is the sum of the squared residuals, divided by n - 2." We lose two degrees of freedom because the variance estimator, σ ^ 2, uses only the estimates for β 0, and β 1 in its calculation.

With the Box-Cox transformation, when λ = 0 we do not transform the response. (T/F)

False. See 1.8 Diagnostics When λ = 0, we transform using the natural log.
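
A minimal sketch of the transformation, assuming the standard textbook form (y^λ − 1)/λ, which may differ by a shift and scale from the form used in the lecture. The function name `box_cox` is illustrative.

```python
import math


def box_cox(y: float, lam: float) -> float:
    """Box-Cox transform of a positive response value.

    Assumes the standard form: log(y) when lam == 0,
    otherwise (y**lam - 1) / lam.
    """
    if y <= 0:
        raise ValueError("Box-Cox requires a strictly positive response")
    if lam == 0:
        return math.log(y)  # natural log, the limiting case as lam -> 0
    return (y ** lam - 1.0) / lam


print(box_cox(math.e, 0))  # natural log case
print(box_cox(4.0, 0.5))   # square-root-type transform
print(box_cox(3.0, 1.0))   # lam = 1 only shifts the response
```

With λ = 1 the response is only shifted by a constant, so its shape is unchanged; λ = 0 is the case that applies the log.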

In ANOVA, the linearity assumption is assessed using a plot of the response against the predicting variable. (T/F)

False. See 2.2 - Estimation Method Linearity is not an assumption of ANOVA.

In multiple linear regression, a VIF value of 6 for a predictor means that 90% of the variation in that predictor can be modeled by the other predictors. (T/F)

False. See Lesson 3.13: Model Evaluation and Multicollinearity A VIF value of 6 for a predictor means that 83.3% of the variation in that predictor can be modeled by the other predictors in the model.
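
The 83.3% figure follows from the definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the other predictors. A small sketch of the conversion, with illustrative function names:

```python
def r_squared_from_vif(vif: float) -> float:
    """Return the R_j^2 implied by VIF_j = 1 / (1 - R_j^2)."""
    if vif < 1.0:
        raise ValueError("VIF cannot be smaller than 1")
    return 1.0 - 1.0 / vif


def vif_from_r_squared(r_squared: float) -> float:
    """Return the VIF implied by a given R_j^2 (must be below 1)."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


print(f"{r_squared_from_vif(6.0):.3f}")  # share of variation explained when VIF = 6
print(f"{vif_from_r_squared(0.9):.1f}")  # VIF that would correspond to 90%
```

Going the other way shows why the statement is false: 90% explained variation corresponds to a VIF of about 10, not 6.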

Multicollinearity in multiple linear regression means that the rows in the design matrix are (nearly) linearly dependent. (T/F)

False. See Lesson 3.13: Model Evaluation and Multicollinearity Multicollinearity in multiple linear regression means that the columns in the design matrix are (nearly) linearly dependent.

Multicollinearity among the predicting variables will not impact the standard errors of the estimated regression coefficients. (T/F)

False. See Lesson 3.13: Multicollinearity Multicollinearity in the predicting variables can impact the standard errors of the estimated coefficients. "However, the bigger problem is that the standard errors will be artificially large."

In multiple linear regression, the prediction of the response variable and the estimation of the mean response have the same interpretation. (T/F)

False. See Lesson 3.2.9: Regression Line and Predicting a New Response. In multiple linear regression, the prediction of the response variable and the estimation of the mean response do not have the same interpretation.

A multiple linear regression model contains 6 quantitative predicting variables and an intercept. The number of parameters to estimate in this model is 7. (T/F)

False. See Lesson 3.2: Basic Concepts The number of parameters to estimate in a multiple linear regression model containing 6 quantitative predicting variables and an intercept is 8: 7 regression coefficients (β0,β1,...,β6) and the variance of the error terms (σ2).

Given a quantitative predicting variable and a qualitative predicting variable with 7 categories in a linear regression model with intercept, 7 dummy variables need to be included in the model. (T/F)

False. See Lesson 3.2: Basic Concepts We only need 6 dummy variables. "When we have qualitative variables with k levels, we only include k-1 dummy variables if the regression model has an intercept."

The estimated variance of the error terms of a multiple linear regression model with intercept can be obtained by summing up the squared residuals and dividing the sum by n - p , where n is the sample size and p is the number of predictors. (T/F)

False. See Lesson 3.3: Regression Parameter Estimation The estimated variance of the error terms of a multiple linear regression model with intercept should be obtained by summing up the squared residuals and dividing that by n-p-1, where n is the sample size and p is the number of predictors as we lose p+1 degrees of freedom when we estimate the p coefficients and 1 intercept.
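
The divisor can be made concrete with a short sketch. The function name and the residual values below are made up purely for illustration.

```python
def estimated_error_variance(residuals, num_predictors):
    """Sum of squared residuals divided by n - p - 1.

    Assumes the model has an intercept, so p + 1 coefficients
    were estimated before computing the residuals.
    """
    n = len(residuals)
    degrees_of_freedom = n - num_predictors - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more observations than estimated coefficients")
    sse = sum(r * r for r in residuals)
    return sse / degrees_of_freedom


# Made-up residuals from a model with p = 2 predictors and an intercept.
residuals = [1.0, -1.0, 2.0, -2.0, 0.0]
sigma2_hat = estimated_error_variance(residuals, num_predictors=2)
print(sigma2_hat)
```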

The causation of a predicting variable to the response variable can be captured using multiple linear regression on observational data, conditional of other predicting variables in the model. (T/F)

False. See Lesson 3.4 Model Interpretation "This is particularly prevalent in a context of making causal statements when the setup of the regression does not allow so. Causality statements can only be made in a controlled environment such as randomized trials or experiments. "

Conducting t-tests on each β parameter in a multiple linear regression model is preferable to an F-test when testing the overall significance of the model. (T/F)

False. See Lesson 3.7: Testing for Subsets of Coefficients "We cannot and should not select the combination of predicting variables that most explains the variability in the response based on the t-tests for statistical significance because the statistical significance depends on what other variables are in the model."


Multicollinearity generally occurs when there are high correlations between two or more predictor variables. In other words, one predictor variable can be used to predict the other. This creates redundant information, skewing the results of a regression model.

What are the null and alternative hypotheses of the F test?

H0: all the regression coefficients except the intercept are 0. HA: at least one is not 0.

The estimated versus predicted regression line for a given x*

Have the same expectation

The estimated versus predicted regression line for a given x*:

Have the same expectation

What is the rule of thumb for Cook's Distance?

Any observation with Di > 4/n, Di > 1, or an otherwise "large" Di should be investigated.
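
A minimal sketch of applying the two numeric cutoffs, assuming the Cook's distances have already been computed elsewhere. The function name and the distance values are hypothetical; judging what counts as otherwise "large" is left to visual inspection.

```python
def flag_cooks_distance(distances):
    """Return indices of observations exceeding 4/n or 1."""
    n = len(distances)
    threshold = 4.0 / n
    return [i for i, d in enumerate(distances) if d > threshold or d > 1.0]


# Hypothetical Cook's distances for n = 10 observations (4/n = 0.4).
cooks_d = [0.01, 0.02, 0.5, 1.3, 0.03, 0.1, 0.05, 0.2, 0.41, 0.39]
flagged = flag_cooks_distance(cooks_d)
print(flagged)
```

Flagged points are candidates for investigation, not automatic removals.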

When do we use transformations?

If the linearity assumption with respect to one or more predictors does not hold, then we use transformations of the corresponding predictors to improve on this assumption. If the normality assumption does not hold, we transform the response variable, commonly using the Box-Cox transformation. If the constant variance assumption does not hold, we transform the response variable.

What does it mean when we reject the H0 of the F test?

If we reject the null hypothesis, we will conclude that at least one of the predicting variables has explanatory power for the variability in the response.

An experiment was conducted to determine the effect of gamma radiation on the numbers of chromosomal abnormalities observed in cells. A multiple linear regression model was fitted to estimate the effect of the number of cells, amount of the radiation dose (Grays), and the rate of the radiation dose (Grays/hour) on the number of chromosomal abnormalities observed. The data frame has 27 observations. Here is the model summary and Cook's Distance plot.
Coefficient   Estimate    SE         t-value  Pr(>|t|)
(Intercept)  -74.15392   42.24544   -1.755   0.092518
cells          0.06871    0.02196    3.129   0.004709 **
doseamt       41.33160    9.13907    4.523   0.000153 ***
doserate      20.28402    8.29071    2.447   0.022482 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 54.05 on X degrees of freedom
Multiple R-squared: 0.5213, Adjusted R-squared: 0.4588
F-statistic: 8.348 on X and Y DF, p-value: 0.0006183
How does an increase in 1 unit in doserate affect the expected number of chromosome abnormalities, given that the other predictors in the model are held constant?
Increase of 8.291
Decrease of 41.331
Increase of 20.284
Decrease of 9.134

Increase of 20.284 The estimated coefficient for doserate is 20.284. If we fix all other predictors, for each 1 unit increase in doserate, the expected number of chromosome abnormalities increases 20.284 units.

The assumption of normality:

It is needed for the sampling distribution of the estimators of the regression coefficients and hence for inference.

Plotting the residuals versus each predictor checks for which assumption?


What are the 4 assumptions of MLR?

Linearity, Constant Variance, Independence, Normality

The estimators of the linear regression model are derived by:

Minimizing the sum of squared differences between observed and expected values of the response variable.

If a predicting variable is categorical with 5 categories in a linear regression model without intercept, we will include 5 dummy variables in the model.


If one confidence interval in the pairwise comparison includes only positive values, we conclude that the difference in means is positive, and statistically significant.


Do the estimated residuals have constant variance?


A data point far from the mean of the x's and y's is always: an influential point and an outlier a leverage point but not an outlier an outlier and a leverage point an outlier but not a leverage point None of the above

None of the above See 1.9 Outliers and Model Evaluation. We only know that the data point is far from the mean of the x's and y's. Being far from the mean of the x's makes it a leverage point, so we can eliminate the answers that do not include a leverage point. That leaves "a leverage point but not an outlier" and "an outlier and a leverage point". We do not have enough information to know whether the point is an outlier, so neither of these is always true. None of the listed answers describes a point that is always a leverage point and nothing more.

What does the QQ plot and histogram check for?


What are three ways we can transform the predicting variables?

Power, Log, Polynomial transformations

T/F: Cook's distance measures how much the fitted values (response) in the multiple linear regression model change when the ith observation is removed.


T/F: If the residuals are not normally distributed, then we can model instead the transformed response variable where the common transformation for normality is the Box-Cox transformation.


T/F: Influential points in multiple linear regression are outliers.


T/F: Multicollinearity can lead to less accurate statistical significance of some of the regression coefficients.


T/F: Multicollinearity in the predicting variables will impact the standard deviations of the estimated coefficients.


T/F: The presence of certain types of outliers can impact the statistical significance of some of the regression coefficients.


T/F: We can use a t-test to test for the statistical significance of a coefficient given all predicting variables in a multiple linear regression model.


T/F: We could diagnose the normality assumption using the normal probability plot.


T/F: When making a prediction for predicting variables on the "edge" of the space of predicting variables, then its uncertainty level is high.


T/F: The estimator of the mean response is unbiased.


If one confidence interval in the pairwise comparison includes only positive values, we conclude that the difference in means is statistically significantly positive.


If one confidence interval in the pairwise comparison includes zero under ANOVA, we conclude that the two corresponding means are plausibly equal.


If response variable Y has a quadratic relationship with a predictor variable X, it is possible to model the relationship using multiple linear regression.


If the constant variance assumption does not hold, we transform the response variable.


If the constant variance assumption in ANOVA does not hold, the inference on the equality of the means will not be reliable.


The fitted values are defined as

The regression line with parameters replaced with the estimated regression coefficients.

The total sum of squares divided by N-1 is

The sample variance estimator assuming equal means and equal variances

The pooled variance estimator is:

The sample variance estimator assuming equal variances.

The variability in the prediction comes from

The variability due to a new measurement and due to estimation.

The mean squared errors (MSE) measures:

The within-treatment variability.

If the linearity assumption with respect to one or more predictors does not hold, then we use transformations of the corresponding predictors to improve on this assumption.


If the normality assumption does not hold, we transform the response variable, commonly using the Box-Cox transformation.


If we reject the test of equal means, we conclude that some treatment means are not equal.


In a Poisson regression model, we use a chi-squared test to test the overall regression.


In a multiple regression model with 7 predicting variables, the sampling distribution of the estimated variance of the error terms is a chi-squared distribution with n-8 degrees of freedom.


In a simple linear regression model, the variable of interest is the response variable.


You were hired to consult on a study of the attendance behavior of high school students at two different schools. The data set you were given contains, for each of 316 students: the number of days he/she was absent in an academic year (daysabs), his/her math score (math), his/her language arts score (langarts), and whether the student is male or female (1 = male, 0 = female). A Poisson regression model was fitted to evaluate the relationship between the number of days of absence in an academic year and all the predictors. The R output for the model summary is as follows:
Coefficient   Estimate    SE         z value   Pr(>|z|)
(Intercept)    2.687666   0.072651   36.994    <2e-16
math          -0.003523   0.001821   -1.934    0.0531
langarts      -0.012152   0.001835   -6.623    3.52e-11
male          -0.400921   0.048412   -8.281    <2e-16
Also, assume the average language arts score (across all students) is 50, and the average math score (across all students) is 45.5. How many regression coefficients, including the intercept, are statistically significant at the significance level 0.05?
All
Three
Two
None

Three As the summary output above shows, the coefficients associated with the intercept, langarts and male are statistically significant at α = 0.05. Their associated p-values (<2e-16, 3.52e-11, <2e-16) are smaller than 0.05, while the p-value for math (0.0531) is not.
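
As a rough check on the count above, the two-sided p-values can be recomputed from the reported z values. This is a sketch using only the Python standard library; the helper name `two_sided_p` is an assumption introduced here, and the z values are copied from the summary.

```python
import math

# z values copied from the Poisson regression summary.
z_values = {
    "(Intercept)": 36.994,
    "math": -1.934,
    "langarts": -6.623,
    "male": -8.281,
}
alpha = 0.05


def two_sided_p(z):
    """P(|Z| > |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


p_values = {name: two_sided_p(z) for name, z in z_values.items()}
significant = [name for name, p in p_values.items() if p < alpha]

for name, p in p_values.items():
    print(f"{name:12s} p = {p:.4g}")
print("Significant at 0.05:", significant)
```

The p-value for math lands just above 0.05, which is why it is the only coefficient left out.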

The objective of the residual analysis is

To evaluate departures from the model assumptions

The objective of the pairwise comparison is

To identify the statistically significantly different means.

1. The means of the k populations 2. The sample means of the k populations 3. The sample means of the k samples are NOT all the model parameters in ANOVA


A high Cook's distance for a particular observation suggests that the observation could be an influential point.


A negative value of β_1 is consistent with an inverse relationship between x and y.


A no-intercept model with one qualitative predicting variable with 3 levels will use 3 dummy variables.


An example of a multiple regression model is Analysis of Variance (ANOVA).


Analysis of Variance (ANOVA) is an example of a multiple regression model.


Assuming the model is a good fit, the residuals in simple linear regression have constant variance.


Before making statistical inference on regression coefficients, estimation of the variance of the error terms is necessary.


For a given predicting variable, the estimated coefficient of regression associated with it will likely be different in a model with other predicting variables or in the model with only the predicting variable alone.


For a linearly dependent set of predictor variables, we should not estimate a multiple linear regression model.


For assessing the normality assumption of the ANOVA model, we can use the quantile-quantile normal plot and the histogram of the residuals.


In case of multiple linear regression, controlling variables are used to control for sample bias.


The normality assumption states that the response variable is normally distributed. (T/F)

False. See 1.8 Diagnostics "Normality assumption: the error terms are normally distributed." The response may or may not be normally distributed, but the error terms are assumed to be normally distributed.

The mean sum of squared errors in ANOVA measures variability within groups. (T/F)

True. See 2.4 Test for Equal Means. MSE = within-group variability

Cook's distance (Di) measures how much the fitted values in a multiple linear regression model change when the ith observation is removed. (T/F)

True. See Lesson 3.11: Assumptions and Diagnostics "This is the distance between the fitted values of the model with all the observations versus the fitted values of the model discarding the i-th observation from the data used to fit the model. "

The presence of certain types of outliers, such as influential points, can impact the statistical significance of some of the regression coefficients. (T/F)

True. See Lesson 3.11: Assumptions and diagnostics Outliers that are influential can impact the statistical significance of the beta parameters.

It is good practice to create a multiple linear regression model using a linearly dependent set of predictor variables. (T/F)

False. See Lesson 3.13: Model Evaluation and Multicollinearity It is good practice to create a multiple linear regression model using a linearly independent set of predicting variables. "XTX is not invertible if the columns of X are linearly dependent, i.e. one predicting variable, corresponding to one column, is a linear combination of the others."

An example of a multiple linear regression model is Analysis of Variance (ANOVA). (T/F)

True. See Lesson 3.2 Basic Concepts "Earlier, we contrasted the simple linear regression model with the ANOVA model... Multiple linear regression is a generalization of both models."

If the residuals are not normally distributed, we can model the transformed response variable instead, where a common transformation for normality is the Box-Cox transformation. (T/F)

True. See Lesson 3.3.11: Assumptions and Diagnostics If the normality assumption does not hold, we can use a transformation that normalizes the response variable such as Box-Cox transformation.

A linear regression model has high explanatory power if the coefficient of determination is close to 1. (T/F)

True. See Lesson 3.3.13: Model Evaluation and Multicollinearity If R^2 is close to 1, almost all of the variability in Y can be explained by the linear regression model; hence, the model has high explanatory power.

In the case of multiple linear regression, controlling variables are used to control for sample bias. (T/F)

True. See Lesson 3.4: Model Interpretation "Controlling variables can be used to control for bias selection in a sample."

For a given predicting variable, the corresponding estimated regression coefficient will likely be different in a conditional model versus a marginal model. (T/F)

True. See Lesson 3.4: Model Interpretation "Importantly, the estimated regression coefficients for the conditional and marginal relationships can be different, not only in magnitude but also in sign or direction of the relationship."

In multiple linear regression, the estimated regression coefficient corresponding to a quantitative predicting variable is interpreted as the estimated expected change in the response variable when there is a change of one unit in the corresponding predicting variable holding all other predictors fixed. (T/F)

True. See Lesson 3.4: Model Interpretation "The estimated value for one of the regression coefficient βi represents the estimated expected change in y associated with one unit of change in the corresponding predicting variable, Xi, holding all else in the model fixed."

A partial F-Test can be used to test whether the regression coefficients associated with a subset of the predicting variables in a multiple linear regression model are all equal to zero. (T/F)

True. See Lesson 3.7: Testing for Subsets of Regression Parameters We use the Partial F-test to test the null hypothesis that the regression coefficients associated to a subset of the predicting variables are all equal to zero. The alternative hypothesis is that at least one of these regression coefficients is not zero.

The estimators for the regression coefficients are:

Unbiased regardless of the distribution of the data.

How do we interpret the VIF?

VIF measures the proportional increase in the variance of β̂_j compared to what it would have been if the predicting variables had been completely uncorrelated.

To use the estimated residuals for assessing model assumptions, what do we need to do first?

We need to standardize them

What can we do if the Normality or Constant Variance assumption does not hold?

We would transform the response variable. A common transformation is the Box-Cox transformation.

We detect departure from the assumption of constant variance

When the residuals, plotted against the fitted values, are more spread out at the ends than in the middle.

Do the error terms have constant variance?


Will R^2 always increase when we add predicting variables?


A linear regression model was fitted to estimate the response variable Height for black cherry trees using just the Diameter. The data frame has 31 observations. Here is the model summary, with some parts missing.
Coefficients:
              Estimate   Std. Error   t-value   Pr(>|t|)
(Intercept)   62.0313    A            14.152    1.49e-14 ***
Diameter       1.0544    0.3222        3.272    0.00276 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.538 on B degrees of freedom
Multiple R-squared: 0.2697, Adjusted R-squared: 0.2445
What is the MSE for this model?
a. 30.669
b. 11.201
c. 20.534
d. None of the above

a. 30.669 MSE is the square of the Residual standard error. MSE = 5.538^2 = 30.669

Adjusted R^2

R^2 adjusted for the number of predicting variables, so it does not automatically increase as more predicting variables are added.
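
A small worked check, assuming the usual formula adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1) and the values reported in the black cherry summary in this set (n = 31 observations, p = 1 predictor, R^2 = 0.2697). The variable names are introduced here for illustration.

```python
# Values from the black cherry summary: n = 31 observations, p = 1 predictor.
n, p = 31, 1
r_squared = 0.2697

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
print(round(adj_r_squared, 4))
```

The result agrees with the 0.2445 shown in that summary. Because the penalty factor (n - 1)/(n - p - 1) grows with p, adding a predictor that barely raises R^2 can lower the adjusted value.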

A linear regression model was fitted to estimate the response variable Height for black cherry trees using just the Diameter. The data frame has 31 observations. Here is the model summary, with some parts missing.
Coefficients:
              Estimate   Std. Error   t-value   Pr(>|t|)
(Intercept)   62.0313    A            14.152    1.49e-14 ***
Diameter       1.0544    0.3222        3.272    0.00276 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.538 on B degrees of freedom
Multiple R-squared: 0.2697, Adjusted R-squared: 0.2445
What is the value of the correlation coefficient between Height and Diameter?
a. 0.2697
b. 0.5193
c. 0.3222
d. None of the above

b. 0.5193 In simple linear regression, the correlation coefficient between the response and predictor variables satisfies |ρ| = √(R^2), with the sign of the estimated slope. Since R^2 = 0.2697 and the slope is positive, ρ = √0.2697 = 0.5193.
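
The same arithmetic as a short sketch. It assumes the simple linear regression identity |r| = sqrt(R^2) and takes the sign from the estimated slope; the variable names are introduced here.

```python
import math

r_squared = 0.2697  # Multiple R-squared from the summary
slope = 1.0544      # estimated Diameter coefficient

# In simple linear regression, |r| = sqrt(R^2) and r has the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)
print(round(r, 4))
```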

In a study of cheddar cheese from the LaTrobe Valley of Victoria, Australia, samples of cheese were analyzed for their chemical composition and were subjected to taste tests. Overall taste scores were obtained by combining the scores from several tasters. The data frame has 30 observations and the following variables:
taste - a subjective taste score
Acetic - concentration of acetic acid (log scale)
H2S - concentration of hydrogen sulfide (log scale)
Lactic - concentration of lactic acid
Using the following R output from a fitted multiple linear regression model, answer the following multiple-choice questions.
Call: lm(formula = taste ~., data = chedder)
Residuals:
    Min       1Q   Median      3Q      Max
-17.390   -6.612   -1.009   4.908   25.449
Coefficients:
              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)   -28.8768   19.7354      -1.463    0.15540
Acetic          0.3277    4.4598       0.073    0.94198
H2S             3.9118    1.2484       3.133    0.00425 **
Lactic         19.6705    8.6291       2.280    0.03108 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
Residual standard error: 10.13 on 26 degrees of freedom
Multiple R-squared: 0.6518, Adjusted R-squared: 0.6116
F-statistic: 16.22 on 3 and 26 DF, p-value: 3.81e-06
Calculate the sum of squared errors (SSE) from the given R output. Select the choice that most closely approximates your calculation.
a. 102.617
b. 2668.039
c. 2533.081
d. 2786.025

b. 2668.039 MSE = SSE/(n−p−1) = SSE/DF. Hence, SSE = MSE*DF = 10.13^2 * (30−3−1) = 2668.039
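
The calculation above, written out step by step. Only the residual standard error and the degrees of freedom from the R output are used; the variable names are introduced here.

```python
rse = 10.13      # residual standard error from the R output
df = 30 - 3 - 1  # n - p - 1 = 26 residual degrees of freedom

mse = rse ** 2   # MSE is the square of the residual standard error
sse = mse * df   # SSE = MSE * DF
print(round(mse, 4), round(sse, 3))
```

Note that option a (102.617) is the MSE itself, which is a common slip when the multiplication by the degrees of freedom is forgotten.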

A linear regression model was fitted to estimate the response variable Height for black cherry trees using just the Diameter. The data frame has 31 observations. Here is the model summary, with some parts missing.
Coefficients:
              Estimate   Std. Error   t-value   Pr(>|t|)
(Intercept)   62.0313    A            14.152    1.49e-14 ***
Diameter       1.0544    0.3222        3.272    0.00276 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.538 on B degrees of freedom
Multiple R-squared: 0.2697, Adjusted R-squared: 0.2445
What is the value of A (standard error for the estimated intercept)?
a. 877.9
b. 4.383
c. 0.2281
d. None of the above

b. 4.383 Since t-value = (estimated intercept - 0)/standard error, we have standard error = estimated intercept/t-value = 62.0313/14.152 = 4.383
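
A quick sketch of the same rearrangement, with a consistency check on the Diameter row, where both the estimate and the standard error are given. The variable names are introduced here.

```python
# Intercept row: the standard error A is missing, so solve t = estimate / SE for SE.
intercept_estimate = 62.0313
intercept_t = 14.152
intercept_se = intercept_estimate / intercept_t
print(round(intercept_se, 3))

# Diameter row: estimate and SE are both reported, so the t-value can be re-derived.
diameter_t = 1.0544 / 0.3222
print(round(diameter_t, 3))
```

The re-derived Diameter t-value matches the reported 3.272 up to rounding, which suggests the same relationship is safe to apply to the intercept row.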

Which of the following is not an application of regression? a. Testing hypotheses b. Proving causation c. Predicting outcomes d. Modeling data

b. Proving causation

In a study of cheddar cheese from the LaTrobe Valley of Victoria, Australia, samples of cheese were analyzed for their chemical composition and were subjected to taste tests. Overall taste scores were obtained by combining the scores from several tasters. The data frame has 30 observations and the following variables:
taste - a subjective taste score
Acetic - concentration of acetic acid (log scale)
H2S - concentration of hydrogen sulfide (log scale)
Lactic - concentration of lactic acid
Using the following R output from a fitted multiple linear regression model, answer the following multiple-choice questions.
Call: lm(formula = taste ~., data = chedder)
Residuals:
    Min       1Q   Median      3Q      Max
-17.390   -6.612   -1.009   4.908   25.449
Coefficients:
              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)   -28.8768   19.7354      -1.463    0.15540
Acetic          0.3277    4.4598       0.073    0.94198
H2S             3.9118    1.2484       3.133    0.00425 **
Lactic         19.6705    8.6291       2.280    0.03108 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
Residual standard error: 10.13 on 26 degrees of freedom
Multiple R-squared: 0.6518, Adjusted R-squared: 0.6116
F-statistic: 16.22 on 3 and 26 DF, p-value: 3.81e-06
Calculate the sum of squares total (SST) from the given R output. Select the choice that most closely approximates your calculation.
a. 4994.48
b. 3147.54
c. 7662.38
d. 8655.21

c. 7662.38 Since R^2 = 1 − SSE/SST, we have SST = SSE/(1 − R^2) = 2668.039/(1 − 0.6518) = 7662.38
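
The sums of squares can be tied back to the rest of the R output as a sanity check. The sketch below assumes the standard decomposition SST = SSR + SSE and the overall F-statistic F = (SSR/p)/(SSE/(n − p − 1)); the variable names are introduced here.

```python
n, p = 30, 3
r_squared = 0.6518
rse = 10.13

sse = rse ** 2 * (n - p - 1)  # sum of squared errors
sst = sse / (1 - r_squared)   # from R^2 = 1 - SSE/SST
ssr = sst - sse               # regression sum of squares

# Overall F-statistic recomputed from the sums of squares.
f_stat = (ssr / p) / (sse / (n - p - 1))
print(round(sst, 2), round(ssr, 2), round(f_stat, 2))
```

The recomputed F-statistic agrees with the 16.22 on 3 and 26 DF reported in the output, and SSR comes out close to option a, which is the likely source of that distractor.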

You have measured the systolic blood pressure of a random sample of 50 employees of a company, and have fitted a linear regression model to estimate the response variable systolic blood pressure using the sex of the employees. The 95% confidence interval for the mean systolic blood pressure for the female employees is computed to be (122, 138). Which of the following statements gives a valid frequentist interpretation of this interval? a. 95% of the sample of female employees has a systolic blood pressure between 122 and 138. b. 95 % of the employees in the company have a systolic blood pressure between 122 and 138. c. If the sampling procedure were repeated 100 times, then approximately 95 of the resulting 100 confidence intervals would contain the true mean systolic blood pressure for all female employees of the company. d. We are 95% confident the sample mean is between 122 and 138

c. If the sampling procedure were repeated 100 times, then approximately 95 of the resulting 100 confidence intervals would contain the true mean systolic blood pressure for all female employees of the company.

In a study of cheddar cheese from the LaTrobe Valley of Victoria, Australia, samples of cheese were analyzed for their chemical composition and were subjected to taste tests. Overall taste scores were obtained by combining the scores from several tasters. The data frame has 30 observations and the following variables:
taste - a subjective taste score
Acetic - concentration of acetic acid (log scale)
H2S - concentration of hydrogen sulfide (log scale)
Lactic - concentration of lactic acid
Using the following R output from a fitted multiple linear regression model, answer the following multiple-choice questions.
Call: lm(formula = taste ~., data = chedder)
Residuals:
    Min       1Q   Median      3Q      Max
-17.390   -6.612   -1.009   4.908   25.449
Coefficients:
              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)   -28.8768   19.7354      -1.463    0.15540
Acetic          0.3277    4.4598       0.073    0.94198
H2S             3.9118    1.2484       3.133    0.00425 **
Lactic         19.6705    8.6291       2.280    0.03108 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
Residual standard error: 10.13 on 26 degrees of freedom
Multiple R-squared: 0.6518, Adjusted R-squared: 0.6116
F-statistic: 16.22 on 3 and 26 DF, p-value: 3.81e-06
Given the R output, an increase in the concentration of lactic acid by one unit results in a(n) ___________ in the given taste score by ___________ points, holding all other variables constant.
a. Decrease, 19.6705
b. Increase, 8.6291
c. Increase, 19.6705
d. Decrease, 8.6291

c. Increase, 19.6705 The estimated coefficient for Lactic is 19.6705. If we fix all other predictors, for each 1 unit increase in Lactic, the expected taste score increases by 19.6705 points.

In ANOVA, for which of the following purposes is the Tukey method used? a. Test for homogeneity of variance b. Test for normality c. Test for differences in pairwise means d. Test for independence of errors

c. Test for differences in pairwise means

A linear regression model was fitted to estimate the response variable Height for black cherry trees using just the Diameter. The data frame has 31 observations. Here is the model summary, with some parts missing.
Coefficients:
              Estimate   Std. Error   t-value   Pr(>|t|)
(Intercept)   62.0313    A            14.152    1.49e-14 ***
Diameter       1.0544    0.3222        3.272    0.00276 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.538 on B degrees of freedom
Multiple R-squared: 0.2697, Adjusted R-squared: 0.2445
What is the value of B (degrees of freedom of the estimated error variance)?
a. 32
b. 31
c. 30
d. 29

d. 29 The degrees of freedom of the estimated error variance are calculated as df = n-k - 1 = 31 - 1 - 1 = 29

If the confidence interval for a regression coefficient contains the value zero, we interpret that the regression coefficient is definitely equal to zero.


If the non-constant variance assumption does not hold in multiple linear regression, we apply a transformation to the predicting variables.


In ANOVA, the number of degrees of freedom of the chi-squared distribution for the variance estimator is N-k-1 where k is the number of groups.


In logistic regression, R^2 could be used as a measure of explained variation in the response variable.


In simple linear regression, the confidence interval of the response increases as the distance between the predictor value and the mean value of the predictors decreases.


The F-test can be used to test for the overall regression in Poisson regression.


The interpretation of the regression coefficients is the same for both Logistic and Poisson regression.


The only assumptions for a simple linear regression model are linearity, constant variance, and normality.


Trying all three link functions for a logistic regression model (C-ln-ln, probit, logit) will produce models with the same goodness of fit for a dataset.


We cannot estimate a multiple linear regression model if the predicting variables are linearly independent.


We do not need to assume independence between data points for making inference on the regression coefficients.


If a logistic regression model provides accurate classification, then we can conclude that it is a good fit for the data.

false "Goodness of fit doesn't guarantee good prediction." And conversely, good prediction doesn't guarantee the model is a good fit.

In a multiple linear regression model with n observations, all observations with Cook's distance greater than 4/n should always be discarded from the model.

false An observation should not be discarded just because it is found to be an outlier. We must investigate the nature of the outlier before deciding to discard it.

In a simple linear regression model, any outlier has a significant influence on the estimated slope parameter.

false An outlier does not necessarily have a large influence on model parameters. When it does, we call it an influential point.

Mallow's Cp statistic penalizes for complexity of the model more than both leave-one-out CV and Bayesian information criterion (BIC).

false BIC penalizes complexity more than the other approaches.

Backward stepwise regression is computationally preferable over forward stepwise regression.

false Backward stepwise regression is more computationally expensive than forward stepwise regression and generally selects a larger model.

Complex models with many predictors are often extremely biased, but have low variance.

false Complex models with many predictors often have low bias but high variance.

You obtained a statistically significant F-statistic when testing for equal means across four groups. The number of unique pairwise comparisons that could be performed is seven.

false For k=4 treatments, there are k(k−1)/2 = 4(4−1)/2 = 6 unique pairs of treatments. The number of unique pairwise comparisons that could be performed is six.
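
The count can also be confirmed by listing the pairs directly. In this sketch the group labels are hypothetical placeholders.

```python
import math
from itertools import combinations

groups = ["A", "B", "C", "D"]  # hypothetical labels for the k = 4 groups
k = len(groups)

pairs = list(combinations(groups, 2))  # every unordered pair of groups
n_pairs = math.comb(k, 2)              # k(k-1)/2

print(pairs)
print(n_pairs)
```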

Consider a multiple linear regression model with intercept. If two predicting variables are categorical and each variable has three categories, then we need to include five dummy variables in the model

false In a multiple linear regression model with intercept, if two predicting variables are categorical and both have k=3 categories, then we need to include 2*(k-1) = 2*(3-1) = 4 dummy variables in the model.
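
The counting rule can be written as a small helper. This is a sketch with a hypothetical function name, `dummy_count`; it assumes a full-rank design, where each factor drops one level when an intercept is present, and one factor keeps all of its levels when there is no intercept.

```python
def dummy_count(levels_per_factor, intercept=True):
    """Number of dummy variables needed for a full-rank design matrix."""
    count = sum(k - 1 for k in levels_per_factor)
    if not intercept and levels_per_factor:
        # Without an intercept, one factor can keep all of its levels.
        count += 1
    return count


print(dummy_count([3, 3]))                # two 3-level factors, with intercept
print(dummy_count([5], intercept=False))  # one 5-level factor, no intercept
print(dummy_count([3], intercept=False))  # one 3-level factor, no intercept
```

The last two calls correspond to the no-intercept cards elsewhere in this set, where a single factor with 5 (or 3) categories uses 5 (or 3) dummy variables.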

In a simple linear regression model, given a significance level α, the (1−α)100% confidence interval for the mean response should be wider than the (1−α)100% prediction interval for a new response at the predictor's value x*.

false In a simple linear regression model, given a significance level α, the (1−α)100% confidence interval for the mean response should be narrower than the (1−α)100% prediction interval for a new response at the predictor's value x* .
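
The comparison follows from the two standard errors, which differ only by the extra "1 +" term for the new measurement. Below is a minimal sketch; the function name and the values of `x_star`, `x_bar`, and `s_xx` are hypothetical, chosen only to make the example run.

```python
import math


def interval_standard_errors(mse, n, x_star, x_bar, s_xx):
    """Standard errors for the mean response and for a new response at x_star."""
    core = 1 / n + (x_star - x_bar) ** 2 / s_xx
    se_mean = math.sqrt(mse * core)        # estimation uncertainty only
    se_pred = math.sqrt(mse * (1 + core))  # adds the variance of a new measurement
    return se_mean, se_pred


# Hypothetical summary statistics, for illustration only.
se_mean, se_pred = interval_standard_errors(
    mse=30.669, n=31, x_star=15.0, x_bar=13.2, s_xx=295.4
)
print(round(se_mean, 3), round(se_pred, 3))
```

Both intervals use the same t critical value, so the interval with the larger standard error (the prediction interval) is always the wider one. The shared term also grows as x* moves away from the mean of the predictor, which widens both intervals.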

With k-fold cross validation larger k values increase bias and reduce variance.

false Larger values of k decrease bias and increase variance.

Stepwise regression is a greedy algorithm searching through all possible combinations of the predicting variables to find the model with the best score.

false Not all possible combinations are checked.

In a multiple linear regression model, when more predictors are added, R^2 can decrease if the added predictors are unrelated to the response variable.

false R^2 never decreases as more predictors are added to a multiple linear regression model.

Ridge regression is a regularized regression approach that can be used for variable selection.

false Ridge regression is a regularized regression approach but does not perform variable selection.

In simple linear regression models, we lose three degrees of freedom when estimating the variance because of the estimation of the three model parameters β_0, β_1, σ^2.

false See 1.2 Estimation Method "The estimator for σ^2 is σ̂^2, and is the sum of the squared residuals, divided by n - 2."

The sampling distribution for the variance estimator in simple linear regression is χ^2 (chi-squared) regardless of the assumptions of the data.

false See 1.2 Estimation Method "The sampling distribution of the estimator of the variance is chi-squared, with n - 2 degrees of freedom (more on this in a moment). This is under the assumption of normality of the error terms."

We assess the constant variance assumption by plotting the error terms, ϵ_i, against fitted values.

false See 1.2 Estimation Method "We use ϵ̂_i as proxies for the deviances or the error terms. We don't have the deviances because we don't have β_0 and β_1."

The simple linear regression coefficient, β̂_0, is used to measure the linear relationship between the predicting and response variables.

false See 1.2 Estimation Method β̂_0 is the intercept and does not tell us about the relationship between the predicting and response variables.

The p-value is a measure of the probability of rejecting the null hypothesis.

false See 1.5 Statistical Inference Data Example "p-value is a measure of how rejectable the null hypothesis is... It's not the probability of rejecting the null hypothesis, nor is it the probability that the null hypothesis is true."

The normality assumption states that the response variable is normally distributed.

false See 1.8 Diagnostics "Normality assumption: the error terms are normally distributed." The response may or may not be normally distributed, but the error terms are assumed to be normally distributed.

With the Box-Cox transformation, when λ = 0 we do not transform the response.

false See 1.8 Diagnostics When λ = 0, we transform using the natural log.
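
A minimal sketch of the transformation, assuming the usual one-parameter Box-Cox form (y^λ − 1)/λ for λ ≠ 0 and log(y) for λ = 0, defined for positive y. The function name `box_cox` is introduced here for illustration.

```python
import math


def box_cox(y, lam):
    """One-parameter Box-Cox transformation of a positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires a positive response value")
    if lam == 0:
        return math.log(y)  # the lambda = 0 case is the natural log
    return (y ** lam - 1) / lam


print(box_cox(4.0, 0.5))  # square-root-type transformation
print(box_cox(5.0, 1.0))  # lambda = 1 only shifts the response by 1
print(box_cox(3.0, 0.0), box_cox(3.0, 1e-8))  # the log is the limit as lambda -> 0
```

The last line shows why the log is the natural choice at λ = 0: the general formula approaches log(y) as λ shrinks toward zero.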

In ANOVA, the linearity assumption is assessed using a plot of the response against the predicting variable.

false See 2.2. Estimation Method Linearity is not an assumption of ANOVA.

For a multiple linear regression model to be a good fit, we need the linearity assumption to hold for only one of the predicting variables.

false See Lesson 3.11: Assumptions and diagnostics In multiple linear regression, we need the linearity assumption to hold for all of the predicting variables, for the model to be a good fit. "For example, if the linearity does not hold with one or more predicting variables, then we could transform the predicting variables to improve the linearity assumption."

In multiple linear regression, a VIF value of 6 for a predictor means that 90% of the variation in that predictor can be modeled by the other predictors.

false See Lesson 3.13: Model Evaluation and Multicollinearity A VIF value of 6 for a predictor means that 83.3% of the variation in that predictor can be modeled by the other predictors in the model.
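
The 83.3% figure comes from inverting VIF_j = 1/(1 − R_j^2), where R_j^2 is the R^2 from regressing predictor j on the other predictors. A short sketch, with helper names introduced here:

```python
def r_squared_from_vif(vif):
    """Share of a predictor's variation explained by the other predictors."""
    return 1 - 1 / vif


def vif_from_r_squared(r_squared):
    """VIF implied by the R^2 of a predictor regressed on the other predictors."""
    return 1 / (1 - r_squared)


print(round(r_squared_from_vif(6), 4))    # VIF of 6
print(round(vif_from_r_squared(0.90), 2))  # the VIF that would match 90%
```

So 90% of the variation being explained by the other predictors would correspond to a VIF of 10, not 6.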

It is good practice to create a multiple linear regression model using a linearly dependent set of predictor variables.

false See Lesson 3.13: Model Evaluation and Multicollinearity It is good practice to create a multiple linear regression model using a linearly independent set of predicting variables. "XTX is not invertible if the columns of X are linearly dependent, i.e. one predicting variable, corresponding to one column, is a linear combination of the others."

Multicollinearity in multiple linear regression means that the rows in the design matrix are (nearly) linearly dependent.

false See Lesson 3.13: Model Evaluation and Multicollinearity Multicollinearity in multiple linear regression means that the columns in the design matrix are (nearly) linearly dependent.

Multicollinearity among the predicting variables will not impact the standard errors of the estimated regression coefficients.

false See Lesson 3.13: Multicollinearity Multicollinearity in the predicting variables can impact the standard errors of the estimated coefficients. "However, the bigger problem is that the standard errors will be artificially large."

In multiple linear regression, the prediction of the response variable and the estimation of the mean response have the same interpretation.

false See Lesson 3.2.9: Regression Line and Predicting a New Response. In multiple linear regression, the prediction of the response variable and the estimation of the mean response do not have the same interpretation.

A multiple linear regression model contains 6 quantitative predicting variables and an intercept. The number of parameters to estimate in this model is 7.

false See Lesson 3.2: Basic Concepts The number of parameters to estimate in a multiple linear regression model containing 6 quantitative predicting variables and an intercept is 8: 7 regression coefficients (β0, β1, ..., β6) and the variance of the error terms (σ^2).

The estimated variance of the error terms of a multiple linear regression model with intercept can be obtained by summing up the squared residuals and dividing the sum by n - p, where n is the sample size and p is the number of predictors.

false See Lesson 3.3: Regression Parameter Estimation The estimated variance of the error terms of a multiple linear regression model with intercept should be obtained by summing up the squared residuals and dividing that by n-p-1, where n is the sample size and p is the number of predictors as we lose p+1 degrees of freedom when we estimate the p coefficients and 1 intercept.
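
The n - p - 1 divisor can be checked on a tiny case. The sketch below assumes the simplest setting, one predictor (p = 1) plus an intercept, so the divisor is n - 2; the data and the function name are made up for illustration.

```python
def error_variance_simple(x, y):
    """Estimate sigma^2 as SSE / (n - p - 1) for a one-predictor model."""
    n, p = len(x), 1
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Sum of squared residuals around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # p slopes plus 1 intercept use up p + 1 degrees of freedom.
    return sse / (n - p - 1)


sigma2_hat = error_variance_simple([0, 1, 2, 3], [0, 1, 1, 2])
print(sigma2_hat)
```

Here the squared residuals sum to 0.2 and n - p - 1 = 2, so the estimate is about 0.1; dividing by n - p = 3 instead would understate it.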

The causation of a predicting variable to the response variable can be captured using multiple linear regression on observational data, conditional on the other predicting variables in the model.

false See Lesson 3.4 Model Interpretation "This is particularly prevalent in a context of making causal statements when the setup of the regression does not allow so. Causality statements can only be made in a controlled environment such as randomized trials or experiments. "

Conducting t-tests on each β parameter in a multiple linear regression model is preferable to an F-test when testing the overall significance of the model.

false See Lesson 3.7: Testing for Subsets of Coefficients "We cannot and should not select the combination of predicting variables that most explains the variability in the response based on the t-tests for statistical significance because the statistical significance depends on what other variables are in the model."

In a multiple linear regression model, the adjusted R^2 measures the goodness of fit of the model.

false The adjusted R^2 is not a measure of goodness of fit. R^2 and adjusted R^2 measure the ability of the model and its predicting variables to explain the variation in the response variable. Goodness of fit refers to having all model assumptions satisfied.

In ANOVA, when testing for equal means across groups, the alternative hypothesis is that the means are not equal between two groups for all pairs of means/groups.

false The alternative is that at least one pair of groups has unequal means.

Under Poisson regression, the sampling distribution used for a coefficient estimator is a chi-squared distribution when the sample size is large.

false The coefficient estimator follows an approximate normal distribution.

In regularized regression, the penalization is generally applied to all regression coefficients (β0, ... ,βp), where p = number of predictors.

false The shrinkage penalty is applied to β1, . . . , βp, but not to the intercept β0.
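
A small sketch can make the unpenalized intercept concrete. It assumes a ridge fit with a single predictor, where minimizing the sum of squared errors plus lambda * β1^2 gives β1 = Sxy / (Sxx + lambda) and β0 = ȳ - β1 x̄; the predictor is left unstandardized only to keep the arithmetic visible, and the function name and data are illustrative.

```python
def ridge_one_predictor(x, y, lam):
    """Ridge fit with one predictor; only the slope is penalized."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / (sxx + lam)          # shrunk toward zero as lam grows
    intercept = y_bar - slope * x_bar  # not penalized, follows from the means
    return intercept, slope


x = [0, 1, 2, 3]
y = [0, 1, 1, 2]
ols_fit = ridge_one_predictor(x, y, 0.0)    # lam = 0 reproduces least squares
ridge_fit = ridge_one_predictor(x, y, 5.0)  # slope shrinks, intercept does not
print(ols_fit, ridge_fit)
```

With these numbers the slope drops from 0.6 to 0.3 while the intercept moves from 0.1 to 0.55, so the intercept is adjusted rather than shrunk.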

The number of parameters that need to be estimated in a logistic regression model with 5 predicting variables and an intercept is the same as the number of parameters that need to be estimated in a standard linear regression model with an intercept and same predicting variables.

false There are no error terms in logistic regression, so we only have parameters for the 6 coefficients in the model. With linear regression, we have parameters for the 6 coefficients in the model as well as the variance of the error terms.

Variable selection is a simple and completely solved statistical problem since we can implement it using the R statistical software.

false Variable selection for a large number of predicting variables is an "unsolved" problem, and variable selection approaches should be tailored to the problem at hand.

It is good practice to perform a goodness-of-fit test on logistic regression models without replications.

false We can only define residuals for binary data with replications, and residuals are needed for a goodness-of-fit test.

When testing a subset of coefficients, deviance follows a chi-square distribution with q degrees of freedom, where q is the number of regression coefficients in the reduced model.

false q is the difference between the number of regression coefficients in the full model and the reduced model.

When the number of predicting variables is large, both backward and forward stepwise regressions will always select the same set of variables.

false Backward and forward stepwise regressions will not always select the same set of variables.

It is good practice to perform variable selection based on the statistical significance of the regression coefficients.

false It is not good practice to perform variable selection based on the statistical significance of the regression coefficients.

A logistic regression model has the same four model assumptions as a multiple linear regression model.

false The assumptions of a logistic regression model are: (1) Linearity: there is a linear relationship between the link function and the predictors; (2) Independence: the response variables are independent random variables; (3) Link function: the link function is the logit function.

For Generalized Linear Models, including Poisson regression, the deviance residuals should approximately follow the standard normal distribution if the model is a good fit for the data.

true The deviance residuals are approximately N(0,1) if the model is a good fit.

To test if a coefficient is less than a critical value, C, we conduct a one-sided test on the _________ tail of a ___________ distribution. (a) left, normal (b) left, t (c) right, normal (d) right, t (e) None of the above

left, t See 1.4 Statistical Inference "For β1 greater than zero we're interested on the right tail of the distribution of the β̂1." By the same reasoning, testing whether the coefficient is less than C uses the left tail.

The F-test is a _________ tailed test with ______ and ______ degrees of freedom. (a) one, k, N-1 (b) one, k-1, N-k (c) two, k-1, N-k (d) two, k, N-1 (e) None of the above

one, k-1, N-k See 2.4 Test for Equal Means The F-test is a one tailed test that has two degrees of freedom, namely k − 1 and N − k.

A study was conducted to measure the effect of a fungicide treatment on the survival rate of botrytis blight. Botrytis blight samples were divided into 20 groups, each consisting of about 100 samples, and exposed to different levels of chemicals in a fungicide. The output of a logistic regression model is below, where concS represents the concentration of sulfur in the fungicide and concCu represents the concentration of copper in the fungicide. Use it to answer the following multiple-choice questions.
    Call: glm(formula = cbind(Survived, Died) ~ concS + concCu, family = "binomial", data = data)
    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -9.5366  -2.4594   0.1223   3.9710   6.3566
    Coefficients:
                 Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)   3.58770     0.22958    15.63    <2e-16 ***
    concS        -4.32735     0.26518   -16.32    <2e-16 ***
    concCu       -0.27483     0.01784   -15.40    <2e-16 ***
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    (Dispersion parameter for binomial family taken to be 1)
    Null deviance: 718.76 on 19 degrees of freedom
    Residual deviance: 299.43 on 17 degrees of freedom
    AIC: 363.53
The p-value for a goodness-of-fit test using the deviance residuals for the regression can be obtained from which of the following?
    (a) pchisq(419.33, 2, lower.tail = FALSE)
    (b) pchisq(363.53, 3, lower.tail = FALSE)
    (c) pchisq(299.43, 17, lower.tail = FALSE)
    (d) pchisq(718.76, 19, lower.tail = FALSE)

pchisq(299.43,17, lower.tail =FALSE) The goodness of fit test uses the residual deviance (299.43) and corresponding degrees of freedom (17) as the test statistic for the chi-squared test.
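
For readers without R at hand, the upper-tail chi-square probability that pchisq(q, df, lower.tail = FALSE) returns can be approximated with the standard library alone. The sketch below is an assumption-laden stand-in, not the R implementation: it uses the closed-form expressions that exist for positive integer degrees of freedom, and the function name chi2_upper_tail is hypothetical.

```python
import math


def chi2_upper_tail(x, df):
    """P(X > x) for X ~ chi-square with positive integer df and x >= 0."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite correction series.
    root = math.sqrt(x)
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)  # builds root^(2r-1) / (1*3*...*(2r-1))
        total += term
    return math.erfc(root / math.sqrt(2)) + 2 * density * total


# Goodness of fit: residual deviance on its residual degrees of freedom.
gof_p = chi2_upper_tail(299.43, 17)
# Overall significance: drop in deviance on the number of predictors.
overall_p = chi2_upper_tail(718.76 - 299.43, 2)
print(gof_p, overall_p)
```

Both p-values come out extremely small. For the goodness-of-fit test the null hypothesis is that the model fits, so a tiny p-value here points to a poor fit, while the tiny p-value for the overall test says the predictors are jointly significant.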

Do we evaluate normality using residuals or the response variable?


The sampling distribution of β̂0 is a: (a) t-distribution (b) chi-squared distribution (c) normal distribution (d) None of the above

t-distribution See 1.4 Statistical Inference The distribution of β̂0 is normal when the error variance σ^2 is known. Because σ^2 is unknown and replaced by its sample estimate, the sampling distribution of β̂0 is the t-distribution.

What is the F-test for?

test for overall regression

Multicollinearity inflates...?

the standard error of the estimated coefficients

A Poisson regression model fit to a dataset with a small sample size will have a hypothesis testing procedure with more Type I errors than expected.


A logistic regression model may not be a good fit if the responses are correlated or if there is heterogeneity in the success that hasn't been modeled.


Akaike Information Criterion (AIC) is an estimate for the prediction risk.


An overdispersion parameter close to 1 indicates that the variability of the response is close to the variability estimated by the model.


Assuming the model is a good fit, the residuals in simple linear regression have constant variance.


Elastic net regression uses both penalties of ridge and lasso regression and hence combines the benefits of both.


If a Poisson regression model does not have a good fit, the relationship between the log of the expected rate and the predicting variables might be not linear.


If a predicting variable is categorical with 5 categories in a linear regression model without intercept, we will include 5 dummy variables in the model.


If one confidence interval in the pairwise comparison includes zero under ANOVA, we conclude that the two corresponding means are plausibly equal.


If there are specific variables that are required to control the bias selection in the model, they should be forced into the model and not be part of the variable selection process.


In ANOVA, to test the null hypothesis of equal means across groups, the variance of the response variable must be the same across all groups.


In a multiple linear regression model, the R^2 measures the proportion of total variability in the response variable that is captured by the regression model.


In a simple linear regression model, we can assess if the residuals are correlated by plotting them against fitted values.


In a simple linear regression model, we need the normality assumption to hold for deriving a reliable prediction interval for a new response.


In ridge regression, when the penalty constant lambda (λ) equals zero, the corresponding ridge coefficient estimates are the same as the ordinary least squares estimates.


Multicollinearity in multiple linear regression means that the columns in the design matrix are (nearly) linearly dependent.


Ridge regression can be used to deal with problems caused by high correlation among the predictors.


The L1 penalty measures the sparsity of a vector and forces regression coefficients to be zero.


The estimators of the error term variance and of the regression coefficients are random variables.


The larger the coefficient of determination or R-squared, the higher the variability explained by the simple linear regression model.


The lasso regression requires a numerical algorithm to minimize the penalized sum of least squares.


The one-way ANOVA is a linear regression model with one qualitative predicting variable.


The penalty constant lambda (λ) in penalized regression controls the trade-off between lack of fit and model complexity.


We can assess the assumption of constant-variance in multiple linear regression by plotting the standardized residuals against fitted values.


We estimate the regression coefficients in Poisson regression using the maximum likelihood estimation approach.


A binary response variable with replications in logistic regression has a Binomial distribution.

true A binary response variable with replications does follow a Binomial distribution.

Suppose that we have a multiple linear regression model with k quantitative predictors, a qualitative predictor with l categories and an intercept. Consider the estimated variance of error terms based on n observations. The estimator should follow a chi-square distribution with n − k − l degrees of freedom.

true For this example, we use k + l df to estimate the following parameters: k regression coefficients associated to the k quantitative predictors, ( l − 1 ) regression coefficients associated to the ( l − 1 ) dummy variables and 1 regression coefficient associated to the intercept. This leaves n − k − l degrees of freedom for the estimation of the error variance.

When conducting ANOVA, the larger the between-group variability is relative to the within-group variability, the larger the value of the F-statistic will tend to be.

true Given the formula of the F-statistic, F = MSSTr / MSE, a larger numerator (between-group variability) relative to the denominator (within-group variability) results in a larger F-statistic; hence, the larger MSSTr is relative to MSE, the larger the value of the F-statistic.
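
The ratio can be worked through on toy data. This is a minimal sketch assuming the one-way ANOVA setup from these cards, with k groups, N total observations, and degrees of freedom k - 1 and N - k; the group values and the function name are invented for illustration.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k


f_stat, df1, df2 = one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
# Moving one group further away raises between-group variability only.
f_far, _, _ = one_way_f([[1, 2, 3], [2, 3, 4], [8, 9, 10]])
print(f_stat, df1, df2, f_far)
```

In the first data set the between-group mean square is 13 and the within-group mean square is 1, so F is 13 on 2 and 6 degrees of freedom; shifting the third group upward leaves the within-group mean square unchanged and increases F.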

In a simple linear regression model, given a significance level α, if the 100(1 − α)% confidence interval for a regression coefficient does not include zero, we conclude that the coefficient is statistically significant at the α level.

true The 100(1 − α)% confidence interval for a regression coefficient excludes zero exactly when the two-sided t-test of the null hypothesis that the coefficient equals zero rejects at the α level, so the coefficient is statistically significant at that level.

It is required to standardize or rescale the predicting variables when performing regularized regression.

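true Regularized regression requires standardization or scaling of the predicting variables, because the penalty depends on the magnitude of the coefficients and therefore on the units of the predictors.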

A negative value of β1 is consistent with an inverse relationship between the predictor variable and the response variable.

true See 1.2 Estimation Method "A negative value of β1 is consistent with an inverse relationship"

The pooled variance estimator, s_pooled^2, in ANOVA is synonymous with the variance estimator, σ̂^2, in simple linear regression because they both use mean squared error (MSE) for their calculations.

true See 1.2 Estimation Method for simple linear regression and 2.2 Estimation Method for ANOVA. The pooled variance estimator is, in fact, the variance estimator.

Under the normality assumption, the estimator for β1 is a linear combination of normally distributed random variables.

true See 1.4 Statistical Inference "Under the normality assumption, β̂1 is thus a linear combination of normally distributed random variables... β̂0 is also linear combination of random variables"

The prediction interval of one member of the population will always be larger than the confidence interval of the mean response for all members of the population when using the same predicting values.

true See 1.7 Regression Line: Estimation & Prediction Examples "Just to wrap up the comparison, the confidence intervals under estimation are narrower than the prediction intervals because the prediction intervals have additional variance from the variation of a new measurement."

If the model assumptions hold, then the estimator for the variance, σ̂^2, is a random variable.

true See 1.8 Statistical Inference We assume that the error terms are random variables, so the residuals, which are computed from the observed responses, are also random variables. Since σ̂^2 is a combination of the residuals, it is also a random variable.

An ANOVA model with a single qualitative predicting variable containing k groups will have k + 1 parameters to estimate.

true See 2.2 Estimation Method We have to estimate the means of the k groups and the pooled variance estimator, s_pooled^2.

The mean sum of squared errors in ANOVA measures variability within groups.

true See 2.4 Test for Equal Means MSE = within-group variability

If the constant variance assumption in ANOVA does not hold, the inference on the equality of the means will not be reliable.

true See 2.8 Data Example "This is important since without a good fit, we cannot rely on the statistical inference." Only when the model is a good fit, i.e. all model assumptions hold, can we rely on the statistical inference.

If the pairwise comparison interval between groups in an ANOVA model includes zero, we conclude that the two means are plausibly equal.

true See 2.8 Data Example If the comparison interval includes zero, then the two means are not statistically significantly different and are thus plausibly equal.

Cook's distance (Di) measures how much the fitted values in a multiple linear regression model change when the ith observation is removed.

true See Lesson 3.11: Assumptions and Diagnostics "This is the distance between the fitted values of the model with all the observations versus the fitted values of the model discarding the i-th observation from the data used to fit the model. "
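
The leave-one-out definition quoted above translates directly into code. The sketch below assumes a simple linear regression (intercept plus one slope, so p = 2 estimated coefficients) and the usual scaling by p times the mean squared error of the full model; the data and function names are illustrative only.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def cooks_distances(x, y):
    """Cook's distance for each observation, computed by refitting."""
    n, p = len(x), 2  # p counts the intercept and the slope
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)
    distances = []
    for i in range(n):
        # Refit without observation i, then compare fitted values at all x.
        c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        shift = sum((fj - (c0 + c1 * xj)) ** 2 for xj, fj in zip(x, fitted))
        distances.append(shift / (p * mse))
    return distances


x = [1, 2, 3, 4, 5, 10]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 4.0]
d = cooks_distances(x, y)
most_influential = d.index(max(d))
print(most_influential, [round(v, 3) for v in d])
```

The last point sits far from the other x values and off their trend, so removing it changes the fitted line the most and it receives the largest distance.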

The presence of certain types of outliers, such as influential points, can impact the statistical significance of some of the regression coefficients.

true See Lesson 3.11: Assumptions and diagnostics Outliers that are influential can impact the statistical significance of the beta parameters.

An example of a multiple linear regression model is Analysis of Variance (ANOVA).

true See Lesson 3.2 Basic Concepts "Earlier, we contrasted the simple linear regression model with the ANOVA model... Multiple linear regression is a generalization of both models."

If the residuals are not normally distributed, we can model the transformed response variable instead, where a common transformation for normality is the Box-Cox transformation.

true See Lesson 3.3.11: Assumptions and Diagnostics If the normality assumption does not hold, we can use a transformation that normalizes the response variable such as Box-Cox transformation.

A linear regression model has high explanatory power if the coefficient of determination is close to 1.

true See Lesson 3.3.13: Model Evaluation and Multicollinearity If R^2 is close to 1, almost all of the variability in Y can be explained by the linear regression model; hence, the model has high explanatory power.

In the case of multiple linear regression, controlling variables are used to control for sample bias.

true See Lesson 3.4: Model Interpretation "Controlling variables can be used to control for bias selection in a sample."

For a given predicting variable, the corresponding estimated regression coefficient will likely be different in a conditional model versus a marginal model.

true See Lesson 3.4: Model Interpretation "Importantly, the estimated regression coefficients for the conditional and marginal relationships can be different, not only in magnitude but also in sign or direction of the relationship."

In multiple linear regression, the estimated regression coefficient corresponding to a quantitative predicting variable is interpreted as the estimated expected change in the response variable when there is a change of one unit in the corresponding predicting variable holding all other predictors fixed.

true See Lesson 3.4: Model Interpretation "The estimated value for one of the regression coefficient βi represents the estimated expected change in y associated with one unit of change in the corresponding predicting variable, Xi, holding all else in the model fixed."

A partial F-Test can be used to test whether the regression coefficients associated with a subset of the predicting variables in a multiple linear regression model are all equal to zero.

true See Lesson 3.7: Testing for Subsets of Regression Parameters We use the Partial F-test to test the null hypothesis that the regression coefficients associated to a subset of the predicting variables are all equal to zero. The alternative hypothesis is that at least one of these regression coefficients is not zero.

Simpson's Paradox occurs when a coefficient reverses its sign when used in a marginal versus a conditional model.

true Simpson's paradox: Reversal of an association when looking at a marginal relationship versus a conditional relationship.

Generalized linear models, like logistic regression, use a Wald test to determine the statistical significance of the coefficients.

true The coefficient estimates follow an approximate normal distribution and a z-test, also known as a Wald test, is used to determine their statistical significance.

The estimated regression coefficients in Poisson regression are approximate.

true The estimated parameters and their standard errors are approximate estimates.

For logistic regression, if the p-value of the deviance test for goodness-of-fit is large, then it suggests that the model is a good fit.

true The null hypothesis is that the model fits the data, so a large p-value suggests that the model is a good fit.

With Poisson regression, the variance of the response is not constant.

true Var(Y | x_1, ..., x_p) = exp(β_0 + β_1 x_1 + ... + β_p x_p), which equals the expected response and therefore changes with the predicting variables.
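
A quick numeric check shows how the variance moves with the predictors. This sketch assumes the log-link Poisson model from the formula above; the coefficient values are made up and the function name is hypothetical.

```python
import math


def poisson_mean_and_variance(beta, x):
    """Mean and variance of Y under a log-link Poisson regression."""
    # Linear predictor: beta[0] is the intercept, beta[1:] pair with x.
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    mu = math.exp(eta)
    return mu, mu  # for a Poisson response the variance equals the mean


beta = (0.5, 0.2)  # illustrative coefficients, not estimated from data
mean_low, var_low = poisson_mean_and_variance(beta, (1.0,))
mean_high, var_high = poisson_mean_and_variance(beta, (3.0,))
print(var_low, var_high)
```

Because the variance is tied to the mean, two observations with different predictor values have different variances, which is why the constant-variance assumption of linear regression does not carry over.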

Although there are no error terms in a logistic regression model using binary data with replications, we can still perform residual analysis.

true We can perform residual analysis on the Pearson residuals or the Deviance residuals.

The training risk is not an unbiased estimator of the prediction risk.

true The training risk is a biased estimator of the prediction risk.

When selecting variables for explanatory purpose, one might consider including predicting variables which are correlated if it would help answer your research hypothesis.

true When the objective is to explain the relationship to the response, one might consider including the predicting variables even if they are correlated.

