Midterm1 T/F

If one or more of the regression assumptions do not hold, the model does not fit the data well, so the predictors are not useful in estimating the response

False. The predictors may still be useful after some transformation

In a simple linear regression model, any outlier has a significant influence on the estimated slope parameter

False. Not necessarily. When an outlier does significantly change the estimated slope, we call it an influential point.

In ANOVA, we assume the variance of the response is the same for all categories, and we estimate it using the pooled variance estimator

True

In ANOVA, we use an F-test to compare the variability between groups to the variability within groups

True

In ANOVA, to test the null hypothesis of equal means across groups, the variance of the response variable must be the same across all groups.

False

In ANOVA, we assume the variance of the response variable is different for each population

False

In testing for subsets of coefficients in a multiple linear regression, the null hypothesis we test for is that all coefficients are equal, written: H0: B1 = B2 = ... = Bk

False

Statistical inference for linear regression under normality relies on large sample size.

False

The assumption of normality is not required in linear regression to make inference on the regression coefficients.

False

The smaller the coefficient of determination or R-squared, the higher the variability explained by the simple linear regression.

False

We cannot estimate a multiple linear regression model if the predicting variables are linearly independent.

False

You obtained a statistically significant F-statistic when testing for equal means across 4 groups. The number of unique pairwise comparisons that could be performed is 7

False. For k=4 treatments, there are (k(k-1))/2 = 6 unique pairs of treatments
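
As a quick check of the k(k-1)/2 count, the pairs can be enumerated directly. This is a minimal sketch; the group labels are hypothetical placeholders.

```python
from itertools import combinations

def pairwise_comparisons(groups):
    """Return every unordered pair of group labels."""
    return list(combinations(groups, 2))

groups = ["A", "B", "C", "D"]  # hypothetical labels for k = 4 groups
pairs = pairwise_comparisons(groups)
k = len(groups)

# The enumerated count should agree with the closed form k(k-1)/2.
print(len(pairs), k * (k - 1) // 2)
```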

When creating pairwise confidence intervals to compare means in ANOVA, we should adjust for joint inference to achieve the desired confidence level. The adjusted intervals will be narrower than they would have been without the adjustment

False. The intervals will be wider
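
To see why the adjusted intervals are wider, compare critical values with and without an adjustment. The sketch below assumes a Bonferroni adjustment and uses normal quantiles as a stand-in; in practice the intervals would use t or Tukey quantiles, and the set may have a different adjustment in mind.

```python
from statistics import NormalDist

def critical_value(alpha, n_intervals=1):
    """Two-sided normal critical value, Bonferroni-adjusted for n_intervals.

    Assumption: normal quantiles are used only for illustration.
    """
    return NormalDist().inv_cdf(1 - alpha / (2 * n_intervals))

alpha = 0.05
m = 6  # number of pairwise comparisons among 4 groups

unadjusted = critical_value(alpha)
adjusted = critical_value(alpha, m)

# A larger critical value multiplies the same standard error,
# so each adjusted interval is wider.
print(round(unadjusted, 3), round(adjusted, 3))
```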

In Anova, the pooled variance estimator or MSE is the variance estimator assuming equal means.

False. The pooled variance estimator is the variance estimator assuming equal variances. We assume that the variance of the response variable is the same across all populations and equal to sigma squared.

In multiple linear regression, the width of a confidence interval for the expected response given a set of predictors is the same regardless of the value of those predictors

False. The width depends on the values of the predictors

If the confidence interval for a regression coefficient contains the value zero, we interpret that the regression coefficient is plausibly equal to zero.

True

If the normality assumption does not hold for a regression, we may use a transformation on the response variable.

True

If the observed responses are not independent of each other, we can make incorrect conclusions about the strength of our model

True

The F-test in ANOVA compares the between variability versus the within variability.

True

The adjusted R-squared value is a good indicator of the predictive power of a model regardless of the number of predicting variables.

True

The correlation coefficient can be used to evaluate different transformations of X and Y to improve the linearity assumption in simple linear regression

True

The estimators of the variance parameter and of the regression coefficients in a regression model are random variables.

True

The prediction of the response variable has higher uncertainty than the estimation of the mean response.

True

The standard error in linear regression indicates how far the data points are from the regression line, on average.

True

We should assess the constant variance assumption in simple linear regression by plotting the residuals vs. fitted values.

True

Assumption of normality in linear regression is required for confidence intervals, prediction intervals, and hypothesis testing.

True

If a predicting variable is a categorical variable with 5 categories in a linear regression model without intercept, we will include 5 dummy variables.

True

If one confidence interval in the pairwise comparison in ANOVA includes zero, we conclude that the two corresponding means are plausibly equal.

True

A linear regression model is a good fit to the data set if the R-squared is above 0.90.

False

A nonlinear relationship between the response variable and a predicting variable cannot be modeled using regression.

False

The only assumptions for a simple linear regression model are linearity, constant variance, and normality.

False. The assumptions of simple linear regression are linearity, constant variance, independence, and normality.

In a simple linear regression model, given a significance level a, if the (1-a)% confidence interval for a regression coefficient does not include zero, we conclude that the coefficient is statistically significant at the a level

True. "We use this confidence interval to answer whether Bj is statistically significant by checking whether 0 is in the confidence interval. If it is not, we conclude that Bj is statistically significant."

In a multiple linear regression model with n observations, all observations with Cook's distance greater than 4/n should always be discarded from the model.

False. An observation should not be discarded just because it is found to be an outlier. We must investigate the nature of the outlier before deciding to discard it.

Unlike the pooled variance estimator, the variance of a one-way ANOVA model requires only one mean to be estimated. Therefore, the variance estimator of a one-way ANOVA model has fewer degrees of freedom

False. As the variance estimator has fewer parameters replaced by their estimators, it has more degrees of freedom than the pooled variance estimator. Specifically, if N is the number of observations and k is the number of levels of the qualitative variable, the pooled variance estimator has N-k degrees of freedom while the variance estimator has N-1.
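
The two divisors can be compared on a small example. This is a sketch with made-up data, not data from the course.

```python
from statistics import mean, variance

groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]  # made-up data, k = 3 groups
N = sum(len(g) for g in groups)
k = len(groups)

# Pooled estimator: one mean per group is estimated, leaving N - k df.
sse = sum((y - mean(g)) ** 2 for g in groups for y in g)
pooled_var = sse / (N - k)

# Overall sample variance: a single mean is estimated, leaving N - 1 df.
all_obs = [y for g in groups for y in g]
overall_var = variance(all_obs)  # statistics.variance divides by N - 1

print(N - k, pooled_var)
print(N - 1, overall_var)
```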

Differences in means among categories of a qualitative variable are detected if within-group variability is larger than between-group variability

False. Differences in means are detected when the between-group variability is sufficiently larger than the within-group variability.

Suppose that we have a multiple linear regression model with k quantitative predictors, a qualitative predictor with l categories and an intercept. Consider the estimated variance of error terms based on n observations. The estimator should follow a chi-square distribution with n - k - l degrees of freedom.

True. For this example, we use k + l degrees of freedom to estimate the following parameters: k regression coefficients associated with the k quantitative predictors, (l - 1) regression coefficients associated with the (l - 1) dummy variables, and 1 regression coefficient associated with the intercept. This leaves n - k - l degrees of freedom for the estimation of the error variance.

The three objectives of linear regression are Prediction, Forecasting, and Hypothesis Testing

False. Forecasting falls under prediction. The third objective is modeling.

Consider a multiple linear regression model with intercept. If two predicting variables are categorical and each variable has three categories, then we need to include five dummy variables in the model.

False. In a multiple linear regression model with intercept, if two predicting variables are categorical and both have k=3 categories, then we need to include 2*(k-1) = 2*(3-1) = 4 dummy variables in the model.
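
The counting rule can be written as a small helper. This is a sketch: `dummy_count` is a hypothetical name, and the no-intercept case with several categorical variables assumes a full-rank coding (one variable keeps all of its levels, the others drop one).

```python
def dummy_count(levels_per_variable, intercept=True):
    """Number of dummy variables needed for a full-rank design.

    levels_per_variable: number of categories of each categorical predictor.
    With an intercept, each variable contributes (levels - 1) dummies.
    Without one, a single extra dummy takes the place of the intercept.
    """
    if not levels_per_variable:
        return 0
    base = sum(levels - 1 for levels in levels_per_variable)
    return base if intercept else base + 1

print(dummy_count([3, 3]))                # two variables with 3 categories each
print(dummy_count([5], intercept=False))  # one variable, no intercept
```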

In multiple linear regression, the independence assumption and the mean zero assumption are equivalent

False. Mean zero is equivalent to linearity assumption

In simple linear regression, if there are standardized residuals less than 1, it is an indication of the presence of outliers

False. Observations with standardized residuals greater than 1 might be outliers

In a multiple linear regression model, when more predictors are added, R^2 can decrease if the added predictors are unrelated to the response variable.

False. R^2 never decreases as more predictors are added to a multiple linear regression model.
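
A small simulation illustrates this. The sketch below fits ordinary least squares through the normal equations using only the standard library. The data are simulated, and the added predictor is unrelated to the response by construction.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def r_squared(predictors, y):
    """R^2 of an OLS fit with intercept; predictors is a list of columns."""
    n = len(y)
    cols = [[1.0] * n] + predictors
    p = len(cols)
    xtx = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(p)]
           for i in range(p)]
    xty = [sum(u * v for u, v in zip(cols[i], y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[j] * cols[j][i] for j in range(p)) for i in range(n)]
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst

rng = random.Random(0)
n = 50
x1 = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]  # unrelated to y by construction
y = [2 + 3 * a + rng.gauss(0, 1) for a in x1]

r2_small = r_squared([x1], y)
r2_large = r_squared([x1, noise], y)
print(r2_small, r2_large)
```

The adjusted R^2, by contrast, penalizes the extra predictor and can decrease.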

In a simple linear regression model, given a significance level a, the (1-a)% confidence interval for the mean response should be wider than the (1-a)% prediction interval for a new response at the predictors value x*

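False. The confidence interval for the mean response is narrower than the prediction interval for a new response at the same x*.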

In a multiple linear regression model, the adjusted R^2 measures the goodness of fit of the model.

False. The adjusted R^2 is not a measure of goodness of fit. R^2 and adjusted R^2 measure the ability of the model and the predicting variables to explain the variation in the response variable. Goodness of fit refers to having all model assumptions satisfied.

In ANOVA, when testing for equal means across groups, the alternative hypothesis is that the means are not equal between two groups for all pairs of means/groups.

False. The alternative is that at least one pair of groups has unequal means.

If the confidence interval for a regression coefficient contains the value zero, we interpret that the regression coefficient is definitely equal to zero.

False. The coefficient is plausibly zero, but we cannot be certain that it is.

In ANOVA, the number of degrees of freedom of the chi-squared distribution for the variance estimator (not pooled variance estimator) is N-k-1 where k is the number of groups.

False. This variance estimator has N-1 degrees of freedom. We lose one degree of freedom because we estimate one mean, hence N-1.

For ANOVA, assuming all modeling assumptions hold, we should use a normal distribution when constructing a confidence interval for a single estimated mean parameter because each mean parameter follows a normal distribution.

False. Variance is not always known and we haven't assumed large sample sizes, so we can't always use a normal distribution

A strong positive correlation between two variables is evidence of a cause-and-effect relationship

False. We can't determine causation from correlation alone

If one confidence interval in the pairwise comparison includes zero under ANOVA, we conclude that the two corresponding means are plausibly equal.

True. If the confidence interval includes zero, it is plausible that the corresponding means are equal.

In simple linear regression, the confidence interval of the response increases as the distance between the predictor value and the mean value of the predictors decreases.

False. The confidence bands widen as the predictor value moves farther from the mean of the predictors.
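
The standard errors behind both intervals can be computed for a small example. This is a sketch with made-up data. It compares standard errors only, since both intervals multiply their standard error by the same t critical value.

```python
from math import sqrt

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]  # made-up data
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)

def se_mean(x_star):
    """Standard error of the estimated mean response at x_star."""
    return sqrt(mse * (1 / n + (x_star - xbar) ** 2 / sxx))

def se_pred(x_star):
    """Standard error for predicting a new response at x_star."""
    return sqrt(mse * (1 + 1 / n + (x_star - xbar) ** 2 / sxx))

# Farther from xbar means a larger standard error, hence a wider band.
print(se_mean(xbar), se_mean(8))
# Prediction adds the variance of the new error term, so it is always larger.
print(se_mean(xbar), se_pred(xbar))
```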

In a multiple linear regression model, the R^2 measures the proportion of total variability in the response variable that is captured by the regression model.

True

In a simple linear regression model, we can assess if the residuals are correlated by plotting them against the fitted values

True

In simple linear regression model, we need the normality assumption to hold for deriving a reliable prediction interval for a new response

True

You are interested in understanding the relationship between education level and IQ, with IQ as the response variable. In your model, you also include age. Age would be considered a controlling variable while the education level would be an explanatory variable.

True. Controlling variables can be used to control for selection bias in a sample. They are included by default so that the model captures more meaningful relationships with respect to the other explanatory or predicting factors. Explanatory variables are used to explain variability in the response variable, in this case the education level.

When conducting ANOVA, the larger the between-group variability is relative to the within-group variability, the larger the value of the F-statistic will tend to be.

True. Given the formula of the F-statistic, a larger increase in the numerator (between-group variability) compared to the denominator (within-group variability) results in a larger F-statistic; hence, the larger MSSTr is relative to MSE, the larger the value of the F-statistic.
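
The ratio can be computed by hand on two small data sets. This is a sketch with made-up data in which both sets share the same within-group spread and differ only in how far apart the group means are.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F-statistic: MSSTr / MSE."""
    N = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = mean([y for g in groups for y in g])
    # Between-group sum of squares, k - 1 degrees of freedom.
    sstr = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares, N - k degrees of freedom.
    sse = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (sstr / (k - 1)) / (sse / (N - k))

close_means = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]       # made-up data
far_means = [[1, 2, 3], [11, 12, 13], [21, 22, 23]]   # same spread, means farther apart

print(f_statistic(close_means), f_statistic(far_means))
```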

Assuming the model is a good fit, the residuals in simple linear regression have constant variance.

True. Goodness of fit refers to whether the model assumptions hold, one of which is constant variance.

When building a regression model using an intercept and a categorical variable with 6 levels, we cannot use 6 dummy variables due to linear dependence in the design matrix

True. If X is the design matrix, the sum of the six columns representing the dummy variables in X equals the column representing the intercept, meaning the columns of X are linearly dependent. This prevents the coefficients from being estimated, because $(X^T X)^{-1}$ does not exist.
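
The dependence is easy to verify on a toy design matrix. This is a sketch in which the level names and observations are hypothetical.

```python
levels = ["a", "b", "c", "d", "e", "f"]            # 6 hypothetical categories
obs = ["a", "c", "f", "b", "e", "d", "a", "f"]     # hypothetical observations

# One dummy column per level: 1 if the observation has that level, else 0.
dummies = [[1 if o == lev else 0 for lev in levels] for o in obs]
intercept = [1] * len(obs)

# Each observation belongs to exactly one level, so the six dummy
# columns add up to the intercept column row by row.
row_sums = [sum(row) for row in dummies]
print(row_sums == intercept)
```

Dropping one dummy column (the baseline level) removes the dependence.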

We can assess the assumption of constant-variance in simple linear regression by plotting residuals against fitted values

True. In a residuals vs. fitted plot, if the residuals are scattered around the zero line with roughly the same spread across all fitted values, the constant variance assumption on the errors is plausible.

The one-way ANOVA is a linear regression model with one qualitative predicting variable.

True. One-way ANOVA is a linear regression model with one predicting factor/ categorical variable.

The larger the coefficient of determination or R-squared, the higher the variability explained by the simple linear regression model.

True. R-squared represents the proportion of total variability in Y (response) that can be explained by the regression model (that uses X). R-squared is the proportion of variability explained by the model.

The estimators of the error term variance and of the regression coefficients are random variables.

True. The estimators are $\hat{\beta} = (X^T X)^{-1} X^T Y$ and $\hat{\sigma}^2 = \hat{\epsilon}^T \hat{\epsilon} / (n - p - 1)$, where $\hat{\epsilon} = (I - H) Y$. These estimators are functions of the response, which is a random variable. Therefore they are also random.
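
One way to see the randomness is to redraw the response several times and refit. This is a sketch for the simple linear regression case, where the true parameters (`beta0`, `beta1`, `sigma`) are arbitrary values chosen for illustration.

```python
import random

def simulate_slope(seed, n=30, beta0=1.0, beta1=2.0, sigma=1.0):
    """Draw one sample from y = beta0 + beta1 * x + error and return the OLS slope."""
    rng = random.Random(seed)
    x = [i / n for i in range(n)]  # fixed design, identical in every sample
    y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx

# Same predictors and same true slope, but a different estimate in
# each sample because the response is random.
slopes = [simulate_slope(seed) for seed in range(5)]
print(slopes)
```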

The response variable for a linear regression model is a random variable

True. The response is random, and for the models discussed so far, it follows a normal distribution

If a predicting variable is categorical with 5 categories in a linear regression model without intercept, we will include 5 dummy variables in the model.

True. When we have a qualitative variable with k levels, we include only k-1 dummy variables if the regression model has an intercept. If it does not, we include k dummy variables.

