6414 Regression
The fitted values are defined as
The regression line with the unknown parameters replaced by the estimated regression coefficients.
Random sampling is computationally less expensive than K-fold cross-validation.
False.
In MLR, if the points on the QQ normal plot of the standardized residuals significantly deviate from a straight line, then we can conclude the linearity assumption does not hold.
False. A deviation from a straight line in the QQ plot indicates that the normality assumption does not hold, not linearity.
Logistic regression differs from linear regression in that: 1 - The sampling distribution of the regression coefficients is approximate; 2 - A large sample is required to make accurate predictions; 3 - The normal distribution is used instead of the t distribution; 4 - All of the above
All of the above
Assuming the model is a good fit, the residuals in simple linear regression have constant variance.
True. Goodness of fit refers to whether the model assumptions hold, one of which is constant variance.
Influential points
An outlier that is far from the mean of both the x's and the y's.
In ANOVA, if the pairwise comparison interval between groups does not include zero, we conclude that the two means are plausibly equal
False
In multiple linear regression, the coefficient of determination is used to evaluate goodness-of-fit
False
The mean square prediction error (MSPE) is a robust prediction accuracy measurement for an ordinary least square (OLS) model regardless of the characteristics of the dataset
False
The presence of multicollinearity in a multiple linear regression model will not impact the standard errors of the estimated regression coefficients.
False
The sub-sampling approach is not recommended for addressing the p-value problem with large samples.
False
In SLR, the estimated regression line gives a different value from the predicted response at a specific value X*.
False. The estimated regression line (or the mean response) has the same expectation as the predicted response at a specific value of X* but a smaller variance
In Anova, the pooled variance estimator or MSE is the variance estimator assuming equal means.
False. The pooled variance estimator is the variance estimator assuming equal variances: we assume that the variance of the response variable is the same across all populations and equal to σ².
An ANOVA model with a qualitative predicting variable containing k groups will have k+1 parameters to estimate
TRUE (k group means + 1 for the variance)
In ANOVA, the mean sum of squared errors measures variability within groups.
TRUE. The sum of squared errors (SSE) divided by N − k is the MSE, which is a measure of within-group variability.
Multicollinearity can lead to misleading conclusions on the statistical significance of the regression coefficients of a multiple linear regression model.
True
Under the normality assumption, the estimated simple linear regression coefficient, β1_hat , is a linear combination of normally distributed random variables.
True
We use chi sq to test whether subset of regression coefficients are zero in Poisson
True
We use pairwise comparison in ANOVA to find the estimated means that are greater or lower than others among all pairs of k populations
True
When conducting residual analysis for a SLR, a plot of residuals against fitted values can be used to check constant variance assumption
True.
The variability in the prediction comes from
The variability due to a new measurement and due to estimation
The pooled variance estimator is
The variance estimator assuming equal variances
The objective of the residual analysis is
To evaluate goodness of fit.
The objective of the pairwise comparison is
To identify the statistically significantly different means.
A partial F-test can be used to test the null hypothesis that the regression coefficients associated with a subset of the predicting variables in a multiple linear regression model are all equal to zero
True
An example of a multiple linear regression model is Analysis of Variance (ANOVA).
True
In simple linear regression, the prediction interval of one member of the population will always be wider than the corresponding confidence interval of the mean response for all members of the population when using the same predicting value
TRUE. Confidence intervals are narrower than prediction intervals.
What is VIF
Variance inflation factor (VIF) is a measure of the amount of multicollinearity in a set of multiple regression variables. Mathematically, the VIF for a regression model variable is equal to the ratio of the overall model variance to the variance of a model that includes only that single independent variable. This ratio is calculated for each independent variable. A high VIF indicates that the associated independent variable is highly collinear with the other variables in the model.
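As a quick illustration (not from the course material), here is a minimal Python sketch computing VIFs with statsmodels on made-up data; the variable names and correlation structure are purely illustrative.

```python
# Illustrative sketch: computing VIFs with statsmodels on synthetic data.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=n)  # strongly correlated with x1
x3 = rng.normal(size=n)                        # roughly independent of the others

X = sm.add_constant(np.column_stack([x1, x2, x3]))  # design matrix with intercept
for j, name in enumerate(["x1", "x2", "x3"], start=1):  # skip the intercept column
    print(name, variance_inflation_factor(X, j))        # x1 and x2 show high VIFs
```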
When to transform or not if lambda = ___ Consider y = 5
lambda = -2: 5^-2; lambda = -1: 5^-1; lambda = -0.5: 1/sqrt(5); lambda = 0: ln(5); lambda = 0.5: sqrt(5); lambda = 1: 5 (no transformation); lambda = 2: 5^2
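To make the arithmetic concrete, a tiny Python sketch evaluating the transformations listed above for y = 5, following the simple power form used on this card (with ln(y) at lambda = 0):

```python
# Transformed values of y = 5 for the lambdas on this card (log at lambda = 0).
import math

y = 5
for lam in (-2, -1, -0.5, 0, 0.5, 1, 2):
    val = math.log(y) if lam == 0 else y ** lam
    print(f"lambda = {lam:5}: {val:.4f}")
```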
Cook's distance (Di) measures how much the fitted values in a multiple linear regression model change when the i-th observation is removed.
True
If the VIF for each predicting variable is smaller than a certain threshold, then we can say that there is not a problematic amount of multicollinearity in the multiple linear regression model.
True
If the assumptions of a simple linear regression model hold, then the estimator for the variance of the error terms, σ2, is a random variable.
True
What are the objectives of multiple linear regression?
Prediction, modeling, and hypothesis testing.
The causation of a predicting variable on the response variable can be captured using multiple linear regression on observational data, conditional on the other predicting variables in the model.
FALSE (Causality statements can only be made in a controlled environment such as randomized trials or experiments.)
In a first-order multiple linear regression model, the estimated regression coefficient corresponding to a quantitative predicting variable is interpreted as the estimated expected change in the response variable when there is a change of one unit in the corresponding predicting variable holding all other predictors fixed
True
In a linear regression model without intercept, a qualitative variable with k categories is represented by k dummy variables
True
In linear regression, true model parameters are unknown regardless of how much data are observed
True
In logistic regression, the estimation of regression coefficients is based on MLE
True
In multiple linear regression, if the F-test statistic is greater than the appropriate F-critical value, then at least one of the slope coefficients is significantly different from zero at the given significance level.
True
In multiple linear regression, we could diagnose the normality assumption by using the normal probability plot of the standardized residuals.
True
The objective of multiple linear regression is
- Predict future new responses
- Model associations of explanatory variables with a response variable, accounting for controlling factors
- Test hypotheses using statistical inference
Total number of unique pairwise mean comparisons among 8 group means is:
28. k(k-1)/2 = 8 * 7 / 2 = 28
On a dataset with N = 486 and 5 categories, what are the degrees of freedom of the F-test (model without intercept)?
4 and 481. The F-test in ANOVA has (k-1, N-k) degrees of freedom; here k = 5 and N = 486, so (4, 481).
In MLR, what are some indications of multicollinearity? A - The overall F statistic is significant but the individual t statistics are not; B - Individual t statistics are significant but the F statistic is not; C - The standard errors of the estimated coefficients are artificially small; D - The estimated coefficients change dramatically when the value of one predictor changes slightly.
A & D
If a multiple linear regression model contains 10 quantitative predictors and an intercept, then error term variance estimator follows a (chi-squared) distribution with n-10 degrees of freedom, where n is the sample size.
FALSE. The intercept must be accounted for; the degrees of freedom are n - p - 1 = n - 11.
Regression Assumptions: Normally distributed
Can be assessed with the normal probability (QQ) plot. Transformation possible? Yes, on the response variable.
Equal variances
Can be assessed with the residuals vs. fitted plot. Transformation possible? Yes, on the response variable.
Assuming that the residuals are normally distributed, the estimated variance of the error terms has the following sampling distribution under the simple linear model
Chi-squared with n - 2 degrees of freedom: (n - 2) * sigma_hat^2 / sigma^2 follows a chi-squared(n - 2) distribution.
Which is not objective of regression? Testing hypothesis, Extrapolation, Prediction, Modeling
Extrapolation
In multiple linear regression, the uncertainty of the prediction of a new response comes only from the newness of the observation.
FALSE (Parameter estimates & Newness)
In ANOVA, the linearity assumption is assessed using a QQ-plot of the residuals
FALSE (QQ plot measures normality, not linearity. Also, an assumption of ANOVA is not linearity)
In multiple linear regression, the adjusted R-squared can be used to compare models, and its value will always be greater than or equal to that of R squared
FALSE (less than or equal to)
If a multiple linear regression model contains 5 quantitative predicting variables and an intercept, then the number of parameters to estimate is 6
FALSE. There are 7 parameters: the intercept β0, the 5 coefficients β1, ..., β5, and the error variance σ².
The estimated variance of the error terms of a multiple linear regression model with intercept can be obtained by summing up the squared residuals and dividing that sum by n-p , where n is the sample size and p is the number of predictors
FALSE. The intercept must be accounted for: the sum of squared residuals is divided by n - p - 1.
In SLR models, we lose 3 DF when estimating the variance of the error terms because of the estimation of the 3 model parameters - β0, β1, σ2
FALSE. We lose 2 degrees of freedom because the estimate of σ² uses the estimates of β0 and β1.
In simple linear regression, the normality assumption states that the response variable is normally distributed
FALSE the normality assumption is related to the error terms
The p-value of a hypothesis test is interpreted as the probability of rejecting the null hypothesis
FALSE. The p-value measures how much evidence the data provide against the null hypothesis (how "rejectable" it is); it is not the probability of rejecting it.
In simple linear regression, we assess the constant variance assumption by plotting the response variable against fitted values
FALSE we plot the residuals vs. fitted
With the Box-Cox transformation, we do not transform the response variable when λ = 0
FALSE when lambda = 0, we use the log transformation
In simple linear regression, β1 hat is an unbiased estimator for β0
FALSE β1_hat is an unbiased estimator for β1. It is never an estimator for β0
The quantile-quantile normal plot of the residuals is the only tool available for assessing the normality assumption in ANOVA
FALSE. Could also look at a histogram
In simple linear regression, the sampling distribution for the variance estimator is (chi-squared) regardless of whether the assumptions of the model hold or not
FALSE. This is under the assumption of normality. Chi2 with n-2 DF
The only assumptions for a simple linear regression model are linearity, constant variance, and normality.
False. The assumptions of simple linear regression are linearity, constant variance, independence, and normality.
In multiple linear regression, a VIF of 10 means that there is no correlation among the j predictor and the remaining predictor variables, and hence the variance of the estimated regression coefficient is not inflated.
False
In multiple linear regression, the estimation of the variance of the error terms is unnecessary for making statistical inference on the regression coefficients.
False
The estimated simple linear regression coefficient β0 hat measures the strength of a linear relationship between the predicting and response variables
False
The interpretation of the regression coefficients is the same for logistic regression and poisson regression.
False. Interpretation of the regression coefficients of Poisson regression is in terms of log ratio of the rate. Interpretation of the regression coefficients of Logistic regression is in terms of log odds.
Elastic Net often underperforms LASSO regression in terms of prediction accuracy because it considers both L1 and L2 penalties together.
False. Elastic net often outperforms the lasso in terms of prediction accuracy. The difference between the lasso and the elastic net is the addition of a penalty just like the one used in ridge regression. By considering both penalties, L1 and L2 together, we have the advantages of both lasso and ridge regression.
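For reference, a minimal scikit-learn sketch fitting lasso and elastic net on synthetic data; the penalty values (alpha, l1_ratio) are arbitrary choices for illustration, not recommendations.

```python
# Illustrative comparison of lasso (L1) and elastic net (L1 + L2) coefficients.
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(size=200)

lasso = Lasso(alpha=0.1).fit(X, y)                    # L1 penalty only
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # mix of L1 and L2 penalties
print("lasso coefficients:", lasso.coef_)
print("elastic net coefficients:", enet.coef_)
```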
In SLR, the sampling distribution of the estimators of the regression coefficients is a T distribution with n-2 df regardless of the distribution of the errors
False. The estimators of the regression coefficients are unbiased regardless of the distribution of the data, but their sampling distribution is a t distribution with n - 2 degrees of freedom only if the errors are normally distributed.
For a linear regression under normality, the variance used in the Mallow's Cp penalty is the true variance, not an estimate of variance.
False. For linear regression under normality, the variance used in the Mallow's Cp penalty is the estimated variance from the full model.
Stepwise regression is a greedy search algorithm that is guaranteed to find the model with the best score.
False. Greedy, but not always best.
If the constant variance assumption does not hold in multiple linear regression, we apply a Box-Cox transformation to the predicting variables.
False. If constant variance or normality assumptions do not hold, we apply a Box-Cox transformation to the response variable.
Interpretation of logistic regression coefficients is same as for linear regression assuming normality
False. Interpretation in logistic is with respect to log odds (not with respect to response variable)
In ANOVA, the pooled variance estimator or MSE is calculated assuming equal means and equal variances across all k populations
False. It only assumes equal variances across all k populations.
LASSO regression will always select the same number or more predicting variables than Ridge and Elastic-Net regression
False. Lasso selects the same or fewer due to variable elimination
The only model parameters in ANOVA are mean and variance of entire population
False. Model parameters are the means of k populations and variance of the entire population
Multicollinearity in multiple linear regression means that the columns in the design matrix are linearly independent.
False. Multicollinearity means there is a (near) linear dependency among the predicting variables, i.e., among the columns of the design matrix.
In logistic regression, R2 can be used as a measure of explained variation in the response variable
False. Not applicable for Logistic regression due to binary response variable.
Forward stepwise variable selection starts with the simpler model and selects the predicting variable that increases the R-squared the most, unless the R-squared cannot be increased any further by adding variables.
False. R-squared is not compared during stepwise variable selection. Variables are selected if they reduce the AIC or BIC of a model.
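A hedged sketch of what forward selection by AIC might look like with statsmodels OLS; `data` and `response` are placeholder names, and this simplified loop ignores categorical variables and interactions.

```python
# Simplified forward stepwise selection by AIC (smaller AIC = better).
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_stepwise_aic(data: pd.DataFrame, response: str) -> list:
    remaining = [c for c in data.columns if c != response]
    selected = []
    # start from the intercept-only model
    current_aic = sm.OLS(data[response], np.ones(len(data))).fit().aic
    while remaining:
        # AIC of each one-variable extension of the current model
        trials = [(sm.OLS(data[response],
                          sm.add_constant(data[selected + [cand]])).fit().aic, cand)
                  for cand in remaining]
        best_aic, best_var = min(trials)
        if best_aic >= current_aic:   # adding any variable would not lower the AIC
            break
        selected.append(best_var)
        remaining.remove(best_var)
        current_aic = best_aic
    return selected
```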
When testing the overall regression in MLR, rejecting the null hypothesis that all predictor coefficients are zero leads to the conclusion that all predictors are statistically significant.
False. Rejecting the null means that at least one coefficient is statistically significant.
We can perform variable selection based on the p-values of the t-tests to test for statistical significance of the regression coefficients.
False. Should be tested by partial F test.
In stepwise regression, for either forward or backward direction, we correspondingly accept or remove variables that produce larger AICs or BICs.
False. Smaller AIC/BIC
In multiple linear regression, the sampling distribution used for estimating confidence intervals for the regression coefficients is the normal distribution
False. T distribution
If the confidence interval for a regression coefficient contains the value zero, we interpret that the regression coefficient is definitely equal to zero.
False. The coefficient is plausibly zero, but we cannot be certain that it is.
In ANOVA, the number of degrees of freedom of the chi-squared distribution for the variance estimator (not pooled variance estimator ) is N − k − 1 where k is the number of groups.
False. This variance estimator has N-1 degrees of freedom. We lose one DF because we calculate one mean and hence its N-1.
The estimated regression coefficients obtained by using the method of least squares are biased estimators of the true regression coefficients.
False. Unbiased.
In simple linear regression, the confidence interval of the response increases as the distance between the predictor value and the mean value of the predictors decreases
False. The confidence interval widens as the predictor value moves farther from the mean of the predictors.
The estimated versus predicted regression line for a given x*
They have the same point estimate but different variances (the prediction has the larger variance).
T distribution
Used in particular when the sample size is small and/or the population standard deviation is unknown. The sampling distribution of the estimated regression coefficients is the t distribution.
How to test for independence
Independence can only be guaranteed in experimental design
In SLR for inference, the error terms are assumed to be:
Independent and normally distributed with zero mean and constant variance
The assumption of normality
It's needed for the sampling distribution of the estimator of the regression coefficients, and hence for inference
Cook's distance
Measures how much the estimated parameter values in the regression model change when the ith observation is removed.
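As an aside, a small statsmodels sketch showing how Cook's distances can be pulled from a fitted OLS model; the data are made up, with one observation perturbed to be influential.

```python
# Illustrative sketch: Cook's distances from a fitted OLS model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=50)
y = 1 + 2 * x + rng.normal(size=50)
y[0] += 10                                   # perturb one observation

fit = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d, _ = fit.get_influence().cooks_distance
print(np.argmax(cooks_d), cooks_d.max())     # the perturbed point stands out
```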
Estimators of a regression line are derived by:
Minimizing sum of squared differences
In what ways does logistic regression differ from SLR?
No error term; the response is not normally distributed; it models the probability of a response, not the expectation of the response.
One way anova: Objective Assumptions PVE
Objective: compare means across k populations (ANOVA table; test for equal means; identify which pairs differ; estimate confidence intervals).
Assumptions: constant variance, independence, normality.
Pooled variance estimator (PVE): the variance estimator assuming equal variances. We assume that the variance of the response variable is the same across all populations and equal to σ², where N = total number of samples and k = number of populations or groups.
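A minimal Python sketch of a one-way ANOVA on three made-up groups, including the pooled variance estimator (MSE = SSE / (N - k)); group sizes, means, and seeds are illustrative.

```python
# One-way ANOVA sketch: F-test for equal means plus the pooled variance estimator.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
groups = [rng.normal(loc=m, scale=2.0, size=30) for m in (10.0, 10.5, 12.0)]

f_stat, p_value = stats.f_oneway(*groups)        # F-test with (k-1, N-k) df
print(f"F = {f_stat:.3f}, p-value = {p_value:.4f}")

sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
N, k = sum(len(g) for g in groups), len(groups)
print("pooled variance estimator (MSE) =", sse / (N - k))
```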
The F-test for equal means is a _________ tailed test with ______ and ______ degrees of freedom, where k is the number of groups and n is the total sample size
One, k-1, n-k
Leverage points
An outlier that is far from the mean of the predicting variable(s) (the x's).
How to test for linearity
Residuals vs. fitted plot: the smoothing (red) line should be approximately horizontal at zero, with no pattern in the residuals. Transformation possible? Yes, the predictors can be transformed.
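For illustration, a short matplotlib/statsmodels sketch of the residuals vs. fitted plot described above (synthetic data; in practice you would use your own fitted model):

```python
# Residuals vs. fitted values: look for a horizontal band centered at zero.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=100)
y = 3 + 2 * x + rng.normal(scale=1.5, size=100)   # illustrative data

model = sm.OLS(y, sm.add_constant(x)).fit()
plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0, color="red")      # residuals should scatter evenly around this line
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```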
Chi-squared distribution
Used when we are interested in confidence intervals for the variance (or standard deviation) of the error terms.
Which are all the model parameters in ANOVA
The model parameters for one-way ANOVA are the means of the k populations and the variance of the entire population (all k populations together)
The total sum of squares divided by N-1 is
The sample variance estimator assuming equal means and equal variances
If the residuals of a multiple linear regression model are not normally distributed, we can model the transformed response variable instead, where a common transformation for normality is the Box-Cox transformation.
True
In ANOVA, if the constant variance assumption does not hold, the statistical inference on the equality of the means will not be reliable
True
In MLR, the higher the VIF the more likely the predictor is strongly correlated with a linear combination of other predictors
True
In multiple linear regression, we study the relationship between a single response variable and several predicting quantitative and/or qualitative variables
True
In simple linear regression, a negative value of β1 is consistent with an inverse relationship between the predicting variable and the response variable.
True
Logistic regression models the probability of success
True
In the balance of Bias-Variance tradeoff, adding variables to our model tends to increase our variance and decrease our bias.
True. Adding more variables increases the variance of the estimates and can induce multicollinearity; it also reduces the bias of the model, since the additional predictors give it more flexibility to fit the data.
BIC variable selection criteria favors simpler models.
True. BIC penalizes complexity more than other approaches.
You are interested in understanding the relationship between education level and IQ, with IQ as the response variable. In your model, you also include age. Age would be considered a controlling variable while the education level would be an explanatory variable.
True. Controlling variables can be used to control for selection bias in a sample; they are included as default variables so that more meaningful relationships with the other explanatory or predicting factors can be captured. Explanatory variables are used to explain variability in the response variable, in this case the education level.
If one confidence interval in the pairwise comparison includes zero under ANOVA, we conclude that the two corresponding means are plausibly equal.
True. If the confidence interval includes zero, it is plausible that the corresponding means are equal.
A poisson regression model fit to a dataset with a small sample size will have a hypothesis testing procedure with more Type I errors than expected.
True. In Poisson regression, if the sample size is small, the statistical inference is not reliable. Thus, the hypothesis testing procedure will have a probability of type I error larger than the significance level.
We can assess the assumption of constant-variance in simple linear regression by plotting residuals against fitted values.
True. In a residuals Vs fitted plot, if the residuals are scattered around the 0 line, it indicates that the constant variance assumption of errors hold
The one-way ANOVA is a linear regression model with one qualitative predicting variable.
True. One-way ANOVA is a linear regression model with one predicting factor/ categorical variable.
Multicollinearity in multiple linear regression means that the columns in the design matrix are (nearly) linearly dependent.
True. Problems arise when the columns of XT X are not linearly independent, or the value of one predictor can be closely estimated from the other predictors. We call this condition multicollinearity.
The larger the coefficient of determination or R-squared, the higher the variability explained by the simple linear regression model.
True. R-squared represents the proportion of total variability in Y (response) that can be explained by the regression model (that uses X).
Ridge regression corrects for the impact of multicollinearity by re-weighting the regression coefficients
True. Ridge regression was developed to correct for the impact of multicollinearity. All predicting variables remain in the model, but ridge regression re-weights the regression coefficients so that those corresponding to correlated predictors share their explanatory power, minimizing the impact of multicollinearity on the estimation and statistical inference of the regression coefficients.
In Logistic regression, the hypothesis test for subsets of coefficients is approximate; it relies on large samples sizes
True. The distribution is chi squared and relies on large sample
The estimators of the error term variance and of the regression coefficients are random variables.
True. The estimators are β̂ = (XᵀX)⁻¹XᵀY and σ̂² = ε̂ᵀε̂ / (n − p − 1), where ε̂ = (I − H)Y. These estimators are functions of the response, which is a random variable; therefore they are also random.
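These formulas can be checked directly; a small numpy sketch (synthetic data) computing β̂ = (XᵀX)⁻¹XᵀY and σ̂² = ε̂ᵀε̂ / (n − p − 1):

```python
# OLS estimators by matrix algebra: beta_hat and the error-variance estimator.
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + p predictors
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # (X^T X)^{-1} X^T Y
resid = Y - X @ beta_hat                       # equals (I - H) Y
sigma2_hat = resid @ resid / (n - p - 1)
print("beta_hat =", beta_hat, "sigma2_hat =", sigma2_hat)
```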
If a predicting variable is categorical with 5 categories in a linear regression model without intercept, we will include 5 dummy variables in the model.
True. When we have qualitative variables with k levels, we only include k − 1 dummy variables if the regression model has an intercept. If not, we will include k dummy variables.
The estimators for the regression coefficients are
Unbiased regardless of the distribution of the data
The mean squared errors (MSE) measures
Within treatment variability