Checking a regression model


True or false: The best plot for checking the similar variances condition is the scatterplot of y on x.

False, because any pattern is more visible in the plot of residuals on x. Although a scatterplot of y on x can be used to check the similar variances condition, changes in the variation are easier to see in a plot of the residuals on x.
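A residual plot works better because subtracting the fitted line removes the trend, leaving only the spread to judge. The sketch below uses simulated data (an assumption; nothing here comes from the card) whose error SD grows with x. It computes residuals with a hand-written least-squares fit (`fit_line` is an illustrative helper) and compares their spread for small and large x. The lists `xs` and `res` are exactly what you would plot to get the residual plot.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Simulated data (assumption): the error SD grows with x, so variances are NOT similar.
rng = random.Random(1)
xs = [rng.uniform(1, 10) for _ in range(400)]
ys = [5 + 2 * x + rng.gauss(0, 0.5 * x) for x in xs]

b0, b1 = fit_line(xs, ys)
res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # residuals: the trend is removed

# Compare the residual spread below and above the median of x.
cut = statistics.median(xs)
low = [e for x, e in zip(xs, res) if x <= cut]
high = [e for x, e in zip(xs, res) if x > cut]
spread_ratio = statistics.stdev(high) / statistics.stdev(low)
print(f"residual SD, large x / small x: {spread_ratio:.2f}")
```

A ratio well above 1 is the numeric counterpart of the fan shape you would see in the residual plot.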

True or false: Exclude from the estimation of the regression equation any case whose residual is more than 3 s_e away from the fitted line (s_e is the standard deviation of the residuals).

False, because sometimes one observation is better than none. Why? The outlier may be the result of a data-entry error, or it may be an indicator of something important. More information is required to determine whether the outlier improves the model. Wrong answers: "True": an outlier can exert a strong pull on the model, but that does not imply that it should be excluded. "False, because using more data always produces a better regression model": more data does not automatically produce a better model, because the outlier may be an error; again, more information is required.
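Rather than deleting such cases automatically, a safer routine is to flag them for investigation. This is a minimal sketch with hypothetical data; `flag_large_residuals` is an illustrative helper, and s_e is computed as the square root of the residual sum of squares divided by n - 2.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def flag_large_residuals(xs, ys, k=3.0):
    """Return indices whose residual is more than k * s_e from the fitted line."""
    b0, b1 = fit_line(xs, ys)
    res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s_e = math.sqrt(sum(e * e for e in res) / (len(xs) - 2))  # n - 2 degrees of freedom
    return [i for i, e in enumerate(res) if abs(e) > k * s_e]


# Hypothetical data: y = 1 + 2x with small alternating noise, plus one
# observation (x = 15, index 14) recorded 20 units too high.
xs = list(range(1, 31))
ys = [1 + 2 * x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]
ys[14] += 20

flagged = flag_large_residuals(xs, ys)
print(flagged)  # candidates to investigate, not to drop automatically
```

Whether a flagged case stays in the fit is a judgment made after checking where it came from.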

True or false: Because residuals represent the net effects of many other factors, it is rare to find a group of residuals from a simple regression that is normally distributed.

False, because sums of random effects tend to be normally distributed (the Central Limit Theorem, CLT). Why? A normal model is often a good description of the unexplained variation. The errors (residuals) around the fitted line represent the combined effect of other variables on the response. Since sums of random effects tend to be normally distributed, a normal model is a good place to start when describing the error variation.
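A quick simulation illustrates the CLT argument. The assumption (not from the card) is that each error is the sum of 12 small, independent effects, each uniform on [-0.5, 0.5]. The share of simulated errors within 1 and 2 standard deviations of zero should come out close to the normal values of about 68.3% and 95.4%.

```python
import random
import statistics

# Assumption: each error is the sum of 12 small, independent "other factors",
# each uniform on [-0.5, 0.5], so each error has mean 0 and variance 1.
rng = random.Random(7)
errors = [sum(rng.uniform(-0.5, 0.5) for _ in range(12)) for _ in range(20000)]

sd = statistics.stdev(errors)
within_1 = sum(abs(e) <= sd for e in errors) / len(errors)
within_2 = sum(abs(e) <= 2 * sd for e in errors) / len(errors)
# A normal distribution puts about 68.3% within 1 SD and 95.4% within 2 SD.
print(f"within 1 SD: {within_1:.3f}, within 2 SD: {within_2:.3f}")
```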

True or false: If the simple regression model is used to model data that do not have constant variance, then 95% prediction intervals produced by this model are longer than needed.

False, because the prediction intervals would be too long in some places and too short in others. Why? Prediction intervals are calculated from a single estimate of the error variance pooled over all the data. Where the true variation is small, the intervals are too long; where it is large, they are too short. Wrong answer: "False, because prediction intervals are independent of variance" is incorrect, because the sample variance is used to calculate prediction intervals.
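The sketch below checks this by simulation under assumed data whose error SD is 0.5x. It fits a line and builds approximate 95% prediction intervals from the single pooled s_e, using the usual simple-regression form s_e * sqrt(1 + 1/n + (x - x̄)²/Sxx) with 1.96 standing in for the t quantile (reasonable for large n). It then measures how often new observations land inside the intervals for small and large x. Coverage should be well above 95% for small x and below 95% for large x.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Simulated heteroscedastic data (assumption): error SD = 0.5 * x.
rng = random.Random(3)
xs = [rng.uniform(1, 10) for _ in range(2000)]
ys = [5 + 2 * x + rng.gauss(0, 0.5 * x) for x in xs]

b0, b1 = fit_line(xs, ys)
n = len(xs)
x_bar = sum(xs) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
s_e = math.sqrt(sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (n - 2))


def covered(x, y):
    """True if y falls inside the approximate 95% prediction interval at x."""
    se_pred = s_e * math.sqrt(1 + 1 / n + (x - x_bar) ** 2 / sxx)
    return abs(y - (b0 + b1 * x)) <= 1.96 * se_pred


# Fresh observations from the same process, split at x = 5.5.
new_x = [rng.uniform(1, 10) for _ in range(4000)]
new_y = [5 + 2 * x + rng.gauss(0, 0.5 * x) for x in new_x]
small = [covered(x, y) for x, y in zip(new_x, new_y) if x < 5.5]
large = [covered(x, y) for x, y in zip(new_x, new_y) if x >= 5.5]
cov_small = sum(small) / len(small)
cov_large = sum(large) / len(large)
print(f"coverage, small x: {cov_small:.3f}; large x: {cov_large:.3f}")
```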

Term that describes data with unequal error variation.

Heteroscedasticity occurs when the underlying errors lack constant variance. Data with heteroscedasticity have unequal error variation.

Use this plot to check the linear enough condition.

Scatterplot of y on x or a plot of residuals. Answer extended: The linear enough condition states that the association between the response and the explanatory variable needs to be linear. The scatterplot of y on x or the scatterplot of the residuals on the explanatory variable, x, is useful for checking for linear association. The condition applies to linear regression only: regression lines will be very misleading if your data aren't approximately linear. The best way to check this condition is to make a scatterplot of your data. If the data look like they roughly fit a line, you can perform regression. For other types of regression (like exponential regression), eyeball the scatterplot to make sure it roughly follows the shape of whatever regression you are performing.
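A curved pattern left in the residuals is the usual sign that the linear enough condition fails. This hypothetical example fits a straight line to y = x², which is plainly not linear, and averages the residuals in each third of the x range. The resulting +, -, + pattern is the numeric version of the U-shape you would see in the residual plot.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical curved data: y = x**2 with no noise, fit with a straight line.
xs = list(range(21))
ys = [x ** 2 for x in xs]
b0, b1 = fit_line(xs, ys)
res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Mean residual in each third of the x range; a (+, -, +) pattern means
# the straight line misses curvature in the data.
thirds = [res[0:7], res[7:14], res[14:21]]
means = [sum(t) / len(t) for t in thirds]
print([round(m, 2) for m in means])
```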

Data on sales have been collected from a chain of convenience stores. Some of the stores are considerably larger (more square feet of display space) than others. In a regression of sales on square feet, can you anticipate any problems?

There would likely be unequal variation in the data, with more variation among larger stores.

True or false: A leveraged outlier has an unusually large or small value of the explanatory variable.

True. An outlier near the minimum or maximum of the explanatory variable is leveraged. A leveraged outlier can exert a strong pull on the model.
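To see the pull, the sketch below starts from hypothetical data lying exactly on y = 1 + 2x, moves one y value up by 10, and refits. The same shift changes the slope far more when it is applied at the maximum of x than near the mean of x. `slope_after_shift` is an illustrative helper.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical data lying exactly on y = 1 + 2x.
xs = list(range(1, 11))
ys = [1 + 2 * x for x in xs]
_, base_slope = fit_line(xs, ys)


def slope_after_shift(index, shift=10.0):
    """Slope after moving the y value at position `index` up by `shift`."""
    shifted = list(ys)
    shifted[index] += shift
    return fit_line(xs, shifted)[1]


change_middle = slope_after_shift(4) - base_slope  # x = 5, near the mean of x
change_end = slope_after_shift(9) - base_slope     # x = 10, the maximum of x
print(f"slope change, shift at x=5: {change_middle:+.3f}; at x=10: {change_end:+.3f}")
```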

Outlier

An observation that stands away from the rest of the data and appears distinct in a plot, at any value of the explanatory variable.

An observation in a regression model with an unusually large or small value of x

Leveraged outlier: an outlier near the minimum or maximum of the explanatory variable is leveraged. A leveraged outlier can exert a strong pull on the model because the fitted slope is most sensitive to observations far from the mean of x.
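A standard way to quantify how unusual an x value is (not mentioned on the card) is its leverage, h = 1/n + (x - x̄)²/Sxx in simple regression. The sketch computes it for hypothetical x values with one observation far beyond the rest; that observation gets by far the largest leverage. `leverages` is an illustrative helper.

```python
def leverages(xs):
    """Leverage of each x in a simple regression: 1/n + (x - x_bar)^2 / Sxx."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


# Hypothetical explanatory values: most between 1 and 10, one at 30.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]
h = leverages(xs)
for x, lev in zip(xs, h):
    print(f"x = {x:>2}: leverage {lev:.3f}")
```

In simple regression the leverages always add up to 2 (the number of estimated coefficients), so a single value near 1 means that observation carries most of the total.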

A common cause of dependent error terms is

The presence of a lurking variable. Why? Errors in the Simple Regression Model are assumed to be independent. One way to detect dependent errors is to notice a pattern in the timeplot of the residuals. One source of dependence is the presence of lurking variables. The errors ε accumulate everything else that affects the response, aside from the single explanatory variable in the model. If there's another variable that affects y, we may be able to see its influence in the timeplot of the residuals.
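The simulation below (all data assumed) lets y depend on x and on a slowly drifting lurking variable z that is left out of the model. Because z changes little from one period to the next, consecutive residuals are similar and their lag-1 autocorrelation comes out high. The `lag1_autocorrelation` helper is illustrative, not from the source.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def lag1_autocorrelation(values):
    """Correlation between each value and the one before it in time order."""
    m = sum(values) / len(values)
    d = [v - m for v in values]
    num = sum(d[t] * d[t - 1] for t in range(1, len(d)))
    return num / sum(v * v for v in d)


# Assumption: y depends on x and on a lurking variable z(t) not in the model.
rng = random.Random(11)
times = range(200)
xs = [rng.uniform(0, 10) for _ in times]
z = [5 * math.sin(t / 20) for t in times]  # drifts slowly over time
ys = [3 + 2 * x + z_t + rng.gauss(0, 0.5) for x, z_t in zip(xs, z)]

b0, b1 = fit_line(xs, ys)
res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # kept in time order
print(f"lag-1 autocorrelation of residuals: {lag1_autocorrelation(res):.2f}")
```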

Use this plot to check for dependence in data over time

Timeplot of residuals. Answer extended: A pattern in a timeplot of the residuals indicates dependence in the errors of the model, perhaps due to lurking variables. If the residuals vary randomly around zero, with no pattern over time, there is no evidence of dependence. If the data being analyzed are time series data (data recorded sequentially), the plot of residuals versus the order of the data reveals correlation between the error term and time: systematic patterns, such as long runs above or below zero or cycles around zero, indicate that the error terms are dependent.
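One informal numeric companion to the timeplot is to count how often consecutive residuals change sign. This sketch uses two hypothetical residual sequences: independent noise, which should switch sign roughly half the time, and a cycle, which stays above or below zero for long runs and switches only a few times. `count_sign_changes` is an illustrative helper.

```python
import math
import random


def count_sign_changes(residuals):
    """Number of times consecutive residuals switch between positive and negative."""
    return sum(1 for a, b in zip(residuals, residuals[1:]) if a * b < 0)


rng = random.Random(5)
independent = [rng.gauss(0, 1) for _ in range(100)]  # no pattern over time
cyclic = [math.sin(t / 10) for t in range(100)]      # long runs above/below zero

# With n independent residuals centered at zero, roughly (n - 1) / 2 sign changes are expected.
print("independent:", count_sign_changes(independent))
print("cyclic:", count_sign_changes(cyclic))
```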

