ISLR - Ch 3

What are factors and levels?

A qualitative predictor is also known as a factor, and the possible values it takes are called its levels. If a factor has only two levels, incorporating it into a regression model is simple: we create a dummy variable that takes on two numerical values.

What are some problems associated with fitting a linear regression model to a particular data set?

1. Non-linearity of the response-predictor relationships.
2. Correlation of error terms.
3. Non-constant variance of error terms.
4. Outliers.
5. High-leverage points.
6. Collinearity.

Define 95% confidence interval

A 95% confidence interval is defined as a range of values such that with 95% probability, the range will contain the true unknown value of the parameter. The range is defined in terms of lower and upper limits computed from the sample of data.

How to detect collinearity

A simple way to detect collinearity is to look at the correlation matrix of the predictors. An element of this matrix that is large in absolute value indicates a pair of highly correlated variables, and therefore a collinearity problem in the data.
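
As an illustration, a minimal Python sketch (the DataFrame and its column names are hypothetical) that scans the correlation matrix for highly correlated predictor pairs:

```python
import pandas as pd

# Hypothetical predictor data; `limit` and `rating` move together.
X = pd.DataFrame({
    "age":    [25, 62, 47, 51, 32, 68],
    "limit":  [3000, 4100, 5200, 5900, 7000, 8200],
    "rating": [283, 391, 488, 561, 652, 770],
})

corr = X.corr()
threshold = 0.9  # rule-of-thumb cutoff, not from the book

# Report each pair of predictors whose absolute correlation is large.
cols = list(X.columns)
for i in range(len(cols)):
    for j in range(i + 1, len(cols)):
        r = corr.iloc[i, j]
        if abs(r) > threshold:
            print(f"Possible collinearity: {cols[i]} / {cols[j]} (r = {r:.3f})")
```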

What is an outlier?

An outlier is a point for which yi is far from the value predicted by the model.

When do parametric methods tend to outperform non-parametric approaches?

As a general rule, parametric methods will tend to outperform non-parametric approaches when there is a small number of observations per predictor.

What are the intercept and slope terms of SLR?

β0 is the intercept, that is, the expected value of Y when X = 0; β1 is the slope, the average increase in Y associated with a one-unit increase in X.
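
In equation form, the simple linear regression model with its error term (equation (3.5) in the book):

```latex
Y = \beta_0 + \beta_1 X + \varepsilon
```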

What are the coefficients/parameters of simple linear regression?

β0 and β1, the intercept and the slope.

What is a drawback of selection methods?

Backward selection cannot be used if p > n, while forward selection can always be used. Forward selection is a greedy approach, and might include variables early that later become redundant. Mixed selection can remedy this.

How do we find estimates for β0 and β1?

By minimizing the least squares criterion, that is, by choosing βˆ0 and βˆ1 to minimize the residual sum of squares (RSS).
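
The resulting least squares coefficient estimates (equations (3.4) in the book):

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```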

Define collinearity

Collinearity refers to the situation in which two or more predictor variables are closely related to one another.

What test statistic is used in MLR

F-statistic

What are the formulas for SE(βˆ0) and SE(βˆ1)? What assumption should hold for these to be valid?

For these formulas to be strictly valid, we need to assume that the errors εi for each observation are uncorrelated with common variance σ2.
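
The formulas themselves (equation (3.8) in the book):

```latex
\mathrm{SE}(\hat{\beta}_0)^2 = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right],
\qquad
\mathrm{SE}(\hat{\beta}_1)^2 = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
```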

Describe the KNN regression method

Given a value for K and a prediction point x0, KNN regression first identifies the K training observations that are closest to x0, represented by N0. It then estimates f(x0) using the average of all the training responses in N0.
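
A minimal sketch of KNN regression in Python; the function name and toy data are hypothetical, not from the book:

```python
import numpy as np

def knn_regress(X_train, y_train, x0, k):
    """Predict f(x0) as the average response of the k training
    observations closest to x0 (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x0, axis=1)  # distance from x0 to each training point
    nearest = np.argsort(dists)[:k]               # indices of the k closest observations (N0)
    return y_train[nearest].mean()                # average their responses

# Hypothetical toy data: one predictor, five observations.
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([1.2, 1.9, 3.2, 3.8, 5.1])
print(knn_regress(X_train, y_train, x0=np.array([2.5]), k=3))
```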

What are the null and alternative hypotheses of SLR?

H0: There is no relationship between X and Y, versus the alternative hypothesis Ha: There is some relationship between X and Y. Mathematically, this corresponds to testing H0: β1 = 0 versus Ha: β1 ≠ 0.

State the null and alternative hypothesis for MLR

H0: β1 = β2 = ··· = βp = 0 versus the alternative Ha: at least one βj is non-zero.

What does it mean to be unbiased?

An unbiased estimator does not systematically over- or under-estimate the true parameter.

Explain whether we accept or reject the null hypothesis based on the value of the F-statistic

When there is no relationship between the response and predictors, one would expect the F-statistic to take on a value close to 1. On the other hand, if Ha is true, then E{(TSS − RSS)/p} > σ2, so we expect F to be greater than 1.
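
For reference, the F-statistic for testing these hypotheses:

```latex
F = \frac{(\mathrm{TSS} - \mathrm{RSS})/p}{\mathrm{RSS}/(n - p - 1)}
```

Under the model assumptions E{RSS/(n − p − 1)} = σ2, and when H0 is true E{(TSS − RSS)/p} = σ2 as well, which is why F is near 1 when there is no relationship.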

What should we conclude if p-value of dummy variable is high?

However, we notice that the p-value for the dummy variable is very high. This indicates that there is no statistical evidence of a difference in average credit card balance between the genders.

What is the test to check if there is a relationship between response and predictors?

Hypothesis testing

How do you interpret a residual plot?

Ideally, the residual plot will show no discernible pattern. The presence of a pattern may indicate a problem with some aspect of the linear model.

How does SE(βˆ1) affect the hypothesis testing

If SE(βˆ1) is small, then even relatively small values of βˆ1 may provide strong evidence that β1 ≠ 0, and hence that there is a relationship between X and Y. In contrast, if SE(βˆ1) is large, then βˆ1 must be large in absolute value in order for us to reject the null hypothesis.

What to do if the residual plot indicates that there are non-linear associations in the data?

If the residual plot indicates that there are non-linear associations in the data, then a simple approach is to use non-linear transformations of the predictors, such as log X, √X, and X², in the regression model.
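
A small Python sketch of this approach (toy data and variable names are mine): build a design matrix containing transformed versions of the predictor and fit it by ordinary least squares.

```python
import numpy as np

# Hypothetical toy data with a curved relationship between x and y.
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=50)
y = 2.0 + 0.5 * x**2 + rng.normal(scale=1.0, size=50)

# Design matrix: intercept plus non-linear transformations of x.
X = np.column_stack([np.ones_like(x), np.log(x), np.sqrt(x), x**2])

# Ordinary least squares fit.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # coefficients for [1, log x, sqrt x, x^2]
```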

Describe the problem if the error terms are correlated

If there is correlation among the error terms, then the estimated standard errors will tend to underestimate the true standard errors. As a result, confidence and prediction intervals will be narrower than they should be. For example, a 95% confidence interval may in reality have a much lower probability than 0.95 of containing the true value of the parameter. In addition, p-values associated with the model will be lower than they should be; this could cause us to erroneously conclude that a parameter is statistically significant. In short, if the error terms are correlated, we may have an unwarranted sense of confidence in our model.

How do you deal with outliers?

If we believe that an outlier has occurred due to an error in data collection or recording, then one solution is to simply remove the observation. However, care should be taken, since an outlier may instead indicate a deficiency with the model, such as a missing predictor.

What is Backward selection?

In backward selection we start with the full model and sequentially delete the predictors that have the least impact on the fit (predictors with large p-values). This procedure continues until a stopping rule is reached. For instance, we may stop when all remaining variables have a p-value below some threshold.

What is forward selection?

In forward selection we start with a model that just has the intercept and then sequentially add to it the predictors that most improve the fit (predictors that minimize the RSS).

What is RSE?

In general, σ2 is not known, but can be estimated from the data. This estimate is known as the residual standard error (RSE). Roughly speaking, it is the average amount that the response will deviate from the true regression line.
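
For simple linear regression the RSE is computed as (equation (3.15) in the book):

```latex
\mathrm{RSE} = \sqrt{\frac{1}{n-2}\,\mathrm{RSS}} = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}
```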

How to check for positively correlated errors in time series data?

In order to determine if this is the case for a given data set, we can plot the residuals from our model as a function of time. If the errors are uncorrelated, then there should be no discernible pattern. On the other hand, if the error terms are positively correlated, then we may see tracking in the residuals.

How do you detect high leverage points?

In order to quantify an observation's leverage, we compute the leverage statistic. A large value of this statistic indicates an observation with high leverage.

What is the curse of dimensionality, in terms of KNN regression?

As an example, suppose there are 100 training observations; when p = 1, this provides enough information to accurately estimate f(X). However, spreading 100 observations over p = 20 dimensions results in a phenomenon in which a given observation has no nearby neighbors: this is the so-called curse of dimensionality.

What is tracking?

In time series data, if the error terms are positively correlated, then we may see tracking in the residuals—that is, adjacent residuals may have similar values.

However, even if an outlier does not have much effect on the least squares fit, it can cause other problems. Explain.

Including the outlier may result in a much higher value for the RSE. Since RSE is used to calculate confidence intervals and p-values, such a dramatic increase caused by a single data point can have implications for the interpretation of the fit. Similarly, inclusion of the outlier causes the R2 to decline from 0.892 to 0.805.

How large does the F-statistic need to be before we can reject H0 and conclude that there is a relationship?

It turns out that the answer depends on the values of n and p. When n is large, an F-statistic that is just a little larger than 1 might still provide evidence against H0. In contrast, a larger F-statistic is needed to reject H0 if n is small.

What is the residual? In terms of SLR?

Let yˆi = βˆ0 + βˆ1xi be the prediction for Y based on the ith value of X. Then ei = yi − yˆi represents the ith residual; this is the difference between the ith observed response value and the ith response value that is predicted by our linear model.

How to detect outliers by using studentized residuals?

Observations whose studentized residuals are greater than 3 in absolute value are possible outliers.

How to identify heteroscedasticity

One can identify non-constant variances in the errors, or heteroscedasticity, from the presence of a funnel shape in the residual plot.

Why do we have outliers?

Outliers can arise for a variety of reasons, such as incorrect recording of an observation during data collection.

Why are prediction intervals wider than confidence intervals?

Prediction intervals are always wider than confidence intervals, because they incorporate both the error in the estimate for f(X) (the reducible error) and the uncertainty as to how much an individual point will differ from the population regression plane (the irreducible error).

What does R^2 measure?

R2 measures the proportion of variability in Y that can be explained using X. An R2 statistic that is close to 1 indicates that a large proportion of the variability in the response has been explained by the regression. A number near 0 indicates that the regression did not explain much of the variability in the response; this might occur because the linear model is wrong, or the inherent error σ2 is high, or both.
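
In formula form, where TSS is the total sum of squares and RSS the residual sum of squares:

```latex
R^2 = \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}},
\qquad
\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2,
\quad
\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```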

What are residual plots. Differentiate between residual plots for linear regression and multiple regression.

Residual plots are a useful graphical tool for identifying non-linearity. Given a simple linear regression model, we can plot the residuals, ei = yi − yˆi, versus the predictor xi. In the case of a multiple regression model, since there are multiple predictors, we instead plot the residuals versus the predicted (or fitted) values yˆi.

How to identify outliers

Residual plots can be used to identify outliers.

What is the standard error of μˆ?

Roughly speaking, the standard error tells us the average amount that this estimate μˆ differs from the actual value of μ.

Define Simple Linear Regression

Simple linear regression lives up to its name: it is a very straightforward approach for predicting a quantitative response Y on the basis of a single predictor variable X. It assumes that there is approximately a linear relationship between X and Y.

How does collinearity affect hypothesis testing

Since collinearity reduces the accuracy of the estimates of the regression coefficients, it causes the standard error for βˆj to grow. Since the t-statistic for each predictor is computed by dividing βˆj by its standard error, collinearity results in a decline in the t-statistic. As a result, in the presence of collinearity, we may fail to reject H0: βj = 0. This means that the power of the hypothesis test, the probability of correctly detecting a non-zero coefficient, is reduced by collinearity.

Why might correlations among the error terms occur?

Such correlations frequently occur in the context of time series data. In many cases, observations that are obtained at adjacent time points will have positively correlated errors.

What statistic do we use for the hypothesis test in SLR?

The t-statistic.

Difference between TSS and RSS

TSS measures the total variance in the response Y , and can be thought of as the amount of variability inherent in the response before the regression is performed. In contrast, RSS measures the amount of variability that is left unexplained after performing the regression.

When is RSE small or large?

The RSE is considered a measure of the lack of fit of the model (3.5) to the data. If the predictions obtained using the model are very close to the true outcome values—that is, if yˆi ≈ yi for i = 1, . . . , n—then (3.15) will be small, and we can conclude that the model fits the data very well. On the other hand, if yˆi is very far from yi for one or more observations, then the RSE may be quite large, indicating that the model doesn't fit the data well.

What is a drawback of RSE? How does R^2 overcome this drawback?

The RSE provides an absolute measure of lack of fit of the model (3.5) to the data. But since it is measured in the units of Y , it is not always clear what constitutes a good RSE. The R2 statistic provides an alternative measure of fit. It takes the form of a proportion—the proportion of variance explained—and so it always takes on a value between 0 and 1, and is independent of the scale of Y .

In what setting will a parametric approach such as least squares linear regression outperform a non-parametric approach such as KNN regression?

The answer is simple: the parametric approach will outperform the non-parametric approach if the parametric form that has been selected is close to the true form of f.

What is a drawback of the approach of using an F-statistic to test for any association between the predictors

The approach of using an F-statistic to test for any association between the predictors and the response works when p is relatively small, and certainly small compared to n. However, sometimes we have a very large number of variables. If p > n then there are more coefficients βj to estimate than observations from which to estimate them. In this case we cannot even fit the multiple linear regression model using least squares, so the F-statistic cannot be used.

What is the hierarchical principle?

The hierarchical principle states that if we include an interaction in a model, we should also include the main effects, even if the p-values associated with their coefficients are not significant.

Give a range for the leverage statistic

The leverage statistic hi is always between 1/n and 1, and the average leverage for all the observations is always equal to (p + 1)/n. So if a given observation has a leverage statistic that greatly exceeds (p+1)/n, then we may suspect that the corresponding point has high leverage.
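
For simple linear regression the leverage statistic has a simple closed form:

```latex
h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{i'=1}^{n} (x_{i'} - \bar{x})^2}
```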

Describe the non-linearity problem

The linear regression model assumes that there is a straight-line relationship between the predictors and the response. If the true relationship is far from linear, then virtually all of the conclusions that we draw from the fit are suspect. In addition, the prediction accuracy of the model can be significantly reduced.

How can collinearity be a problem in the context of regression

The presence of collinearity can pose problems in the regression context, since it can be difficult to separate out the individual effects of collinear variables on the response.

Explain why least squares estimates are unbiased

The property of unbiasedness holds for the least squares coefficient estimates given by (3.4) as well: if we estimate β0 and β1 on the basis of a particular data set, then our estimates won't be exactly equal to β0 and β1. But if we could average the estimates obtained over a huge number of data sets, then the average of these estimates would be spot on!

How do we assess the accuracy of the SLR fit?

The quality of a linear regression fit is typically assessed using two related quantities: the residual standard error (RSE) and the R^2 statistic.

Explain what the values of the variance inflation factor can say about multicollinearity

The smallest possible value for VIF is 1, which indicates the complete absence of collinearity. Typically in practice there is a small amount of collinearity among the predictors. As a rule of thumb, a VIF value that exceeds 5 or 10 indicates a problematic amount of collinearity.

Define Variable selection

The task of determining which predictors are associated with the response, in order to fit a single model involving only those predictors, is referred to as variable selection.

What is the baseline for a qualitative predictor?

There will always be one fewer dummy variable than the number of levels. The level with no dummy variable (African American in the book's example) is known as the baseline.

What is mixed selection?

This is a combination of forward and backward selection. We start with no variables in the model, and as with forward selection, we sequentially add the variable that provides the best fit. If at any point the p-value for one of the variables in the model rises above a certain threshold, then we remove that variable from the model. We continue to perform these forward and backward steps until all variables in the model have a sufficiently low p-value, and all variables outside the model would have a large p-value if added to the model.

It turns out that R2 will always increase when more variables are added to the model, even if those variables are only weakly associated with the response. Why?

This is due to the fact that adding another variable to the least squares equations must allow us to fit the training data (though not necessarily the testing data) more accurately. Thus, the R2 statistic, which is also computed on the training data, must increase.

In higher dimensions, KNN often performs worse than linear regression. True or false?

True

Observation 20 had relatively little effect on the least squares fit because it has low leverage. True or false?

True

Even when the true relationship is highly non-linear, KNN may still provide inferior results to linear regression. True or false?

True

High leverage observations tend to have a sizable impact on the estimated regression line, much more than outliers. True or false?

True

In simple linear regression, the squared correlation and the R^2 statistic are identical. True or false?

True

What are the 2 main assumptions that linear regression makes?

Two of the most important assumptions state that the relationship between the predictors and the response is additive and linear. The additive assumption means that the effect of changes in a predictor Xj on the response Y is independent of the values of the other predictors. The linear assumption states that the change in the response Y due to a one-unit change in Xj is constant, regardless of the value of Xj.

What do we do when the true relationship is non-linear?

Use polynomial regression.
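
A minimal polynomial regression sketch in Python (the toy data are hypothetical; np.polyfit fits a polynomial by least squares):

```python
import numpy as np

# Hypothetical toy data following a quadratic trend.
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=40)
y = 1.0 - 2.0 * x + 0.8 * x**2 + rng.normal(scale=0.5, size=40)

# Fit y = b2*x^2 + b1*x + b0 by least squares.
coeffs = np.polyfit(x, y, deg=2)  # highest-degree coefficient first
print(coeffs)  # roughly [0.8, -2.0, 1.0]
```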

Name some statistics that can be used to judge the quality of a model

Various statistics can be used to judge the quality of a model. These include Mallow's Cp, Akaike information criterion (AIC), Bayesian information criterion (BIC), and adjusted R^2.

What is the difference between outliers and high leverage points

Outliers are observations for which the response yi is unusual given the predictor xi. In contrast, observations with high leverage have an unusual value for xi.

How to deal with multicollinearity

When faced with the problem of collinearity, there are two simple solutions. The first is to drop one of the problematic variables from the regression. This can usually be done without much compromise to the regression fit, since the presence of collinearity implies that the information that this variable provides about the response is redundant in the presence of the other variables. The second solution is to combine the collinear variables together into a single predictor.

Give a solution for heteroscedasticity

When faced with this problem, one possible solution is to transform the response Y using a concave function such as log Y or √Y. Such a transformation results in a greater amount of shrinkage of the larger responses, leading to a reduction in heteroscedasticity.

Multiple regression model

Y = β0 + β1X1 + β2X2 + ··· + βpXp + ε (3.19)

What is an indicator or dummy variable

An indicator or dummy variable is a variable that takes on two possible numerical values.

What is multiple regression?

Multiple regression extends the simple linear regression model (3.5) so that it can directly accommodate multiple predictors. We do this by giving each predictor a separate slope coefficient in a single model.

What is polynomial regression?

Extending the linear model to accommodate non-linear relationships is known as polynomial regression, since we include polynomial functions of the predictors in the regression model.

The p-value for the interaction term, TV×radio, is extremely low. What does this mean?

It indicates that there is strong evidence for Ha: β3 ≠ 0. In other words, it is clear that the true relationship is not additive.

What is a problem with the plotting of residuals method of detecting outliers?

It can be difficult to decide how large a residual needs to be before we consider the point to be an outlier.

What is multicollinearity?

It is possible for collinearity to exist between three or more variables even if no pair of variables has a particularly high correlation. We call this situation multicollinearity.

What is heteroscedasticity

Non-constant variance in the error terms is known as heteroscedasticity.

What is the null model?

The null model is a model that contains an intercept but no predictors.

What are studentized residuals? What are they useful for?

Studentized residuals are computed by dividing each residual ei by its estimated standard error. They are useful for identifying outliers.
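
For concreteness, one standard form of this computation (my addition; the chapter does not spell the formula out, and here hi is the leverage statistic and the RSE estimates σ):

```latex
t_i = \frac{e_i}{\mathrm{RSE}\sqrt{1 - h_i}}
```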

What is time series data?

Time series data consists of observations for which measurements are obtained at discrete points in time.

What is the variance inflation factor? What is it used for?

The variance inflation factor (VIF) is the ratio of the variance of βˆj when fitting the full model divided by the variance of βˆj if fit on its own. It is used to detect multicollinearity.
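
It can be computed using the formula:

```latex
\mathrm{VIF}(\hat{\beta}_j) = \frac{1}{1 - R^2_{X_j \mid X_{-j}}}
```

where R^2_{Xj|X−j} is the R^2 from a regression of Xj onto all of the other predictors.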

What does a small p-value suggest?

If the p-value is small enough, we can infer that there is an association between the predictor and the response. We reject the null hypothesis, that is, we declare a relationship to exist between X and Y.

