9. Correlation and Regression

With the exception of r = +1 or -1, we cannot really speak to the strength of the relationship indicated by the correlation coefficient without what?

A statistical test of significance. - for our purposes, we want to test whether the population correlation between the two variables is equal to zero.

What can outliers result in?

Apparent statistical evidence that a significant relationship exists when, in fact, there is none, or that there is no relationship when, in fact, there is.

How can coefficient of determination be computed for a simple linear regression?

By squaring the correlation coefficient, r. - cannot be used when there is more than one independent variable in the regression.
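A quick numerical check of this relationship (a minimal sketch in Python; the data below are made up for illustration):

```python
import numpy as np

# Illustrative data, not from the study set
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Correlation coefficient r
r = np.corrcoef(x, y)[0, 1]

# Fit the simple linear regression and compute R^2 = 1 - SSE/SST
b1, b0 = np.polyfit(x, y, 1)          # slope, intercept
y_hat = b0 + b1 * x
sse = np.sum((y - y_hat) ** 2)        # unexplained variation
sst = np.sum((y - y.mean()) ** 2)     # total variation
r_squared = 1 - sse / sst

print(round(r ** 2, 6), round(r_squared, 6))  # the two values match
```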

What is the challenge with computing a confidence interval for a predicted value?

Computing Sf (the standard error of the forecast).

What does covariance capture?

The linear relationship between two variables.

Covariance

a statistical measure of the degree to which two variables move together.

Nonlinear relationships

two variables could have a nonlinear relationship yet a zero correlation.

Predicted Values

values of the dependent variable based on the estimated regression coefficients and a prediction about the value of the independent variable.

Range of correlation coefficient

-1 ≤ r ≤ +1

Limitations of regression analysis

1) Linear relationships can change over time. This means that an estimated regression equation based on data from a specific time period may not be relevant for forecasts or predictions in another time period - this is referred to as parameter instability.
2) Even if the regression model accurately reflects the historical relationship between the two variables, its usefulness in investment analysis will be limited if other market participants are also aware of and act on this evidence.
3) If the assumptions underlying regression analysis do not hold, the interpretation and tests of hypotheses may not be valid - for example, if the data are heteroskedastic or exhibit autocorrelation.

Assumptions underlying linear regression

1) A linear relationship exists between the dependent and the independent variable.
2) The independent variable is uncorrelated with the residuals.
3) The expected value of the residual term is zero.
4) The variance of the residual term is constant for all observations.
5) The residual term is independently distributed; that is, the residual for one observation is not correlated with that of another observation.
6) The residual term is normally distributed.

F statistic

= MSR/MSE = (RSS/k) / (SSE/(n - k - 1))
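A minimal arithmetic sketch of this formula (the ANOVA values below are assumed for illustration):

```python
# Assumed ANOVA quantities, for illustration only
rss, sse = 80.0, 20.0   # explained (RSS) and unexplained (SSE) sums of squares
k, n = 1, 32            # one independent variable, 32 observations

msr = rss / k               # mean regression sum of squares
mse = sse / (n - k - 1)     # mean squared error
f_stat = msr / mse
print(round(f_stat, 2))     # 120.0
```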

R^2 formula when speaking in terms of variation

= [total variation (SST) - unexplained variation (SSE)] / total variation (SST), i.e., explained variation / total variation.

When do the F-test and t-test test the same hypothesis?

For a simple linear regression there is only one independent variable, so the F-test tests the same hypothesis as the t-test for statistical significance of the slope coefficient. - in fact, in simple linear regression with one independent variable, F = (t_b1)^2, the square of the t-statistic for the slope coefficient.
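A small check of F = (t_b1)^2 using scipy's linregress on hypothetical data (the dataset is illustrative, not from the source):

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.1, 2.8, 4.2, 4.9, 6.3])

res = linregress(x, y)            # returns slope, intercept, rvalue, pvalue, stderr
t_b1 = res.slope / res.stderr     # t-statistic for H0: b1 = 0

# F from the ANOVA decomposition
y_hat = res.intercept + res.slope * x
sse = np.sum((y - y_hat) ** 2)
rss = np.sum((y_hat - y.mean()) ** 2)
f_stat = (rss / 1) / (sse / (len(x) - 2))

print(round(t_b1 ** 2, 6), round(f_stat, 6))  # equal in simple linear regression
```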

What does the F-test assess?

How well a set of independent variables, as a group, explains the variation in the dependent variable. - in multiple regression, the F-statistic is used to test whether at least one independent variable in a set of independent variables explains a significant portion of the variation of the dependent variable.

When will standard error of the estimate be low (relative to total variability)?

If the relationship is very strong. - will be high if the relationship is weak.

On a scatter plot of the dependent and independent variables, which axis is each on?

The independent variable is on the horizontal (X) axis; the dependent variable is on the vertical (Y) axis.

A property of the least squares method is that the intercept term may be expressed as

Intercept = Mean of Y - b1 mean of X.
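A minimal sketch verifying this property on made-up data (numpy's least-squares fit is used as the reference):

```python
import numpy as np

# Illustrative data
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.5, 3.2, 4.1, 6.3, 7.0])

# Least-squares slope and intercept from numpy
b1, b0 = np.polyfit(x, y, 1)

# Intercept reconstructed from the property: b0 = mean(Y) - b1 * mean(X)
b0_from_means = y.mean() - b1 * x.mean()

print(round(b0, 6), round(b0_from_means, 6))  # identical
```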

Unit of measurement for correlation coefficient?

No unit of measurement, it is a pure measure of the tendency of two variables to move together.

What can the estimated slope coefficient be?

Positive, negative, or zero.

What can the intercept term be?

Positive, negative, or zero.

What is total variation equal to?

RSS (Regression sum of squares) + SSE (sum of squared errors).
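A quick numerical check of this decomposition (hypothetical data):

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([2.0, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2])

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total variation
sse = np.sum((y - y_hat) ** 2)         # unexplained variation
rss = np.sum((y_hat - y.mean()) ** 2)  # explained variation

print(round(sst, 6), round(rss + sse, 6))  # SST = RSS + SSE
```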

Decision rule for F-test

Reject H0 if F > Fc
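A minimal sketch of the decision rule, pulling the critical value Fc from scipy (the F-statistic and sample size below are assumed):

```python
from scipy.stats import f

# Assumed values, for illustration only
f_stat = 12.5     # computed F = MSR/MSE
k, n = 1, 30      # one independent variable, 30 observations
alpha = 0.05

f_crit = f.ppf(1 - alpha, dfn=k, dfd=n - k - 1)
print(round(f_crit, 3), f_stat > f_crit)   # reject H0 if F > Fc
```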

Decision rule for tests of significance for regression coefficients? What does rejection of the null mean?

Reject H0 if t > t_critical or t < -t_critical; rejection means the slope coefficient is statistically different from the hypothesized value of b1. To test whether an independent variable explains the variation in the dependent variable, the hypothesis tested is that the true slope is zero.

Calculating SEE from the ANOVA table

SEE = sqrt(MSE) = sqrt(SSE / (n - 2))
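A tiny sketch of this calculation with assumed ANOVA-table values:

```python
import math

# Assumed ANOVA-table values, for illustration only
sse = 18.0   # sum of squared errors
n = 20       # number of observations

mse = sse / (n - 2)
see = math.sqrt(mse)
print(round(see, 4))  # 1.0
```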

In a regression coefficient confidence interval, what is the standard error of the regression coefficient a function of? Why does this make sense?

The SEE. - as SEE rises, so does the standard error of the regression coefficient, and the confidence interval widens. This makes sense because SEE measures the variability of the data about the regression line, and the more variable the data, the less confidence there is in the regression model's coefficient estimates.
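The interval being described is the standard one, ^b1 ± t_critical × s_^b1; a minimal sketch with assumed regression output:

```python
from scipy.stats import t

# Assumed regression output, for illustration only
b1_hat = 0.76    # estimated slope coefficient
se_b1 = 0.18     # standard error of the slope estimate (a function of SEE)
n = 36           # number of observations
alpha = 0.05

t_crit = t.ppf(1 - alpha / 2, df=n - 2)
lower, upper = b1_hat - t_crit * se_b1, b1_hat + t_crit * se_b1
print(round(lower, 4), round(upper, 4))  # a larger se_b1 widens this interval
```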

What does the intercept equation highlight?

The fact that the regression line passes through a point with coordinates equal to the means of the independent and dependent variables (i.e., the point (Xbar, Ybar)).

For stocks, what can the slope coefficient in a regression be called?

The stock's beta. - measures the relative amount of systematic risk in ABC's returns. - a beta of less than one = less risky than the average stock.

Analysis of Variance

a statistical procedure for analyzing the total variability of the dependent variable.

Null hypothesis in t test

correlation = 0

What does slope =?

the covariance of X and Y divided by the variance of X.

DoF for F test with one independent variable

df numerator = k = 1; df denominator = n - k - 1 = n - 2, where n = number of observations.

Scatter Plot

each point represents the value of two variables.

What else can dependent variable be called?

The explained variable, endogenous variable, or predicted variable; the independent variable can be called the explanatory variable, exogenous variable, or predicting variable.

Limitations to correlation analysis

Impact of outliers, potential for spurious correlation, and nonlinear relationships (the correlation coefficient does not capture strong nonlinear relationships).

Correlation coefficient (r)

is a measure of the strength of the linear relationship (correlation) between two variables.

coefficient of determination (R^2)

is defined as the percentage of the total variation in the dependent variable explained by the independent variable.

When r = +1 or -1, the data points...

lie exactly on a line, but the slope of that line is not necessarily 1 or -1.

Covariance range. So?

may range from negative to positive infinity, and is presented in terms of squared units. So, for these reasons, we take the extra step of calculating the correlation coefficient, which converts the covariance into a standardized measure that is easier to interpret.
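A quick sketch of that standardization, r = Cov(X,Y) / (sX × sY), on illustrative data:

```python
import numpy as np

# Illustrative data
x = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y = np.array([2.0, 4.5, 5.5, 8.0, 9.5])

cov_xy = np.cov(x, y, ddof=1)[0, 1]               # sample covariance
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))      # standardize by the std devs

print(round(r, 6), round(np.corrcoef(x, y)[0, 1], 6))  # same value
```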

Standard Error of the Estimate

measures the degree of variability of the actual Y-values relative to the estimated Y-values from a regression equation. - gauges the "fit" of the regression line. - the smaller the SEE, the better the fit.

Total sum of squares

measures the total variation in the dependent variable.

Sum of squared errors measures

measures the unexplained variation in the dependent variable. Is the sum of squared vertical distances between the actual Y-values and the predicted Y values on the regression line.

Regression sum of squares

measures the variation in the dependent variable that is explained by the independent variable. RSS is the sum of the squared distances between the predicted Y values and the mean of Y.

How meaningful is the actual value of the covariance?

not very, because its measurement is extremely sensitive to the scale of the two variables.

simple linear regression model formula

Yi = b0 + b1(Xi) + εi, where Yi = ith observation of the dependent variable, Xi = ith observation of the independent variable, b0 = regression intercept term, b1 = regression slope coefficient, and εi = residual (error term) for the ith observation.

For a simple regression, the formula for the predicted (or forecast) value of Y

Predicted Y = ^b0 + ^b1(Xp), where Xp = the forecast value of the independent variable.
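A tiny sketch of the forecast calculation (the estimates and Xp below are assumed):

```python
# Assumed regression estimates, for illustration only
b0_hat = -2.3    # estimated intercept
b1_hat = 0.64    # estimated slope
x_p = 10.0       # forecast value of the independent variable

y_pred = b0_hat + b1_hat * x_p
print(round(y_pred, 2))  # 4.1
```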

Spurious correlation

refers to the appearance of a causal linear relationship when, in fact, there is no relation.

appropriate test statistic with n-2 degrees of freedom

tb1 = (^b1 - b1) / s^b1, where s^b1 = the standard error of the estimated slope coefficient.
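A minimal sketch of this test statistic, using an assumed hypothesized slope of zero and assumed regression output:

```python
from scipy.stats import t

# Assumed values, for illustration only
b1_hat = 0.45          # estimated slope coefficient
b1_hypothesized = 0.0  # H0: b1 = 0
se_b1 = 0.15           # standard error of the slope estimate
n = 26                 # number of observations

t_stat = (b1_hat - b1_hypothesized) / se_b1
t_crit = t.ppf(0.975, df=n - 2)     # two-tailed test at 5% significance
print(round(t_stat, 4), round(t_crit, 4), abs(t_stat) > t_crit)
```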

What does rejection of the null hypothesis in F Test mean?

that the slope coefficient is significantly different from zero, which is interpreted to mean that the independent variable makes a significant contribution to explaining the variation in the dependent variable.

How would an intercept term of -2.3% be interpreted? What is the intercept term in this regression called?

That when the excess return of the S&P 500 (the independent variable) is zero, the return on ABC stock is -2.3%. The intercept is called the stock's ex-post alpha. It is a measure of excess risk-adjusted returns. A negative ex-post alpha means that ABC underperformed the S&P 500 on a risk-adjusted basis.

In an ANOVA table, how are the mean regression sum of squares and mean squared error calculated?

the appropriate sum of squares divided by its degrees of freedom.

estimated slope coefficient

the estimated slope coefficient for the regression line describes the change in Y for a one-unit change in X. = covariance of X and Y / variance of X

What is the regression line?

the line for which the estimates of b0 and b1 (the estimated intercept term and estimated slope coefficient) are such that the sum of the squared differences (vertical distances) between the Y values predicted by the regression equation and the actual Y values is minimized.

What is the intercept term b0?

the line's intersection with the Y-axis at X = 0.

What is the Standard Error of the Estimate?

the standard deviation of the error terms in the regression. - as such, the SEE is also referred to as the standard error of the residual, or standard error of the regression.

What is total sum of squares equal to?

the sum of the squared differences between the actual Y values and the mean of Y. **this is not the same as variance. Variance (of the dependent variable) = SST/(n-1)
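A quick check that the sample variance of Y equals SST/(n - 1), on illustrative data:

```python
import numpy as np

# Illustrative data
y = np.array([4.0, 5.5, 7.0, 6.5, 8.0, 9.5])
n = len(y)

sst = np.sum((y - y.mean()) ** 2)
print(round(sst / (n - 1), 6), round(np.var(y, ddof=1), 6))  # equal
```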

Sum of squared errors

the sum of the squared vertical distances between the estimated and actual Y values. - thus, the regression line minimizes this SSE.

What does the estimated intercept represent?

the value of the dependent variable at the point of intersection of the regression line and the axis of the dependent variable when the independent variable takes on a value of zero.

What is the purpose of simple linear regression?

to explain the variation in a dependent variable in terms of the variation in a single independent variable. - variation being the degree to which a variable differs from its mean value.

