QM - Module 4 - Simple & Multiple Regression


40 Explain the terms dependent variable, independent variable, error term, parameter of the population, parameter of the sample.

- Dependent variable: the endogenous variable, which is determined by the other, independent variables; it is y in the regression model and is the variable to be forecast.
- Independent variable: an exogenous variable, x in the simple linear regression; a variable believed to be related to the dependent variable. (Example: the relation between sales (y) and sales area (x).)
- Error term: the deviation between the actual data point and the calculated regression line (yi - yhat, the residual). It covers influences that are not represented by the regression model and therefore forms the random component. In contrast to deterministic models, which have no error variable, probabilistic models add an error term to measure the error of the deterministic component; it accounts for all the variables, measurable and immeasurable, that are not part of the model.
- Parameters of the population: β0 and β1 are parameters of the population, as they correspond to the true parameters and are normally unknown (possibly also yi and ε).
- Parameters of the sample: b0 and b1 are parameters of the sample. In the regression model they are estimators, based on the straight line fitted through the sample data; b0 and b1 are unbiased estimators of β0 and β1.
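A minimal numpy sketch (hypothetical data) illustrating this vocabulary: the population parameters β0 and β1 are fixed but unknown in practice, while b0 and b1 are the estimates obtained from one particular sample.

```python
import numpy as np

rng = np.random.default_rng(0)

beta0, beta1 = 2.0, 0.5                      # population parameters (normally unknown)
x = rng.uniform(10, 100, size=50)            # independent variable (e.g. sales area)
eps = rng.normal(0, 3, size=50)              # error term: influences not captured by the model
y = beta0 + beta1 * x + eps                  # dependent variable (e.g. sales)

# Sample parameters: least-squares estimates based on this one sample
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)                # observable counterparts of the error terms
print(b0, b1)                                # close to, but not equal to, beta0 and beta1
```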

51 How do you test the correlation coefficients for significance? What are the preconditions for this test?

Because the coefficient of correlation ρ is a population parameter, we must estimate its value from the sample data; the resulting sample coefficient of correlation is denoted by r. When there is no linear relationship between the two variables, ρ equals 0. To determine whether a linear relationship exists, we can conduct the hypothesis test H0: ρ = 0 against H1: ρ ≠ 0. Note that the t-test of ρ and the t-test of βi produce identical results; however, the concept behind the two tests is different. If we are interested in discovering the relationship between two variables, or if we have conducted an experiment where we controlled the values of the independent variable, the t-test of βi should be applied. If we are interested in determining whether two random variables that are bivariate normally distributed are linearly related, the t-test of ρ should be applied. Precondition: the variables are interval-scaled and bivariate normally distributed. If this condition is not satisfied, we can apply the Spearman rank correlation coefficient (instead of the Pearson coefficient of correlation).
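A sketch of the test on hypothetical data, assuming numpy and scipy are available: the test statistic is t = r·sqrt((n - 2)/(1 - r2)), t-distributed with n - 2 degrees of freedom under H0: ρ = 0, and scipy's built-in Pearson test gives the same two-sided p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=40)
y = 0.4 * x + rng.normal(size=40)            # hypothetical bivariate normal sample

n = len(x)
r = np.corrcoef(x, y)[0, 1]

t = r * np.sqrt((n - 2) / (1 - r**2))        # test statistic under H0: rho = 0
p_value = 2 * stats.t.sf(abs(t), df=n - 2)   # two-tailed p-value

r_scipy, p_scipy = stats.pearsonr(x, y)      # same two-sided test via scipy
print(t, p_value, p_scipy)
```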

50 What is the explanation for the non-linear characteristics of the bounds of prediction and confidence interval?

Both intervals are bounded by curved lines. This shape occurs because the estimation error becomes larger the farther the given value of x lies from xmean: both interval formulas contain the squared difference between the given value xg and xmean in the numerator, i.e. the term (xg - xmean)2 / ((n - 1)·sx2) under the square root. The larger this difference, the wider the interval becomes. (The two graphs in the original show this term and the interval bounds curving away from the regression line on both sides of xmean.)
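A sketch (simple linear regression, hypothetical data) showing the effect numerically: the interval half-widths are smallest near xmean and grow toward the edges, and the prediction interval is always wider than the confidence interval.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 30)
y = 1.0 + 2.0 * x + rng.normal(0, 1.5, 30)

n = len(x)
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
sse = np.sum((y - (b0 + b1 * x)) ** 2)
s_eps = np.sqrt(sse / (n - 2))               # standard error of estimate
t_crit = stats.t.ppf(0.975, df=n - 2)

x_g = np.linspace(x.min(), x.max(), 5)       # given values of x
core = 1 / n + (x_g - x.mean()) ** 2 / ((n - 1) * np.var(x, ddof=1))
ci_half = t_crit * s_eps * np.sqrt(core)      # confidence interval (expected value of y)
pi_half = t_crit * s_eps * np.sqrt(1 + core)  # prediction interval (single value of y)
print(ci_half)                                # narrowest near xmean, wider toward the edges
print(pi_half)                                # always wider than the confidence interval
```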

57 If the results of the significance tests of the coefficients differ from the results of the Fisher test (F-test), which test will have a higher priority?

By replacing the F-test with a series of t-tests, the probability of a type I error arises every time a t-test is carried out. The type I error (α error) here is the risk of accepting the model although it is not valid: a regression model could be judged valid because at least one coefficient appears significantly different from 0, even though that coefficient is in fact not significantly different from 0. Because the F-test combines the tests of all coefficients into a single test, the probability of making such an error is substantially reduced. The F-test also has higher priority with regard to multicollinearity: in that case the t-statistics may take small values, misleadingly indicating no linear relation to the dependent variable, whereas the F-test is not affected by multicollinearity.

58 What is multicollinearity? What is a sign for multicollinearity? How can you reduce multicollinearity?

Commonly, the independent variables in a multiple regression model are related to each other. When this relationship is too strong, problems arise when fitting and testing the model. When two independent variables are highly correlated, they both convey essentially the same information: neither may contribute significantly to the model once the other has been included, but together they contribute a lot. The adjusted R2 will be high and the p-value of the F-test low, which implies that the good fit is unlikely to be a coincidence. If the goal is to understand how the various x variables impact y, then multicollinearity is a big problem: it increases the standard errors of the estimates, which reduces the t-statistics and thus reduces the degree of confidence that can be placed in the estimates. The confidence intervals may even include zero, which means that an increase in the x value can be associated with either an increase or a decrease in y. Large correlations usually point to the source of the trouble. To reduce multicollinearity, try to include independent variables that are independent of each other, or use a stepwise regression (forward or backward). Signs of multicollinearity (see the sketch below):
- The t-statistics take small values, misleadingly indicating no linear relation to the dependent variable.
- Multicollinearity influences the t-statistics but not the F-test, so a high F-value in combination with low t-values suggests multicollinearity.
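A sketch of two common multicollinearity checks on hypothetical data with two deliberately near-identical regressors: the pairwise correlation matrix, and the variance inflation factor VIF_j = 1/(1 - Rj2), where Rj2 comes from regressing xj on the other regressors (the vif helper below is illustrative, not a library function).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)     # carries almost the same information as x1
x3 = rng.normal(size=n)

X = np.column_stack([x1, x2, x3])
print(np.corrcoef(X, rowvar=False))          # large off-diagonal entry for the x1/x2 pair

def vif(X, j):
    """VIF of column j: regress x_j on the other columns plus a constant."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1 - resid.var() / X[:, j].var()
    return 1 / (1 - r2)

print([round(vif(X, j), 1) for j in range(X.shape[1])])  # very large VIFs flag trouble
```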

52 What does heteroskedasticity mean? How do you check it?

Heteroskedasticity means that the standard deviation of the error variable (σε) is not constant for different values of x, i.e. the distribution of ε conditional on x is not identical. The variance of the residual error should be constant for all values of the independent variables; if this assumption is violated, the errors are said to be heteroskedastic. There are several methods of testing for the presence of heteroskedasticity. Most commonly used is the time-honoured inspection of the residuals: check for patterns in a plot of the residuals against the predicted values of y. A homoskedastic model will display a uniform cloud of dots, whereas heteroskedasticity will result in patterns such as a funnel shape, indicating greater error as the dependent variable increases. (More formal tests are the Breusch-Pagan test and the Goldfeld-Quandt test. There are two ways of dealing with heteroskedasticity: either entirely respecify the model or use the weighted least-squares regression option, a regression technique in which observations with smaller residuals are weighted more when calculating the regression coefficients and confidence intervals. This last part was not treated in the lecture or the reading.)
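A sketch of the checks described above, using statsmodels on hypothetical data that are heteroskedastic by construction: inspect the residuals against the fitted values, and run the Breusch-Pagan test (statsmodels.stats.diagnostic.het_breuschpagan).

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(4)
x = rng.uniform(1, 10, 200)
y = 3 + 2 * x + rng.normal(scale=0.5 * x)    # error spread grows with x (funnel shape)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

# Visual check: plot fit.resid against fit.fittedvalues and look for a funnel shape.
# Formal check: Breusch-Pagan test with H0 "the errors are homoskedastic".
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, X)
print(lm_pvalue)                             # small p-value -> evidence of heteroskedasticity
```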

54 Explain the test statistic as well as the test distribution for the Fisher test (F-test) within the linear regression analysis.

In the multiple linear regression (as opposed to the simple linear regression) more than one slope coefficient has to be tested. The t-test examines the significance of the coefficients βi only one by one, so the same procedure has to be repeated for every coefficient, whereas the F-test takes all of them into account simultaneously. (Remember: in Excel you have to look up the t-value for each coefficient, but to draw a conclusion from the F-test you only have to look at one value.) See the sketch below.
- Test statistic: the ratio of two scaled sums of squares, F = MSR/MSE.
- Test distribution: the F-distribution, i.e. the ratio of two chi-squared variates (each divided by its degrees of freedom), with k and n - k - 1 degrees of freedom.
- The alpha error (the probability of wrongly identifying the model as valid) is lower for the F-test than for a series of t-tests.
- Multicollinearity does not impair the F-test.
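A short scipy sketch of where the decision value comes from: under H0 the ratio MSR/MSE follows an F-distribution with k and n - k - 1 degrees of freedom (the numbers below are illustrative, not from the lecture).

```python
from scipy import stats

k, n = 3, 50                      # number of slope coefficients, sample size
F_observed = 4.2                  # hypothetical MSR/MSE taken from a regression output

critical = stats.f.ppf(0.95, dfn=k, dfd=n - k - 1)   # rejection threshold at alpha = 5%
p_value = stats.f.sf(F_observed, dfn=k, dfd=n - k - 1)
print(critical, p_value)          # reject H0 if F_observed > critical (i.e. p_value < alpha)
```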

45 Which parameter is used to quantify the explanatory power? What interpretation does this parameter allow?

R2, the coefficient of determination, measures the strength of the linear relationship. It indicates which proportion of the sample variation in y is explained by the regression model. The coefficient of determination corresponds to the square of the coefficient of correlation: R2 = r2. R2 = sum of squares for regression / sum of squares total = SSR/SST = explained variation / total variation in y. R2 can take values between 0 and 1: R2 = 0 means the model has no explanatory power, whereas R2 = 1 means that all εi are 0. (R2 can only become negative if there is no constant in the model.) R2 increases whenever more regressors are added, which is a drawback because it rises even if the additional variables have no explanatory power; this is why the adjusted R2 is used. Example: in hedge-fund replication, researchers tend to include factors that are not actually relevant although they raise R2. See the formulas and the sketch below.
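A minimal sketch of R2 and the adjusted R2 computed from the sums of squares on hypothetical data (k is the number of independent variables).

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 60, 1
x = rng.uniform(0, 10, n)
y = 4 + 1.5 * x + rng.normal(0, 2, n)

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sse = np.sum((y - y_hat) ** 2)               # unexplained variation
sst = np.sum((y - y.mean()) ** 2)            # total variation
r2 = 1 - sse / sst                           # = SSR / SST
adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
print(r2, adj_r2)                            # the adjusted R2 penalises extra regressors
```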

47 Description of a return model using linear regression. Explain the difference between the systematic risk and unsystematic risk. Which of them can be diversified? Which of them is compensated on the market?

Regression model of returns (market model): μ = β0 + β1·μm + ε. The market model implies that a stock return μ is linearly dependent on the equity market (represented by the return on the market portfolio, μm). Systematic risk is based on the market and measures the volatility of the asset price that is affected, but also compensated, by the market. The coefficient β1 is therefore called the stock's beta coefficient; it measures how sensitive the stock's rate of return is to changes in the level of the overall market. Unsystematic risks are the so-called firm-specific risks; they result only from the activities and events of one corporation and can therefore be diversified away. Thus they can be regarded as non-compensated risk. If an investor expects the market to rise, it makes sense to hold a portfolio with β > 1. The share of systematic risk is measured by the coefficient of determination R2, and the share of unsystematic risk consequently by 1 - R2. See the sketch below.
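A sketch of estimating the market-model beta as the simple regression slope of stock returns on market returns; the return series here are simulated, in practice they would be observed stock and market returns over the same periods.

```python
import numpy as np

rng = np.random.default_rng(6)
r_market = rng.normal(0.005, 0.04, 120)                # monthly market returns
eps = rng.normal(0, 0.03, 120)                         # firm-specific (unsystematic) part
r_stock = 0.001 + 1.2 * r_market + eps                 # simulated with a "true" beta of 1.2

beta = np.cov(r_stock, r_market, ddof=1)[0, 1] / np.var(r_market, ddof=1)
alpha = r_stock.mean() - beta * r_market.mean()
r2 = np.corrcoef(r_stock, r_market)[0, 1] ** 2

print(beta)        # sensitivity to the market (systematic risk)
print(r2)          # share of variance explained by the market; 1 - r2 is firm-specific
```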

53 Meaning of different "Sum of squares" (SST, SSTR, SSE) and their averages (MSTR, MSE).

SST, SSTR and SSE are sums of squares, whereas MSTR and MSE are mean squares and hence the averages of the sums of squares. In detail:
- SST: sum of squares total (total variation of y)
- SSTR: sum of squares for regression (the variation explained by the regression model); also denoted SSR
- SSE: sum of squares for error (unexplained variation)
- MSTR: mean square for regression, MSTR = SSTR/k; also denoted MSR
- MSE: mean square for error, MSE = SSE/(n - k - 1)
This implies SST = SSTR + SSE. The basis for the mean squares are the degrees of freedom: for the regression, df = k, the number of estimated slope coefficients; for the residual, df = n - k - 1. Finally, the F value is calculated by dividing MSTR by MSE. See the formulas, the graph and the sketch below.
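A sketch of the ANOVA decomposition for a regression with k regressors on hypothetical data, verifying SST = SSTR + SSE and building F = MSTR/MSE.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, k = 80, 2
X = rng.normal(size=(n, k))
y = 1 + X @ np.array([0.8, -0.5]) + rng.normal(0, 1, n)

A = np.column_stack([np.ones(n), X])         # design matrix with intercept
b, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ b

sst = np.sum((y - y.mean()) ** 2)            # total variation of y
sstr = np.sum((y_hat - y.mean()) ** 2)       # explained by the regression (SSR)
sse = np.sum((y - y_hat) ** 2)               # unexplained variation

mstr = sstr / k                              # mean square for regression
mse = sse / (n - k - 1)                      # mean square for error
F = mstr / mse
print(np.isclose(sst, sstr + sse))           # True: SST = SSTR + SSE
print(F, stats.f.sf(F, k, n - k - 1))        # F value and its p-value
```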

42 What are the "Short-Cut" formulas good for?

Shortcut formulas are useful for computing the covariance cov(x, y) and the sample variance of the independent variable x (sx2). Combining them provides a shortcut for calculating the slope coefficient or the SSE by hand: instead of computing the deviations of x and y from their means, we only need the sum of x, the sum of y, the sum of x·y and the sum of the squared x. See the formulas and the sketch below.
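A sketch of the shortcut computation on hypothetical data: the slope b1 is obtained from the raw sums (sum of x, sum of y, sum of x·y, sum of x squared) without first computing deviations from the means.

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.uniform(0, 5, 25)
y = 2 + 3 * x + rng.normal(0, 1, 25)
n = len(x)

sum_x, sum_y = x.sum(), y.sum()
sum_xy, sum_x2 = (x * y).sum(), (x ** 2).sum()

cov_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)      # shortcut covariance
s_x2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)           # shortcut sample variance of x
b1 = cov_xy / s_x2
b0 = sum_y / n - b1 * sum_x / n

# Same result as the definition via deviations from the means:
print(np.isclose(b1, np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)))
```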

59 What is the Durbin-Watson test used for? Explain the conceptual background of the Durbin-Watson test.

The Durbin-Watson test is used to determine whether there is evidence of first-order autocorrelation, i.e. a relationship between ei and ei-1. The resulting value d can take on values between 0 and 4, whereby d < 2 indicates positive first-order autocorrelation (consecutive residuals are similar) and d > 2 indicates negative first-order autocorrelation (consecutive residuals differ widely); in the latter case (ei - ei-1)2 becomes large, so that d > 2 results. The test can be conducted for negative or positive first-order autocorrelation (one-tail) or simply for first-order autocorrelation (combining the two one-tail tests). Example, a test for positive first-order autocorrelation: the null hypothesis is "there is no first-order autocorrelation" and the alternative hypothesis is "there is positive first-order autocorrelation". Depending on n, k and α, the values dL and dU can be read off tables. For d < dL we conclude positive first-order autocorrelation, for d > dU there is no evidence of first-order autocorrelation, and for dL < d < dU the test is inconclusive. Why is d bounded by 4? d is approximately 2(1 - r), where r is the first-order autocorrelation of the residuals; since -1 ≤ r ≤ 1, d lies between 0 and 4. See the sketch below.
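A sketch of the Durbin-Watson statistic computed directly from its definition, d = Σ(ei - ei-1)2 / Σei2, on hypothetical residuals with positive first-order autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(9)
e = np.zeros(100)
for t in range(1, 100):                       # AR(1) residuals: e_t depends on e_{t-1}
    e[t] = 0.7 * e[t - 1] + rng.normal()

d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)  # Durbin-Watson statistic
print(d)                                      # well below 2 -> positive autocorrelation
```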

43 What preconditions do the error terms need to fulfill, in order to have a sufficiently good regression model? How can you verify them?

The Gauss-Markov assumptions are as follows:
1. E(ε) = 0: the expected value of the error term is zero.
2. {ε1, ..., εN} and {x1, ..., xN} are independent: the x and ε are independent.
3. Var(εi) = σε2: all error terms have the same variance. This is called homoskedasticity.
4. Cov(εi, εj) = 0 for i ≠ j: the value of ε associated with any particular value of y is independent of the ε associated with any other value of y. In other words, there is zero correlation between different error terms.
5. (The probability distribution of ε is normal.)
Conditions 1 and 2 ensure that the OLS estimator is unbiased (even if heteroskedasticity or autocorrelation are present); to verify this, we have to calculate the expected value of b. Taking conditions 3 and 4 into account as well, we obtain the Best Linear Unbiased Estimator (BLUE); the second thing we have to calculate to verify this is the variance of b. Non-normality can be identified by drawing a histogram of the residuals: if it is bell-shaped, we can assume that the error is normally distributed. Heteroskedasticity can be identified by plotting the residuals against the predicted values of y. Non-independence of the error variable can be identified by plotting the residuals against time or by conducting a Durbin-Watson test. See the sketch below.
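A compact sketch of the remaining checks on the residuals of a fitted simple regression (heteroskedasticity and autocorrelation checks are sketched in the answers above; here: mean zero and normality, with the Shapiro-Wilk test as one possible formal complement to the histogram).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
x = rng.uniform(0, 10, 100)
y = 1 + 2 * x + rng.normal(0, 1, 100)

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

print(resid.mean())                           # numerically zero once an intercept is fitted
hist, edges = np.histogram(resid, bins=10)    # a bell shape suggests normal errors
print(hist)
print(stats.shapiro(resid).pvalue)            # formal normality test (H0: errors are normal)
```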

55 What is the link between the sum of squared errors, the standard deviation of the error terms, the coefficient of determination and the test statistic of the Fisher test?

The sum of squared errors (SSE) corresponds directly to the standard deviation of the error term and affects the coefficient of determination R2 as well as the F value. A large SSE implies a large sε and therefore a small R2 and a small F value; in such a case the explanatory power of the regression model is poor. The smaller the SSE, the better the explanatory power of the model. With an SSE and sε of 0, the regression model describes the relation between the dependent and the independent variables perfectly.

44 How do you test the coefficients for significance? What is the starting null hypothesis? Justify the choice of the null hypothesis.

The coefficients are significant only if y is linearly related to x, i.e. the regression line must not be horizontal with slope β1 = 0. Although the true parameters are unknown, we can draw inferences about the population slope β1 from the sample slope b1; a hypothesis test of β1 helps us assess the linear model. The null hypothesis specifies that there is no linear relationship, which means the slope is 0: H0: β1 = 0. If the null hypothesis can be rejected, a linear relationship can be assumed, since β1 is then significantly different from 0. The opposite of the desired result therefore has to be formulated as the null hypothesis; the alternative hypothesis states that β1 is unequal to 0. With H0: β1 ≠ 0 we would gain no useful result even if we could reject that null hypothesis. The test can be one- or two-tailed and is a t-test in the case of a simple linear regression, and either a series of individual t-tests or an F-test (with H0: β1 = ... = βk = 0) in the case of a multiple linear regression. The test of β0 is usually ignored, as the interpretation of the y-intercept β0 can be misleading. See the sketch below.
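A sketch of the t-test of the slope with H0: β1 = 0 on hypothetical data, using the standard error of b1 for the simple regression, se(b1) = sε / sqrt((n - 1)·sx2).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = rng.uniform(0, 10, 40)
y = 5 + 0.8 * x + rng.normal(0, 2, 40)
n = len(x)

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
s_eps = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))   # standard error of estimate

se_b1 = s_eps / np.sqrt((n - 1) * np.var(x, ddof=1))
t = (b1 - 0) / se_b1                          # distance of b1 from the H0 value in SE units
p_value = 2 * stats.t.sf(abs(t), df=n - 2)    # two-tailed test
print(t, p_value)                             # small p-value -> reject H0, linear relation
```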

41 How are the coefficients of a linear regression model determined? What can you say about the sum of the squared errors?

The coefficients of a linear regression model are b0, b1, ..., bk, where b0 is the y-intercept and b1, ..., bk are slope parameters; every bi (for i ≥ 1) relates to a certain independent variable xi. The coefficients are determined by minimizing the sum of the squared deviations between the actual data points and the regression line, Σ(yi - yi,hat)2. This sum of squared deviations is the so-called sum of squared errors (SSE). In the simple linear regression, b1 is calculated by dividing the covariance between x and y by the (sample) variance of x; b0 can then be calculated from b1, the means of the xi and the mean of y. See the formulas. SSE is an important statistic because it is the basis for other statistics that assess how well the linear model fits the data. From the SSE it is possible to compute the standard error of estimate (sε), which should be as small as possible and helps to compare different models; nevertheless it cannot be used as an absolute measure since it has no upper limit. Furthermore, it can be used to calculate the coefficient of determination (R2).

46 What is the link between the standard deviation of the error terms and the explanatory power?

The higher the standard deviation of the error terms (σε), the lower the explanatory power, because a high σε implies that some of the errors are large. As σε is an unknown population parameter, we have to estimate it from the data. The estimator is called the standard error of estimate (sε) and, in the simple linear regression, is calculated as the square root of SSE (sum of squared errors) divided by (n - 2). See the formulas.

49 Explain the meaning of, as well as the difference between, the prediction interval and the confidence interval. What is the link between the two intervals?

The prediction interval is used to predict one single, particular value of the dependent variable for a given x; it describes the range within which the predicted value will lie with a probability given by the t-value for the corresponding α. The confidence interval estimator is used to estimate the expected value of y for a given x. Since estimating the expected value involves less error, the confidence interval is narrower than the prediction interval for the same given value of x. Compared with the confidence interval formula, the prediction interval formula contains one additional figure (the 1) under the square root, which accounts for the variation of an individual observation around its expected value; this is why the prediction interval is wider. The link between the two intervals: for the same given x, the confidence interval is always contained within the prediction interval. See the formulas.

48 What information do the values of the regression output contain?

The regression output contains the following information:
- Multiple R: the correlation between the observed values and the values predicted by the model. It ranges between 0 and 1; the higher the multiple R, the better the regression model.
- R2 (coefficient of determination): indicates the percentage of the total variation observed in the dependent variable that can be explained by the linear model, compared to just using the mean. We decompose the variability of the response into the portion that we can explain with our linear prediction and the error. The larger the value of R2, the more accurate the regression; an R2 close to 1 means a very good fit, i.e. the independent variables explain most of the variance in the dependent variable.
- Adjusted R2: a modification of R2 that adjusts for the number of explanatory terms in the model. In contrast to R2, the adjusted R2 only increases if a new term improves the model more than would be expected by chance. It can be negative and will always be less than or equal to R2. The adjusted R2 is only more useful if R2 is calculated from a sample, not from the entire population.
- Standard error of estimate: indicates how closely the actual observations coincide with the predicted values on the regression line. If we accept the hypothesis that the error terms are normally distributed, then about 68.3% of the observations should fall within plus/minus 1 standard error unit of the regression line and 95.4% within plus/minus 2 standard error units.
- ANOVA (analysis of variance): used to test the significance of the overall regression. The two hypotheses confronted are the null hypothesis H0, which states that there is no linear relationship between X and Y, versus the alternative hypothesis H1, which states that there is a linear relationship between X and Y. The relevant figure is the so-called F-ratio, defined as the ratio of the explained to the unexplained variance. We can reject the null hypothesis if the computed/observed F is greater than or equal to a critical value, which can be obtained from a table of the F-distribution. (An ANOVA table always has the same structure.) Alternatively, we can look at the p-value: if it is small enough (e.g. less than 5%), it is unlikely that the estimated slope differs from zero purely by chance.
See the sketch below.
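A sketch of producing such an output with statsmodels on hypothetical data: the summary table contains the R-squared, the adjusted R-squared, the standard error, the ANOVA F-statistic with its p-value, and the coefficient t-tests with confidence intervals.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
x = rng.uniform(0, 10, 50)
y = 2 + 1.5 * x + rng.normal(0, 2, 50)

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(fit.summary())                          # full regression output table
print(fit.rsquared, fit.rsquared_adj)         # R2 and adjusted R2
print(fit.fvalue, fit.f_pvalue)               # ANOVA F-ratio and its p-value
```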

56 What results from the significance tests of the coefficients? What results from the Fisher test? What is the exact argumentation?

We test the regression coefficients to check whether the intercept coefficient is significantly different from zero and whether the slope coefficients are significantly different from zero too. Bear in mind that each regression coefficient is estimated and is therefore affected by some uncertainty, which is measured by the standard error of each coefficient. With this information, three different approaches exist to test whether a coefficient is significantly different from zero:
- Divide the estimated coefficient by its standard error to obtain the t-ratio, a.k.a. the t-statistic. It tells us how many standard error units the coefficient is away from zero. As a rule of thumb, a t-statistic with an absolute value larger than 1.96 means that the corresponding coefficient is statistically different from zero: it is more than two standard deviations away from its mean.
- Look at the p-value. If it is small enough, we accept the idea that the corresponding coefficient is statistically different from zero.
- Build a 95% confidence interval by adding plus/minus 1.96 times the standard error to the estimated coefficient. If zero is not included in this interval, we can exclude the possibility that the corresponding coefficient is equal to zero. (In Excel the 95% confidence interval is pre-calculated.)
Remember: all these approaches are founded on the hypothesis of normally distributed errors with constant variance. The points above are background; more directly to the question:
- The t-test examines the coefficients one by one, while the F-test takes all of them into account simultaneously, testing whether the slope coefficients are jointly significantly different from 0.
- If the t-test statistic is bigger than the table value, we reject H0 that the corresponding beta is 0; the same argument applies to the F-statistic.

