CH5: SIMPLE REGRESSION
How do you calculate intervals for the predicted y values from the regression model?
Find the standard error in the regression output table, then use the formula: prediction +/- z x standard error.
How do you calculate confidence intervals for the prediction?
prediction +/- z x Standard Error (general)
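A minimal Python sketch of the general formula above, assuming the normal approximation. The function name `prediction_interval` and the numbers (a prediction of 120 with a standard error of 10) are made up for illustration and do not come from the course.

```python
from statistics import NormalDist

def prediction_interval(prediction, standard_error, confidence=0.95):
    """Return (low, high) for prediction +/- z * standard_error."""
    # z is the two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * standard_error
    return prediction - margin, prediction + margin

# Hypothetical numbers: predicted sales of 120 with a standard error of 10
low, high = prediction_interval(120.0, 10.0, confidence=0.95)
print(round(low, 1), round(high, 1))
```

Changing `confidence` to 0.90 gives a z of roughly 1.645, so the interval gets narrower.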
How are regression line slopes of x on y and y on x related?
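If |r| = 1, then b = 1/d and the two regression lines are identical. If r = 0, the lines are y = a and x = c, so they are at right angles. (Here b is the slope of y on x, and the x on y line x = c + dy has slope 1/d when drawn on the same axes.)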
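A small Python check of how the two slopes are tied together, assuming the lines are written y = a + bx and x = c + dy. In general b x d = r^2, which gives b = 1/d when |r| = 1 and b = d = 0 when r = 0. The data and the helper name `cross_dev` are made up for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def cross_dev(us, vs):
    """Sum of products of deviations from the means."""
    mu, mv = mean(us), mean(vs)
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs))

# Made-up sample data
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

b = cross_dev(xs, ys) / cross_dev(xs, xs)  # slope of y on x: y = a + b*x
d = cross_dev(xs, ys) / cross_dev(ys, ys)  # coefficient of x on y: x = c + d*y
r = cross_dev(xs, ys) / (cross_dev(xs, xs) * cross_dev(ys, ys)) ** 0.5

# The product of the two coefficients equals r squared
print(b * d, r ** 2)
```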
When is the linear relationship between the variables strongest?
When |r| is closest to 1: the bigger |r| is, the stronger the linear relationship.
How is adjusted R^2 different from R^2?
Never larger than R^2, and more reliable because it: 1. accounts for the number of data points in the regression sample 2. accounts for the number of independent variables in the regression equation
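A short Python sketch of the usual adjusted R^2 formula, 1 - (1 - R^2)(n - 1)/(n - k - 1), with n data points and k independent variables. The function name and the example values are illustrative assumptions.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for n data points and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need more data points than k + 1")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Same R^2 and sample size, but more independent variables lowers the result
print(adjusted_r_squared(0.70, n=30, k=1))
print(adjusted_r_squared(0.70, n=30, k=5))
```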
What is linear regression?
An approach to modelling the linear relationship between two variables.
What is one crucial assumption you make in simple regression analysis?
Assume linear relationship between dependent and independent variables.
Speak about correlation and confidence intervals in the language of business.
We are 95% confident that a one-unit increase in advertising will increase the sales volume by between a and b units.
Why do we set null hypothesis to = 0 all the time?
Because we want the probability that we would get these coefficients if there actually were no correlation. If that probability is low enough (equivalently, the t statistic is large enough in absolute value), such a result is unlikely under the null hypothesis, so the results are significant.
How are residual errors used in linear regression?
The regression line is chosen to minimise the sum of the squared residual errors (least squares).
EXAM: explain why you would or wouldn't use the generated simple regression for predicting the winning percentage for a baseball team.
Choose which. Main reason is that BLAH has a much higher R-square and a lower standard error.
What is r^2?
Coefficient of determination = measures the proportion of the total variation in the dependent variable Y which is accounted for by the independent variable in the estimated regression equation. If r = 0.84, r^2 = 0.71 (approx.); therefore about 71% of the variation in Y is explained by X
What is symmetric and what is not?
Correlation and covariance are symmetric; regression is not (y on x differs from x on y).
EXAM: describe how you should draw observations of daily revenues in two divisions for the following two scenarios: (1) r = 0; (2) r = -0.9
(1) If r = 0, X and Y are not correlated. When using Monte Carlo to estimate, we only need to generate random numbers for X and Y independently. (2) If r = -0.9, they are strongly negatively correlated, so we should generate random values for Y based on those generated for X. The sum of X and Y is used to estimate total annual revenue.
What are the 2 Excel methods to find the line of best fit:
1. Add trendline on scatter plot and choose linear option 2. Regression function in 'Data Analysis'
What are 3 methods to balance out deviations between points and the curve of best fit?
1. average deviation - minimize it 2. maximum deviation - minimize it 3. least sum of squares (SSE)
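A Python sketch comparing the three criteria for one candidate line y = a + bx. It assumes "average deviation" means the mean absolute deviation; the function names and data are made up for illustration.

```python
def deviations(xs, ys, a, b):
    """Vertical deviations between each point and the line y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def average_deviation(xs, ys, a, b):
    devs = deviations(xs, ys, a, b)
    return sum(abs(e) for e in devs) / len(devs)

def maximum_deviation(xs, ys, a, b):
    return max(abs(e) for e in deviations(xs, ys, a, b))

def sum_of_squared_errors(xs, ys, a, b):
    return sum(e ** 2 for e in deviations(xs, ys, a, b))

# Made-up points and a candidate line y = 0 + 2x
xs = [1, 2, 3]
ys = [2, 4, 7]
print(average_deviation(xs, ys, 0, 2))
print(maximum_deviation(xs, ys, 0, 2))
print(sum_of_squared_errors(xs, ys, 0, 2))
```

Each criterion can pick a different "best" line; least squares is the one used by the regression output.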
EXAM: compare the results of the simple regression and the multiple regression and highlight some key differences.
1. compare the R-square statistic - highest one wins (use adjusted R-square when the number of variables differs) 2. compare standard errors - pick the lower one = these show which model has higher predictive power 3. compare the p-value of the common independent variable = this shows in which model it is more significant 4. consider whether the difference is caused by multicollinearity 5. which looks better overall 6. the manager needs to conduct further analysis in order to be sure that the multiple regression model can be used for making important managerial decisions
What are the two types of intervals that we can calculate?
1. confidence intervals for the coefficients b0 and b1 2. confidence intervals for the predictions y. We do this because we've used a sample, so both the coefficients and the predictions are random variables, assumed to be normally distributed.
How to make r significant enough?
1. increase sample points 2. remove outlying values 3. lag one of the variables to allow for change in time
How is correlation better than covariance?
1. r is scale-independent: its value is unchanged if (a) a constant is added to all values (b) all values are multiplied by a positive constant 2. r is a reliable indicator of the strength of the linear relationship 3. covariance can be large or small without X and Y being related
EXAM: How is the correlation coefficient interpreted?
1. r near 1 = X and Y are positively correlated: increases in X are accompanied by increases in Y 2. r near -1 = X and Y are negatively correlated: increases in X are accompanied by decreases in Y 3. r is near 0 = X and Y are not strongly related, at least not in a linear way
What are the two types of significance tests we run with correlation and why do we do them?
1. testing the significance of the correlation 2. testing the significance of the coefficients. We do both because, given that we used a sample to infer them, we are unsure whether they reflect the population correlation and the population coefficients. For coefficients, we're unsure if they are significant enough to be included in the regression function.
Why do we want to model the relationship between two variables?
1. to determine the linear relationship 2. to interpret the linear relationship 3. to make predictions based on this linear relationship
What are the three types of deviations in regression analysis?
1. total sum of squares (TSS) 2. regression sum of squares (RSS) 3. sum of squared errors (SSE)
What is the slope of the regression line x on y?
1/d
EXAM: Use the simple regression equation to predict the winning percentage of a baseball team for which the number of Runs is 800. Give a rough 90% prediction interval for the winning percentage. Interpret your result intuitively.
Create the equation from the intercept and the 'Runs' coefficient, where the number of runs in a season is the independent variable and the winning percentage for a team is the dependent variable. BLAH is the intercept, and BLAH is the slope. Substitute x = 800. A 90% prediction interval = prediction +/- 1.645 x standard error.
What is y called?
Dependent variable, response variable
What is the aim of regression analysis?
Find parameters such that the deviation between the model prediction (equation) and the observations is as small as possible across all data points.
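A minimal Python sketch of finding those parameters by least squares, using the closed-form solution for a single independent variable. The function name `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (b0, b1) minimising the sum of squared errors for y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept: the line passes through the means
    return b0, b1

# Made-up observations
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
b0, b1 = fit_line(xs, ys)
print(round(b0, 2), round(b1, 2))
```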
What is the meaning of a regression slope of 5 in business language?
For each increase of one unit in x (e.g. £1000 if x is measured in thousands of pounds), y increases by 5 units on average.
EXAM: explain the relationship between t statistic and p value?
For simple regression, both t-statistic and p-value for the slope give us the same information stating whether or not the slope for the explanatory variable 'Runs' is statistically significant for predicting the winning percentage. Small p-values correspond to large absolute t-values. In the simple regression, the absolute value of t-statistic is BLAH and the p-value is BLAH.
How are p values and t values different?
For t-values, you compare with the critical t-value for the chosen significance level; for p-values, you compare directly with the significance level.
How do you design a significance test for correlation coefficient?
H0: r = 0. H1: r > 0 if you think there is positive correlation (1-tailed); r < 0 if you think there is negative correlation (1-tailed); r =/= 0 if you're looking for correlation without specifying whether it is negative or positive (2-tailed). Then compare with the critical value from tables.
How do you perform a t test for the coefficients, say the slope?
H0: true slope = 0. H1: true slope > 0. t = (estimated slope - 0) / standard error of slope
What does R^2 explain?
How well the estimated regression function fits the data. r^2 = 1: fits perfectly, because all variation is due to the regression function, not residual error. r^2 = 0: the regression function is unable to explain any of the variation in the behaviour of y; it is all due to residual variation.
How is correlation relevant for Monte Carlo simulation?
If correlated, we need to generate random values for X, then use those and the correlation coefficient to generate random values for Y. The sum of X and Y is used to estimate total annual revenue.
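One way this could look in Python, assuming daily revenues are normally distributed (an assumption, not something the notes state). Y is built from the X draw plus independent noise so that the pair has correlation r. All names, means, standard deviations and the 250-day year are hypothetical.

```python
import math
import random

def correlated_pair(rng, r):
    """Draw two standard normal values whose correlation is r."""
    zx = rng.gauss(0.0, 1.0)
    noise = rng.gauss(0.0, 1.0)
    zy = r * zx + math.sqrt(1 - r * r) * noise  # reuses the X draw
    return zx, zy

def simulate_total_revenue(r, mean_x, sd_x, mean_y, sd_y, days=250, seed=0):
    """Sum simulated daily revenues of two divisions over one year."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(days):
        zx, zy = correlated_pair(rng, r)
        total += (mean_x + sd_x * zx) + (mean_y + sd_y * zy)
    return total

def sample_correlation(pairs):
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    syy = sum((p[1] - my) ** 2 for p in pairs)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
pairs = [correlated_pair(rng, -0.9) for _ in range(20000)]
print(round(sample_correlation(pairs), 2))
print(simulate_total_revenue(-0.9, 100.0, 10.0, 80.0, 8.0))
```

With r = 0 the second term reduces to pure noise, which is the "generate X and Y independently" case.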
How are p-values and significance levels tied?
If the p-value is g, the null hypothesis would be rejected at any significance level of 100g% or higher. A small p-value says there is a small probability of getting a coefficient at least this extreme if the two variables weren't correlated.
What is x called?
Independent variable, predictor variable, explanatory variable
What does r near 0 mean?
No strong correlation, at least not linearly
Are parameters exact?
No, they are estimates.
What pattern should residual errors follow?
None, they should be pure random noise, otherwise that's evidence that the relationship is not purely linear.
What else is the R coefficient called?
Pearson's product moment correlation coefficient.
What does r = -1 mean?
Perfectly negatively correlated
What does r = 1 mean?
Perfectly positively correlated
Define r^2 in terms of variance types
RSS/TSS (fraction of variance accounted for by the regression) = 1 - SSE/TSS
What is RSS?
Regression sum of squares - amount of variation in y about its mean explained by the regression function
Mathematical formula for RSS
SUM{((b0 + b1x) - mean y)^2}
Mathematical formula for TSS
SUM{(y - mean y)^2}
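A Python sketch that computes TSS, RSS and SSE for a least-squares line and checks that TSS = RSS + SSE and r^2 = RSS/TSS = 1 - SSE/TSS. The data are made up and the helper `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

# Made-up sample data
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

b0, b1 = fit_line(xs, ys)
mean_y = sum(ys) / len(ys)
fitted = [b0 + b1 * x for x in xs]

tss = sum((y - mean_y) ** 2 for y in ys)            # total variation
rss = sum((f - mean_y) ** 2 for f in fitted)        # explained by the regression
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted)) # left in the residuals

r_squared = rss / tss
print(tss, rss, sse, r_squared)
```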
What are sx and sy?
Sample standard deviations.
What conclusion can you draw from the fact that we only have one set of historical data, and the regression line could be different if we had selected other observations?
Slope and intercept of the regression line are random variables, and the parameters b0 and b1 are point estimates of the *true parameters*. They are also normally distributed.
What does r near -1 mean?
Strong negative correlation
What does r near 1 mean?
Strong positive correlation
What is SSE?
Sum of squared error - amount of variation in y around its mean that the regression function cannot account for
What is SSE?
Sum of the squares of errors, used for determining parameters in the method of least squares
What is the p value?
The probability of getting a slope at least this far from 0 if the true slope were actually 0.
EXAM: Define the correlation coefficient of X and Y in words and mathematically.
The correlation coefficient between X and Y measures the strength of the linear relationship between two variables X and Y. It is a number between -1 and 1 + add how it's interpreted.
EXAM: Is the correlation coefficient scale independent? Explain.
The correlation coefficient is scale independent. It means that the correlation value is unchanged if: 1. we add a constant to all X values or another constant to all Y values 2. we multiply all X values by a positive constant, or all Y values by a positive constant
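A quick Python check of scale independence on made-up data: shifting or rescaling X by a positive constant leaves r unchanged, while a negative multiplier flips its sign. The function name `correlation` is an illustrative helper, not a library call.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up sample data
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

r_original = correlation(xs, ys)
r_shifted = correlation([x + 100 for x in xs], ys)   # add a constant
r_scaled = correlation([1000 * x for x in xs], ys)   # positive multiplier
r_flipped = correlation([-2 * x for x in xs], ys)    # negative multiplier

print(r_original, r_shifted, r_scaled, r_flipped)
```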
What are parameters?
The intercept and the slope of the regression function (b0, b1)
What is a 'regression model'?
The relationship that describes how the dependent variable is related to the independent variable + an error term: y = B0 +B1x + e
Why are extrapolations dangerous?
The relationship between the variables might be completely different outside the range used for finding the regression model.
What does the standard error measure?
The scatter in the actual data around the estimated regression line - how the statistic varies from sample to sample.
What does the correlation coefficient measure?
The strength of the linear relationship between two variables X (independent) and Y(dependent).
Why are interpretations dangerous?
They don't always express a connection between x and y: 1. correlation doesn't mean causation 2. explanations can be found for correlations 3. hidden third factor that influences both 4. r has to be significant enough
How to calculate confidence intervals for b0 and b1?
They're in the table under 95%
What is the t value used for?
To test whether the relationship is significant.
What is TSS?
Total sum of squares - the total deviations from the mean = SSE + RSS
What does negative correlation mean?
When one variable increases, the other decreases.
When are t values significant?
When they're significantly different from zero measured by the standard error of the slope sb1.
Express confidence interval for Y values in the language of business.
We are 95% confident that, with an advertising budget of £70,000, the sales volume will be between a and b (the interval).
Describe the line given by Y = a + bX
X = independent (explanatory) variable; Y = dependent (response) variable; a and b = constants; a = Y-intercept (where the line hits the Y axis); b = slope/gradient of the line
Give an approximate 95% prediction interval for a value y, given value for x:
[prediction -2Se, prediction + 2Se]
What is the slope of the regression line y on x?
b
What is the t-value equal to?
b1/sb1 (slope/standard error of the slope)
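A Python sketch of the t-value calculation for simple regression, assuming the usual formulas Se = sqrt(SSE/(n - 2)) for the standard error of the regression and sb1 = Se/sqrt(SUM{(x - mean x)^2}) for the standard error of the slope. The function name and data are made up.

```python
import math

def slope_t_value(xs, ys):
    """Return (b1, sb1, t) for the least-squares slope of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2))   # standard error of the regression
    sb1 = se / math.sqrt(sxx)       # standard error of the slope
    return b1, sb1, b1 / sb1

# Made-up sample data
b1, sb1, t = slope_t_value([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(b1, 3), round(sb1, 3), round(t, 3))
```

With these five made-up points, t is about 2.31, below the two-tailed 5% critical value of 3.182 for 3 degrees of freedom, so this slope would not be significant at that level.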
EXAM: How is the correlation coefficient defined mathematically?
correl(X,Y) = r = covar(X,Y)/(sx sy), where sx and sy are the standard deviations of the X values and Y values, and covar(X,Y) = SUM{(x - mean x)(y - mean y)}/(n - 1) is the covariance between X and Y.
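The definition can be sketched in Python as follows, assuming the sample (n - 1) versions of both the covariance and the standard deviations. The data are made up, and `covariance` and `correlation` are illustrative helper names.

```python
from statistics import mean, stdev

def covariance(xs, ys):
    """Sample covariance of two equal-length lists (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """r = covar(X, Y) / (sx * sy), using sample standard deviations."""
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))

# Made-up sample data
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(covariance(xs, ys), correlation(xs, ys))
```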
What are residual errors defined as?
e = y - (b0 + b1x), i.e. the observed value minus the estimated value
What is e and how is it distributed?
error term - normally distributed, e~N(0, s^2)
How do you calculate confidence intervals for the coefficients?
find from table
What is the correlation coefficient?
A number r between -1 and 1, i.e. |r| <= 1