chapter 14
correlation coefficient
The correlation coefficient computed from the sample data measures the *strength and direction* of a linear relationship between two variables.
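A minimal Python sketch of the sample correlation coefficient r_xy. It uses a small illustrative data set that is assumed here and not taken from these notes, and `sample_correlation` is a hypothetical helper name. The sign of r gives the direction of the linear relationship and its magnitude gives the strength.

```python
import math

# Illustrative sample data (assumed for this sketch, not from the notes)
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]


def sample_correlation(x, y):
    """Sample correlation coefficient r_xy (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r = sample_correlation(x, y)
print(f"r = {r:.4f}")  # positive sign: y tends to increase as x increases
```

On Python 3.10+, `statistics.correlation(x, y)` from the standard library should return the same value.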
If the assumption that the variance of ε is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then
The residual plot should give an overall impression of a horizontal band of points
regression line / line of best fit
We want to determine the equation of the regression line, which is the line of best fit. Best fit means that the sum of the squared vertical distances from each point to the line is at a minimum.
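A minimal sketch of finding the line of best fit by least squares, using an assumed illustrative data set; `least_squares_fit` and `sum_sq_vertical` are hypothetical helper names. The last line checks the "at a minimum" part of the definition by nudging the slope and confirming the sum of squared vertical distances grows.

```python
# Illustrative sample data (assumed for this sketch, not from the notes)
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]


def least_squares_fit(x, y):
    """Return (b0, b1) that minimize the sum of squared vertical distances."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept; the line passes through (x_bar, y_bar)
    return b0, b1


def sum_sq_vertical(b0, b1):
    """Sum of squared vertical distances from each point to y = b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


b0, b1 = least_squares_fit(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")  # y_hat = 60.00 + 5.00 x
# Any other line, e.g. one with a nudged slope, has a larger sum of squares
print(sum_sq_vertical(b0, b1) < sum_sq_vertical(b0, b1 + 0.5))  # True
```

On Python 3.10+, `statistics.linear_regression(x, y)` returns a `(slope, intercept)` named tuple that should match.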
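confidence interval for β1 (equation)
$b_1 \pm t_{\alpha/2}\, s_{b_1}$, where b1 is the point estimator and $t_{\alpha/2}\, s_{b_1}$ is the margin of error; $t_{\alpha/2}$ comes from a t distribution with n - 2 degrees of freedom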
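A minimal sketch of the 95% confidence interval for β1 on an assumed illustrative data set. The critical value is an assumption: t_{0.025} = 2.306 for 8 degrees of freedom, read from a standard t table rather than computed, so the sketch needs only the standard library. The final line applies the "reject H0 if 0 is not in the interval" rule.

```python
import math

# Illustrative sample data (assumed for this sketch, not from the notes)
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))     # standard error of the estimate
s_b1 = s / math.sqrt(s_xx)       # estimated standard deviation of b1

# Assumed: t_{alpha/2} for alpha = 0.05 and n - 2 = 8 df, from a t table
# (scipy.stats.t.ppf(0.975, 8) would compute it if SciPy is available).
t_crit = 2.306
margin = t_crit * s_b1           # margin of error
ci = (b1 - margin, b1 + margin)

print(f"95% CI for beta1: ({ci[0]:.2f}, {ci[1]:.2f})")
print("reject H0: beta1 = 0" if not (ci[0] <= 0 <= ci[1]) else "do not reject H0")
```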
regression analysis is appropriate when the
dependent variable is continuous
coefficient of determination (r^2) measures
goodness of fit of the estimated regression equation.
good patterns of residual plots
horizontal band is desired
If the assumptions about the error term ε appear questionable,
1. Hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid. 2. The residuals provide the best information about ε.
β1 = 0
If β1 = 0, the mean value of y does not depend on the value of x, and we conclude that *x and y are not linearly related*.
test of significance "holy grail"
*t test*
If H0 (the null hypothesis) is rejected, a *statistically significant relationship exists between the 2 variables*. If H0 is not rejected, there is insufficient evidence to conclude that a statistically significant relationship exists between the 2 variables.
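A minimal sketch of the t test for H0: β1 = 0 on an assumed illustrative data set. The test statistic is t = b1 / s_b1, and H0 is rejected when |t| is at least t_{α/2}. The critical value 2.306 (α = 0.05, 8 degrees of freedom) is assumed from a standard t table.

```python
import math

# Illustrative sample data (assumed for this sketch, not from the notes)
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)  # estimated std. dev. of b1

t_stat = b1 / s_b1   # test statistic for H0: beta1 = 0
t_crit = 2.306       # assumed: t_{0.025} with n - 2 = 8 df, from a t table

if abs(t_stat) >= t_crit:
    print(f"t = {t_stat:.2f}: reject H0, statistically significant relationship")
else:
    print(f"t = {t_stat:.2f}: do not reject H0, insufficient evidence")
```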
simple linear regression model
$y = \beta_0 + \beta_1 x + \varepsilon$
• β0 and β1 = parameters of the model
• ε (epsilon) = random variable referred to as the error term
estimated simple linear regression equation
$\hat{y} = b_0 + b_1 x$
• ŷ = point estimator of E(y), the estimated value of y for a given value of x
• b0 = y-intercept of the line
• b1 = slope of the line
coefficient of determination
provides a measure of the goodness of fit for the estimated regression equation
rejection rule for confidence interval for β1
Reject H0: β1 = 0 if 0 is not included in the confidence interval for β1.
MSE (equation)
$s^2 = \text{MSE} = \dfrac{\text{SSE}}{n-2}$
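A minimal sketch of computing SSE, MSE = s² and the standard error of the estimate s = √MSE on an assumed illustrative data set. The division is by n - 2 rather than n because two parameters (b0 and b1) are estimated from the same data.

```python
import math

# Illustrative sample data (assumed for this sketch, not from the notes)
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)   # s^2, an estimate of sigma^2 (two parameters estimated)
s = math.sqrt(mse)    # standard error of the estimate, point estimate of sigma
print(f"SSE = {sse:.2f}, MSE = s^2 = {mse:.2f}, s = {s:.3f}")
```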
regression analysis
statistical procedure used to develop an equation showing how the 2 variables are related
SSR
sum of squares due to regression; measures how much the values of ŷ on the estimated regression line deviate from ȳ: $\text{SSR} = \sum (\hat{y}_i - \bar{y})^2$
SST
measures how much variation there is in the observed values of the dependent variable: $\text{SST} = \sum (y_i - \bar{y})^2$
Least squares model (theory)
The best-fitting line for the observed data is calculated by minimizing the sum of squared residuals, where the ith residual, yi - ŷi, is the difference between the observed and predicted values (the residuals serve as estimates of the error term ε).
simple linear regression model is
the equation that describes how y is related to x and an error term
SST
total sum of squares
least squares method (theory)
uses sample data to provide the values of b0 and b1 that minimize the sum of squares of the deviations between the observed values of the dependent variable, yi, and the predicted values of the dependent variable, ŷi
independent variable
variable used to predict the value of the dependent variable
σ² =
variance of ε (epsilon) in the regression model
To test for a significant regression relationship,
we must conduct a hypothesis test to *determine whether the value of β1 is zero.*
least squares method (equation) aka SSE
$\min \sum (y_i - \hat{y}_i)^2$
• yi = observed value of the dependent variable for the ith observation
• ŷi = estimated value of the dependent variable for the ith observation
ε (epsilon) / error term
The error term accounts for the variability in y that cannot be explained by the linear relationship between x and y.
regression equation for simple linear regression
$E(y) = \beta_0 + \beta_1 x$
The graph of the simple linear regression equation is a straight line.
• β0 = y-intercept of the regression line
• β1 = slope of the regression line
• E(y) = mean or expected value of y for a given value of x
Coefficient of determination
Used to evaluate the goodness of fit of the estimated regression equation; takes values between 0 and 1 and is often *expressed as a percent*
SSE
sum of squares due to error; the value of SSE is a measure of the error in using the estimated regression equation to predict the values of the dependent variable in the sample: $\text{SSE} = \sum (y_i - \hat{y}_i)^2$
coefficient of determination explanation
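Ranges from 0 to 1; $r^2 = \text{SSR}/\text{SST}$ is the *proportion (percent) of the total sum of squares that can be explained* by the estimated regression equation (a value near 1 indicates a strong fit, a value near 0 a weak one).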
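A minimal sketch of computing r² = SSR/SST on an assumed illustrative data set, reporting it both as a proportion and as a percent of SST explained.

```python
# Illustrative sample data (assumed for this sketch, not from the notes)
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)       # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)   # sum of squares due to regression
r2 = ssr / sst                                 # coefficient of determination
print(f"r^2 = {r2:.4f} ({r2:.1%} of SST explained by the regression)")
```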
regression equation
The equation that describes how the expected (or mean) value of y, denoted E(y), is related to x is called the _____.
Confidence interval for β1
1. *If H0 is rejected, the hypothesized value of β1 is not included in the confidence interval for β1.* 2. To reject H0: β1 = 0, 0 cannot be included in the confidence interval.
correlation coefficient cont'd
1. The correlation coefficient is restricted to a linear relationship between two variables. 2. The coefficient of determination can be used for nonlinear relationships and for relationships that have two or more independent variables.
assumptions about the error term ε in the regression model
1. The error term ε is a random variable with a mean (expected value) of 0. 2. The variance of ε, denoted by σ², is the same for all values of the independent variable. 3. The values of ε are independent. 4. The error term ε is a normally distributed random variable.
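A minimal simulation sketch of data generated under all four assumptions: each ε is drawn independently from a normal distribution with mean 0 and the same variance σ² for every x. The "true" parameters β0 = 60, β1 = 5, σ = 13 and the x values are hypothetical choices for illustration. Least squares should then give b0 and b1 close to β0 and β1, and MSE close to σ².

```python
import random

random.seed(42)                        # reproducible draws
beta0, beta1, sigma = 60.0, 5.0, 13.0  # hypothetical "true" parameters
x = [i % 30 for i in range(300)]       # hypothetical x values 0..29, ten times each

# Assumptions 1-4: each epsilon is independent, normal, mean 0, variance sigma^2
y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

print(f"b0 = {b0:.2f}, b1 = {b1:.3f}, MSE = {mse:.1f} (sigma^2 = {sigma ** 2})")
```

Breaking an assumption in the simulation (for example, making the standard deviation of ε grow with x) is one way to see what a non-horizontal residual band looks like.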
simple linear regression
1. The goal of linear regression is to describe the relationship between two variables as a straight line. 2. To achieve this, linear regression attempts to model the relationship by fitting a linear equation (a straight line) to observed data.
interpolation
1. A linear regression model is built on sample data covering a specific range of x values. 2. Interpolate (predict within that range); don't extrapolate (predict outside it).
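A minimal sketch of guarding predictions against extrapolation, using an assumed illustrative data set. The hypothetical `predict` helper refuses any x outside the range covered by the sample; raising an error is a design choice assumed here, and a warning would be an equally reasonable alternative.

```python
# Illustrative sample data (assumed for this sketch, not from the notes)
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
x_min, x_max = min(x), max(x)   # range of x covered by the sample


def predict(x_new):
    """Point estimate y_hat, allowed only inside the sampled x range."""
    if not x_min <= x_new <= x_max:
        raise ValueError(f"x = {x_new} is outside the sample range "
                         f"[{x_min}, {x_max}]; this would be extrapolation")
    return b0 + b1 * x_new


print(predict(10))   # interpolation inside [2, 26]: 110.0
try:
    predict(40)      # extrapolation: refused
except ValueError as err:
    print(err)
```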
cautions of interpretation of significance tests
1. Rejecting H0: β1 = 0 and concluding that x and y have a significant relationship DOES NOT let us conclude that a cause-and-effect relationship is present between x and y. 2. Just because we can reject H0: β1 = 0 does not allow us to conclude that the relationship between x and y is LINEAR.
MSE
mean square error; MSE = s² provides an estimate of σ², the variance of ε
standard error of the estimate
$s = \sqrt{\text{MSE}}$, the point estimate of σ
residual for observation i
$e_i = y_i - \hat{y}_i$; the ith residual is the error that results from using the estimated regression equation to predict the value of the dependent variable for the ith observation.
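A minimal sketch of computing the residuals yi - ŷi on an assumed illustrative data set. Printing each residual next to its x value gives the raw material for a residual plot, where a horizontal band of points around 0 is the pattern to look for.

```python
# Illustrative sample data (assumed for this sketch, not from the notes)
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # e_i = y_i - y_hat_i
for xi, ei in zip(x, residuals):
    print(f"x = {xi:>2}  residual = {ei:+.1f}")

# With an intercept in the model, least squares residuals sum to zero
print(abs(sum(residuals)) < 1e-9)  # True
```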
question answered by coefficient of determination
How well does the estimated regression equation fit the data?
relationship between all sum of squares
SST = SSR + SSE
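A minimal sketch that checks the decomposition SST = SSR + SSE numerically on an assumed illustrative data set, using the sum-of-squares definitions SST = Σ(yi - ȳ)², SSR = Σ(ŷi - ȳ)² and SSE = Σ(yi - ŷi)².

```python
import math

# Illustrative sample data (assumed for this sketch, not from the notes)
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # due to regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # due to error

print(f"SST = {sst:.1f}, SSR + SSE = {ssr:.1f} + {sse:.1f} = {ssr + sse:.1f}")
print(math.isclose(sst, ssr + sse))  # True
```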
