Stats Quiz 3
Least-Squares Method
Determines the values of b0 and b1 that minimize the sum of squared differences around the prediction line
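A minimal sketch of the least-squares computation in Python. The sample values are hypothetical, chosen only to illustrate the formulas:

```python
from statistics import mean

# Hypothetical sample data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)

# Slope b1 minimizes the sum of squared differences around the line
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
# Intercept b0 forces the line through the point (x_bar, y_bar)
b0 = y_bar - b1 * x_bar
```

For these values the fitted prediction line is Y-hat = 2.2 + 0.6X.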
Total Sum of Squares:Total Variation (SST)
Is a measure of the variation of the Yi values around their mean, Y-bar. Subdivided into explained variation (SSR) and unexplained variation (SSE): SST = SSR + SSE
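The decomposition SST = SSR + SSE can be checked numerically; a short sketch with hypothetical data, fitting the line with the least-squares formulas first:

```python
from statistics import mean

# Hypothetical sample data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_bar, y_bar = mean(x), mean(y)

# Least-squares fit
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation

# sst equals ssr + sse (up to floating-point rounding)
```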
Coefficient of Determination
Measures the proportion of variation in Y that is explained by the variation in the independent variable X in the regression model. Equal to the regression sum of squares (SSR) divided by the total sum of squares (SST): r^2 = SSR/SST
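Computing r^2 directly from its definition, again with hypothetical data:

```python
from statistics import mean

# Hypothetical sample data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_bar, y_bar = mean(x), mean(y)

b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = ssr / sst  # proportion of variation in Y explained by X
```

Here r_squared is 0.6, meaning 60% of the variation in Y is explained by X.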
Coefficient of Correlation
Measures the relative strength of a linear relationship between two numerical variables. Values range from -1 (perfect negative correlation) to +1 (perfect positive correlation). Correlation alone cannot prove causation. The sample coefficient of correlation is r
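A sketch of the sample correlation coefficient r, computed from the sums of squares and cross-products (hypothetical data):

```python
from math import sqrt
from statistics import mean

# Hypothetical sample data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_bar, y_bar = mean(x), mean(y)

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

r = sxy / sqrt(sxx * syy)  # always falls between -1 and +1
```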
Standard Error of the Estimate
Measures the variability of the observed Y values from the predicted Y values, in the same way that the standard deviation measures the variability around the sample mean. Measures the standard deviation around the prediction line
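A sketch of the standard error of the estimate, S(YX) = sqrt(SSE / (n - 2)), using the same hypothetical data:

```python
from math import sqrt
from statistics import mean

# Hypothetical sample data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = mean(x), mean(y)

b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

# Unexplained variation around the prediction line
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = sqrt(sse / (n - 2))  # standard error of the estimate
```

The divisor is n - 2 because two parameters (b0 and b1) are estimated from the sample.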
Error Sum of Squares: Unexplained Variation (SSE)
Represents the variation due to factors other than the relationship between X and Y
Regression Sum of Squares: Explained Variation (SSR)
Represents the variation that is explained by the relationship between X and Y
Independence of Errors
Requires that the errors be independent of one another. Especially important when data are collected over a period of time
Normality
Requires that the errors be normally distributed at each value of X. As long as the distribution of the errors at each level of X is not extremely different from a normal distribution, inferences about Beta 0 and Beta 1 are not seriously affected
Equal Variance (Homoscedasticity)
Requires that the variance of the errors be constant for all values of X. Ex: The variability of Y values is the same when X is a low value as when X is a high value
Linearity
States that the relationship between variables is linear
Residual (Estimated Error Value)
The difference between the observed (Yi) and predicted (Y-hat i) values of the dependent variable for a given value of Xi. Appears on a scatter plot as the vertical distance between the observed value of Y and the prediction line: ei = Yi - Y-hat i
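A sketch of the residuals ei = Yi - Y-hat i, computed from a least-squares fit to hypothetical data:

```python
from statistics import mean

# Hypothetical sample data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_bar, y_bar = mean(x), mean(y)

b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

# ei = Yi - Y-hat i: the vertical distance from each point to the line
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```

For a least-squares fit the residuals always sum to zero (up to rounding), which is a quick sanity check on the computation.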
Simple Linear Regression Equation (Prediction Line)
The predicted value of Y equals the Y intercept plus the slope multiplied by the value of X. Unless all observed data points fall on a straight line, the prediction line is not a perfect predictor. Takes the form Y-hat i = b0 + b1*Xi
Relevant Range
The range between the sample values
Residual Analysis
Visually evaluates the regression assumptions and helps you determine whether the regression model that has been selected is appropriate
Covariance
Measures the strength of the linear relationship between two numerical variables (X, Y). Can take any value, so it cannot be used to determine the relative strength of a relationship
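A sketch of the sample covariance, using hypothetical data; note the result is not bounded, unlike the correlation coefficient:

```python
from statistics import mean

# Hypothetical sample data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = mean(x), mean(y)

# Sample covariance: positive when X and Y tend to move together,
# but its magnitude depends on the units of X and Y
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
```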
Assumptions of Regression (LINE)
1. Linearity 2. Independence of Errors 3. Normality of Error 4. Equal Variance