Chapters 11 and 12: Formulas and Terms
Prediction/Confidence Interval for an Individual Y, Given X (actual observed/single outcome value)
Interval estimate for an actual observed value of y at a particular xi. Refer to formula sheet for full formula. There is a +1 inside the brackets; this extra term widens the interval to reflect the added uncertainty of predicting a single case rather than a mean.
Confidence Interval for the Average Y, Given X (expected/mean value)
Confidence interval estimate for the expected (mean) value of y at a particular xi (refer to formula sheet for full formula). There is no +1 in the brackets for this formula, so this interval is always narrower than the prediction interval for an individual y.
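A minimal sketch of the contrast between the two intervals, using a small made-up dataset and the standard textbook formulas; the +1 under the square root is the only difference, so the prediction interval is always wider. The t critical value is hardcoded for df = 3.

```python
# Toy dataset (made up for illustration); fit by least squares, then compute
# both interval half-widths at the same x. Note the "+1" in the PI only.
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
se = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

xp = 4                      # the particular x of interest
t_crit = 3.182              # t(0.025, df = n - 2 = 3)
yhat = b0 + b1 * xp
half_ci = t_crit * se * math.sqrt(1 / n + (xp - xbar) ** 2 / sxx)       # mean of Y
half_pi = t_crit * se * math.sqrt(1 + 1 / n + (xp - xbar) ** 2 / sxx)   # single Y
print(yhat, half_ci, half_pi)
```

The prediction-interval half-width exceeds the confidence-interval half-width at every x, exactly because of the extra 1 inside the square root.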
simple linear regression model
The relationship between Y and X is described by a linear function. Changes in Y are assumed to be influenced by changes in X. β₀ (intercept) and β₁ (slope) are population parameters, and ε is a random error term representing all other factors that influence Y. β₀ and β₁ are the linear components. We assume that the error terms are random variables with mean 0 and constant variance σ². Y = β₀ + β₁X + ε
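A quick simulation sketch of this model: generate data from Y = β₀ + β₁X + ε with mean-zero, constant-variance errors (the β values and sample size here are illustrative), then recover the parameters by least squares.

```python
# Simulate the population model and check that least-squares estimates
# land near the true parameters (beta0 = 2.2, beta1 = 0.6 are made up).
import math
import random

random.seed(0)
beta0, beta1, sigma = 2.2, 0.6, 1.0
x = [random.uniform(0, 10) for _ in range(2000)]
y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
print(b0, b1)   # estimates should be close to 2.2 and 0.6
```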
multiple regression
The population multiple regression model extends the simple model to K independent variables, with the same random error term ε added at the end: Y = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ + ε
Explanatory Power of a Linear Regression Equation: Total variation of Y
Total variation of y is made up of two parts: SST = SSR + SSE (total = explained by the regression + unexplained error).
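The decomposition can be verified numerically. A sketch on a small made-up dataset:

```python
# Verify SST = SSR + SSE on toy data (values chosen for illustration).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)               # total variation
ssr = sum((yh - ybar) ** 2 for yh in yhat)            # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # unexplained variation
print(sst, ssr, sse)   # 6.0 = 3.6 + 2.4
```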
t-test
Used for hypothesis tests about the slope. b₁ = sample regression slope coefficient. β₁ = hypothesized population slope (usually 0). sb₁ = standard error of the slope. The statistic has n - 2 degrees of freedom. t = (b₁ - β₁)/sb₁
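A sketch of the slope t statistic for H₀: β₁ = 0 on a small made-up dataset, computing sb₁ from the standard error of the estimate:

```python
# t statistic for H0: beta1 = 0; df = n - 2. Data are illustrative.
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se = math.sqrt(sse / (n - 2))    # standard error of the estimate
sb1 = se / math.sqrt(sxx)        # standard error of the slope
t = (b1 - 0) / sb1
print(t)
```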
Adjusted Coefficient of Determination
Used to correct for the fact that adding independent variables, even irrelevant ones, will still reduce the error sum of squares. Adjusted R² provides a fairer comparison between multiple regression models with different numbers of independent variables. Its value is less than R² and can be negative. R²(bar) = 1 - [SSE/(n - K - 1)]/[SST/(n - 1)] = 1 - (1 - R²)(n - 1)/(n - K - 1), where n = sample size, K = number of independent variables.
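A sketch of the adjustment on a small made-up dataset with K = 1 predictor, showing that the adjusted value comes out below R²:

```python
# Adjusted R-squared penalizes for the number of predictors (K = 1 here).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n, K = len(x), 1
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - ybar) ** 2 for yi in y)

r2 = 1 - sse / sst
r2_adj = 1 - (sse / (n - K - 1)) / (sst / (n - 1))
print(r2, r2_adj)   # 0.6 vs roughly 0.4667
```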
error sum of squares (unexplained variation)
Variation attributable to factors other than the linear relationship between x and y. yi = observed value of the dependent variable; yi(hat) = predicted value of y for a given xi. SSE = ∑(yi - yi(hat))²
multiple regression coefficients
(refer to formula sheet)
relationship between multiple and simple regression coefficients
(refer to formula sheet)
sum of squares regression(explained variation)
Explained variation attributable to the linear relationship between x and y. yi(hat) = predicted value of y for a given xi; y(bar) = average value of the dependent variable. SSR = ∑(yi(hat) - y(bar))²
total sum of squares
Measures the variation of the yi values around their mean, y(bar). yi = observed values of the dependent variable; y(bar) = average value of the dependent variable. SST = ∑(yi - y(bar))²
sample multiple regression
Omitted-variable bias can be reduced by estimating a sample multiple regression equation that includes the previously omitted variables. The coefficients are chosen by minimizing the sum of squared residuals.
Coefficient of Determination
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. The coefficient of determination is also called R-squared and is denoted as R². This value must be between 0 and 1. For example, if there is a perfect linear relationship between X and Y, 100% of the variation in Y is explained by variation in X. R² = 1. R² = SSR/SST
Correlation and R-squared
The coefficient of determination, R², for a simple regression is equal to the square of the sample correlation. In our example, the sample correlation between price and square feet equals 0.7621, so R² = 0.7621² = 0.5808. R² = r²
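The identity can be checked directly. A sketch on a small made-up dataset, computing r from the definition and R² from the sums of squares:

```python
# For simple regression, R-squared equals the squared sample correlation.
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
r = sxy / math.sqrt(sxx * syy)   # sample correlation

b1 = sxy / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r2 = 1 - sse / syy
print(r ** 2, r2)   # both equal 0.6
```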
Predicting expected values for y
The sample regression equation can be used to predict an expected value of y, given a particular x. The specified value of x is x(n+1). y(hat)(n+1) = b₀ + b₁x(n+1)
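A sketch of the prediction step on a small made-up dataset, with x(n+1) = 6 chosen for illustration:

```python
# Fit the line on toy data, then plug a new x into the fitted equation.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

x_new = 6                 # a hypothetical x(n+1)
y_new = b0 + b1 * x_new
print(b0, b1, y_new)      # 2.2 + 0.6 * 6 = 5.8
```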
Sample Linear Regression (simple)
The simple linear regression equation provides a sample estimate of the population regression line. yi(hat) = estimated (or predicted) y value for observation i. b₀ = estimate of the regression intercept. b₁ = estimate of the regression slope. xi = value of x for observation i. The individual random error terms ei = yi - yi(hat) have a mean of zero. yi(hat) = b₀ + b₁xi
standard error(sb1) of the sample regression slope(b1)
A measure of the variation in the slope of regression lines estimated from different possible samples. se = standard error of the estimate (which estimates σ, the standard deviation of the random error); sx = sample standard deviation of x. sb₁ = √(se²/((n - 1)sx²)) = se/√(∑(xi - x(bar))²)
Estimation of Population Model Error Variance: standard deviation of the sample error
A measure of the variation of observed y values around the regression line, and an estimator of the standard deviation of the population error. Division is by n - 2 instead of n - 1 because the simple regression model estimates two parameters, b₀ and b₁, instead of one. se = √(∑ei²/(n - 2))
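A sketch of the computation on a small made-up dataset; note the divisor n - 2, and that the residuals sum to zero:

```python
# Standard error of the estimate: divide the residual sum of squares
# by n - 2, the degrees of freedom left after estimating b0 and b1.
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
se = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
print(se)   # sqrt(2.4 / 3) ~ 0.8944
```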
Least Squares Coefficient Estimators
b₀ and b₁ are obtained by finding the values of b₀ and b₁ that minimize the sum of the squared errors (SSE): min SSE = min ∑[yi - (b₀ + b₁xi)]²
slope and intercept coefficient estimator
b₁ = ∑(xi - x(bar))(yi - y(bar))/∑(xi - x(bar))², which can also be written b₁ = r(sy/sx), where s denotes a sample standard deviation. The intercept is b₀ = y(bar) - b₁x(bar), so the regression line always passes through the point (x(bar), y(bar)).
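Both identities can be checked numerically. A sketch on a small made-up dataset:

```python
# Two identities for the least-squares fit: b1 = r * (sy / sx), and the
# fitted line passes through the point of means (xbar, ybar).
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx
b0 = ybar - b1 * xbar
r = sxy / math.sqrt(sxx * syy)
sx = math.sqrt(sxx / (n - 1))
sy = math.sqrt(syy / (n - 1))

print(b1, r * sy / sx)          # identical
print(b0 + b1 * xbar, ybar)     # the line passes through the means
```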
Unbiased Estimators
If E(b₁) = β₁ and E(b₀) = β₀, then these estimators are unbiased: on average across repeated samples, they equal the population parameters.
coefficient of multiple correlation
The correlation between the predicted and observed values of the dependent variable; it is the square root of R². Used as another measure of the strength of the linear relationship between the dependent variable and the independent variables, comparable to the correlation between Y and X in simple regression. R = r(y(hat), y) = √R²
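The identity R = r(y(hat), y) = √R² can be checked directly. A sketch on a small made-up dataset (here with one predictor, so it reduces to |r|):

```python
# Correlation between fitted and observed y equals sqrt(R-squared).
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    ub, vb = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - ub) * (b - vb) for a, b in zip(u, v))
    den = math.sqrt(sum((a - ub) ** 2 for a in u) *
                    sum((b - vb) ** 2 for b in v))
    return num / den

sst = sum((yi - ybar) ** 2 for yi in y)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
r2 = 1 - sse / sst
print(corr(yhat, y), math.sqrt(r2))   # both ~ 0.7746
```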