Module 7: Intro to Linear Regression


Example 1.2 Linear Regression For the money supply growth and inflation rate data that we have been working with in this reading, determine the slope coefficient and the intercept term of a simple linear regression using money supply growth as the independent variable and the inflation rate as the dependent variable.

(Look at the picture for the regression line equation steps, and the sketch below.) Tips: We're looking to isolate the total variation in Y. We want to understand 0.003094: how much of it is explained and unexplained by the independent variable, X. Again, TSSreg = SSEy = 0.003094; we want to know how much is explained and unexplained. Step 1) Find the slope: b1 = COVxy / VARx = 0.7591. (We all know about "slope": as the IV, or x, goes up by 1, the DV, or y, goes up by 0.7591.) Step 2) Find the y-intercept: b0 = ¯y - b1 * ¯x = -0.005. Step 3) ^Y = -0.005 + (0.7591)(X)
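(A minimal sketch of steps 1-3 in Python. The x and y lists are made-up stand-ins for the reading's money-supply-growth and inflation data, which live in the picture, so the outputs won't exactly match 0.7591 and -0.005.)

```python
# Hypothetical stand-in data: x = money supply growth (IV), y = inflation (DV).
x = [0.0685, 0.1160, 0.0575, 0.1050, 0.1250, 0.1350]
y = [0.0545, 0.0775, 0.0325, 0.0850, 0.0860, 0.1210]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Step 1: slope b1 = COVxy / VARx (sample versions, both divided by n - 1).
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
b1 = cov_xy / var_x

# Step 2: intercept b0 = mean(y) - b1 * mean(x).
b0 = mean_y - b1 * mean_x

# Step 3: the regression line.
print(f"Y-hat = {b0:.4f} + {b1:.4f} * X")
```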

Definition standard error of estimate (SEE)

(Check the picture, especially the note.) + We divide by n - 2 whenever we see simple linear regression. (Don't let these small flashcards confuse you.)

Calculation ANOVA Table

(look at the picture)
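(Since the picture isn't reproduced here, this is the standard one-IV ANOVA layout, for reference:)

```
Source        df       SS      Mean square
Regression    k = 1    RSS     MSR = RSS / k
Error         n - 2    SSE     MSE = SSE / (n - 2)
Total         n - 1    TSS
F-stat = MSR / MSE, with (k, n - 2) degrees of freedom
```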

Calculation/Definition Coefficient of Determination (R^2) Method #1 (only one Independent Variable)

(Remember from M7L1) (only one independent variable) Coefficient of Determination = (coeff of correlation)^2, or rcc^2. Example 1-1 from M7L1: the coefficient of determination for the regression equals 0.9573^2 = 0.9164. What this means is that variation in the money supply growth rate explains about 91.64% of the variation in inflation rates. 1 - 91.64% = 8.36%, the unexplained portion.

Definition Linear Regression With One Independent Variable

*** - It is used to summarize the relationship between two variables that are linearly related, to make predictions about a dependent variable, Y, using an independent variable, X, to test hypotheses regarding the relation between the two variables, and to evaluate the strength of the relationship. + Both X and Y have synonyms: X, or IV = also known as the explanatory variable, exogenous variable, and predicting variable; Y, or DV = also known as the explained variable, endogenous variable, and predicted variable.

Calculation Regression model equation Regression Line Equation** (which we will be using for the rest of the term)

+ "b1 and b0 are the regression coefficients." is only estimates of their actual parameters given our historical samples for x and y + b0 = y- intercept = Ave Y - Ave X * b1 + b1 = Slope Coefficient = COV x,y / VAR x (x being IV) + εi = error term, why? because sometime actual Y doesn't equal Y per the regression. It should be 0. (this is for conceptual part of the question) Thus: we use Regression Line Equation Regression line equation=> ˆY =ˆb0 + ˆb1*X - Linear regression computes the line of best fit that minimizes the sum of the squared regression residuals

Calculation (Recap from prior lessons) Covariances Correlation Coefficient Variances

+ Cov = Σ(Actual1 - Mean1)(Actual2 - Mean2) / n (population) or n - 1 (sample); also Cov = σ1 * σ2 * corr coeff + C.C. = Cov / (σ1 * σ2) + Variance (σ^2) = Σ(Actual - Mean)^2 / n or n - 1 + Covariance only gives us the direction of the relationship between the assets, not its strength. + The correlation coefficient gives us both the strength and the direction. (Their properties are in the textbook in case you forgot; I didn't write them down.)
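(A quick sketch of the three recap formulas in Python, sample versions dividing by n - 1; the data here is made up purely for illustration.)

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
mx = sum(x) / n
my = sum(y) / n

var_x = sum((a - mx) ** 2 for a in x) / (n - 1)
var_y = sum((b - my) ** 2 for b in y) / (n - 1)
cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

# Covariance gives direction only; correlation adds strength and is bounded [-1, +1].
corr = cov_xy / (var_x ** 0.5 * var_y ** 0.5)
print(cov_xy, corr)
```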

Definition Analysis of variance or ANOVA

- ANOVA is a statistical procedure that is used to determine the usefulness of the independent variable(s) in explaining the variation in the dependent variable. An important part of ANOVA is the F-test, which tests whether all the slope coefficients in the regression are equal to zero. The F-test basically tests a null hypothesis of b1 = 0 versus an alternative of b1 ≠ 0. + If we reject the null, the independent variable is useful in explaining the variation in the dependent variable (the regression is meaningful); if we fail to reject, the regression results are not meaningful.

Definition Coefficient of Determination (R^2)

- (R^2) tells us how well the independent variable "explains" the variation in the dependent variable. It measures the fraction of the total variation in the dependent variable that is explained by the independent variable.

Definition Test of hypothesis: Level of Significance and P-Value

- (Recap) A small p-value means we reject: p-value < level of significance = we reject the null hypothesis; p-value > level of significance = we fail to reject. - (Recap) The smaller the standard error of an estimated parameter, the stronger the results of the regression and the narrower the resulting confidence intervals.

Definition standard error of estimate (SEE)/standard error of the regression

- Basically, it's the standard deviation of the regression residuals. It is used to measure how well a regression model captures the relationship between the two variables, IV and DV. - It indicates how well the regression line "fits" the sample data and is used to determine how certain we can be about a particular prediction of the dependent variable (^Y) based on the regression equation. Obvious: the smaller the standard deviation of the residual term (the smaller the standard error of estimate), the more accurate the predictions.

Facts: (Very Important) Linear regression on the exam: when they ask, what does linear regression do?

- Linear regression looks to minimize differences between actual/historical Y and ^Y predicted per the regression equation.

Definition Standard Error of Forecast, Sf and Estimated variance of prediction error, Sf^2

- Regression analysis is used to make predictions or forecasts of the dependent variable based on the regression equation, and analysts construct confidence intervals around those forecasts. There are two sources of uncertainty when we use a regression model to make a prediction regarding the value of the dependent variable. (Important) The 2 uncertainties: 1) the uncertainty inherent in the error term, ε, captured by the SEE (based on Y - ^Y); 2) the uncertainty in the estimated parameters, b0 and b1 (just because you had certain relationships historically doesn't mean the y-intercept and the slope are going to be the same going forward). Facts: Sf > SEE; the prediction error is greater than the standard error of the estimate. The prediction error also takes into account uncertainty in the regression parameters, which are only estimates of their actual population parameters based on what the relationship was historically. The only way they would be equal is if we actually knew those regression parameters with certainty.

4/4 assumptions of simple linear regression model:

- The regression residuals are normally distributed. Also note that, with large sample sizes, we can drop the requirement that the residuals be normally distributed (due to the central limit theorem). In other words: each error term is normally distributed.

1/4 assumptions of simple linear regression model:

- The relationship between the dependent variable (Y) and the independent variable (X) is linear, and the independent variable, X, is not random. If the relationship between the independent and dependent variables is nonlinear, estimating that relation with a simple linear regression model will produce invalid results. The residuals of a linear model should not follow a pattern when plotted against the independent variable. How do we know it's nonlinear? + The residuals follow a pattern against the IV (e.g., the errors are correlated with the size of the IV). + Note that a nonlinear relationship can still show b1 (slope) = 0 and rcc (correlation coefficient) = 0, since both measure only linear association. + The assumption also requires that the IV is not random.

2/4 assumptions of simple linear regression model:

- The variance of the error term is constant for all observations. This is known as the homoskedasticity assumption. Σ(Y - ^Y)^2 / (n - 2) = variance of the regression; we want the error variance to be constant. (Quick examples in the picture: 1.6 shows no correlation between the residuals and time; 1.7 shows the variance of the regression isn't constant, because the residuals correlate with time. 1.7 eventually shows two regression lines if you look at the picture.)

Calculation/Definition Hypothesis tests of the slope coefficient, b1 (I skip the math and formulas; I doubt they will be on the test) (we're at the area where the test might not test these, pay attention to concepts)

- We can use the F-statistic from ANOVA to test for the significance of the slope coefficient (that is, whether it is significantly different from zero). But we can also use a t-test: t = (^b1 - hypothesized b1) / standard error of ^b1, with n - 2 degrees of freedom. - There's a lot in example 2.4, but I doubt we will see it on the exam, so I didn't drop it down here in Quizlet.

Definition log-log model (or double log model)

- Both the dependent and independent variables are in logarithmic form. + The slope coefficient is the relative change in the dependent variable for a relative change in the independent variable (an elasticity).

Definition lin-log model

- dependent variable is linear, but the independent variable is logarithmic. + slope coefficient provides the absolute change in the dependent variable for a relative change in the independent variable.

Definition Log-lin model

- dependent variable is logarithmic, but the independent variable is linear. + slope coefficient is the relative change in the dependent variable for an absolute change in the independent variable.
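(To see how the three log models differ mechanically, here's a sketch: the only change between them is which variable gets logged before running the same slope/intercept steps. The data is made up.)

```python
import math

def fit(iv, dv):
    """Same simple-regression steps as before: b1 = COV/VAR, b0 = mean(dv) - b1 * mean(iv)."""
    n = len(iv)
    mx, my = sum(iv) / n, sum(dv) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(iv, dv)) / sum((a - mx) ** 2 for a in iv)
    return my - b1 * mx, b1

x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0, 5.0, 9.0, 17.0, 33.0]
log_x = [math.log(v) for v in x]
log_y = [math.log(v) for v in y]

for name, (iv, dv) in {
    "log-log": (log_x, log_y),  # slope: relative change in Y per relative change in X
    "lin-log": (log_x, y),      # slope: absolute change in Y per relative change in X
    "log-lin": (x, log_y),      # slope: relative change in Y per absolute change in X
}.items():
    b0, b1 = fit(iv, dv)
    print(f"{name}: {b0:.4f} + {b1:.4f} * X'")
```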

Definition Indicator variables (or dummy variables) Hypothesis tests of slope when IV is an indicator variable

- In regression models, indicator variables determine whether a particular qualitative variable explains the variation in the model's dependent variable to a significant extent. These are qualitative variables: + value has the property = 1 + value doesn't have the property = 0. Characteristics: + A dummy variable must be binary in nature (i.e., it may take on a value of either 0 or 1). + If the model aims to distinguish between n categories, it must employ n - 1 dummy variables; the category that is omitted is used as a reference point for the other categories. + The intercept term in the regression indicates the average value of the dependent variable for the omitted category. + The slope coefficient of each dummy variable estimates the difference (compared to the omitted category) that the dummy variable makes to the dependent variable. (The textbook used Ex 2.6 to illustrate these characteristics; see the sketch below.)
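(A sketch of the intercept/slope characteristics with a made-up binary IV: with one dummy, OLS gives b0 = mean of the omitted category and b1 = the difference between category means.)

```python
# X is the dummy: 0 = omitted category, 1 = has the property. Data is hypothetical.
x = [0, 0, 0, 1, 1, 1]
y = [2.0, 3.0, 4.0, 7.0, 8.0, 9.0]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b0 = my - b1 * mx

# b0 = 3.0 = average Y for the omitted category (X = 0);
# b1 = 5.0 = difference vs. the omitted category (8.0 - 3.0).
print(b0, b1)
```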

Tips: "Simple" Linear Regression

- it means one independent variable

3/4 assumptions of simple linear regression model:

- No serial correlation (no autocorrelation), meaning the observations (pairs of Xs and Ys) are "independent" of each other. This implies that the regression residuals are uncorrelated across observations. In other words: a +/- error in one observation shouldn't influence the next error; the error terms are uncorrelated.

Definition To perform/calculate the F-test, we need the following information:

1) The total number of observations (n). 2) The total number of parameters to be estimated (for the purpose of the exam, we use 1 independent variable, which means 2 parameters). 3) RSS (regression sum of squares): the amount of variation in the dependent variable that is explained by the independent variable. 4) SSE (sum of squared errors/residuals): the amount of variation in the dependent variable that is unexplained by the independent variable.

Calculation/Definition Coefficient of Determination (R^2) Method #2 (one or more Independent Variable) (see ANOVA)

3 subparts in the textbook (we used some of the prior example data): 1) Total variation in the DV, or Y, which we call TSS = Σ(Y - ¯Y)^2 (the numerator used to find the variance of Y), a.k.a. the total sum of squares, or (textbook version) the sum of squared deviations of observed values of Y from the average value of Y. Ex: 0.003094. 2) Unexplained variation in the DV, or Y = Σ(Y - ^Y)^2 (the numerator for the variance of the regression), a.k.a. the sum of squared errors, SSE. Ex: SSE = 0.000259 from Ex 1.1 and 2.1, and SSE/TSS = unexplained % = 0.000259 / 0.003094 ≈ 8.36%. (Important: I got very confused creating this flashcard. Yes, SSE, or 0.000259, is the unexplained variation; 8.36% is that amount as a percentage of 0.003094.) 3) Explained variation in the DV, or Y (this is rcc^2, i.e., R^2) = Σ(^Y - ¯Y)^2, a.k.a. the regression sum of squares, RSS. Ex: RSS/TSS = explained % = 91.64%. How do we find RSS knowing TSS = 0.003094 and RSS/TSS = 91.64%? TSS - SSE is simpler: we back into RSS = 0.003094 - 0.000259 = 0.002835. (See the sketch below.)
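(Backing RSS and the explained/unexplained percentages out of the example's two given numbers:)

```python
tss = 0.003094          # total variation: sum of (Y - Y-bar)^2
sse = 0.000259          # unexplained variation: sum of (Y - Y-hat)^2

rss = tss - sse         # explained variation: sum of (Y-hat - Y-bar)^2
print(f"RSS = {rss:.6f}")                # 0.002835
print(f"explained = {rss / tss:.2%}")    # ~91.63% (91.64% via the rounded rcc^2 route)
print(f"unexplained = {sse / tss:.2%}")  # ~8.37%
```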

Calculation Regression Residuals

= (Y - ^Y) or (basically the error)

Calculation VAR of regression or [SEE^2]

= Σ(Y - ^Y)^2 / (n - 2), i.e., SSE / (n - 2). (Note: SSE, not SSE^2; SSE is already a sum of squares.)

Calculation Testing the significance of the correlation coefficient(using t test) (we're at the area where test might not test these, pay attention to concepts)

A 2-tailed test with n - 2 DOF using the t-distribution (similar steps to lesson 6). 1) Test statistic: t = [corr coeff * (n - 2)^1/2] / (1 - r^2)^1/2, where 1 - r^2 is the unexplained fraction. 2) Compare the test statistic to the critical t-value. + The decision rule is the same: we reject H0 if t-stat > +t-crit or if t-stat < -t-crit. (Sketch below.)
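(The test as a small Python function; r = 0.9573 is from the earlier example, but n = 6 is an assumed sample size for illustration.)

```python
def corr_t_stat(r: float, n: int) -> float:
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), compared to t-crit with n - 2 df."""
    return r * (n - 2) ** 0.5 / (1 - r ** 2) ** 0.5

t = corr_t_stat(0.9573, 6)
print(t)  # reject H0 if |t| > two-tailed t-crit with n - 2 = 4 df
```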

L1QMTB-AC1014-2107 *** The linear regression model least likely assumes: A) uncorrelated error terms. B) a random independent variable. C) a linear relationship between the two variables.
L1QM-TB1005-2107 *** Which of the following statements is least likely to be an assumption of linear least squares regression analysis? A) The independent variable is random. B) The expected value of the error term in the model is zero. C) The variance of the error term is constant for all observations.

B; A. (In both cases the key is that the model assumes the independent variable is NOT random.)

Calculation Standard Error of Forecast, Sf (I doubt we need to memorize the formula)

Basically, it's the standard deviation of the forecast error: Sf = (Sf^2)^1/2, where Sf^2 is the estimated variance of the prediction error.
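(For reference, the standard curriculum form of the formula, since the card's picture isn't reproduced here: Sf^2 = SEE^2 * [1 + 1/n + (X - ¯X)^2 / ((n - 1) * sx^2)], where sx^2 is the sample variance of the IV, and Sf = (Sf^2)^1/2. The bracketed terms beyond the leading 1 are the parameter-uncertainty pieces, which is why Sf > SEE.)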

Test Banks: (Questions that are worthy/good practice) (I started from this chapter; the rest should slowly accumulate)

M7L2 L1QM-TBB1206-2107* L1QMTB-AC1020-2107** L1QMTB-AC1018-2107** L1QM-ITEMSET-PQ10925-2107* L1QMTB-AC1021-2107 L1QMTB-AC1009-2107

Example 2.6 Regressions with Indicator Variables

Pay attention to the analysis. It shows what the characteristics are for.

Calculation Coefficient of Determination (This is new)

R^2 = (corre coeff)^2, or rcc^2. - R^2 tells us the ΔDV explained by the ΔIV. Important example: we have TSS (which the instructor also calls SSEy) = 0.003094. How much of that is explained and unexplained? Let's say we have a corre coeff of 0.9573. We turn it into R^2: (0.9573)^2 = 91.64%. It means 91.64% is explained by the regression. 1 - 91.64% = 8.36%, or 8.36% is unexplained: the residual, or the errors (which is explained later). Takeaway from the example: 91.64% is explained and 8.36% is unexplained.

Example 2.3 ANOVA (after doing some questions, ANOVA isn't that bad)

Regression = RSS
Error = SSE
MSR = RSS/k, or RSS/1 with one IV
MSE = SSE/(n - 2) = SEE^2 (SEE is the square root of MSE, not MSE itself)
F-stat = MSR/MSE

Calculation standard error of estimate (SEE) Remember: Σ (Y - ^Y)^2/ (n-2) = Variance of Reg = SEE^2

SEE = [SEE^2, or the variance of the regression]^1/2 = [Σ(Y - ^Y)^2 / (n - 2)]^1/2. Note: variance of the regression = Σ(Y - ^Y)^2 / (n - 2) = SSE / (n - 2). Obvious: 1) the smaller the standard deviation of the residual term (the smaller the SEE), the more accurate the predictions; 2) we build a CI around the predicted Y, ^Y. How? Easy: remember CI = point estimate +/- SE * t/z value. Here we use SEE * t-value as the +/- dispersion around ^Y.

Calculation Sum of Squared Error = SSE. Remember: error = (Actual - Expected or Mean); regular variance = Σ(Actual - Expected or Mean)^2 / n. (Edit: I was actually confused about this before, so I have to clarify.)

SSE = Σ(Y - ^Y)^2 = sum of squared errors / sum of squared residuals = the numerator of the variance of the regression (SEE^2) = the unexplained variation; SSE/TSS = 1 - coefficient of determination. vs. what the instructor has been saying: TSS, "SSE of Y," or SSEy = Σ(Y - ¯Y)^2 = the numerator of the variance of Y = the total sum of squares for the regression. (Important) We are going to want to analyze in great detail the sum of the squares of the errors for Y. Later on, this is going to be called the total sum of squares for the regression: the total variation in the dependent variable. In other words: SSE of the DV (SSEy) = total sum of squares for the regression (TSSreg). Based on the picture, it's 0.003094. How much of that is explained and unexplained by our regression?

Facts: (the test will most likely be conceptual in this chapter) To determine the importance of the independent variable in the regression in explaining the variation in the dependent variable, we need to perform hypothesis tests or create confidence intervals to evaluate the statistical significance of the slope coefficient. Looking at the magnitude of the slope coefficient does not tell us anything about the importance of the independent variable.

So we're going to want to test: is our slope, our regression coefficient, could it be equal to zero? If it is, that's bad. That would mean no relationship. Invalid. So we're going to want to perform that test: is it possible the slope / correlation coefficient could be 0? So we're going to want to perform hypothesis tests. The other thing we're going to want to do is create a confidence interval. So look, using my regression equation, I'll predict a value for Y given a value for X. But we know the actual value for Y could be a little bit above or a little bit below. So we're going to want to use the standard error of the estimate, SEE, and something called our prediction interval, which we'll get into later, to put a confidence interval around that predicted value for Y. If you're trying to put a confidence interval around a predicted value for Y using actual historical information for X, then you'd use the standard error of the estimate. If you want to predict the value for Y using a predicted value for X, then we're going to have to modify the standard error of the estimate.

Facts: (important) Sf > SEE; the prediction error is greater than the standard error of the estimate. The prediction error also takes into account uncertainty in the regression parameters. + Look at the picture for the formula of Sf. No need to memorize it, but understand what's in the formula; it explains a lot about Sf in M7L3.


L1QM-TBB1206-2107* (I dropped it here because it was a bit special, not a big deal. Still review the test bank.) An analyst is investigating whether the systematic risk of a security is different from that of the market. She performs a linear regression of daily excess returns of the security versus the daily excess returns of the market over a period of 250 trading days and calculates the slope coefficient to be 1.3538 with a standard error of 0.1345. Given a critical t-statistic associated with a significance level of 5% of approximately 2, which of the following statements is most accurate? A) The analyst should reject the null hypothesis that the security has the same systematic risk as the market. B) The analyst should fail to reject the null hypothesis that the security has the same systematic risk as the market. C) The analyst should accept the alternative hypothesis that the security has the same systematic risk as the market.

The null hypothesis for the test is that the true beta of the security is 1; the alternative hypothesis is that the true beta is not 1. The test statistic takes the value (1.3538 - 1) / 0.1345 = 2.63, which is greater than the critical statistic of 2; hence, the analyst should reject the null hypothesis (answer A). My interpretation: H0: beta = 1; Ha: beta ≠ 1. Recap: test statistic = (sample statistic - hypothesized value) / SE. It takes the slope coefficient, 1.3538, as the sample statistic; the hypothesized value is 1. That gives a test statistic of 2.63. The 5%-alpha t-value is about 2, so we reject the null.
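(A one-line check of the arithmetic:)

```python
t_stat = (1.3538 - 1) / 0.1345
print(round(t_stat, 2))  # 2.63 > 2 critical, so reject the null (answer A)
```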

Calculation standard error of estimate (SEE) Method #2 (don't let these small flashcards confuse you)

SEE^2 = unexplained variation in Y (SSE) / (n - 2). (Careful: SSE/TSS is the unexplained fraction, 1 - R^2; that ratio is not SEE^2.)

Definition/Calculation Linear Regression Equation (important) (really not familiar)

Y = b0 + b1*X, where X = independent variable, Y = dependent variable, b1 = slope, b0 = intercept term. Slope = ΔDV/ΔIV, or ΔY/ΔX. + If the slope is positive, there's a positive relationship. + If we know there's a relationship between X and Y, then b1 (the slope) won't be 0 and rcc (the corr coeff) won't be 0.

Calculation To find F-Stat (we're at the area where test might not test these, pay attention to concepts)

Step 1) Set up the hypotheses: H0: all slope coefficients = 0; Ha: at least 1 slope coefficient isn't 0. Step 2) F-test statistic = MSR/MSE (check the ANOVA table); with one IV this equals RSS/SEE^2, since MSR = RSS/1 and MSE = SEE^2. (Don't get confused by this; if it confuses you, just move on, it's OK.) Step 3) Look up the F critical value. (Remember the F-test is a one-tailed test, bounded by 0 on the left.) (Sketch below.)
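(A sketch of steps 2 and 3 using the earlier example's sums of squares; n = 6 is an assumed sample size, not from the reading.)

```python
rss, sse = 0.002835, 0.000259
n, k = 6, 1

msr = rss / k          # with one IV, MSR = RSS
mse = sse / (n - 2)    # MSE = SSE / (n - 2) = SEE^2
f_stat = msr / mse
print(round(f_stat, 1))  # compare to the one-tailed F critical value with (k, n - 2) df
```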

Facts: (Important) Don't get confused with this: the variance of Y isn't the variance of the regression. Variance of Y: Σ(Y - ¯Y)^2 / (n - 1). Variance of the regression: Σ(Y - ^Y)^2 / (n - 2). (Check pictures.)

Σ(Y - ¯Y)^2 = the sum of the squares of the errors for Y (the total variation). Σ(Y - ^Y)^2 = the sum of the squares of the regression errors (the unexplained variation).

