Econometrics Unit 1 Study Guide


1) What is the rule-of-thumb formula for approximating how large a correlation (r) must be to be considered statistically significant in a sample of size N? 2) In a sample of size N = 50, approximately how large would the correlation need to be to suggest a meaningful relationship between 2 variables?

1) |r| >= 2/(N^0.5). Essentially: if |r| >= 2/(N^0.5) in a sample of size N, the correlation is likely to signify a meaningful relationship --> This is only a rule of thumb (roughly the 5% significance cutoff); the exact test of statistical significance comes from the p-value 2) Recall that a meaningful relationship is when |r| >= 2/(N^0.5) i. Plug in: |r| >= 2/(50^0.5) ii. Solve: *|r| would have to be >= 0.283 to signify a meaningful relationship*
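A quick way to check the rule of thumb for any sample size is sketched below. It only implements the 2/sqrt(N) cutoff from the notes (the function names are made up for illustration); it is not a substitute for the p-value.

```python
import math


def meaningful_r_threshold(n: int) -> float:
    """Rule-of-thumb cutoff from the notes: |r| >= 2 / sqrt(N)."""
    return 2 / math.sqrt(n)


def passes_rule_of_thumb(r: float, n: int) -> bool:
    """True when the correlation clears the 2/sqrt(N) cutoff (sign ignored)."""
    return abs(r) >= meaningful_r_threshold(n)


print(round(meaningful_r_threshold(50), 3))  # 0.283
print(passes_rule_of_thumb(0.30, 50))        # True
print(passes_rule_of_thumb(-0.25, 50))       # False
```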

1. Which eugenicist is credited with originating the study of associations between variables (correlation and regression)? 2. What is the purpose of regression models? 3. What is y determined by? 4. What is the "true model/relationship"? 5. What equation describes the trendline? 6. i. What is the covariance? ii. What can you interpret from it? 7. What is the correlation coefficient? a. What can you interpret from it? b. How do you know if there is a real relationship between 2 variables? 8. What is the p-value? What is its range? What is considered a low/high value? 9. What does statistically significant mean? What p-value/range labels it as such? 10. What are the rules for means/variances/covariances?

1. Francis Galton (father of racist eugenics) 2. Regression models = try to link y (outcome variable) with x (explanatory variables) 3. y = determined by x + other factors; aka a function of x 4. y = constant term (beta0, aka beta-naught) + slope term (beta1) * x + unobserved factors (external variables AKA context) --> Called the true model/relationship 5. y hat = beta0 hat + beta1 hat*x --> hat = ^ on top of a symbol = predicted/estimated value 6. i. Covariance = measures how 2 variables move together (direction; its size depends on the units) --> [Sum of all (x - mean of x)(y - mean of y)]/(N-1) ii. Can interpret if covariance is: -Positive- variables trend together/both tend to be big together -Negative- variables have an inverse relationship/one gets big while the other gets small 7. Correlation coefficient AKA r = cov(x,y)/(SDx*SDy) --> do we have a meaningful connection between 2 variables? a. What we can interpret from it: i. Sign --> (+/-) sign of r = (+/-) sign of covariance ii. Magnitude --> r is always between -1 and 1; 0 = no linear correlation; 1 = perfect positive linear relationship (variables are the same except maybe in scale) --> ie. Fahrenheit and Celsius; -1 = perfect negative linear relationship --> ie. 2 categories coded as dummy variables (female vs. male) move in exactly opposite directions --> Two unrelated random lists of numbers have a correlation close to 0 (positive or negative, but small) b. There is convincing evidence of a REAL relationship between 2 variables if --> *|r| > 2/(N^(1/2))*
8. p-value = the probability of seeing a relationship at least this strong by random chance if there were really no relationship; tells us whether the relationship is statistically significant --> p-value ranges from 0 to 1 (p < 0.05 = statistically significant relationship) *p-value tiers:* - p < 0.10 = suggestive of a relationship - p < 0.05 = convincing relationship (most often used to determine a statistically significant relationship) - p < 0.01 = very convincing evidence of a nonrandom relationship --> Low p-value (not coincidence; real) = relationship unlikely to come from random chance --> High p-value (coincidence) = fairly consistent with random chance; relationship could easily come from coincidence 9. Statistically significant = we are statistically confident that some relationship exists; it only says there is statistical evidence for some kind of relationship (not how big it is or whether it is causal) --> p-value < 0.05 = statistically significant relationship 10. *Rules:* 1) Linear transformations: suppose z = a + bx --> x, z = variables; a, b = constants --> ie- F = 32 + (9/5)C (a perfect linear function, so corr(C, F) = 1) i) z bar AKA mean of z = a + b(x bar) --> ie- if avg C temp = 20 --> avg F = 32 + (9/5)(20) = 68 ii) Var(z) = b^2*Var(x) --> ie- if Var(C temp) = 100 --> Var(F) = (9/5)^2*Var(C) = 324 2) Linear combinations: suppose we have 2 variables, x and y, and a new variable w = x + y --> ie- total compensation (w) = salary (x) + bonus (y) i) Mean of w = mean of x + mean of y ii) Var(w) = Var(x) + Var(y) + 2*Cov(x,y)
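The covariance and correlation formulas above can be written out directly, as in the sketch below. It assumes made-up Celsius temperatures and a made-up dummy variable; it shows that a perfect linear transformation (C to F) gives r = 1, that opposite dummies give r = -1, and that the mean and variance rules for z = a + bx hold.

```python
from statistics import mean, stdev


def sample_cov(x, y):
    """Sample covariance: sum of (x - xbar)(y - ybar), divided by N - 1."""
    xbar, ybar = mean(x), mean(y)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (len(x) - 1)


def sample_corr(x, y):
    """r = cov(x, y) / (SDx * SDy)."""
    return sample_cov(x, y) / (stdev(x) * stdev(y))


celsius = [10, 15, 20, 25, 30]                   # made-up temperatures
fahrenheit = [32 + 9 / 5 * c for c in celsius]   # F = 32 + (9/5)C

print(sample_corr(celsius, fahrenheit))          # 1 (up to floating-point rounding)
print(mean(fahrenheit), 32 + 9 / 5 * mean(celsius))
print(sample_cov(fahrenheit, fahrenheit), (9 / 5) ** 2 * sample_cov(celsius, celsius))

female = [1, 0, 0, 1, 1, 0]                      # made-up dummy variable
male = [1 - f for f in female]
print(sample_corr(female, male))                 # -1 (up to floating-point rounding)
```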

STATA: 1. What are the qualifiers/if conditions in Stata? 2. How do you generate a dummy variable for people taller than 6 feet? 3. How do you generate a graph? 4. How do you find correlation? 5. What is a do-file?

1. Qualifiers/if conditions: summarize a variable only for observations that meet a condition: summ variable if *other variable* == 1 --> summ height if female == 1 2. To create a dummy variable: generate *new variable name* = (condition that defines it) --> generate oversix = (height>6) 3. To graph variables: graph twoway (type of graph y-variable x-variable) --> ie- graph twoway (scatter y x) 4. To find the correlation between 2 variables: corr x-variable y-variable 5. To find the p-value: pwcorr variables, sig 6. To find beta1 (or b1): corr x y gives r and summ x y gives the SDs; then b1 = r * (SD of y/SD of x) 7. Do-file = a file of Stata commands (File --> New do-file) that lets you save a list of commands and run them together 8. To open a dataset: use "name of data set"
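To see why item 6 works, the sketch below computes b1 both as r * (SD of y/SD of x) and as cov(x,y)/var(x) and shows they match. The height/weight numbers are hypothetical, and the list comprehension only mimics what `generate oversix = (height>6)` does in Stata.

```python
from statistics import mean, stdev


def sample_cov(x, y):
    """Sample covariance (the usual N - 1 formula)."""
    xbar, ybar = mean(x), mean(y)
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (len(x) - 1)


def slope_from_corr(x, y):
    """b1 = r * (SD of y / SD of x)."""
    r = sample_cov(x, y) / (stdev(x) * stdev(y))
    return r * stdev(y) / stdev(x)


def slope_from_cov(x, y):
    """b1 = cov(x, y) / var(x); algebraically the same number."""
    return sample_cov(x, y) / stdev(x) ** 2


# Hypothetical heights (feet) and weights (pounds)
height = [5.2, 5.6, 5.9, 6.1, 6.3]
weight = [120, 140, 155, 170, 185]
print(slope_from_corr(height, weight), slope_from_cov(height, weight))

# Mimics Stata's: generate oversix = (height>6)
oversix = [int(h > 6) for h in height]
print(oversix)  # [0, 0, 0, 1, 1]
```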

Exam 1 2016: 6) When a person saves, assets grow with an annual percentage yield of 7%, compounding continuously. If initial assets are A0, what will assets be after 1.5 years? (If you cannot solve, set up your final calculation.) 7) Explain three reasons why we might want to identify outliers in a dataset. 8) Suppose that a consumer's demand can be modeled as QD = x(W/P)^y, where QD = quantity demanded, W = wealth, P = price, and x and y = parameters. Write ln(QD) in terms of the other variables, and find the price elasticity of demand. 9) What term is used to describe statistical confidence that there exists a relationship or effect between two variables? 10) This table has values of the Consumer Price Index from 2010 through 2015. Using the CPI as a price deflator, find out how much a widget selling for $712 in 2010 would cost in 2015 dollars. Year: 2010; 2011; 2012; 2013; 2014; 2015 / CPI: 217; 220; 226; 230; 234; 237

6) Discrete compounding: A = P(1+R/n)^(nt) --> A = resulting assets --> P = principal amount --> R = annual rate --> n = number of compounding periods per year --> t = time in years. Continuous compounding is the limit as n --> infinity: A = P*e^(Rt) Given: R = 0.07, P = A0, t = 1.5 i. Plug in: A = A0*e^(0.07*1.5) ii. Simplify: A = A0*e^(0.105) iii. Solved: *A = A0(1.1107)* --> Note: if the 7% is read strictly as the annual percentage yield (the effective annual growth, which already includes compounding), then A = A0(1.07)^1.5 = A0(1.1068) 7) Reasons for identifying outliers: i. Understand how the mean is affected by outliers ii. Identify sampling or data-entry errors/bias that might've produced outliers iii. Attribute skew in the data to outliers iv. Outliers are interesting in their own right! 8) Qd = x(W/P)^y; Qd = quantity demanded, W = wealth, x, y = parameters; ln(Qd) = ? PED = ? --> Recall that in logs, PED = change in ln(Qd)/change in ln(P) i. Take ln of both sides: ln(Qd) = ln[x(W/P)^y] = ln(x) + (ln(W^y) - ln(P^y)) ii. Simplify: *ln(Qd) = ln(x) + y*ln(W) - y*ln(P)* PED = change in ln(Qd)/change in ln(P) = coefficient on ln(P) --> *PED = -y* --> *In a log-log demand equation, the coefficient on ln(P) is always the PED!* 9) Statistical significance (measured with the p-value) 10) Price in 2015 dollars = price in 2010 * (CPI 2015/CPI 2010) (i.e., scale by the % change in the price level) i. Plug in: (237/217)*712 ii. Solved: *$777.62*
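The arithmetic for questions 6, 8, and 10 can be checked with the sketch below. Both readings of the 7% rate are computed; the demand parameters (wealth = 100, x = 2, y = 0.8) are made up purely to confirm numerically that PED = -y.

```python
import math

A0 = 1.0  # express the answer as a multiple of initial assets

continuous = A0 * math.exp(0.07 * 1.5)  # 7% as a continuously compounded rate
apy = A0 * 1.07 ** 1.5                  # 7% as an effective annual yield
print(round(continuous, 4), round(apy, 4))  # 1.1107 1.1068


def log_demand(price, wealth=100.0, x=2.0, y=0.8):
    """ln(Qd) for Qd = x * (W/P)^y; wealth, x, y are made-up values."""
    return math.log(x * (wealth / price) ** y)


# PED = change in ln(Qd) / change in ln(P) for a small price change
ped = (log_demand(10.01) - log_demand(10.0)) / (math.log(10.01) - math.log(10.0))
print(round(ped, 6))  # -0.8, i.e. -y


def in_later_dollars(price, cpi_then, cpi_now):
    """Convert a price into later-year dollars using the CPI ratio."""
    return price * cpi_now / cpi_then


print(round(in_later_dollars(712, 217, 237), 2))  # 777.62
```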

What is the Rule of 72? How is it calculated? ie- I save money with interest rate of 6% per year. How many years until it doubles?

Quick way to approximate how long it takes something (typically money) to double: *72/interest rate (in percent)* ie- I save money with interest rate of 6% per year. How many years until it doubles? Answer: 72/6 = 12 years to double (the exact answer with annual compounding is ln(2)/ln(1.06) =~ 11.9 years)
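The sketch below compares the Rule of 72 with the exact doubling time, assuming annual compounding (solve (1 + r)^t = 2 for t). The rates 2, 6, and 10 are just example inputs.

```python
import math


def rule_of_72(rate_percent):
    """Approximate years to double at a given % interest rate."""
    return 72 / rate_percent


def exact_doubling_years(rate_percent):
    """Solve (1 + r)^t = 2 for t, assuming annual compounding."""
    return math.log(2) / math.log(1 + rate_percent / 100)


for r in (2, 6, 10):
    print(r, rule_of_72(r), round(exact_doubling_years(r), 2))
```

The approximation is closest for rates in the middle of that range; for very low or very high rates the gap grows.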

What happens to the margin of error in an opinion survey when the sample size doubles, everything else the same?

Recall that margin of error = z*[P(1-P)/n]^0.5. Since n is under the square root, doubling n divides the margin of error by 2^0.5 =~ 1.414 (it shrinks by about 29%).
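A numeric check of the square-root effect is sketched below; P = 0.5, z = 1.96, and the sample sizes 1000/2000 are assumed example values.

```python
import math


def margin_of_error(p, n, z=1.96):
    """z * sqrt(P(1 - P) / n)."""
    return z * math.sqrt(p * (1 - p) / n)


moe_1000 = margin_of_error(0.5, 1000)
moe_2000 = margin_of_error(0.5, 2000)
print(round(moe_1000, 4), round(moe_2000, 4))
print(round(moe_1000 / moe_2000, 4))  # 1.4142, i.e. sqrt(2)
```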

Exam 1 2016: 11) Examine the Stata output (https://imgur.com/gallery/OxgLP3z). Based on the p-value, is there a statistically significant relationship between x and y? (Briefly explain your reasoning.) 12) What three factors can contribute to the observed correlation between two variables, x and y, aside from the causal effect of x?(Definitions are not necessary.) 13) Suppose that we estimate the regression Y = 1.89 + 2.24Xn where Xn = inverse of X = X^(-1). Find the level of X that would make Y equal to zero. (If you cannot solve, set up your final calculation.) 14) a) Using the following STATA output (https://imgur.com/gallery/j1Slo5Q), calculate B0 and B1. b) What is the interpretation of the mean of a dummy variable? 15) In a dataset on US workers, a variable called education measures the # of years of school that a worker has completed. What does the STATA command "gen x = (education >= 12)" do?

11) p-value = the value at the bottom of column x (a p-value can't be negative and won't equal 1, so it's the 0.1085 entry) --> p-value = 0.1085 > 0.05 --> fail to reject the null hypothesis; the relationship between x and y is NOT statistically significant at the 5% level 12) i. Reverse causality ii. Unobserved heterogeneity (external variables) iii. Selection bias (sampling bias) 13) Y = 1.89 + 2.24Xn, Xn = inverse of X = 1/X --> Level of X that makes Y = 0? i. Plug Xn = 1/X into the equation: Y = 1.89 + 2.24(1/X) ii. Set Y = 0: 0 = 1.89 + 2.24/X iii. Simplify: -1.89 = 2.24/X --> -1.89X = 2.24 --> X = -2.24/1.89 iv. Solved: *When X = -1.185, Y = 0* 14) a) Recall that B1 = cov(x,y)/var(x) i. Find cov(x,y): corr y x, cov (the covariance option in Stata) gives the covariance between x and y = 2.951 ii. Find var(x): var(x) = SD^2 = (1.0068)^2 = 1.0136 iii. Calculate B1: 2.951/1.0136 = *B1 = 2.91* --> B0 = mean of y - B1(mean of x) i. Mean of y = -6.05 ii. Mean of x = -0.111 iii. Calculate B0: -6.05 - 2.91(-0.111) = -6.05 + 0.323 = *B0 = -5.73* b) Mean of a dummy variable = (number of 1's)/N --> *Thus, the mean of a dummy variable = the proportion of the sample with a value of 1!* 15) gen x = (education >= 12) --> *Creates a dummy variable x that = 1 for all workers with education >= 12 years, and 0 otherwise*
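The "regression from summary statistics" recipe used in 14a (and again in Quiz 2, Q1) can be packaged as a small function, sketched below. The inputs are the numbers quoted in the answers above; the function name is made up.

```python
def ols_from_summary(mean_x, mean_y, var_x, cov_xy):
    """B1 = cov(x, y) / var(x); B0 = mean(y) - B1 * mean(x)."""
    b1 = cov_xy / var_x
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Exam 1 2016, Q14 (numbers read off the Stata output)
b0, b1 = ols_from_summary(mean_x=-0.111, mean_y=-6.05, var_x=1.0068 ** 2, cov_xy=2.951)
print(round(b1, 2), round(b0, 2))  # 2.91 -5.73

# Quiz 2, Q1
print(ols_from_summary(mean_x=5, mean_y=-4, var_x=2 ** 2, cov_xy=-0.5))  # (-3.375, -0.125)

# Exam Q13: Y = 1.89 + 2.24 / X equals zero when X = -2.24 / 1.89
print(round(-2.24 / 1.89, 3))  # -1.185
```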

Problem Set 1: 4. During five years, the inflation rates are 3.8%, 9.2%, 2.1%, 7.4%, 5.3%. Calculate the average annual inflation rate. (Hint: A question like this one always requires taking a geometric mean of the (1+R) values.) 5. In a sample of N = 100 observations, Di is a dummy variable that takes a value of 1 for 64% of the sample, and 0 for the other 36%. Calculate the mean and standard deviation of this variable. 6. The following shows the frequency distribution of variable x in a sample with N = 100 observations: Value of x: 1; 2; 3; 4 / Frequency: 37; 29; 14; 20. Calculate the mean, median, and SD of x.

4) Geometric mean for growth rates! = {[(1+R1)(1+R2)...(1+Rn)]^(1/n)} - 1 --> (1+0.038)(1+0.092)(1+0.021)(1+0.074)(1+0.053) = 1.3088 --> [1.3088^(1/5)] - 1 = 0.05529 = *5.53%* 5) For dummy variables, mean = sum of all 1's and 0's/N --> (64*1 + 36*0)/100 = *mean = 0.64* - SD = {[sum of all (x-mean)^2]/(N-1)}^(1/2) --> [{[((1-0.64)^2)*64]+[((0-0.64)^2)*36]}/99]^(1/2) = *SD = 0.482* 6) i. Mean = [(1*37)+(2*29)+(3*14)+(4*20)]/100 = *2.17* ii. Median = average of the 50th and 51st observations in sorted order --> the first 37 observations are 1 and observations 38-66 are 2, so the 50th and 51st are both 2 --> *median = 2* iii. SD = {[sum of all (x-mean)^2]/(N-1)}^(1/2) --> [{[((1-2.17)^2)*37]+[((2-2.17)^2)*29]+[((3-2.17)^2)*14]+[((4-2.17)^2)*20]}/99]^(1/2) = (128.11/99)^(1/2) = *SD = 1.138*
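The three Problem Set 1 answers can be reproduced with the standard library, as sketched below: the geometric mean of (1 + R), and the mean/median/SD of data rebuilt from the stated counts (statistics.stdev divides by N - 1, matching the formula above).

```python
import math
from statistics import mean, median, stdev


def avg_growth_rate(rates):
    """Geometric mean of (1 + R), minus 1."""
    product = math.prod(1 + r for r in rates)
    return product ** (1 / len(rates)) - 1


print(round(avg_growth_rate([0.038, 0.092, 0.021, 0.074, 0.053]), 4))  # 0.0553

dummy = [1] * 64 + [0] * 36
print(mean(dummy), round(stdev(dummy), 3))  # 0.64 0.482

freq = {1: 37, 2: 29, 3: 14, 4: 20}  # value of x: frequency
x = [value for value, count in freq.items() for _ in range(count)]
print(mean(x), median(x), round(stdev(x), 3))  # 2.17 2.0 1.138
```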

1. Does correlation imply causality? 2. Does causality imply correlation? 3. What does the causal effect of x on y imply?

1. Correlation does NOT imply causality 2. Causality DOES imply correlation 3. Causal effect of x on y = an exogenous change in x (one imposed from outside, not driven by y or other factors) would result in a change in y

Suppose that we estimate the regression line y = 3 + 4(x) a) What is the change Δy when Δx = 2 ? b) Suppose we have two observations in the dataset, and the difference in the x values is exactly Δx = 2, while the difference in their y values is Δy = 9. --> Why is this finding different from the result in part (a)?

a) Δy = 4*Δx = 4*2 = 8 --> ie- x1 = 1, x2 = 3 --> y1 = 3+4(1) = 7; y2 = 3+4(3) = 15 --> Change in y = 15-7 = 8 b) The regression line only gives the predicted (average) relationship between x and y. Each actual observation also includes its own unobserved factors (the error term u), so real data points scatter above and below the line. The observed Δy = 9 differs from the predicted Δy = 8 because the two observations have different residuals (here, the second observation's residual is 1 larger than the first's).
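A tiny illustration of part (b) is sketched below, assuming two hypothetical observations whose errors are u = 0.5 and u = 1.5. The line predicts Δy = 8; the extra 1 in the observed Δy comes entirely from the difference in residuals.

```python
def predicted(x):
    """The estimated regression line y-hat = 3 + 4x."""
    return 3 + 4 * x


# Hypothetical observations: each y = 3 + 4x + u, with its own unobserved u.
obs = [(1.0, predicted(1.0) + 0.5), (3.0, predicted(3.0) + 1.5)]
(x1, y1), (x2, y2) = obs
residuals = [y - predicted(x) for x, y in obs]

print(predicted(x2) - predicted(x1))  # 8.0, the change the line predicts
print(y2 - y1)                        # 9.0, the change actually observed
print(residuals[1] - residuals[0])    # 1.0, the gap comes from the residuals
```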

What are the factors contributing to observed correlation in observational data (aside from the causal effect)? These are the sources of endogeneity. What is the causal effect?

*Factors contributing to observed correlation in observational data (aside from the causal effect):* 1) Reverse causality --> y causes x, at least in part *ie:* Let's say that... --> Health = B0 + B1(medication) + ui --> Mental health = B0 + B1(anti-anxiety meds) + ui i. Observational data: interviewing people about how much medication they take + what their health levels are ii. Experimental data: someone assigns what/how much medication (real/placebo) you take + researchers measure differences in health outcomes --> In observational data, it will generally appear that medication is associated with worse health --> *This is because health also affects medication: people with worse health take more medication! --> health and medication feed into each other* --> Makes it difficult for the researcher to identify the causal effect --> To identify what causes what, have to use critical/dialectical analysis + social context! Wow! 2) Unobserved heterogeneity (external variables): another factor, associated with x, which causes y *ie:* House sales prices in CA as a function of characteristics: i. Size ii. # rooms iii. Air conditioning --> Found that having AC = reduces sales price (?) --> Unobserved heterogeneity = climate/location of the house --> Whether or not a house has AC = affected by the climate of its location --> In principle, measuring and controlling for the omitted factor (including it in the regression) solves the problem! --> Take social context, external variables into account! 3) Selection bias: the process of selecting the sample from the population creates a correlation that did not exist originally --> Biased selection --> ie- perhaps there is no real association between healthy and tasty among all foods, but if we only observe foods that get eaten (foods that are healthy OR tasty enough), healthy and tasty will appear negatively correlated in the sample --> What is tasty varies across peoples/cultures and can be conditioned by society --- Causal effect = the effect of x itself on y (what would happen to y if x changed exogenously)! :-D
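The healthy/tasty selection story can be simulated, as sketched below. Everything here is an assumption for illustration: healthiness and tastiness are drawn as independent standard normals, and a food is "observed" only if either score exceeds 1. The full population shows roughly zero correlation; the selected sample shows a clearly negative one.

```python
import random
from statistics import mean, stdev


def corr(x, y):
    """Sample correlation coefficient."""
    xbar, ybar = mean(x), mean(y)
    cov = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))


random.seed(0)
# Hypothetical population: healthiness and tastiness drawn independently.
foods = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(20_000)]
healthy = [h for h, _ in foods]
tasty = [t for _, t in foods]
print(round(corr(healthy, tasty), 3))  # close to 0

# Selection: we only observe foods that are healthy OR tasty enough to be eaten.
eaten = [(h, t) for h, t in foods if h > 1 or t > 1]
print(round(corr([h for h, _ in eaten], [t for _, t in eaten]), 3))  # clearly negative
```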

Quiz 2 1) Y = B0+B1*X Mean of x = 5; SD of x = 2 Mean of y = -4; SD of y = 3 Cov(x,y) = -0.5 Calculate B0 and B1. 2) a) In Stata, what do we use to set a variable equal to some value (often with commands "generate" or "replace")? b) In Stata, what do we use to describe the condition when 2 values happen to already be the same (often an "if" statement or when defining a dummy variable)? 3) Work this question without a calculator: ln(2) = 0.69; ln(3) = 1.10...ln(6) = ?; ln(32) = ? 4) A "hat" on a variable means: a. optimal b. going to a party c. standardized d. derivative e. estimate 5) In a regression, what is a residual? a. difference between observed and predicted values of y b. minimum value of objective function c. unobserved characteristics that contribute to outcome d. error in the estimates of the Bs 6) Looking at the following image(https://imgur.com/gallery/kNXqVUk), find the: i. B0 ii. B1 iii. Fraction of differences in y that can be explained by x iv. p-value to test whether there is a statistically significant relationship between x and y 7) Correlation implies causality (T/F) 8) Causality implies correlation (T/F) 9) What is unobserved heterogeneity? a. chalk b. linear dependence with an omitted variable c. some other characteristic, associated with x, causing an effect on y d. reverse causality e. variable with no direct effect on y f. difference in variability of the unobservables

1) B1 = cov(x,y)/var(x), B0 = mean of y - B1(mean of x) --> Var = (SD)^2 B1 = -0.5/(2)^2 = -0.125 B0 = -4 - 5(-0.125) = -3.375 2) a) = b) == 3) ln(6) = ln(3*2) = ln(2) + ln(3) = 1.79 ln(32) = ln(2^5) = 5*ln(2) = 3.45 4) e. estimate 5) a. difference between observed and predicted values of y 6) i. B0 = -6.56 (under Coef., in the _cons row; _cons = constant) ii. B1 = 2.73 (under Coef., in the x row) iii. Fraction of differences in y that can be explained by x = R^2 (coefficient of determination) = *0.7792, or 77.92% of the differences in y can be explained by x* iv. p-value = 0.000 (P>|t| column, x row; Stata displays 0.000 when p < 0.0005) --> statistically significant! 7) F 8) T 9) c. some other characteristic, associated with x, causing an effect on y --> i.e., external (omitted) variables

Why do we include multiple explanatory variables in regressions?

1. To control for other differences --> Control = including other variables as explanatory variables 2. There are multiple inputs (multiple explanatory variables) --> ie- Q = F(K,L); K = capital; L = labor; Q = output --> K and L are BOTH explanatory variables! 3. Precision: the more factors we control for, the more confident we can be that we estimated the effect of interest (ie- the effect of displacement on MPG) correctly

Problem Set 2: Part 2) Suppose a dataset had N=4 individuals, with values of Xi and Yi: Xi: 10; 12; 14; 16 Yi: 18; 15; 10; 1 7) Estimate the model ln(Yi) = β0 + β1*Xi + ui. 8) Describe the relationship between a change in X and the change in Y. (The answer involves percentages.) --> Suppose we estimate b1 = -0.45. Which statement would accurately describe the relationship between changes in x and y? A. When x increases by 1, y decreases by 45%. B. When x increases by 100%, y decreases by 0.45%. C. When x increases by 100%, y decreases by 45. D. When x increases by 1, y decreases by 0.45. E. When x increases by 1, y decreases by 0.45%. 9) Estimate the model ln(Yi) = b0 + b1*ln(Xi) + ui. 10) Describe the relationship between a change in X and the change in Y. (Again, the answer involves percentages.) --> Suppose that we estimated β1 = 1.2. Which statement would accurately describe the relationship between changes in x and y? A. When x increases by 100%, y increases by 1.2. B. When x increases by 1, y increases by 1.2%. C. When x increases by 100%, y increases by 120%. D. When x increases by 1%, y increases by 12%. E. When x increases by 1, y increases by 1.2. F. When x increases by 1, y increases by 120%. 11) Can the R2 values of the two regressions (one with x as an explanatory variable, the other with ln(x)) be compared, to determine which model fits the data better? A. No, since one uses x as the explanatory variable and the other uses ln(x). B. Yes, since the outcome variable is the same, ln(y), and the sample is the same. C. No, since the models are not linear. D. No, since the sample has fewer than 30 observations. E. Yes, since the sample is the same and the number of observations is the same.

Part 2) 7) Since we're finding betas for ln(y) = b0 + b1*x + ui, need to derive new values for Y! --> new y = ln(y); x stays the same! Original: --> Xi: 10; 12; 14; 16 --> Yi: 18; 15; 10; 1 --> take ln(y) = new y! New: --> Xi: 10; 12; 14; 16 --> ln(Yi): 2.890; 2.708; 2.303; 0 Now apply the b1 and b0 formulas: --> b1 = m = [(N*sum of indv. x*y) - ((sum of x)(sum of y))]/[(N*sum of indv. x^2) - ((sum of x)^2)] --> b0 = b = [(sum of y) - b1(sum of x)]/N i. Sum of x = 52 ii. Sum of y = 7.901 iii. Sum of indv. xy = (10*2.890)+(12*2.708)+(14*2.303)+(16*0) = 93.638 iv. Sum of indv. x^2 = (10^2)+(12^2)+(14^2)+(16^2) = 696 v. b1 = [(4*93.638)-(52*7.901)]/[(4*696)-(52^2)] = -36.3/80 = *b1 or m = -0.4538* vi. b0 = [7.901-(-0.4538*52)]/4 = *b0 or b = 7.875* vii. *ln(y) = 7.875 - 0.4538x* _________________________________________________________________________________________________ 8) Relationship between x and y for ln(Y) = B0 + B1*X --> Remember that: i. Y = B0 + B1*ln(X) + u: a 1% change in X is associated with a change in Y of 0.01*B1 ii. ln(Y) = B0 + B1*X + u: a change in X by one unit (∆X=1) is associated with approximately a (B1*100)% change in Y (exactly 100*(e^B1 - 1)%) --> So if B1 = -0.45 --> if x increases by 1 unit, y changes by about (-0.45*100)% = a 45% decrease in y --> *A. When x increases by 1, y decreases by 45%.* (the exact change is 100*(e^(-0.45) - 1) = -36.2%, so the 100*B1 rule is rough for a coefficient this large) iii. ln(Y) = B0 + B1*ln(X) + u: a 1% change in X is associated with a B1% change in Y, so B1 is the elasticity of Y with respect to X. _________________________________________________________________________________________________ 9) Since we're finding betas for ln(y) = b0 + b1*ln(x) + ui, need to derive new X and Y values! Original: --> Xi: 10; 12; 14; 16 --> take ln(x) = new X! --> Yi: 18; 15; 10; 1 --> take ln(y) = new Y! New: --> ln(Xi): 2.303; 2.485; 2.639; 2.773 --> ln(Yi): 2.890; 2.708; 2.303; 0 Now apply the b1 and b0 formulas: --> b1 = m = [(N*sum of indv. xy) - ((sum of x)(sum of y))]/[(N*sum of indv. x^2) - ((sum of x)^2)] --> b0 = b = [(sum of y) - b1(sum of x)]/N i. Sum of x = 10.2 ii. Sum of y = 7.901 iii. Sum of indv. xy = (2.303*2.890)+(2.485*2.708)+(2.639*2.303)+(2.773*0) = 19.463 iv. Sum of indv. x^2 = (2.303^2)+(2.485^2)+(2.639^2)+(2.773^2) = 26.133 v. b1 = [(4*19.463)-(10.2*7.901)]/[(4*26.133)-(10.2^2)] = *b1 or m = -5.565* vi. b0 = [7.901-(-5.565*10.2)]/4 = *b0 or b = 16.166* vii. *ln(y) = 16.166 - 5.565*ln(x)* --> Note: the denominator here (about 0.49) is tiny, so rounding the logs to 3 decimals matters; with unrounded logs, b1 =~ -5.57 and b0 =~ 16.18 _________________________________________________________________________________________________ 10) Relationship between x and y for ln(Y) = B0 + B1*ln(X) --> Remember that: i. Y = B0 + B1*ln(X) + u: a 1% change in X is associated with a change in Y of 0.01*B1 ii. ln(Y) = B0 + B1*X + u: a one-unit change in X (∆X=1) is associated with approximately a (B1*100)% change in Y iii. ln(Y) = B0 + B1*ln(X) + u: a 1% change in X is associated with a B1% change in Y, so B1 is the elasticity of Y with respect to X. --> So if B1 = 1.2 --> a 1% change in x = a B1% change in y = if x increases by 1%, y increases by 1.2% --> *C. When x increases by 100%, y increases by 120%.* (the 1% rule scaled up by 100; a linear approximation) _________________________________________________________________________________________________ 11) B. Yes, since the outcome variable is the same, ln(y), and the sample is the same.
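The two Problem Set 2 regressions can be re-estimated with unrounded logs, as sketched below. The `ols` helper uses the deviation form of the simple-regression formulas, which gives the same answer as the N*sum formulas above. The last line shows the approximate vs. exact % change for b1 = -0.45.

```python
import math


def ols(x, y):
    """Simple OLS: b1 = sum((x-xbar)(y-ybar)) / sum((x-xbar)^2), b0 = ybar - b1*xbar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


X = [10, 12, 14, 16]
Y = [18, 15, 10, 1]
lnX = [math.log(v) for v in X]
lnY = [math.log(v) for v in Y]

b0, b1 = ols(X, lnY)           # ln(Y) = b0 + b1*X
print(round(b0, 3), round(b1, 4))        # 7.875 -0.4538
b0_ll, b1_ll = ols(lnX, lnY)   # ln(Y) = b0 + b1*ln(X)
print(round(b0_ll, 2), round(b1_ll, 2))  # 16.18 -5.57

# Log-level: 100*b1 is the approximate % change in Y per unit of X;
# the exact figure is 100*(e^b1 - 1).
print(round(100 * -0.45, 1), round(100 * (math.exp(-0.45) - 1), 1))  # -45.0 -36.2
```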

1) What are the 3 properties of logarithms? 2) What are the 2 properties of natural logarithms? 3) What is the logarithm definition of elasticity? 4) Transform the Cobb-Douglas function (y = A*(x1^n1)*(x2^n2)) in terms of ln(y). 5) What is the PED of ln(x) = ln(w) - ln(p)? 6) In general, what can logarithms be used for (6 uses)?

1) i. log(xy) = log(x) + log(y) ii. log(x^n) = n*log(x) iii. log(x/y) = log(x) - log(y) 2) i. Change in ln(x) =~ proportional change in x (multiply by 100 for %) ii. ln(1+r) =~ r, when r is small 3) Elasticity = change in ln(y)/change in ln(x) 4) y = A*(x1^n1)*(x2^n2) --> apply properties of logarithms i. ln(y) = ln[A*(x1^n1)*(x2^n2)] ii. *ln(y) = ln(A) + n1*ln(x1) + n2*ln(x2)* 5) PED of ln(x) = ln(w) - ln(p) is -1! --> -1 = the coefficient on ln(p), and in a log-log demand equation the coefficient on ln(price) is ALWAYS the PED :-) 6) Uses of logarithms: i. Rescale graphs/variables that span different orders of magnitude ii. % changes iii. Elasticities iv. Transform nonlinear functions --> linear functions v. Simplify optimization problems (no need for ECON 400) vi. Transform right-skewed variables into more symmetric distributions

1) What are the rules for linear transformations on the following? i. Mean of (a+bx) ii. Variance of (a+bx) iii. Covariance of (a+bx, y) iv. Correlation of (a+bx, y) 2) What are the rules for linear combinations on the following? i. Mean of (x+y) ii. Var(x+y) iii. Cov(x+y, z) iv. E(xy) 3) How do you calculate the simple deviation?

1) Linear transformations: i. Mean of (a+bx) = a + b(mean of x) --> So if y = a+bx, mean of y = a + b(mean of x) ii. Variance of (a+bx) = (b^2)(Var(x)) --> So if y = a+bx, Var(y) = (b^2)(Var(x)) iii. Cov(a+bx, y) = b*Cov(x,y) --> adding a constant doesn't change covariance; scaling x by b scales it by b iv. Corr(a+bx, y) = Corr(x,y) if b > 0 (= -Corr(x,y) if b < 0) --> correlation doesn't depend on units 2) Linear combinations: i. Mean of (x+y) = E(x) + E(y) --> E(x) = expected value/mean of x = sum of (each value of x * its probability) ii. Var(x+y) = Var(x) + Var(y) + 2*Cov(x,y) iii. Cov(x+y, z) = Cov(x,z) + Cov(y,z) --> distribute z across x and y iv. E(xy) = E(x)*E(y) --> only when x and y are independent 3) Simple deviation = x - mean of x --> the raw term that gets squared in the numerator of the SD formula!
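The linear combination and transformation rules can be spot-checked numerically, as sketched below. The salary/bonus/performance numbers are hypothetical; each line prints True when the rule holds (up to floating-point tolerance).

```python
import math
from statistics import mean, stdev, variance


def cov(a, b):
    """Sample covariance (divides by N - 1, like variance())."""
    abar, bbar = mean(a), mean(b)
    return sum((p - abar) * (q - bbar) for p, q in zip(a, b)) / (len(a) - 1)


def corr(a, b):
    return cov(a, b) / (stdev(a) * stdev(b))


# Hypothetical data: salary and bonus in $1000s, plus a performance score.
salary = [50, 55, 60, 70, 90]
bonus = [5, 4, 8, 10, 20]
perf = [2, 3, 3, 4, 5]
total = [s + b for s, b in zip(salary, bonus)]  # w = x + y

# Linear combinations
print(math.isclose(mean(total), mean(salary) + mean(bonus)))
print(math.isclose(variance(total), variance(salary) + variance(bonus) + 2 * cov(salary, bonus)))
print(math.isclose(cov(total, perf), cov(salary, perf) + cov(bonus, perf)))

# Linear transformation z = a + b*x with b = -2: covariance scales by b, correlation flips sign
z = [3 - 2 * s for s in salary]
print(math.isclose(cov(z, perf), -2 * cov(salary, perf)))
print(math.isclose(corr(z, perf), -corr(salary, perf)))
```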

Problem Set 1: The following questions are related to logarithms: 14. If a number is doubled, its logarithm: A. is multiplied by e^0.5 B. is multiplied by e^2 C. increases by +2 D. increases by +log(2) E. is multiplied by 2 15. If Q = A*P^b is the equation of a demand curve, write the relationship between ln(Q) and ln(P). The relationship between ln(Q) and ln(P) is: A. ln(Q) = ln(A) + b*ln(P) B. ln(Q) = ln(A) + ln(b)*ln(P) C. ln(Q) = A + b*ln(P) D. ln(Q) = ln(A)*ln(P^b) 16. If y = A*b^x, how is ln(y) related to x? The relationship between ln(y) and x is: A. ln(y) = (ln(A)*ln(b))x B. ln(y) = ln(A) + x*ln(b) C. ln(y) = x*(ln(A) + ln(b)) D. ln(y) = (ln(A) + ln(b))x 17. If y = A*(x1^a)*(x2^(1-a)), how is ln(y) related to ln(x1) and ln(x2)? The relationship is: A. ln(y) = A*(a*ln(x1) + (1-a)*ln(x2)) B. ln(y) = ln(A) + ln(a)*ln(x1) + ln(1-a)*ln(x2) C. ln(y) = ln(A)*a*ln(x1)*(1-a)*ln(x2) D. ln(y) = ln(A) + a*ln(x1) + (1-a)*ln(x2) 18. If ln(y) = a + b*ln(x), how is y related to x? The relationship between y and x is: A. y = ln(a)*x^b B. y = e^(a+xb) C. y = e^a + b*e^x D. y = e^a*x^b 19. When P0 = $10, demand is X0 = 130. When P1 = $15, demand is X1 = 80. Calculate the elasticity of demand using the midpoint method. Compare it to the answer from using logarithms. 20. In this problem, apply the rule that ln(1+δ) =~ δ when δ is small. Create a table with three columns: i. In the first column, list values of x from 1.00, 1.01, 1.02, ... 1.20. ii. In the second column, evaluate ln(x) using a calculator. iii. In the third column, show the difference between the actual value of ln(x) and the approximation ln(1+δ) = δ.

14. Remember properties of logs: i. log(x*y) = log(x) + log(y) ii. log(x^a) = a*log(x) iii. log(x/y) = log(x) - log(y) --> Try doubling actual numbers! --> ln(2) = 0.693 --> ln(2*2) = 1.386 --> log10(15) = 1.176 --> log10(30) = 1.477 (an increase of log10(2) = 0.301) --> *D. increases by +log(2)* --> log(2x) = log(2) + log(x), in any base 15. Demand curve: Q = A*P^b --> take ln of both sides --> ln(Q) = ln(A*P^b) Apply rule i., log(x*y) = log(x) + log(y): --> ln(Q) = ln(A) + ln(P^b) Apply rule ii., log(x^a) = a*log(x) --> *A. ln(Q) = ln(A) + b*ln(P)* 16. y = A*b^x --> relate ln(y) and x; take ln of both sides --> ln(y) = ln(A*b^x) Apply rule i. --> ln(y) = ln(A) + ln(b^x) Apply rule ii. --> *B. ln(y) = ln(A) + x*ln(b)* 17. y = A*(x1^a)*(x2^(1-a)) --> write in terms of natural logarithms: ln(y) = ln(A*(x1^a)*(x2^(1-a))) Apply rule i. --> ln(y) = ln(A) + ln(x1^a) + ln(x2^(1-a)) Apply rule ii. --> *D. ln(y) = ln(A) + a*ln(x1) + (1-a)*ln(x2)* 18. ln(y) = a + b*ln(x) --> write in terms of y and x: exponentiate both sides (raise e to each side) to get rid of ln! --> y = e^(a + b*ln(x)) = e^a * e^(b*ln(x)) = e^a * (e^ln(x))^b --> *D. y = e^a * x^b* 19. P0 = 10; X0 = 130; P1 = 15; X1 = 80 --> PED = % change in Q/% change in P; compare the midpoint method vs. the ln method (PED = change in ln(Q)/change in ln(P)) i. Midpoint method = [(X1 - X0)/(1/2(X0+X1))]/[(P1 - P0)/(1/2(P0+P1))] --> Q: (80-130)/(1/2(80+130)) = -0.476 --> P: (15-10)/(1/2(15+10)) = 0.4 --> -0.476/0.4 = *PED = -1.19* ii. ln method = [ln(X1) - ln(X0)]/[ln(P1) - ln(P0)] --> Q: ln(80) - ln(130) = -0.486 --> P: ln(15) - ln(10) = 0.405 --> -0.486/0.405 = *PED = -1.20* --> The two methods give nearly the same answer 20. Approximating rule: ln(1+δ) =~ δ when δ is small; compare the approximation vs. reality as x gets bigger!
x | ln(x) | approximation (δ = x - 1) | approximation - ln(x)
1.00 | 0.000 | 0.00 | 0.000
1.01 | 0.010 | 0.01 | 0.000
1.02 | 0.020 | 0.02 | 0.000
1.03 | 0.030 | 0.03 | 0.000
1.04 | 0.039 | 0.04 | 0.001
1.05 | 0.049 | 0.05 | 0.001
1.06 | 0.058 | 0.06 | 0.002
1.07 | 0.068 | 0.07 | 0.002
1.08 | 0.077 | 0.08 | 0.003
1.09 | 0.086 | 0.09 | 0.004
1.10 | 0.095 | 0.10 | 0.005
1.11 | 0.104 | 0.11 | 0.006
1.12 | 0.113 | 0.12 | 0.007
1.13 | 0.122 | 0.13 | 0.008
1.14 | 0.131 | 0.14 | 0.009
1.15 | 0.140 | 0.15 | 0.010
1.16 | 0.148 | 0.16 | 0.012
1.17 | 0.157 | 0.17 | 0.013
1.18 | 0.166 | 0.18 | 0.014
1.19 | 0.174 | 0.19 | 0.016
1.20 | 0.182 | 0.20 | 0.018
--> The approximation always slightly overstates ln(x), and the gap grows as δ gets bigger.
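Problems 14, 19, and 20 can be reproduced with the sketch below: the doubling rule for logs, the two elasticity methods (function names are made up), and the ln(1 + δ) table printed row by row.

```python
import math


def midpoint_elasticity(q0, q1, p0, p1):
    """% change in Q over % change in P, each relative to the midpoint."""
    pct_q = (q1 - q0) / ((q0 + q1) / 2)
    pct_p = (p1 - p0) / ((p0 + p1) / 2)
    return pct_q / pct_p


def log_elasticity(q0, q1, p0, p1):
    """Change in ln(Q) over change in ln(P)."""
    return (math.log(q1) - math.log(q0)) / (math.log(p1) - math.log(p0))


# Problem 14: doubling adds log(2), in any base
print(math.log(30) - math.log(15), math.log(2))

print(round(midpoint_elasticity(130, 80, 10, 15), 2))  # -1.19
print(round(log_elasticity(130, 80, 10, 15), 2))       # -1.2

# Problem 20: how good is ln(1 + d) =~ d?
for i in range(21):
    d = i / 100
    x = 1 + d
    print(f"{x:.2f} | {math.log(x):.3f} | {d:.2f} | {d - math.log(x):.3f}")
```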

1) What is a logarithm? Solve the following: i. log_3(81) ii. log_2(8) iii. log_2(1.41) iv. log_2(0.25) 2) What are the properties of the logarithm? 3) What are the applications of logarithms?

1) A logarithm answers: what power do you raise the base b to in order to get k? log_b(k) = n <--> b^n = k i. log_3(81) = 4 --> 3^4 = 81 ii. log_2(8) = 3 --> 2^3 = 8 iii. log_2(1.41) =~ 1/2 --> 2^(1/2) = 1.414 iv. log_2(0.25) = -2 --> 2^(-2) = 1/4 2) Properties (same base; the natural logarithm ln(x) uses base e =~ 2.718): i. log_b(x*y) = log_b(x) + log_b(y) ii. log_b(x^a) = a*log_b(x) iii. log_b(x/y) = log_b(x) - log_b(y) 3) Applications: i. Find % differences: ie- for 2 values x1, x2, the % difference between them =~ ln(x2) - ln(x1) --> x1 = 95, x2 = 105 --> ln(105) - ln(95) = 0.100 =~ 10% --> % difference can also be found via... -Midpoint method = (change in x)/(1/2(x1+x2)) = 10/100 = 10% change -Elasticity (% change in Q)/(% change in P) = [(new Q - old Q)/old Q]*[old P/(new P - old P)] ii. Approximating delta: ln(1+delta) =~ delta (if delta is close to 0; typically within about 0.2 of 0) --> ie- ln(1.05) =~ 0.05 (delta = 0.05) --> exact answer = 0.0488 iii. Simplifying the Cobb-Douglas demand function: Q = alpha*(w/p); w = wealth --> ln(Q) = ln(alpha*(w/p)) = ln(alpha) + ln(w/p) = *ln(alpha) + ln(w) - ln(p)* iv. Simplifying maximization problems: since ln is increasing, f(x) and ln(f(x)) are maximized at the same x (for f(x) > 0) ie- u(Q1, Q2) = (Q1^alpha)(Q2^(1-alpha)) --> apply logs to it: --> ln[(Q1^alpha)(Q2^(1-alpha))] = ln(Q1^alpha) + ln(Q2^(1-alpha)) = alpha*ln(Q1) + (1-alpha)*ln(Q2) --> max alpha*ln(Q1) + (1-alpha)*ln(Q2) + lambda(w - Q1P1 - Q2P2) --> Derivative with respect to Q1: alpha*(1/Q1) - lambda(P1) = 0 --> Derivative with respect to Q2: (1-alpha)(1/Q2) - lambda(P2) = 0 --> Combining with the budget constraint gives lambda = 1/w, so Q1 = alpha*w/P1 and Q2 = (1-alpha)*w/P2 v. Transform very right-skewed variables (ie- income, wealth) into more symmetric distributions --> this keeps a few extreme values from dominating means and regression estimates, and lets effects be read as % changes
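To check application iv, the sketch below searches along the budget line and confirms that the utility function and its log are maximized at the same Q1, which matches the closed form alpha*w/P1. The parameter values (alpha = 0.3, w = 100, P1 = 2, P2 = 5) are made up, and a plain grid search stands in for the calculus.

```python
import math

alpha, w, p1, p2 = 0.3, 100.0, 2.0, 5.0  # hypothetical parameter values


def utility(q1):
    """Cobb-Douglas utility, spending the rest of the budget on good 2."""
    q2 = (w - p1 * q1) / p2
    return q1 ** alpha * q2 ** (1 - alpha)


def log_utility(q1):
    """ln of the same utility: alpha*ln(Q1) + (1 - alpha)*ln(Q2)."""
    q2 = (w - p1 * q1) / p2
    return alpha * math.log(q1) + (1 - alpha) * math.log(q2)


# Grid of affordable Q1 values strictly inside (0, w/p1)
grid = [(w / p1) * k / 10_000 for k in range(1, 10_000)]
best_u = max(grid, key=utility)
best_log_u = max(grid, key=log_utility)

print(best_u, best_log_u, alpha * w / p1)  # all three should be (about) 15.0
```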

1) What is marginal effect AKA SLOPE? 2) Find marginal effect aka slope for the following: i. y = a+bx ii. y = a+bx+c(x^2) iii. ln(y) = a+bx 4. How do you calculate % change?

1) Marginal effect AKA SLOPE = change in y/change in x --> Also = derivative (dy/dx)! ---- 2) Generally, to find the slope = take the derivative of the function i. Linear function y = a+bx --> Slope = b, the same at every x (= dy/dx) ii. y = a+bx+c(x^2) --> Slope AKA change in y/change in x varies with x --> dy/dx = b + 2cx --> *slope = b+2cx* ie- y = x^2; at x = 5, what is the slope of this function? i. dy/dx of x^2 = 2x ii. 2(5) = 10, which is the change in y/change in x at x = 5 iii. ln(y) = a+bx --> Slope: when x changes by 1, ln(y) increases by b --> d(ln y)/dx = b --> Interpret the effect of x on y for logs = PERCENTAGE CHANGE! --> y = e^(a+bx) --> dy/dx = b*y --> divide both sides by y --> (dy/y)/dx = b --> a 1-unit increase in x changes y by about (100*b)% ie- Sally's ln(earnings) = 11.512, Sam's ln(earnings) = 11.736 --> Difference in ln(earnings) = 0.224 --> Sam earns about 22.4% more than Sally (approximation; exact: e^0.224 - 1 = 25.1%); if they differ by 1 unit of x, this 22.4% is the slope/marginal effect! 4. % change in z = (change in z/z)*100 5. MPG = function of weight --> MPG = 39.44 - 0.006*weight, R^2 = 0.6515 VS. MPG = function of displacement --> MPG = 30.06 - 0.0444*displacement, R^2 = 0.4969 --> Weight = better for explaining MPG (higher R^2!) i. R^2 = measures goodness of fit to the actual values (the higher the better, so the model with the higher R^2 fits better; valid here because both models have the same outcome variable and sample) ii. p-value = how significant the relationship is --> p < 0.05 = significant (can reject the null hypothesis!)
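The slope rules above can be checked with a central-difference derivative, as sketched below. The coefficients a = 1, b = 2, c = 0.5 and the point x0 = 0.3 are arbitrary assumptions; the Sally/Sam line compares the log-point approximation with the exact % gap.

```python
import math

a, b, c = 1.0, 2.0, 0.5  # hypothetical coefficients


def numerical_slope(f, x, h=1e-6):
    """Central-difference approximation to dy/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def quadratic(x):
    return a + b * x + c * x ** 2


def square(x):
    return x ** 2


def log_linear(x):
    """y when ln(y) = a + b*x."""
    return math.exp(a + b * x)


print(numerical_slope(quadratic, 5.0), b + 2 * c * 5.0)  # both about 7.0
print(numerical_slope(square, 5.0))                      # about 10.0
x0 = 0.3
print(numerical_slope(log_linear, x0) / log_linear(x0), b)  # (dy/dx)/y is about b

# Sally vs. Sam: a 0.224 gap in ln(earnings)
print(round(100 * 0.224, 1), round(100 * (math.exp(0.224) - 1), 1))  # 22.4 25.1
```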

Exam 1 2016: 1) A dataset contains N = 5 observations of X: 3.4, 5.3, 1.6, 7.8, 1.9. Calculate the variance in X. 2) What is the relationship between the mean, median, and skewness of a distribution? 3) Inflation was high in the late 1970s. In 1976, '77, '78, and '79, inflation rates were 6.50%, 7.59%, 11.35%, and 13.50%. --> Calculate the average annual inflation rate over this 4-year period (using a technique that is similar to calculating an average interest rate). 4) In a sample of observations, mean of X = 133.92, and Sx = 4.28. What is the standardized value of Xi = 140? 5) A salesperson receives a salary S with a fixed component plus a commission based on sales x: S = 1000 + 0.20*x. For sales, mean of x = 3429; var(x) = 9912. What are the mean and variance of S?

1) Var(x) = [Sum of all (x - mean of x)^2]/(N-1) Mean of x = (3.4+5.3+1.6+7.8+1.9)/5 = 4 Var(x) = [((3.4-4)^2)+((5.3-4)^2)+((1.6-4)^2)+((7.8-4)^2)+((1.9-4)^2)]/4 = 6.665 2) Mean is pulled by outliers! (direction of skew) --> Right skew = mean pulled by outliers to the right = mean > median --> Left skew = mean pulled by outliers to the left = mean < median 3) Average inflation rate = geometric mean = [(1+r1)(1+r2)...(1+rn)]^(1/n) - 1 (1.065*1.0759*1.1135*1.135)^(1/4) - 1 = 0.097 = 9.7% avg annual inflation rate! 4) Standardized value = z score = (x-mean of x)/SD (or S, which = SD of sample) i. Plug into formula: z score = (140-133.92)/4.28 ii. Solved: *z score = 1.421* 5) Mean of x = 3429; Var(x) = 9912 S = 1000+0.20x --> Remember properties of linear transformations: i. Mean of (a+bx) = a+b(mean of x) ii. Var of (a+bx) = (b^2)(var(x)) iii. Cov(a+bx, y) = b(cov(x,y)) iv. Corr(a+bx, y) = corr(x,y) if b > 0 (the sign flips if b < 0) To find mean of S = 1000+0.20x: i. Recall that mean of a+bx = a+b(mean of x) ii. Plug [mean of x] into x: Mean of S = 1000+0.20(Mean of x) --> Mean of S = 1000+0.20(3429) iii. Solved: *Mean of S = 1685.8* To find var of S = 1000+0.20x: i. Recall that Var of a+bx = (b^2)(var(x)) ii. Plug in: var(x) = 9912; b^2 = (0.20^2) iii. Solved: (0.20^2)(9912) = *Var of S = 396.48*
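The exam answers above can be double-checked in a few lines of Python (a sketch using only the standard library; `statistics.variance` divides by N-1, matching the sample formula):

```python
import math
import statistics

# 1) Sample variance (divides by N - 1)
x = [3.4, 5.3, 1.6, 7.8, 1.9]
print(round(statistics.variance(x), 3))   # 6.665

# 3) Average annual inflation as a geometric mean of gross rates
rates = [0.065, 0.0759, 0.1135, 0.135]
avg = math.prod(1 + r for r in rates) ** (1 / len(rates)) - 1
print(round(avg, 4))                      # 0.097

# 4) Standardized value (z score)
print(round((140 - 133.92) / 4.28, 3))    # 1.421

# 5) Linear transformation S = a + b*x
a, b = 1000, 0.20
mean_x, var_x = 3429, 9912
print(round(a + b * mean_x, 2), round(b ** 2 * var_x, 2))   # 1685.8 396.48
```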

What is the relationship between x and y in the following circumstances? 1) Y=B0 + B1*ln(X) + u 2) ln(Y)=B0 + B1*ln(X) + u 3) ln(Y)=B0 + B1*X + u 4) Y=B0 + B1*(X) + u 5) y=A*B^X

1) Y=B0 + B1*ln(X) + u: x=+1%; y=+(0.01*B1) --> ln(X) shrinks the change, so it's (0.01*B1) --> 1% change in X is associated with a change in Y of 0.01*B1 units --> Thus, B1/100 = units change in Y when X increases by 1% (B1 itself = change in Y when ln(X) rises by 1) 2) ln(Y)=B0 + B1*ln(X) + u: x=+1%; y=+B1% --> ln's on both sides balance out --> 1% change in X is associated with a B1% change in Y --> Thus, B1 =~ elasticity of Y with respect to X (B1 = % change Y/% change X) 3) ln(Y)=B0 + B1*X + u: x=+1; y=+(B1*100)% --> ln(Y) magnifies the change, so it's (B1*100)% --> change in X by one unit (∆X=1) is associated with a (B1*100)% change in Y (the approximation works best when B1 is small) --> Thus, B1 =~ proportional change in Y when X increases by 1 4) Y=B0 + B1*X + u: x=+1; y=+B1 --> change in X by one unit (∆X=1) is associated with a B1 change in Y. 5) y=A*B^X: x=+1; y is multiplied by B --> When X increases by 1, Y changes by (B-1)*100% (e.g., B = 1.1 gives +10%) --> Taking logs: ln(y) = ln(A) + X*ln(B), which is case 3 with B1 = ln(B)
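To see how good the rules of thumb are, here is a Python sketch that computes the exact changes for made-up coefficients (the values 50, -0.5, 0.03 and B = 1.1 are purely illustrative, not from any regression in these notes):

```python
import math

# Hypothetical coefficients, chosen only to illustrate the rules of thumb
b_level_log = 50.0   # Y = B0 + B1*ln(X)
b_log_log = -0.5     # ln(Y) = B0 + B1*ln(X)
b_log_level = 0.03   # ln(Y) = B0 + B1*X

# 1) level-log: X up 1% -> Y changes by B1*ln(1.01) units, about 0.01*B1
level_log = b_level_log * math.log(1.01)
# 2) log-log: X up 1% -> Y changes by about B1 percent (elasticity)
log_log = (1.01 ** b_log_log - 1) * 100
# 3) log-level: X up 1 unit -> Y changes by about 100*B1 percent
log_level = (math.exp(b_log_level) - 1) * 100
print(round(level_log, 4), round(log_log, 4), round(log_level, 4))
# 0.4975 -0.4963 3.0455

# 5) y = A*B**X has ln(y) = ln(A) + X*ln(B): a log-level model with B1 = ln(B)
B = 1.1
print(round(math.log(B), 4))   # 0.0953, i.e. "about 9.5%"; the exact change is +10%
```

Each exact value lands close to the shortcut (0.5, -0.5, 3), which is why the shortcuts are used for small changes.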

1. What are the different organizations of data? 2. What are the common measures of central tendency? What do they express? 3. i.What is left skew? ii. What is right skew? iii. What is symmetric distribution?

1. i. Cross Sectional- many units observed once ii. Time Series- a single unit observed over time iii. Panel data- many units observed over time 2. i. Mean, median, and mode = average, middle, most common values ii. They express a "typical" value 3. i. Left skew = long tail stretching to the LEFT ____/\ (median > mean) ii. Right skew = long tail stretching to the RIGHT /\____ (median < mean) --> More people with lower values than higher values! --> ie- property values, income/wealth, etc. --> more poor people and tiny number of rich people who accumulated wealth through theft lol iii. Symmetric distribution = data is symmetric __/\__ (median = mean) --> Roughly equal frequency of values above and below the center! --> ie- distribution of scores on tests
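A quick Python illustration of the mean/median rule for skew, using two small made-up samples (not real income or test data):

```python
import statistics

# Hypothetical right-skewed sample: one very large value, like income
right_skewed = [20, 25, 30, 30, 35, 40, 45, 400]
# Hypothetical left-skewed sample: one very low score on an easy test
left_skewed = [30, 85, 88, 90, 92, 95]

for data in (right_skewed, left_skewed):
    mean, median = statistics.mean(data), statistics.median(data)
    # The long tail drags the mean toward it; the median barely moves
    print(mean, median, "mean > median" if mean > median else "mean < median")
```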

*Quiz 1:* 1. In statistics, the word "skewed" means: A. Extreme B. Unbalanced C. Biased D. Obscured 2. A dummy variable takes only values of -1, 0, and +1; indicating whether a condition is false, neutral, or true. (T/F) 3. In the past three years, the annual interest rate on my savings account has been R1, R2, R3. (These values are all expressed as decimals.) Which formula can be used to calculate the average interest rate, R? A. R = ((1+R1)(1+R2)(1+R3))^(1/3) - 1 B. R = 1 + ((1+R1) + (1+R2) + (1+R3))^(1/3) C. R = 1 + ((1+R1)(1+R2)(1+R3))^(1/2) D. R = √((1+R1)(1+R2)(1+R3)) E. R = 1 - √((1+R1) + (1+R2) + (1+R3)) / 3 4. Your sample contains 3 observations: 10, 12, 14. Which value is closest to the variance in the sample? A. 1.00 B. 1.15 C. 1.33 D. 1.41 E. 2.00 F. 2.33 G. 4.00 5. If a variable is "skewed to the right", then: A. The mean is greater than the median. B. The mean is the same as the median. C. The mean is less than the median. D. The mean is a hyperreal number, and the median is a surreal number. 6. According to the empirical rule, the fraction of observations within one, two, and three standard deviations of the mean are often approximately: A. 0, ⅔, 95% B. ½, ⅔, 99% C. 0, ¾, 88.9% D. ½, ⅔, 95% E. 33.3%, 66.6%, 99.9% F. ½, 95%, 99% G. ½, ⅔, ¾ H. ⅔, 95%, 99.9% 7. In all samples, at least what fraction of observations are within 1, 2, and 3 standard deviations of the mean? A. 0, 1/3, 2/3 B. 1/2, 3/4, 7/8 C. 1/3, 2/3, 3/3 D. 0, 3/4, 8/9 8. In a sample, the variable X has a mean of 18 and a standard deviation of 5.9. What is the "standardized value" associated with 11.7? 9. If a statistician knows the "standardized value" or "z-score" of a particular observation, the statistician knows roughly how common or uncommon the observation is, without knowing the particular details of the distribution, like the mean or variance. (T/F) 10. What are two problems with using the range as a measure of dispersion? (multiple choice) A. The range cannot be calculated in a sample with an odd number of observations. B. The range is always less than the variance. C. The range cannot be calculated when the sample is skewed. D. The range depends on the variance in observations. E. The range is determined by unusual observations. F. The range is expected to change with the sample size.

1. B. Unbalanced--> unbalanced = not evenly distributed (asymmetric) 2. False, dummy variables take on values of 0 or 1. 3. A. R = ((1+R1)(1+R2)(1+R3))^(1/3) - 1--> formula for growth rates (geometric mean) 4. G. 4.00--> variance = [sum of all (X-mean)^2]/(N-1) --> Mean = (10+12+14)/3 = 12 --> [((10-12)^2)+((12-12)^2)+((14-12)^2)]/2 = 4 5. A. The mean is greater than the median. (mean is pulled by outliers--> in right skew's case, towards the right = it's greater than median) 6. H. ⅔, 95%, 99.9% --> Empirical rule = 2/3-95-99.9% rule 7. Chebyshev (at least) = at least 1-(1/k^2) of sample within k SDs! determined by # of SDs --> At least 1-(1/1^2) of sample within 1 SD--> at least 0 of sample within 1 SD --> At least 1-(1/2^2) of sample within 2 SDs--> at least 3/4 of sample within 2 SDs --> At least 1-(1/3^2) of sample within 3 SDs--> at least 8/9 of sample within 3 SDs D. 0, 3/4, 8/9 8. Mean = 18; SD = 5.9 z score AKA standardized value AKA how many SDs away from mean = (x-mean)/SD --> z score of 11.7 = (11.7-18)/5.9 = *z score = -1.068* 9. This would be *true*. The z score tells you how many SDs the observation is from the mean, so you don't need the original mean/variance to judge how unusual it is. Standardizing does NOT make the data normal (the shape of the distribution stays the same), but: --> Chebyshev guarantees at least 1-(1/k^2) of ANY sample is within k SDs, so a |z| above 3 is rare in every distribution (at most 1/9 of the sample) --> For bell-shaped data, the empirical rule gives approximations: within 1 SD on either side of the mean, about 2/3 of the sample falls in; within 2 SDs, about 95%; within 3 SDs, about 99.9% (99.7% for an exact normal) --> REMEMBER: standardized data always have mean = 0 and SD = 1! 10. (MC) E. The range is determined by unusual observations. F. The range is expected to change with the sample size.
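The Chebyshev bounds and the z score from questions 7 and 8 can be reproduced with a short Python sketch (`chebyshev_min_share` and `z_score` are illustrative helper names); `Fraction` keeps the bounds as exact fractions:

```python
from fractions import Fraction


def chebyshev_min_share(k):
    """Chebyshev: at least 1 - 1/k^2 of ANY sample lies within k SDs of the mean."""
    return 1 - Fraction(1, k * k)


def z_score(x, mean, sd):
    """Standardized value: how many SDs x lies from the mean."""
    return (x - mean) / sd


print([str(chebyshev_min_share(k)) for k in (1, 2, 3)])   # ['0', '3/4', '8/9']
print(round(z_score(11.7, 18, 5.9), 3))                   # -1.068
```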

1. What is the union of A and B? 2. What are comprehensive events? 3. Is it possible for events to be both comprehensive and mutually exclusive? 4. What is the probability measure? 5. What are the 3 assumptions of the probability measure? 6. How do you calculate P(AuB)? 7. What does P(A|B) mean? 8. What is the probability table?

1. Everything in A OR B (AuB) 2. Comprehensive (exhaustive) events = events whose union covers all possible outcomes (no other outcome is possible) 3. Yes, it is possible. --> ie- A and A'--> they cannot both happen (mutually exclusive), and one of them must happen (comprehensive) = covers all possible outcomes! 4. Probability measure- function that assigns a probability to each event 5. Assumptions of the Probability Measure: i. 0 <= P(A) <= 1 ii. P(Space, or ALL events) = 1 = 100% iii. P(A') = 1-P(A) --> Complement rule; A' = complement of A 6. P(AuB) = P(A)+P(B) - P(AnB) 7. P(A|B) = probability of A given that B has happened P(A|B) = P(AnB)/P(B) --> B = remaining possibilities P(A|B)*P(B) = P(AnB) = P(B|A)*P(A) 8. Probability table that can help with solving questions:
        A          A'
B    P(AnB)     P(A'nB)    P(B)
B'   P(AnB')    P(A'nB')   P(B')
     P(A)       P(A')      1
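A minimal Python sketch of these rules, treating events as sets of equally likely outcomes. The fair-die example and the events A and B are hypothetical, chosen only to make the counting easy to follow:

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair die, so all outcomes are equally likely
S = {1, 2, 3, 4, 5, 6}   # sample space
A = {2, 4, 6}            # event A: the roll is even
B = {4, 5, 6}            # event B: the roll is at least 4


def P(event):
    """Probability by counting: |event| / |S|."""
    return Fraction(len(event), len(S))


print(P(A | B), P(A) + P(B) - P(A & B))   # 2/3 2/3  (union via inclusion-exclusion)
print(P(S - A), 1 - P(A))                 # 1/2 1/2  (complement rule)
print(P(A & B) / P(B))                    # 2/3      (P(A|B) = P(AnB)/P(B))
```

In Python, `|` is set union (u), `&` is intersection (n), and `S - A` is the complement A'.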

1. Tell me about regression model! yay... 2. What do we use regressions for? 3. What are the requirements of the regression model?

1. Least squares regression line = relationship between x and y *y = beta0+beta1(x)+unobserved variables* *AKA y = b0+b1(x)* AKA y = mx+b lmaooo i. y = observed outcome; this is the dependent/response variable ii. x = predictor of y; this is the independent/explanatory variable iii. beta0 AKA b0 = y intercept iv. beta1 AKA b1 = slope --> b1 = cov(x,y)/var(x) (var(x) CANNOT = 0) *Coming up with best prediction of this line:* --> *beta hat 1 AKA b1 = cov(x,y)/var(x)* = correlation(x,y)*(SDy)/(SDx), i.e. b1 = rxy*(Sy/Sx) <--STATA --> *beta hat 0 AKA b0 = mean of y - (beta hat 1*mean of x)* -Population regression line = the true line; accurate, but typically unobservable -Estimated regression line = estimated from the sample -Coefficient of determination (r^2) -Correlation coefficient (r) -Coefficients AKA coef. = betas -Cons AKA constant = beta0 --> ie-effects of education on workers' earnings --> Earnings = beta0+beta1 * # years schooling + unobserved factors (LIKE RACE GENDER SEXUALITY LMAOOOOOO) Regression model --> Earnings = -32822 + 532.9 * # years schooling --> Earnings go up by $532.9 with every additional year of schooling....lmao. Current Population Survey (CPS) --> Collected by Bureau of Labor Statistics (BLS) --- 2. Regressions = useful for... i. Predictions ii. Effects of a change in x on y--> (change in y/change in x) --- 3. *Requirements for regression model to be effective:* i. Uses the correct functional form/specification (the equation's shape should match the relationship, e.g. don't fit a straight line to a clearly curved relationship) ii. Necessary for beta hats: a) var(x) CANNOT = 0; need SOME variation in explanatory variable b) NO duplicated variables; cannot have 2 variables in regression function that measure the same thing --> Cannot include both temperature in F and temperature in C; they are the same variable and don't exist in vacuums...crazy how neolibs can't understand how this also applies to ideas and politics lmao iii. Cov(x,u) = 0; corr(x,u) = 0 --> u = unobserved factors --> Cannot have unobserved factors that are correlated with explanatory(x) variables --> When this is satisfied, beta hats = unbiased estimates of effect of x on y --> ie- earnings = beta0+beta1*schooling + "ability" --> ability = used by labor economics to describe innate ability, not taking into account poverty or material circumstances lmao; as if individual is separated from society and world --> if "ability" (in u) is correlated with schooling, cov(x,u) is not 0 and beta hat 1 is biased
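A sketch of the two OLS formulas above in Python, reusing the seven (x,y) pairs from Problem Set 1 #10 as sample data. The function name `ols` is illustrative; it also refuses to run when var(x) = 0, matching requirement ii.a:

```python
import statistics


def ols(x, y):
    """Simple-regression OLS: b1 = cov(x, y)/var(x), b0 = mean(y) - b1*mean(x)."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    var_x = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    if var_x == 0:
        raise ValueError("var(x) = 0: no variation in x, so the slope is undefined")
    cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    b1 = cov_xy / var_x
    return my - b1 * mx, b1


x = [1, 3, 4, 5, 7, 3, 5]
y = [5, 7, 6, 8, 9, 6, 7]
b0, b1 = ols(x, y)
print(round(b0, 3), round(b1, 3))   # 4.312 0.636

# The same slope written as r * (Sy/Sx)
sx, sy = statistics.stdev(x), statistics.stdev(y)
cov = sum((a - statistics.fmean(x)) * (b - statistics.fmean(y)) for a, b in zip(x, y)) / (len(x) - 1)
r = cov / (sx * sy)
print(abs(b1 - r * sy / sx) < 1e-12)   # True
```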

Problem Set 1: For each of the following data sets, calculate the: mean; median; standard deviation; geometric mean; harmonic mean. 1. N=10; X=1;1;2;3;5;8;13;21;34;55 2. N=5; Y=10;13;14;15;18 3. Standardize the values from set (2); find the mean and SD of the standardized set

1. Mean = (1+1+2+3+5+8+13+21+34+55)/10 = 14.3 Median = (5+8)/2 = 6.5 SD = {[((1-14.3)^2)+((1-14.3)^2)+((2-14.3)^2)+((3-14.3)^2)+((5-14.3)^2)+((8-14.3)^2)+((13-14.3)^2)+((21-14.3)^2)+((34-14.3)^2)+((55-14.3)^2)]/(10-1)}^(1/2) = 17.8 Geometric mean = (1*1*2*3*5*8*13*21*34*55)^(1/10) = 6.4 --> *Use it this way, (product of all values)^(1/n), only when values are NOT rates; for rates use [(1+r1)...(1+rn)]^(1/n) - 1* Harmonic mean = [(1/1 + 1/1 + 1/2 + 1/3 + 1/5 + 1/8 + 1/13 + 1/21 + 1/34 + 1/55)/10]^-1 = 3.0 2. Mean = (10+13+14+15+18)/5 = 14 Median = 14 SD = {[((10-14)^2)+((13-14)^2)+((14-14)^2)+((15-14)^2)+((18-14)^2)]/(5-1)}^(1/2) = 2.9 Geometric mean = (10*13*14*15*18)^(1/5) = 13.7 Harmonic mean = [(1/10 + 1/13 + 1/14 + 1/15 + 1/18)/5]^-1 = 13.5 3. Standardization = z = (x-mean)/SD Y = 10;13;14;15;18; mean = 14; SD = 2.9 (10-14)/2.9 = -1.4 (13-14)/2.9 = -0.3 (14-14)/2.9 = 0 (15-14)/2.9 = 0.3 (18-14)/2.9 = 1.4 Mean of standardized set = (-1.4-0.3+0+0.3+1.4)/5 = 0 SD of standardized set = {[((-1.4-0)^2)+((-0.3-0)^2)+((0-0)^2)+((0.3-0)^2)+((1.4-0)^2)]/(5-1)}^(1/2) = 1.01 ~= 1 (exactly 1 without rounding: standardized data always have mean 0 and SD 1)
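Python's standard `statistics` module has all five summaries built in (`geometric_mean` needs Python 3.8+), so the hand calculations above can be checked like this; `summary` is just an illustrative wrapper:

```python
import statistics

X = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
Y = [10, 13, 14, 15, 18]


def summary(data):
    """Mean, median, sample SD, geometric mean, harmonic mean (rounded to 1 dp)."""
    return (round(statistics.fmean(data), 1),
            statistics.median(data),
            round(statistics.stdev(data), 1),
            round(statistics.geometric_mean(data), 1),
            round(statistics.harmonic_mean(data), 1))


print(summary(X))   # (14.3, 6.5, 17.8, 6.4, 3.0)
print(summary(Y))   # (14.0, 14, 2.9, 13.7, 13.5)

# 3) Standardize Y with the unrounded mean and SD: the result has mean 0 and SD 1
m, s = statistics.fmean(Y), statistics.stdev(Y)
z = [(y - m) / s for y in Y]
print(round(statistics.fmean(z), 12), round(statistics.stdev(z), 12))
```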

1. Another term for regression? 2. How do you measure how effective the regression model is at predicting outcome? 3. If examining MPG in regression model, what are potential explanatory(X) variables of it?

1. OLS = Ordinary Least Squares --> Chooses the line that makes the sum of squared residuals (actual y - predicted y (with hat)) as SMALL as possible --> ON GRAPH: predicted y = on the fitted line; actual y = scattered at varying distances from the OLS line --> regression equation: y = B0+B1*x+u 2. How well model predicts outcome = R^2 (share of the variation in y explained by the model) 3. Explanatory = x --> Weight; engine size; foreign vs. domestic manufacturing

Roulette Wheel: P(event): 9/38; 9/38; 9/38; 9/38; 2/38 Outcomes: 9 red+even; 9 red+odd; 9 black+even; 9 black+odd; 2 house numbers Find: 1. P(red u black) 2. P(red or odd) 3. P(26 wins | odd loses) 4. P(red | odd)

1. P(red u black) = P(red)+P(black) = 18/38 + 18/38 = 36/38 (red and black are mutually exclusive, so nothing to subtract) 2. P(red or odd) = P(red u odd) = P(red)+P(odd)-P(red n odd) = 18/38 + 18/38 - 9/38 = 27/38 --> Odd = 9 red+odd, 9 black+odd--> the 9 red+odd outcomes are already counted in red, so do NOT double count them! 3. If odd loses, the outcome is one of the 20 non-odd outcomes (38-18 odd outcomes = 20: 18 even + 2 house numbers) --> 26 is even, so if 26 wins, odd loses --> P(26 wins|odd loses) = P(26 wins n odd loses)/P(odd loses) = (1/38)/(20/38) = *1/20* 4. P(red | odd) = P(red n odd)/P(odd) --> P(red n odd) = 9/38 --> P(odd) = 18/38 --> (9/38)/(18/38) = 9/18 = 1/2
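A Python sketch that gets the same answers by brute-force counting. It encodes each of the 38 outcomes only by (color, parity), as in the table above, so the specific number 26 is handled by counting the non-odd outcomes:

```python
from fractions import Fraction

# Encode the 38 equally likely outcomes described above as (color, parity)
wheel = ([("red", "even")] * 9 + [("red", "odd")] * 9
         + [("black", "even")] * 9 + [("black", "odd")] * 9
         + [("house", None)] * 2)


def prob(event, given=None):
    """P(event | given) by counting equally likely outcomes."""
    pool = [o for o in wheel if given is None or given(o)]
    return Fraction(sum(1 for o in pool if event(o)), len(pool))


def red(o):
    return o[0] == "red"


def black(o):
    return o[0] == "black"


def odd(o):
    return o[1] == "odd"


print(prob(lambda o: red(o) or black(o)))   # 18/19, i.e. 36/38
print(prob(lambda o: red(o) or odd(o)))     # 27/38
print(prob(red, given=odd))                 # 1/2
# Given that odd loses, 20 equally likely outcomes remain; 26 is one of them
print(Fraction(1, sum(1 for o in wheel if not odd(o))))   # 1/20
```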

1. What is a population? 2. What is a parameter? 3. What is a sample? 4. What is a statistic?

1. Population = group of interest to researcher 2. Parameter = value describing the population (e.g., the population mean); usually unknown 3. Sample = subset of population that is available to researcher (what we look at as representative of the population) 4. Statistic = value computed from the sample (e.g., the sample mean); used to estimate a parameter

What are the 2 types of variables you can have in regression?

1. Quantitative variables 2. Dummy variables --> y = B0+B1*dummy+ui -B0 = avg. y for dummy=0 -B1 = (avg. y for dummy=1) - (avg. y for dummy=0)
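A Python check of the dummy-variable interpretation on made-up data: running OLS of y on a 0/1 dummy gives B0 = average y in the dummy=0 group and B1 = the difference in group averages:

```python
import statistics

# Hypothetical data: y for 3 people with dummy = 0 and 3 with dummy = 1
d = [0, 0, 0, 1, 1, 1]
y = [10.0, 12.0, 14.0, 15.0, 17.0, 19.0]

# OLS slope and intercept: b1 = cov(d, y)/var(d), b0 = mean(y) - b1*mean(d)
md, my = statistics.fmean(d), statistics.fmean(y)
b1 = (sum((di - md) * (yi - my) for di, yi in zip(d, y))
      / sum((di - md) ** 2 for di in d))
b0 = my - b1 * md

mean0 = statistics.fmean([yi for di, yi in zip(d, y) if di == 0])
mean1 = statistics.fmean([yi for di, yi in zip(d, y) if di == 1])
print(b0, mean0)            # 12.0 12.0
print(b1, mean1 - mean0)    # 5.0 5.0
```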

EXAM 2 (NOT ON MIDTERM 1): Probability Theory! 1. What is S or space? 2. What are A, B? 3. What are the 2 types of events? 4. What is the complement of event? 5. What is the intersection of A and B? 6. What is n? What is u? 7. What is mutually exclusive?

1. S = space of possible outcomes --> The sample space...lol 2. A, B = events --> Some set of possible outcomes within the space (S) --> Subset of the entire space of S 3. 2 types of events: i. Simple/basic outcomes- 1 possible outcome ii. Complex events- multiple possible outcomes 4. Everything outside the event --> If A is the event, complement = A' (also written A^c) 5. Intersection of A and B = everything in BOTH A and B 6. n = intersection = AND; u = union = OR (probabilities multiply for AND only if the events are independent; they add for OR only if the events are mutually exclusive) 7. Mutually exclusive - NO intersection --> Event A happening rules out Event B

1. What is a dummy/binary/indicator variable? 2. What are the 2 types of data? 3. What is econometrics? 4. What are the 2 branches of statistics?

1. Takes values of 0 or 1 to indicate whether some condition is met 0 = no 1 = yes 2. i. Experimental = researcher manipulates the values to see the effect --> Hard to run large experiments ii. Observational = real world observations of how variables relate to each other (most of econ) 3. Econometrics = statistical analysis + economic theory 4. 2 branches: i. Descriptive stat (sample): presenting characteristics of a sample ii. Inferential stat (population): using a sample to infer properties of the population

1. What is the purpose of measures of dispersion? 2. What are the most common measures of this?

1. These statistics describe whether values are clustered together or spread out 2. Most common measures = i. *Variance and standard deviation*: --> Sample variance (S^2) = [(x1-mean)^2+(x2-mean)^2+...+(xN-mean)^2]/(N-1) --> SD (S or Sx) = sq. rt[variance] Advantages: i. Chebyshev's inequality = always have at least (1-1/(k^2)) of the sample within k SDs of mean (k being # SDs away from mean), for ANY distribution --> Within 2 SDs of mean, always have at least (1-1/(2^2)) = *3/4* of sample. that's literally it "At least this much falls within..." ii. Empirical rule (2/3-95-99.9 rule), for bell-shaped data = --> Within 1 SD = approx. 2/3 falls in; 1/6 on each tail does NOT fall into the 1 SD --> Within 2 SDs = approx. 95% falls in; 2.5% on each tail does NOT fall into the 2 SDs --> Within 3 SDs = approx. 99.7% falls in (the 2/3-95-99.9 version rounds this up to 99.9%); about 0.15% on each tail does NOT fall into 3 SDs "Approximately how much falls within..." iii. Standardizing values = measuring how many standard deviations you are from the mean (z score!) --> z = (x-mean)/standard deviation = standardized value! --> ie = 95th percentile = 95% fall below this score; you're in top 5% ii. *Range* (max - min): --> Advantages: easy to use/understand --> Disadvantages: dependent on sample size (larger sample = larger range); determined by unusual values (the outliers); not very useful statistically iii. *Interquartile range (difference between 75th + 25th percentiles of distribution)*: --> Advantages: not driven by sample size or outliers; determined by the middle 50% of the data --> Disadvantages: less convenient/easy; not very useful statistically
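To compare the two rules, this Python sketch computes the exact share of a normal distribution within k SDs (via `math.erf`, since P(|Z| < k) = erf(k/√2)) next to Chebyshev's worst-case floor:

```python
import math

rows = []
for k in (1, 2, 3):
    normal_share = math.erf(k / math.sqrt(2))   # P(|Z| < k) for an exact normal
    chebyshev_floor = 1 - 1 / k ** 2            # guaranteed for ANY distribution
    rows.append((k, round(normal_share, 4), round(chebyshev_floor, 4)))
    print(*rows[-1])
# 1 0.6827 0.0
# 2 0.9545 0.75
# 3 0.9973 0.8889
```

The exact normal values (68%, 95%, 99.7%) are what the 2/3-95-99.9 rule rounds; Chebyshev's floor is much lower because it must hold for every shape of distribution.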

Problem Set 1: Calculate the SD in x and y; covariance; correlation coefficient for the following: 10. Random sample of 7 (x,y) pairs of data points: (1,5)(3,7)(4,6)(5,8)(7,9)(3,6)(5,7) 11. Random sample of 5 (x,y) pairs of data points: (12,200)(30,600)(15,270)(24,500)(14,210) 12. From the values of x and y in (11), calculate the value of z = x + y for each observation; then calculate z bar. Confirm that z bar = x bar + y bar.

10. First find mean of x and y in order to plug into SD formula! -Mean of x = (1+3+4+5+7+3+5)/7 = 4 -Mean of y = (5+7+6+8+9+6+7)/7 = 6.857 --> SD of x = (([(1-4)^2]+[(3-4)^2]+[(4-4)^2]+[(5-4)^2]+[(7-4)^2]+[(3-4)^2]+[(5-4)^2])/6)^(1/2) = *SD of x = 1.915* --> SD of y = (([(5-6.857)^2]+[(7-6.857)^2]+[(6-6.857)^2]+[(8-6.857)^2]+[(9-6.857)^2]+[(6-6.857)^2]+[(7-6.857)^2])/6)^(1/2) = *SD of y = 1.345* --> Covariance = [sum of (xi-mean of x)(yi-mean of y)]/(n-1) = its sign shows the direction of the relationship between x and y (its size depends on the units) --> = ([(1-4)(5-6.857)]+[(3-4)(7-6.857)]+[(4-4)(6-6.857)]+[(5-4)(8-6.857)]+[(7-4)(9-6.857)]+[(3-4)(6-6.857)]+[(5-4)(7-6.857)])/6 = *covariance = 2.333* --> Correlation coefficient = covariance/(SDx*SDy) = unit-free measure of the linear relationship (1 = perfect positive; -1 = perfect negative; 0 = no linear relationship); absolute value = relationship strength --> = 2.333/(1.915*1.345) = *correlation coefficient or r = 0.906 = strong positive relationship* 11. First find mean of x and y in order to plug into SD formula! -Mean of x = (12+30+15+24+14)/5 = 19 -Mean of y = (200+600+270+500+210)/5 = 356 --> SD of x = (([(12-19)^2]+[(30-19)^2]+[(15-19)^2]+[(24-19)^2]+[(14-19)^2])/4)^(1/2) = *SD of x = 7.681* --> SD of y = (([(200-356)^2]+[(600-356)^2]+[(270-356)^2]+[(500-356)^2]+[(210-356)^2])/4)^(1/2) = *SD of y = 182.565* --> Covariance = [sum of (xi-mean of x)(yi-mean of y)]/(n-1) --> = ([(12-19)(200-356)]+[(30-19)(600-356)]+[(15-19)(270-356)]+[(24-19)(500-356)]+[(14-19)(210-356)])/4 = *covariance = 1392.5* --> Correlation coefficient = covariance/(SDx*SDy) --> = 1392.5/(7.681*182.565) = *correlation coefficient or r = 0.993 = strong positive relationship* 12. x = 12;30;15;24;14 y = 200;600;270;500;210 z = x+y = 212;630;285;524;224 z bar (mean) = (212+630+285+524+224)/5 = *z bar = 375* x bar (mean) = 19 y bar (mean) = 356 *375 = 19+356! Confirmed :)*
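A Python sketch that repeats problems 10 to 12 (`sample_cov` and `corr` are illustrative helper names). It uses the fact that cov(x, x) = var(x), so the SDs come from the same helper:

```python
import math


def sample_cov(x, y):
    """Sample covariance: sum of (xi - x bar)(yi - y bar), divided by n - 1."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def corr(x, y):
    """Correlation = cov(x, y) / (SDx * SDy); note cov(x, x) = var(x)."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))


x10 = [1, 3, 4, 5, 7, 3, 5]
y10 = [5, 7, 6, 8, 9, 6, 7]
x11 = [12, 30, 15, 24, 14]
y11 = [200, 600, 270, 500, 210]

print(round(sample_cov(x10, y10), 3), round(corr(x10, y10), 3))   # 2.333 0.906
print(round(sample_cov(x11, y11), 1), round(corr(x11, y11), 3))   # 1392.5 0.993

# 12) The mean of z = x + y equals x bar + y bar
z = [a + b for a, b in zip(x11, y11)]
print(sum(z) / len(z), sum(x11) / len(x11) + sum(y11) / len(y11))  # 375.0 375.0
```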

Problem Set 2: Part 3) Suppose that we want to study the demand function Q = a*P^b, where Q is quantity demanded, P is price, and a and b are unknown parameters. 12) Which of the following models is equivalent to Q = a * P^b and can be estimated by linear regression? A. ln(Q) = ln(a) + b * ln(P) B. ln(Q) = b * ln(a*Q) C. ln(Q) = b * (ln(a) + ln(P)) D. Q = ln(a) + b * P 13) What is the economic interpretation of b? A. Marginal utility B. Opportunity cost C. Discount factor D. Marginal cost E. Elasticity 14) According to standard demand theory, what can we predict about the sign or magnitude of b? A. Negative, but nothing more B. Positive and greater than 1 C. Positive and less than 1 D. Equal to -1 E. Between -1 and 0

12) A. ln(Q) = ln(a) + b * ln(P) --> take ln of both sides, using ln(xy) = ln(x) + ln(y) and ln(x^b) = b*ln(x) 13) E. Elasticity --> In the log-log form, b = change in ln(Q)/change in ln(P) = % change in Q/% change in P = price elasticity of demand. wow it's coming back to me smh 14) A. Negative, but nothing more --> Law of demand: quantity demanded falls when price rises, so b < 0; standard theory doesn't say whether demand is elastic (b < -1) or inelastic (-1 < b < 0)
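A Python sketch of why option A works: generate points from a demand curve Q = a*P^b with made-up values (a = 500, b = -1.5), regress ln(Q) on ln(P), and the slope recovers b while exp(intercept) recovers a:

```python
import math

# Hypothetical demand curve Q = a * P**b (a_true, b_true are made-up values)
a_true, b_true = 500.0, -1.5
prices = [1.0, 2.0, 4.0, 5.0, 8.0]
quantities = [a_true * p ** b_true for p in prices]

# Regress ln(Q) on ln(P): slope estimates b, intercept estimates ln(a)
lx = [math.log(p) for p in prices]
ly = [math.log(q) for q in quantities]
mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
slope = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
intercept = my - slope * mx
print(round(slope, 6), round(math.exp(intercept), 6))   # -1.5 500.0
```

With real, noisy data the fit would not be exact, but the slope is still read as the price elasticity.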

Problem Set 1: 13. Over 7 year period, annual percentage returns on common stocks and US treasury bills were: Stocks: 4%; 14.3%; 19%; -14.7%; -26.5%; 37.2%; 23.8% T-bills: 6.5%; 4.4%; 3.8%; 6.9%; 8%; 5.8%; 5.1% a. Calculate each average rate of return as a geometric mean. b. Calculate the correlation in the percentages (not decimals!). The correlation uses the arithmetic mean always. c. Calculate the correlation in the gross rates of return; that is, (1+ (Q%/100)).

13. a. Geometric mean = [[(1+R1)(1+R2)...(1+Rn)]^(1/n)]-1, with rates in decimal form --> Stocks(x): [[(1+0.04)(1+0.143)(1+0.19)(1-0.147)(1-0.265)(1+0.372)(1+0.238)]^(1/7)]-1 = *Avg rate of return aka geometric mean of stocks = 0.06027 = 6.03%* --> T-bills(y): [[(1+0.065)(1+0.044)(1+0.038)(1+0.069)(1+0.08)(1+0.058)(1+0.051)]^(1/7)]-1 = *Avg rate of return aka geometric mean of T-bills = 0.0577 = 5.78%* b. Correlation coefficient = covariance/(SDx*SDy) --> uses ARITHMETIC means, not the geometric means from (a)! --> Mean of x = (4+14.3+19-14.7-26.5+37.2+23.8)/7 = 8.157 --> Mean of y = (6.5+4.4+3.8+6.9+8+5.8+5.1)/7 = 5.786 i) Covariance = [sum of (xi-mean of x)(yi-mean of y)]/(n-1) = measures direction of relationship between x and y data sets--> in percentages, not decimals! --> = ([(4-8.157)(6.5-5.786)]+[(14.3-8.157)(4.4-5.786)]+[(19-8.157)(3.8-5.786)]+[(-14.7-8.157)(6.9-5.786)]+[(-26.5-8.157)(8-5.786)]+[(37.2-8.157)(5.8-5.786)]+[(23.8-8.157)(5.1-5.786)])/6 = *covariance = -24.256* ii) SDx = [sum of all ((x-mean)^2)/(N-1)]^(1/2) --> = ([((4-8.157)^2)+((14.3-8.157)^2)+((19-8.157)^2)+((-14.7-8.157)^2)+((-26.5-8.157)^2)+((37.2-8.157)^2)+((23.8-8.157)^2)]/6)^(1/2) = *SDx = 22.30* iii) SDy = [sum of all ((y-mean)^2)/(N-1)]^(1/2) --> = ([((6.5-5.786)^2)+((4.4-5.786)^2)+((3.8-5.786)^2)+((6.9-5.786)^2)+((8-5.786)^2)+((5.8-5.786)^2)+((5.1-5.786)^2)]/6)^(1/2) = *SDy = 1.471* --> Correlation coefficient aka r = -24.256/(22.30*1.471) = *r = -0.739 = relatively strong negative relationship* ---- c.
Correlation in gross rates of return = convert each percentage to a gross rate using (1+(Q%/100)) --> Stocks in gross rate of return(x) = 1.04; 1.143; 1.19; 0.853; 0.735; 1.372; 1.238 --> T-bills in gross rate of return(y) = 1.065; 1.044; 1.038; 1.069; 1.08; 1.058; 1.051 Mean of x = (1.04+1.143+1.19+0.853+0.735+1.372+1.238)/7 = *mean of x = 1.082* Mean of y = (1.065+1.044+1.038+1.069+1.08+1.058+1.051)/7 = *mean of y = 1.058* --> Covariance = [sum of all (x-mean)(y-mean)]/(N-1) --> = [((1.04-1.082)(1.065-1.058))+((1.143-1.082)(1.044-1.058))+((1.19-1.082)(1.038-1.058))+((0.853-1.082)(1.069-1.058))+((0.735-1.082)(1.08-1.058))+((1.372-1.082)(1.058-1.058))+((1.238-1.082)(1.051-1.058))]/6 = *covariance = -0.00243* --> SDx = ([sum of all ((x-mean)^2)]/(N-1))^(1/2) --> = ([((1.04-1.082)^2)+((1.143-1.082)^2)+((1.19-1.082)^2)+((0.853-1.082)^2)+((0.735-1.082)^2)+((1.372-1.082)^2)+((1.238-1.082)^2)]/6)^(1/2) = *SDx = 0.223* --> SDy = ([sum of all ((y-mean)^2)]/(N-1))^(1/2) --> = ([((1.065-1.058)^2)+((1.044-1.058)^2)+((1.038-1.058)^2)+((1.069-1.058)^2)+((1.08-1.058)^2)+((1.058-1.058)^2)+((1.051-1.058)^2)]/6)^(1/2) = *SDy = 0.0147* --> Correlation coefficient aka r = -0.00243/(0.223*0.0147) = *r = -0.741 = relatively strong negative relationship* --> Same answer as (b) apart from rounding: 1+(x/100) is a linear transformation with a positive slope, so the correlation does not change (about -0.739 in both cases when computed without rounding)
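A Python sketch of problem 13 (helper names are illustrative). It computes the geometric averages, the correlation with arithmetic means, and confirms that switching to gross returns leaves the correlation unchanged:

```python
import math
import statistics

stocks = [4, 14.3, 19, -14.7, -26.5, 37.2, 23.8]
tbills = [6.5, 4.4, 3.8, 6.9, 8, 5.8, 5.1]


def geometric_avg(pcts):
    """Average rate of return: (product of gross returns)^(1/n) - 1."""
    return math.prod(1 + p / 100 for p in pcts) ** (1 / len(pcts)) - 1


def corr(x, y):
    """Pearson correlation, built on ARITHMETIC means."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(round(geometric_avg(stocks), 4), round(geometric_avg(tbills), 4))  # 0.0603 0.0578
print(round(corr(stocks, tbills), 4))                                    # -0.7392

# c) Gross returns 1 + (Q%/100): a positive linear transformation, same correlation
gross_s = [1 + p / 100 for p in stocks]
gross_t = [1 + p / 100 for p in tbills]
print(round(corr(gross_s, gross_t), 4))                                  # -0.7392
```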

Exam 1 2016: 16) A researcher estimates the relationship of Y = B0+B1(X): https://imgur.com/gallery/Z4DOSVn a) Write out the prediction line, using the numbers in the table above. b) What fraction of the variation in Y can be explained by the model? c) According to these estimates, does X have a statistically significant effect on Y? d) Verify the calculation of the R^2 value, using other information in the output. (Showing your work is essential). 17) This question involves estimating a linear regression. N=3 x: 6; 3; 9 y: 5; 13; 9 a) Calculate B0 and B1. b) Calculate R^2 18) This question is about change of units in a regression. Suppose that we estimate the effect of temperature, measured in degrees Fahrenheit, on rainfall, measured in inches. We find Ri = 0.25 − 0.045⋅Tf. [Note: these values are not intended to be realistic.] a) If we switched to measuring temperature in degrees Celsius, using the formula Tc = 5/9 (Tf − 32), what would be our estimates of B's in the regression Ri = B0+B1(Tc)? b) If we instead switched to measuring rainfall in centimeters, using the conversion Rcm = Ri*(2.54), what would be our estimates of the B's in the regression Rcm = B0+B1(Tf)? c) Express correlation between the variables Rcm and Tc, in terms of the correlation between Ri and Tf. (You will have only an expression relating the correlation coefficients, not a numerical answer.) 19) This question uses properties of logarithms. The table below gives values of the natural logarithms for several numbers. The first five parts are worth 2 points apiece. The last two are worth 4 points each. ln(2) = 0.69; ln(3) = 1.10; ln(5) = 1.61; ln(7) = 1.95 a) ln(10) b) ln(12) c) ln(1) d) ln(81) e) ln(378) g) ln(3.15) h) Using logarithms, approximate the percentage difference between 3 and 5. Showing your work is essential.

16) a) Find y = B0+B1(x): i. B1 = the Coef. entry in the x row = -0.01488 ii. B0 = the Coef. entry in the _cons row = 5.895 iii. Prediction line: *y hat = 5.895 - 0.01488x* b) Fraction of the variation in y that can be explained/DETERMINED by model = coefficient of determination = R^2 --> *R^2 = 0.0002* c) Statistical significance = p-value i. P-value = the P>|t| entry in the x row = *0.109* ii. Analysis: *Since the p-value is 0.109, which is > 0.05, we fail to reject the null hypothesis (B1 = 0, i.e. no relationship between x and y), so no, x does NOT have a statistically significant effect on y at the 5% level* d) Verify R^2 calculation: recall that R^2 = [cov(x,y)/(SDx*SDy)]^2 = SSmodel/SStotal i. SSmodel = the SS entry in the Model row = 261.116 --> SSmodel = MSS = model (explained) sum of squares ii. SStotal = the SS entry in the Total row = 1197344.47 --> SStotal = TSS = total sum of squares iii. Solve: *261.116/1197344.47 = 0.0002 = proven!* 17) a) Recall that B1 = cov(x,y)/var(x); B0 = mean of y - B1(mean of x) cov(x,y) = [sum of all (xn-mean of x)(yn-mean of y)]/(N-1) var(x) = [sum of all (xn-mean of x)^2]/(N-1) Finding B1 = cov(x,y)/var(x) i. Find mean of x: (6+3+9)/3 = 6 ii. Find mean of y: (5+13+9)/3 = 9 iii. Find cov(x,y): [(6-6)(5-9)+(3-6)(13-9)+(9-6)(9-9)]/2 = -6 iv. Find var(x): [(6-6)^2+(3-6)^2+(9-6)^2]/2 = 9 v. Solve: cov(x,y)/var(x) = -6/9 = *B1 = -2/3 = -0.667* Finding B0 = mean of y - B1(mean of x) i. Solve: 9-(-2/3)(6) = 9+4 = *B0 = 13* b) R^2 = [cov(x,y)/(SDx*SDy)]^2 i. Cov(x,y) from (a) = -6 ii. SDx = (var(x))^0.5 = (9)^0.5 = 3 iii. SDy = [sum of all (yn-mean)^2/(N-1)]^0.5 --> [[(5-9)^2+(13-9)^2+(9-9)^2]/2]^0.5 = 4 iv. Solve: [cov(x,y)/(SDx*SDy)]^2 = [-6/(3*4)]^2 = *R^2 = 0.25* 18) *OG equation: Ri = 0.25-0.045(Tf)* a) Convert to Celsius: Tc = 5/9(Tf-32) --> find Ri = B0+B1(Tc)! i. Solve the conversion for Tf: Tf = (9/5)Tc + 32 = 1.8Tc + 32 ii. Plug into the OG equation: Ri = 0.25 - 0.045(1.8Tc + 32) iii. Simplify: Ri = 0.25 - 0.081Tc - 1.44 iv. Solved: Ri = -1.19 - 0.081Tc --> *B0 = -1.19; B1 = -0.081* b) Convert to cm: Rcm = Ri*2.54--> find Rcm = B0+B1(Tf)! i. Plug Ri into Rcm: Rcm = (0.25-0.045Tf)*2.54 ii. Solved: Rcm = 0.635-0.1143Tf --> *B0 = 0.635; B1 = -0.1143* c) Relationship between corr(Rcm,Tc) and corr(Ri,Tf) = ? Unit changes are linear transformations with positive slopes (Rcm = 2.54Ri; Tc = 5/9(Tf-32)), and corr(a+bx, c+dy) = corr(x,y) whenever b, d > 0 --> unit changes are cosmetic and do not alter the statistical relationship! Thus, *corr(Rcm,Tc) = corr(Ri,Tf)* 19) Given: ln(2) = 0.69; ln(3) = 1.10; ln(5) = 1.61; ln(7) = 1.95 --> Applying 3 age old properties of logarithms! (ln(ab) = ln(a)+ln(b); ln(a/b) = ln(a)-ln(b); ln(a^n) = n*ln(a)) a) ln(10) = ln(5*2) = ln(5)+ln(2) = 1.61+0.69 = 2.3 b) ln(12) = ln(3*2^2) = ln(3)+(ln(2)*2) = 1.1+(0.69*2) = 2.48 c) ln(1) = 0 d) ln(81) = ln(3^4) = ln(3)*4 = 1.1*4 = 4.4 e) ln(378) = ln(7*54) = ln(7*6*9) = ln(7*3*2*3^2) = ln(7)+ln(3)+ln(2)+(ln(3)*2) = 1.95+1.1+0.69+(1.1*2) = 5.94 g) ln(3.15) = ln(315/100) = ln(63/20) = ln((7*3^2)/(5*2^2)) = [ln(7)+(ln(3)*2)]-[ln(5)+(ln(2)*2)] = (1.95+(1.1*2))-(1.61+(0.69*2)) = 1.16 h) Using logarithms, approximate the percentage difference between 3 and 5. --> APPROXIMATE, so use the given logarithms! i. % difference using logarithms ~ change in ln(x) (times 100) ii. Solve: ln(5)-ln(3) = 1.61-1.1 = *0.51 = ~51% difference between 3 and 5* (the log approximation is rough for a gap this big: exactly, 5 is 66.7% larger than 3)
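A Python sketch that checks 17), and shows that the unit-change rule in 18a) holds for any data set, not just the given line. The temperature and rainfall numbers in the second part are made up purely for the check; `fit_line` is an illustrative helper:

```python
import math


def fit_line(x, y):
    """OLS for one regressor; returns (b0, b1, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    b1 = sxy / sxx
    return my - b1 * mx, b1, sxy ** 2 / (sxx * syy)


# 17) The N = 3 example
b0, b1, r2 = fit_line([6, 3, 9], [5, 13, 9])
print(round(b0, 4), round(b1, 4), round(r2, 4))   # 13.0 -0.6667 0.25

# 18a) Refit on Celsius: new slope = 1.8*old slope, new intercept = old b0 + 32*old b1
tf = [20.0, 50.0, 68.0, 86.0, 95.0]   # hypothetical temperatures (F)
ri = [-0.5, -2.0, -2.6, -3.7, -4.0]   # hypothetical rainfall (inches)
tc = [5 / 9 * (t - 32) for t in tf]
f0, f1, rf = fit_line(tf, ri)
c0, c1, rc = fit_line(tc, ri)
print(math.isclose(c1, 1.8 * f1), math.isclose(c0, f0 + 32 * f1), math.isclose(rf, rc))
# True True True

# 18b) Rainfall in cm: both coefficients scale by 2.54
print(round(0.25 * 2.54, 4), round(-0.045 * 2.54, 4))   # 0.635 -0.1143

# 19h) Log difference vs exact percentage difference between 3 and 5
print(round(math.log(5) - math.log(3), 3), round(5 / 3 - 1, 3))   # 0.511 0.667
```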

Problem Set 1: 7. A sample of data has a mean of 115 and a variance of 25. a. Use Chebyshev's theorem to determine the minimum proportion of observations between 100 and 130. b. Use the empirical rule to find the approximate proportion of observations between 110 and 125. *(for the empirical rule, assume the data are roughly normal/bell-shaped; Chebyshev works for any distribution)* 8. The mean of a sample = 350, and its standard deviation = 20. Approximately what proportion of observations is in the interval between: a. 290 and 410? b. 310 and 390? 9. The mean of a sample is 650, and its variance is 625. Approximately what proportion of the observations is: a. greater than 625? b. less than 650? c. less than 700? d. between 625 and 700?

7) a. Chebyshev(AT LEAST) = at least 1-(1/(k^2)) of the sample will fall within k SDs of mean (mean +/- k SDs) --> k = # of standard deviations from mean --> e.g., k = 2: at least 75% of the sample falls within 2 SDs of the mean i. SD = sqrt(25) = 5; mean = 115 ii. at least 75% of sample will fall within (115-(2*5)) and (115+(2*5))--> at least 75% of sample will fall within 105-125 iii. 100 and 130 are both 3 SDs from the mean! (115 +/- 3*5) iv. So at least 1-(1/3^2) = 1-(1/9) = *at least 88.89% of sample will fall within 100-130* -- b. Empirical rule(APPROX) AKA 2/3-95-99.9 rule (68-95-99.7 with exact normal values) = --> 1 SD = approx 66.67% of sample will fall within --> 2 SDs = approx 95% of sample fall within --> 3 SDs = approx 99.9% of sample will fall within i. SD = 5; mean = 115 ii. 110 = 1 SD below 115; 125 = 2 SDs above 115 iii. 1 SD = 66.67%; 2 SDs = 95% and the area is split symmetrically by mean! --> Find area between 110 and 115 (within 1 SD, below the mean) = 66.67/2 = 33.34% --> Find area between 115 and 125 (within 2 SDs, above the mean) = 95/2 = 47.5% iv. add together to find total area that falls within 110-125--> 33.34%+47.5% = *approx 80.84% of data falls within 110-125!* 8) Mean = 350; SD = 20--> approx = empirical rule! a. 290-410 --> z score aka how many SDs away from mean = (x-mean)/SD = (290-350)/20 = 3 SDs below 350 = (410-350)/20 = 3 SDs above 350 --> *Approx 99.9% of data falls within 3 SDs of mean* (99.7% for an exact normal) b. 310-390 --> z score aka how many SDs away from mean = (x-mean)/SD = (310-350)/20 = 2 SDs below 350 = (390-350)/20 = 2 SDs above 350 --> *Approx 95% of data falls within 2 SDs of mean* 9. Mean = 650; variance = 625--> approx = empirical rule! --> SD = sqrt(625) = 25 a.
greater than 625 (left of mean)--> find z score aka how many SDs away from mean value is = (x-mean)/SD --> Since 625 is left of mean, add area between 625 and the mean + all area to the right of mean (50%) --> (625-650)/25 = -1 = 1 SD below the mean; approx (66.67/2) = 33.34% within 1 SD to the left of mean --> 33.34% + 50% (all that is greater than mean itself) = *83.34% of data = greater than 625* b. less than 650 = just asking for area of sample to left of mean = *50%* c. less than 700 (right of mean)--> find z score aka how many SDs away from mean value is = (x-mean)/SD --> Since 700 is right of mean, add area between the mean and 700 + all area to left of mean (50%) --> (700-650)/25 = 2 SDs above the mean; approx (95/2) = 47.5% within 2 SDs to the right of mean --> 47.5% + 50% (all that is less/to left of mean) = *97.5% of data = less than 700* d. between 625-700--> find z scores aka how many SDs away from mean values are = (x-mean)/SD --> (625-650)/25 = 1 SD less than mean = (66.67%/2) = approx 33.34% of data fall within this section --> (700-650)/25 = 2 SDs greater than mean = (95%/2) = approx 47.5% of data fall within this section --> Find sum to get proportion = 33.34% + 47.5% = *80.84% of data = between 625-700*
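For comparison, here is Problem 9 under an exact normal model, using `statistics.NormalDist` (Python 3.8+). This assumes the data really are normal; the answers above use the rounder 2/3-95 rule, so they differ slightly:

```python
from statistics import NormalDist

# Problem 9, ASSUMING an exact normal distribution with mean 650 and SD 25
dist = NormalDist(mu=650, sigma=25)
a = 1 - dist.cdf(625)              # greater than 625
b = dist.cdf(650)                  # less than 650
c = dist.cdf(700)                  # less than 700
d = dist.cdf(700) - dist.cdf(625)  # between 625 and 700
print([round(p, 4) for p in (a, b, c, d)])   # [0.8413, 0.5, 0.9772, 0.8186]
```

The 2/3-95 rule gives 83.3%, 50%, 97.5% and 80.8%, close to these exact normal values.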

Problem Set 2: Part 1) Suppose a dataset has N=5 individuals, with these values of Xi and Yi:
Xi: 12; 8; 10; 3; 12
Yi: 12; 3.6; 9.6; 3.7; 6.5
1) Calculate b0 and b1
2) Calculate the predicted value y hat for each individual
3) Calculate the residual u hat for each individual
4) Calculate the R^2 value
5) Suppose that, instead of using variable X, we used V = 100X as the explanatory variable. How does it change the estimates of b0 and b1? (Answer this question using properties of regression coefficients, not by recalculating the b's)
6) Suppose that, instead of using variable X, we used W = X+10 as the explanatory variable. How does it change the estimates of b0 and b1? (Again, answer using properties of regression coefficients)
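Parts 5 and 6 hinge on how the least-squares formulas respond to rescaling and shifting X. A short derivation, using b1 = cov(x,y)/var(x) and b0 = mean of y - b1*(mean of x); the constants c and a are generic symbols introduced here (c = 100 and a = 10 in this problem):

```latex
\begin{aligned}
&\text{Rescale: } V = cX,\quad c \neq 0 \\
&\operatorname{cov}(V,y) = c\,\operatorname{cov}(X,y),\qquad \operatorname{var}(V) = c^2\,\operatorname{var}(X),\qquad \bar V = c\,\bar X \\
&b_1^{V} = \frac{c\,\operatorname{cov}(X,y)}{c^2\,\operatorname{var}(X)} = \frac{b_1}{c},\qquad
 b_0^{V} = \bar y - \frac{b_1}{c}\,c\,\bar X = b_0 \\[6pt]
&\text{Shift: } W = X + a \\
&\operatorname{cov}(W,y) = \operatorname{cov}(X,y),\qquad \operatorname{var}(W) = \operatorname{var}(X),\qquad \bar W = \bar X + a \\
&b_1^{W} = b_1,\qquad b_0^{W} = \bar y - b_1(\bar X + a) = b_0 - a\,b_1
\end{aligned}
```

With c = 100 the slope becomes b1/100 and the intercept is unchanged; with a = 10 the slope is unchanged and the intercept falls by 10*b1. In both cases the fitted values equal b0 + b1*X exactly, so the residuals and R^2 do not change.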

Part 1) Remember that the least sq. regression line --> y hat = b0 + b1(x) --> b1 = cov(x,y)/var(x) (var(x) CANNOT = 0, i.e. x must vary) and b0 = mean of y - b1*(mean of x)
REMEMBER: y=mx+b --> y=b0+b1(x), so m = b1 and b = b0
_________________________________________________________________________________________________
1) Xi: 12; 8; 10; 3; 12
Yi: 12; 3.6; 9.6; 3.7; 6.5
--> b1 = [(N*(sum of all indv. x*y))-(sum of x*sum of y)]/[(N*sum of all indv. x^2)-((sum of x)^2)]
--> b0 = [(sum of y)-b1(sum of x)]/N
i. Sum of x = 45
ii. Sum of y = 35.4
iii. Sum of all indv. (x^2) = (12^2)+(8^2)+(10^2)+(3^2)+(12^2) = 461
iv. Sum of all indv. (xy) = (12*12)+(8*3.6)+(10*9.6)+(3*3.7)+(12*6.5) = 357.9
v. b1 = [(5*(357.9))-(45*35.4)]/[(5*461)-((45)^2)] = 196.5/280 = *b1 or m = 0.702*
vi. b0 = [(35.4)-0.702(45)]/5 = *b0 or b = 0.762* (using the rounded b1 = 0.702; carrying b1 = 0.70179 unrounded gives b0 = 0.764)
vii. *y hat = 0.762 + 0.702x*
_________________________________________________________________________________________________
2) Our least sq. regression line for this data set --> y hat = 0.762 + 0.702x --> plug each Xi into the equation to find y hat :-)
Xi: 12; 8; 10; 3; 12
Yi: 12; 3.6; 9.6; 3.7; 6.5
y hat (predicted): 9.186; 6.378; 7.782; 2.868; 9.186
_________________________________________________________________________________________________
3) u = unobservables aka the error term --> how an observed y differs from the true (population) regression line; we never observe u directly
--> Residual (u hat) = how an observed y differs from the fitted (sample) regression line
In this case we're looking for the residual!
--> u hat = yo - yp = original value - predicted value --> original value = given data set; predicted value = value predicted from regression line
Xi: 12; 8; 10; 3; 12
Yi: 12; 3.6; 9.6; 3.7; 6.5
y hat (predicted): 9.186; 6.378; 7.782; 2.868; 9.186
*u hat = Yi - y hat: 2.814; -2.778; 1.818; 0.832; -2.686* (residuals from a least sq. line always sum to approx 0)
_________________________________________________________________________________________________
4) y hat = 0.762 + 0.702x
R^2 value = r^2 = [cov(x,y)/(SDx*SDy)]^2 (true for a regression with one explanatory variable)
i. Cov = (sum of all (x-mean x)(y-mean y))/(n-1)
--> Mean x = (12+8+10+3+12)/5 = 9
--> Mean y = (12+3.6+9.6+3.7+6.5)/5 = 7.08
--> Cov = ([(12-9)(12-7.08)]+[(8-9)(3.6-7.08)]+[(10-9)(9.6-7.08)]+[(3-9)(3.7-7.08)]+[(12-9)(6.5-7.08)])/4 = 39.3/4 = *cov = 9.825*
ii. SDx = ((sum of all (x-mean x)^2)/(n-1))^0.5 --> SDx = (([(12-9)^2]+[(8-9)^2]+[(10-9)^2]+[(3-9)^2]+[(12-9)^2])/4)^0.5 = (56/4)^0.5 = *SDx = 3.742*
iii. SDy = ((sum of all (y-mean y)^2)/(n-1))^0.5 --> SDy = (([(12-7.08)^2]+[(3.6-7.08)^2]+[(9.6-7.08)^2]+[(3.7-7.08)^2]+[(6.5-7.08)^2])/4)^0.5 = (54.428/4)^0.5 = *SDy = 3.689*
iv. r = 9.825/(3.742*3.689) = *r = 0.712*; moderately strong positive correlation
v. *R^2 = r^2 = approx 0.507* --> about 51% of the variation in y is explained by x
_________________________________________________________________________________________________
5) I think your steps are clear and your calculations are correct. However, this question asks you to use properties of regression coefficients instead of recalculating b. My answer for this question would be 'the constant term remains the same but the slope coefficient will be divided by 100'. Try opening a separate Excel file and plotting the 5 data points before and after you multiply X by 100. Note that rescaling X changes the estimates in exactly this way; any small difference between your recalculated b1_hat and b0_hat and the values b1_hat/100 and b0_hat comes from rounding.
Xi --> V=100X; new b0 and b1?
V: 1200; 800; 1000; 300; 1200
Yi: 12; 3.6; 9.6; 3.7; 6.5
OG regression line: *y hat = 0.762 + 0.702x*
--> b1 = divided by 100 = 0.00702
--> b0 = the same = 0.762
*Remember that when x*100, b1/100!*
*Double check:*
--> b1 = [(N*(sum of all indv. V*y))-(sum of V*sum of y)]/[(N*sum of all indv. V^2)-((sum of V)^2)]
--> b0 = [(sum of y)-b1(sum of V)]/N
i. Sum of V = 4500
ii. Sum of y = 35.4
iii. Sum of all indv. (V^2) = (1200^2)+(800^2)+(1000^2)+(300^2)+(1200^2) = 4610000
iv. Sum of all indv. (Vy) = (1200*12)+(800*3.6)+(1000*9.6)+(300*3.7)+(1200*6.5) = 35790
v. b1 = [(5*(35790))-(4500*35.4)]/[(5*4610000)-((4500)^2)] = 19650/2800000 = *b1 or m = 0.007018*
vi. b0 = [(35.4)-0.007018(4500)]/5 = *b0 or b = 0.7638*
vii. *y hat = 0.7638 + 0.007018V* --> matches up to rounding (0.7638 vs 0.762 only reflects rounding b1 to 0.702 in part 1)
_________________________________________________________________________________________________
6) Similar to the question above, my intuitive answer would be 'The slope term remains the same. The constant term is reduced by 10*b1_hat'. Intuitively, the regression line is shifted 10 units to the right.
Xi --> W=X+10; new b0 and b1?
W: 22; 18; 20; 13; 22
Yi: 12; 3.6; 9.6; 3.7; 6.5
OG regression line: *y hat = 0.762 + 0.702x*
--> b1 = the same = 0.702
--> b0 = reduced by 10*b1 = 0.762 - 7.02 = -6.258
*Remember that when x+10, b0-(b1*10)*
*Double check:*
--> b1 = [(N*(sum of all indv. W*y))-(sum of W*sum of y)]/[(N*sum of all indv. W^2)-((sum of W)^2)]
--> b0 = [(sum of y)-b1(sum of W)]/N
i. Sum of W = 95
ii. Sum of y = 35.4
iii. Sum of all indv. W^2 = (22^2)+(18^2)+(20^2)+(13^2)+(22^2) = 1861
iv. Sum of all indv. Wy = (22*12)+(18*3.6)+(20*9.6)+(13*3.7)+(22*6.5) = 711.9
v. b1 = [(5*(711.9))-(95*35.4)]/[(5*1861)-((95)^2)] = 196.5/280 = *b1 or m = 0.7018*
vi. b0 = [(35.4)-0.7018(95)]/5 = *b0 or b = -6.254*
vii. *y hat = -6.254 + 0.7018W* --> matches up to rounding (-6.254 vs -6.258 only reflects rounding b1 to 0.702 in part 1)
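A short Python sketch that reproduces parts 1 to 6 without hand rounding (standard library only; `ols` and `r_squared` are hypothetical helper names, not from any library). It uses the deviation form b1 = cov(x,y)/var(x), which is algebraically equivalent to the N*(sum of xy) formula used above, and R^2 = 1 - SSR/SST, which equals r^2 when there is a single explanatory variable.

```python
from statistics import fmean

def ols(x, y):
    """Simple least-squares fit y = b0 + b1*x via b1 = cov(x,y)/var(x), b0 = mean(y) - b1*mean(x)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance; the slope is undefined")
    b1 = sxy / sxx  # the (n - 1) divisors in cov and var cancel out
    b0 = my - b1 * mx
    return b0, b1

def r_squared(x, y, b0, b1):
    """R^2 = 1 - SSR/SST for the fitted line b0 + b1*x."""
    my = fmean(y)
    y_hat = [b0 + b1 * xi for xi in x]
    ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # sum of squared residuals
    sst = sum((yi - my) ** 2 for yi in y)                  # total sum of squares
    return 1 - ssr / sst

X = [12, 8, 10, 3, 12]
Y = [12, 3.6, 9.6, 3.7, 6.5]

# Parts 1-4: coefficients, fitted values, residuals, R^2
b0, b1 = ols(X, Y)
y_hat = [b0 + b1 * xi for xi in X]
u_hat = [yi - yh for yi, yh in zip(Y, y_hat)]
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print("fitted:", [round(v, 3) for v in y_hat])
print("residuals:", [round(v, 3) for v in u_hat])
print(f"R^2 = {r_squared(X, Y, b0, b1):.4f}")

# Part 5: V = 100X -> slope divided by 100, intercept unchanged
b0_v, b1_v = ols([100 * xi for xi in X], Y)
# Part 6: W = X + 10 -> slope unchanged, intercept lowered by 10*b1
b0_w, b1_w = ols([xi + 10 for xi in X], Y)
print(f"V: b0 = {b0_v:.4f}, b1 = {b1_v:.6f}")
print(f"W: b0 = {b0_w:.4f}, b1 = {b1_w:.4f}")
```

Because nothing is rounded until printing, this gives b0 of about 0.764 rather than 0.762, and the fitted values and residuals differ from the hand results only in the third decimal. The rescaled and shifted fits confirm the part 5 and part 6 properties to floating-point precision.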

What are the 3 types of means within classical mathematics?

i. *Arithmetic mean* = sum of all observations/# of observations
--> Doesn't always work well --> e.g.
-to beach = 75 mph (150 mi distance; 2 hours)
-from beach = 25 mph (150 mi distance; 6 hours)
--> Arithmetic mean of the two speeds = (75+25)/2 = 50 mph, but total distance = 300 miles in 8 hours --> 300/8 = *actual average speed is 37.5 mph* --> this is actually the harmonic mean!
ii. *Geometric mean* = (product of all observations)^(1/# of observations)
--> For growth rates (R = decimal): R bar = [(1+R1)(1+R2)...(1+Rn)]^(1/n) - 1
--> $100(1+R1)(1+R2)...(1+Rn) = $100(1+R bar)^n
--> Rn = decimal form; R bar = average rate of return; n = # of observations
--> *1+R bar = nth root of [(1+R1)(1+R2)...(1+Rn)]*
--> e.g. I'm saving for college. I put $100 into a mutual fund with a variable rate of return. I gain 20% in year 1 and lose 20% in year 2. What is the average rate of return?
--> Year 1 = +20% (100 + (0.2x100) = now have $120 in account)
--> Year 2 = -20% (120 - (0.2x120) = now have $96 in account)
--> $100(1+0.2)(1-0.2) = $100(1+R bar)(1+R bar) --> 0.96 = (1+R bar)^2 --> 1+R bar = 0.96^0.5 = 0.9798 --> *R bar = approx -2.02% per year* (the arithmetic mean of +20% and -20% would wrongly suggest 0%)
iii. *Harmonic mean* = take the reciprocal of each value, average those reciprocals, then take the reciprocal of that average
--> [(1/X1 + 1/X2 + ...1/Xn)/N]^(-1) --> N = # observations; X = each value within N
--> Not uncommon; used for averaging ratios/rates (e.g. fuel efficiency, CPI for inflation, etc.)
--> e.g. to beach = 75 mph (150 mi distance; 2 hours); from beach = 25 mph (150 mi distance; 6 hours)
--> Harmonic mean = [(1/75 + 1/25)/2]^(-1) = *37.5 mph* (works here because each leg covers the same distance)
--> e.g. Car A = 20 miles per gallon; Car B = 40 miles per gallon. Average miles/gallon = ?
--> I drive each car for 80 miles --> Car A uses 4 gallons, Car B uses 2 gallons --> 160 miles total on 6 gallons total --> *avg miles/gallon = 26.67 mpg*
OR using harmonic mean: [(1/20 + 1/40)/2]^(-1) = *26.67 mpg!*
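Python's standard `statistics` module has all three means built in (`fmean`, `geometric_mean`, and `harmonic_mean`, all available from Python 3.8), so the worked examples above can be checked directly. A minimal sketch, with the variable names chosen here for illustration:

```python
from statistics import fmean, geometric_mean, harmonic_mean

# Beach trip: 150 miles each way at 75 mph and 25 mph
speeds = [75, 25]
print(fmean(speeds))                 # arithmetic mean: misleading for this question
print(harmonic_mean(speeds))         # equal distances -> harmonic mean is the true average speed
print(300 / (150 / 75 + 150 / 25))   # direct check: total distance / total time

# Mutual fund: +20% in year 1, -20% in year 2
growth_factors = [1.20, 0.80]        # use (1 + R), not R itself
r_bar = geometric_mean(growth_factors) - 1
print(f"average rate of return: {r_bar:.2%}")

# Fuel efficiency: drive each car 80 miles
mpg = [20, 40]
print(harmonic_mean(mpg))
print(160 / (80 / 20 + 80 / 40))     # direct check: total miles / total gallons
```

`geometric_mean` is applied to the growth factors (1 + R) rather than the rates themselves, because it requires positive values and a rate like -0.20 is negative.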
