ECON 334 R code

What is Pr(T>−1)?

1-pt(-1,df=20)

Attenuation bias

A form of bias in which the estimated coefficient is closer to zero than it should be. It arises when x is measured with error, e.g., someone reports 13 years of education when it isn't actually 13.

A variable formula

Intercept parameter: a = mean(Y) - b*mean(X)

Does yi=β0+β1x^2i+ui violate the linearity assumption?

No, it does not. Let x*i = x^2i; then we have the model yi = β0 + β1x*i + ui, which looks just like our simple linear regression model. Linearity refers to the parameters, not the variables.

SSR DF & SST DF

SSR DF = n - k -1 SST DF = n - 1

B variable formula

Slope parameter: b = cov(data$Y, data$X) / var(data$X)
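A minimal R sketch putting the slope and intercept formulas together; the data frame and column names (data, X, Y) are placeholders:
b = cov(data$Y, data$X) / var(data$X)  # slope
a = mean(data$Y) - b*mean(data$X)      # intercept
coef(lm(Y ~ X, data = data))           # should match a and b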

The variance of an estimator is the same as what...

The square of its standard error: Var(β̂) = SE(β̂)^2

Sampling error

The difference between the sample and population mean.

SST of x1 and addition of variables

Unaffected by adding more variables

Variance for a linear regression model formula

Var(β̂1) = σ²/SSTx, where σ̂² = SSR/DF
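A sketch of computing these pieces in R for a simple regression; mod, data, X, and Y are placeholder names:
mod = lm(Y ~ X, data = data)
sigma2hat = sum(mod$residuals^2) / mod$df.residual  # SSR/DF
SSTx = sum((data$X - mean(data$X))^2)
sqrt(sigma2hat / SSTx)  # should match the slope SE in summary(mod)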

Without using the cor function in R, What is the correlation of CEO salary and stock return?

covsalret = sum((data$salary-avesal) * (data$return-avereturn))/(n-1)
varret = sum((data$return-avereturn)^2)/(n-1)
corsalret = covsalret/sqrt(varsal*varret)

How to make a single line plot?

### Income = 500 + 200*education - 5*education^2 for education = 0:20
educ = 0:20
income = 500 + 200*educ - 5*educ^2
plot(educ, income, type='l', main="Relationship Between Income and Education", xlab="Education", ylab="Income", xlim=c(0,20), ylim=c(0,2500))

Percentage change formula

( (5.6-6.4)/6.4)*100

F-Stat

F = ((R2un - R2r)/q) / ((1-R2un)/(n-k-1))
q = number of restrictions, k = number of variables, un = unrestricted equation, r = restricted equation
Will always be positive; the F-stat is a function of the SSR; the test is one-tailed
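As an R sketch, with R2un and R2r pulled from the unrestricted and restricted fits, and q, n, k filled in for your own model:
Fstat = ((R2un - R2r)/q) / ((1 - R2un)/(n - k - 1))
1 - pf(Fstat, q, n - k - 1)  # p-value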

Range for F-distribution and Chi-squared distribution

(0,+∞)

P-Value for F-test

1 - pf(F, q, n-k-1). F = the F statistic, pf = the CDF of the F distribution in R, q = number of restrictions

We would like to test a hypothesis about the relationship between β1 and β3. To test this hypothesis we re-formulate the hypothesis using a convenience parameter θ, where we can test the hypothesis with the null H0 : θ = 0. If the original null hypothesis is H0 : β1 = β3, what is the re-formulated model that would allow us to test this hypothesis by testing θ? (a) y = β0 + θx1 + β2x2 + β3(x3 + x1) + β4x4 + u (b) y = β0 + θx1 + β2x2 + β3(x3 − (1/5)x1) + β4x4 + u (c) y = β0 + θx1 + β2x2 + β3(x3 + (5)x1) + β4x4 + u (d) y = β0 + θx1 + β2x2 + β3(x3 − x1) + β4x4 + u (e) None of these are correct

(a) y = β0 + θx1 + β2x2 + β3(x3 + x1) + β4x4 + u

Range for normal and t distribution.

(−∞,+∞)

Which of the following can cause the usual OLS t statistics to be invalid (that is, not to have t distributions when doing hypothesis testing)? Explain. Heteroskedasticity. A sample correlation coefficient of .95 between two independent variables that are in the model. Omitting an important explanatory variable.

1 and 3 generally cause the t statistic not to have a t-distribution under H0. Homoskedasticity is one of the LRM assumptions and was used in deriving the variance of the estimator. As we will see later, we will need to correct the estimator of the variance when we have heteroskedasticity. Omitting an important explanatory variable violates the assumption of mean independence and causes our estimator to be biased. With a severely biased estimator, the mean of the sampling distribution will not be what we think it is, and the usual t distribution will be invalid. The LRM assumptions contain no mention of the sample correlations among independent variables, except to rule out the case where the correlation is one.

All of the Gauss-Markov assumptions

1. Linear in parameters (a squared or multiplied parameter is a no-no, but a variable like "educ" can be squared)
2. Random sampling (each individual is equally likely to be picked when taking the sample; all of our data points come from the same population process)
3. Sampling variability in x (cannot estimate the causal effect of pot smoking on wages if nobody in the data smoked pot)
4. Mean independence: E(u | x) = E(u) = 0
5. Homoskedasticity in errors (the error variance stays constant, doesn't fan out towards the end BECAUSE of x): Var(ui | xi) = var(ui) = sigma^2, A CONSTANT. Also no perfect collinearity: you cannot predict x2 100% with x1.
6. u ~ N(0, sigma^2). Kinda interchangeable with the central limit theorem for large n. Note: a linear function of normal RVs is also a normal RV.

Given the information above, how many percentage points has the unemployment rate increased? 6.4% to 7.5%

1.1 PP

use the difference in natural logs to find the approximate percentage difference

100*(log(42000)-log(35000))

Suppose the unemployment rate in the United States goes from 6.4% in one year to 7.5% in the next. By what percent has the unemployment rate increased?

17.2%

Z = 2x. What is var(Z)?

2^2 var(x)

math10^ = 32.14 - 0.319 lnchprg. What is the interpretation of the coefficient on lnchprg?

For each 10 percentage point increase in the share of students eligible for the lunch program, the predicted math pass rate falls by about 3.2 percentage points.

What is the constant elasticity model?

A log-log model

Covariance

A measure of linear association between two variables. Positive values indicate a positive relationship; negative values indicate a negative relationship. Cov = [(u1 - umean)*(v1 - vmean) + ... + (un - umean)*(vn - vmean)]/(n-1)

When the degrees of freedom for a t-distribution equals ∞ its probability density function is exactly equal to another distribution. What is the name of that distribution?

A normal distribution with mean 0 and variance 1

in a log-log model ln(rd)=−4.105+1.076ln(sales) Explain what a 1% increase in sales does to rd

A one percent increase in sales is estimated to increase rd by about 1.08%.

Multicollinearity

A situation in which several independent variables are highly correlated with each other. This characteristic can result in difficulty estimating separate or independent regression coefficients for the correlated variables. While two variables might explain a lot of the R2, if they are highly correlated least squares will have trouble estimating their true parameters, and they will likely be statistically insignificant together even though each would be significant on its own. THEY WILL HAVE HUGE SEs

confounding factors

A third variable that can affect the relationship between the dependent and independent variable. It might not be known what this variable is exactly, and there might be many such variables.

If you assume Ui comes from a normal distribution then B1^...

It ALSO COMES FROM a normal distribution, bruh (a linear function of a normal RV is normal)

How can it be that the R-squared is smaller when the variable 𝑎𝑔𝑒 is added to the equation?

An important fact about R2 is that it never decreases, and it usually increases, when another independent variable is added to a regression and the same set of observations is used for both regressions. However, if two regressions use different sets of observations, we generally cannot tell how the R-squareds will compare; it could go either way. Here, the sets of observations in the two equations are different (n = 142 for the first equation and n = 99 for the second). That is why the R-squared can be smaller when the variable age is added in the second equation.

the variance of a t-distribution does depend on the degrees of freedom through the formula DF/(DF - 2).

As the degrees of freedom increases, the variance of the t-distribution decreases.
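A quick simulation check of the DF/(DF-2) formula, a sketch using R's rt():
df = 10
df/(df - 2)            # theoretical variance: 1.25
var(rt(1e6, df = df))  # simulated variance, close to 1.25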

consistent estimator

As you increase the size of a random sample, the values of the estimator get closer and closer to the parameter value. An estimator can be consistent while being biased.

Should PC and noPC both be included as independent variables in the model?

Bruh, no, it makes no sense: we cannot hold noPC fixed while changing PC. We have only two groups based on PC ownership, so in addition to the overall intercept we need only include one dummy variable. If we try to include both along with an intercept, we have perfect multicollinearity (the dummy variable trap).

Why are the standard errors smaller in the second regression than the first regression?

The second regression could have a bigger sample, so the SST in the denominator is bigger; even if sigma^2 goes up, it doesn't matter that much.

To be unbiased

The mean of the estimator's sampling distribution equals the true population parameter (not merely close to it in any one sample).

What else is R2 called?

Coefficient of determination. R2 is the fraction of the variance of Y explained by the regression.

As K increases...

DF decreases

estimated variance/covariance matrix

Diagonals are estimated variances; off-diagonals are estimated covariances. The SE is the square root of the corresponding variance diagonal.

Suppose some of my β̂1, β̂2, ... are zero; for example, suppose β̂1 was zero. Explain why dropping x1, and all of the other regressors that have zero coefficients, from the regression will decrease the variance of β̂K only through this R2 component, being sure to argue why doing so would have no effect on the other parts of the variance formula.

Dropping regressors that have zero coefficients will lower R2, which will, in turn, decrease the variance of β̂K. Additionally, regressors that do not help to explain y do not change var(u) when they are folded into the error term. The SST part of the variance formula doesn't change when we add or remove variables.

Dummy (indicator) variables

Dummy (indicator) variable trap - these variables are collinear, so you must omit one "base" group; otherwise you violate the no-perfect-collinearity assumption - Effects are relative to the omitted group

In a study relating college grade point average to time spent in various activities, students are asked how many hours they spend each week in four activities: studying, sleeping, working, and leisure. Any activity is put into one of the four categories, so that for each student, the sum of hours in the four activities must be 168. To study the effect of time use on GPA, the following model is estimated: GPA = β0 + β1leisure + β2study + β3work + u. Which of the following statements about this model are correct (mark all that are correct): (a) β0 is the predicted value of sleep time when all other variables are 0 (b) β3 is the change in GPA for each additional hour of work (c) The variables leisure, study, work are perfectly collinear (d) β3 is the effect work has on GPA when controlling for leisure and study but not sleep (e) None of these statements are correct

(e) None of these statements are correct

Zero conditional mean of errors

E(u | x) = 0; cov(ui, xi) = 0

mean independence

Mean independence holds when E(U | A) = E(U). When it is violated, E(U | A) ≠ E(U): we can use our known variable to predict the other variables in our error term, U.

Consider the following simple regression model: y = β0 + β1x + u, where E(u) = 0, but E(u|x) ≠ E(u). Suppose z is an instrument for x. Which of the following conditions denotes the exogeneity assumption for z?

E(u|z) = E(u)

Endogeneity

The error ui and the regressor roofi are determined at the same time, creating a problem distinguishing cause and effect; the regressor is endogenous.

Residuals are called

Prediction errors (the sample counterparts of the unobserved errors)

At what value of exper does additional experience actually lower predicted ln(wage)?

exper* = b_exper / (2|b_expersq|), i.e., the coefficient on exper divided by twice the (absolute value of the) coefficient on exper^2

Why is Pr(F<−1)=0?

The F distribution only takes non-negative values

Are educ and age jointly significant in the original equation at the 5% level? Justify your answer.

F-stat = ((R2un-R2r)/q) / ((1-R2un)/(n-k-1))
FCV = qf(1-.05, q, n-k-1)
pval = 1-pf(F, q, n-k-1)
R2r = summary(modD)$r.squared
R2u = summary(modC)$r.squared
dendf = modC$df.residual
numdf = 2
Fstat = ((R2u-R2r)/numdf)/((1-R2u)/dendf)
Fstat
pval = 1-pf(Fstat,numdf,dendf)

The sample average is the best predictor of a random variable only if the random variable you are predicting is normally distributed?

FALSE

Consider the two models. Model A: y = β0 + β1x1 + uA. Model B: y = β0 + β1x1 + β2x2 + uB. Let σ²A = Var(uA) and σ²B = Var(uB). If β1 = 0 and β2 = 0, then there is not enough information to determine if σ²A < σ²B or if σ²A > σ²B.

FALSE. Since β1 = 0 and β2 = 0, neither x1 nor x2 explains any of y, so uA = uB and σ²A = σ²B: we do have enough information, the two variances are equal.

How do you focus only on females?

The dummy female equals 1 for females and 0 otherwise, so subset on it: mean(data$EXAMPLE[data$female==1])

Does it seem likely that a firm's decision to train its workers will be independent of worker characteristics? What are some of those measurable and unmeasurable worker characteristics?

Firms are likely to choose job training depending on the characteristics of workers. Some observed characteristics are years of schooling, years in the workforce, and experience in a particular job. Firms might even discriminate based on age, gender, or race. Perhaps firms choose to offer training to more or less able workers, where 'ability' might be difficult to quantify but where a manager has some idea about the relative abilities of different employees. Moreover, different kinds of workers might be attracted to firms that offer more job training on average, and this might not be evident to employers.

Comments on 2SLS

- Get standard errors from a regression package; you can't use the reported S.E. from the second stage because of the variability in roof^, and the standard errors are always larger
- Can use multiple instruments; the more the better
- Used anywhere mean independence is violated
- Finding a good instrument is difficult
- Need a large (300+) sample size; 2SLS is biased in small samples
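A minimal 2SLS sketch with the AER package; the variable names (y, x, exog1, z1, z2) are placeholders, not from the course data:
library(AER)
# x is endogenous, instrumented by z1 and z2; exog1 is an exogenous control
ivmod = ivreg(y ~ x + exog1 | z1 + z2 + exog1, data = data)
summary(ivmod)  # reports the corrected 2SLS standard errors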

lm(formula = col ~ log(gpa8), data = data) (Intercept) -0.1906 log(gpa8) 0.5718 Specifically, how do you interpret the coefficient estimate of 0.5718 on ln(gpa8)?

Given a 100% increase in GPA, the probability of attending college increases by 57.18 percentage points; given a 10% increase in GPA, it increases by about 5.7 percentage points.

Would a negative correlation necessarily show that smaller class sizes cause better performance? Explain.

Given the potential for confounding factors, such as parents' education, funding of the school, and the child's intelligence, a negative correlation would not be strong evidence that smaller class sizes actually lead to better performance.

Robust

Good in any situation

Testing for heteroskedasticity

H0: the data is homoskedastic. If you reject, the data is heteroskedastic. If you fail to reject, that does not mean the data is homoskedastic; the test is inconclusive.

OLS estimators are normally distributed

IN THEORY β̂1 comes from a normal distribution, but in practice, with an estimated standard error, its standardized version follows a t distribution

Suppose that you are asked to conduct a study to determine whether smaller class sizes lead to improved student performance of fourth graders. How would you conduct your experiment?

Ideally, we could randomly assign students to classes of different sizes. That is, each student is assigned a different class size without regard to any student characteristics such as ability and family background (ceteris paribus). We would want variation in class sizes across students.

What is the difference between a sample average and the expected value of a random variable?

If we think of the age of ECON339 students as a random variable, then this random variable has some unknown mean. The mean is the expected value or a measure of the central tendency of this random variable. If we took a random sample of ECON339 students we could compute the sample average, which is an estimate of the mean. The sample average will change with the collected sample; however, the expected value is a fixed value. The law of large numbers says that as the sample size increases, the sample average gets closer and closer to the expected value of the random variable. SAMPLE AVERAGE CHANGES; EXPECTED VALUE IS FIXED.

With OVB when do you overstate the bias?

Imagine you have math4 = intercept - class size + u. There are probably lots of other variables in u that help to explain math4, and we could also predict some of these variables using class size. One of these variables is funding. When we do not include funding in our regression, class size has to absorb some of that effect and will maybe get a coefficient of -10, when if we included funding that coefficient would be closer to -5. In this situation, we are overstating the effect of class size due to OVB.

Bad control / over controlling is the same thing

In a regression of college GPA on alcohol use, adding a control for attendance will not give accurate results, because the main channel through which alcohol affects college GPA is probably missing classes.

If x1 is highly correlated with x2 and x3 in the sample, and x2 and x3 have large partial effects on y, would you expect β̃1 and β̂1 to be similar or very different? Explain.

In this scenario, β̃1 and β̂1 are likely to be very different. In the context of our example, if attendance (x1) is correlated with GPA (x2) and SAT (x3), then this means that I can predict GPA and SAT knowing attendance. Furthermore, GPA and SAT have large partial effects on exam score. This implies GPA and SAT are "important" variables and omitting them will produce drastically different (biased) estimates than when they are included, i.e. E(β̃1) = β1 + bias and E(β̂1) = β1. In this example, the bias is likely positive, so β̃1 > β̂1.

voteA = 45 + 6 ln(expendA) + u. If the candidate increases expenditures by 10%, what will be the approximate change in the vote share?

The increase would be 6 × .1 = .6, a 0.6 percentage point increase in vote share

What is the problem with linear functions to describe demand curves?

Linear demand functions generally cannot describe demand over a wide range of prices and income; sometimes you get negative predicted quantities

Gauss-Markov Theorem

Mathematical result stating that, under certain conditions, the OLS estimator is the best linear unbiased estimator of the regression coefficients conditional on the values of the regressors. HAS THE LOWEST VARIANCE

exogenous

A variable that satisfies mean independence: E(u | x) = E(u)

Definition of a valid instrument

- Exogeneity (mean independence): the correlation of u and z equals 0; you cannot predict roof* or ui with z given whatever other variables are in the regression, e.g. E(u | cap, weekend) = 0 and E(roof* | cap, weekend) = 0
- Relevance: d(roof)/d(z) ≠ 0; as z changes it also changes roof, so the correlation of roof and z does not equal 0
- The instrument is something that should not be in the att regression itself, but that affects roof; z is correlated with att ONLY through roof and not through u

With heteroskedasticity OLS is no longer MVLUE, only LUE; there are more efficient estimators

You might use GLS or other efficient estimators instead. They are not better because of unbiasedness (OLS is still unbiased); they are better because they are more efficient and give correct standard errors.

MVLUE

Minimum variance linear unbiased estimator (OLS)

w (normally distributed), z (normally distributed), Y = w^2 + z^2. Is Y normally distributed?

No, you dummy, you squared w and z; a sum of squared normals is not normal (for standard normals it is chi-squared)

If we estimated the first equation above, but did not control for exper, and instead estimated the shorter equation log(wage)~I(hours/1000)+poorhlth+educ, is there enough information above to say exactly what the coefficient on poorhlth would be in this shorter equation?

No, not enough info

If you find a positive correlation between output and training, would you have convincingly established that job training makes workers more productive? Explain

No, unless the amount of training is randomly assigned. The many factors listed in parts (a) and (b) can contribute to finding a positive correlation between output and training even if job training does not improve worker productivity.

Data is collected on home prices (price) and related to neighborhood characteristics and attributes of the housing structure. The variables in the model are lot size (lotsize), distance to employment center (dist), number of bedrooms (rooms), student teacher ratio at the local school (stratio), access to highways (radial). Consider the following model: log(price) = β0 + β1lotsize + β2dist + β3rooms + β4stratio + β5radial + u We would like to test the hypothesis that radial has no effect on log(price), i.e., H0 : β5 = 0, by estimating the restricted model log(price) = β0 + β1lotsize + β2dist + β3rooms + β4stratio + u Under what conditions would we fail to reject the null hypothesis: (a) the R-squared from the unrestricted model is sufficiently smaller than the R-squared from the restricted model (b) the R-squared from the restricted model is sufficiently smaller than the R-squared from the unrestricted model (c) the SSR from the restricted model is sufficiently larger than the SSR from the unrestricted model (d) the SSR from the unrestricted model is sufficiently smaller than the SSR from the restricted model (e) the R-squared from the restricted model is similar to the R-squared from the unrestricted model

(e) The R-squared from the restricted model is similar to the R-squared from the unrestricted model. We fail to reject when the F statistic is small, which happens when dropping radial barely changes the R-squared (equivalently, when the SSRs are similar).

Suppose the variable x2 has been omitted from the following regression equation, y = β0 + β1x1 + β2x2 + u. β̃1 is the estimator obtained when x2 is omitted from the equation. Under which conditions is there negative bias in β̃1 from omitting x2 (mark all correct answers)? (a) β2 = 0 and x1 and x2 are positively correlated (b) β2 > 0 and x1 and x2 are not correlated (c) β2 < 0 and x1 and x2 are not correlated (d) β2 = 0 and x1 and x2 are negatively correlated (e) None of these would result in a negative bias

(e) None of them. Negative bias requires either β2 < 0 with x1 and x2 positively correlated, or β2 > 0 with x1 and x2 negatively correlated; those are the ONLY WAYS.

What will happen to the R^2 if noPC is used in place of PC?

Nothing happens to the R-squared. Using noPC in place of PC is simply a different way of including the same information on PC ownership

B(hat1) = B1 + 3X X is normally distributed Is my boy B(hat1) also normally distributed?

Obviously: β̂1 is a linear function of X, and a linear function of a normal RV is normal

Which of the following can cause OLS estimators to be biased? Explain.

Only bullet point (2), omitting an important variable, can cause bias, and this is true only when the omitted variable is correlated with the included explanatory variables. The homoskedasticity assumption played no role in showing that the OLS estimators are unbiased. Homoskedasticity was used to obtain the usual variance formulas for the β̂k. Further, the degree of collinearity between the explanatory variables in the sample, even if it is reflected in a correlation as high as .95, does not affect the linear regression model assumptions. Only if there is a perfect linear relationship among two or more explanatory variables is full rank violated.

Which of the following are consequences of heteroskedasticity? Explain. 1. The OLS estimators, are biased 2. The usual F statistic no longer has an F distribution 3. The OLS estimators are no longer BLUE

Parts (2) and (3). The homoskedasticity assumption played no role in showing that OLS is unbiased. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Let X be a random variable distributed as Normal with mean 5 and variance 4. Find the probabilities of the following events: Pr(X≤6), Pr(X>4), and Pr(|X−5|>1). (Hint: ditch that old z-table you got from your stats class and use R. The command pnorm(6,mean=5,sd=2) gives the cumulative distribution function at 6 for a normal with mean 5 and standard deviation 2.)

Pr(X≤6): pnorm(6,mean=5,sd=2) ## [1] 0.6914625
Pr(X>4): 1-pnorm(4,mean=5,sd=2) ## [1] 0.6914625
Pr(|X−5|>1): (1-pnorm(6,mean=5,sd=2)) + pnorm(4,mean=5,sd=2)

When you put data and a hypothesis test together you get a

Random variable

FIND THE SE of the first regression using the SE of a variable in the short and long regression.

SE Bshort/ SE Blong

Find how much of the variance is explained by the long regression in the short regression

SIGMA^2 Bshort/Blong

SSR R formula

SSR = sum((data$Y - (a + b*data$X))^2). Tells us how far our points (y1,x1), ..., (yN,xN) are from the regression line.

SSR

SSR is the sum of squared residuals: the variability in y remaining after accounting for x, i.e., what the regression model leaves unexplained relative to the baseline model. SSR = Σ(yi - ŷi)^2 = Σ(yi - a - b*xi)^2

𝜎̂^2 formula

SSR/DF

What describes the % total variation in y not explained by x or that is not described by the regression line?

SSR/SST

Things to consider when defining the regression function

Should you just include interaction and squared terms of all variables? You are choosing things that explain y. Consider: OVB, multicollinearity, including irrelevant variables, whether to drop insignificant variables, and bad controls / over-controlling.

Students that are high ability readers are more likely to go to college. If high ability readers are more likely to have college educated parents, explain how this fact makes it difficult to infer a causal effect of reading ability on college enrollment from the observed correlation of reading ability and college enrollment.

Since the family background of students with high reading ability is different than the family background of students with low reading ability we cannot consider these two groups all else equal, that is there may be forces other than differences in reading ability driving these different decisions. In particular, if students with high reading ability are more likely to have college educated parents, it may be that their parents have more financial resources to send their kids to college. Thus we cannot state for certain whether these students are more likely to go to college because of their reading ability or because their parents tend to have more resources.

If x1 is highly correlated with x2 and x3, and x2 and x3 have small partial effects on y, would you expect SE(β̃1) or SE(β̂1) to be smaller? Explain.

Since x2 and x3 have small partial effects on y, including them will not substantially reduce σ². However, it will increase R²x1 in the denominator of the formula for SE(β̂1), so we would expect SE(β̂1) > SE(β̃1).

Who derived the t-distribution?

William Sealy Gosset, who was working for the Guinness brewery doing quality assurance; he derived it but had to publish under the pen name "Student"

B(hat)j = Bj SE = ?

Sqrt(var(Bj))

Root mean square deviation

sqrt(sum(residuals^2)/(N-2))

Consider the multiple regression model y = β0 + β1x1 + β2x2 + u. With n observations, the sum of squared residuals associated with this model is given by: SSR = Σ(yi − β̂0 − β̂1xi1 − β̂2xi2)². One of the criteria satisfied by the OLS estimator is that ∂SSR/∂β̂1 = 0.

TRUE

The following model satisfies the linearity assumption of the linear regression model: y = exp(β0 + β1x + u)

TRUE

math10ˆ=−69.34 +11.16ln(expend) How big is the estimated spending effect? Namely, if spending increases by 10%, what is the estimated percentage point increase in math10?

The estimated percentage point increase in math10 is b1/10 or 11.16/10 = 1.116

Instead, suppose Cov(PS,MT)=29 and Cov(PS,Final)=−9 and Cov(MT,Final)=53, but everything else is the same as above. What is E(Grade)?

The expected value does not depend on the covariance so it is the same as the value 81.85 in part (a).

unbiased estimator

The mean of its sampling distribution is equal to the true value of the parameter being estimated: E(β̂) = β

In terms of parameters, state the null hypothesis that a 1% increase in A's expenditures is offset by a 1% increase in B's expenditures

The null B2 = -B1 or B1 + B2 = 0

p-value

The probability level which forms the basis for deciding if results are statistically significant (not due to chance). P-values indicate the probability of observing your sample data, or something more extreme, when you assume the null hypothesis is true.
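For a two-sided t-test this looks like the sketch below; tstat and df are assumed to come from your regression output:
pval = 2*(1 - pt(abs(tstat), df = df))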

How come the standard error for the coefficient on years in the multiple regression is lower than its counterpart in the simple regression?

The reduction in the variance occurs because of the substantial reduction in the residual standard error when we include the additional variable, without a substantial increase in the variance inflation factor.

What is the smallest number of murders that can be predicted by the equation? What is the residual for a county with zero executions and zero murders? Equation: murders = 5.457 + 58.56 execs

The smallest number of murders is 5.457. The residual for this county would be -5.457

What is the relationship between the variance of a random variable and the standard deviation of a random variable?

The standard deviation is the square root of the variance; equivalently, the variance is the standard deviation squared.

Suppose that the sleep equation contains heteroskedasticity. What does this mean about the tests computed in parts (a) and (b)?

The standard t and F statistics that we used assume homoskedasticity, in addition to the other linear regression model assumptions. If there is heteroskedasticity in the equation, the tests are no longer valid.

Central Limit Theorem

The theory that, as sample size increases, the distribution of sample means of size n, randomly selected, approaches a normal distribution. No matter what distribution the data comes from (exponential, uniform, etc.), if you average enough observations the sample mean becomes approximately normal. THIS ALLOWS YOU TO MAKE CONFIDENCE INTERVALS.
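A quick simulation of the idea: sample means of a skewed (exponential) variable still pile up in a bell shape. A sketch:
means = replicate(10000, mean(rexp(30)))  # 10,000 sample means, n = 30 each
plot(density(means), main="Sampling Distribution of the Mean")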

From the given information why are you unable to compute the F statistic for joint significance of motheduc and fatheduc? What would you have to do to be able to compute the F -stat?

The two regressions use different sets of observations. The second regression uses fewer observations because mother and father education are missing for some observations. We would have to reestimate the first equation (and obtain the R-squared) using the same observations used to estimate the second equation.

Standard deviation of the residuals

This value gives the approximate size of a "typical" or "average" prediction error (residual).

determine whether bias is positive or negative.

To figure out the sign of the bias, you need the sign of the omitted variable's effect on y and the sign of its correlation with the included variable. Example: regress math scores on single parenthood while omitting income. Higher income leads to better math scores (+), and single parenthood is negatively correlated with income (-), so the bias is (+)(-) = negative. THE OVB IS NEGATIVE.

Properties of Estimators

Unbiased - low sampling error
Efficient - small variance
Consistent - accuracy increases as sample size increases
Normally distributed

Breusch-Pagan test for Heteroskedasticity (BP test)

Regress the squared residuals on the X's. Under homoskedasticity the R^2 from that regression should be 0, because you shouldn't be able to use the X's to explain variability in the squared residuals. The test statistic is R^2 x number of observations and COMES FROM A CHI SQUARED DISTRIBUTION. In R: bptest(mod).
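The manual version, a sketch assuming a fitted model mod whose regressors are x1 and x2 (placeholder names) and no missing values:
aux = lm(resid(mod)^2 ~ x1 + x2, data = data)  # squared residuals on the X's
LM = nrow(data) * summary(aux)$r.squared       # n * R^2
1 - pchisq(LM, df = 2)                         # df = number of X's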

Does a=2 and b=1 produce the best prediction function for y given x, or is there a better choice of a and b?

Use the formula for b to see if it equals 1 and use the formula for a to see if it equals 2

W = Co + C1z Var (w) =

Var (w) = C1^2 var(z)

If x1 is correlated with x2 and x3, x2 and x3 are highly correlated, and x2 and x3 have a large partial effect on y, explain why we do not know whether Var(β̃1) will be larger or smaller than Var(β̂1).

Var(β̂1) = σ² / [(1-R²1)·SSTx1]. Since x1 is correlated with x2 and x3, adding those variables will cause R²1 to go up, which in turn pushes the variance up. Additionally, since x2 and x3 help to explain y, σ² will be lower in the estimation that includes them, which pushes the variance down. Without knowing the variables we cannot know for certain which way the variance will go.

W = Z1 + Z2 Var(W) if they are not correlated

Var(W) = var(Z1) + var(Z2)

W = Z1 + Z2 Var(W) if they are correlated

Var(W) = var(Z1) + var(Z2) + 2cov(Z1,Z2)

Z = 5x. What is the variance of Z?

Var(Z) = 5^2var(x)

w = 5 + Z Var(w) =

Var(w) = var(Z)

Suppose Var(PS)=353, Var(MT)=81, and Var(Final)=130. Also, suppose Cov(PS,MT)=0, Cov(PS,Final)=0, and Cov(MT,Final)=0. What is Var(Grade)?

Var(Grade) = Var(.15 PS + .35 MT + .5 Final) = .15²Var(PS) + .35²Var(MT) + .5²Var(Final) = .0225(353) + .1225(81) + .25(130) = 50.365

In this sample, some firms have zero debt and others have negative earnings. Should we try to use ln(dkr) or ln(eps) in the model to see if these improve the fit?

We probably should not use logs, as the logarithm is not defined for firms that have zero debt (dkr) or negative earnings (eps); we would lose those firms from the regression.

THERE IS NO TEST FOR MEAN INDEPENDENCE, BUT THERE IS ONE FOR HETEROSKEDASTICITY

We typically ignore these tests and use OLS with corrected standard errors

Choosing instruments

Weak instruments make standard errors large and results not useful; weak instruments means z does not produce enough variation. Relevance: z must produce variation in ln(avgprc). Do a joint significance test on the instruments; if the F-stat < 10, the instruments are weak.

Sum and mean of Residuals

They will always equal 0 for an OLS regression line that includes an intercept (it follows from the first-order conditions)

Simultaneous equations S&D

You can use instrumental variables: treat supply as exogenous and demand as endogenous, or vice versa, e.g. E(ud | rawmatprice) = E(ud)

If x is normally distributed and Z = 2x

Z is normally distributed

WE DON'T NEED ASSUMPTION 6(normal distribution) if we have...

a large N or central limit theorem

significance level (alpha)

alpha represents the probability that tests will produce statistically significant results when the null hypothesis is correct

Statistically Significant

an observed effect so large that it would rarely occur by chance When the p-value is small, 5% or lower

Without using the cov function in R, what is the covariance of CEO salary and stock return?

avereturn = sum(data$return)/n covsalret = sum((data$salary-avesal) * (data$return-avereturn) )/(n-1)

The survey also asked respondents about how religious they were, choosing among 5 options, 5 = very religious, 4 = somewhat, 3 = slightly, 2 = not at all, 1 = anti-religious. Create histogram for this variable. What fraction of respondents reported being very religious?

barplot(table(data$relig),main="Frequency of Affairs",ylim=c(0,200))
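The question also asks for the fraction reporting very religious (relig == 5); one way, as a sketch:
mean(data$relig == 5, na.rm = TRUE)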

From the equation, what is the optimal size of the graduating class (the size that maximizes the test score)? (Round your answer to the nearest integer.) What is the highest achievable test score? Rounded to the nearest integer, the optimal class size is 279 students.

class = 1:400 score = 45.6 + .082*class - .000147*class^2 plot(class,score,'l',xlab="Class Size",ylab="Score") lines(c(279,279),c(0,57.04), col="red")

Hypothesis test with new variance matrix

coeftest(mod, vcov = correctvcov)

Get Heteroskedasticity consistent variance matrix

correctvcov = vcovHC(mod)

CORRECT standard errors

correctvcov = vcovHC(mod, "HC1") round(sqrt(diag(correctvcov)), 4)

Correlation equation

covariance/sqrt(variance of X * variance of Y). The result is always between -1 and 1.

How many counties are there in the data set? Of these, how many have zero murders? What percentage of counties have zero executions? (Remember, use only the 1996 data.)

data = data[data$year==1996,]
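The subset alone doesn't answer the counts; a sketch of the rest, assuming the variables are named murders and execs as in the murder-equation card above:
nrow(data)                 # number of counties
sum(data$murders == 0)     # counties with zero murders
100*mean(data$execs == 0)  # percent of counties with zero executions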

How do you look at specific rows?

data[1000:1006, ] (use data$example[1000:1006] for a single column)

How do you find the number of people in the data?

nrow(data)

Error or residual formula

e = y - y(hat), where y(hat) is the predicted value of y based on the regression line

Consider the linear regression model y = β0 + β1x1 + β2x2 + u. The error term u is said to exhibit homoskedasicity if it has zero conditional mean.

false

As K increases our R^2 will always

increase or stay the same. K = number of explanatory variables. ADJUSTED R^2 might decrease!! OVERFITTING

Decreasing significance level

increases the amount of required evidence

interaction terms

Interaction terms should never be included without also including the linear terms; otherwise we will typically reach wrong conclusions.

Weak instruments test R

ivmodsummary = summary(ivmod, diagnostics = TRUE) ivmodsummary$diagnostics

When sample size is small the adjusted R2 is

less than the original R2 (adjusted R2 is always below R2 once k ≥ 1; the penalty is largest when n is small)

Test linear hypothesis with built in function

linearHypothesis() NEED AER library

Now, estimate a model where each one-point increase in IQ has the same percentage effect on wage. If IQ increases by 15 points, what is the approximate percentage increase in predicted wage?

lm(log(wage)~IQ,data=data)
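Then the approximate percentage increase for 15 IQ points comes straight from the coefficient; a sketch:
mod = lm(log(wage)~IQ, data=data)
100*coef(mod)["IQ"]*15  # approximate % increase in wage for +15 IQ points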

Use the data in KIELMC, only for the year 1981

load('KIELMC.RData') data = data[data$year==1981,] mod = lm(log(price) ~ log(dist),data=data) summary(mod)

How to make a scatter plot

load('NLSY97r13.RData')
plot(data$gpa8, data$wage, type='p', main="Relationship Between Wage and 8th Grade GPA", xlab="8th Grade GPA", ylab="Wage", xlim=c(0,4), ylim=c(0,100))

What is salary when exper=0? When exper=5? (Hint: You will need to exponentiate.)

logsal0 = 10.6 + 0.027*0 exp(logsal0) logsal5 = 10.6 + 0.027*5 exp(logsal5)

What is the best predictor for wage?

mean of wage

The variable avgsal contains information on the average salary at each plant. This variable contains missing values. Removing the missing values, what is the average value of avgsal?

mean(data$avgsal, na.rm=T)

Compute the average of children for those without electricity and do the same for those with electricity. Comment on what you find.

mean(data$children[data$electric==1],na.rm=T) mean(data$children[data$electric==0],na.rm=T)

Use the CPI values from Part (c) to find the average hourly wage in 2010 dollars. Now does the average hourly wage seem reasonable?

mean(data$wage)*(cpi2010/cpi1976)

How do you deal with missing numbers? NAs

miswageTrue = is.na(data$wage)
sum(miswageTrue)  # number of missing values
mean(data$wage[!miswageTrue])
OR mean(data$wage, na.rm = T)

Using the equation test H0: B1 = B2 against a two sided alternative

modB = lm(educ~motheduc+I(fatheduc+motheduc)+abil+I(abil^2),data=data). The coefficient on motheduc is θ = β1 − β2, so the usual t-test on motheduc tests H0: β1 = β2.
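Reading the test off the fit, as a sketch:
summary(modB)$coefficients["motheduc", ]  # estimate, SE, t-stat, two-sided p-value for theta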

Without using the mean function in R, what is the average CEO salary among these firms?

n = nrow(data) avesal = sum(data$salary)/n avesal

How do you find how many rows are in the data?

nrow(data) (nrow works on the data frame, not on a single column)

inferential statistics

numerical methods used to determine whether research data support a hypothesis or whether results were due to chance

w (normally distributed), z (normally distributed), Y = w + z. Is Y normally distributed?

obviously

If we have Heteroskedasticity

Our estimators are still unbiased as long as mean independence holds. However, the variance of our estimators, and therefore all our t-tests, F-stats, confidence intervals, etc., are wrong under heteroskedasticity.

How to make multi-line plots

plot(0, 0, type="n", main="Relationship Between Income and Education", xlab="Education", ylab="Income", xlim=c(0,20), ylim=c(0,2500))
lines(educ, income, lty=1)
lines(c(12,12), c(0,2500), lty=2)
legend("bottomright", legend=c("Income","HS Grad"), lty=1:2)

We use density plots to understand the distribution of random variables. Create a plot of the probability density function for the average minutes played per game for players in the NBA using

plot(density(data$avgmin),main="Distribution of Average Minutes",xlab="Average Minutes",xlim=c(0,50))

Probability density functions

plot(density(data$wage, na.rm=TRUE), main="Distribution of Wage", xlab="Wage", xlim=c(0,100))

What is Pr(T<−1)?

pt(-1,df=20)

F critical value

qf(1-alpha, q, n-k-1). q = number of restrictions, alpha = significance level, e.g. .05 (so qf is evaluated at 1-alpha = .95)

At what value T∗ is Pr(T<T∗)=.1?

qt(.1,df=20)

At what value T∗ is Pr(T>T∗)=.1?

qt(1-.1,df=20)

Goodness of fit (R^2)

Shows how well the best-fit line explains the data: does x help us predict y? R^2 = 1 - SSR/SST, with values between 0 and 1.
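Computed by hand from a fitted model, a sketch; mod, data, and the outcome column Y are placeholder names:
SSR = sum(mod$residuals^2)
SST = sum((data$Y - mean(data$Y))^2)
1 - SSR/SST  # should match summary(mod)$r.squared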

Consider a linear model to explain monthly beer consumption: beeri = β0 + β1 inci + β2 pricei + β3 educi + ui, with E(ui | inci, pricei, educi) = 0 and Var(ui | inci, pricei, educi) = σ² pricei. To transform this model to one that satisfies homoskedasticity, both the left-hand-side and right-hand-side variables need to be multiplied by

1/sqrt(price); dividing everything by sqrt(price) makes the error variance constant: Var(ui/sqrt(pricei) | inci, pricei, educi) = σ²
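In R, this transformation is what weighted least squares does; a sketch using lm()'s weights argument, which scales each observation by sqrt(1/price):
wlsmod = lm(beer ~ inc + price + educ, data = data, weights = 1/price)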

Variance

Standard deviation squared: Var = [(u1 - umean)^2 + ... + (un - umean)^2]/(n-1)

SST R formula

sum((data$Y - mean(data$Y))^2). Describes how far the points y1, ..., yN vary from the mean of the Y's.

SST

Sum of squares total: the total variability in y, Σ(yi - ymean)^2

The variable scrap contains information on the scrap rate at each plant. This variable contains missing values. What fraction of observations have non-missing values for scrap?

sum(!is.na(data$scrap))/nrow(data)

SST = R formula

sum((data$educ - mean(data$educ))^2)

How do you find the sum?

sum(data$EXAMPLE)

The dataset jtrain comes from a job training experiment conducted for manufacturing plants during 1976-1977. The variable sales contains information on the annual sales at each plant. This variable contains missing values. How many observations have missing values for sales?

sum(is.na(data$sales))

What is the sum of squared residuals for this model?

sum(mod$residuals^2)

Use the fitted values from the model in part (d). Out of the 660 people in the data, how many have predicted probabilities below zero. How many have predicted probabilities above one?

summary(mod2$fitted.values)
sum(mod2$fitted.values < 0)
sum(mod2$fitted.values > 1)

What happens when degrees of freedom gets larger

t-distribution gets close to a standard normal distribution

How to make histograms

table(data$MarijMS)
barplot(table(data$MarijMS), main="Frequency of Pot Smoking", ylim=c(0,3500))

binary, or indicator, variables

take on the values of 0 or 1

confidence interval

the range of values within which a population parameter is estimated to lie Suppose you have a 95% confidence interval of [5 10] for the mean. You can be 95% confident that the population mean falls between 5 and 10. For 95% confidence intervals, an average of 19 out of 20 contain the population parameter.
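In R, a sketch for a fitted model mod (placeholder name):
confint(mod, level = 0.95)  # 95% confidence intervals for all coefficients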

If you add a variable and Adjusted R2 improves...

then it probably means you should include it and you aren't suffering from overfitting

direction bias

upward bias: estimate > true parameter downward bias: estimate < true parameter

When n is small (under 100) use what

use adjusted r-squared so you don't overfit the data

w = 5z var(w) =

var(w) = 5^2var(z)

Consider the variance for βˆK. The variance is a function of an R2 from some regression. (A) explain in words and in equation form, which regression this R2 relates to.

var(β̂K) = σ² / [(1-R²K)·SSTK]. The R²K comes from regressing xK on all of the other regressors: xK = δ0 + δ1x1 + δ2x2 + ... + error

Without using the var function in R, compute the sample variance of CEO salary among these firms?

varsal = sum((data$salary-avesal)^2)/(n-1)

The more restricted and the higher q is the more likely

we are to reject the null hypothesis

VIF (variance inflation factor)

Will tell us if a variable is causing a multicollinearity problem. VIF > 10 is generally considered large. FORMULA: VIF1 = 1/(1 - R²1), where R²1 is from regressing x1 on the other regressors (x1 = δ0 + δ2x2 + ...). Plug that R²1 into the formula.
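A sketch of both routes, assuming regressors named x1, x2, x3 (placeholders):
aux = lm(x1 ~ x2 + x3, data = data)  # regress x1 on the other regressors
1/(1 - summary(aux)$r.squared)       # VIF for x1 by the formula
# or, with the car package: library(car); vif(mod)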

(1 point) Suppose that Y is a random variable and that E(Y ) = µ. Suppose you have only a single observation on y, called y1. Use y1 to construct a BIASED estimator for µ. Call your estimator ˆµ and prove that your estimator is BIASED.

µ̂ = y1 + 5
E(µ̂) = E(y1 + 5) = E(y1) + 5 = µ + 5 ≠ µ
Since E(µ̂) ≠ µ, the estimator is BIASED

math10 = β0 + β1(totcomp/1000) + β2 staff + β3(enroll/100) + u. What is the effect of increasing total teacher comp, β1?

𝛽1 is the effect of an increase in per teacher compensation by $1,000 on the fraction of 10th graders passing the math exam.

