Exam 1 Concepts

Random Variable

Takes on a numerical value; the outcome is determined by an experiment; ex) number of heads appearing in 5 coin flips

Expected Value

Weighted average of all possible values of X; population mean; X is a discrete random variable in a population taking on a finite number of values
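The weighted-average definition above can be sketched numerically; a minimal example assuming a made-up discrete random variable (number of heads in 2 fair coin flips):

```python
# Expected value of a discrete random variable: sum of value * probability.
# Hypothetical example: X = number of heads in 2 fair coin flips.
values = [0, 1, 2]
probs = [0.25, 0.5, 0.25]

expected_value = sum(x * p for x, p in zip(values, probs))
print(expected_value)  # 1.0
```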

little variation in xi makes it (easier/harder) to pinpoint how E(y|x) varies with x

harder

what does variance tell us about OLS estimators

how precise they are -smaller variance = more precise estimates

When does the unbiasedness assumption 3 fail?

if the sample standard deviation of the xi is 0 (all xi take the same value) -more likely when population variation in x is minimal or the sample size is small

ID and describe the systematic component of the simple regression model

if the zero conditional mean assumption holds, the simple regression model can be broken down into 2 components, the first being the systematic part -Bo+B1x represents E(y|x) = systematic part of y, the part of y explained by x

ID and describe the nonsystematic component of the simple regression model

if the zero conditional mean assumption holds, the simple regression model can be broken down into 2 components, the second being the nonsystematic part -also called the idiosyncratic part, u -the part of y not explained by x but attributed to other factors

the larger the error variance, the (larger/smaller) is var(^B1)

larger b/c more variation in the unobservables affecting y makes it more difficult to precisely estimate B1

if ^ui is positive, the line _______ predicts yi

under; data points don't lie on OLS line

Assumptions for reliable estimators of Bo and B1

1) E(u)=0 2) E(u|x)=E(u) 3) E(u|x)=0

Explain the third assumption for reliable estimators Bo and B1

= zero conditional mean assumption E(u|x)=0 -combining assumption 1 with assumption 2 (given that assumption 2 holds) yields the zero conditional mean assumption -if it holds, u and x are uncorrelated

When can the assumptions for unbiasedness fail?

-if any of the 4 assumptions fails 1) fails if the model is nonlinear in the parameters Bo and B1 (log level, log log, and level log forms are still linear in parameters) 2) random sampling can fail in cross-sectional data when the sample isn't representative of the population 3) almost always holds (fails only if every xi takes the same value) 4) if it fails, the estimators are biased and x is correlated with u

What happens if we regress on a constant?

-i.e., we set the slope = 0 and no longer need an x -we are estimating the intercept only -the intercept estimate is y bar -the constant that produces the smallest sum of squared deviations is always the sample average

Ordinary Least Squares Estimate

-best-fitting line: smallest sum of squared deviations -obtained by minimizing the sum of squared residuals -the fitted value tells us the value we predict for y when x=xi for the given intercept and slope -there are n such predicted y values -allows us to derive unbiasedness and consistency -it is the estimated version of the population regression function
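The OLS estimates can be computed directly from the standard formulas, b1 = sample covariance of x and y over sample variance of x, and b0 = ybar - b1*xbar; a minimal sketch using made-up data:

```python
# OLS slope and intercept from the textbook formulas:
#   b1 = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2)
#   b0 = ybar - b1 * xbar
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]  # made-up data, roughly y = 2x

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

fitted = [b0 + b1 * xi for xi in x]          # n predicted y values
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # n residuals
```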

Level Log Model

-dependent is y -independent is log(x) -decreasing returns -B1 implies that a 1% increase in x is predicted to increase y by B1/100 units

Level Level Model

-dependent is y -independent is x -B1 implies if x increases by 1 unit then y will increase by B1 units -constant linear effect of x on y

Log Level Model

-gives approximately a constant % effect -dependent (y) is in log form and independent (x) is in level -100B1 is the approximate % change in y given one additional unit of x -since the % change in y is the same for each additional unit of x, the change in y for an extra unit of x increases as x increases -100B1 is the semi-elasticity -which says y has increasing returns to x

Interpret what a low R^2 means

-if R^2=0%, then none of total variation in y is explained by x -so a low R^2 says that other factors are more important -remaining % can be explained by unobservables (1-R^2)

Theorem for unbiasedness

-if all 4 assumptions hold, then ^Bo is an unbiased estimator of Bo and ^B1 is an unbiased estimator of B1 -E(^Bo)=Bo says it is unbiased; aka reliable; the estimate is likely close to the actual value

Residual sum of squares (SSR)

-sample variation in the ^ui -deviations of each residual and the squares of them -variation attributed to different unobservables

What does property 1 of the OLS regression line imply?

-the average of the residuals is 0 -so the sample average of the fitted values, ^yi, is the same as the sample average of the yi

ID and describe the 3rd algebraic property of OLS stats

-the point (x bar, y bar) is always on the OLS line -sample average for x and y, if estimate OLS line, predicted ^y will = average y bar -SO, each yi has 2 parts: a fitted value and a residual. they are uncorrelated.

Explained sum of squares (SSE)

-the sample variation in the ^yi; y in terms of x -variation in y explained by x (e.g., wage)

Regression through the origin

-when a slope estimator (~B1) and a line of the form ~y=~B1x pass through the point x=0, ~y=0 -the slope estimate still relies on OLS, which in this case minimizes the sum of squared residuals -if the true intercept Bo doesn't equal 0, then ~B1 is a biased estimator of B1
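With no intercept, minimizing the sum of squared residuals gives the closed form ~B1 = sum(x*y) / sum(x^2); a minimal sketch on made-up data:

```python
# Regression through the origin: slope ~B1 = sum(x*y) / sum(x^2),
# which minimizes the sum of squared residuals for the line y = B1*x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.0]  # made-up data, roughly y = 2x

b1_tilde = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
```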

Identify and explain the population regression function

-2nd interpretation of B1 given by the zero conditional mean assumption E(y|x)=Bo+B1x -E(y|x) is a linear function of x, so this linearity means that a 1 unit increase in x changes the expected value of y by the amount B1 -tells us how the average value of y changes with x -fixed, but unknown, in the population

Log Log Model

-B1 is the elasticity of y w.r.t. x -dependent: y=log(salary) -independent: x=log(sales) -because the change to log form approximates a proportionate change, nothing happens to the slope -implies that a 1% increase in x increases y by about B1% -B1 is the constant elasticity
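The interpretation rules from the four functional-form cards (level-level, level-log, log-level, log-log) can be put side by side; a minimal sketch where the slope estimate 0.08 is a hypothetical, assumed number:

```python
# How the same hypothetical slope estimate is read under each model:
#   level-level: 1-unit increase in x -> y changes by b1 units
#   log-level:   1-unit increase in x -> y changes by about 100*b1 percent
#   level-log:   1% increase in x     -> y changes by about b1/100 units
#   log-log:     1% increase in x     -> y changes by about b1 percent
b1 = 0.08  # assumed estimate, not from real data

effects = {
    "level-level (units of y)": b1,
    "log-level (% change in y)": 100 * b1,
    "level-log (units of y)": b1 / 100,
    "log-log (% change in y)": b1,
}
```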

Steps of the Econometric Analysis Process

1) Research question: quantify relationship b/w y and x 2) Construct an economic model: relationship to be tested 3) Econometric model: equation; functional form; other issues; assume linearity wage=Bo+B1edu+u 4) Collect data: want to be close to actual representation; random; credible 5) Estimate model: with statistical software; for Bo (intercept) and B1 (slope); in order to quantify variables

Unbiasedness Assumptions

1) model is linear in parameters Bo and B1 -y,x, and u are all assumed to be random variables in stating the population model 2) we have a random sample of size n -data is credible and representative of whole population 3) sampling variation in x -if sample outcome on x varies in the population, random samples on x will contain variation 4) zero conditional mean -error u has an expected value of 0 given any value of x

algebraic properties of OLS stats

1) the sample average of the OLS residuals = 0 -follows directly from the OLS first order conditions 2) the sample covariance between the xi and the OLS residuals is 0 -also follows from the first order conditions 3) the point (x bar, y bar) is always on the OLS line
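These three properties can be verified numerically after fitting OLS; a minimal sketch on made-up data:

```python
# Numerical check of the three algebraic properties of OLS on made-up data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 2.9, 4.1, 5.2, 6.6]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
u_hat = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # OLS residuals

prop1 = abs(sum(u_hat)) < 1e-9                                  # residuals average to 0
prop2 = abs(sum(xi * ui for xi, ui in zip(x, u_hat))) < 1e-9    # zero covariance with x
prop3 = abs((b0 + b1 * xbar) - ybar) < 1e-9                     # (xbar, ybar) on the line
```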

Properties of Variance

1. Var(x)=0 if x = constant 2. if a and b are constants, var(ax+b) = a^2 var(x) 3. Var(ax+by) = a^2 var(x) + 2ab cov(x,y) + b^2 var(y)

ID and describe the zero conditional mean assumption for unbiasedness

=E(u|x)=0 -says error u has an expected value of 0 given any value of x -x and u must be uncorrelated -we view the OLS estimators as conditional on the values of the xi (independent variable) in the sample, which says to treat the xi as fixed in repeated samples -randomness in ^B1 is due entirely to the errors in the sample -errors generally different from 0 are what cause ^B1 to differ from B1 -goal: if the obtained sample is "typical", the estimate should be "near" the true population value

ID and describe the goodness of fit measure

=R^2 -the ratio of the explained variation compared to the total variation -fraction of the sample variation in y that is explained by x -percentage of the sample variation in y that is explained by x (multiply fraction by 100)

Total sum of squares (SST)

=SSE+SSR -a measure of the total sample variation in the yi -how spread out the yi are in the sample
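The decomposition SST = SSE + SSR (and R^2 = SSE/SST from the goodness-of-fit card) can be verified after an OLS fit; a minimal sketch on made-up data:

```python
# Verify SST = SSE + SSR and compute R^2 = SSE/SST on made-up data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 2.5, 4.5, 4.0, 6.0]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
y_hat = [b0 + b1 * xi for xi in x]

SST = sum((yi - ybar) ** 2 for yi in y)                    # total variation
SSE = sum((fi - ybar) ** 2 for fi in y_hat)                # explained variation
SSR = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))      # residual variation
R2 = SSE / SST
```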

Homoskedasticity

=error u has the same variance given any value of x -has no role in showing unbiasedness

Standard Deviation

A common measure of spread in the distribution of a random variable

Standard Deviation of ^B

A common measure of spread in the sampling distribution of ^B

Covariance

A measure of linear dependence between two random variables

Correlation coefficient

A measure of linear dependence between two random variables that doesn't depend on units of measurement and is bounded between [-1,1]
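Covariance and the correlation coefficient can be computed side by side to see why correlation is unit-free; a minimal sketch on a made-up sample (1/n moments):

```python
import math

# Made-up sample; covariance depends on units, correlation does not.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

cov_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
sd_x = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / n)
sd_y = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / n)

# Correlation: covariance scaled by standard deviations, bounded in [-1, 1].
corr_xy = cov_xy / (sd_x * sd_y)
```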

Variance

A measure of spread in the distribution of a random variable

Explain the first assumption for reliable estimators Bo and B1

E(u)=0 -says that as long as the intercept Bo is included in the equation, nothing is lost by assuming that the average value of u in the population is 0 -simply about distribution of unobserved factors in the population; says nothing about relationship b/w u and x

Explain the 2nd assumption for reliable estimators Bo and B1

E(u|x)=E(u) -because u and x are random variables, we can define the conditional distribution of u given any value of x -for any x, the expected (or average) value of u for that slice of the population described by the value of x -equation says that the average value of the unobservables is the same across all slices of the population determined by the value of x -conditional=unconditional, so u and x are uncorrelated -ex) the average level of ability (u factor) must be the same for ALL educational levels (x)

Interpret cov(x,y)>0 and cov(x,y)<0

If cov(x,y)>0, x and y have an upward-sloping linear relationship; if cov(x,y)<0, a downward-sloping one

Interpret cov(x,y)=0. How does this affect corr(x,y)?

No linear relationship; x and y can be independent, but independence cannot be concluded from covariance alone; graphically: can look like a rainbow (a nonlinear pattern). Then corr(x,y)=0, and X and Y are uncorrelated random variables

What is the goal of econometrics?

To complete step 5 of the analysis process, which is to quantify the variables

Ceteris Paribus Assumption

all else equal or all things constant; holding other factors fixed so change in u=0 if y=Bo+B1x+u

what is the problem with correlation as the only measure defining the relationship between u and x?

correlation measures only linear dependence between u and x

Pooled (panel) Data

the same cross-sectional individuals followed over time

Time Series Data

data over time; observe 1 variable over time; chronological

as the variability in xi increases, the variance of ^B1 (increases/decreases)

decreases -more variability in independent variable is preferred -since more spread out sample of independent variables makes it easier to trace relationship b/w E(y|x) and x -easier to estimate B1

Residual

the deviation between the actual value of y and the predicted value of y -there are n such residuals

The variable u

error term; disturbance; represents factors other than x that affects y; factors are unobserved

A larger standard deviation of the error means that the distribution of the unobservables affecting y is (more/less) spread out

more

If u and x are uncorrelated, then, as random variables, they are _________________

not linearly related

if ^ui is negative, the line ____predicts yi

over; data points don't lie on OLS line

a larger sample size results in a (smaller/larger) variance for ^B1

smaller -as sample size increases, so does the total variation in the xi -want xi as spread out as possible

Cross Sectional Data

snapshot of population interest at single point in time; many diverse individuals

What do properties 1 and 2 of the OLS regression line imply?

they show that the sample covariance between ^yi and ^ui is 0

Interpret E(collegeGPA | highschoolGPA)

this tells us the average college GPA among all students who have a given high school GPA

