Econometrics Final
how multicollinearity can lead to apparent contradictions btwn t test results and F test results
-multicol inflates the SEs, so the t tests can make each coef look insignificant individually even when the coefs matter
-the F test looks at the coefs together, so it can reject the null that all slope coefs are 0 even when none of the individual t tests can
--that combo (jointly significant, individually insignificant) is the apparent contradiction
why impure serial correlation should be addressed before correcting for pure serial correlation
-impure SC hints that there may be a problem with the functional form (or an omitted var), and a specification fix is easier and more appropriate than remedies designed for pure serial correlation
consequences of hetero for OLS estimates
-how it affects the properties of OLS coef estimates: does NOT cause bias in the coefficients
--lack of bias does not mean the estimates are accurate
-how it affects estimated standard errors: SEs are biased, which leads to unreliable hypoth testing and CIs
--SEs that are too low cause t-scores that are too high, making it more likely that we reject a null when it's true (type I error)
-how it affects estimated t statistics: higher than they're supposed to be
-OLS is no longer the minimum variance estimator
peculiarities of durbin-watson test
-how the table is used and the inconclusive region
1. econometricians almost never test the 1 sided null hypoth that there's neg SC in the residuals bc it's difficult to explain theoretically
2. sometimes inconclusive
--don't try to fix the equation if inconclusive
--size of the inconclusive region grows as the # of ind vars increases
--make sure to check whether the table/graph is for df or df-1
importance of an F statistic computed from first stage regressions to determine whether the instrumental vars are sufficiently correlated w/ the endogenous var to permit us to proceed, correct F stat to compute in various cases, and the rule of thumb for an acceptable first stage F value
-the first stage F stat tells us whether the instruments are strongly enough correlated w/ the endog var to proceed
-compute the F on just the instruments in the first stage reg (test the subset, not the whole-equation F)
-rule of thumb for an acceptable first stage F value = 10 or greater
--greater than 10 means the instruments are sufficiently correlated w/ the endog var
2 ways to estimate equations that have dummy dep vars
-linear probability model
-binomial logit model
-use when the topic involves a discrete choice of some sort (e.g. why do some states have female governors and others don't)
prais-winsten
in STATA: -unclick the CO transformation under the prais-winsten and CO reg tab
-df=n-1
-std errors are larger than w/ OLS
--t-stats are smaller and CIs are larger
-best choice w/ a small sample size bc it keeps the first observation (CO drops it)
why the marginal effect at the sample mean can be diff from the avg marginal effect in a logit (probit) model
-bc logit is nonlinear (an S curve), the slope at the mean of the Xs generally doesn't equal the mean of the observation-by-observation slopes
--the 2 would only coincide if the model were linear
connection between hc1 standard errors in chapter 10 and newey west standard errors
-both leave the OLS coef estimates alone and fix only the SEs
-hc1 corrects the SEs for hetero; newey-west (HAC) corrects for hetero and serial correlation at once
--newey-west w/ 0 lags is essentially an hc-type estimator
-both need a large sample size to work well
indirect least squares
-estimate the reduced form equations w/ OLS, then solve algebraically back to the structural coefs
-only works when the equation is exactly identified
-class ex: gets consumption as a function of investment via the reduced form, then backs out the structural coef
what centering explanatory vars does NOT do to alleviate multicollinearity
- the centered regression has the same fit, the same R-squared, and makes exactly the same predictions as the original uncentered regression
two possible remedies for hetero
-HC standard errors -redefinition of the vars: functional form change can dramatically change the equation (sometimes just need to switch from linear to double log form) --double log form has less variation so less likely to encounter hetero
various ways to estimate marginal impact of an increase in an independent var in logit (probit) model
-Stud's rule of thumb, marginal effect at the sample mean, and average marginal effect (see the Stata sketch after this list)
1. change an avg observation
--create an avg observation by plugging the means of all the ind vars into the estimated logit equation and calculating the avg Dhati
--increase the ind var of interest by 1 unit and recalculate
--the diff tells the impact of a 1 unit increase in that ind var on the probability that Di=1 for an avg observation
2. use the partial derivative
--the derivative of the logit shows that the change in expected Dhati caused by a 1 unit increase in X1i, holding the other vars constant, equals B1P(1-P)
3. use the rough estimate of .25
--plugging Pi=.5 into B1Pi(1-Pi) gives the maximum value .25B1, so multiplying the coef by .25 gives a quick rough estimate
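a minimal Stata sketch of the three approaches (d, gpa, income are hypothetical var names):
* hypothetical example: d is the dummy dep var
logit d gpa income
* marginal effect at the sample mean: slope of the S curve at the means of the Xs
margins, dydx(gpa) atmeans
* average marginal effect: average of the observation-by-observation slopes
margins, dydx(gpa)
* rough estimate: multiply the coef by .25 (the max of P(1-P))
display .25*_b[gpa]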
how newey-west estimation can change reported F values in stata
-The F statistic reported for Newey-West is based on the reported t statistics F=t²
linear probability model
-a linear-in-the-coefs equation used to explain a dummy dep var
Dᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + Eᵢ
-coef interpretation: estimate of the change in the probability that Di=1 caused by a 1 unit increase in the ind var in question, holding the other ind vars constant
-Pi=true probability that Di=1
--can never observe Pi, just like we can't observe the true betas
--true value is btwn 0 and 1, but we only observe the extremes 0 and 1, and the fitted Dhati can even fall outside [0,1], which isn't a meaningful probability
-hetero is a natural side effect, so don't need to run a test
usual effect of using HC standard errors on OLS coefficient estimate and estimated OLS standard errors and t statistics
-adjust estimation of SEs for hetero while still using OLS estimates of the slope coef -SEs that have been calculated to avoid consequences of hetero --biased but more accurate than uncorrected SEs
common errors that create perfect multicollinearity
-remember that perfect multicollinearity is often self-inflicted
-ex: dummy var trap
--when you include a dummy for every category and don't leave out the base category
-can use a scatterplot to see if there's a strong linear relationship
twoway (scatter x1 x2)
-can use a t test
-can use an F test of the joint hypothesis that both coefs are 0; if the individual t tests can't reject but the F test does, that's a sign of multicol
test x1 x2
heteroskedasticity-corrected standard errors (HC standard errors)
-adjust estimation of the SEs for hetero while still using OLS estimates of the slope coefs
-improve the SEs w/o affecting the estimates of the slope coefs
-calculated to avoid the consequences of hetero
-usually larger than OLS SEs, which means lower t-scores and a decreased probability that a given estimated coef will be sig diff from 0
-work best w/ large samples
-STATA: under the SE robust tab under linear regressions
--select robust SE and leave on the default option (see the sketch below)
-address the problems w/ t-scores and F tests
-will change the F and t statistics (lowers them)
--the F statistics that are reported are based on the robust estimate of the variance-covariance matrix of the coefficients
---if using HC standard errors, can still use STATA's F-stat when there's hetero
--can't use the ramsey RESET test as an F test bc it reports the same results as w/ OLS
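a minimal sketch of HC (robust) SEs in Stata, assuming hypothetical vars y, x1, x2:
regress y x1 x2               // plain OLS for comparison
regress y x1 x2, vce(robust)  // same coef estimates, HC1 robust SEs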
structural equation
-aka behavioral equations -characterize underlying economic theory behind each endogenous var by expressing it in terms of both endogenous and exogenous vars -Ys are jointly determined so change in Y1 will make a change in Y2 and that'll cause Y1 to change again -alphas and betas in equation are structural coefs and hypoth should be made about their signs just like before
reduced form equation
-alternative way of expressing a simul equations system
-express a particular endo var solely in terms of an error term and the predetermined (exo plus lagged endo) vars in the simul system
-eq in notebook for week 14 and pic on phone
--Vs are the stochastic error terms and Pis are the reduced form coefs bc they are the coefs of the predetermined vars in the reduced-form equations
-impact multipliers: Pis
--measure the impact on the endo var of a one-unit increase in the value of the predetermined var, after allowing for feedback effects from the entire simul system
-3 reasons to use this:
1. reduced-form equations have no inherent simultaneity, so they don't violate assumption III
--can be estimated w/ OLS w/o simultaneity bias (tho other problems can remain)
2. interpretation of the coefs as impact multipliers means they have economic meaning and useful applications of their own
--ex: if you want to compare a govt spending increase w/ a tax cut in terms of per-dollar impact in the 1st year, estimates of impact multipliers allow this comparison
3. the equations play an important role in the estimation technique used most often for simul equations
--two-stage least squares
why we need to address any known causes of impure hetero before dealing with pure hetero
-impure hetero comes from specification errors, so always fix the functional form (and other specification probs) before running tests or remedies aimed at pure hetero
binomial probit model
-another way to estimate equations w/ dummy dep vars
-estimation technique that avoids the unboundedness prob of the linear probability model by using a variant of the cumulative normal dist (equation in notebook for week 13 and pic on phone)
-similar to logit model
--similar graphs
--probit model also needs a large sample before hypoth testing is meaningful
--R bar squared of questionable value for measuring overall fit
-works like logit except the S curve comes from the normal distribution
-predictions in the tails may give diff results from logit
-equation written in terms of Pi
-harder to interpret the marginal impact
how adjusted R squared sub p is computed and why it's superior to other measures
-best way to calc the proportion of the sample explained correctly by the model: calculate the percentage of 1s explained correctly, the percentage of 0s explained correctly, and then report the average of the 2 percentages
--superior bc the raw percent-correct can look high just by always predicting the more common outcome; averaging the 2 percentages penalizes that
estat classification
simultaneity bias
-bias that occurs when applying OLS directly to the structural equations of a simul system
-in simul systems the expected value of the OLS estimated structural coefs (betas) is not equal to the true betas
-can't observe the error term, so we don't know when Eᵢ is above avg; when it is, Y1 will appear above avg and therefore Y2 will be above avg too
--OLS will attribute increases in Y1 caused by the error term E1 to Y2, which overestimates B1 (overestimation=simul bias)
-the bias will have the same sign as the correlation btwn the error term and the endo var that appears as an explanatory var in that error term's equation
-don't know whether there's bias unless u know the true betas, so arbitrarily pick a set of coefs to be considered "true"
--stochastically generate data sets based on these true coefs and obtain repeated OLS estimates of the coefs from the generated data
-fix with instrumental vars
two formal tests for heteroskedasticity and how to implement each one
-breusch pagan test -white test
lagged endogenous var
-can appear in simultaneous systems when equations involved are dynamic models -not simultaneously determined in current time period so have more in common w/ exo vars than nonlagged endo vars
white test
-can find more types of hetero than any other test
-investigates hetero in an equation by seeing if the squared residuals can be explained by the equation's indep vars, their squares, and their cross-products (see the Stata sketch below)
1. obtain the residuals of the estimated reg equation
predict residuals, residuals
2. estimate an auxiliary equation using the squared residuals as the dep var w/ each X from the og equation, the square of each X, and the product of each X times every other X as the exp vars
if the og equation's ind vars are X₁ and X₂:
eᵢ² = α₀ + α₁X₁ᵢ + α₂X₂ᵢ + α₃X₁ᵢ² + α₄X₂ᵢ² + α₅X₁ᵢX₂ᵢ + uᵢ
3. test the overall sig of the equation from step 2 w/ a chi-square test
--test stat=NR²
--df=number of slope coefs in the auxiliary reg
--null=all slope coefs in the equation from step 2 are 0
--reject if NR² is greater than the chi-square crit value
--reject means evidence of hetero
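a Stata sketch w/ hypothetical vars y, x1, x2 (the canned command runs the same test):
regress y x1 x2
* by hand: aux reg of squared resids on the Xs, their squares, and the cross-product
predict e, residuals
gen e2 = e^2
regress e2 c.x1##c.x1 c.x2##c.x2 c.x1#c.x2
display "NR2 = " e(N)*e(r2)
* canned version of White's test
regress y x1 x2
estat imtest, white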
identification
-can't apply 2SLS to an equation unless that equation is identified
--no meaningful estimates can be obtained for underidentified equations
-precondition for application of 2SLS
--a structural eq is identified when enough of the system's predetermined vars are omitted from the eq in question to allow that eq to be distinguished from all others in the system
---one equation in the system might be identified while another isn't
-a way to identify both curves is to have a predetermined var in each equation that isn't in the other
-general method to determine whether equations are identified = order condition of identification
pure heteroskedasticity
-caused by error term of correctly specified equation -implied unless stated otherwise
impure heteroskedasticity
-caused by specification error such as omitted var --portion of the omitted effect not represented by one of the included exp vars will be absorbed by the error term --find omitted var and include it in reg
what centering explanatory vars does to alleviate multicollinearity
-centered variables have much more pleasing correlations and more attractive VIF measures
lagrange multiplier (LM)
-checks for serial correlation by analyzing how well the lagged residuals explain the residuals of the og equation (see the Stata sketch below)
--if the lagged resids are sig in explaining this period's resids, then reject the null of no SC
1. run the normal reg
2. obtain the residuals (predict residuals, residuals)
3. run a reg with residuals as the dep var and add L.residuals as an ind var (aka the lag)
4. multiply N by R² to get the test stat (observed chi-square), which is the LM
--for large samples, LM has a chi-square dist w/ df=1, aka the number of restrictions in the null hypothesis; to get the critical value do
display invchi2tail(1, alpha)
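a Stata sketch (hypothetical vars y, x1, and a time var year; estat bgodfrey is the canned Breusch-Godfrey version of this LM test):
tsset year
regress y x1
predict resid, residuals
* aux reg: this period's resid on the Xs and last period's resid
regress resid x1 L.resid
display "LM = " e(N)*e(r2)
display invchi2tail(1, .05)
* canned equivalent
regress y x1
estat bgodfrey, lags(1)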
bad consequences of simultaneity bias for OLS estimates
-classical assumption III (all exp vars should be uncorrelated w/ the error term) is violated in simultaneous models
--OLS coefs are biased
-violated bc a var that appears on the right hand side of the eq is correlated w/ the error term
--in a demand eq, price can be correlated w/ the error term if they increase/decrease together
--error term is correlated w/ an explanatory var
-instrumental vars are the way around it
maximum likelihood
-consistent
-asymptotically efficient (min var for large samples)
-w/ large samples, converges to a normal dist, which allows use of typical hypoth testing techniques (500 or more)
-way of estimating any parameter and model that you can write down
-ex (Poisson): f(x) = (µˣe^(−µ)) / x!
-the likelihood is how likely you'd be to see that sequence of data for diff values of µ; pick the µ that maximizes it
-used to estimate the logit model
kinds of applications that are most likely to suffer from heteroskedasticity
-cross sectional models, esp. data sets w/ a wide disparity btwn the largest and smallest observed values of the dep var
how serial correlation violates a classical assumption of regression model
-classical assumption IV says diff observations of the error term can't be correlated with each other, and with serial correlation they are
2 simple models of how hetero might occur
-discrete hetero -proportionality model of hetero
what newey-west standard errors does and does not do to OLS coef estimates and their likely effect on estimated standard errors
-doesn't affect the estimated betas -SEs are larger than in OLS so t-scores are lower
pseudo-R-squared
-equation in notebook for week 14 and pic on phone -used for logit model -always between 0 and 1 -higher is better -adjusted R bar squared sub p -don't use to compare probit and logit models -STATA: estat classification
studentized residuals aka deleted t residuals
-equation in notebook for week 16 and pic on phone
-to get the num and denom independent, have to leave the calculation of the ith residual out of the SE
-denom df = n-k-1 bc threw out an observation to make the num and denom indep
-STATA: under statistics > postestimation > predictions, choose standardized/studentized/leverage residuals, OR
predict studentizedresids, rstudent
standardized residuals
-equation in notebook for week 16 and pic on phone
=(size of residual)/(est standard error of the resid)
-has an approximate but not genuine t distribution
--for a t dist the expressions in the num and denom have to be statistically independent
-can talk about how many standard deviations off the fitted line the residual is
-the t value is like a Z value if the sample size is large
-STATA: under statistics > postestimation > predictions, choose standardized/studentized/leverage residuals, OR
predict standardizedresids, rstandard
what pseudo-R-squared is in logit (probit) model
-equivalent of adjusted R squared in measuring goodness of fit
linear probability model and how to interpret its coefficients
-estimate of the change in the probability that Di=1 caused by a 1 unit increase in the ind var in question, holding other ind vars constant
what multicollinearity does and does NOT do to properties of estimators, estimated standard errors, t-values associated to those estimates, and to the stability of your coef estimates as you change specifications
-estimates will remain unbiased as long as the 1st 6 classical assumptions are met
--mean doesn't change
-variances and std errors of the estimates will be large
--when we can't distinguish btwn 2 vars, we're more likely to make large errors in estimating the betas than before we encountered multicol
--estimated coefs come from distributions w/ larger variances and larger std errors
-OLS is still BLUE (the minimum variances are just large)
-computed t-scores will fall (bc SE increases with multicol)
--CIs will widen
-estimates will become very sensitive to changes in specification
-overall fit of the equation and estimation of the coefs of nonmulticol vars will be largely unaffected
--measured by R bar squared
--hint of multicol is the combo of a high adjusted R squared with no statistically significant individual reg coefs
--possible for the F-test of overall sig to reject the null hypothesis even tho none of the t-tests on individual coefs can do so
binomial logit model
-estimation technique for equations w/ dummy dep vars that avoids the unboundedness problem of the linear probability model by using a variant of the cumulative logistic function (eq in notebook for week 13 and pic on phone)
-use the observed Dis to estimate the logit equation, then use the estimate to produce Dhatis that we compare to the Dhatis produced by the estimated linear probability model
--limited btwn 0 and 1
-can't be estimated w/ OLS so use ML aka maximum likelihood
--chooses the coef estimates that maximize the likelihood of the sample data set being observed
-interpretation of coefs:
--absolute sizes of the coefs are different from the absolute sizes of the estimated linear probability model coefs for the same specification and data
-slope of the graph of the logit changes as Dhati moves from 0 to 1
-STATA: have to click report estimated coefficients (see the sketch below)
-allows a higher chance of odd-ball events than probit (fatter tails)
--use logit if you think odd-ball events are more likely in a given scenario
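a minimal Stata sketch (d, x1, x2 are hypothetical var names):
* d is the 0/1 dep var
logit d x1 x2
* bounded predicted probabilities (Dhatis) to compare w/ the LPM's fitted values
predict dhat, pr
* classification table / percent correctly predicted
estat classification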
modeling choices likely to cause imperfect multicollinearity
-including several vars that all measure similar things (redundant vars)
-using a polynomial in an explanatory var (the var and its square are highly correlated)
meaning and intuition of rho (ρ) and its values
-first order autocorrelation coef
-measures the functional relationship btwn the value of an observation of the error term and the value of the previous observation of the error term
-magnitude indicates the strength of the SC in the equation
-rho=0 means no SC
-can be neg; look at the abs value as it approaches 1
--sign of rho indicates the nature of the SC
remedies for serial correlation
-generalized least squares (GLS)
--method of ridding an equation of pure 1st order SC and restoring the minimum variance property to its estimation
Yᵢ* = β₀* + β₁X₁ᵢ* + uᵢ
--pretend the i's are t's
-newey-west std errors: SEs that take SC into account w/o changing the beta hats themselves
-do NOT reorder the observations
-look at the specification of the equation for possible errors that might be causing impure serial correlation
--functional form
--omitted vars
two-stage least squares (2SLS)
-how to get rid of or reduce simul bias
-method of avoiding simul bias by systematically creating vars to replace the endo vars where they appear as explanatory vars in simul equations systems
1. avoid violating assumption III by finding a var that is:
--highly correlated w/ the endo var
--uncorrelated w/ the error term
--an instrumental var
2. run a reg on the reduced form of every right-side endo var, then use the Y hat from the estimated reduced form equation in place of the endo var where it appears on the right side of a structural equation
-STATA: need to check the box to report the first-stage reg and the box below it in order to see the F value for the first stage reg (see the sketch below)
-if we reject the null hypoth overwhelmingly, the interpretation is that one of the instrumental vars must be correlated w/ the error term or the model is wrong in some other way, bc there's no way to rationalize a big spread in the coefs
-estimates have increased variances and SE(beta hat)s
--if both OLS and 2SLS work, use OLS bc 2SLS uses a proxy based on instruments instead of the actual value of the var
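a Stata sketch w/ a hypothetical demand equation (q on income, price endogenous, z1 z2 instruments):
ivregress 2sls q income (price = z1 z2), first
* first-stage F stat for judging instrument strength (rule of thumb: 10+)
estat firststage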
leverage
-hᵢ: 1/n ≤ hᵢ ≤ 1
-matters when it's so large that it becomes interesting
--start by thinking about the avg leverage of the observations and then look for an observation w/ much bigger leverage than average
-average leverage=(k+1)/n
-measures distance from the mean of the Xs
-potential to mess with results but does not necessarily mean that it will
--high leverage observations alone don't change much, and even outliers alone don't change much, but a high leverage observation that is also an outlier will
---that's how you get a big cook's distance
relationship btwn durbin-watson statistic and rho
-if the current residual always equaled the previous residual, the numerator of the d equation would be 0
-extreme pos SC (rho near 1) gives a durbin-watson stat near 0
--d = 2(1-ρ): rho=0 gives d=2, rho=1 gives d=0, rho=-1 gives d=4
how leverage and cook's distance help identify influential observations
-if have a high leverage and an outlier then will have a large cook's distance -greater than 1 is big
how using certain functional forms such as polynomials and interaction terms can lead to imperfect multicollinearity
-if you include both a var and its square, those two vars are highly (tho not perfectly) correlated with each other
newey-west standard errors
-if serial correlation doesn't cause bias in the beta hats but does impact the SEs then it makes sense to adjust the equation in a way that changes the SEs but not the betas -calculated to avoid consequences of 1st order SC -SEs are larger than in OLS which means lower t-scores and decreases probability that a given estimated coef will be sig diff than 0
understand how you might informally confirm the existence of hetero with scatter plots
-if the graph has a cone shape then it's a hint that there may be hetero
why you might want to compute marginal effects not at means but separately if there's a dummy var among the X vars
-the mean of a dummy var (e.g. .4) doesn't describe any actual observation, so evaluating at the means can be misleading
-compute the marginal effects separately at the dummy's 2 values instead, esp. if you care about a particular case and want to know whether under certain conditions the dep var will be 0 or 1
GLS
-in STATA: prais followed by the dep and ind vars
-GLS estimates usually differ from the OLS ones
-GLS works well if rho hat is close to the actual rho, but the GLS rho hat is biased in small samples
--if rho hat is biased, it introduces bias into the GLS estimates of the beta hats
-has CO clicked
predetermined variable
-includes all exo vars and lagged endo vars -implies that exo and lagged endo vars are determined outside system of specified equations or before current period
role of sample size in alleviating multicollinearity
-increasing sample size reduces impact of multicol -larger data sets will allow more accurate estimates than a small one bc larger sample normally will reduce var of estimated coefs which diminishes impact of multicol
breusch-pagan test
-investigates whether the squared residuals can be explained by possible proportionality factors (hetero test); see the Stata sketch below
-basically an F test of overall sig on an auxiliary reg, but we use a chi-squared test bc the F test requires normally dist errors
1. obtain the residuals from the estimated reg equation
eᵢ = Yᵢ - Ŷᵢ = Yᵢ - β̂₀ - β̂₁X₁ᵢ - β̂₂X₂ᵢ
predict residuals, residuals
2. use the squared residuals as the dep var in an auxiliary equation
--og equation's ind vars on the right side
eᵢ² = α₀ + α₁X₁ᵢ + α₂X₂ᵢ + uᵢ
gen residuals2=residuals^2
reg residuals2 x1 x2
3. test the overall sig of the equation w/ a chi-squared test
--null: alpha1=alpha2=0 (homo, bc if the alphas=0 then the var equals alpha0, a constant)
--alt: null is false
--test stat=NR²
--df=number of slope coefs in the auxiliary reg
--reject the null if NR² is greater than or equal to the critical chi-square
display chi2tail(df, NR²)
-if it fails to find hetero, that only means there's no evidence of hetero related to the chosen Zs
-in STATA: under postestimation, breusch-pagan, and click use right hand side vars
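the canned version in Stata (hypothetical vars; rhs uses the right-hand-side vars, iid gives the N*R² chi-square form):
regress y x1 x2
estat hettest, rhs iid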
endogenous variable
-jointly determined; endo bc it's determined inside the system, not just bc it appears in both equations
-often on the left side of an equation but not always
--ex: in supply and demand equations, price is on the right side
-there must be as many equations as there are endo vars
-determined inside the system of specified equations and during the current period
how econometricians pick a lag length when using newey-west std errors
-l.y means lag Y, aka Y from last period
-l2.y is Yₜ₋₂, etc
-lags = n^.25, rounded down
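a Stata sketch (hypothetical vars y, x1, x2 and time var year):
tsset year
* rule of thumb lag length: n^.25 rounded down
display floor(_N^0.25)
* e.g. w/ n around 100 this gives 3
newey y x1 x2, lag(3)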
limitations of the linear probability model
-limitations discussed in Stud p.435
-linear probability models will have hetero error terms
1. R bar squared is not an accurate measure of overall fit
--Di can only equal 0 or 1, but Dhati must move in a continuous fashion from one extreme to the other
2. Dhati is not bounded by 0 and 1
--depending on the values of the ind vars and the beta hats, the right hand side may be outside the meaningful range
--can get around the prob by assigning Dhati=1 to all values above 1 and Dhati=0 to neg values of Dhati
3. the error term is neither homoskedastic nor normally distributed
--Di can only take on 2 values
--most ignore the hetero and nonnormality and apply OLS to the linear prob model
-best way to calc fit: calculate the percentage of 1s explained correctly, the percentage of 0s explained correctly, and then report the average of the 2 percentages = R bar squared sub p
imperfect multicollinearity
-a linear functional relationship btwn 2 or more indep vars that is so strong it can significantly affect the estimation of the coefficients of the vars
-strong but imperfect linear relationship btwn 2 or more of the X vars
-when 2 or more exp vars are imperfectly linearly related:
X₁ᵢ = 𝛼₀ + 𝛼₁X₂ᵢ + uᵢ
--has a stochastic error term bc X₂ can't perfectly explain X₁
-NO classical assumption rules out imperfect multicol
difference btwn marginal effect at the sample mean and avg marginal effect
-marginal effect at the sample mean: estimate the marginal effect for an avg observation, or for a particular case you care about
--class ex: interested in whether a certain girl w/ an above-average GPA will return for her second year
--factor terms = stuff like program type etc
--if you want marginal effects for gpa and program, type those into the margins dialog, then click the at tab and make the specification gpa=3.8
--to predict the probability that she will return: go to margins and unclick "marginal effects of response," which switches to conditional means
margins program, at(gpa=(3.8))
--then go to postestimation predict and enter a new var name where it asks for one
-avg marginal effect: going point by point and averaging the slope of each line
simple correlation coefficient (r)
-measure of strength and direction of linear relationship btwn 2 vars -high is about .8 -high r means multicol but low r does not mean otherwise
instrumental var (an instrument)
-method of avoiding violation of assumption III (all exp vars should be uncorrelated w/ the error term) by producing predicted values of the endo vars that can be substituted for the endo vars where they appear on the right side of the structural equations
-produced by running an OLS equation to explain the endo var as a function of 1 or more instrumental vars
-every predetermined var in the simul system is a candidate to be an instrumental var for every endo var
-if you only choose 1 instrumental var, you're throwing away info
--2SLS uses a linear combination of all the predetermined vars to avoid this
--form the linear combo by running a reg for a given endo var as a function of all the predetermined vars in the reduced form equation to generate the predicted value of the endo var
-if you only have 1 instrument for 1 endo var, can't define the bias in 2SLS
--w/o extra instruments, the distribution of 2SLS has such fat tails that the mean of the dist isn't even defined (can't say theta hat is unbiased if the estimator has no expected value)
-want to use an F test on all of the instruments, not the F from the reg output
--the output F tests whether ALL the coefs are simultaneously equal to 0, but that's not what we want
--want to test just the subset of instruments
test z1 z2 (hypothetical instrument names; list just the instruments)
variance inflation factor
-method of detecting the severity of multicol by looking at the extent to which a given exp var can be explained by all the other exp vars in the equation
-each exp var in the equation has a VIF
-an index of how much multicol has increased the variance of an estimated coef
-no hard rules w/ the VIF decision rule (look for 5 or higher)
-possible to have multicol effects in an equation that doesn't have large VIFs
X₁ = α₁ + α₂X₂ + α₃X₃ + ... + αᵢXᵢ + v
--v = stochastic error term
--auxiliary/secondary regression bc X₁ is not included on the right side
VIF(β̂ᵢ) = 1/(1-Rᵢ²)
order condition for identification
-method of determining whether a particular equation in a simul system has the potential to be identified -necessary but not sufficient condition of identification -need to determine: 1. # of predetermined vars in entire simul system 2. # of slope coef estimated in equation in question -condition for an equation to be identified is that the number of predetermined vars in the system be greater than or equal to the number of slope coefs in equation of interest
why we need large sample sizes for newey-west to work well
-more accurate than uncorrected SEs for larger samples (greater than 100) in face of SC
first order serial correlation
-most commonly assumed kind of SC
-current value of the error term is a function of the previous value of the error term
Eₜ = ρEₜ₋₁ + uₜ
--ρ = rho = 1st order autocorrelation coef
--u = classical (not serially correlated) error term
--E = error term of the equation in question
-rho: measures the functional relationship btwn the value of an observation of the error term and the value of the previous observation
--closer to 1 means stronger SC
newey-west standard errors aka HAC standard errors
-the number of lags you correct for grows slowly w/ the sample size
--so the number of observations per lag also grows; both go to infinity together
-have to have a large sample size for it to work well
-under time series reg in STATA
-similar to OLS but the CI is wider
-uses the same coefs as OLS, so it has the same predictions as OLS, but the F changes
--STATA knows there's a prob with the F, so it gets the F by looking at the corrected t's and not sums of squares
how pure hetero (aka just hetero) is a violation of classical assumptions of reg model
-observations of error term are drawn from distribution that does NOT have a constant variance
simultaneous equation
-one in which Y clearly has an effect on at least 1 of the Xs in addition to the effect that the Xs have on Y
--distinguish btwn vars that are simultaneously determined (Ys aka endogenous vars) and those that are not (Xs aka exogenous vars)
--equation in notebook for week 14 and pic on phone
-estimation of an equation changes every time the specification of any equation in the entire system is changed
-use two-stage least squares to estimate them instead of OLS
-X influences Y but Y also influences X
redundant variable
-only one var out of a couple or a few is needed to represent the effect on the dep var that all of them currently represent -doesn't matter which var(s) you drop so base decision on theoretical underpinnings of model -when var is dropped from equation its effect will be absorbed by other exp vars to the extent that they are correlated
positive and negative serial correlation
-pos: implies error terms tend to have same sign from one period to the next --if error term takes on a large value in one time period the following observations would retain a portion of this og large value and have same sign as original -neg: error term has tendency to switch signs from neg to pos and back again in consecutive observations --less likely than pos
how impure serial correlation can arise
-specification probs:
--omitted variable
--wrong functional form (e.g. the equation needs to be in logs and not linear)
cook's distance
-proposes measuring the "distance" btwn β̂ⱼ and β̂ⱼ(-i) by calculating the F-stat for the hypothesis that β̂ⱼ=β̂ⱼ(-i)
--recalculated for each observation i
--do NOT interpret as F-tests
--equation in notebook
-look for values of Dᵢ that are larger than the rest
--provides a summary index of influence on the coefficients
=(measure of discrepancy) x (measure of leverage)
-composite measure
-a cook's distance larger than 1 basically screams to look at that observation
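a Stata sketch (hypothetical vars y, x1, x2):
regress y x1 x2
predict lev, leverage
predict cooksd, cooksd
* compare each leverage to the average (k+1)/n and flag big cook's distances
summarize lev
list if cooksd > 1 & !missing(cooksd)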
how to confirm existence of serial correlation by means of informal residual plots and through use of tests
-regressing current on lagged residuals and the durbin-watson d test
first stage for 2SLS
-run OLS on the reduced form equations for each of the endo vars that appear as explanatory vars in the structural equations in the system
-predetermined vars are uncorrelated w/ the reduced form error term, so the OLS estimates of the reduced form coefs (Pi hats) are unbiased
--can then use the Pi hats to estimate the endo vars
-Y hats are used in place of the Ys on the right side of the structural equations
--eq in notebook for week 14 and pic on phone
-the bigger the 1st stage F the better (10 is the acceptable line)
--don't take it from the main reg output
how standardized and studentized residuals help one identify outliers
-they say how many standard deviations away from the fitted line the residual is, so unusually large values flag outliers (see the sketch below)
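a Stata screening sketch (hypothetical vars y, x1, x2; the ~2 cutoff is a common informal rule):
regress y x1 x2
predict rstd, rstandard
predict rstu, rstudent
* flag observations more than ~2 standard deviations off the fitted line
list if abs(rstu) > 2 & !missing(rstu)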
kinds of applications in which pure serial correlation is most likely to be a problem
-time series applications, esp. data w/ seasonal patterns (seasonal serial correlation)
properties of 2SLS estimates as outlined on p.471-472 of Stud
-note the shortcomings of Stud's discussion: point 2's claim that "the bias in 2SLS for small samples typically is of the opposite sign of the bias in OLS" is not correct, and the appropriate way to judge the fit of the reduced form is with the first stage F stat
1. 2SLS estimates are still biased
--bias in 2SLS is usually smaller than the expected bias due to OLS
--bias comes from the remaining correlation btwn the Y hats produced by the 1st stage reduced form regs and the Es
---as the sample size gets larger, the 2SLS bias decreases
2. if the fit of the reduced form equation is poor, then 2SLS will not rid the equation of bias
--the instrumental var is supposed to be highly correlated w/ the endo var
--to the extent that the fit of the reduced form eq is poor, the instrumental vars aren't highly correlated w/ the og endo var and there's no reason to expect 2SLS to be effective
--as the fit of the reduced form eq improves, the usefulness of 2SLS increases
3. 2SLS estimates have increased variances and SEs
--but 2SLS will almost always be a better estimator of the coefs of a simul system than OLS
exogenous variable
-some vars are always exo (weather) but others can be either --depends on number and chars of the other equations in the system
outlier
-an observation whose residual is unusually far from the fitted line
-detect w/ standardized and studentized residuals
second stage of 2SLS
-substitute the reduced form Y hats where they appear on the right side of the structural equations and then estimate these revised structural equations
-eq in notebook for week 14 and pic on phone
-dep vars are still the og endo vars; substitutions are only for the endo vars that appear on the right hand side of the equations
-if the 2nd stage equations are estimated w/ OLS by hand, the SEs will be incorrect, so use the computer's 2SLS estimation procedure
-OLS is applied to each stochastic structural equation after the substitution is made
-when predicted values are generated this way, everything to the right of the substituted var ignores that there's a 1st stage (part of why the hand-run SEs are wrong)
why using HC standard errors also changes reported F values
-STATA knows the usual sums-of-squares F is invalid w/ robust SEs, so it computes the F from the corrected t's (the robust variance-covariance matrix) instead of sums of squares
purpose and interpretation of certain post-estimation tests
-the test of overidentifying restrictions (checks whether the instruments are valid) and the test of endogeneity (checks whether the suspect var is actually correlated w/ the error term, i.e. whether 2SLS was needed at all)
consequences of serial correlation for F statistics
-they are inaccurate bc F stat no longer has an F distribution when there is SC
durbin-watson d statistic
-used to determine if there is 1st order serial correlation in the error term of an equation by examining the residuals of a particular estimation of that equation
-only use when the assumptions are met:
--1. reg model includes an intercept term
--2. SC is 1st order in nature
--3. reg model does NOT include a lagged dep var as an indep var
-d=0 if extreme pos SC
-d=2 if no SC
-d=4 if extreme neg SC
--bc d=2(1-ρ)
-not used as frequently bc the approach requires you to be able to find the distribution of DW when the null is true
--the dist depends on specific features of the X vars
--if you don't know the dist, you don't know if you're in the acceptance or rejection region
-K=number of exp vars EXCLUDING the constant (at the bottom of the table)
-STATA reports the d stat as d-stat(# of exp vars including the constant aka k+1, n)
STATA: estat dwatson
--df under total is n-1 after the 1st observation has been lost
how to use VIF to diagnose multicollinearity
-value that's considered "large enough to be important" = 5
1. run an OLS reg that has Xᵢ as a function of all the other exp vars in the equation
2. calculate the variance inflation factor for beta hat
VIF(β̂ᵢ) = 1/(1-Rᵢ²)
STATA: estat vif
use of correlation matrices
-value that's considered "large enough to be important" = .8
corr x1 x2 x3 etc
-correlation matrix btwn the Xs
-1 on the diagonal just means x1 is perfectly correlated w/ itself, so don't worry about the diagonals
proportionality factor Z
-variance of the error term changes proportionally to Zᵢ
-Z may or may not be one of the Xs in the linear reg equation
var(Eᵢ) = σ²Zᵢ
perfect multicollinearity
-violates classical assumption VI (no ind var is a perfect linear function of one or more other ind vars)
--if 2 exp vars are perfectly related, the OLS computer program can't distinguish the effects of one var from the effects of the other
-variation in one exp var can be completely explained by movements in another exp var
X₁ᵢ = 𝛼₀ + 𝛼₁X₂ᵢ
--𝛼s = constants
--Xs = individual vars in the reg model
--NO error term
-ex: same var measured in diff units (distance measured in miles and the other var in kilometers)
--or when 2 vars always add up to the same amount (% that voted and % that didn't will always sum to 100%)
-ruins the ability to estimate the coefficients bc the 2 vars can't be distinguished (can't hold the other constant)
-can detect it by asking whether one var equals a multiple of another, can be derived by adding a constant to another, or equals the sum of 2 other vars
-the severity of multicol can vary from sample to sample, and multicol short of perfect doesn't cause the estimators to be biased or inconsistent
pure serial correlation
-violation of classical assumption IV that diff observations of the error term are uncorrelated w/ each other
-most common w/ time series data sets
--value of the error term from one period depends in some way on the value of the error term in another period
-the assumption of uncorrelated observations of the error term is violated in a correctly specified equation
how to test for pos correlation
1. obtain the OLS residuals from the equation to be tested and calculate the d stat w/ the d equation
2. determine the sample size and number of explanatory vars, then consult the stat table to find the upper critical d value (dU) and lower critical d value (dL)
3. given the null hypoth of no SC and a 1 sided alt hypoth:
--H0: ρ ≤ 0 (no pos SC)
--Ha: ρ > 0 (pos SC)
-decision rule:
--if d < dL --> reject H0
--if d > dU --> do not reject H0
--if dL ≤ d ≤ dU --> inconclusive
heteroskedasticity
-violation of classical assumption V
-assumption V=observations of the error term are drawn from a distribution w/ a constant variance
--if met, we can assume the distribution has mean 0 and variance sigma squared
-when OLS is applied to hetero models it is no longer the minimum variance estimator (still unbiased)
-more likely in cross sectional models than in time series models
-common in data sets where there's a wide disparity btwn the largest and smallest observed values of the dep var
-messes up OLS bc the observations aren't weighted correctly
--observations w/ smaller error variances deserve more weight, but OLS gives them all equal weight
-consequences are similar to SC (not efficient and messes up the SEs but still unbiased)
influential observation
-an observation w/ high leverage that is also an outlier, so it can pull the fitted line
-often arises bc of mistakes (e.g. data entry errors)
sequential binary model
-used when a decision is made in stages, as a sequence of binary choices w/ different alternatives at each stage
dominant var
-when a var that is definitionally related to the dep var is included as an indep var in a regression equation -masks the effects of other ind vars in equation bc so highly correlated to dep var
why traditional adjusted R-squared can be misleading in linear probability model and how the fraction or percent of right answers may be a misleading indication of goodness
1. R bar squared is not an accurate measure of overall fit
--Di can only equal 1 or 0, but Dhati must move in a continuous fashion from one extreme to the other
--means that Dhati is likely to be diff from Di for some range of Xi
--R bar squared is likely to be much lower than 1 even if the model actually does explain the choices involved
-the percent of right answers can also mislead: always predicting the more common outcome can score high w/o the model explaining anything, which is why the avg of the % of 1s right and the % of 0s right (R bar squared sub p) is preferred
generalized least squares (cochrane-orcutt)
1. Yₜ = β₀ + β₁Xₜ + Eₜ, where Eₜ = ρEₜ₋₁ + uₜ
2. ρYₜ₋₁ = ρβ₀ + ρβ₁Xₜ₋₁ + ρEₜ₋₁
subtracting 2 from 1:
Yₜ - ρYₜ₋₁ = β₀(1-ρ) + β₁(Xₜ - ρXₜ₋₁) + uₜ
--(Eₜ - ρEₜ₋₁) = uₜ, a classical error term
-this is the Cochrane-Orcutt transformation
-have to check the box in prais for CO, but don't use it bc regular prais (prais-winsten) is more accurate
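a Stata sketch (hypothetical vars y, x1 and time var year):
tsset year
* iterated prais-winsten GLS (keeps the first observation)
prais y x1
* cochrane-orcutt variant (drops the first observation)
prais y x1, corc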
solution strategies when confronted w/ multicollinearity
1. do nothing at all bc every step has a drawback --multicol may not reduce t-scores enough to make them insignificant --deletion of multicol var that belongs in equation will cause specification bias 2. drop redundant var 3. increase sample size --not possible for most time series data sets (too expensive/time-consuming)
what generalized least squares does and does not do to OLS coef estimates and their likely effect on estimated standard errors
1. the transformed error term is not serially correlated
--OLS estimation of the transformed equation is minimum variance
2. the slope coef beta1 is the same as the slope coef of the og SC equation
--coefs estimated w/ GLS have the same meaning as those estimated w/ OLS
3. the dep var has changed compared to the og equation, which means the GLS R bar squared isn't comparable to the OLS R bar squared
the conditions that an instrumental var must satisfy for it to be suitable for use as means of ameliorating simultaneity
1. must be correlated w/ the right hand side var that is misbehaving (the one correlated w/ the error term)
2. can't itself be correlated w/ the error term
3. can't already be an explanatory var itself
--in a correctly specified model
--implication: if you have a demand curve you're estimating and you think income should be an explanatory var, then you can't estimate the demand curve w/o income and then use income as an instrument
-if there's some var you know shifts supply but has no effect on demand, you could use it to find the shape of the demand curve
-example in week 14 notes
consequences of serial correlation for the estimated coefficients, standard errors, and t statistics
1. pure SC does NOT cause bias in the coef estimates
--if the SC is impure, bias may be introduced by the use of an incorrect specification
--lack of bias does NOT mean the OLS estimates will be closest to the true coef values
--std errors will increase bc of SC
--still centered around the true beta
2. SC causes OLS to no longer be the minimum variance estimator (bc assumption IV is violated)
--an SC error term causes the dep var to fluctuate in a way that the OLS estimation procedure attributes to the ind vars
--overestimates are just as likely as underestimates (unbiased)
3. SC causes the OLS estimates of the SE(beta hat)s to be biased
--unreliable hypoth testing
--SEs are a key component of the t-stat, so biased SEs cause biased t-scores and therefore unreliable hypoth testing
difficulty of defining a goodness of fit measure when dependent var is dummy var
-the dep var only takes the values 0 or 1, but the fitted values move continuously btwn (and even beyond) those extremes, so ordinary R squared type measures don't behave well
-in lab we had to convert the fitted values: anything greater than .5 counts as a 1 and anything below that counts as a 0, then compare to the actual Dis
test of over-identifying restrictions
STATA: estat overid
-tells you if you have bad instruments (null: all the instruments are valid)
-if you get an error, that means the equation is exactly identified (no extra instruments, so there's nothing to test)
how to obtain, identify, and interpret all the relevant model estimates, marginal effects, predicted probabilities, goodness of fit measures, etc in STATA
-model estimates: logit/probit output (remember to click report estimated coefficients)
-marginal effects: margins w/ dydx(), adding atmeans for the sample-mean version
-predicted probabilities: predict newvar, pr
-goodness of fit: pseudo-R-squared in the output and estat classification
(see the sketches in the logit entries above)
multivariate logit model
help
test of endogeneity
-after ivregress 2sls: estat endogenous (Durbin-Wu-Hausman)
-null: the suspect var is actually exogenous
--if we fail to reject, OLS is consistent and preferred over 2SLS
impure serial correlation
Yᵢ = β₀ + β₁X₁ᵢ + Êᵢ
-left out an important var, so β₂X₂ᵢ and the og error term both went into Ê
--or another specification error
what is done in the first and second stage of 2SLS when those stages are performed individually and not part of a packaged 2SLS command
-1st stage by hand: OLS reg of each right-side endo var on all the predetermined vars (the reduced form), then save the fitted values w/ predict
-2nd stage by hand: OLS reg of the structural equation w/ the fitted values substituted for the endo vars
--the hand-run 2nd stage gives incorrect SEs (it ignores the 1st stage), which is why the packaged 2SLS command is preferred
what pseudo-r-squared does and does not have in common with ordinary r-squared in an OLS model
-in common: between 0 and 1, and higher means better fit
-not in common: no sums-of-squares/percent-of-variation-explained interpretation, and shouldn't be used to compare models estimated by diff methods (e.g. probit vs logit)
simple models for pure correlation
-ex: first order serial correlation: Eₜ = ρEₜ₋₁ + uₜ
-seasonal serial correlation (e.g. the error related to the error from 4 quarters earlier) is another simple model
hc1, hc2, hc3
hc1=the default (robust); only good w/ large samples
hc2=don't really use
hc3=best default bc it also works well in smaller samples
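a Stata sketch of the three options (hypothetical vars y, x1, x2):
regress y x1 x2, vce(robust)  // hc1, Stata's default robust option
regress y x1 x2, vce(hc2)
regress y x1 x2, vce(hc3)     // better behaved in smaller samples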
advantage of 2SLS over indirect least squares when you have more valid instruments than endogenous vars
-w/ more valid instruments than endo vars, the equation is overidentified and ILS breaks down: solving back from the reduced form gives multiple conflicting answers for the same structural coef
-2SLS combines all the instruments into one optimal linear combination (the 1st stage fitted values), so it uses all the available info and yields a single estimate
connection between sign of estimated logit (probit) coef and whether the estimated probability of the event rises or falls as the independent var rises
-same connection as in OLS: a pos coef means the estimated probability that Di=1 rises as the ind var rises, and a neg coef means it falls
--the S curve only changes the size of the marginal effect along the curve, never its direction
graphically explain bad consequences of simultaneity bias are for OLS estimates in the case of the famous and important example of supply/demand systems
-demand-side error shocks shift the demand curve along the supply curve (and vice versa), so the observed price-quantity points are equilibrium intersections that lie btwn the 2 curves
-OLS fits a line through those intersection points and recovers neither the true supply slope nor the true demand slope
how causation btwn X and Y var results in violation of key assumption of OLS
-if Y also causes X, then a shock to the error term changes Y, which feeds back into X
--X ends up correlated w/ the error term, violating classical assumption III (exp vars uncorrelated w/ the error term)
how indirect least squares can produce consistent parameter estimates when there is simultaneity bias
-the reduced form regresses each endo var only on predetermined vars, which are uncorrelated w/ the error term, so the reduced form coefs are estimated consistently by OLS
-solving the reduced form coefs back for the structural coefs (possible when the eq is exactly identified) carries the consistency over to the structural estimates
if 2 vars are highly correlated...
it can make each of the correlated vars seem individually insignificant even tho they're important, and their estimated signs may come out different than expected; a 3rd var uncorrelated w/ them is largely unaffected
understand that hc standard errors and HAC standard errors won't permit you to construct prediction intervals for new observations because...
they make no assumption about the variance of the error term in a new observation