Econometrics
In a simple random sample of size N from a given population:
- Each member of the population is equally likely to be in the sample
- Every possible sample of size N from this population has an equal chance of being selected
discrete random variable
- has a countable number of possible values, such as 0, 1, 2
what does the standard error value show
- if an estimator has a large standard deviation, there is a higher probability that an estimate will be far from its mean
- if an estimator has a small standard deviation, there is a higher probability that an estimate will be close to its mean
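As a rough illustration of the point above, the sketch below computes the standard error of a sample mean as s / sqrt(N). The data values and the function name `standard_error_of_mean` are made up for this example; the only claim is that, for data with similar spread, a larger sample gives a smaller standard error.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimated standard deviation of the sample mean: s / sqrt(N)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

small = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data, N = 8
large = small * 4                   # the same values repeated, N = 32

se_small = standard_error_of_mean(small)
se_large = standard_error_of_mean(large)
print(se_small, se_large)  # the larger sample has the smaller standard error
```

A smaller standard error means estimates of the mean from samples like this one are more tightly clustered around their own mean.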
selection bias
- occurs when a sample selection systematically excludes or underrepresents certain groups
- often happens when we use a convenience sample consisting of data that are readily available
survivor bias
- arises in a retrospective study (one that looks at past data for a contemporaneously selected sample)
- we necessarily exclude members of the past population who are no longer around
what is the difference between an estimate and an estimator?
Estimator: a mathematical technique applied to a sample of data to produce an estimate of the true population regression coefficient
Estimate: the computed value of a population regression coefficient produced by an estimator
OLS is an estimator; the estimated βs produced by OLS are estimates
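One way to picture the distinction is as a function versus its return value. In the sketch below, `ols_simple` (a name chosen for this example) is the estimator for a one-regressor model, and the two numbers it returns for the made-up sample are the estimates.

```python
def ols_simple(x, y):
    """The estimator: a rule that maps any sample to (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up sample, for illustration only.
x = [1, 2, 3, 4, 5]
y = [5.1, 7.9, 11.2, 13.8, 17.0]

# The estimates: the specific numbers the estimator produces for this sample.
b0_hat, b1_hat = ols_simple(x, y)
print(b0_hat, b1_hat)
```

A different sample from the same population would give different estimates, while the estimator itself stays the same.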
what does a higher R^2 mean
The higher the R^2 is, the more closely the estimated regression equation fits the sample data
Econometrics
Econometrics is the quantitative measurement and analysis of actual business and economic phenomena. It tries to quantify theoretical relationships.
Intuition behind CLT
even if the population doesn't have a normal distribution, the sampling distribution of the mean will approach a normal distribution as the sample size increases
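A small simulation can make this intuition visible. The sketch below assumes an Exponential(1) population (strongly right-skewed, chosen only for illustration) and compares the skewness of the sample mean for N = 1 and N = 50. The helper names, seed, and replication count are all arbitrary choices for this example.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def skewness(values):
    """Average cubed z-score: 0 for a symmetric distribution such as the normal."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return sum(((v - mean) / sd) ** 3 for v in values) / n

def sample_means(n, reps=4000):
    """Means of `reps` independent samples of size n from Exponential(1)."""
    return [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

skew_n1 = skewness(sample_means(1))    # the population itself: heavily skewed
skew_n50 = skewness(sample_means(50))  # means of 50 draws: much closer to symmetric
print(skew_n1, skew_n50)
```

The skewness of the sample means shrinks toward 0 as N grows, which is one symptom of the sampling distribution moving toward the normal shape.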
true or false: the goal is to maximize adjusted R^2
false
should the coefficient be 0 for a nonsensical variable?
In theory, yes. But in any given sample there is some random correlation, so the nonsensical variable provides a minor amount of explanation. It is typical to get a nonzero estimated coefficient even for a nonsensical variable. Only if the new coefficient is exactly 0 does the variable add nothing to the fit.
adjusted R^2
is constructed to "correct" (penalize) for additional variables; it is used when comparing 2 regressions with the same dependent variable
Level of significance
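The level of significance indicates the probability of observing an estimated t-value greater than the critical value if the null hypothesis were correct, i.e. the probability of a Type I error. A lower significance level makes a Type I error less likely, but it also makes a Type II error more likely, so lower is not automatically better.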
residual
the difference between the observed value of the dependent variable and the value estimated from the regression equation (e = Y - Ŷ). The residual is an estimate of the error term (ε)
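To make the definition concrete, the sketch below computes residuals as observed minus fitted values. The data are made up, and the coefficients `b0_hat` and `b1_hat` are assumed to have come from an OLS fit of this same data (they are typed in here rather than estimated).

```python
# Made-up observations, for illustration only.
x = [1, 2, 3, 4, 5]
y_observed = [5.1, 7.9, 11.2, 13.8, 17.0]

# Assumed OLS estimates for this data (hypothetical values typed in by hand).
b0_hat, b1_hat = 2.09, 2.97

# Fitted values from the estimated regression equation.
y_fitted = [b0_hat + b1_hat * xi for xi in x]

# Residual = observed Y minus fitted Y.
residuals = [yi - fi for yi, fi in zip(y_observed, y_fitted)]
print(residuals)
print(sum(residuals))  # close to 0 when the equation includes an intercept
```

The error term ε itself is never observed; the residuals are what we can actually compute from the sample.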
Self-selection bias
occurs when we examine data for a group of people who have chosen to be in that group
sample:
part of the population that we actually observe
The exact distribution of t depends on
sample size (through the degrees of freedom)
- As the sample size increases, we are increasingly confident of the accuracy of the estimated standard deviation
- As N → ∞, the sample standard deviation approaches the population standard deviation (s → σ), and the distribution of t approaches the standard normal distribution, Z
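The convergence of t toward Z can be checked by simulation. The sketch below draws samples from a standard normal population (an assumption made for this example), computes the t-statistic for the true mean of 0, and records how often |t| exceeds the normal cutoff 1.96. The seed, sample sizes, and helper names are arbitrary.

```python
import math
import random

random.seed(3)  # fixed seed so the run is repeatable

def t_statistic(n):
    """t-statistic for H0: mean = 0, from n draws of a standard normal."""
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(sample) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    return mean / (s / math.sqrt(n))

def tail_share(n, reps=3000):
    """Share of simulated t-statistics with |t| above the normal cutoff 1.96."""
    return sum(abs(t_statistic(n)) > 1.96 for _ in range(reps)) / reps

tail_small = tail_share(5)    # fat tails: well above 5%
tail_large = tail_share(100)  # close to the 5% a normal distribution would give
print(tail_small, tail_large)
```

With few observations the t distribution has fatter tails than Z, which is why small samples need larger critical values.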
population:
the entire group of items that interest us
The lower the degrees of freedom,
the less reliable the estimates are likely to be
Recall that efficiency is a measure of
the quality of an estimator. An inefficient estimator has a larger variance and results in a less precise estimate of the "true" parameter value. An inefficient estimator may then result in incorrect inference in a hypothesis test. An efficient estimator is also the minimum variance unbiased estimator (MVUE) and therefore "best".
nonresponse bias
the systematic refusal of some groups to participate in an experiment or to respond to a poll
Explained sum of squares (ESS)
variation that can be explained by the regression
Residual Sum of Squares (RSS):
variation that cannot be explained by the regression
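ESS and RSS together make up the total sum of squares (TSS = ESS + RSS) for an OLS regression with an intercept. The sketch below checks this on made-up data with a one-regressor fit; the variable names are chosen for this example.

```python
# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 7]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope and intercept for a single regressor.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)               # total variation in Y
ess = sum((fi - y_bar) ** 2 for fi in fitted)          # explained by the regression
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # left unexplained

r_squared = ess / tss
print(tss, ess, rss, r_squared)
```

Because TSS = ESS + RSS here, R^2 can be written either as ESS / TSS or as 1 - RSS / TSS.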
If there is a nonsensical variable, how do we expect R^2 to be impacted?
A weakness of R^2 is that adding a variable will decrease (or leave unchanged, but never increase) the sum of squared residuals, even if it is a nonsensical variable. Therefore, adding a nonsensical variable usually increases R^2.
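This can be demonstrated with simulated data. The sketch below fits a regression with and without a pure-noise regressor (`junk`, unrelated to Y by construction) and compares the two R^2 values. The small `ols` solver, the data-generating numbers, and the seed are all assumptions made for this example, not a production implementation.

```python
import random

def ols(columns, y):
    """OLS coefficients [intercept, b1, ...] from the normal equations (X'X)b = X'y."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def r_squared(columns, y):
    """R^2 = 1 - RSS / TSS for an OLS fit of y on the given columns."""
    b = ols(columns, y)
    n = len(y)
    fitted = [b[0] + sum(bj * col[i] for bj, col in zip(b[1:], columns)) for i in range(n)]
    y_bar = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - rss / tss

random.seed(4)  # fixed seed so the run is repeatable
n = 30
x = [random.uniform(0, 10) for _ in range(n)]
y = [1 + 2 * xi + random.gauss(0, 3) for xi in x]
junk = [random.random() for _ in range(n)]  # nonsensical variable: pure noise

r2_without = r_squared([x], y)
r2_with = r_squared([x, junk], y)
print(r2_without, r2_with)  # r2_with is never below r2_without
```

The increase is typically tiny, but it is an increase all the same, which is why R^2 alone is a poor guide to whether a variable belongs in the equation.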
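if we add a new independent variable, why do the other coefficients change?
When the variable is left out, it is not held constant, so the other coefficients pick up part of its effect whenever they are correlated with it. When we include it, the other coefficients are estimated holding it constant.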
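A tiny numeric sketch of a coefficient moving when a correlated variable is added: Y is built exactly as 1 + 2*X1 + 3*X2 (no error term, an assumption made so the arithmetic is exact), and X2 is correlated with X1. The `ols` helper and the data are made up for this example.

```python
def ols(columns, y):
    """OLS coefficients [intercept, b1, ...] from the normal equations (X'X)b = X'y."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Made-up regressors; x2 tends to rise with x1.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]  # true coefficients: 1, 2, 3

short = ols([x1], y)      # x2 left out: x1's coefficient absorbs part of x2's effect
long = ols([x1, x2], y)   # x2 included: x1's coefficient holds x2 constant
print(short)
print(long)
```

In the short regression the coefficient on X1 is about 5 rather than 2, because X1 is also standing in for the omitted X2; including X2 restores the coefficient of 2.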
3 reasons for Econometrics
1) Describing economic reality
- econometrics can quantify and measure marginal effects and estimate numbers for theoretical equations
- ex: consumer demand, the relationship between quantity demanded (Q) and price (P), the price of a substitute, and disposable income
2) Testing hypotheses about economic theory and policy
- much of economics involves building theoretical models and testing them against evidence
3) Forecasting future economic activity
- the most difficult use of econometrics is to forecast or predict the future using past data
- economists use econometrics to forecast a variety of variables (GDP, sales, inflation, etc.)
- the accuracy of forecasts depends in large measure on the degree to which the past is a good guide to the future
7 classical assumptions
1) linear in parameters
2) errors have 0 mean
3) errors uncorrelated with regressors
4) no perfect multicollinearity
5) no heteroskedasticity
6) no serial correlation
7) errors are normally distributed
4 sources of variation:
1) omitted or left-out variables
2) measurement error in the data
3) an underlying theoretical equation that has a different functional form (or shape) than the one chosen for the regression
4) purely random and unpredictable behavior
define Central Limit Theorem
CLT: "if Z is a standardized sum of N independent, identically distributed random variables with a finite, nonzero standard deviation, then the probability distribution of Z approaches the normal distribution as N increases"
- even if the population doesn't have a normal distribution, the sampling distribution of the mean will approach a normal distribution as the sample size increases (intuition behind CLT)
Strictly speaking, what is the best interpretation of a 95% confidence interval for the mean?
If repeated samples were taken and a 95% interval was computed for each sample, 95% of the intervals would contain the population mean.
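The repeated-sampling interpretation can be checked directly by simulation. The sketch below assumes a normal population with a known standard deviation, so the interval is mean ± 1.96 * σ / sqrt(N); the population values, seed, and names are all made up for this example.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA = 10.0, 2.0   # assumed population mean and (known) standard deviation
N, REPS = 25, 4000      # sample size and number of repeated samples
Z_975 = 1.96            # standard normal value leaving 2.5% in each tail

def interval_covers_mu():
    """Draw one sample, build its 95% interval, report whether it contains MU."""
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    mean = sum(sample) / N
    half_width = Z_975 * SIGMA / math.sqrt(N)
    return mean - half_width <= MU <= mean + half_width

coverage = sum(interval_covers_mu() for _ in range(REPS)) / REPS
print(coverage)  # share of intervals that contain the population mean
```

Each individual interval either contains the population mean or it does not; the 95% describes the procedure across repeated samples, not any single interval.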
Z-score
Measures how many standard deviations X lies above or below its mean: Z = (X - μ) / σ
- The Z-score table shows the share of values to the left of a given z-score on a standard normal distribution
- To find the share of values to the right of a given z-score, take 1 minus the value found in the table
what is mean and standard deviation of a standardized random variable
No matter what the initial units of X, the standardized random variable Z has a mean of 0 and standard deviation of 1
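A short sketch of standardization and of the table lookup, using made-up raw scores. `NormalDist` from Python's standard `statistics` module plays the role of the printed Z-table here; the scores and variable names are chosen for this example.

```python
import statistics
from statistics import NormalDist

x = [52, 61, 70, 75, 88, 94]  # made-up raw scores, in any units

mean_x = statistics.mean(x)
sd_x = statistics.stdev(x)

# Standardize: subtract the mean, divide by the standard deviation.
z = [(xi - mean_x) / sd_x for xi in x]

z_mean = statistics.mean(z)   # approximately 0
z_sd = statistics.stdev(z)    # approximately 1

# Table lookup: area to the left of z = 1.0, and the area to the right.
left_area = NormalDist().cdf(1.0)
right_area = 1 - left_area
print(z_mean, z_sd, left_area, right_area)
```

Changing the units of `x` (for example multiplying every score by 100) leaves the z-scores unchanged.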
Omitted condition
- the event not represented by a dummy variable
- the omitted condition forms the baseline against which the included condition(s) are compared
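With a single dummy variable and an intercept, the baseline role of the omitted condition shows up directly in the estimates. In the sketch below, the hypothetical dummy `college` equals 1 for workers with a degree and 0 otherwise, so "no degree" is the omitted condition. The wage numbers are made up.

```python
def ols_simple(x, y):
    """OLS (intercept, slope) for a single regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return y_bar - slope * x_bar, slope

# Made-up hourly wages; college = 0 is the omitted (baseline) condition.
wage = [10, 12, 14, 15, 17, 19]
college = [0, 0, 0, 1, 1, 1]

intercept, coef_college = ols_simple(college, wage)

baseline_mean = sum(w for w, c in zip(wage, college) if c == 0) / 3
degree_mean = sum(w for w, c in zip(wage, college) if c == 1) / 3
print(intercept, coef_college)
```

The intercept equals the mean wage of the omitted group, and the dummy's coefficient equals the difference between the included group's mean and that baseline.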
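how is it possible for adjusted R^2 to decrease while R^2 increases?
Adjusted R^2 is adjusted for degrees of freedom, whereas R^2 is not. Adding a variable uses up a degree of freedom, so adjusted R^2 falls if the improvement in fit is too small to offset that loss.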
Type I error
Type I: reject a true null hypothesis. Ex: H0: β ≤ 0, HA: β > 0. Assume the true β is not positive, but our estimate leads us to reject the null hypothesis.
Type II error
Type II: do not reject a false null hypothesis. Assume the true β is positive, but our estimate leads us to not reject (fail to reject) the null hypothesis.
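Both error rates can be estimated by simulation. To keep the sketch short it swaps the regression coefficient for a simpler setting: a one-sided z-test of H0: μ ≤ 0 against HA: μ > 0 with a known standard deviation. The sample size, the alternative value 0.5, the seed, and all names are assumptions made for this example.

```python
import math
import random

random.seed(2)  # fixed seed so the run is repeatable

N, SIGMA, REPS = 25, 1.0, 4000
Z_CRIT = 1.645  # one-sided 5% critical value from the standard normal table

def rejects_null(true_mu):
    """Draw one sample with the given true mean and test H0: mu <= 0."""
    sample_mean = sum(random.gauss(true_mu, SIGMA) for _ in range(N)) / N
    z = sample_mean / (SIGMA / math.sqrt(N))
    return z > Z_CRIT

# Type I: the null is true (mu = 0) but we reject it.
type_1_rate = sum(rejects_null(0.0) for _ in range(REPS)) / REPS

# Type II: the null is false (mu = 0.5) but we fail to reject it.
type_2_rate = sum(not rejects_null(0.5) for _ in range(REPS)) / REPS

print(type_1_rate, type_2_rate)
```

The Type I rate sits near the chosen 5% significance level, while the Type II rate depends on how far the true value is from the null and on the sample size.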
Adjusted R^2 (R-bar squared)
- Adjusted R^2 will increase, decrease, or stay the same when a variable is added to an equation, depending on whether the improvement in fit outweighs the loss of degrees of freedom
- Adjusted R^2 can be used to compare the fits of equations with the same dependent variable
- Adjusted R^2 cannot be used to compare the fits of equations with different dependent variables, or with the dependent variable measured differently (for example, Y versus ln Y)
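The trade-off between fit and degrees of freedom can be worked through with the standard formula, adjusted R^2 = 1 - (1 - R^2)(N - 1) / (N - K - 1), where N is the number of observations and K the number of independent variables. The R^2 values below are made up to show a case where R^2 rises slightly but adjusted R^2 falls.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical fits on the same 20 observations and the same dependent variable.
before = adjusted_r_squared(0.500, n=20, k=2)  # two regressors
after = adjusted_r_squared(0.505, n=20, k=3)   # a third regressor adds very little fit

print(before, after)  # adjusted R^2 falls even though R^2 rose from 0.500 to 0.505
```

Here the gain in fit (0.005) is too small to offset the lost degree of freedom, so the adjusted measure penalizes the extra variable.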
stochastic error term
a term added to a regression equation to account for variation in Y that cannot be explained by the included X(s)
One issue with R^2: adding another independent variable to an equation can never decrease R^2
Adding a variable does not change TSS. Adding another variable will, in most cases, decrease RSS and therefore increase R^2.