Econometrics 1

heteroskedasticity

The situation in which the variance of the error term, given the explanatory variables, is not constant: at different values of x the error has a different standard deviation, so the data take different shapes on the graph (at every level of education, for example, the spread of the distribution changes). This destroys the integrity of the model by invalidating the calculations used to compute t-scores. The OLS estimator is still unbiased, but the standard errors are incorrect, so robust standard errors are needed for hypothesis tests.

ceteris paribus

A Latin phrase that means "all other things held constant."

coefficient of variation

A measure of dispersion calculated by dividing a distribution's standard deviation by its mean: (SD/mean) x 100%. So what? It compares variability in two groups of scores whose means are known to be different (a good candidate for the qualifying exam). It allows us to put the SD in the yardstick of the expected value: if we ask whether 20 is large or small, the CV answers in terms of the mean and SD, like using a yardstick to measure a room, expressing the SD in the units of the mean.
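A minimal sketch, with hypothetical data, of how the CV lets you compare dispersion across two groups whose means differ:

```python
import numpy as np

# Hypothetical scores for two groups with very different means.
group_a = np.array([82.0, 75.0, 90.0, 68.0, 85.0])
group_b = np.array([12.0, 15.0, 9.0, 14.0, 10.0])

def coefficient_of_variation(x):
    """CV = (standard deviation / mean) * 100%."""
    return x.std(ddof=1) / x.mean() * 100

print(coefficient_of_variation(group_a))  # dispersion relative to group A's mean
print(coefficient_of_variation(group_b))  # comparable even though the means differ
```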

variance

A measure of variability based on the squared deviations of the data values about the mean: the average squared difference.

dummy variable

A qualitative variable that takes on the value zero or one. Dummy variables only have a slope when they interact (fig. 7.1). The main advantage of running one regression with a dummy instead of several separate regressions is the gain in degrees of freedom: as degrees of freedom increase, statistical power increases, which is another reason we prefer models with fewer variables. As the number of variables increases, R-squared can increase or remain the same, which is why we use adjusted R-squared. The constant (_cons) is the intercept: with a binary variable such as female (= 1 for women), Beta 0 is the expected value when female = 0, so the intercept represents males. With dummies we now care about the significance of the intercept. It is incorrect to say something is "more significant" than something else.

Adjusted R-Squared

A goodness-of-fit measure in multiple regression analysis that penalizes additional explanatory variables by using a degrees-of-freedom adjustment in estimating the error variance: adjusted R-squared = 1 - [SSR/(n - k - 1)] / [SST/(n - 1)].
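A minimal sketch of that formula, with hypothetical sums of squares and sample sizes:

```python
def adjusted_r_squared(ssr, sst, n, k):
    """Adjusted R^2 = 1 - [SSR/(n-k-1)] / [SST/(n-1)].

    n = number of observations, k = number of explanatory variables.
    """
    return 1 - (ssr / (n - k - 1)) / (sst / (n - 1))

# Hypothetical values: SSR = 40, SST = 100, 50 observations, 3 regressors.
print(adjusted_r_squared(40, 100, 50, 3))  # about 0.574; plain R^2 would be 0.60
```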

Asymptotic properties

Properties of estimators and test statistics that apply when the sample size grows without bound.

Confidence intervals

Provide a range of likely values for the population parameter. Confidence intervals come into play when we use interval estimation, which involves the following steps: 1) select a level of confidence (e.g., 95%); note how this differs from the 1% or 5% significance levels we use with p-values and point estimates, the form of estimation we have mostly used in class; 2) analyze the sample data; 3) extract a number from a statistical table; 4) build an interval that surrounds the sample statistic.
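A minimal sketch of those four steps for a 95% interval around a sample mean, with hypothetical data:

```python
import numpy as np
from scipy import stats

# Step 2: analyze the sample data (hypothetical sample).
sample = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 5.0, 4.7])
mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))  # standard error of the mean

# Step 3: extract a number from a statistical table (t critical value, 95%).
t_crit = stats.t.ppf(0.975, df=len(sample) - 1)

# Step 4: build an interval that surrounds the sample statistic.
print((mean - t_crit * se, mean + t_crit * se))
```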

Galton 1886

Regression Towards Mediocrity in Hereditary Stature.

Retrospective Data

Data collected in the present based on recollections of past events; apt to be inaccurate because of faulty memory, bias, mood, and situation.

F statistic

The ANOVA test statistic, equal to the ratio of two independent estimates of the common population variance, s2A / s2W, which is also the ratio of explained variance to unexplained variance: the F-statistic is the ratio of the mean value of SSM to the mean value of SSR.
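A minimal sketch of that ratio, with hypothetical sums of squares (k is the number of explanatory variables):

```python
def f_statistic(ssm, ssr, n, k):
    """F = mean square model / mean square residual
         = [SSM / k] / [SSR / (n - k - 1)]."""
    return (ssm / k) / (ssr / (n - k - 1))

# Hypothetical regression with SSM = 60, SSR = 40, n = 50, k = 3.
print(f_statistic(60, 40, 50, 3))  # compare to the F(k, n-k-1) distribution
```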

interaction term

One variable interacts with another to impact the dependent variable: the relationship between Y and X1 depends on X2.

Efficiency

If an estimator has a smaller variance, it is more efficient; the estimator with the smallest variance is the most efficient. Think of a target: an unbiased estimator hits the target on average but may not come close on any one shot. You can correct for bias, but you cannot correct for inefficiency (as with qualitative studies that do not measure error). Given estimators that are consistent and efficient, consistent and unbiased, or consistent but not efficient, which do we use? Most likely the efficient one, although it may be biased.

x

independent variable, the explanatory variable, the control variable, the predictor variable, or the regressor. (The term covariate is also used for x.)

A statistic

A numerical value calculated from a sample that is variable and known (Naghshpour, Shahdad (2012). Statistics for Economics, Kindle Location 4792, Business Expert Press). A statistic is a number that is computed from the data in a sample.

unbiased

Achieving the correct result on average: an estimator is said to be unbiased if its expected value equals the population characteristic (the expected sample mean equals the population mean). If what you are getting is, on average, the same as what you are hoping for, it is unbiased. There is no such thing as consistency for a statistic; only an estimator can be consistent.

estimator

A rule that can be applied to any sample of data to produce an estimate (p. 102); a statistic based on sample observations that is used to estimate the numerical value of an unknown population parameter.

Shortcoming of qualitative analysis (Dr. N)

Qualitative studies lack validity because they cannot calculate a standard deviation: you can't prove or disprove anything. With statistics, you can calculate a margin of error, i.e., the probability of a Type I error.

hypothesis test

• Start with a null and an alternative hypothesis based on a theory. • Calculate a test statistic (it will either be given to us or come from the computer). • Convert the test statistic into a p value. • The last step is inference: if the p value is small enough, reject the null; if not small enough, fail to reject. • A test is one-tailed if it is directional. • A two-tailed test is half as powerful; it looks at two end points of the normal distribution and is conceptually the same as a confidence interval.
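A minimal sketch of the "convert the statistic into a p value, then infer" steps, assuming a hypothetical t statistic has already been computed:

```python
from scipy import stats

# Hypothetical: a computed t statistic of 2.3 with 48 degrees of freedom.
t_stat, df = 2.3, 48

# Convert the test statistic into a two-tailed p value.
p_value = 2 * stats.t.sf(abs(t_stat), df)

# Inference: reject the null if the p value is small enough.
print(p_value, "reject" if p_value <= 0.05 else "fail to reject")
```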

p value

Probability value; for your results to be significant, it should be equal to or less than .05 at the 5% level (at the 10% level, less than .1). The p value is an area under the curve, and the total area under the curve sums to 1.

R-squared

The ratio SSM / SST. A statistic, sometimes called the coefficient of determination, defined as the explained sum of squares divided by the total sum of squares. It measures the strength of the relationship between a dependent variable and one or more independent variables.

Chi-square

Represents the distribution function of a variance (Naghshpour, Shahdad (2012). Statistics for Economics, Kindle Locations 4642-4643, Business Expert Press).

F statistic

The ANOVA test statistic, equal to the ratio of two independent estimates of the common population variance, s2A / s2W, which is also the ratio of explained variance to unexplained variance. • The null hypothesis for an F statistic is always that the model is not a good fit.

odds ratio

The chances of one outcome relative to another; a coefficient gives us this information.

marginal economic approach

the last unit you produce determines the price

Elasticity

The percent change in one variable given a 1% ceteris paribus increase in another variable; a measure of how much one economic variable responds to changes in another economic variable.

consistent

An estimator is consistent if its variance gets smaller as the sample size gets larger.

goodness of fit

To see whether the null hypothesis can be rejected, look at the p value of the F statistic to assess the validity of the model; R-squared is the measure of goodness of fit.

prediction

Using previous data to make a statement about values outside the data, e.g., predicting a future temperature from our sample. (The book sometimes uses this interchangeably with estimation, which is not correct.)

inference

Using statistics to draw conclusions about parameters (Dr. N). The book's definition: in inference, you take what you observe and use it to draw conclusions about parameters, which are never observed; you deduce a conclusion from observed facts and, based on those facts, decide the probability that something can or cannot happen.

estimation

Any kind of computation within our data (contrast with prediction): an estimate stays between the data points we have, e.g., the average temperature in our sample.

simple linear regression model

y = b0 + b1x + u
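A minimal sketch, with simulated data, of fitting this model by the usual OLS moment formulas (slope = sample covariance of x and y divided by sample variance of x):

```python
import numpy as np

# Simulated data from y = b0 + b1*x + u with true b0 = 1, b1 = 2.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(size=100)

# OLS estimates from the moment formulas.
b1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0_hat = y.mean() - b1_hat * x.mean()
print(b0_hat, b1_hat)  # should be close to 1 and 2
```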

Z- score

A variable is standardized in the sample by subtracting its mean and dividing by its standard deviation (p. 189 PDF): z = (observed - expected) / SD. When variables are standardized, the resulting slopes are called standardized coefficients or beta coefficients (beta-j hat). There are two options: standardize only the x's, or standardize both the dependent and independent variables (beta coefficients).

multicollinearity

a case of multiple regression in which the predictor variables are themselves highly correlated

causal effect

a ceteris paribus change in one variable has an effect on another variable.

parameter

A characteristic of a population that is constant and generally unknown. Why is there no population variance of a parameter? It is a constant number, and constant numbers do not have variance.

dichotomous variable

a discrete variable that has only two possible amounts or categories

z score

A measure of how many standard deviations you are away from the norm (average or mean); a variable is standardized in the sample by subtracting its mean and dividing by its standard deviation (p. 189 PDF). The resulting slopes are called standardized coefficients or beta coefficients.

t-statistic

The coefficient divided by its standard error. Used to test hypotheses about a population when the value of the population variance (and standard deviation) is unknown; uses the same formula as the z-statistic except that the estimated standard error is substituted for the standard error in the denominator.
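A minimal sketch with hypothetical regression output:

```python
# Hypothetical output: a slope estimate and its standard error.
coef, se = 0.54, 0.21

t_stat = coef / se  # t = coefficient / standard error
print(t_stat)       # compare to the t critical value (roughly 2 at the 5% level)
```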

Chow test

A procedure to see whether one model is better than another, built on a restricted/unrestricted comparison (see the F formula 2.47, p. 150). When two models are not nested, you augment by creating a bigger model that becomes the unrestricted model. Use the Chow test when you think your sample is not homogeneous, i.e., you have two categories that should not be lumped together when estimating the regression model. (Dr. N has written articles using the Chow test.)
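A minimal sketch of the standard Chow F statistic, with hypothetical sums of squared residuals; here the pooled regression is the restricted model and the two subsample regressions together form the unrestricted model:

```python
def chow_f(ssr_pooled, ssr_1, ssr_2, n, k):
    """Chow F = [(SSR_p - (SSR_1 + SSR_2)) / (k+1)] / [(SSR_1 + SSR_2) / (n - 2(k+1))].

    k = number of slope parameters; each subsample estimates k + 1 parameters."""
    ssr_u = ssr_1 + ssr_2
    q = k + 1                 # number of restrictions
    df = n - 2 * (k + 1)      # unrestricted degrees of freedom
    return ((ssr_pooled - ssr_u) / q) / (ssr_u / df)

# Hypothetical SSRs from a pooled regression and two subsample regressions.
print(chow_f(ssr_pooled=120.0, ssr_1=50.0, ssr_2=55.0, n=200, k=3))
```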

linear

Linear in parameters: the model can be written as y = b0 + b1x1 + ... + bkxk + u (see Gauss-Markov assumption MLR.1 below).

Gauss-Markov assumptions

1) Linear, 2) Random, 3) NPC (no perfect collinearity), 4) Zero conditional mean (error = 0), 5) Homoskedasticity. A set of assumptions under which OLS is BLUE.
Assumption MLR.1 (Linear in Parameters): the model in the population can be written as y = b0 + b1x1 + b2x2 + ... + bkxk + u, where b0, b1, ..., bk are the unknown parameters (constants) of interest and u is an unobserved random error or disturbance term.
Assumption MLR.2 (Random Sampling): we have a random sample of n observations, {(xi1, xi2, ..., xik, yi): i = 1, 2, ..., n}, following the population model in Assumption MLR.1.
Assumption MLR.3 (No Perfect Collinearity): in the sample (and therefore in the population), none of the independent variables is constant, and there are no exact linear relationships among the independent variables.
Assumption MLR.4 (Zero Conditional Mean): the error u has an expected value of zero given any values of the independent variables; in other words, E(u|x1, x2, ..., xk) = 0.
Assumption MLR.5 (Homoskedasticity): the error u has the same variance given any value of the explanatory variables; in other words, Var(u|x1, ..., xk) = sigma².

Regression - 3 properties

1. Unbiasedness: an estimator is said to be unbiased if, and only if, its expected value is equal to the parameter. Regression is popular because we can theoretically prove that the OLS estimator is an unbiased estimator of the population slope; if the assumptions of regression are met, the estimate is unbiased.
2. Consistency: an estimator is said to be consistent if its variance gets smaller as the sample size gets larger.
3. Efficiency: an estimator is said to be efficient if it has a smaller variance than any other estimator. OLS refers to minimization of the squared errors (the distances between the observations and the regression line), and so meets the definition of efficiency.

Type I error

The error of rejecting the null hypothesis when in fact it is true (also called a "false positive"): you think you found a cause-and-effect relationship, but one is not there.

standard deviation

A computed measure of how much scores vary around the mean score; the square root of the variance.

correlation coefficient

A statistical measure of the extent to which two factors vary together, and thus of how well either factor predicts the other. It ranges from -1 to +1: when it is close to 1 in absolute value it is high; when close to 0 it is low. There are procedures to test whether a correlation is significant or not, but significance still does not mean causality, or even that the variables are related. Association is a better word to use than correlation (a favorite word of IDV students); correlation is a specifically defined relationship, so in scientific writing use association instead of using correlation casually.

Time series data

A time series data set consists of observations on a variable or several variables over time.

Kahn & Roseman

Advertising as an Engineering Science

Type II error

An error that occurs when a researcher concludes that the independent variable had no effect on the dependent variable, when in truth it did; a false negative.

population regression function (PRF)

E(y|x) = b0 + b1x

Sum of squares model

SSM uses the differences between the mean value of Y and the regression line

Sum of Squares - Residuals

SSR uses the differences between the observed data and the regression line

SST

SST uses the differences between the observed data and the mean value of Y
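A minimal sketch, with hypothetical data, tying the three sums of squares together; note that SST = SSM + SSR and that SSM/SST is R-squared:

```python
import numpy as np

# Hypothetical data and a fitted simple regression.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ssm = ((y_hat - y.mean()) ** 2).sum()  # regression line vs. mean of Y
ssr = ((y - y_hat) ** 2).sum()         # observed data vs. regression line
sst = ((y - y.mean()) ** 2).sum()      # observed data vs. mean of Y

print(np.isclose(sst, ssm + ssr))      # SST = SSM + SSR
print(ssm / sst)                       # R-squared
```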

standard error of the regression

The SER is an estimator of the standard deviation of the error term. This estimate is usually reported by regression packages, although it is called different things by different packages; in addition to SER, sigma-hat is also called the standard error of the estimate and the root mean squared error (p. 100 of the econometrics PDF).

zero conditional mean

The error u has an expected value of zero given any value of the explanatory variable; in other words, E(u|x) = 0.

Homoskedasticity

The errors in a regression model have constant variance conditional on the explanatory variables: Var(u|x) = sigma². (Related notes: SSM is the sum of squares for the model and SSR the sum of squares for the residuals; SSM/SST is the coefficient of determination, which has to be less than one, since we cannot explain more than the total variation, so the sum of squares for the model cannot exceed the sum of squares total. Don't take numbers at face value; make sure they are statistically significant. How do you determine that? Use a t-test: take the coefficient and divide by its standard error.)

R squared

The percentage of the total variation in the response variable, Y, that is explained by the regression equation: explained variance / total variance (of the total model; in simple regression, one variable). In Stata, the model sum of squares divided by the total sum of squares equals R²; it tells us how much of the variation the model explains.

Gauss-Markov Theorem

Under Assumptions MLR.1 through MLR.5, b0-hat, b1-hat, ..., bk-hat are the best linear unbiased estimators (BLUEs) of b0, b1, ..., bk, respectively (Theorem 3.4, p. 102 of the PDF). The theorem states that under the five Gauss-Markov assumptions (for cross-sectional or time-series models), the OLS estimator is BLUE (best linear unbiased estimator), conditional on the sample values of the explanatory variables. Given classical assumptions I through VI, the OLS estimator of βᵢ is the minimum-variance estimator from among the set of all linear unbiased estimators of βᵢ, for i = 0, 1, 2, ..., k.

sampling variance of the OLS slope estimator

Var(β̂j) = σ² / [SSTj(1 - R²j)] (p. 94 of the PDF). Compare the variance of a sample mean (ȳj) used to estimate an unknown population mean (β0j), which is given by σ²/nj.

sampling variances of the OLS slope estimators

Var(β̂j) = σ² / [SSTj(1 - R²j)], where SSTj is the total sample variation in xj and R²j is the R-squared from regressing xj on the other independent variables.
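A minimal sketch of this formula with hypothetical inputs; note how a higher R²j (more collinearity among the regressors) inflates the slope estimator's variance:

```python
import numpy as np

def slope_sampling_variance(sigma2, xj, r2_j):
    """Var(b_j hat) = sigma^2 / [SST_j * (1 - R^2_j)].

    sigma2: error variance; xj: the j-th regressor's sample values;
    r2_j: R-squared from regressing x_j on the other regressors."""
    sst_j = ((xj - xj.mean()) ** 2).sum()
    return sigma2 / (sst_j * (1 - r2_j))

# Hypothetical inputs: same regressor, low vs. high collinearity.
xj = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(slope_sampling_variance(2.0, xj, r2_j=0.1))
print(slope_sampling_variance(2.0, xj, r2_j=0.9))
```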

R squared

Wooldridge: in a multiple regression model, the proportion of the total sample variation in the dependent variable that is explained by the independent variables. In simple regression, it is the variation in Y explained by one independent variable.

Endogenous Explanatory Variable

an explanatory variable in a multiple regression model that is correlated with the error term, either because of an omitted variable, measurement error, or simultaneity.

exogenous explanatory variable

an explanatory variable that is uncorrelated with the error term.

variance

The average of the squared differences from the mean; a measure of spread within a distribution (the square of the standard deviation).

Cleary & Sharpe randomness in the stock market

Financial specialists did not do better than random changes in the stock market in the short term; changes in the long term occurred with changes in the economy. Changes in the market are inherently random and hard to predict.

y

dependent variable, the explained variable, the response variable, the predicted variable, or the regressand

p value

Do not use the p value to judge "how significant" a variable is; either it is significant or it isn't, based on the pre-determined value. Do not use it to compare the relative strength of variables.

u

error term or disturbance in the relationship, represents factors other than x that affect y. A simple regression analysis effectively treats all factors affecting y other than x as being unobserved. You can usefully think of u as standing for "unobserved."

panel data (or longitudinal data)

set consists of a time series for each cross-sectional member in the data set. As an example, suppose we have wage, education, and employment history for a set of individuals followed over a ten-year period. Or we might collect information, such as investment and financial data, about the same set of firms over a five-year time period. Panel data can also be collected on geographical units. For example, we can collect data for the same set of counties in the United States on immigration flows, tax rates, wage rates, government expenditures, and so on, for the years 1980, 1985, and 1990. The key feature of panel data that distinguishes them from a pooled cross section is that the same cross-sectional units (individuals, firms, or counties in the preceding examples) are followed over a given time period.

beta coefficient

A standardized variable; all variables are standardized in order to compare different slopes. Standardized coefficients: regression coefficients that measure the standard-deviation change in the dependent variable given a one-standard-deviation increase in an independent variable. Standardized random variable: a random variable transformed by subtracting off its expected value and dividing the result by its standard deviation; the new random variable has mean zero and standard deviation one. To standardize, use the z-score: (observed - expected) divided by the standard deviation. This gives the ability to compare the slopes of unlike items, such as test scores and salary.
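A minimal sketch, with simulated data in unlike units (test scores and salary), of obtaining a beta coefficient by standardizing both variables before computing the slope:

```python
import numpy as np

def standardize(v):
    """Z-score: subtract the mean, divide by the standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)

# Simulated data: test scores and salaries (hypothetical relationship).
rng = np.random.default_rng(1)
scores = rng.normal(500, 100, size=200)
salary = 20_000 + 40 * scores + rng.normal(0, 5_000, size=200)

# Slope of standardized y on standardized x = the beta coefficient
# (in simple regression it equals the correlation coefficient).
zx, zy = standardize(scores), standardize(salary)
beta = np.cov(zx, zy, ddof=1)[0, 1] / np.var(zx, ddof=1)
print(beta)
```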

central limit theorem

States that in repeated random samples from a population, the sample mean will have a distribution function approximated by the normal distribution; the expected value of the sample mean is equal to the true value of the population mean, and the variance of the sample mean is equal to the population variance divided by the sample size (Naghshpour, Shahdad (2012). Statistics for Economics, Kindle Locations 4637-4640, Business Expert Press). As the sample size increases, the distribution of the sample mean of a randomly selected sample approaches the normal distribution.
• A theorem that states that the sampling distribution curve (for sample sizes of 30 and over) will be centered on the population parameter value and will have all the properties of a normal distribution.
• If we sample randomly and repeatedly, the sample statistic (mean, median, etc.) will be an unbiased estimate of the population parameter, with a standard deviation that declines as the sample size grows.
• Inference about the sample mean: it will be unbiased (on average the sample mean equals the population mean), and its variance will be smaller than the variance of the population.
• Why is there no population variance? A parameter is a constant number, and constant numbers do not have variance.
• The variance of the sample mean equals the variance of the population divided by the sample size; that is why, as the sample size increases, the sample variance gets smaller and smaller.
• What does it mean if the sample variance approaches 0? The sample mean equals the population mean (it becomes the population mean when the probability of error is 0); the only way to reach zero is an infinite sample size.
• As the sample size gets larger, the estimate becomes more consistent.
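A minimal simulation sketch of the theorem, with a hypothetical (deliberately skewed) population: the mean of repeated sample means tracks the population mean, and their variance is roughly the population variance divided by n:

```python
import numpy as np

rng = np.random.default_rng(42)
pop = rng.exponential(scale=2.0, size=100_000)  # skewed, clearly non-normal

# Repeated random samples: the sample mean's distribution approaches normal,
# centered on the population mean, with variance ~ population variance / n.
n = 50
means = np.array([rng.choice(pop, size=n).mean() for _ in range(2_000)])
print(pop.mean(), means.mean())          # unbiased on average
print(pop.var() / n, means.var(ddof=1))  # variance shrinks with sample size
```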

normal distribution

• Mean = 0, SD = 1 (the standard normal).
• Things are more likely to happen around the mean.
• Look at the corner areas in the normal distribution: they give the likelihood that something falls in a particular area.
• If the SD is big, there is a smaller probability of covering the entire area.
• This underlies the confidence-interval approach: if the shaded area under the curve is 0.06, there is a 6% probability that the number falls in that area, so we can't make a conclusion.
• Take a random sample with mean weight 250; if the shaded area is 0.008, there is a 0.8% chance the value falls in that area, i.e., a 0.008 probability of being wrong.
• When being analytical, provide the probability of making a Type I error; this distinguishes analysis from superstition (see the central limit theorem).
• There is no SD for samples of 1 or 2 cases (a sample of size 2 has 1 degree of freedom); you must have a minimum of 3, and in practice at least 30 observations.

