ECON 425 Exam 1


Estimated Regression Equation

"True" but unknown theoretical regression equation (population): Yi = B0 + B1Xi + ei
Estimated regression equation (sample): Ŷi = B̂0 + B̂1Xi

Example of multivariate regression: Wagei = B0 + B1Expi + B2Educi + B3Genderi + ei

It specifies that a worker's wage is determined by experience, education, and gender. B1 gives the impact on wages of a one-year increase in experience, holding education and gender constant.

Example 1 of Multivariate Regression Model

Consumption of Eggst = B0 + B1Pricet + B2Incomet + et
- Consumption of Eggs = per capita consumption of eggs in year t (in dozens per person)
- Price = the price of a dozen eggs in year t
- Income = per capita income in year t (in thousands of dollars)
Estimated equation: Predicted Consumption of Eggst = 24 - 0.88Pricet + 11.9Incomet
- The estimated coefficient of Income, 11.9, tells us that per capita egg consumption rises by 11.9 dozen if income goes up by $1,000, holding constant the price of eggs.
- The estimated coefficient of Price, -0.88, tells us that a one-cent increase in the price of a dozen eggs lowers per capita consumption by 0.88 dozen, holding constant per capita income.
- The intercept term, 24, tells us how many dozen eggs a person would consume if both independent variables were zero.
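The coefficient interpretations can be checked by plugging numbers into the estimated equation. The sketch below is only an illustration: the function name and the price and income values are hypothetical, and price is assumed to be measured in cents, as the one-cent interpretation suggests.

```python
# Estimated coefficients copied from the egg-consumption equation.
B0_HAT = 24.0     # intercept (dozens per person)
B1_HAT = -0.88    # coefficient on Price
B2_HAT = 11.9     # coefficient on Income (thousands of dollars)

def predicted_eggs(price, income):
    """Fitted per capita egg consumption (dozens) for a given price and income."""
    return B0_HAT + B1_HAT * price + B2_HAT * income

# Hypothetical inputs chosen only for illustration.
base = predicted_eggs(price=10.0, income=2.0)
higher_income = predicted_eggs(price=10.0, income=3.0)   # income up by $1,000
higher_price = predicted_eggs(price=11.0, income=2.0)    # price up by one unit

print(round(higher_income - base, 2))  # 11.9: change from income alone
print(round(higher_price - base, 2))   # -0.88: change from price alone
```

Changing one input while leaving the other fixed is the "holding constant" idea: each difference equals the corresponding coefficient.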

data structures

- Cross-Sectional Data
- Time Series Data
- Pooled Cross Sections
- Panel or Longitudinal Data

random sampling

Cross-sectional data are usually treated as a random sample from some population.
We look at data because we want to learn something about the population.
A random sample is representative of the population of interest and gives us the best chance of learning about the population.

Cautionary Tale of R^2

Just because a model has a relatively high R^2 does not mean it is a good model.
For example, an estimated equation with a "good fit" but an implausible sign for an estimated coefficient might give implausible predictions and thus not be a useful equation:
- Take the Law of Demand, which states that as price rises, quantity demanded falls, and as price falls, quantity demanded rises. Imagine a demand model with a relatively high R^2 but a positive coefficient on price, implying that a higher price increases quantity demanded. This goes against years of economic theory and does not make much sense.
- A high R^2 can also come from adding independent variables that have very little to do with the dependent variable but happen to be highly correlated with it (e.g., using the number of drownings to help explain the sale of mozzarella cheese, or the sale of bananas to help explain Tesla's stock price).

Ceteris paribus

means "other things equal" or "everything else held constant". It allows economists to change one variable while holding everything else constant in order to measure the resulting change in the dependent variable.

Simple (Bivariate) Linear Regression

The "simple" or "bivariate" regression model is the case of two variables: Y = B0 + B1X
When we "regress" Y on X, we are trying to explain Y through X.
The constants B0 and B1 are the parameters of the econometric model. They describe the direction and strength of the relationship between the dependent variable and the independent variable(s).
- B0 is the intercept term: it indicates the value of Y when X equals zero
- B1 is the slope coefficient: it indicates the amount that Y changes when X increases by one unit
- Y: Dependent variable, Explained variable, Response variable, Predicted variable, Regressand
- X: Independent variable, Explanatory variable, Control variable, Predictor variable, Regressor

OLS Estimation of Multivariate Regression Models

The application of OLS to an equation with more than one independent variable is quite similar to its application to a simple linear regression model.
A multivariate regression model with two independent variables: Yi = B0 + B1X1i + B2X2i + ei
Reminder: the goal of OLS is to choose those B̂s that minimize the sum of squared residuals.
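For the two-independent-variable case, the minimizing B̂s have a closed form in terms of deviations from the sample means. The sketch below uses those standard formulas in plain Python; the function names and the data are hypothetical, and the data are generated without an error term so the answer is known in advance.

```python
def mean(values):
    return sum(values) / len(values)

def cross(a, b):
    """Sum of cross-products of deviations from the means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

def ols_two_regressors(y, x1, x2):
    """Return (b0, b1, b2) that minimize the sum of squared residuals."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2          # zero under perfect collinearity
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return b0, b1, b2

# Hypothetical data generated exactly from Y = 2 + 3*X1 - 1*X2 (no error term),
# so OLS should recover those three numbers.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [2 + 3 * a - 1 * b for a, b in zip(x1, x2)]

b0_hat, b1_hat, b2_hat = ols_two_regressors(y, x1, x2)
print(round(b0_hat, 6), round(b1_hat, 6), round(b2_hat, 6))
```

Note that the formula for each slope uses the co-movement of X1 and X2 (the s12 term), which is how the estimate for one variable accounts for the other.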

Assumption 5

The error term has a constant variance.
This means the spread of the error term does not change as the values of the independent variables change; the error term is homoskedastic.
We assume that the variance (or dispersion) of the distribution from which the observations of the error term are drawn is constant. That is, the variance of the error term cannot change across observations or ranges of observations.
If it does, heteroskedasticity is present in the error term.
When using regression models, researchers look for homoskedastic dispersion. Heteroskedastic dispersion is seen as a problem to be solved.
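A small simulation can make the difference between constant and changing error variance concrete. This is a sketch under made-up assumptions: the standard deviations, the sample size, the seed, and the function names are all hypothetical.

```python
import random
import statistics

random.seed(425)  # arbitrary seed so the run is repeatable

def draw_errors(xs, heteroskedastic):
    """Draw one error per observation; the spread grows with x only if requested."""
    errors = []
    for x in xs:
        sd = x if heteroskedastic else 5.0   # hypothetical standard deviations
        errors.append(random.gauss(0.0, sd))
    return errors

def variance_ratio(xs, errors, cutoff=5):
    """Variance of errors where x > cutoff divided by variance where x <= cutoff."""
    low = [e for x, e in zip(xs, errors) if x <= cutoff]
    high = [e for x, e in zip(xs, errors) if x > cutoff]
    return statistics.pvariance(high) / statistics.pvariance(low)

xs = [1 + (i % 10) for i in range(4000)]   # x cycles through 1, 2, ..., 10

homo_ratio = variance_ratio(xs, draw_errors(xs, heteroskedastic=False))
hetero_ratio = variance_ratio(xs, draw_errors(xs, heteroskedastic=True))

print("constant-variance errors:", round(homo_ratio, 2))   # should be near 1
print("variance rising with x: ", round(hetero_ratio, 2))  # should be well above 1
```

A ratio near one is what Assumption 5 describes; a ratio far from one is the pattern that signals heteroskedasticity.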

Assumption 7

The error term is normally distributed.
This implies that the distribution of the error term is bell-shaped.
Strictly speaking, it is not required for OLS estimation. Without it, though, the t-statistic and the F-statistic could not be relied on for hypothesis tests in small samples.
Its major application is in hypothesis testing and confidence intervals, which use the estimated regression coefficients to investigate hypotheses about economic behavior (Chapter 5).

Why Use OLS?

There are at least three important reasons for using OLS to estimate regression models:
1. OLS is relatively easy to use
- OLS is the simplest of all econometric estimation techniques
2. The goal of minimizing Σ ei^2 is quite appropriate from a theoretical point of view
- Residuals measure how close the estimated regression comes to the actual observed data: ei = Yi - Ŷi (i = 1, 2, 3, ..., N)
3. OLS estimates have a number of useful characteristics/properties
- The sum of the residuals is exactly zero (when the equation includes an intercept)
- OLS can be shown to be the "best" estimator possible under a set of specific assumptions
An estimator is a mathematical technique that is applied to a sample of data to produce a real-world numerical estimate of the true population regression coefficient. OLS is an estimator, and a B̂ produced by OLS is an estimate.

E(Y |X) = B0 + B1X

Unfortunately, the value of Y observed in reality is unlikely to be exactly equal to the deterministic value E(Y|X). As a result, the stochastic element (e) must be added to the equation: Y = E(Y|X) + e = B0 + B1X + e

A multivariate regression model with K independent variables:

Yi = B0 + B1X1i + B2X2i + ... + BK XKi + ei

Multivariate Regression Model

Yi = B0 + B1X1i + B2X2i + B3X3i + ei (i = 1, 2, ..., N)
The regression coefficient B1 in the equation above is the impact of a one-unit increase in X1 on the dependent variable Y, holding constant X2 and X3 (i.e., ceteris paribus).
Multivariate regression coefficients serve to isolate the impact on Y of a change in one variable from the impact on Y of changes in other variables. This is possible because multivariate regression takes the movements of X2 and X3 into account when it estimates the coefficient for X1.
Note: if a variable is not included in an equation, then its impact is not held constant in the estimation of the regression coefficients.

Econometrics

is the application of statistical methods to economic models

The Classical Assumptions

1. The regression model is linear, is correctly specified, and has an additive error term
2. The error term has a zero population mean
3. All independent variables are uncorrelated with the error term
4. Observations of the error term are uncorrelated with each other (no serial correlation)
5. The error term has a constant variance (no heteroskedasticity)
6. No independent variable is a perfect linear function of any other independent variable(s) (no perfect multicollinearity)
7. The error term is normally distributed (this assumption is optional but usually is invoked)

Panel or Longitudinal Data

A panel (or longitudinal) data set consists of a time series for each cross-sectional unit in the data: the same units are followed over multiple time periods.
The key feature of panel data that distinguishes them from a pooled cross section is that the same cross-sectional units (individuals, firms, countries, ...) are followed over a given time period.
ex: Annual obesity rate for WV, KY, VA, and PA from 2000 through 2022; average household income for each state in 1990, 1995, and 2000.

Simple vs. Multivariate Regression Model

A simple linear regression model has one independent variable.
A multiple (multivariate) linear regression model has two or more independent variables.
Almost all published papers use some form of multiple linear regression; it is very rare to see a simple linear regression model published.

Review Simple Linear Regression:
- Wagei = B0 + B1Educi + ei
- Crime Rate_s = B0 + B1(# of Police Officers)_s + e_s
- Quantity Demanded_x = B0 + B1Price_x + e_x

Deterministic: E(Wage|Educ) is the average wage given years of education. If college graduates (16 years of education) earn around $55,000 on average, then $55,000 is the expected value of college graduates' wages. The problem is that some college graduates will earn more than the average while others will earn less. Because not every college graduate earns exactly $55,000, we must add the stochastic element (e) to the equation to capture this variation.

The Sampling Distribution of B̂

Each different sample of data typically produces a different estimate of B.
The probability distribution of these B̂ values across different samples is called the sampling distribution of B̂.
These B̂s usually are assumed to be normally distributed because the normality of the error term implies that the OLS estimates of B are normally distributed as well.
Let's look at an example of a sampling distribution of B̂ by going back to our exam score and hours studied example: ExamScorei = B0 + B1HoursSpentStudyingi + ei
Properties of the Mean: An estimator B̂ is unbiased if its sampling distribution has as its expected value (mean) the true (population) value of B: E(B̂) = B
Properties of the Variance: We would also like the distribution to be as narrow (or precise) as possible. For a B̂ distribution with a small variance, the estimates are likely to be close to the mean of the sampling distribution.
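The idea of a sampling distribution can be illustrated by simulation: draw many samples from a known population equation, estimate the slope in each, and look at how the estimates are spread. The sketch below is only an illustration; the "true" parameters, the error standard deviation, the sample size, and the function names are all hypothetical.

```python
import random
import statistics

random.seed(425)  # arbitrary seed so the run is repeatable

TRUE_B0, TRUE_B1 = 50.0, 5.0                 # hypothetical population parameters
HOURS = [1 + (i % 10) for i in range(30)]    # hours studied, fixed across samples

def ols_slope(xs, ys):
    """OLS slope estimate for a single-independent-variable model."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def one_sample_estimate():
    """Draw a fresh sample of exam scores and return that sample's slope estimate."""
    scores = [TRUE_B0 + TRUE_B1 * h + random.gauss(0.0, 10.0) for h in HOURS]
    return ols_slope(HOURS, scores)

estimates = [one_sample_estimate() for _ in range(2000)]

print("mean of the estimates:", round(statistics.mean(estimates), 3))
print("std. dev. of the estimates:", round(statistics.stdev(estimates), 3))
```

Individual estimates differ from sample to sample, but their average should land close to the assumed true slope, which is what unbiasedness means. The standard deviation measures how narrow the sampling distribution is.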

Example 2 of Multivariate Regression Model

Fin Aidi = B0 + B1Parenti + B2HSRanki + ei
- Fin Aid = the financial aid (measured in dollars of grant per year) awarded to the ith applicant
- Parent = the amount (in dollars per year) that the parents of the ith student are expected to contribute to college expenses
- HSRank = the ith student's GPA rank in high school, measured as a percentage (ranging from a low of 0 to a high of 100)
Estimated equation: Predicted Fin Aidi = 8927 - 0.36Parenti + 87.4HSRanki
- The -0.36 means that the model implies that the ith student's financial aid grant will fall by $0.36 for every dollar increase in their parents' ability to pay, holding constant high school rank.
- The 87.4 means that the model implies that the ith student's financial aid will increase by $87.40 for each percentage point increase in high school rank, holding constant parents' ability to pay.
- The 8927 means that the model implies that the ith student's financial aid will be $8,927 if both independent variables are zero.

Regression analysis

is a statistical technique that attempts to "explain" movements in one variable, the dependent variable, as a function of movements in a set of other variables, called the independent variables

Assumption 6

No independent variable is a perfect linear function of any other independent variable.
Perfect collinearity between two independent variables implies that:
- they are really the same variable, or
- one is a multiple of the other, and/or
- a constant has been added to one of the variables
Examples:
- Controlling for weight in both pounds and kilograms
- Controlling for height in both inches and centimeters
- Controlling for gender with both a male = 1 dummy and a female = 1 dummy
- Controlling for total population along with male and female counts that sum to the population variable
This issue can be resolved by simply dropping one of the perfectly collinear variables from the equation.
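The pounds-and-kilograms example can be checked numerically. With two independent variables, the OLS slope formulas divide by S11*S22 - S12^2 (built from sums of squared deviations and cross-products); the sketch below shows that this quantity collapses to zero when one variable is an exact multiple of the other. The weights and the helper name are hypothetical.

```python
def cross(a, b):
    """Sum of cross-products of deviations from the means."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

# Hypothetical weights; the kilogram column is an exact multiple of the pound column.
weight_lb = [120.0, 150.0, 165.0, 180.0, 210.0]
weight_kg = [w * 0.45359237 for w in weight_lb]

s11 = cross(weight_lb, weight_lb)
s22 = cross(weight_kg, weight_kg)
s12 = cross(weight_lb, weight_kg)

correlation = s12 / (s11 * s22) ** 0.5
# OLS with two regressors divides by this quantity; it is (numerically) zero here.
determinant = s11 * s22 - s12 ** 2

print("correlation:", round(correlation, 6))
print("relative determinant:", abs(determinant) / (s11 * s22))
```

Because the divisor is zero up to rounding error, no unique pair of slope estimates exists, which is why one of the two variables has to be dropped.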

Assumption 4

Observations of the error term are uncorrelated with each other.
We are assuming COV(ej, es) = 0 for j ≠ s. If COV(ej, es) ≠ 0, the error is no longer "truly random".
This assumption is most likely to be violated in time-series models:
- An increase in the error term in one time period (a random shock, for example) is likely to be followed by an increase in the next period as well.
If, over all the observations of the sample, the error in period t+1 is correlated with the error in period t, then the error term is said to be serially correlated (or autocorrelated), and Assumption 4 is violated.
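Serial correlation can be seen by comparing each error with the one that follows it. The sketch below simulates errors in which part of each period's shock carries over to the next period, an assumed first-order pattern chosen only for illustration. The persistence value of 0.8, the seed, and the function names are hypothetical.

```python
import random

random.seed(425)  # arbitrary seed so the run is repeatable

def simulate_errors(n, rho):
    """Errors following e_t = rho * e_(t-1) + u_t, with u_t drawn independently."""
    errors = [random.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        errors.append(rho * errors[-1] + random.gauss(0.0, 1.0))
    return errors

def lag_one_correlation(errors):
    """Sample correlation between e_t and e_(t+1)."""
    current, following = errors[:-1], errors[1:]
    c_bar = sum(current) / len(current)
    f_bar = sum(following) / len(following)
    cov = sum((c - c_bar) * (f - f_bar) for c, f in zip(current, following))
    var_c = sum((c - c_bar) ** 2 for c in current)
    var_f = sum((f - f_bar) ** 2 for f in following)
    return cov / (var_c * var_f) ** 0.5

independent = lag_one_correlation(simulate_errors(5000, rho=0.0))
persistent = lag_one_correlation(simulate_errors(5000, rho=0.8))

print("independent errors:", round(independent, 3))  # should be near 0
print("persistent shocks: ", round(persistent, 3))   # should be near 0.8
```

A correlation near zero is consistent with Assumption 4; a clearly positive value is the carry-over pattern described for time-series models.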

Pooled Cross Sections

Some data sets have both a cross-sectional and a time series feature. A pooled cross section combines independent cross-sectional samples that have been collected at different points in time.
ex: Think about a survey of the incoming freshman class at WVU that is conducted every fall semester. Each year, a new random sample of incoming WVU students is taken using the same survey questions. We can increase our sample size by combining two or more survey years into a pooled cross section.

Simple Linear Regression Example: Weighti = B0 + B1Heighti + ei

The "i" subscript refers to a different person in the sample.
Another way to think about the subscripts is:
- Weight_justin = B0 + B1Height_justin + e_justin
- Weight_craig = B0 + B1Height_craig + e_craig
- Weight_rachel = B0 + B1Height_rachel + e_rachel

Ordinary Least Squares (OLS)

is a regression estimation technique that calculates the B̂s so as to minimize the sum of the squared residuals: Σ ei^2 = Σ (Yi - Ŷi)^2, summed over i = 1, ..., N

Estimated Regression Equation

The estimated regression equation will have actual numbers in it (the observed, real-world values of X and Y are used to calculate the coefficient estimates).
The estimated regression coefficients B̂0 and B̂1 are the empirical best guesses of the true regression coefficients and are obtained from a sample of the Ys and Xs.
Ŷi is the estimated/fitted value of Yi. It represents the value of Y calculated from the estimated regression equation. It is our prediction of E(Yi|Xi) from the regression equation.
The difference between the actual value of the dependent variable (Yi) and its estimated/fitted value (Ŷi) is defined as the residual (ei): ei = Yi - Ŷi
Essentially, the residual is the vertical distance between the observed Y and the estimated regression line (Ŷ).
The closer the Ŷs are to the Ys in the sample, the better the fit of the equation.

Y = B0 + B1X + e

The expression B0 + B1X is called the deterministic component of the regression because it indicates the value of Y that is determined by a given value of X.
This deterministic component can also be thought of as the expected value of Y given X: the mean value of the Ys associated with a particular value of X.

The Meaning of Multivariate Regression coefficients

The general multivariate regression model with K independent variables can be represented by the following: Yi = B0 + B1X1i + B2X2i + B3X3i + ... + BKXKi + ei
Where i, as before, goes from 1 to N and indicates the observation number. X1i indicates the ith observation of independent variable X1, while X2i indicates the ith observation of another independent variable, X2, and so on.
Multivariate regression coefficients indicate the change in the dependent variable associated with a one-unit increase in the independent variable in question, holding constant the other independent variables in the equation.
The coefficient B1 measures the impact on Y of a one-unit increase in X1, holding constant X2, X3, ..., but not holding constant any relevant variables that might have been omitted from the equation.

Estimating Single-Independent-Variable Models with OLS

The purpose of regression analysis is to take a purely theoretical equation like: Yi = B0 + B1Xi + ei
and use a set of data to create an estimated equation like: Ŷi = B̂0 + B̂1Xi
The purpose of the estimation technique is to obtain numerical values for the coefficients of an otherwise completely theoretical regression equation.
The most widely used method of obtaining these estimates is Ordinary Least Squares (OLS).
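To see how data turn the theoretical equation into an estimated one, the sketch below applies the standard single-variable OLS formulas (slope = sum of cross-deviations divided by sum of squared X deviations; intercept = mean of Y minus slope times mean of X). The sample values and function names are hypothetical.

```python
def ols_simple(xs, ys):
    """Return (b0_hat, b1_hat) for Y = B0 + B1*X + e using the OLS formulas."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1_hat = sxy / sxx
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat

def ssr(b0, b1, xs, ys):
    """Sum of squared residuals for a candidate line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical sample: hours studied and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 63, 71, 74]

b0_hat, b1_hat = ols_simple(hours, scores)
residuals = [y - (b0_hat + b1_hat * x) for x, y in zip(hours, scores)]

print(b0_hat, b1_hat)                             # 47.5 5.5
print(sum(residuals))                             # 0.0
print(ssr(b0_hat, b1_hat, hours, scores))         # 7.5
print(ssr(b0_hat, b1_hat + 0.5, hours, scores))   # 21.25, larger than 7.5
```

The estimated equation for this made-up sample is 47.5 + 5.5 * hours. The residuals sum to zero, and moving the slope away from the OLS value raises the sum of squared residuals.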

Describing the Overall Fit of the Estimated Model: Pg. 62 of the exam review

The simplest commonly used measure of fit is R^2.
R^2 is a statistical measure that represents the proportion of the variance of the dependent variable that is explained by the independent variable(s) in a regression model.
R^2 must lie in the interval 0 ≤ R^2 ≤ 1.
The higher R^2 is, the closer the estimated regression equation fits the sample data:
- A value of R^2 close to one shows an excellent overall fit
- A value near zero shows a failure of the estimated regression equation to explain the values of Yi better than could be explained by the sample mean Ȳ
If the R^2 of a model is 0.50, then approximately half of the observed variation can be explained by the model.
If the R^2 of a model is 0.95, most of the variation has been explained, but there still remains a portion of the variance that is essentially random or unexplained by the model.
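R^2 can be computed as one minus the ratio of the sum of squared residuals to the total sum of squares around the sample mean. The sketch below uses that standard formula; the exam scores, the fitted values, and the function name are hypothetical.

```python
def r_squared(actual, fitted):
    """R^2 = 1 - (sum of squared residuals) / (total sum of squares)."""
    y_bar = sum(actual) / len(actual)
    rss = sum((y - f) ** 2 for y, f in zip(actual, fitted))
    tss = sum((y - y_bar) ** 2 for y in actual)
    return 1 - rss / tss

# Hypothetical exam scores and two sets of fitted values.
scores = [52, 60, 63, 71, 74]
fitted_line = [53.0, 58.5, 64.0, 69.5, 75.0]   # from a line 47.5 + 5.5 * hours, hours = 1..5
fitted_mean = [64.0] * 5                        # predicting the sample mean every time

print(round(r_squared(scores, fitted_line), 3))  # 0.976
print(r_squared(scores, fitted_mean))            # 0.0
```

The second result shows the lower bound: an equation that does no better than the sample mean explains none of the variation.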

stochastic (random) error term

is a term that is added to a regression equation to introduce all of the variation in Y that cannot be explained by the included Xs: Y = B0 + B1X + e
Sometimes you see u or v in place of e.

cross-sectional data set

consists of a sample of individuals, households, firms, cities, states, or countries, taken at a given point in time: data on multiple units collected at a single point in time.
ex: Freshmen at WVU in Fall 2020, or real GDP for 50 different countries in the fourth quarter of 1993

time series data

consists of observations on a variable or several variables over time: data on one unit collected at multiple points in time.
Must be stored in chronological order (daily, weekly, monthly, quarterly, annually) because the ordering often conveys important information. The reason is that time series observations can rarely, if ever, be assumed to be independent across time.
ex: Stock prices, macro variables (money supply, CPI, GDP, unemployment rate), monthly automobile sales

empirical analysis

uses data to test a theory or to estimate a relationship

