ARE106 Study Guide

The term "BLUE" stands for "Best Linear Unbiased Estimator." MR estimators are BLUE if they meet the following assumptions:

1. Linearity: The relationship between the dependent variable and the independent variables is linear in the parameters.
2. Independence: The errors are independent of each other.
3. Homoscedasticity: The variance of the errors is constant across all values of the independent variables.
4. No perfect multicollinearity: None of the independent variables is a perfect linear combination of the others.
5. Exogeneity: The errors have mean zero and are uncorrelated with the independent variables.

Here are the steps involved in the "v method" (simple regression) for estimating an MR model with two independent variables:

1. Regress one of the independent variables (say, X1) on the other independent variable (X2) and the constant term: X1 = c0 + c1X2 + v. Save the residuals v, the part of X1 that is not linearly related to X2.
2. Regress the dependent variable (Y) on v in a simple regression. The slope from this regression equals the coefficient on X1 in the MR model Y = β0 + β1X1 + β2X2 + ε.
3. To get the coefficient on X2, repeat the two steps with the roles of X1 and X2 swapped.
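A minimal numerical sketch of the partialling-out idea, assuming the "v method" means regressing Y on the residual v from a regression of X1 on X2. The data and helper names are invented for illustration. It checks that the simple-regression slope of Y on v matches the MR coefficient on X1 from the standard closed-form formula for two regressors.

```python
def mean(values):
    return sum(values) / len(values)

def demean(values):
    m = mean(values)
    return [v - m for v in values]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def simple_slope(x, y):
    """Slope from a simple regression of y on x with a constant."""
    dx, dy = demean(x), demean(y)
    return dot(dx, dy) / dot(dx, dx)

# Hypothetical sample, invented for this example.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [5.1, 5.9, 12.2, 11.8, 19.0, 18.1]

# Step 1: regress X1 on X2 and a constant, keep the residuals v.
c1 = simple_slope(x2, x1)
c0 = mean(x1) - c1 * mean(x2)
v = [a - (c0 + c1 * b) for a, b in zip(x1, x2)]

# Step 2: a simple regression of Y on v gives the MR coefficient on X1.
b1_v_method = simple_slope(v, y)

# Check against the closed-form OLS formula for two regressors.
d1, d2, dy = demean(x1), demean(x2), demean(y)
s11, s22, s12 = dot(d1, d1), dot(d2, d2), dot(d1, d2)
b1_direct = (dot(d1, dy) * s22 - dot(d2, dy) * s12) / (s11 * s22 - s12 ** 2)

print(round(b1_v_method, 6), round(b1_direct, 6))
```

The two printed numbers agree, which is the point of the method: only the variation in X1 that X2 cannot explain identifies the coefficient on X1.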

To fix heteroscedasticity, two approaches can be taken.

Ex-post, White's heteroscedasticity-robust estimator can be used to compute the variance and standard errors of the coefficient estimates. The coefficient estimates themselves are unchanged; only the variance and standard errors are corrected.

To fix serial correlation, two approaches can be taken.

Alternatively, ex-post, the Newey-West estimator can be used to correct the standard errors for serial correlation. This estimator adds to the usual variance formula the weighted cross-products of residuals from nearby time periods, with weights that decline as the lag length grows. The coefficient estimates are unchanged; only the estimated variance and standard errors are corrected.

To test whether one functional form is better than another, various statistical tests can be used.

Another approach is to use a test involving more than one parameter, such as the F-test or Wald test. The F-test is used to test the overall significance of a group of parameters, while the Wald test is used to test the significance of a single parameter or a group of parameters.

To construct a confidence interval around a coefficient estimate, we can use the t-distribution. The confidence interval provides a range of values that the true population coefficient is likely to fall within, with a specified level of confidence.

CI = bi ± (critical value) × est. se(bi)
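A small sketch of the interval formula. The estimate and standard error are hypothetical; the critical value 2.228 is the two-sided 5% value of the t-distribution with 10 degrees of freedom.

```python
def confidence_interval(b, se, critical_value):
    """Return (lower, upper) = b -/+ critical_value * se."""
    margin = critical_value * se
    return b - margin, b + margin

# Hypothetical coefficient estimate and estimated standard error.
b1, se_b1 = 0.80, 0.25

# 95% interval with 10 degrees of freedom.
lower, upper = confidence_interval(b1, se_b1, 2.228)
print(round(lower, 3), round(upper, 3))
```

Because this interval does not contain zero, the same data would reject the null hypothesis that the coefficient is zero at the 5% level.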

In Chapter 11 of econometric theory and application, the difference between correlation and causation is emphasized.

Correlation refers to the association or relationship between two variables, whereas causation refers to the relationship where changes in one variable cause changes in another variable. It is important to distinguish between the two because correlation does not necessarily imply causation.

Endogeneity

Endogeneity refers to the violation of the exogeneity assumption. There are three types of endogeneity: omitted variable bias, simultaneity bias, and measurement error bias.

To fix serial correlation, two approaches can be taken.

Ex-ante, the model can be specified correctly, and an autoregressive specification of the model can be estimated to capture the serial correlation.

Interactions and indicator variables are used to capture nonlinearities and categorical variables in regression models.

Interactions occur when the effect of one independent variable on the dependent variable depends on the level of another independent variable. Indicator variables, also known as dummy variables, are used to capture categorical variables that take on discrete values.

Measurement error

Measurement error bias occurs when the variables in the regression model are measured with error, leading to bias in the estimated coefficients.

omitted variable bias

Omitted variable bias occurs when a relevant explanatory variable is left out of the regression model, leading to bias in the estimated coefficients of the other variables.
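A worked sketch of the bias, using made-up data in which Y is generated exactly from a model with two variables. When X2 is left out, the slope on X1 picks up the effect of X2 in proportion to how strongly X2 moves with X1. The variable names are illustrative assumptions.

```python
def demean(values):
    m = sum(values) / len(values)
    return [v - m for v in values]

def simple_slope(x, y):
    """Slope from a simple regression of y on x with a constant."""
    dx, dy = demean(x), demean(y)
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

# Hypothetical data; Y follows the "true" model exactly (no noise).
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 3.0]
beta1, beta2 = 2.0, 3.0
y = [1.0 + beta1 * a + beta2 * b for a, b in zip(x1, x2)]

# Regression that omits X2.
short_slope = simple_slope(x1, y)

# Slope from regressing the omitted X2 on the included X1.
delta = simple_slope(x1, x2)

# Omitted variable bias: short_slope = beta1 + beta2 * delta.
bias = short_slope - beta1
print(round(short_slope, 6), round(beta2 * delta, 6), round(bias, 6))
```

The bias disappears only if the omitted variable has no effect (beta2 = 0) or is uncorrelated with the included variable (delta = 0).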

To test whether one functional form is better than another, various statistical tests can be used

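One approach is to compare the R-squared values of the different models. The model with the highest R-squared value is generally preferred, although R-squared never falls when independent variables are added, and the comparison is only meaningful when the models share the same dependent variable (for example, a model of Y cannot be compared this way with a model of ln Y).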

Another option is to use functional form transformations to estimate nonlinear relationships using OLS.

One common transformation is the logarithmic transformation, which can be used to estimate exponential relationships.
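A short sketch of the logarithmic transformation, assuming an exponential relationship Y = A * exp(b * X). Taking logs gives ln Y = ln A + b * X, which is linear in the parameters and can be estimated by OLS. The data are generated without noise, so the estimates recover the assumed values.

```python
import math

# Hypothetical data generated exactly from Y = 3 * exp(0.5 * X).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [3.0 * math.exp(0.5 * xi) for xi in x]

# Transform the dependent variable, then run a simple regression of ln Y on X.
log_y = [math.log(yi) for yi in y]
n = len(x)
x_bar = sum(x) / n
ly_bar = sum(log_y) / n
slope = (sum((xi - x_bar) * (li - ly_bar) for xi, li in zip(x, log_y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = ly_bar - slope * x_bar

# The slope estimates b; exponentiating the intercept recovers A.
a_hat = math.exp(intercept)
print(round(slope, 6), round(a_hat, 6))
```

In this log-linear form, the slope is read as the approximate proportional change in Y for a one-unit change in X.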

To interpret an estimate as a causal effect, we need to satisfy certain assumptions.

One key assumption is that of exogeneity, where the explanatory variables are uncorrelated with the error term in the regression model. When exogeneity is satisfied, we can interpret the estimated coefficient on an explanatory variable as the causal effect of that variable on the dependent variable.

The R-squared value for the MR model is:

R^2 = SSR / SST
R-squared measures the proportion of the variation in the dependent variable that is explained by the independent variables in the model.

The standard error of the coefficient estimate for an independent variable is:

SE(bj) = sqrt[s^2 / (SSTj * (1 - R^2j))]
where bj is the estimated coefficient on the jth independent variable, SSTj is the total sum of squares for the jth independent variable, R^2j is the R-squared value for the regression of the jth independent variable on all other independent variables in the model, and s^2 is the estimated regression variance.
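A small sketch of the standard error formula with hypothetical inputs. It shows how correlation among the independent variables (a higher R^2j) inflates the standard error while s^2 and SSTj stay fixed.

```python
import math

def coefficient_se(s2, sst_j, r2_j):
    """Standard error of a coefficient: sqrt(s^2 / (SST_j * (1 - R^2_j)))."""
    return math.sqrt(s2 / (sst_j * (1.0 - r2_j)))

# Hypothetical inputs, chosen only for illustration.
s2 = 0.8       # estimated regression variance
sst_j = 10.0   # total sum of squares of the jth independent variable

se_uncorrelated = coefficient_se(s2, sst_j, 0.0)   # X_j unrelated to other X's
se_correlated = coefficient_se(s2, sst_j, 0.75)    # X_j largely explained by other X's
print(round(se_uncorrelated, 4), round(se_correlated, 4))
```

With R^2j = 0.75 the standard error doubles relative to the uncorrelated case, because only a quarter of the variation in the variable remains to identify its coefficient.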

SSE (sum of squared errors)

SSE is the sum of the squared differences between the actual values of the dependent variable and the predicted values from the regression equation.

SSR (sum of squared regression)

SSR is the sum of the squared differences between the predicted values from the regression equation and the mean of the dependent variable.

SST (total sum of squares)

SST is the sum of the squared differences between the actual values of the dependent variable and the mean of the dependent variable.
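A minimal sketch that computes SSE, SSR, and SST for a simple regression on made-up data, and checks that SST = SSR + SSE and that R-squared equals SSR / SST.

```python
# Hypothetical sample, invented for this example.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS estimates for a simple regression with a constant.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # unexplained variation

r_squared = ssr / sst
print(round(sst, 4), round(ssr, 4), round(sse, 4), round(r_squared, 4))
```

Here SST = 6, SSR = 3.6, and SSE = 2.4, so the regression explains 60% of the variation in Y.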

Simultaneity bias

Simultaneity bias occurs when two variables are simultaneously determined, and their relationship is not accurately captured by a simple regression model.

The MR model is estimated using a set of sample data, and the most commonly used method for estimation is the ordinary least squares (OLS) method.

The OLS method finds the values of β0, β1, β2, ..., βk that minimize the sum of squared errors between the predicted values of Y and the actual values of Y in the sample data.

multiple regression (MR) model

The multiple regression (MR) model is a statistical model that uses multiple independent variables to explain the variation in a dependent variable. The MR model can be written as:
Y = β0 + β1X1 + β2X2 + ... + βkXk + ε
where Y is the dependent variable, β0 is the intercept, β1, β2, ..., βk are the coefficients of the independent variables X1, X2, ..., Xk, respectively, and ε is the error term.

To test for serial correlation, the Breusch-Godfrey test (also called the Lagrange Multiplier test) can be used.

This test involves regressing the OLS residuals on the original independent variables and on lagged residuals, then testing the joint significance of the coefficients on the lagged residuals. If they are jointly significant, there is evidence of serial correlation.

To test for heteroscedasticity, White's test can be used.

This test involves regressing the squared residuals on the independent variables, their squares, and their cross-products. If the resulting F-statistic (or the equivalent n × R^2 statistic) is significant, then there is evidence of heteroscedasticity.

To test a hypothesis about a single coefficient, we can use the t-test. The null hypothesis is usually that the coefficient is equal to zero, and the alternative hypothesis is that it is not equal to zero.

We can calculate the t-statistic for each coefficient estimate and compare it to the critical t-value from the t-distribution with n-k-1 degrees of freedom, where n is the sample size and k is the number of independent variables in the model. If the absolute value of the t-statistic is greater than the critical t-value, we reject the null hypothesis and conclude that the coefficient is statistically significant at the chosen level of significance (usually 5% or 1%).
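A short sketch of the decision rule, with a hypothetical estimate and standard error. The critical values 2.228 and 3.169 are the two-sided 5% and 1% values of the t-distribution with 10 degrees of freedom (for example, n = 12 and k = 1).

```python
def t_statistic(b, beta_null, se):
    """t = (estimate - hypothesized value) / standard error."""
    return (b - beta_null) / se

def reject_null(t_stat, critical_value):
    """Two-sided test: reject when |t| exceeds the critical value."""
    return abs(t_stat) > critical_value

# Hypothetical coefficient estimate and standard error.
b1, se_b1 = 0.6, 0.25

t1 = t_statistic(b1, 0.0, se_b1)
reject_at_5 = reject_null(t1, 2.228)
reject_at_1 = reject_null(t1, 3.169)
print(round(t1, 3), reject_at_5, reject_at_1)
```

The same estimate is significant at the 5% level but not at the 1% level, which is why the chosen significance level should be stated with the result.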

A fixed effects model is

a regression model that includes individual or group-specific fixed effects to control for unobserved heterogeneity. It enables us to make causal statements about the effect of time-varying variables on the dependent variable, while holding constant individual or group-specific factors that do not vary over time.
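A minimal sketch of the within (demeaning) transformation, one common way to estimate a fixed effects model. The panel below is invented: each unit has its own intercept, and the true slope is 2. Pooled OLS ignores the unit intercepts; the within estimator removes them by subtracting each unit's own mean.

```python
def demean(values):
    m = sum(values) / len(values)
    return [v - m for v in values]

def slope(x, y):
    """OLS slope of y on x with a constant."""
    dx, dy = demean(x), demean(y)
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

# Hypothetical panel: two units observed three times each.
# unit_a follows y = 10 + 2x, unit_b follows y = -20 + 2x.
panel = {
    "unit_a": {"x": [1.0, 2.0, 3.0], "y": [12.0, 14.0, 16.0]},
    "unit_b": {"x": [4.0, 5.0, 6.0], "y": [-12.0, -10.0, -8.0]},
}

# Pooled OLS: stack all observations and ignore unit differences.
pooled_x = [v for unit in panel.values() for v in unit["x"]]
pooled_y = [v for unit in panel.values() for v in unit["y"]]
pooled_slope = slope(pooled_x, pooled_y)

# Within estimator: subtract each unit's own mean, then stack.
within_x = [v for unit in panel.values() for v in demean(unit["x"])]
within_y = [v for unit in panel.values() for v in demean(unit["y"])]
within_slope = slope(within_x, within_y)

print(round(pooled_slope, 4), round(within_slope, 4))
```

The pooled slope is negative because the unit with larger X values has a much lower intercept; the within slope recovers 2 because time-invariant differences between units drop out.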

The instrumental variables (IV) method is

a technique for estimating causal effects in the presence of endogeneity. It relies on finding an instrument that is correlated with the endogenous variable but uncorrelated with the error term in the model. The instrument is used to generate variation in the endogenous variable that is independent of the error term, allowing us to identify the causal effect of interest.
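A small sketch of the simple IV estimator with one instrument, b_IV = cov(Z, Y) / cov(Z, X). The data are constructed so that the instrument Z is exactly uncorrelated with the error u in the sample, while X depends on u and is therefore endogenous. The error is observable here only because the data are made up.

```python
def cov_sum(a, b):
    """Sum of cross-products of deviations from the mean."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

# Hypothetical data, invented for this example.
z = [-1.0, -1.0, 1.0, 1.0]                   # instrument
u = [-1.0, 1.0, -1.0, 1.0]                   # error term (unobserved in practice)
x = [zi + ui for zi, ui in zip(z, u)]        # endogenous: X moves with u
y = [2.0 * xi + ui for xi, ui in zip(x, u)]  # true causal effect of X is 2

# OLS is biased because X is correlated with the error.
ols_slope = cov_sum(x, y) / cov_sum(x, x)

# IV uses only the variation in X that comes from Z.
iv_slope = cov_sum(z, y) / cov_sum(z, x)

print(ols_slope, iv_slope)
```

OLS gives 2.5 while IV gives 2.0: the instrument is relevant (cov(Z, X) is nonzero) and exogenous (cov(Z, u) is zero), so it isolates the causal effect.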

Degrees of freedom (df)

df is the number of observations minus the number of estimated parameters. With k independent variables and an intercept, df = n - k - 1.

To fix heteroscedasticity, two approaches can be taken.

Ex-ante, the model can be specified correctly, and the functional form can be chosen appropriately to ensure homoscedasticity. Alternatively, ex-post, White's estimator of the variance and standard errors can be used to correct for heteroscedasticity.
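A sketch of the ex-post correction for a simple regression, assuming the basic (HC0) form of White's estimator: the variance of the slope is the sum of (x_i - x_bar)^2 * e_i^2 divided by the square of the sum of (x_i - x_bar)^2. The data are made up, and the conventional standard error is computed alongside for comparison.

```python
import math

# Hypothetical sample, invented for this example.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
dx = [xi - x_bar for xi in x]
sxx = sum(d * d for d in dx)

# OLS estimates and residuals; the correction does not change these.
b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
b0 = y_bar - b1 * x_bar
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Conventional standard error assumes a constant error variance.
s2 = sum(e * e for e in resid) / (n - 2)
conventional_se = math.sqrt(s2 / sxx)

# White (HC0) variance lets each observation have its own error variance.
white_var = sum((d * e) ** 2 for d, e in zip(dx, resid)) / sxx ** 2
white_se = math.sqrt(white_var)

print(round(b1, 4), round(conventional_se, 4), round(white_se, 4))
```

Only the standard error changes between the two approaches; the slope estimate is the same OLS value in both.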

If serial correlation is present in the data, the OLS estimators become

inefficient, and the standard errors and t-statistics are biased. This can lead to incorrect inferences and conclusions.

"v method" (simple regression)

is a technique used to estimate a multiple regression (MR) model with two independent variables.

serial or auto-correlation

is the phenomenon where the errors in a regression model are correlated with each other over time or across observations. This violates another assumption of the OLS regression, which assumes that the errors are independent of each other.

heteroscedasticity

is the phenomenon where the variance of the errors in a regression model is not constant across different values of the independent variables. This violates one of the assumptions of the OLS regression, which assumes that the variance of the error term is constant (homoscedasticity).

Heteroscedasticity can be caused by various factors, such as

measurement errors, omitted variables, and nonlinear functional forms.

The R-squared value...

measures the proportion of the total variation in the dependent variable that is explained by the independent variables in the model.

Serial correlation can be caused by various factors, such as

omitted variables, misspecified functional forms, and measurement errors.

To avoid the "dummy trap," which occurs when all the indicator variables are included in the model together with an intercept,

one of the indicator variables must be dropped from the model. This variable serves as the reference group, and the coefficients for the other indicator variables represent the difference in the dependent variable between the reference group and the other groups.
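A small sketch of why the trap arises and how the reference group works, using an invented three-group sample. It relies on a known property of a regression on a constant and group indicators only: the intercept equals the reference group's mean and each coefficient equals a difference in group means. These are computed directly here rather than by running the regression.

```python
# Hypothetical sample with three categories, invented for this example.
groups = ["a", "a", "b", "b", "c", "c"]
y = [1.0, 3.0, 4.0, 6.0, 9.0, 11.0]
levels = sorted(set(groups))

# One indicator variable per category.
dummies = {lv: [1 if g == lv else 0 for g in groups] for lv in levels}

# The trap: the indicators add up to the constant term in every row,
# so they are perfectly collinear with the intercept.
dummy_sum = [sum(dummies[lv][i] for lv in levels) for i in range(len(groups))]
constant = [1] * len(groups)
in_trap = dummy_sum == constant

def group_mean(level):
    values = [yi for yi, g in zip(y, groups) if g == level]
    return sum(values) / len(values)

# Drop one indicator; its category becomes the reference group.
reference = "a"
intercept = group_mean(reference)
coefficients = {lv: group_mean(lv) - intercept for lv in levels if lv != reference}

print(in_trap, intercept, coefficients)
```

Choosing a different reference group changes the intercept and the individual coefficients, but not the fitted group means.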

To address underspecification,

one option is to include additional independent variables that are relevant for the relationship being studied. However, adding too many independent variables can also lead to problems such as multicollinearity and overfitting.

The two conditions for a good instrument are

relevance and exogeneity. Relevance means that the instrument should be correlated with the endogenous variable, while exogeneity means that the instrument should be uncorrelated with the error term in the model.

To conduct an RCT,

researchers randomly assign participants to treatment and control groups, administer the treatment to the treatment group, and then compare the outcomes of the two groups.
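A bare-bones sketch of the two steps: random assignment, then a comparison of mean outcomes. The participant IDs, the seed, and the outcome values are all hypothetical, and the outcomes are entered by hand rather than simulated, so the example only illustrates the arithmetic.

```python
import random

def assign_groups(participants, seed):
    """Randomly split participants into equal-sized treatment and control groups."""
    shuffled = list(participants)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def difference_in_means(treated_outcomes, control_outcomes):
    """Estimated treatment effect: mean(treated) - mean(control)."""
    treated_mean = sum(treated_outcomes) / len(treated_outcomes)
    control_mean = sum(control_outcomes) / len(control_outcomes)
    return treated_mean - control_mean

# Step 1: random assignment of ten hypothetical participants.
participants = list(range(10))
treatment, control = assign_groups(participants, seed=106)

# Step 2: hypothetical outcomes recorded after the treatment is administered.
treated_outcomes = [7.0, 9.0, 8.0, 10.0, 6.0]
control_outcomes = [5.0, 6.0, 7.0, 4.0, 8.0]
effect = difference_in_means(treated_outcomes, control_outcomes)

print(len(treatment), len(control), effect)
```

The difference in means is the same number a simple regression of the outcome on a treatment indicator would report as its slope.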

Using these building blocks, we can estimate the regression variance as:

s^2 = SSE / (n - k - 1)
where n is the number of observations, and k is the number of independent variables in the regression model.
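A short sketch of the regression variance calculation. The function name and the input values are hypothetical.

```python
def regression_variance(sse, n, k):
    """s^2 = SSE / (n - k - 1), with k independent variables plus an intercept."""
    df = n - k - 1
    if df <= 0:
        raise ValueError("need more observations than estimated parameters")
    return sse / df

# Hypothetical values: SSE of 2.4 from a simple regression on 5 observations.
s2 = regression_variance(sse=2.4, n=5, k=1)
print(round(s2, 4))
```

Dividing by n - k - 1 rather than n accounts for the k + 1 parameters that were estimated from the same data.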

When CR5 is high,

it indicates a lack of competition in the industry. This can lead to violations of the exogeneity assumption because firms in less competitive industries may have more market power to influence prices and affect other variables.

F statistic

The F-statistic tests the overall significance of the model. The null hypothesis is that all of the slope coefficients are jointly equal to zero.

If heteroscedasticity is present in the data,

the OLS estimators become inefficient, and the standard errors and t-statistics are biased. This can lead to incorrect inferences and conclusions.

CR5 refers to

the concentration ratio of the top 5 firms in an industry.
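A small sketch, assuming CR5 is computed in the usual way as the sum of the market shares of the five largest firms. The shares below are invented.

```python
def concentration_ratio(shares, top_n=5):
    """Sum of the market shares of the top_n largest firms."""
    return sum(sorted(shares, reverse=True)[:top_n])

# Hypothetical market shares in percent for an eight-firm industry.
shares = [30.0, 20.0, 15.0, 10.0, 8.0, 7.0, 5.0, 5.0]

cr5 = concentration_ratio(shares)
print(cr5)
```

A CR5 of 83 means the five largest firms account for 83% of the market.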

If the model is underspecified,

the estimates of the coefficients for the included independent variables may be biased and inconsistent. This is because the omitted variables may be correlated with the included independent variables, violating the assumption of exogeneity. The omitted variables may also affect the error term in the model, leading to omitted variable bias.

p-value tells us

the probability of obtaining an F-statistic as large or larger than the one observed if the null hypothesis of no relationship between the dependent and independent variables were true. A small p-value (usually less than 0.05) indicates that the overall model is statistically significant.

However, fixed effects models have limitations, such as

their inability to estimate the effect of time-invariant variables on the dependent variable and their potential susceptibility to omitted variable bias from unobserved factors that vary over time.

If one of the independent variables in the MR model is highly correlated with another independent variable,

then we might face a problem of multicollinearity. (This situation can cause unstable or inaccurate coefficient estimates, and it can also make it difficult to interpret the effects of individual independent variables on the dependent variable.)
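One common way to gauge the problem is the variance inflation factor, VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing the jth independent variable on the others. The sketch below applies that standard definition to a few hypothetical R^2_j values.

```python
def vif(r2_j):
    """Variance inflation factor for an auxiliary R-squared below 1."""
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("r2_j must be in [0, 1)")
    return 1.0 / (1.0 - r2_j)

# Hypothetical auxiliary R-squared values.
for r2 in (0.0, 0.5, 0.9, 0.99):
    print(r2, round(vif(r2), 2))
```

A VIF of 1 means the variable is uncorrelated with the others; the factor grows without bound as R^2_j approaches 1, which is the case of perfect multicollinearity.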

Under certain conditions, the MR estimators can collapse to the simple regression (SR) estimator.

This occurs when the independent variables are uncorrelated with each other in the sample. In that case, regressing one independent variable on the others explains none of its variation, so the MR coefficient on each variable equals the slope from a simple regression of the dependent variable on that variable alone. The opposite case is multicollinearity: when an independent variable is highly correlated with the others, its variance inflation factor (VIF) is very high and its coefficient is estimated imprecisely.

The t-statistic for the coefficient estimate of an independent variable is:

tj = (bj - βj) / SE(bj)
where bj is the estimated coefficient for the jth independent variable and βj is its value under the null hypothesis.

An RCT, or randomized controlled trial, is a

type of experiment where subjects are randomly assigned to treatment and control groups. RCTs are often used as an ideal instrument because random assignment ensures that the treatment group and control group are, on average, identical on all observed and unobserved characteristics, except for the treatment itself. This eliminates the endogeneity problem, allowing us to identify the causal effect of interest.

