DIS 651: Quantitative Analysis (Final)

How could we integrate ANOVA into a regression analysis?

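ANOVA can be integrated into regression analysis by constructing dummy variables for the treatments. With c treatments, we include c - 1 dummy variables; the omitted treatment serves as the reference category (including all c dummies alongside an intercept would create perfect multicollinearity).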
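A minimal sketch of this idea, assuming hypothetical data for three treatments (A, B, C) and using NumPy's least-squares solver. With treatment A as the reference category, the intercept estimates A's mean and each dummy coefficient estimates how far that treatment's mean lies from A's.

```python
import numpy as np

# Hypothetical responses for three treatments.
groups = {
    "A": [10.0, 12.0, 11.0],
    "B": [14.0, 15.0, 16.0],
    "C": [20.0, 19.0, 21.0],
}

y, d_b, d_c = [], [], []
for label, values in groups.items():
    for value in values:
        y.append(value)
        d_b.append(1.0 if label == "B" else 0.0)  # dummy: 1 if treatment B
        d_c.append(1.0 if label == "C" else 0.0)  # dummy: 1 if treatment C

# Treatment A is omitted, so the design matrix is an intercept plus c - 1 = 2 dummies.
X = np.column_stack([np.ones(len(y)), d_b, d_c])
coef, *_ = np.linalg.lstsq(X, np.array(y), rcond=None)
b0, b_b, b_c = (float(v) for v in coef)

# b0 is the mean of A; b_b and b_c are the differences of B and C from A.
print(round(b0, 6), round(b_b, 6), round(b_c, 6))
```

Here the sample means are 11, 15 and 20, so the coefficients come out as 11, 4 and 9, and a test that both dummy coefficients are zero is equivalent to the ANOVA test that all treatment means are equal.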

What is a binary choice model (or linear probability model)?

A linear regression model applied to a binary response variable (one that takes only the values 0 and 1) is called a linear probability model.
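As a rough illustration, a linear probability model is just ordinary least squares with a 0/1 response. The sketch below assumes hypothetical data (hours studied and whether a student passed) and uses NumPy.

```python
import numpy as np

# Hypothetical data: x = hours studied, y = 1 if the student passed, else 0.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

# Ordinary least squares of the 0/1 response on an intercept and x.
X = np.column_stack([np.ones_like(x), x])
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)

# Fitted values are read as estimated probabilities that y = 1.
p_hat = b0 + b1 * x
print(float(b0), float(b1))
```

With these numbers the estimates are b0 = -0.2 and b1 = 0.2.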

5 Steps for Evaluating Regression Results

1. Check the estimated regression coefficients: Do their signs and magnitudes match theory and common sense? Which are statistically significant at the 1%, 5%, and 10% levels?
2. How well does the estimated regression equation fit the data? Look at R-squared, adjusted R-squared, and the standard error of the estimate relative to the mean of y (Se/Y-bar).
3. Check for omitted relevant explanatory variables and for unnecessary or irrelevant ones.
4. Is the data set reasonably large, and does it appear accurate?
5. Has the best functional form been used? Would a nonlinear relationship be more appropriate for some or all of the independent variables?
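The fit measures in step 2 can be computed directly from the observed and fitted values. This is a minimal sketch; the data and the fitted line y-hat = 2.2 + 0.6x are hypothetical, and k is the number of explanatory variables.

```python
import math

def fit_summary(y, y_hat, k):
    """Step 2 fit measures for a model with k explanatory variables.

    y holds the observed values and y_hat the fitted values from the regression.
    """
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # error (residual) sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)           # penalizes extra explanatory variables
    se = math.sqrt(sse / (n - k - 1))                       # standard error of the estimate
    return r2, adj_r2, se, se / y_bar                       # last value is Se/Y-bar

# Hypothetical example: y regressed on x = 1..5, with fitted line y_hat = 2.2 + 0.6x.
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * x for x in range(1, 6)]
r2, adj_r2, se, se_ratio = fit_summary(y, y_hat, k=1)
print(round(r2, 4), round(adj_r2, 4), round(se, 4), round(se_ratio, 4))
```

For this data R-squared is 0.6, while adjusted R-squared is lower (about 0.467) because it charges for the degree of freedom used by x.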

How do we implement (include) the quadratic functional form in a regression equation?

A quadratic regression model with one explanatory variable x is specified as y = β₀ + β₁x + β₂x² + ε. This model can easily be estimated as a regression of y on x and x², where x² is simply added as another explanatory variable.
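A minimal NumPy sketch of "regress y on x and x²": the squared term is just an extra column in the design matrix. The data are hypothetical and generated without noise from y = 1 + 2x - 0.5x², so the fit should recover those coefficients.

```python
import numpy as np

# Hypothetical, noise-free data from y = 1 + 2x - 0.5x^2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 1 + 2 * x - 0.5 * x**2

# Columns: intercept, x, and x^2 (the quadratic term is an ordinary regressor).
X = np.column_stack([np.ones_like(x), x, x**2])
(b0, b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(float(b0), 6), round(float(b1), 6), round(float(b2), 6))
```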

How do we interpret the regression coefficients in a linear probability model?

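For the linear probability model, each regression coefficient is interpreted as the change in the predicted probability that y = 1 resulting from a one-unit increase in the respective explanatory variable, holding the other explanatory variables constant. Multiplying the coefficient by 100 expresses this change in percentage points.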

Write the null and alternative hypotheses for the test of joint significance of the explanatory variables:

H0: β₁ = β₂ = . . . = βₙ = 0
HA: At least one βⱼ ≠ 0

What null and alternative hypotheses are used for ANOVA testing?

H0: μ₁ = μ₂ = μ₃ = . . . = μₙ
HA: Not all population means are equal

What is perfect multicollinearity?

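Perfect multicollinearity is when there is an exact linear relationship between two or more explanatory variables in a regression model. This circumstance violates an assumption of the regression model (no exact linear relationship among the explanatory variables), so the coefficients cannot be uniquely estimated. High (non-perfect) multicollinearity does not violate that assumption.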
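One way to see why perfect multicollinearity breaks estimation: the design matrix loses rank, so X'X is singular and there is no unique least-squares solution. A minimal sketch, assuming a hypothetical x2 that is exactly 3*x1 + 2:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = 3 * x1 + 2  # hypothetical: x2 is an exact linear function of x1
X = np.column_stack([np.ones_like(x1), x1, x2])

# Only 2 of the 3 columns are linearly independent, so the three
# coefficients (intercept, x1, x2) cannot be separately identified.
rank = np.linalg.matrix_rank(X)
print(rank, X.shape[1])
```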

What is multicollinearity?

Multicollinearity refers to a strong linear relationship between two or more independent (explanatory) variables in a regression model.

Why is ANOVA the appropriate statistical tool for controlled experiments and regression the appropriate statistical tool for observational studies?

Regression is the appropriate tool for observational studies because potential confounding factors can be accounted for by including them as explanatory variables in the regression equation. ANOVA is the appropriate tool for controlled experiments because treatments are assigned at random, so confounding factors are controlled by design and any differences between the treatment groups reflect treatment effects or chance.

How do we test for a quadratic effect?

The sign of the coefficient β₂ determines whether the relationship between x and y is U-shaped (β₂ > 0) or inverted U-shaped (β₂ < 0). To test for a quadratic effect, we test the significance of b₂ in the estimated quadratic regression equation (H0: β₂ = 0 versus HA: β₂ ≠ 0) with a t-test; if b₂ is significant, x has a quadratic functional form.
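A sketch of the t-statistic for b₂, assuming hypothetical inverted-U data. It uses the standard OLS formulas: error variance s² = SSE/(n - p) and coefficient covariance s²(X'X)⁻¹. The function name `quadratic_t_stat` is illustrative only.

```python
import numpy as np

def quadratic_t_stat(x, y):
    """Return b2 and its t-statistic for H0: beta2 = 0 in y = b0 + b1*x + b2*x^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x, x**2])
    n, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (n - p)               # estimated error variance
    cov = s2 * np.linalg.inv(X.T @ X)          # estimated covariance of the coefficients
    se_b2 = np.sqrt(cov[2, 2])                 # standard error of b2
    return float(coef[2]), float(coef[2] / se_b2)

# Hypothetical data that rise and then fall (an inverted U).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 5.9, 8.2, 9.0, 9.1, 8.0, 6.2, 3.8]
b2, t = quadratic_t_stat(x, y)
print(round(b2, 4), round(t, 2))
```

The t-statistic would then be compared with a t critical value on n - 3 degrees of freedom (or converted to a p-value); a significantly negative b₂ points to an inverted U.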

How do you detect multicollinearity?

The detection methods for multicollinearity are mostly informal. However, the following signs are helpful for detecting multicollinearity:
1. A high R-squared coupled with individually insignificant explanatory variables.
2. A sample correlation coefficient between two explanatory variables that is more than 0.80 or less than -0.80.
3. Seemingly wrong signs on the estimated regression coefficients (based on theory and knowledge of the variables).

For ANOVA, the test statistic is the F statistic. How do we determine if we reject or fail to reject the null hypothesis for an ANOVA test?

The easiest thing to do is to compare the p-value of the ANOVA test to the significance level (alpha). If the p-value is greater than alpha, fail to reject the null. If the p-value is less than the alpha level, reject the null hypothesis in favor of the alternative hypothesis.

Quantitative versus qualitative variable impact

The estimated regression coefficient measures the impact on the dependent variable of a one-unit increase in the independent variable. A qualitative (dummy) variable can only change in total by one unit (from 0 to 1), so its full effect on the dependent variable is measured by its estimated regression coefficient. A quantitative variable can change over a whole range of values, so its full effect on the dependent variable is not simply its estimated regression coefficient. Assuming a linear relationship, one way to measure the full effect of a quantitative variable is to multiply its estimated regression coefficient by its range in the sample (the maximum minus the minimum value).
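The range-based measure is a one-line calculation. In this sketch the coefficients and sample values are hypothetical; note that for a dummy variable the range is 1, so the full effect reduces to the coefficient itself.

```python
def full_effect(b, values):
    """Full effect of a variable across its sample range, assuming a linear relationship."""
    return b * (max(values) - min(values))

# Hypothetical quantitative variable: age ranges from 20 to 60, coefficient 0.5.
print(full_effect(0.5, [20, 35, 60]))

# Hypothetical dummy variable: values 0/1, coefficient 3.2 (range is 1).
print(full_effect(3.2, [0, 1, 1, 0]))
```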

Why is the linear probability model called a probability model?

The linear probability model is called a probability model because its predicted value is interpreted as the probability that the response variable equals 1 (that is, that the attribute is present). Here the dependent variable for each observation takes a value of either 0 or 1, and the probability of observing a 1 in any one case is treated as depending on one or more explanatory variables.

What is the major consequence of multicollinearity?

The major consequence of multicollinearity is inflated standard errors of the estimated regression coefficients. This leads to less precise estimates, wider sampling distributions, and smaller t-statistics, so explanatory variables can lose statistical significance. Beyond that, multicollinearity makes it difficult to disentangle the separate influences of the explanatory variables on the response variable. When multicollinearity is severe, some parameter estimates may have wrong signs and/or appear insignificant based on their t-statistics.
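To put a number on "inflated standard errors": in a model with exactly two explanatory variables, Var(b₁) = σ² / [SST₁(1 - r²)], where r is the sample correlation between x₁ and x₂. The standard error therefore grows by a factor of 1/√(1 - r²) compared with uncorrelated regressors; 1/(1 - r²) is what is usually called the variance inflation factor (VIF), a measure not named in the card. A minimal sketch:

```python
import math

def se_inflation(r):
    """Factor by which se(b1) grows, relative to r = 0, when corr(x1, x2) = r.

    Valid for a model with exactly two explanatory variables;
    1 / (1 - r**2) is the variance inflation factor (VIF).
    """
    return 1 / math.sqrt(1 - r**2)

for r in (0.0, 0.5, 0.8, 0.95, 0.99):
    print(f"r = {r:.2f}: standard error multiplied by {se_inflation(r):.2f}")
```

The factor stays modest until the correlation gets large, which is one reason the 0.80 rule of thumb is used as a warning level.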

How do you remedy multicollinearity?

The primary remedy for multicollinearity is to drop one of the collinear variables from the regression (if it is perceived as redundant). Beyond that, more data can be collected, since the correlation between the explanatory variables causing the multicollinearity may weaken in a larger sample. Additionally, the regression model can be reformulated using transformed variables. Otherwise, the investigator can take the "do nothing" approach; however, the individual coefficient estimates and their t-tests may then remain imprecise and hard to interpret.

What is the quadratic functional form and why do we use it?

The quadratic functional form is used when the slope, which captures the influence of x on y, changes in magnitude and possibly in sign as x changes. This differs from the linear functional form, in which the slope is constant, so the influence of x on y is the same at every value of x.
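Differentiating y = b₀ + b₁x + b₂x² gives the slope b₁ + 2b₂x, which changes with x; the slope is zero at x = -b₁/(2b₂), a minimum if b₂ > 0 and a maximum if b₂ < 0. A small sketch with hypothetical estimates b₁ = 2 and b₂ = -0.5:

```python
def quadratic_slope(b1, b2, x):
    """Slope dy/dx = b1 + 2*b2*x of y = b0 + b1*x + b2*x**2 at a given x."""
    return b1 + 2 * b2 * x

def turning_point(b1, b2):
    """x value where the slope is zero: a minimum if b2 > 0, a maximum if b2 < 0."""
    return -b1 / (2 * b2)

b1, b2 = 2.0, -0.5  # hypothetical estimates (inverted U)
for x in (0, 1, 2, 3):
    print(x, quadratic_slope(b1, b2, x))
print("turning point:", turning_point(b1, b2))
```

The slope falls from 2 to 0 at x = 2 and then turns negative, which a linear form with its single constant slope cannot capture.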

What is the purpose of the test of joint significance of the explanatory variables in the regression model?

The test of joint significance is often regarded as a test of the overall usefulness of a regression. This test determines whether the explanatory variables have a joint statistical influence on the response variable (y).

What two sample statistics are calculated from the sample from each treatment/factor population and how do these sample statistics measure between-treatment variability and within-treatment variability?

The two sample statistics leveraged in ANOVA are the sample mean and the sample standard deviation (or variance) of each treatment sample. Between-treatment variability is based on how far the sample means lie from the grand mean; it reflects treatment effects plus chance. Within-treatment variability is based on the variability of the observations within each sample around their own sample mean; it reflects chance alone.
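The two kinds of variability combine into the ANOVA F statistic: F = MSTR / MSE, where MSTR = SSTR/(c - 1) measures between-treatment variability and MSE = SSE/(n - c) measures within-treatment variability (c treatments, n total observations). A minimal pure-Python sketch with hypothetical samples:

```python
def one_way_anova_f(samples):
    """F statistic for one-way ANOVA from a list of samples (one per treatment)."""
    c = len(samples)                                   # number of treatments
    n_total = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]
    grand_mean = sum(sum(s) for s in samples) / n_total
    # Between-treatment variability: spread of the sample means around the grand mean.
    sstr = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within-treatment variability: spread of observations around their own sample mean.
    sse = sum(sum((v - m) ** 2 for v in s) for s, m in zip(samples, means))
    mstr = sstr / (c - 1)
    mse = sse / (n_total - c)
    return mstr / mse

# Hypothetical samples for three treatments.
f_stat = one_way_anova_f([[10, 12, 11], [14, 15, 16], [20, 19, 21]])
print(f_stat)
```

Here SSTR = 122 and SSE = 6, giving F = 61. The p-value is the right-tail area of the F distribution with (c - 1, n - c) degrees of freedom, for example via `scipy.stats.f.sf(f_stat, 2, 6)` if SciPy is available.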

What is the F-statistic, and what can the Significance F value in Excel be used for? How so?

To conduct the test of joint significance, we employ a one-tailed F-test. This test statistic measures how well the regression equation explains the variability in the response variable. The easiest way to conduct a test of joint significance is w/ the p-value approach. The value under "Sig F" in Excel is the associated p-value. One can compare this p-value to the significance level (alpha) to determine whether to reject or fail to reject the null hypothesis.
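For the test of joint significance, the F statistic can be written in terms of R-squared: F = (R²/k) / ((1 - R²)/(n - k - 1)), with k explanatory variables and n observations; this equals MSR/MSE from the regression ANOVA table. A minimal sketch with hypothetical inputs:

```python
def regression_f_stat(r2, n, k):
    """F statistic for H0: beta1 = ... = betak = 0, from R-squared, n observations, k explanatory variables."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical regression: R-squared = 0.6 with n = 5 observations and k = 1 explanatory variable.
print(regression_f_stat(0.6, 5, 1))
```

Excel's Significance F is the right-tail probability of this value on (k, n - k - 1) degrees of freedom, which Excel's `F.DIST.RT` function also returns.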

How do we use estimated linear probability model to predict whether an observation has an attribute (takes Adderall, has a tattoo, smokes marijuana)?

We can use the estimated linear probability model to calculate the chance that a specific observation has a certain attribute, denoted p-hat (p̂). To calculate p̂, we plug the observation's values of the explanatory variables into the estimated regression equation: p̂ = b₀ + b₁x₁ + . . . + bₙxₙ. The result is the predicted probability that the attribute is present (multiply by 100 for a percentage).
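A minimal sketch of the p̂ calculation. The coefficients (intercept 0.8, age -0.01, male 0.05) and the observation are hypothetical, not estimates from any real tattoo data.

```python
def predict_probability(coefficients, x_values):
    """p-hat = b0 + b1*x1 + ... + bk*xk for one observation."""
    b0, *slopes = coefficients
    return b0 + sum(b * x for b, x in zip(slopes, x_values))

# Hypothetical estimated LPM for "has a tattoo": intercept, age, male (dummy).
coefficients = [0.8, -0.01, 0.05]
p_hat = predict_probability(coefficients, [30, 1])  # a 30-year-old male
print(round(p_hat, 4))
```

This observation gets p̂ = 0.55, a 55% predicted chance. Because the model is linear, p̂ for extreme values of the explanatory variables can fall below 0 or above 1.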

What does ANOVA test?

We use ANOVA (analysis of variance) tests to determine if differences exist between the means of three or more populations.

Log-Lin Functional Form

When to Use: Estimate the percentage change in y when x increases by one unit. Often used to model the growth rate of economic variables such as population, employment, wages, productivity, and GDP.
How to Implement: ln(Y) = β₀ + β₁X₁ + . . . + βₙXₙ + ε
How to Interpret: If X₁ increases by one unit, Y changes by approximately 100*β₁ percent.
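The 100*β₁ rule is an approximation; the change implied exactly by the log-lin model is 100*(e^β₁ - 1) percent, and the two agree closely when β₁ is small. A short sketch with a hypothetical β₁ = 0.05:

```python
import math

def log_lin_pct_change(b1):
    """Percentage change in Y for a one-unit increase in X1 in ln(Y) = b0 + b1*X1 + ..."""
    approx = 100 * b1                   # rule of thumb: 100 * b1 percent
    exact = 100 * (math.exp(b1) - 1)    # exact change implied by the log model
    return approx, exact

approx, exact = log_lin_pct_change(0.05)
print(round(approx, 3), round(exact, 3))
```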

Double-Log Functional Form

When to Use: Use when elasticities (the percentage change in y resulting from a one percent change in x) are constant and slopes are not.
How to Implement: ln(Y) = β₀ + β₁ln(X₁) + . . . + βₙln(Xₙ) + ε
How to Interpret: If X₁ changes by one percent, Y changes by approximately β₁ percent.
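A quick numerical check of the constant-elasticity property, assuming hypothetical estimates b₀ = 1.0 and b₁ = 0.7. A 1% increase in X changes fitted Y by about 0.7% regardless of the starting value of X, even though the slope of Y with respect to X differs at each point.

```python
import math

# Hypothetical estimates for ln(Y) = b0 + b1*ln(X); b1 is the elasticity.
b0, b1 = 1.0, 0.7

def predict_y(x):
    """Fitted Y from the double-log model, i.e. Y = exp(b0) * x**b1."""
    return math.exp(b0 + b1 * math.log(x))

def pct_change_for_one_percent(x):
    """Percentage change in fitted Y when X rises by 1% from x."""
    return 100 * (predict_y(1.01 * x) / predict_y(x) - 1)

for x in (10.0, 100.0, 1000.0):
    print(x, round(pct_change_for_one_percent(x), 4))
```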

Lin-Log Functional Form

When to Use: Use when the effect of an explanatory variable diminishes as the variable increases in size, but the effect does not turn negative. For example, the effect of annual income on annual food expenditures, where we would expect the impact to diminish at higher income levels.
How to Implement: Y = β₀ + β₁ln(X₁) + β₂X₂ + β₃X₃ + . . . + βₙXₙ + ε
How to Interpret: If X₁ changes by one percent, Y changes by approximately β₁/100 units.
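A sketch of the lin-log model for the income and food-spending example, with hypothetical estimates b₀ = -5000 and b₁ = 1200. A 1% rise in income adds about b₁/100 = 12 to food spending at any income level, while each extra dollar adds less as income grows (the slope is b₁/X), which is the diminishing but never negative effect described above.

```python
import math

# Hypothetical estimates for Y = b0 + b1*ln(X): Y = food spending, X = income.
b0, b1 = -5000.0, 1200.0

def predict_y(x):
    """Fitted Y from the lin-log model."""
    return b0 + b1 * math.log(x)

def change_for_one_percent(x):
    """Change in fitted Y (in units of Y) when X rises by 1% from x."""
    return predict_y(1.01 * x) - predict_y(x)

print(round(change_for_one_percent(30000.0), 2))     # close to b1 / 100 = 12
print(round(predict_y(20001) - predict_y(20000), 4))  # effect of one more dollar at 20,000
print(round(predict_y(80001) - predict_y(80000), 4))  # smaller effect at 80,000
```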

