Econometrics for Dummies

Independent Events

One event has no statistical relationship with the other; to check for independence, observe that the probability of one event is unaffected by the occurrence of the other; events are independent if the conditional and unconditional probabilities are equal

Standard Deviation

The square root of the variance; a measure of the dispersion of a set of data from its mean; commonly reported because it is measured in the same units as the random variable

Heteroskedasticity

The variance (and standard deviation) of a variable, monitored over a specific amount of time, is non-constant; volatility isn't constant

Model Specification

Selecting an outcome of interest or dependent variable (Y) and one or more independent factors (or explanatory variables X); the determination of which independent variables should be included in or excluded from a regression equation

Notation for Statistical Significance

(*) Significant at 10% level
(**) Significant at 5% level
(***) Significant at 1% level
Reporting of p-values: the lowest level of significance at which the null hypothesis could be rejected

Reasons to Avoid using R-Squared as the only measure of a Regression's Quality

(1) A regression may have a high R-Squared but have no meaningful interpretation because the model equation isn't supported by economic theory or common sense
(2) Using a small data set or one that includes inaccuracies can lead to a high R-Squared value but deceptive results
(3) Obsessing over R-Squared may cause you to overlook important econometric problems

High R-Squared values may be associated with regressions that violate assumptions; in econometric settings, R-Squared values too close to 1 often indicate something is wrong

Resolving Multicollinearity

(1) Acquire more data
(2) Apply a new model
(3) Cut the problem variable loose

Methods of Calculating Standardized Regression Coefficients

(1) Calculating a Z-score for every variable of every observation and then performing OLS with the Z values rather than the raw data
(2) Obtaining the OLS regression coefficients using the raw data and then multiplying each coefficient by the standard deviation of the X variable over the standard deviation of the Y variable
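A minimal numpy sketch of both methods on simulated data (the series and names are made up; the source prescribes no language), showing that they yield the same standardized coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

# Method 1: z-score both variables, then regress (no intercept is needed
# because standardized variables have mean zero).
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()
beta_std_1 = (zx @ zy) / (zx @ zx)

# Method 2: OLS slope on the raw data, then rescale by sd(X)/sd(Y).
xc, yc = x - x.mean(), y - y.mean()
slope_raw = (xc @ yc) / (xc @ xc)
beta_std_2 = slope_raw * x.std() / y.std()

print(beta_std_1, beta_std_2)  # the two methods agree
```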

Feasible Generalized Least Squares (FGLS) Techniques and Description

(1) Cochrane-Orcutt Transformation
(2) Prais-Winsten Transformation

The goal of these models is to make the error term in the original model uncorrelated; involves quasi-differencing, which subtracts the previous value of each variable scaled by the autocorrelation parameter
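A rough numpy sketch of one Cochrane-Orcutt style iteration on simulated AR(1)-error data (all names and the simulated series are illustrative assumptions); Prais-Winsten differs in keeping the first observation, rescaled by sqrt(1 - rho^2), instead of dropping it:

```python
import numpy as np

def quasi_difference(y, X, rho):
    """Quasi-difference: subtract rho times the previous value of each
    variable (the first observation is dropped, as in Cochrane-Orcutt)."""
    return y[1:] - rho * y[:-1], X[1:] - rho * X[:-1]

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):                 # AR(1) errors with rho = 0.7
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + e

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]        # initial OLS fit
u = y - X @ b                                    # OLS residuals
rho = (u[:-1] @ u[1:]) / (u[:-1] @ u[:-1])       # estimated autocorrelation
y_star, X_star = quasi_difference(y, X, rho)
b_fgls = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
print(rho, b_fgls)
```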

Typical Data Types in Econometrics

(1) Cross Sectional
(2) Time Series
(3) Panel/Longitudinal

Why OLS is most popular method for estimating regressions

(1) Easier than other alternatives
(2) Sensibility
(3) Has desirable characteristics

"Acquiring More Data" Solution to Multicollinearity

(1) Ensures that the multicollinearity isn't just an artifact of your sample
(2) Make sure the population doesn't change
(3) For cross-sectional data, use more specific data at either the current time or a future time
(4) For time-series data, increase the frequency of the data

Steps to Performing a Hypothesis Test

(1) Estimate the population parameters using sample data
(2) Determine the appropriate distribution
(3) Calculate an interval estimate or test statistic
(4) Determine the hypothesis test outcome

Detecting Heteroskedasticity

(1) Examining the residuals graphically
(2) Breusch-Pagan Test
(3) White Test
(4) Goldfeld-Quandt Test
(5) Park Test
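As an illustration of method (2), statsmodels ships a Breusch-Pagan test; a minimal sketch on simulated data where heteroskedasticity is built in (the data and names are made up):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Simulate data whose error standard deviation grows with x.
rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=300)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3 * x)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

# Breusch-Pagan regresses the squared residuals on the regressors.
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(fit.resid, X)
print(f"BP LM statistic = {lm_stat:.2f}, p-value = {lm_pval:.4f}")
# A small p-value rejects the null of homoskedasticity.
```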

Methods of Testing for Joint Significance Among Dummy Variables

(1) F-Test
(2) Chow Test

Ten Common Mistakes in Applied Econometrics

(1) Failing to use common sense and knowledge of economic theory
(2) Asking the wrong questions first
(3) Ignoring the work and contributions of others
(4) Failing to familiarize yourself with the data
(5) Making it too complicated
(6) Being inflexible to real world complications
(7) Looking the other way when you see bizarre results
(8) Obsessing over measures of fit and statistical significance
(9) Forgetting about economic significance
(10) Assuming your results are robust

Remedying Harmful Autocorrelation

(1) Feasible Generalized Least Squares (FGLS)
(2) Serial correlation robust standard errors

Estimation Methods when Using Panel Data for Unobservable Factors

(1) First Difference (FD) Transformation
(2) Dummy Variable (DV) Regression
(3) Fixed Effects (FE) Estimator

Analyzing Residuals to Test for Autocorrelation

(1) Graphical inspection of the residuals
(2) The "run test" (or "Geary test")
(3) Durbin-Watson
(4) Breusch-Godfrey

Issues when Comparing Regression Coefficients

(1) In standard OLS regression, the coefficient with the largest magnitude is not necessarily associated with the "most important" variable
(2) Coefficient magnitudes can be affected by changing units of measurement; scale matters
(3) Even variables measured on similar scales can have different amounts of variability

2 CLRM Assumption Failures

(1) Inability to account for heteroskedasticity
(2) Inability to account for autocorrelation

Ten Components of a Good Research Project

(1) Introducing your topic and posing the primary question of interest
(2) Discussing the relevance and importance of your topic
(3) Reviewing the existing literature
(4) Describing the conceptual or theoretical framework
(5) Explaining your econometric model
(6) Discussing the estimation method(s)
(7) Providing a detailed description of the data
(8) Constructing tables and graphs to display the results
(9) Interpreting the reported results
(10) Summarizing what was learned

Typical Consequences of High Multicollinearity

(1) Larger Standard Errors and Insignificant t-statistics
(2) Coefficient estimates that are sensitive to changes in specification
(3) Nonsensical coefficient signs and magnitudes

Violations of the Classical Linear Regression Model Assumptions

(1) Multicollinearity
(2) Heteroskedasticity
(3) Autocorrelation

Three Main LPM Problems

(1) Non-normality of the error term
(2) Heteroskedastic errors
(3) Potentially nonsensical predictions

Misspecification in Econometric Models

(1) Omitting Relevant Variables
(2) Including Irrelevant Variables

Note: just because an estimated coefficient doesn't have statistical significance doesn't mean it is irrelevant; a well-specified model includes both significant and non-significant variables

Results of an Econometric Model with a Dummy, Quantitative, and Interaction Terms

(1) One Regression Line: The dummy and interaction coefficients are zero and not statistically significant
(2) Two Regression Lines with Different Intercepts but the Same Slope: The coefficient for the dummy variable is significant, but the interaction coefficient is zero (not statistically significant)
(3) Two Regression Lines with the Same Intercept but Different Slopes: The dummy coefficient is zero, but the interaction coefficient is significant
(4) Two Regression Lines with Different Intercepts and Slopes: Both the dummy and interaction coefficients are significant

Results of an Econometric Model with Two Dummy Variables and an Interaction Between those Two Characteristics

(1) One Regression Line: The dummy and interaction coefficients are zero
(2) Two Regression Lines: One coefficient is significant, the other zero
(3) Three Regression Lines: Both dummies are significant, but the interaction coefficient is zero
(4) Four Regression Lines: The dummy coefficients and the interaction coefficient are all significant

Measuring the Degree or Severity of Multicollinearity

(1) Pairwise Correlation Coefficients
(2) Variance Inflation Factors (VIF)

Types of Multicollinearity

(1) Perfect Multicollinearity (rare)
(2) High Multicollinearity (much more common)

Setting up a PRF Model

(1) Provide the general mathematical specification of the model
 -Denotes the dependent variable and all independent variables
(2) Derive the econometric specification of your model
 -Develop a function that can be used to calculate econometric results
(3) Specify the random nature of your model
 -Introduce an error variable

Types of Non-linear models

(1) Quadratic Functions
(2) Cubic Functions
(3) Inverse Functions

"Using a New Model" Solution to Multicollinearity

(1) Respecify with log transformations or reciprocal functions
(2) Use "first-differencing"
(3) Create a composite index variable

OLS Assumptions (or the Classical Linear Regression Model)

(1) The model is linear in parameters and has an additive error term
(2) The values for the independent variables are derived from a random sample of the population and contain variability
(3) No independent variable is a perfect linear function of any other independent variable(s) (no perfect collinearity)
(4) The model is correctly specified and the error term has a zero conditional MEAN (not necessarily sum)
(5) The error term has a constant variance (no heteroskedasticity)
(6) The values of the error term aren't correlated with each other (no autocorrelation or serial correlation)

Numerical Properties of OLS

(1) The regression line always passes through the sample means of Y and X
(2) The mean of the estimated (predicted) Y is equal to the mean value of the actual Y
(3) The mean of the residuals is 0
(4) The residuals are uncorrelated with the predicted Y
(5) The residuals are uncorrelated with observed values of the independent variable
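These properties are easy to verify numerically when the model includes an intercept; a small sketch with simulated data and statsmodels (the data are made up):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 3.0 + 1.5 * x + rng.normal(size=100)

fit = sm.OLS(y, sm.add_constant(x)).fit()
resid, yhat = fit.resid, fit.fittedvalues

print(np.isclose(yhat.mean(), y.mean()))  # (2) mean of predicted Y = mean of actual Y
print(np.isclose(resid.mean(), 0))        # (3) residuals average to zero
print(np.isclose(resid @ yhat, 0))        # (4) residuals orthogonal to predicted Y
print(np.isclose(resid @ x, 0))           # (5) residuals orthogonal to X
```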

Factors Influencing Variance of OLS Estimators

(1) The variance of the error term- the larger the variance of the error, the larger the variance of the OLS estimates
(2) The variance of X- the larger the sample variance of X, the smaller the variance of the OLS estimates
(3) Multicollinearity- as the correlation between two or more independent variables approaches 1, the variance of the OLS estimates becomes increasingly large and approaches infinity (less efficient)

Methods of Dealing with Limited Dependent Variables

(1) Tobin's Tobit
(2) Truncated Normal
(3) Heckman Selection

Normal Distribution Characteristics

(1) Total area under the curve equals 1
(2) About 68% of the density is within one standard deviation of the mean
(3) About 95% of the density is within two standard deviations of the mean
(4) About 99.7% of the density is within three standard deviations of the mean
(5) Because a continuous random variable can take on infinitely many values, the probability of any specific value occurring is zero

Variables Leading to Multicollinearity

(1) Variables that are lagged values of one another
(2) Variables that share a common time trend component
(3) Variables that capture similar phenomena

Correcting the Regression Model for the Presence of Heteroskedasticity

(1) Weighted Least Squares
(2) Robust (White-Corrected) Standard Errors
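A brief statsmodels sketch of both corrections on simulated data where the error standard deviation is proportional to X; the weighting rule 1/x² is specific to this simulated setup, not a general prescription:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(1, 10, size=300)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3 * x)  # error sd proportional to x
X = sm.add_constant(x)

# (1) WLS: weight each observation by the inverse error variance (here 1/x^2),
# which transforms the model back to one with a constant error variance.
wls_fit = sm.WLS(y, X, weights=1.0 / x**2).fit()

# (2) Robust (White-corrected) standard errors: keep the OLS coefficients,
# adjust only the standard errors.
ols_robust = sm.OLS(y, X).fit(cov_type="HC0")

print(wls_fit.bse, ols_robust.bse)
```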

Methods of Hypothesis Testing

(1) Z distribution
(2) t distribution
(3) Chi-squared distribution
(4) F distribution

Limitations to the Collection of Data Process in Econometrics

(1) Aggregation of Data
(2) Statistically correlated but economically irrelevant variables
(3) Qualitative Data
(4) Classical Linear Regression Model Assumption Failures

Positive Autocorrelation

A form of "persistence" whereby a system has a tendency to remain in the same state from one observation to the next

Ordinary Least Squares

A statistical technique to determine the line of best fit for a model; a straight line is fitted through a number of points so as to minimize the sum of the squares of the distances from the points to the line of best fit

Autoregressive Model

A type of dynamic model that seeks to fix the estimation issues associated with distributed lag models by replacing the lagged values of the independent variable with a lagged value of the dependent variable

Run Test

A "run" is defined as a sequence of positive or negative residuals\n\nThis test involves observing the number of runs within your data and determining whether this is an acceptable number of runs based on your confidence interval\n\nIf the number of observed runs is below the expected interval, there is evidence of positive autocorrelation; if above expected interval, evidence of negative autocorrelation

Spurious Correlation

A correlation between two variables that does not result from any direct relation between them but from their relation to other variables; the variables coincidentally have a statistical relationship but one doesn't cause the other; causation can never be proven by statistical results alone

Linear Functions vs. Linear in Parameters

A function doesn't need to be linear in order to be estimable by OLS, but the parameters must be; in other words, the formula can have exponents, but the parameters (Betas) cannot be the exponents (a log transformation may be used to linearize this type of function)

Stochastic Population Regression Function

A function that introduces a random error term associated with the observation

Variance

A measure of dispersion, or how far a set of numbers is spread out; the square of the standard deviation; the average squared difference between the value of a random variable and its mean

Chow Test

A misspecification test that checks for the structural stability of the model; used when the parameters in the model aren't stable or they change

Standard Normal Distribution

A normal distribution with a mean of 0 and a standard deviation of 1; useful because any normally distributed random variable can be converted to this scale allowing for quick and easy computation of probabilities; denoted by the letter "Z"

Parameter

A numerical characteristic of a population, as distinct from a statistic of a sample

F- Distribution

A ratio of two independent chi-squared random variables, each divided by its respective degrees of freedom; as the degrees of freedom in the numerator and denominator increase, the distribution approaches normal

A probability density function that is used especially in analysis of variance, and is a function of the ratio of two independent random variables, each of which has a chi-squared distribution and is divided by its number of degrees of freedom

Chi-squared Distribution

A probability density function that gives the distribution of the sum of the squares of several independent random variables, each with a normal distribution with zero mean and unit variance; the higher the degrees of freedom (or more observations), the less skewed (and more symmetrical) the distribution

The sum of the squares of several independent standard normal random variables is distributed according to the chi-squared distribution with "k" degrees of freedom (for "k" variables)

Sampling Distribution

A probability distribution or density of a statistic when random samples of size "n" are repeatedly drawn from a population; it is not the distribution of the sample measurements

Multiple Regression

A regression model that contains more than one explanatory variable

Homoskedasticity

A situation in which the error has the same variance regardless of the value(s) taken by the independent variable(s)

No Autocorrelation

A situation where no identifiable relationship exists between the values of the error terms among data, or the correlation and covariance among error terms are 0; the positive and negative error values are random

Negative Autocorrelation

A situation where the correlation from one error term to the next is negative; an unlikely situation

Positive Autocorrelation

A situation where the correlation from one error term to the next is positive; more common and more likely than negative autocorrelation

Biased and Unbiased Statistics

A statistic is biased if it is calculated in such a way that it is systematically different from the population parameter of interest; unbiased statistics represent the population fairly well

Linear Regression

A statistical measure that attempts to determine the strength of the relationship between one dependent variable and a series of other changing variables; linear regression uses one independent variable to explain and/or predict the outcome of a dependent variable

Central Limit Theorem (CLT)

A statistical theory that states that given a sufficiently large sample size from a population with a finite level of variance, the mean of all samples from the same population will be approximately equal to the mean of the population. Furthermore, all of the samples will follow an approximate normal distribution pattern, with all variances being approximately equal to the variance of the population divided by each sample's size.

Distributions of sample means can thus be converted to standard normals

Difference in Difference (D-in-D)

A technique that measures the effect of a treatment in a given period of time; identifies and separates a preexisting difference between groups from the difference that exists after the introduction of a treatment (or event or public policy change)

d-Statistic

A test statistic developed for the Durbin-Watson test in order to detect the presence of autocorrelation (but it only identifies first-order autocorrelation)
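A small sketch of the d-statistic on made-up residuals, checked against the statsmodels helper; d ≈ 2(1 − ρ), so values near 2 suggest no first-order autocorrelation, near 0 positive autocorrelation, and near 4 negative autocorrelation:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

# d = sum((e_t - e_{t-1})^2) / sum(e_t^2)
resid = np.array([0.5, 0.4, 0.3, -0.2, -0.4, -0.3, 0.1, 0.2])
d_manual = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
print(d_manual, durbin_watson(resid))  # identical values
```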

Regression Specification Error Test (RESET)

A test that can be used to detect specification issues related to omitted variables and certain functional forms; allows you to identify if there is misspecification in your model, but it doesn't identify the source

Confidence Interval Approach to Testing Hypotheses

A type of interval estimate of a population parameter that is used to indicate the reliability of an estimate; an observed interval that frequently includes the parameter of interest if the experiment is repeated

Mean Square Error

A value that provides an estimate for the true variance of the error; the unbiased estimate of error variance: the residual sum of squares divided by the number of degrees of freedom

Dummy Variables

A variable that takes on the value 0 or 1 to indicate the absence or presence of some categorical effect that may be expected to shift the outcome; a true/false variable

Random Variables

A variable whose value is subject to variations due to chance; conceptually does not have a single, fixed value (even if unknown), rather it can take on a set of possible different values, each with an associated probability; uncertain values

Significance Test Approach to Testing Hypotheses

Allows an analyst to estimate how reliably the results derived from a study based on a randomly selected sample can be generalizable to the population from which the sample was drawn; a result that is statistically significant is a result not likely to occur randomly, but likely to be attributable to a specific cause

Serial Correlation Robust Standard Errors

Allows for the biased estimates to be adjusted while the unbiased estimates are untouched, thus no model transformation is required

Adjusting the OLS standard errors for autocorrelation produces serial correlation robust standard errors (Newey-West standard errors)
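A short statsmodels sketch on simulated AR(1)-error data; the lag choice (maxlags=4) and the simulated series are illustrative assumptions:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):                  # AR(1), i.e. autocorrelated, errors
    e[t] = 0.6 * e[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + e

X = sm.add_constant(x)
# Same OLS coefficients, but HAC (Newey-West) standard errors that are
# robust to autocorrelation (and heteroskedasticity) up to the chosen lag.
fit = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(fit.params, fit.bse)
```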

Cubic Functions in Econometrics

Allows for the effect of the independent variable (X) on the dependent variable (Y) to change, but this relationship changes at some unique value of X; often observed in total variable cost curves and total cost curves

Quadratic Functions in Econometrics

Allows the effect of the independent variable on the dependent variable to change; as the value of X increases, the impact on the dependent variable increases or decreases; best for finding maximums and minimums; observable in total variable cost and total cost curves

White Test

Allows the heteroskedasticity process to be a function of one or more independent variables; allows the independent variable to have a nonlinear and interactive effect on the error variance

Useful for identifying nearly any pattern of heteroskedasticity, but not useful in showing how to correct the model

Breusch-Pagan Test

Allows the heteroskedasticity process to be a function of one or more of the independent variables, and it's usually applied by assuming that heteroskedasticity may be a linear function of all the independent variables in the model

Failing to find evidence of heteroskedasticity with BP doesn't rule out a nonlinear relationship between the independent variables and the error variance

Integration (Calculus)

Allows us to find areas (densities) under nonlinear functions

Measurements of Significance and Confidence

Alpha and (1-Alpha), respectively

Autocorrelation

Also known as serial correlation, may exist in a regression model when the order of the observations in the data is relevant; refers to the correlation of a time series with its own past and future values; also known as lagged correlation; complicates the application of statistical tests by reducing the number of independent observations

Normal Distribution

Also known as the Gaussian distribution; a continuous probability distribution that plots all values in a symmetrical fashion, with most of the results situated around the probability's mean

Consistency

An asymptotic property- as the sample size approaches infinity, the variance of the estimator gets smaller and the value of the estimator approaches the true population parameter value

Used when CLRM assumptions fail and the alternative method doesn't produce a BLUE

Sequencing

An autocorrelation situation where most positive error terms are followed or preceded by additional positive errors or when most negative errors are followed or preceded by other negative terms

Random Effects (RE) Model

An econometric model that allows for all unobserved effects to be relegated to the error term; this provides more efficient estimates of the regression parameter

Log-Linear Model

An econometric specification whereby the natural log values are used for the dependent variable Y and the independent variable X is kept in its original scale; often used when the variables are expected to have an exponential growth relationship

Linear-Log Model

An econometric specification whereby the natural log values are used for the independent variable X and the dependent variable Y is kept in its original scale; typically used when the impact of the independent variable on the dependent variable decreases as the value of the independent variable increases (similar to a quadratic, but it never reaches a maximum or minimum value for Y); used to model diminishing marginal returns

Linearity of Estimators

An estimator has this property if a statistic is a linear function of the sample observations

Efficient Estimator

An estimator that achieves the smallest variance among estimators of its kind

Consistent Estimator

An estimator that approaches the true parameter value as the sample size gets larger and larger; this is known as an asymptotic property- it gradually approaches the true parameter value as the sample size approaches infinity

Interactive Qualitative Characteristic

An interaction (product) of two dummy variables if you have reason to believe that the simultaneous presence of two (or more) characteristics has an additional influence on your dependent variable

Limited Dependent Variables

Arise when some minimum threshold value must be reached before the values of the dependent variable are observed and/or when some maximum threshold value restricts the observed values of the dependent variable

Ex: ticket sales cease after a stadium sells out even if demand is still high; people drop out of the labor force if wages become too low

Goldfeld-Quandt Test

Assumes that a defining point exists and can be used to differentiate the variance of the error term

The result is dependent on the criteria chosen; often an arbitrary process, so failing to find evidence of heteroskedasticity doesn't rule it out

Park Test

Assumes that the heteroskedastic process may be proportional to some power of an independent variable

Assumes heteroskedasticity has a particular functional form

Characteristics of t-Distribution

Bell shaped, symmetrical around zero, approaches a normal distribution as the degrees of freedom (number of observations) increase; the ratio of a standard normal variable to the square root of a chi-squared variable divided by its degrees of freedom

Best Linear Unbiased Estimators (BLUE)

Best- Achieving the smallest possible variance among all similar estimators
Linear- Estimates are derived using linear combinations of the data values
Unbiased- Estimators (coefficients) on average equal their true parameter values

Given the assumptions of the CLRM, the OLS estimators are "BLUE": this is the Gauss-Markov Theorem

Panel Data-set vs. Pooled Cross-Sectional Measurements

Both contain cross-sectional measurements in multiple periods, but a panel dataset includes the same cross-sectional units in each time period rather than being randomly selected in each period as is the case with pooled cross-sectional data

Details on the Confidence Interval Approach

Calculate a lower limit and an upper limit for a random interval and attach some likelihood that the interval contains the true parameter value; if the hypothesized value for your parameter of interest is in the critical region (outside of the confidence interval 1-Alpha), then you reject the null hypothesis

Details on the Significance Test Approach

Calculate a test statistic and then compare the calculated value to the critical value from one of the probability distributions to determine the outcome of your test; if the calculated test statistic is in the critical region, you reject the null hypothesis and you can also say that your test is statistically significant

Conditional Probabilities

Calculate the chance that a specific value for a random variable will occur given that another random variable has already taken a value; requires both joint and marginal probabilities in order to calculate

Point Estimate

Calculating a statistic with data from a random sample produces this; a single estimate of a population parameter

Estimators

Rules or formulas for calculating descriptive measures from sample data

Difference between Censored and Truncated Dependent Variables

Censored- Observed, but suppressed
Truncated- Not observed

Maximum Likelihood (ML) Estimation

Chooses values for the estimated parameters that would maximize the probability of observing the Y values in the sample with the given X values; calculates the joint probability of observing all values of the dependent variable, assuming each observation is drawn randomly and independently from the population

Coefficients in a Linear-Log Model

The coefficient (divided by 100) represents the estimated unit change in the dependent variable for a 1-percent change in the independent variable

Differential (Calculus)

Concerns the rates of change and slopes of curves

Composite Index Variable

Combines collinear variables with similar characteristics into one variable; requires that the association between the variables is logical

Pooled Cross-Sectional Data

Combines independent cross-sectional data that has been collected over time; an event study is an example of pooled cross-sectional data

Panel/Longitudinal Data

Consists of time series for each cross-sectional unit in a sample; involves repeated observations of the same variables over a period of time; data typically collected through surveys

Cross Sectional Data

Consists of measurements for individual observations at a given point in time; tend to be popular in labor economics, industrial organization, urban economics, and other micro-based fields; data typically collected through surveys

Time Series Data

Consists of measurements on one or more variables over time; a sequence of data points measured in successive points in time at uniform time intervals; often used for examining seasonal trends and adjustments; data often collected by government agencies

Population Regression Function (PRF)

Defines your perception of reality in a mathematical function

Goodness of Fit

Describes how well a statistical model fits a set of observations; generally requires decomposing the variation in the dependent variable into explained and unexplained (residual) parts, then using R-squared as the measure of fit

Overall (Joint) Significance

Determines whether the variation in your Y variable explained by all or some of your variables is nontrivial; uses the F-statistic

Use of Econometrics Techniques

Determining the magnitude of the various relationships that economic theory introduces; used to predict or forecast future events and to explain how one or more factors affect some outcome of interest

Tools of Econometrics

Frequency distributions, probability and probability distributions, statistical inference, simple and multiple regression analysis, simultaneous equations models, and time series methods

Hausman Test

Examines the differences in the estimated parameters, and the result is used to determine whether the RE and FE estimates are significantly different

Spurious Correlation Problem

Exists when a regression model contains dependent and independent variables that are trending; may appear to show that X has a strong effect on Y when this may not be the case: it is the trend of the data causing the observed results

Explained and Residual Variation

Explained variation is the difference between the regression line and the mean value. Residual/unexplained variation is the difference between the observed value and the regression line.

Type II Error

Failing to reject a null hypothesis that is in fact false

Reducing the value of alpha (the level of significance) increases the chance of failing to reject the null hypothesis and the chance of committing a Type II error

Statistical Inference

Focuses on the process of making generalizations for a population from sample information

Assumption of Normality in Econometrics

For any given X value, the error term follows a normal distribution with a zero mean and constant variance; for large sample sizes, normality is not a major issue because the OLS estimators are approximately normal even if the errors are not normal

If you assume that the error term is normally distributed, that translates to a normal distribution of the OLS estimators

Composite Error

Found by estimating a model for panel data using OLS so that you're essentially ignoring the panel nature of the data

The composite error term includes individual fixed effects (unobservable factors associated with the individual subjects) and idiosyncratic error (which represents the truly random element associated with a particular subject at a point in time)

Null Hypothesis (Ho)

General or default position- no relationship between two measured phenomena; an assumption or prior belief about a population parameter to be tested; the hypothesis test attempts to overturn it

Linear Combination of Normally Distributed Random Variables

If a random variable is a linear combination of another normally distributed random variable(s), it also has a normal distribution

Unbiased Estimator

An estimator is unbiased if, in repeated estimations using the same calculation method, the mean value of the estimator coincides with the true parameter value

Using Multiple Dummy Variables

If you have J groups, you need J-1 dummy variables with 1s and 0s to capture all the qualitative information; the group that does not have a dummy variable is identified when all other dummy variables are 0 (known as the reference or base group)

Dynamic Model

If your dependent variable doesn't fully react to a change in the independent variable(s) during the period in which the change occurs, then your model is dynamic and will estimate both a contemporaneous relationship at time t and lagged relationship at time t-1

Static Model

If your dependent variable reacts instantaneously to changes in the independent variable(s), then the model is static and will estimate a contemporaneous relationship at time t

Specification Issues

In regression analysis and related fields such as economics, this is the process of turning a theory into a regression model. It consists of choosing an appropriate functional form for the model and choosing which variables to include. This is one of the first basic steps in regression analysis. If an estimated model is misspecified, it will be inconsistent and biased.

Core Variables

Independent variables of primary interest

Conditional Mean Operator

Indicates that the relationship is expected to hold, on average, for given values of independent variables

Censored Dependent Variables

Information is lost because some of the actual values for the dependent variable are limited to a minimum and/or maximum threshold value

Examples:
(1) Number of hours worked in a week
(2) Income earned
(3) Sale of tickets to an event
(4) Exam scores

Truncated Dependent Variables

Information is lost because some of the values for the variables are missing, meaning that they aren't observed if they are above or below some threshold; common when the selection of a sample is non-random (i.e. people below the poverty line)

First-Differencing

Involves subtracting the previous value from the current period's value; requires that the variables have variation over time

Disadvantages
(1) Losing observations
(2) Losing variation in the independent variables
(3) Changing the specification (possibly resulting in misspecification bias)
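A small pandas sketch of first-differencing a toy panel (the data are made up); note the lost first observation per unit, disadvantage (1) above:

```python
import pandas as pd

# A tiny panel: two units observed over three years (hypothetical data).
df = pd.DataFrame({
    "unit": ["A", "A", "A", "B", "B", "B"],
    "year": [2001, 2002, 2003, 2001, 2002, 2003],
    "y":    [10.0, 12.0, 15.0, 20.0, 19.0, 21.0],
    "x":    [1.0, 2.0, 4.0, 5.0, 5.5, 6.0],
})

# Subtract the previous period's value within each cross-sectional unit;
# the first observation per unit is lost (NaN) and dropped.
df[["dy", "dx"]] = df.groupby("unit")[["y", "x"]].diff()
fd = df.dropna()
print(fd)  # regressing dy on dx removes anything constant over time per unit
```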

Dummy Variable Regression

Involves the inclusion of dummy variables in the model for each cross-sectional unit, making it a straightforward extension to the basic use of dummy variables

Point Estimation

Involves the use of sample data to calculate a single value (statistic) which is to serve as a best estimate of an unknown population parameter; a single estimate of your parameter of interest

Expected Value

Mean of a random variable, provides a measure of central tendency or one measurement of where the data tends to cluster; the sum of all possible values weighted by their respective probabilities (with continuous variables, an integral replaces the sum)

Descriptive Statistics

Measurements that can be used to summarize your sample data and subsequently make predictions about your population of interest; quantitatively describe the main features of a collection of data

Covariance

Measures how two variables are related: 0 if the variables are independent or have no clear relationship, positive if there is a direct relationship, negative if there is an inverse relationship; covariance indicates only the direction of the relationship between two variables, not its strength

Variance Inflation Factor (VIF)

Measures the linear association between an independent variable and all the other independent variables; VIFs greater than 10 signal a highly likely multicollinearity problem, and VIFs between 5 and 10 signal a somewhat likely multicollinearity issue
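A short statsmodels illustration on simulated data with one nearly collinear pair (names and data are made up):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(6)
x1 = rng.normal(size=200)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)                      # unrelated

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on the others.
X = sm.add_constant(np.column_stack([x1, x2, x3]))
for j, name in zip(range(1, X.shape[1]), ["x1", "x2", "x3"]):
    print(name, variance_inflation_factor(X, j))
# Expect large VIFs (> 10) for x1 and x2, and a VIF near 1 for x3.
```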

R-Squared

Measures the proportion of variation in the dependent variable explained by the independent variables; a ratio between 0 and 1; equals the explained sum of squares over the total sum of squares; a high R-Squared means the line has a good fit because we seek to minimize the residual sum of squares (the closer to 1, the better); can only remain the same or increase as more explanatory variables are added

Correlation

Measures the strength of the relationship between two variables; can only identify linear relationships (other techniques available for non-linear relationships)

Heterogeneity Bias

Occurs if you ignore characteristics that are unique to your cross-sectional units and they're correlated with any of your independent variables

t- Distribution

Probability distribution that arises when estimating the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown

If we take a sample of "n" observations from a normal distribution with fixed unknown mean and variance, and if we compute the sample mean and sample variance of these "n" observations, then the t-distribution can be defined as the distribution of the location of the true mean relative to the sample mean, divided by the sample standard deviation after multiplying by the normalizing term SQRT("n")

Used to estimate how likely it is that the true mean lies in any given range

Confidence Interval Approach to Statistical Significance

Provides a range of possible values for the estimator in repeated sampling, and the range of values would contain the true value (parameter) a certain percentage of the time; intervals commonly used are 90%, 95%, and 99%; if a hypothesized value is not contained in your calculated confidence interval, then your coefficient is statistically significant

Test of Significance Approach to Statistical Significance

Provides a test statistic that's used to determine the likelihood of the hypothesis; a t-test is generally performed- if the t-statistic is in the critical region, then the coefficient is statistically significant

Bivariate or Joint Probability Density

Provides the relative frequencies or chances that events with more than one random variable will occur; the probability that two events will occur simultaneously

Type I Error

Rejecting a null hypothesis that is in fact true

Increasing the value of alpha (the level of significance) increases the chance of rejecting the null hypothesis and the chance of committing a Type I error

Coefficient of Determination

R-squared; indicates how well data points fit a line or curve; the measure of fit most commonly used with OLS regression

Stochastic

Random; involving a random variable; involving chance or probability

Alternative Hypothesis

Reflects that there will be an observed effect for our experiment; tested against the null hypothesis

Detrending Time-Series Data

Removing trending patterns from data in order to derive the explanatory power of the independent variables; helps to solve the spurious correlation problem

High Multicollinearity

Results from a linear relationship between your independent variables with a high degree of correlation, but they aren't completely deterministic

Random Error

Results from:
(1) Insufficient or incorrectly measured data
(2) A lack of theoretical insights to fully account for all the factors that affect the dependent variable
(3) Applying the incorrect functional form
(4) Unobservable characteristics
(5) Unpredictable elements of behavior

Robustness

Robustness refers to the sensitivity (or rather, the lack thereof) of the estimated coefficients when you make changes to your model's specification; misspecification is less problematic when the results are robust

Dealing with Seasonal Adjustments

Seasonality can be correlated with both your dependent and independent variables; it is necessary to explicitly control for the season in which measurements occur: use dummy variables for the seasons
Data that has been stripped of its seasonal patterns is referred to as seasonally adjusted or deseasonalized data

Probability Density Functions (PDF)

Shows the probabilities of a random variable for all its possible values; a function that describes the relative likelihood for this random variable to take on a given value

Beta Coefficients

Standardized regression coefficients; not to be confused with Beta in finance; an unfortunate name, as the Greek letter Beta is also used for regular OLS coefficients

Estimates the standard deviation change in the dependent variable for a 1-standard-deviation change in the independent variable, holding other variables constant

Gauss-Markov Theorem

States that the OLS estimators are the Best Linear Unbiased Estimators (BLUE) given the assumptions of the CLRM

Natural (Quasi) Experiment

Subjects are assigned to groups based on conditions beyond the control of the researcher, such as public policy

True Experiment

Subjects are randomly assigned to two (or more) groups; one group from the population of interest is randomly assigned to the control group and the remainder is assigned to the treatment group(s)

First Difference Transformation

Subtract the previous value of a variable from the current value of that variable for a particular cross-sectional unit and repeat the process for all variables in the analysis

Regression Analysis

Techniques that allow for estimation of economic relationships using data; used for estimating the relationships among variables

Slope Coefficients

Tell the estimated direction of the impacts that the independent variables have on the dependent variables, and also show by how much the dependent variable changes (value or magnitude) when one of the independent variables increases or decreases

F-Statistic

Tests overall (joint) significance; in order to see how changes to your model affect explained variation, you want to compare the different components of variance- can be done by using the F-statistic and generating an F-Distribution

Least Squares Principle

The Sample Regression Function should be constructed (with the constant and slope values) so that the sum of the squared distance between the observed values of your dependent variable and the values estimated from your SRF is minimized.

Z Score

The Z value or Z score is obtained by dividing the difference of a measurement and the mean by the standard deviation; this translates the variable into an easily measurable form and the probability will now be obtainable using a table
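A tiny worked example with made-up numbers, using scipy's standard normal CDF in place of the table:

```python
from scipy.stats import norm

x, mu, sigma = 80.0, 70.0, 8.0
z = (x - mu) / sigma    # Z = (measurement - mean) / standard deviation
p = norm.cdf(z)         # P(X <= 80), read off the standard normal
print(z, p)             # z = 1.25, p ≈ 0.894
```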

Econometrics

The application of statistical and mathematical theories to economics for the purpose of testing hypotheses and forecasting future trends; takes economic models and tests them through statistical trials; the branch of economics concerned with the use of mathematical methods in describing economic systems

Standardized Regression Coefficients

The calculation of standardized regression coefficients allows for the comparison of coefficient magnitudes in a multiple regression

Elasticity in a Log Model

The coefficients of a linear model that was derived from a non-linear model using logarithms represent the elasticity of the dependent variable with respect to the independent variable; the coefficient is the estimated percent change in Y for a percent change in X

Partial Slope Coefficients

The coefficients of the independent variables in a multiple regression; provide an estimate of the change in the dependent variable for a 1-unit change in the explanatory variable, holding the values of all other variables in the regression model constant

Coefficients in a Log-Linear Model

The coefficients represent the estimated percent change in your dependent variable for a unit change in your independent variable; the regression coefficients in a log-linear model don't represent slope

Constant

The expected value of the dependent variable (Y) when all independent variables (X) are equal to 0

Residuals

The difference between the observed value and the estimated function value; distance from data points to the regression line

Non-normality of the error term in LPM

The error term of an LPM has a binomial distribution, which implies that the t-tests and F-tests are invalid; because, for any given X, the error term can only take the two values running from the regression line to either 1 or 0, it cannot have a normal distribution

p-value

The level of marginal significance within a statistical hypothesis test, representing the probability of the occurrence of a given event; the smaller the p-value, the more strongly the test rejects the null hypothesis

The lowest level of significance at which you could reject the null hypothesis given your calculated statistic

Efficient Estimators

The lower the variance of an estimator, the more efficient it is; sometimes a balance must be struck between inefficient vs efficient and biased vs unbiased estimators (it may be better to accept a biased estimator if it is more efficient than an unbiased one)

Fixed Effect Estimator

The most common method of dealing with fixed effects of cross-sectional units; applied by time-demeaning the data: calculate the average value of a variable over time for each cross-sectional unit and subtract this mean from all observed values of that unit, repeating the procedure for all units

This deals with unobservable factors because it takes out any component constant over time

Robust (White-Corrected) Standard Errors

The most popular remedy for heteroskedasticity; uses the OLS coefficient estimates but adjusts the OLS standard errors for hetero without transforming the model being estimated; makes no assumptions about the functional form of the heteroskedasticity

Inflexion Point

The point at which an increasing effect becomes decreasing or a decreasing effect becomes increasing; observed in a cubic function

Marginal (Unconditional) Probability

The probability of the occurrence of a single event; the probability of one variable taking a specific value irrespective of the values of the others

Interaction Term and Interacted Econometric Model

The product of two independent variables; an interacted econometric model includes a term that is the product of the dummy and quantitative variable for any given observation

This model is useful if the qualitative characteristic only contains two groups

The inclusion of the interaction term allows the regression function to have a different intercept and slope for each group identified by the dummy variable
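A compact statsmodels sketch with a simulated dummy, quantitative variable, and their product (the data and the true coefficients are made up), corresponding to the "different intercepts and slopes" case:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 300
x = rng.uniform(0, 10, size=n)        # quantitative variable
d = rng.integers(0, 2, size=n)        # dummy: group membership
# True model: intercept shifts by 1.0 and slope by 0.5 when d = 1.
y = 2.0 + 1.0 * x + 1.0 * d + 0.5 * d * x + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x, d, d * x]))  # include the interaction d*x
fit = sm.OLS(y, X).fit()
print(fit.params)  # intercept, slope, dummy shift, interaction (slope shift)
# Significant dummy and interaction coefficients imply two regression lines
# with different intercepts and different slopes.
```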

Cumulative Density Function (CDF)

The sum or accrual of probabilities up to some value; gives the area under the probability density function (PDF) from -infinity to X.

Pairwise Correlation Coefficients

The value of the sample correlation for every pair of independent variables; as a general rule of thumb, correlation coefficients around 0.8 or above may signal a multicollinearity problem

Note that just because the correlation coefficient isn't near 0.8 or above doesn't mean that you are clear of multicollinearity problems

Heteroskedasticity in LPM

The variance of the LPM error term isn't constant; the variance of an LPM error term depends on the value of the independent variables

Adjusted R-Squared

This is an attempt to take account of the phenomenon of the R-Squared automatically increasing when extra explanatory variables are added to the model; it includes a "degrees of freedom penalty" that adjusts the value for the number of explanatory variables used; may increase, decrease, or remain the same as more explanatory variables are added

Weighted Least Squares (WLS) Technique

Transforms the heteroskedastic model into a homoskedastic model by using info about the nature of the heteroskedasticity; divides both sides of the model by the component of heteroskedasticity that gives the error term a constant variance

Corrected coefficients should be near the OLS coefficients, or the problem may have been something other than heteroskedasticity

Perfect Multicollinearity

Two or more independent variables in a regression model exhibit a deterministic linear relationship (meaning it is perfectly predictable and contains no randomness)

In a model with perfect multicollinearity, the regression coefficients are indeterminate and their standard errors are infinite

Uses of Chi-squared Distribution

Used for comparing estimated variance values from a sample to those values based on theoretical assumptions; used to develop confidence intervals and hypothesis tests for population variance

Probit and Logit Models

Used instead of OLS for situations involving a qualitative dependent variable; the conditional probabilities are nonlinearly related to the independent variable(s); both models asymptotically approach 0 and 1, so the predicted probabilities are always sensible, unlike OLS for qualitative dependent variables, which can produce probabilities extending beyond 0 and 1

Probit is based on the standard normal CDF while logit is based on the logistic CDF

Inverse Functions

Used when the outcome (dependent variable Y) is likely to approach some value asymptotically (as the independent variable approaches 0 or infinity); observable in economic phenomena where the variables are related inversely (inflation and unemployment, price and quantity demanded)

Log-Log Model

Useful model when the relationship is nonlinear in parameter because the log transformation generates the desired linearity in parameters; may be used to transform a model that's nonlinear in parameters to one that is linear

Overspecification

Using or including numerous irrelevant variables in the model

Data Mining

Using statistics to find models that fit data well; an approach viewed unfavorably in economics

Linear Probability Model (LPM)

Using the OLS technique to estimate a model with a dummy dependent variable creates this model

Examples of Hypothesis Tests Situations

Value of one mean: Z
Value of one mean with unknown pop variance: t
Value of variance: Chi-squared
Comparing two means: t
Comparing two variances: F

Continuous Variable

Variables that can take on any value in a certain range; infinite and uncountable

Discrete Variable

Variables that only take on a finite number of values (thus, all qualitative variables and some quantitative variables); can be described by integers, and the outcomes are countable

Multicollinearity

When two or more predictor variables in a multiple regression are highly correlated, meaning that one can be linearly predicted from the others with a non-trivial degree of accuracy; in this case, the CLRM and OLS should not be used

Misspecification

When you fail to include a relevant independent variable or you use an incorrect functional form; along with restricted dependent variables (qualitative or percent scale data) may lead to the failing of a CLRM assumption

