Quant 2 Final

Sum of Residuals in OLS

zero
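
A minimal sketch of why this holds, assuming a simple one-variable regression that includes an intercept. The data values and the helper name `fit_simple_ols` are made up for illustration.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a one-variable regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical sample data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
residual_sum = sum(residuals)  # zero up to floating-point rounding
print(abs(residual_sum) < 1e-9)
```

The result depends on the intercept: a regression forced through the origin does not generally have residuals that sum to zero.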

Level of Significance

The probability of making a Type I error.

95% T Test

5% chance of Type I error: 5% chance of rejecting a true null

A completely randomized design
A) has only one factor with several treatment groups.
B) can have more than one factor, each with several treatment groups.
C) has one factor and one block.
D) has one factor and one block and multiple values.

Null Hypothesis

A numeric contention or requirement about a population parameter; the researcher seeks to determine whether the statistical evidence is consistent with it.

Discrete Random Variable

A quantitative variable whose set of possible values is countable.

Test Statistic

A quantity from a sample used to decide whether or not to reject the null hypothesis.

Sample Space

All the possible outcomes of a random experiment, usually denoted by the letter S.

Two-Tail Hypothesis Test

A hypothesis test of the null hypothesis that the value of a parameter, µ, is equal to a null value, µ0, designed to have power against the alternative hypothesis that either µ < µ0 or µ > µ0 (the alternative hypothesis contains values on both sides of the null value).

In a simple linear regression problem, r and b1
A) may have opposite signs.
B) must have the same sign.
C) must have opposite signs.
D) are equal.

OLS is BLUE

Best Linear Unbiased Estimator, meaning that, under the classical regression assumptions, no other linear unbiased estimator has a lower variance than the least-squares estimator.

Estimated Regression Coefficients

Beta hats: empirical best guesses of the true regression coefficients, obtained from a sample.

The F test statistic in a one-way ANOVA is
A) MSW/MSB.
B) SSW/SSB.
C) MSB/MSW.
D) SSB/SSW.

Which of the following components in an ANOVA table are not additive?
A) sum of squares
B) degrees of freedom
C) mean squares
D) It is not possible to tell.

If a categorical independent variable contains 2 categories, then ________ dummy variable(s) will be needed to uniquely represent these categories.
A) 1
B) 2
C) 3
D) 4

Confidence Interval

A range of values constructed so that, over repeated samples, it contains the true value of a population parameter a specified percentage of the time.

Interaction in an experimental design can be tested in
A) a completely randomized model.
B) a two-factor model.
C) a Tukey-Kramer procedure.
D) all ANOVA models.

A regression diagnostic tool used to study the possible effects of collinearity is
A) the slope.
B) the Y-intercept.
C) the VIF.
D) the standard error of the estimate.
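
For reference, a small sketch of how a variance inflationary factor is usually computed: VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination from regressing X_j on the other explanatory variables. The function name and the R_j^2 inputs below are hypothetical.

```python
def variance_inflation_factor(r_squared_j):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing X_j on the other X's."""
    if not 0.0 <= r_squared_j < 1.0:
        raise ValueError("R_j^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared_j)

# Hypothetical inputs
vif_uncorrelated = variance_inflation_factor(0.0)  # X_j unrelated to the other X's
vif_collinear = variance_inflation_factor(0.9)     # X_j strongly related to the other X's
print(vif_uncorrelated, round(vif_collinear, 6))
```

A VIF of 1 means no inflation; larger values mean the variance of that coefficient estimate is inflated by its linear relationship to the other explanatory variables.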

Specification of the Model

Choose independent variables, functional form, and stochastic error term

T or F: The Durbin-Watson D statistic is used to check the assumption of normality.

Alternative Hypothesis

A statement in opposition to the null hypothesis; the researcher seeks to determine whether the statistical evidence is sufficient to call the null hypothesis into question. It is the "research question" that the statistical evidence seeks to confirm.

T or F: When the F test is used for ANOVA, the rejection region is always in the right tail.

T or F: You have just run a regression in which the value of coefficient of multiple determination is 0.57. To determine if this indicates that the independent variables explain a significant portion of the variation in the dependent variable, you would perform an F-test.

p-Value

The smallest level of significance at which the null hypothesis will be rejected, assuming the null hypothesis is true.

Standard Error of Beta Coefficient

A measure of the sampling variation (standard deviation) of the estimate of a slope coefficient. Dividing the beta coefficient estimate by this value yields a t-ratio for testing the null hypothesis that the true beta equals zero.
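
A short sketch of the t-ratio arithmetic, using the coefficient (-1.5) and standard error (0.3) quoted in one of the true/false items in this set. The function name `t_ratio` is an illustrative choice.

```python
def t_ratio(beta_hat, std_error, null_value=0.0):
    """t-ratio for testing H0: beta = null_value."""
    return (beta_hat - null_value) / std_error

t_value = t_ratio(-1.5, 0.3)
print(round(t_value, 6))
```

The observed t-value is then compared with a critical t value for the chosen level of significance and degrees of freedom.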

A dummy variable is used as an independent variable in a regression model when
A) the variable involved is numerical.
B) the variable involved is categorical.
C) a curvilinear relationship is suspected.
D) 2 independent variables interact.

An independent variable Xj is considered highly correlated with the other independent variables if
A) VIFj < 5.
B) VIFj > 5.
C) VIFj < VIFi for i ≠ j.
D) VIFj > VIFi for i ≠ j.

An interaction term in a multiple regression model may be used when
A) neither one of 2 independent variables contribute significantly to the regression model.
B) the relationship between X1 and Y changes for differing values of X2.
C) there is a curvilinear relationship between the dependent and independent variables.
D) the coefficient of determination is small.

In a two-way ANOVA, the degrees of freedom for the "error" term are
A) (r - 1)(c - 1).
B) rc(n' - 1).
C) (r - 1).
D) rcn' + 1.

The standard error of the estimate is a measure of
A) the variation of the X variable.
B) the variation of the dependent variable around the sample regression line.
C) explained variation.
D) total variation of the Y variable.

Which of the following is true for a two-factor ANOVA model:
A) SSW = SST - (SSA + SSB + SSI)
B) SST = SSW + SSA + SSB + SSI
C) SSI = SST - SSW - (SSA - SSB)
D) A and B above are true

Why would you use the Tukey-Kramer procedure?
A) to test for homogeneity of variance
B) to test independence of errors
C) to test for normality
D) to test for differences in pairwise means

T or F: Adjusted R2 is calculated by taking the ratio of the regression sum of squares over the total sum of squares (SSR/SST) and subtracting that value from 1.
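
For comparison, a sketch of the usual adjusted r2 formula, 1 - (1 - r2)(n - 1)/(n - k - 1), which is the same as 1 - MSE/MST. The sample sizes and r2 values below are hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted r2 for n observations and k independent variables."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical cases
adj_weak = adjusted_r_squared(0.05, n=10, k=3)    # weak fit, small sample
adj_strong = adjusted_r_squared(0.90, n=50, k=3)  # strong fit, larger sample
print(round(adj_weak, 4), round(adj_strong, 4))
```

In the weak-fit case the adjusted value falls below zero, and in both cases it is smaller than the unadjusted r2.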

T or F: Collinearity is present if the dependent variable is linearly related to one of the explanatory variables.

T or F: If you are comparing the average sales among 3 different brands you are dealing with a three-way ANOVA design.

Univariate

Having or having to do with a single variable. Some univariate techniques and statistics include the histogram, IQR, mean, median, percentiles, quantiles, and SD.

R Squared Meaning

Measures the percentage of the variation of Y around the mean of Y that is explained by the regression equation.
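
A quick sketch of the calculation, r2 = SSR/SST = 1 - SSE/SST, applied to the SST and SSE figures quoted in the true/false items of this set. The function name is an illustrative choice.

```python
def r_squared(sst, sse):
    """Share of the total variation (SST) explained by the regression."""
    return 1.0 - sse / sst

r2_first = r_squared(102.55, 82.04)
r2_second = r_squared(82.55, 29.85)
print(round(r2_first, 4), round(r2_second, 4))
```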

Simple Correlation Coefficient, r

Measures the strength and direction of a linear relationship between two variables

OLS or "Ordinary Least-Squares"

Regression technique which minimizes the sum of squared residuals

T or F: In a two-way ANOVA, it is easier to interpret main effects when the interaction component is not significant.

Degrees of Freedom

The number of values that are free to vary once certain information, such as the sample mean, is known.

Statistics

The science that deals with the collection, tabulation, and systematic classification of quantitative data, especially as a basis for inference and induction.

Observed Level of Significance

The smallest level of significance at which the null hypothesis will be rejected, assuming the null hypothesis is true. It is also known as the p-value.

Standard Error of the Mean

The standard deviation of sample means.

Standard Error of the Proportion

The standard deviation of the sample proportions.

Population Standard Deviation

The standard deviation of the values of a variable for a population. This is a parameter, not a statistic.

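Correlation of the resulting OLS residuals with any of the explanatory variables and with the fitted values of the dependent variable.

Zero; that is, the regression yields estimated errors (residuals) that are uncorrelated with every explanatory variable used in the regression model and with the fitted values. The residuals are generally still correlated with the observed values of the dependent variable.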

High levels of intercorrelation among independent variables in a multiple regression model
A) can result in a problem called multicollinearity
B) lead to shrinkage of the standard errors of the coefficient estimates
C) increase the likelihood of a type I error
D) All the above are true

In a one-way ANOVA, the null hypothesis is always
A) there is no treatment effect.
B) there is some treatment effect.
C) all the population means are different.
D) some of the population means are different.

In a one-way ANOVA
A) there is no interaction term.
B) an interaction effect can be tested.
C) an interaction term is present.
D) the interaction term has (c - 1)(n - 1) degrees of freedom.

In a two-way ANOVA the degrees of freedom for the interaction term is
A) (r - 1)(c - 1).
B) rc(n - 1).
C) (r - 1).
D) rcn + 1.

Testing for the existence of correlation in simple two variable regression is equivalent to
A) testing for the existence of the slope (β1).
B) testing for the existence of the Y-intercept (β0).
C) the confidence interval estimate for predicting Y.
D) None of the above.

The F test statistic in a one-way ANOVA is
A) MSB/MSW.
B) SSB/SSW.
C) MSW/MSB.
D) SSW/SSB.

The degrees of freedom for the F test in a one-way ANOVA for the numerator and denominator, respectively, are
A) (c - 1) and (n - c).
B) (n - c) and (c - 1).
C) (c - n) and (n - 1).
D) (n - 1) and (c - n).
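
A worked sketch of the one-way ANOVA arithmetic, assuming c groups and n total observations: the between-group mean square (MSB) has c - 1 degrees of freedom, the within-group mean square (MSW) has n - c, and F = MSB/MSW. The three groups of data are invented for illustration.

```python
def one_way_anova(groups):
    """Return F, numerator df, denominator df, SSB and SSW for a list of groups."""
    c = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = c - 1, n - c
    msb, msw = ssb / df_between, ssw / df_within
    return msb / msw, df_between, df_within, ssb, ssw

# Hypothetical data: 3 treatment groups, 3 observations each
f_stat, df1, df2, ssb, ssw = one_way_anova(
    [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
)
print(round(f_stat, 4), df1, df2)
```

The sums of squares and the degrees of freedom each add up to their totals, while the mean squares do not.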

Population

A collection of units being studied. Units can be people, places, objects, epochs, drugs, procedures, or many other things. Much of statistics is concerned with estimating numerical properties (parameters) of an entire population from a random sample of units from the population.

Statistic

A computational measure that describes a characteristic about a sample. Statistics are used to estimate parameters, and to test hypotheses.

Deviation

A deviation is the difference between a datum and some reference value, typically the mean of the data. In computing the SD, one finds the rms of the deviations from the mean, the differences between the individual data and the mean of the data.

Variance

A measure of dispersion: the average squared deviation of the data points in the set from the mean of the data set.

Durbin-Watson Statistic

A measure of the extent of serial correlation in the residuals. A value of 2 indicates no evidence of serial correlation.
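
A minimal sketch of the statistic, D = sum of squared successive residual differences divided by the sum of squared residuals. The two residual series are invented to show the extremes.

```python
def durbin_watson(residuals):
    """Durbin-Watson D for residuals listed in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals: long runs of the same sign (positive autocorrelation)
dw_positive = durbin_watson([1.0] * 5 + [-1.0] * 5)
# Hypothetical residuals: sign flips every period (negative autocorrelation)
dw_negative = durbin_watson([1.0, -1.0] * 5)
print(dw_positive, dw_negative)
```

Values well below 2 point to positive serial correlation and values well above 2 (toward 4) point to negative serial correlation.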

Random Sample

A random sample is a sample whose members are chosen at random from a given population in such a way that the chance of being sampled can be computed.

Confidence Interval

A range of values used to estimate a population parameter and associated with a specific confidence level.

What do we need to predict the magnitude of change?

Sample data and least-squares regression results

Point Estimate

A single value that best describes the population of interest, the sample mean being the most common.

Regression Coefficient Elasticity Measure

A standardized measure of the importance of a variable, computed by multiplying the estimated regression coefficient times the ratio Xbar/Ybar. Thus, this measure is evaluated at the means of X and Y.
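
A small sketch of the calculation, elasticity = b × (Xbar / Ybar). The coefficient and data values are hypothetical.

```python
def elasticity_at_means(b, x_values, y_values):
    """Regression coefficient scaled by the ratio of the sample means."""
    x_bar = sum(x_values) / len(x_values)
    y_bar = sum(y_values) / len(y_values)
    return b * x_bar / y_bar

# Hypothetical slope estimate and data (Xbar = 2, Ybar = 4)
elasticity = elasticity_at_means(2.0, [1.0, 2.0, 3.0], [3.0, 4.0, 5.0])
print(elasticity)
```

Here a 1 percent change in X, measured at the means, is associated with roughly a 1 percent change in Y.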

Sample

A subset of a population, usually drawn randomly or by stratified sampling so that it represents the population's characteristics.

Stochastic Error Term

A term added to a regression equation to introduce all the variation in Y that cannot be explained by the included X's.

Central Limit Theorem

A theorem stating that, as the sample size n gets larger, the distribution of sample means tends toward a normal probability distribution.

Standard Units

A variable (a set of data) is said to be in standard units if its mean is zero and its standard deviation is one. Converting data to standard units is said to "standardize" the data.
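
A minimal sketch of standardizing a data set: subtract the mean and divide by the standard deviation. It assumes the population form of the SD (`statistics.pstdev`); the data values are invented.

```python
from statistics import fmean, pstdev

def to_standard_units(data):
    """Convert each value to (value - mean) / SD."""
    mean = fmean(data)
    sd = pstdev(data)
    return [(x - mean) / sd for x in data]

# Hypothetical data with mean 5 and SD 2
z = to_standard_units([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(z)
```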

Continuous Random Variable

A variable that can assume any numerical value within an interval as a result of measuring the outcome of an experiment.

Pooled Estimate of the Standard Deviation

The square root of a weighted average of two sample variances, with each variance weighted by its degrees of freedom.
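
A short sketch of the usual pooled formula, assuming each sample variance is weighted by its degrees of freedom (n - 1). The sample sizes and variances are hypothetical.

```python
import math

def pooled_variance(n1, var1, n2, var2):
    """Weighted average of two sample variances, weights n1 - 1 and n2 - 1."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# Hypothetical samples: n1 = 10 with variance 4, n2 = 20 with variance 9
sp2 = pooled_variance(10, 4.0, 20, 9.0)
sp = math.sqrt(sp2)  # pooled estimate of the standard deviation
print(round(sp2, 4), round(sp, 4))
```

The pooled variance always lies between the two sample variances and sits closer to the one from the larger sample.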

Estimator

An estimator is a rule for calculating the value of a population parameter based on a random sample from the population. An estimator is a random variable, because its value depends on which particular sample is obtained, which is random. A canonical example of an estimator is the sample mean, which is an estimator of the population mean.

Event

An event is a subset of outcome space. An event determined by a random variable is an event of the form A=(X is in A). When the random variable X is observed, that determines whether or not A occurs: if the value of X happens to be in A, A occurs; if not, A does not occur.

One-Tail Hypothesis Test

A hypothesis test of the null hypothesis that the value of a parameter, µ, is equal to a null value, µ0, designed to have power against either the alternative hypothesis that µ < µ0 or the alternative µ > µ0 (but not both).

Assuming a linear relationship between X and Y, if the coefficient of correlation (r) equals -0.30,
A) there is no correlation.
B) the slope (b1) is negative.
C) variable X is larger than variable Y.
D) the variance of X is negative.

If the Durbin-Watson statistic has a value close to 0, which assumption is violated?
A) homoscedasticity
B) independence of errors
C) normality of the errors
D) None of the above.

If the Durbin-Watson statistic has a value close to 0, which assumption is violated?
A) normality of the errors
B) independence of errors
C) homoscedasticity
D) None of the above.

If the plot of the residuals is fan shaped, which assumption is violated?
A) normality
B) homoscedasticity
C) independence of errors
D) No assumptions are violated, the graph should resemble a fan.

If the residuals in a regression analysis of time ordered data are not correlated, the value of the Durbin-Watson D statistic should be near ________.
A) 1.0
B) 2.0
C) 3.0
D) 4.0

In a multiple regression model, the adjusted r2
A) cannot be negative.
B) can sometimes be negative.
C) can sometimes be greater than +1.
D) has to fall between 0 and +1.

In a multiple regression model, the value of the coefficient of multiple determination
A) has to fall between -1 and +1.
B) has to fall between 0 and +1.
C) has to fall between -1 and 0.
D) can fall between any pair of real numbers.

In a multiple regression model, which of the following is incorrect regarding the value of the adjusted r2?
A) It can be negative.
B) It has to be positive.
C) It has to be larger than the coefficient of multiple determination.
D) It will be greater than the unadjusted R-squared.

In a multiple regression problem involving two independent variables, if b1 is computed to be +2.0, it means that
A) the relationship between X1 and Y is significant.
B) the estimated mean of Y increases by 2 units for each increase of 1 unit of X1, holding X2 constant.
C) the estimated mean of Y increases by 2 units for each increase of 1 unit of X1, without regard to X2.
D) the estimated mean of Y is 2 when X1 equals zero.

In a one-way ANOVA, if the computed F statistic exceeds the critical F value we may
A) reject H0 since there is evidence all the means differ.
B) reject H0 since there is evidence of a treatment effect.
C) not reject H0 since there is no evidence of a difference.
D) not reject H0 because a mistake has been made.

In a two-way ANOVA the degrees of freedom for the interaction term is
A) (r - 1).
B) (r - 1)(c - 1).
C) rc(n - 1).
D) rcn + 1.

In performing a regression analysis involving two numerical variables, we are assuming
A) the variances of X and Y are equal.
B) the variation around the line of regression is the same for each X value.
C) that X and Y are independent.
D) All of the above.

Logarithm transformations can be used in regression analysis
A) to test for possible violations to the autocorrelation assumption.
B) to change a nonlinear model into a linear model.
C) to reduce the impact of multicollinearity.
D) to overcome violations to the autocorrelation assumption.

Testing for the existence of statistically significant correlation between two variables is equivalent to
A) testing for the existence of the Y-intercept (β0).
B) testing that the slope (β1) differs from zero in a two variable regression.
C) the confidence interval estimate for predicting Y.
D) None of the above.

The Variance Inflationary Factor (VIF) measures
A) the correlation of the X variables with the Y variable.
B) the extent that the standard error of a given variable is inflated because of linear relationships to other explanatory variables.
C) the contribution of each X variable with the Y variable after all other X variables are included in the model.
D) the standard deviation of the slope.

The coefficient of multiple determination r2Y|X1,X2
A) measures the variation around the predicted regression equation.
B) measures the proportion of variation in Y that is explained by X1 and X2.
C) measures the proportion of variation in Y that is explained by X1 holding X2 constant.
D) will have the same sign as b1.

The degrees of freedom for the F test in a one-way ANOVA are
A) (n - c) and (c - 1).
B) (c - 1) and (n - c).
C) (c - n) and (n - 1).
D) (n - 1) and (c - n).

The residuals represent
A) the difference between the actual Y values and the mean of Y.
B) the difference between the actual Y values and the predicted Y values.
C) the square root of the slope.
D) the predicted value of Y for the average X value.

The standard error of the estimate is a measure of
A) total variation of the Y variable.
B) the variation around the sample regression line.
C) explained variation.
D) the variation of the X variable.

The strength of the linear relationship between two numerical variables may be measured by the
A) scatter diagram.
B) coefficient of correlation.
C) slope.
D) Y-intercept.

The strength of the linear relationship between two numerical variables may be measured by the
A) slope.
B) coefficient of correlation.
C) scatter diagram.
D) Y-intercept.

What do we mean when we say that a simple linear regression model is "statistically" useful?
A) The model is "practically" useful for predicting Y.
B) The model is a better predictor of Y than the sample mean of Y.
C) The model is an excellent predictor of Y.
D) All the statistics computed from the sample make sense.

Which of the following is used to find a "best" model?
A) odds ratio
B) Mallows' Cp
C) standard error of the estimate
D) SST

Why would you use the Levene procedure?
A) to test for homogeneity of variance
B) to test independence of errors
C) to test for normality
D) to test for differences in pairwise means

If a categorical independent variable contains 4 categories, then ________ dummy variable(s) will be needed to uniquely represent these categories.
A) 1
B) 2
C) 3
D) 4

If a categorical independent variable contains 5 distinct types, such as a Student Home Residence factor (OKC region, Tulsa Region, Other OK, Texas, Other), then when the model contains a constant term, ________ dummy variable(s) will be needed to uniquely represent these categories.
A) 2
B) 3
C) 4
D) 5

If one categorical independent variable contains 4 types and a second categorical independent variable contains two types, then when the model contains a constant term, ________ dummy variable(s) will be needed to uniquely represent these categories.
A) 2
B) 3
C) 4
D) 5
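
A minimal sketch of dummy coding when the model has a constant term: one category serves as the baseline and every other category gets its own 0/1 indicator. The helper `dummy_code` and the choice of the first category as baseline are assumptions for illustration; the category labels come from the residence example above.

```python
def dummy_code(values, categories):
    """One row of 0/1 indicators per value; the first category is the baseline."""
    kept = categories[1:]  # the baseline gets no dummy of its own
    return [[1 if v == c else 0 for c in kept] for v in values]

regions = ["OKC region", "Tulsa Region", "Other OK", "Texas", "Other"]
rows = dummy_code(["OKC region", "Texas", "Other"], regions)
n_dummies = len(rows[0])
print(n_dummies, rows)
```

The baseline category is identified by all of its indicators being zero, so a factor with k categories needs k - 1 dummy variables.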

If the correlation coefficient (r) = 1.00, then
A) all the data points must fall exactly on a straight line with a slope that equals 1.00.
B) all the data points must fall exactly on a straight line with a negative slope.
C) all the data points must fall exactly on a straight line with a positive slope.
D) all the data points must fall exactly on a horizontal straight line with a zero slope.

If the correlation coefficient (r) = 1.00, then
A) the Y-intercept (b0) must equal 0.
B) the explained variation equals the unexplained variation.
C) there is no unexplained variation.
D) there is no explained variation.

In a one-way ANOVA
A) an interaction term is present.
B) an interaction effect can be tested.
C) there is no interaction term.
D) the interaction term has (c - 1)(n - 1) degrees of freedom.

Interaction between main effects in an experimental design can be tested in
A) all ANOVA models.
B) a Tukey-Kramer procedure.
C) a two-factor model.
D) a completely randomized model.

The coefficient of determination (r2) tells us
A) that the coefficient of correlation (r) is larger than 1.
B) whether r has any significance.
C) the proportion of total variation (SST) that is explained (SSR).
D) that we should not partition the total variation.

The logarithm transformation can be used
A) to overcome violations to the autocorrelation assumption.
B) to test for possible violations to the autocorrelation assumption.
C) to change a nonlinear model into a linear model.
D) to change a linear independent variable into a nonlinear independent variable.

The logarithm transformation can be used
A) to overcome violations to the autocorrelation assumption.
B) to test for possible violations to the autocorrelation assumption.
C) to overcome violations to the homoscedasticity assumption.
D) to test for possible violations to the homoscedasticity assumption.

Which of the following assumptions concerning the probability distribution of the random error term is stated incorrectly?
A) The distribution is normal.
B) The mean of the distribution is 0.
C) The variance of the distribution increases as X increases.
D) The errors are independent.

An interaction term in a multiple regression model may be used when
A) the coefficient of determination is small.
B) there is a curvilinear relationship between the dependent and independent variables.
C) neither one of 2 independent variables contribute significantly to the regression model.
D) the relationship between X1 and Y changes for differing values of X2.

Clearly among all of the statistical techniques that we have studied, the most powerful and useful, because it is capable of incorporating other statistical models, is
A) Two-factor analysis of variance.
B) Simple linear regression analysis.
C) Pairwise analysis of differences in population means.
D) Multiple regression analysis

If a group of independent variables are not significant individually but are significant as a group at a specified level of significance, this is most likely due to
A) autocorrelation.
B) the presence of dummy variables.
C) the absence of dummy variables.
D) collinearity.

If the F statistic yields a p-value for a multiple regression model that is statistically significant, which of the following is true?
A) The regression coefficients for all the independent variables are statistically significantly different from one another (β1 ≠ β2 ≠ ... ≠ βk)
B) The regression coefficients for all the independent variables are statistically significantly different from zero (β1 ≠ 0 & β2 ≠ 0 & ... & βk ≠ 0)
C) The sum of the regression coefficients is statistically significantly different from zero (β1 + β2 + ... + βk ≠ 0)
D) The regression coefficient for at least one independent variable is statistically significantly different from zero (β1 ≠ 0; or β2 ≠ 0; or ... or βk ≠ 0)

In a multiple regression problem involving two independent variables, if b1 is computed to be +2.0, it means that
A) the estimated mean of Y increases by 2 units for each increase of 1 unit of X1, without regard to X2.
B) the estimated mean of Y is 2 when X1 equals zero.
C) the relationship between X1 and Y is significant.
D) the estimated mean of Y increases by 2 units for each increase of 1 unit of X1, holding X2 constant.

In a one-way ANOVA, if the computed F statistic exceeds the critical F value we may
A) not reject H0 since there is no evidence of a difference.
B) not reject H0 because a mistake has been made.
C) reject H0 since there is evidence all the means differ.
D) reject H0 since there is evidence that some of the means differ, and thus evidence of a treatment effect.

The Y-intercept (b0) in the expression represents the
A) variation around the sample regression line.
B) change in estimated average Y per unit change in X.
C) predicted value of Y.

The coefficient of determination (r2) tells us
A) that the coefficient of correlation (r) is larger than 1.
B) whether r has any significance.
C) that we should not partition the total variation.
D) the proportion of total variation that is explained.

The p-value measures
A) the probability that the null hypothesis should be rejected.
B) the highest level of confidence we can have and still accept the null hypothesis.
C) the lowest level of confidence we can have in rejecting the null hypothesis.
D) the lowest possible level of significance for which the null hypothesis would still be rejected.

What do we mean when we say that a simple linear regression model is "statistically" useful?
A) All the statistics computed from the sample make sense.
B) The model is an excellent predictor of Y.
C) The model is "practically" useful for predicting Y.
D) The model is a better predictor of Y than the sample mean of Y.

Which of the following will NOT change a nonlinear model into a linear model?
A) quadratic regression model
B) logarithmic transformation
C) square-root transformation
D) variance inflationary factor

Which of the following will generally lead to a model exhibiting a "better fit?"
A) The range of X values is extended.
B) A previously excluded important variable is added to the model.
C) Sample size is increased.
D) All of the above tend to improve the "goodness of fit".

Why would you use the Tukey-Kramer procedure?
A) to test for normality
B) to test for homogeneity of variance
C) to test independence of errors
D) to test for differences in pairwise means

Measure of Central Tendency

Describes the center point of our data set with a single value.

Sampling Distribution for the Difference in Means

Describes the probability of observing various intervals for the difference between two sample means.

If the correlation coefficient (r) = 1.00 for two variables, then
A) All points lie on a straight line.
B) there is no explained variation.
C) the explained variation equals the unexplained variation.
D) there is no unexplained variation.
E) A and D above.

In a multiple regression model, which of the following is correct regarding the value of the adjusted r2?
A) It is equal to 1 - MSE/MST
B) It is always smaller than the coefficient of multiple determination.
C) It can be negative.
D) Unlike r2, it can decline when a new variable is added to the regression model
E) All of the above.

Multiple Regression Analysis
A) had its early origins in extensions of analysis of variance.
B) uses least-squared errors techniques to estimate regression coefficients.
C) rests on a foundation of linearity, and independent/identically normally distributed error term assumptions.
D) can be used to estimate non-linear models through use of logarithmic transformations.
E) All the above are true of multiple regression analysis.

Signs that multicollinearity among explanatory variables may be a problem are indicated by
A) Closely related variables having opposite signs in the regression results.
B) High correlations between explanatory variables that are much higher than their relationship to the dependent variable.
C) Standard errors of the coefficients that fall appreciably when one of the highly correlated variables is "dropped" from the model.
D) Significant F tests for the overall regression, but failure to achieve significance for individual correlated variables.
E) All the above are indicators of multicollinearity problems.

The width of the prediction interval for the predicted value of Y is dependent on
A) the standard error of the estimate.
B) the sample size.
C) the value of X for which the prediction is being made.
D) the level of confidence chosen.
E) All of the above.

T or F: A completely randomized ANOVA design with 4 groups would have 12 possible pairwise mean comparisons.

T or F: Data that exhibit an autocorrelation effect violate the regression assumption of homoscedasticity, or constant variance.

T or F: The coefficient of determination is computed as the ratio of SSE to SST.

T or F: A multiple regression is called "multiple" because it has several data points.

T or F: A regression had the following results: SST = 102.55, SSE = 82.04. It can be said that 90.0% of the variation in the dependent variable is explained by the independent variables in the regression.

T or F: A regression had the following results: SST = 82.55, SSE = 29.85. It can be said that 73.4% of the variation in the dependent variable is explained by the independent variables in the regression.

T or F: The total sum of squares (SST) in a regression model will never exceed the regression sum of squares (SSR).

T or F: When a dummy variable is included in a multiple regression model, the interpretation of the estimated slope coefficient does not make any sense anymore.

T or F: When an additional explanatory variable is introduced into a multiple regression model, the Adjusted r2 can never decrease.

Factorial

For an integer k that is greater than or equal to 1, k! (pronounced "k factorial") is k×(k-1)×(k-2)× . . . ×1. By convention, 0! = 1. There are k! ways of ordering k distinct objects. For example, 9! is the number of batting orders of 9 baseball players, and 52! is the number of different ways a standard deck of playing cards can be ordered.
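
A brief sketch of the definition as a loop, checked against Python's built-in `math.factorial`. The function name `factorial` is an illustrative choice.

```python
import math

def factorial(k):
    """k! = k x (k-1) x ... x 1, with 0! = 1 by convention."""
    result = 1
    for i in range(2, k + 1):
        result *= i
    return result

batting_orders = factorial(9)  # orderings of 9 baseball players
print(batting_orders, batting_orders == math.factorial(9))
```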

Empirical Rule

If a distribution follows a bell-shaped, symmetrical curve centered around the mean, we would expect approximately 68, 95, and 99.7 percent of the values to fall within one, two, and three standard deviations around the mean respectively.
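
A quick check of these percentages, assuming an exactly normal distribution and using the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of values within k standard deviations of the mean, for k = 1, 2, 3
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)
}
for k, share in coverage.items():
    print(k, round(100 * share, 1))
```

Real data that are only roughly bell-shaped will match these figures only approximately.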

Type I Error

Rejecting a true null hypothesis. Also known as the alpha error, as determined by the level of significance

Law of Large Numbers

States that, in repeated independent trials, the probability that the absolute difference between a sample statistic and the true population parameter is less than some arbitrary constant approaches unity as the sample size grows.

Econometrics

Statistical measurement of economic phenomena to determine independent influence of explanatory variables on a specified dependent variable

Regression Analysis

Statistical technique to "explain" movements in one variable as a function of movements in another

T or F: A completely randomized design with 4 groups would have 6 possible pairwise comparisons.
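
The number of pairwise comparisons among c groups is "c choose 2", or c(c - 1)/2. A small sketch that counts and lists them for 4 groups; the group labels are placeholders.

```python
from itertools import combinations
from math import comb

groups = ["A", "B", "C", "D"]  # placeholder labels for 4 treatment groups

pairs = list(combinations(groups, 2))  # every unordered pair of groups
n_pairs = comb(len(groups), 2)         # 4 choose 2
print(n_pairs, pairs)
```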

T or F: A high value of F significantly above the critical value of F in multiple regression accompanied by insignificant t-values on all parameter estimates very often indicates a high correlation between independent variables in the model.

T or F: A multiple regression is called "multiple" because it has several explanatory variables.

T or F: A regression had the following results: SST = 102.55, SSE = 82.04. It can be said that 20.0% of the variation in the dependent variable is explained by the independent variables in the regression.

T or F: A regression had the following results: SST = 82.55, SSE = 29.85. It can be said that 63.84% of the variation in the dependent variable is explained by the independent variables in the regression.

T or F: Collinearity is present when there is a high degree of correlation between independent variables.

T or F: Consider a regression in which b2 = -1.5 and the standard error of this coefficient equals 0.3. To determine whether X2 is a significant explanatory variable, you would compute an observed t-value of -5.0.

T or F: Data that exhibit an autocorrelation effect violate the regression assumption of independence.

T or F: From the coefficient of multiple determination, we cannot detect the strength of the relationship between Y and any individual independent variable.

T or F: If the residuals in a regression analysis of time ordered data are not correlated, the value of the Durbin-Watson D statistic should be near 2.0

T or F: If we have taken into account all relevant explanatory factors, the residuals from a multiple regression should be random.

T or F: In a two-factor ANOVA analysis, the sum of squares due to both factors, the interaction sum of squares and the within sum of squares must add up to the total sum of squares.

T or F: In calculating the standard error of the estimate, there are n - k - 1 degrees of freedom, where n is the sample size and k represents the number of independent variables in the model.

T or F: One of the consequences of collinearity in multiple regression is inflated standard errors in some or all of the estimated slope coefficients.

T or F: Regression analysis is used for prediction, while correlation analysis is used to measure the strength of the association between two numerical variables.

T or F: The F test in a completely randomized model is just an expansion of the t test for independent samples.

T or F: The MSE must always be positive.

T or F: The Regression Sum of Squares (SSR) can never be greater than the Total Sum of Squares (SST).

T or F: The coefficient of determination represents the ratio of SSR to SST.

T or F: The coefficient of multiple determination measures the fraction of the total variation in the dependent variable that is explained by the set of independent variables.

T or F: The coefficient of multiple determination r2Y|X1,X2 measures the proportion of variation in Y that is explained by X1 and X2.

T or F: The goals of model building are to find a good model with the fewest independent variables that is easier to interpret and has lower probability of collinearity.

T or F: The interpretation of the slope is different in a multiple linear regression model as compared to a simple linear regression model.

T or F: The standard error of the estimate, the measure of "scatter" of Y values about the regression line is the square-root of MSE.

T or F: When an additional explanatory variable is introduced into a multiple regression model, the coefficient of multiple determination will never decrease.

T or F: When an explanatory variable is dropped from a multiple regression model, the adjusted r2 can increase.

Power of the Test

The chance of correctly rejecting the null hypothesis when a given alternative hypothesis is true is called the power of the test against that alternative.

Probability Density Function

The chance that a continuous random variable is in any range of values can be calculated as the area under a curve over that range of values.

Critical value

The critical value in a hypothesis test is the value of the test statistic beyond which we would reject the null hypothesis. The critical value is set so that, if the null hypothesis is true, the probability that the test statistic falls beyond it is at most the significance level.

Standard Error of the Difference between Two Means

The standard error describes the variation in the difference between two sample means from sample to sample.

Population Proportion

The true share of the population having a specified characteristic. For example, the proportion of eligible voters preferring a particular candidate for office. The population proportion is a parameter.

Inferential Statistics

Used to make claims or conclusions about a population based on a sample of data from that population.

Adjusted R Squared Formula

1 - MSE/MST, where MSE is the Mean Squared Error (Variance of the error term) and MST is estimated variance of the dependent variable
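The formula above can be sketched as follows, assuming MSE = SSE/(n - k - 1) and MST = SST/(n - 1), with n observations and k independent variables. The function name and the sample figures are illustrative only.

```python
def adjusted_r_squared(sst: float, sse: float, n: int, k: int) -> float:
    """Adjusted R squared = 1 - MSE/MST."""
    mse = sse / (n - k - 1)  # variance of the error term
    mst = sst / (n - 1)      # estimated variance of the dependent variable
    return 1.0 - mse / mst


# Hypothetical regression: SST = 100, SSE = 20, 21 observations, 2 regressors.
print(round(adjusted_r_squared(100.0, 20.0, 21, 2), 4))
```

In this example the unadjusted R squared is 0.80, while the adjusted value is lower because of the degrees of freedom used by the two regressors.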

Z Transformation

A linear transformation of a variable, such that from each observation the mean is subtracted and the resulting value is divided by the standard deviation. The transformed variable has a mean of zero and a variance of unity.
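A minimal sketch of the transformation, using made-up data and the population standard deviation (dividing by n). The name `z_transform` is illustrative.

```python
from statistics import fmean, pstdev


def z_transform(values: list[float]) -> list[float]:
    """Subtract the mean from each observation and divide by the standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


z = z_transform([2, 4, 4, 4, 5, 5, 7, 9])
print(z)
print(fmean(z), pstdev(z))  # mean of zero and standard deviation of one, up to rounding
```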

Discrete Probability Distribution

A listing of all the possible outcomes of an experiment for a discrete random variable along with the relative frequency or probability.

Hypothesis Testing

A statistical test regarding specified values about a population parameter.

Type I Error

A Type I error occurs when the null hypothesis is rejected erroneously when it is in fact true.

Type II Error

A Type II error occurs if the null hypothesis is not rejected when it is in fact false.

Fundamental Counting Principle

A concept that states if one event can occur in m ways and a second event can occur in n ways, the total number of ways both events can occur together is m * n ways.
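The principle can be illustrated by listing every pairing explicitly. The shirts and pants below are a made-up example, not part of the study set.

```python
from itertools import product

shirts = ["red", "blue", "green"]  # m = 3 ways for the first event
pants = ["jeans", "khakis"]        # n = 2 ways for the second event

# Every (shirt, pants) pairing, enumerated one by one.
outfits = list(product(shirts, pants))
print(len(outfits))  # m * n = 6
```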

Random Variable

A random variable is an assignment of numbers to possible outcomes of a random experiment.

Type II Error

Accepting a false null hypothesis. Also known as a beta error, and can be calculated only in reference to specific values of the alternative hypothesis

Two Sided Test

The alternative hypothesis covers values on both sides of the null value (for example, when the null value is 0)

One Sided Test

Alternative hypothesis is only given for one side of the null, a more "powerful" test than a two sided test, meaning lower probability of Type II error

Residual Sum of Squares, a.k.a SSE for Sum of Squared Errors

Amount of squared deviation that is unexplained by the regression line: the sum of the squared differences between the actual value of Y and the value of Y predicted by the estimated regression equation, that is, Sum((Y - Yhat)^2)

Explained Sum of Squares, a.k.a SSR for Sum of Squares Regression

Sum of the squared deviations of the predicted value of Y, as determined by the estimated regression equation, from the mean of Y, that is, Sum((Yhat - Ybar)^2)
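The two sums of squares above, together with the total sum of squares, can be computed for a small made-up data set. This sketch fits a simple (one-variable) OLS line by hand; the function names and data are illustrative assumptions.

```python
from statistics import fmean


def simple_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sums_of_squares(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Return (SSE, SSR, SST) for the fitted simple regression."""
    b0, b1 = simple_ols(x, y)
    y_bar = fmean(y)
    y_hat = [b0 + b1 * xi for xi in x]
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total
    return sse, ssr, sst


sse, ssr, sst = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(sse, ssr, sst)  # SSE + SSR equals SST, up to rounding
```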

Sampling Error

An error which occurs when the sample measurement is different from the population measurement.

T or F: Collinearity is present when there is a high degree of correlation between the dependent variable and any of the independent variables.

Multivariate Regression Coefficient

Change in the dependent variable associated with a one unit increase in the independent variable, holding all other independent variables constant.

Margin of Error

Determines the width of a confidence interval; calculated as z_c × σ_x̄, the critical z value times the standard error of the mean.
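A minimal sketch of the margin-of-error calculation, reading the card's formula as the critical z value times the standard error of the mean (σ divided by the square root of n). That reading, the function name, and the sample figures are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma: float, n: int, confidence: float = 0.95) -> float:
    """Critical z value times the standard error of the mean."""
    z_c = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return z_c * sigma / sqrt(n)


# Hypothetical case: population SD of 10, sample of 100, 95% confidence.
print(round(margin_of_error(10.0, 100), 3))
```

The confidence interval is then the sample mean plus or minus this margin.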

T or F: Multiple regression is the process of using several independent variables to predict a number of dependent variables.

T or F: One of the consequences of collinearity in multiple regression is biased estimates on the slope coefficients.

T or F: The analysis of variance (ANOVA) tests hypotheses about the population variance.

Serial Correlation a.k.a. Autocorrelation

Correlation of the error terms, typically first order correlation where the error for time period t is correlated with the error from the prior time period, t-1. Tends to reduce standard errors, meaning that we are more likely to say that a variable matters when it doesn't (Type I Error)

T or F: The value of r is always positive.

Cross-Sectional

Data set that includes entries from the same time period but different economic entities (countries, for instance)

Time Series Data

Data set that is ordered by time, typically generating a serial correlation (autocorrelation) violation of the randomness of the error term

Parameter

A value that describes a characteristic of a population, such as its mean or standard deviation. Usually referenced by a Greek letter.

Alternative Hypothesis

Denoted by H1, represents the opposite of the null hypothesis and holds true if the null hypothesis is found to be false.

Three purposes of econometrics

Describe reality, Test hypotheses, Predict the future

Critical T Value

Determined by degrees of freedom and the level of significance

Residual, e

Difference between dependent variable actual value and the estimated value of the dependent variable from the regression results

Sampling Distribution

Distribution of different values of B Hat across different samples

Critical Value

Divides the acceptance from rejection region

Correlation of the error terms in a multiple regression model A) can result in a problem called autocorrelation. B) leads to shrinkage of the standard errors of the coefficient estimates. C) increases the likelihood of a type I error, saying a variable matters when it really does not. D) is measured by the Durbin-Watson statistic. E) All the above are true

Gauss-Markov Theorem

Given the Classical Assumptions, the OLS estimator is the minimum variance estimator from all linear unbiased estimators

Omitted Variable

Important explanatory variable has been left out

Rejection Region

In a hypothesis test using a test statistic, the rejection region is the set of values of the test statistic for which we reject the null hypothesis.

Sampling error

In estimating from a random sample, the difference between the estimator and the parameter can be written as the sum of two components: bias and sampling error.

Null Hypothesis

In hypothesis testing, denoted by H0, this represents the status quo and involves stating the belief that the mean of the population is a specific value or is less than or greater than a specified value.

Multicollinearity

Intercorrelation of Explanatory variables that can lead to expansion of the standard errors of the coefficient estimates, thereby leading one to say that a variable does not matter when it actually does (Type II error)

Classical Assumptions

Linear, Zero Population Mean, Explanatory Variables Uncorrelated with Error, Error Term is Uncorrelated with Itself, Error has Constant Variance, No Perfect Multicollinearity

What do we need to predict direction of change for individual variables?

Knowledge of economic theory and general characteristics of how explanatory variables relate to the dependent variable under consideration

The ________ (larger/smaller) the value of the Variance Inflationary Factor, the higher is the collinearity of the X variables.

Larger
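The link between the VIF and collinearity can be sketched with the standard definition VIF_j = 1 / (1 - R_j squared), where R_j squared comes from regressing X_j on the other X variables. The formula is not stated in the study set, so treat it as an assumption here; `vif` is an illustrative name.

```python
def vif(r_squared_j: float) -> float:
    """Variance inflationary factor for X_j, given the R squared of X_j on the other Xs."""
    return 1.0 / (1.0 - r_squared_j)


# The VIF grows as X_j becomes more predictable from the other X variables.
for r2 in (0.0, 0.5, 0.9):
    print(r2, round(vif(r2), 2))
```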

Consequence of Omitting a Relevant Variable

Leads to biased estimates of other variables

Consequence of Including an Irrelevant Variable

Leads to higher variances of estimated coefficients

Lower Degrees of Freedom

Less reliable estimates

Level of Significance

Probability of committing a Type I error

Unbiased

Not biased; having zero bias. The sample mean and sample proportion are unbiased estimates of true population values.

Degrees of Freedom

Observations - Slope Coefficients - 1

Four sources of variation in Error Term

Omitted variables, Measurement Error, Different Functional Form, Purely Random Component

Dummy variable

Only takes on values 0 and 1

Econometrics method of interest

Ordinary Least Squares or OLS: Single-equation linear regression analysis

Interval Estimate

Provides a range of values that best describe the population.

Unbiased Estimator

Sampling distribution has its expected value equal to the true value of B.

Normalized Beta Coefficient

Slope-term multiplied by the ratio of the standard deviation of the independent variable to the standard deviation of the dependent variable. Transformed slope-term then reads as the standard deviation change in Y per one standard deviation change in X.
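The rescaling described above amounts to one multiplication. This is a minimal sketch with made-up numbers; `standardized_beta` is an illustrative name.

```python
def standardized_beta(slope: float, sd_x: float, sd_y: float) -> float:
    """Slope rescaled to standard-deviation units: slope * (SD of X / SD of Y)."""
    return slope * sd_x / sd_y


# Hypothetical case: slope of 2.0, SD of X = 3.0, SD of Y = 12.0.
print(standardized_beta(2.0, 3.0, 12.0))  # 0.5
```

Here a one standard deviation increase in X is associated with half a standard deviation increase in Y.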

Standard Error of the Estimate

Square-root of the Mean Squared Error (MSE), a measure of the standard deviation of error terms about the regression line.

Total Sum of Squares, a.k.a. SST for Sum of Squares Total

Squared variations of Y around its mean

Student's t curve

Student's t curve is a family of curves indexed by a parameter called the degrees of freedom, which can take the values 1, 2, . . .

T or F: In a one-factor ANOVA analysis, the between sum of squares and within sum of squares must add up to the total sum of squares.

T OR F Collinearity among explanatory variables in multiple regression analysis increases the standard errors of the beta coefficient estimates, thereby increasing the likelihood of committing a Type II error.

T OR F Even though multicollinearity and autocorrelation are violations of major assumptions of multiple regression analysis, our simulations reveal that the expected values of the coefficient estimates are still equal to the true population values.

T OR F If the null hypothesis is true in one-way ANOVA, then both MSB and MSW provide unbiased estimates of the population variance of the variable under investigation.

T OR F If there are four dummy variables for some phenomenon such as occupation group, and all four are included in the regression, then the constant term must be excluded.

T OR F Regression analysis is used for prediction and estimation of the impact and explanatory power of an independent variable on a dependent variable, while correlation analysis is used to measure the strength of the linear relationship between two numerical variables and implies no causal relationship.

T OR F The confidence interval for the mean of Y given X in regression analysis is always narrower than the prediction interval for an individual response Y given the same data set, X value, and confidence level.

T or F Referring to the above table, the null hypothesis that all means are equal.

Sampling distribution

The sampling distribution of an estimator is the probability distribution of the estimator when it is applied to random samples.

Adjusted R Squared

The R Squared that has been adjusted for the degrees of freedom lost, since adding an independent variable never decreases the unadjusted R Squared, even if the fit isn't meaningfully better.

L.I.N.E

The OLS model assumptions: Linear in parameters, the errors are Independent, Normally distributed, and there is Equality of variance of error terms about the regression line

Expectation, Expected Value.

The expected value of a random variable is the long-term limiting average of its values in independent repeated experiments.

Meaning of Regression Coefficient

The impact of a one-unit increase in X1 on the dependent variable Y, holding all other independent variables constant.

Population Mean

The mean of the numbers in a numerical population. For example, the population mean of a box of numbered tickets is the mean of the list comprised of all the numbers on all the tickets. The population mean is a parameter.

Normal curve

The normal curve is the familiar "bell curve." The mathematical expression for the normal curve is y = (2×pi)^(-1/2) × e^(-(x^2)/2), where pi is the ratio of the circumference of a circle to its diameter (3.14159265 . . . ), and e is the base of the natural logarithm (2.71828 . . . ). The normal curve is symmetric around the point x=0, and positive for every value of x. The area under the normal curve is unity, and the SD of the normal curve, suitably defined, is also unity.
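The properties listed above (symmetry around 0, positivity, unit area) can be checked numerically. This is a rough sketch using the trapezoid rule over the range -8 to 8, which is an assumption chosen because the curve is negligible outside it; the function names are illustrative.

```python
from math import exp, pi, sqrt


def normal_curve(x: float) -> float:
    """Height of the standard normal curve: (2*pi)^(-1/2) * e^(-(x^2)/2)."""
    return exp(-(x ** 2) / 2) / sqrt(2 * pi)


def area_under_curve(lo: float, hi: float, steps: int = 10_000) -> float:
    """Approximate the area under normal_curve between lo and hi (trapezoid rule)."""
    h = (hi - lo) / steps
    interior = sum(normal_curve(lo + i * h) for i in range(1, steps))
    return h * (0.5 * normal_curve(lo) + interior + 0.5 * normal_curve(hi))


print(normal_curve(1.5) == normal_curve(-1.5))  # symmetric around x = 0
print(round(area_under_curve(-8, 8), 6))        # total area is approximately 1
```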

Permutations

The number of different ways in which objects can be arranged in order.

Combinations

The number of different ways in which objects can be selected without regard to order.
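The difference between the two counts shows up in a small example: choosing 3 objects out of 5. The sketch uses `math.perm` and `math.comb` from the Python standard library (version 3.8 or later); the numbers are illustrative.

```python
from math import comb, factorial, perm

n, r = 5, 3

ordered = perm(n, r)    # arrangements where order matters: 5 * 4 * 3
unordered = comb(n, r)  # selections where order does not matter

print(ordered)    # 60
print(unordered)  # 10

# Each unordered selection of r objects can be ordered in r! ways.
assert ordered == unordered * factorial(r)
```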

Expected Frequencies

The number of observations that would be expected for each category of a frequency distribution, assuming the null hypothesis is true with chi-squared analysis.

Independent Sample

The observation from one sample is not related to any observations from another sample.

Z-score

The observed value of the Z statistic.

Sampling Distribution of the Mean

The pattern of the sample means that will occur as samples are drawn from the population at large.

Symmetric Distribution

The probability distribution of a random variable X is symmetric if there is a number a such that the chance that X>=a+b is the same as the chance that X<=a-b for every value of b.

Probability Distribution

The probability distribution of a random variable specifies the chance that the variable takes a value in any subset of the real numbers.

Confidence Level

The probability that the interval estimate will include the population parameter.

Experiment

The process of measuring or observing an activity for the purpose of collecting data.

Sample Standard Deviation, S.

The sample standard deviation S is an estimator of the standard deviation of a population based on a random sample from the population.

R Squared Formula

The ratio of SSR/SST, explained sums of squares divided by total sums of squares

Significance of Statistical Test

The significance level of a hypothesis test is the chance that the test erroneously rejects the null hypothesis when the null hypothesis is true.

Mean Square Error

Variance of the Regression, or SSE/(n-k-1) where SSE is "sum of squared errors," n is # of observations, k is # of independent variables.
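The formula above, together with the standard error of the estimate (the square root of MSE), can be sketched as follows. The function names and the figures are illustrative assumptions.

```python
from math import sqrt


def mean_square_error(sse: float, n: int, k: int) -> float:
    """MSE = SSE / (n - k - 1), with n observations and k independent variables."""
    return sse / (n - k - 1)


def standard_error_of_estimate(sse: float, n: int, k: int) -> float:
    """Square root of MSE: the typical scatter of Y about the regression line."""
    return sqrt(mean_square_error(sse, n, k))


# Hypothetical regression: SSE = 2.4, 5 observations, 1 independent variable.
print(round(mean_square_error(2.4, 5, 1), 4))
print(round(standard_error_of_estimate(2.4, 5, 1), 4))
```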

Correlation of the error terms with any of the explanatory variables in the model
