Quantitative Analysis Final Exam

The logarithm transformation can be used

to change a nonlinear model into a linear model.

The owner of a fish market has an assistant who has determined that the weights of catfish are normally distributed, with mean of 3.2 pounds and variance of 0.64 pound. If a random sample of 256 fish is taken, what would the standard error of the mean weight equal?

0.050
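
The arithmetic behind this answer can be checked with a short sketch. It assumes only the usual formula SE = σ/√n; the variable names are illustrative and not from the course materials.

```python
import math

variance = 0.64   # population variance from the question (pounds squared)
n = 256           # sample size from the question

sigma = math.sqrt(variance)             # population standard deviation, 0.8
standard_error = sigma / math.sqrt(n)   # 0.8 / 16

print(round(standard_error, 3))
```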

The test for the effect of blocking in a randomized block ANOVA with three treatment levels and 20 blocks:

An upper-tail F test with 19 and 38 degrees of freedom

ANOVA had its origins in pharmacological research.

False

You asked a group of students to indicate their home country. Answers were: 1 if U.S., 2 if Canada, 3 if Mexico, 4 if China, and 5 if India. If this variable was to be used in a regression model as an independent variable, it would be modeled as:

Four different dummy variables.

In a multiple regression model, which of the following is incorrect regarding the value of the adjusted R2 ?

It has to be positive.
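
A quick numeric check suggests why adjusted R2 need not be positive. The sketch assumes the standard formula adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1); the input values are hypothetical and chosen only for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical weak model: R2 = 0.05 with 10 observations and 3 predictors.
adj = adjusted_r2(r2=0.05, n=10, k=3)
print(round(adj, 3))  # negative, even though R2 itself is positive
```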

The ________ (larger/smaller) the value of the Variance Inflationary Factor, the higher is the collinearity of the X variables.

Larger

Consumer ratings of soft drinks are indicative of what type of data?

Ordinal

Tabulating respondents by educational attainment (High School, College, Graduate Degree) is an example?

Ordinal

The least squares method minimizes which of the following?

SSE

The least squares line is:

The line that minimizes the sum of squared deviations between each data point and the regression line.

A tennis player is trying to improve the consistency of her serves. Her goal is that she hits fewer than 10 percent of her serves out-of-bounds. In a weekend of practice, she hits 400 serves and 35 of them are out-of-bounds:

This is a left-tail test, because her goal is an out-of-bounds proportion of less than 10%.

In a sample of size 55, the sample mean is 20. In this case, the sum of all observations in the sample is ∑Xi= 1010.

False. The sum equals n times the sample mean: 55 × 20 = 1100, not 1010.

An independent variable Xj is considered highly correlated with the other independent variables if

VIFj > 5.

Which of these is an example of a categorical variable?

flavor of soft drink ordered by each customer at a fast food restaurant

A dummy variable is used as an independent variable in a regression model when

the variable involved is categorical.

In a one-way ANOVA, the null hypothesis is always

there is no treatment effect.

The degrees of freedom for the F test in a one-way ANOVA are

(c - 1) and (n - c).

In a two-way ANOVA the degrees of freedom for the interaction term is

(r - 1)(c - 1).

Of the graduate students at a certain university, 80% like M&M's (M), 50% like the coffee in their graduate lounge (C), and 30% like both (B). What is the probability that a graduate student likes M&M's, given that he/she likes coffee?

0.600
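
As a check on this answer, here is a minimal sketch of the conditional probability rule P(M|C) = P(M and C)/P(C). The variable names are illustrative.

```python
p_m = 0.80      # P(M): likes M&M's (not needed for P(M|C), shown for context)
p_c = 0.50      # P(C): likes the lounge coffee
p_both = 0.30   # P(M and C): likes both

p_m_given_c = p_both / p_c
print(round(p_m_given_c, 3))
```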

If a categorical independent variable contains 2 categories, then ________ dummy variable(s) will be needed to uniquely represent these categories.

1

Which of the following is indicative of an inferential statistic?

A Gallup Poll sample shows 51 percent of voters will vote for Romney

A study measured the time it took each of ninety students to complete a final exam. Researchers compared the mean test time of three groups of students based on the time students spent preparing for the exam (less than 10 hours, 10 to 20 hours, and more than 20 hours). This is an example of:

A one-way ANOVA

A test of whether or not it rains 50% of the time in Seattle would be considered:

A two-tailed test

When we make predictions that are outside of the range of our independent variables, we are:

Extrapolating

Correlation necessarily implies a causal dependency between the correlated variables.

False

If the "blocking" factor proves insignificant, it is best to return to the one-way ANOVA format to reduce the probability of a type I error.

False. Do the "one-way" analysis if the blocking factor is insignificant to improve the power of the test.

When two independent variables are highly correlated with each other:

Multicollinearity exists.

A sampling error is caused by:

The difference between the true population parameter and the estimated sample statistic.

What do we mean when we say that a simple linear regression model is "statistically" useful?

The model is a better predictor of Y than the sample mean

The coefficient of determination (R2) is:

The percent of variability in the dependent variable that is explained by the independent variable (or variables).

Which of the following is not true of a Box Plot display?

The proximity of the mean and median is revealed.

If the correlation coefficient (r) = 1.00, then

all the data points must fall exactly on a straight line with a positive slope.

The strength of the linear relationship between two numerical variables may be measured by the

coefficient of correlation.

In a multiple regression model, the value of the Coefficient of Multiple Determination, R2

has to fall between 0 and +1.

Which of the following components in an ANOVA table are not additive?

mean squares

The Y-intercept (b0) represents the

predicted value of Y when X = 0.

An interaction term in a multiple regression model may be used when

the relationship between X1 and Y changes for differing values of X2.

In a one-way ANOVA

there is no interaction term.

Given the following contingency table, what are P(A) and P(A|B), and are the events A and B independent?

P(A) = 0.300; P(A|B) = 0.300; yes, A and B are independent.

A two-factor ANOVA study was conducted with 3 levels of Factor A and 4 levels of Factor B. 120 observations were included in this data set. The number of replications in this two-factor ANOVA design is:

10

If we are selecting only one card from a standard deck of 52 playing cards, what is the probability of picking either a black card or an ace?

28/52 ≈ 0.5385
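
The answer follows from the general addition rule, P(A or B) = P(A) + P(B) - P(A and B). A small sketch using exact fractions (variable names are illustrative):

```python
from fractions import Fraction

deck = 52
p_black = Fraction(26, deck)      # 26 black cards
p_ace = Fraction(4, deck)         # 4 aces
p_black_ace = Fraction(2, deck)   # 2 black aces, counted in both groups above

p_black_or_ace = p_black + p_ace - p_black_ace
print(p_black_or_ace, round(float(p_black_or_ace), 4))
```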

If a categorical independent variable contains 4 categories, then ________ dummy variable(s) will be needed to uniquely represent these categories.

3

The head librarian at the Library of Congress has asked her assistant for an interval estimate of the mean number of books checked out each day. The assistant provides the following 95% confidence interval estimate: from 840 to 920 books per day. If the head librarian knows that the population standard deviation is 150 books checked out per day, approximately how large a sample did her assistant use to determine the interval estimate?

54
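
One way to recover the sample size is to solve the margin-of-error formula E = z·σ/√n for n. The sketch below assumes a z-based interval (σ known) and uses Python's `statistics.NormalDist` for the critical value; variable names are illustrative.

```python
from statistics import NormalDist

lower, upper = 840, 920   # reported 95% confidence interval
sigma = 150               # known population standard deviation
confidence = 0.95

margin = (upper - lower) / 2                        # half-width of the interval, 40
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96
n = (z * sigma / margin) ** 2                       # solve E = z * sigma / sqrt(n)

print(round(n))
```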

Which of the following is not a property of the standard normal distribution?

68.26% of observations lie within ±1 standard deviation of the mean; the mean, median, and mode are equal; the range is -infinity to +infinity; the variance is equal to the standard deviation

According to the empirical rule, if the data form a "bell-shaped" normal distribution, ________ percent of the observations will be contained within 2 standard deviations around the arithmetic mean.

95.44
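
The empirical-rule percentages can be reproduced from the standard normal CDF. This is a sketch using `statistics.NormalDist`; the exact two-standard-deviation figure is about 95.45%, and the 95.44 quoted above presumably reflects rounded table values.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

within_1 = z.cdf(1) - z.cdf(-1)
within_2 = z.cdf(2) - z.cdf(-2)
within_3 = z.cdf(3) - z.cdf(-3)

for k, p in ((1, within_1), (2, within_2), (3, within_3)):
    print(f"within {k} standard deviation(s): {p:.4f}")
```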

Causality means that:

A change in variable A leads to a change in variable B.

A machine allocates miniature cookies to 100 calorie packages. According to the manufacturer, the average weight of the bag should be 0.81oz with a standard deviation of 0.03oz. The mean weight of a random sample of ten bags is taken and the mean weight is 0.82oz. In consequence:

Adjustment is probably not necessary because deviations of about one standard error of the mean are not uncommon in statistical analysis.
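
To see where "about one standard error" comes from, the deviation of the sample mean can be expressed in standard-error units. A minimal sketch, with illustrative variable names:

```python
import math

mu = 0.81      # target mean weight (oz)
sigma = 0.03   # stated standard deviation (oz)
n = 10         # bags sampled
x_bar = 0.82   # observed sample mean (oz)

standard_error = sigma / math.sqrt(n)
z = (x_bar - mu) / standard_error   # deviation in standard-error units

print(round(standard_error, 4), round(z, 2))
```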

The denominator degrees of freedom in the ANOVA F test is:

Always the number of degrees of freedom of the residual.

The error terms (e):

Are assumed to be normally distributed around the regression line with a mean of zero.

Ratio data

Are distinguishable by having a natural zero point; have meaningful differentials between values; have a natural meaning to such concepts as a doubling in value; are exemplified by such variables as income and weight

A measure of relative variation, (1), is computed by (2) the (3) by the (4), and is useful in comparing two or more distributions.

Coefficient of Variation, dividing, standard deviation, mean

Sample sizes for each population being tested with ANOVA must be equal for the technique to be utilized.

False

T/F: A multiple regression is called "multiple" because it has several data points.

False

T/F: A regression had the following results: SST = 102.55, SSE = 82.04. It can be said that 80.0% of the variation in the dependent variable is explained by the independent variables in the regression.

False

T/F: A regression had the following results: SST = 82.55, SSE = 29.85. It can be said that 73.4% of the variation in the dependent variable is explained by the independent variables in the regression.

False
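
Both of the preceding SST/SSE cards can be checked with R2 = 1 - SSE/SST. A short sketch; the function name is illustrative:

```python
def r_squared_from_sse(sst, sse):
    """Share of total variation explained, given total and error sums of squares."""
    return 1 - sse / sst

r2_first = r_squared_from_sse(sst=102.55, sse=82.04)   # about 0.20, not 0.80
r2_second = r_squared_from_sse(sst=82.55, sse=29.85)   # about 0.638, not 0.734

print(round(r2_first, 3), round(r2_second, 3))
```

In the first card, 80.0% is the unexplained share (SSE/SST), which is why the statement is false.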

T/F: Adjusted R2 is calculated by taking the ratio of the regression sum of squares over the total sum of squares (SSR/SST) and subtracting that value from 1.

False

T/F: Collinearity is present if the dependent variable is linearly related to one of the explanatory variables.

False

T/F: Collinearity is present when there is a high degree of correlation between the dependent variable and any of the independent variables.

False

T/F: Consider a regression in which b2 = -1.5 and the standard error of this coefficient equals 0.3. To determine whether X2 is a significant explanatory variable, you would compute an observed t-value of -4.5.

False
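
The observed t-value for a slope coefficient is the estimate divided by its standard error (testing against a hypothesized value of zero). A minimal check:

```python
b2 = -1.5      # estimated coefficient from the question
se_b2 = 0.3    # its standard error

t_observed = b2 / se_b2
print(round(t_observed, 2))  # -5.0, so the stated -4.5 is wrong
```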

T/F: If you are comparing the average sales among 3 different brands you are dealing with a three-way ANOVA design.

False

T/F: Multiple regression is the process of using several independent variables to predict a number of dependent variables.

False

T/F: One of the consequences of collinearity in multiple regression is biased estimates on the slope coefficients.

False

T/F: The Durbin-Watson D statistic is used to check the assumption of normality.

False

T/F: The analysis of variance (ANOVA) tests hypotheses about the population variance.

False

T/F: The total sum of squares (SST) in a regression model will never exceed the regression sum of squares (SSR).

False

T/F: The value of r is always positive.

False

T/F: When an additional explanatory variable is introduced into a multiple regression model, the Adjusted R2 can never decrease.

False

The Y-intercept (b0) represents the change in estimated average Y per unit change in X.

False. This is estimated by the slope term.

If the plot of the residuals is fan shaped, the assumption of independence of errors is violated.

False. The homoscedasticity assumption is violated under such conditions.

The least squares method minimizes SSR, "sum of squares regression."

False. It minimizes the sum of squared errors (SSE).

In a simple linear regression model, r and b1 can possibly have opposite signs

False. They must have the same sign.

The residual represents the discrepancy between the observed independent variable and its predicted value.

False. It is the discrepancy between the observed dependent variable and its predicted value based on the regression intercept and slope term.

In performing a regression analysis involving two numerical variables, we are assuming that X and Y share a common variance.

False. We assume the variation around the line of regression is the same for each X value.

The standard error of the estimate, SYX, is a measure of total variation of the Y variable.

False. It measures the variation around the sample regression line.

The result of a regression analysis estimating personal consumption expenditures in the U.S. (y, measured in billions of dollars) using time (x, measured in months) over the past 500 months resulted in the following regression equation: y=10830.96+22.6571x. Based on these results, we can say that:

For the 12th month in the data set, the estimated expenditure (y) was 11,102.8452 billion dollars.
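
The estimate is obtained by substituting x = 12 into the fitted equation. A small sketch; the function name is illustrative:

```python
def predicted_expenditure(month):
    """Fitted value from y = 10830.96 + 22.6571x (billions of dollars)."""
    return 10830.96 + 22.6571 * month

y_hat_12 = predicted_expenditure(12)
print(round(y_hat_12, 4))
```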

An appliance manufacturer claims to have developed a compact microwave oven that consumes a mean of no more than 250 W. From previous studies, it is believed that power consumption for microwave ovens is normally distributed with a standard deviation of 15 W. A consumer group has decided to try to discover if the claim appears true. They take a sample of 20 microwave ovens and find that they consume a mean of 257.3 W. Referring the above discussion, the appropriate hypotheses to determine if the manufacturer's claim appears reasonable are:

H0 : μ ≤ 250 versus H1 : μ > 250

Your company is considering purchase of a radio station that claims to have a 30% market share. The purchase is not justified unless that market share exceeds 30%. You are instructed to conduct a survey to determine whether the purchase should be made. In such a case, the alternative hypothesis is:

H1: π > 0.30

A shipment of 100 components arrives, five of which are defective. To analyze the probability of observing one or more defective components in a sample of 10 units, the following distribution would be used:

Hypergeometric distribution
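
For illustration, the hypergeometric probability of at least one defective can be computed as one minus the probability of drawing no defectives. The sketch assumes sampling without replacement and uses `math.comb`; variable names are illustrative.

```python
from math import comb

population = 100   # components in the shipment
defective = 5      # defective components in the shipment
sample = 10        # components drawn without replacement

# P(0 defectives): all 10 draws come from the 95 good components.
p_none = comb(population - defective, sample) / comb(population, sample)
p_at_least_one = 1 - p_none

print(round(p_at_least_one, 4))
```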

Time and temperature, because they have no natural zero point but have measurable differentials, are indicative of what type of data?

Interval

Which of the following statements about the median is true?

It is less affected by extreme values than the arithmetic mean. It is a positional, not calculated, measure of central tendency. It is equal to Q2, the second quartile. It is equal to the mean and the mode in bell-shaped "normal" distributions.

The F test statistic in a one-way ANOVA is

MSB/MSW.

The correlation coefficient

Measures the extent of linear relationship between two variables. Depends only on Z scores and is thus unaffected by linear transformations of variables. Is sensitive to outliers.

If both events A and B cannot occur simultaneously, then these two events are said to be (1) . If one of the two events A and B must occur, then the set of events is (2) .

Mutually exclusive, Collectively exhaustive

Which of the following provides an example of a variable that follows an Exponential distribution?

None of the above.

The "N" in the L.I.N.E acronym for linear regression assumptions stands for

Normally distributed error terms

When using the general multiplication rule, P[A and B] is equal to

P(A|B)P(B).

On the average, 1.8 customers per minute arrive at any one of the checkout counters of a grocery store. What type of probability distribution can be used to find out the probability that there will be no customer arriving at a checkout counter?

Poisson distribution
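
A sketch of the Poisson calculation follows. The question does not state the time interval, so the one-minute window used here is an assumption; the function name is illustrative.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

rate_per_minute = 1.8
p_no_arrivals = poisson_pmf(0, rate_per_minute)   # assumes a one-minute window

print(round(p_no_arrivals, 4))
```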

Which of the following measures is likely regarded as "least useful" as a measure of dispersion?

Range

A distribution with the "bulk" of the values at relatively low levels and fewer and fewer observations at high levels is said to be (1) . In this distribution, the mean is (2) the median.

Right-skewed, Greater than

In a two-factor ANOVA, the interaction term was found to be significant. This means that:

The effect of Factor A on the dependent variable changes with the levels of Factor B.

Our experiments with full sample space sampling distributions from a parent population reveal which of the following to always be true:

The mean of the sample means is always equal to the parent population mean. The variance of the sample Sum is always equal to n times the population variance. The variance of the sample mean is always equal to the variance of the population divided by n. The mean of the sample Sum is always equal to n times the population mean.

Returning to the earlier question about the tennis player and her objective: the variance of the sample proportion is p(1-p)/n, where p is the hypothesized value, so p(1-p)/n = 0.1 × 0.9/400, and the sample proportion is 0.0875. These are the ingredients for computing a Z score. In consequence, we find:

The null is accepted at conventional levels of significance.
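
The Z score and left-tail p-value can be sketched as follows, using the hypothesized proportion in the standard error as the card describes. Variable names are illustrative, and the 0.05 cutoff is an assumed conventional significance level.

```python
import math
from statistics import NormalDist

p0 = 0.10          # hypothesized out-of-bounds proportion
n = 400            # serves hit
out_of_bounds = 35

p_hat = out_of_bounds / n                 # 0.0875
se = math.sqrt(p0 * (1 - p0) / n)         # 0.015
z = (p_hat - p0) / se                     # about -0.83
p_value = NormalDist().cdf(z)             # left-tail area

print(round(z, 3), round(p_value, 3))
print("reject H0" if p_value < 0.05 else "do not reject H0")
```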

A Type II error is:

Failing to reject (accepting) a false null hypothesis; the probability of this error is β.

Which of the following assumptions concerning the probability distribution of the random error term is stated incorrectly?

The variance of the distribution increases as X increases.

As a general rule, a data point is considered to be an outlier if it is more than (1) standard deviations away from the mean. In a bell-shaped normal distribution, approximately (2) of the data lie within this range.

Three, 99.7%

A "level" describes the number of categories of interest in ANOVA.

True

A completely randomized design with 4 groups would have 6 possible pairwise mean comparisons.

True
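
The count of pairwise comparisons is the number of ways to choose 2 groups out of c, that is c(c - 1)/2. A brief sketch; the group labels are placeholders:

```python
from itertools import combinations
from math import comb

groups = ["A", "B", "C", "D"]   # four hypothetical group labels

pairs = list(combinations(groups, 2))
print(len(pairs), comb(len(groups), 2))
print(pairs)
```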

ANOVA was a predecessor to Regression Analysis.

True

Autocorrelation of residuals is exhibited by patterns of plus and minus variation in residuals about the regression line. This indicates that the errors are not independent, an important assumption of regression analysis.

True

Data that exhibit an autocorrelation effect violate the regression assumption of independence of error terms.

True

Donnelly asserts that the populations under investigation through ANOVA must be normally distributed.

True

Equality of population variances is a required assumption in ANOVA modeling. The Levene procedure can be used to test the hypothesis of equality of variance.

True

Grouping samples using a "blocking" factor is a way to control the influence of other factors, such as soil acidity in agricultural research. Strangely this method is called "completely randomized block ANOVA."

True

In a two-factor ANOVA analysis, the sum of squares due to both factors, the interaction sum of squares and the within sum of squares must add up to the total sum of squares.

True

Increasing variation in the dependent variable, Y, as X increases violates an important assumption of regression analysis.

True

SSW or Sum of Squares "within" populations being tested is also referred to as the "error" sums of squares.

True

T/F Hey! I've seen this before. Apparently, for the two-variable model, the percentage "explained sum of squares," or R2, is the same value as the correlation coefficient squared.

True

T/F Hey! The product of Beta(Y|X) and Beta(X|Y) is the correlation coefficient for X,Y squared. This makes sense because correlation is not causality. |Correlation| can be seen as a geometric mean, the 2nd root of the product of two numbers.

True

T/F Sb1 is a measure of the variation in B1 in repeated trials, and the simulation values are close to the theoretical standard deviation of B1.

True

T/F Se is known as the standard error of the regression, and we see that this value is close to the expected population error term value of 16 used in this simulation.

True

T/F The average B1 Hat is about equal to the known true slope term, and thus B1 Hat appears to have an expected value equal to the true population value.

True

T/F: The residual, which represents the discrepancy between the observed dependent variable and its predicted or estimated value, is also known as the error term.

True

T/F: A high value of F significantly above the critical value of F in multiple regression accompanied by insignificant t-values on all parameter estimates very often indicates a multicollinearity problem.

True

T/F: A multiple regression is called "multiple" because it has several explanatory variables.

True

T/F: A one-way analysis design with 4 groups would have 6 possible pairwise sample mean comparisons.

True

T/F: A regression had the following results: SST = 102.55, SSR = 82.04. It can be said that 80.0% of the variation in the dependent variable is explained by the independent variables in the regression.

True

T/F: Collinearity is present when there is a high degree of correlation between independent variables.

True

T/F: Data that exhibit an autocorrelation effect violate the regression assumption of independence.

True

T/F: From the coefficient of multiple determination, we cannot detect the strength of the relationship between Y and any individual independent variable.

True

T/F: If the residuals in a regression analysis of time ordered data are not correlated, the value of the Durbin-Watson D statistic should be near 2.0

True

T/F: If we have taken into account all relevant explanatory factors, the residuals from a multiple regression should be random.

True

T/F: In a one-factor ANOVA analysis, the between sum of squares and within sum of squares must add up to the total sum of squares.

True

T/F: In calculating the standard error of the estimate, there are n - k - 1 degrees of freedom, where n is the sample size and k represents the number of independent variables in the model.

True

T/F: One of the consequences of collinearity in multiple regression is inflated standard errors in some or all of the estimated slope coefficients.

True

T/F: Regression analysis is used for prediction, while correlation analysis is used to measure the strength of the association between two numerical variables.

True

T/F: The F test in a completely randomized model is just an expansion of the t test for equality of means in independent samples.

True

T/F: The Regression Sum of Squares (SSR) can never be greater than the Total Sum of Squares (SST).

True

T/F: The coefficient of determination represents the ratio of SSR to SST.

True

T/F: The coefficient of multiple determination R2 measures the proportion of variation in Y that is explained by X1 and X2.

True

T/F: The coefficient of multiple determination measures the fraction of the total variation in the dependent variable that is explained by the set of independent variables.

True

T/F: The goals of model building are to find a good model with the fewest independent variables that is easier to interpret and has lower probability of collinearity.

True

T/F: The standard error of the estimate, the measure of "scatter" of Y values about the regression line is the square-root of MSE.

True

T/F: When an additional explanatory variable is introduced into a multiple regression model, the coefficient of multiple determination will never decrease.

True

T/F: When an explanatory variable is dropped from a multiple regression model, the adjusted r2 can increase.

True

T/F: When the F test is used for ANOVA, the rejection region is always in the right tail.

True

T/F: You have just run a regression in which the value of coefficient of multiple determination is 0.57. To determine if this indicates that the independent variables explain a significant portion of the variation in the dependent variable, you would perform an F-test.

True

The F test is a test for equality of variances and commonly has the larger of the two variance estimates in the numerator.

True

The MSW must always be positive.

True

The Regression Sum of Squares (SSR) can never be greater than the Total Sum of Squares (SST), but when SSR=SST, r2 = 1.0.

True

The Tukey-Kramer test to identify which pairwise set of means differ statistically is a more "powerful" test (smaller confidence intervals under the null) than the Scheffe test.

True

The coefficient of determination (r2) tells us the proportion of total variation that is explained.

True

The residuals or errors represent the difference between the actual Y values and the predicted Y values.

True

The slope (b1) is the first-derivative of the linear regression equation.

True

The slope term of the regression Y on X times the slope term of the regression X on Y is equal to the correlation coefficient of X and Y squared.

True
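
This identity can be checked numerically. The sketch below fits both simple regressions by least squares on a small made-up data set (the x and y values are hypothetical) and compares the product of the two slopes with r squared.

```python
import math

# Hypothetical data, used only to illustrate the identity.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

slope_y_on_x = sxy / sxx            # b1 from regressing Y on X
slope_x_on_y = sxy / syy            # b1 from regressing X on Y
r = sxy / math.sqrt(sxx * syy)      # correlation coefficient

print(round(slope_y_on_x * slope_x_on_y, 4), round(r ** 2, 4))
```

Because both slopes and r share the numerator sxy, they also share its sign, which is consistent with the cards stating that r and b1 must have the same sign.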

To test for "interaction" effects in the two-factor model, it is necessary to have multiple observations on each of the two-factor combinations.

True

Two degrees of freedom are lost in simple two-variable regression analysis in computation of error variance because we have to estimate both the intercept and slope-term values.

True

Under the assumption of equality of means, the null hypothesis, MSB, MSW, and MST all provide estimates of the common variance.

True

When we say that a simple linear regression model is "statistically" useful we mean that the model is a better predictor of Y than the sample mean.

True

Which of the following statements regarding probability distributions is true?

When n is large and p is small, Poisson probabilities are very close to the Binomial. The Normal distribution is the limit of the Binomial as n approaches infinity. The Hypergeometric approaches the Binomial as N, the population, gets large. The Exponential can be described as the time between Poisson events.

Suppose we want to test H0 : μ ≥ 30 versus H1 : μ < 30. Which of the following possible sample results based on a sample of size 36 gives the strongest evidence to reject H0 in favor of H1?

Xbar= 27, S = 4

Interaction in an experimental design can be tested in

a two-factor model.

Sampling distributions describe the distribution of

estimators of parameters.

An electronic component in the thousands of field weather stations has a high rate of failure in a year of use of 0.02 probability. Thus, the component is "backed up" by the same component, switching on automatically when the first component fails. Thus, assuming independence, the probability of failure of the entire system is:

four in ten thousand.
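
Under the independence assumption stated in the question, the system fails only if both components fail, so the probabilities multiply. A minimal check:

```python
p_component_failure = 0.02   # yearly failure probability of one component

# The system fails only when the primary and the backup both fail.
p_system_failure = p_component_failure ** 2

print(round(p_system_failure, 4))  # 0.0004, i.e. four in ten thousand
```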

A project is expected to have a net gain of $15 million if successful and -$5 million if unsuccessful. This project:

has a mean value of $5 million only if the probabilities are 50/50.

The t distribution

has more area in the tails than does the normal distribution. has degrees of freedom equal to n-1 for confidence intervals for the population mean. approaches the normal distribution as the sample size increases. is required when computing confidence intervals when the standard deviation must be estimated from sample data.

If the plot of the residuals is fan shaped, which assumption is violated?

homoscedasticity

Based on the residual plot to the right, you will conclude that there might be a violation of which of the following assumptions?

homoscedasticity (equal variance)

For sample size 64, the sampling distribution of the mean will be approximately normally distributed

if the shape of the population is symmetrical. if the population is normally distributed. regardless of the shape of the population's distribution.

If the Durbin-Watson statistic has a value close to 0, which assumption is violated?

independence of errors

The Z transformation, (X - μ)/σ:

is a linear transformation of the X variable; shifts the mean of Z to be zero; has a variance and standard deviation equal to 1.

The standard error of the mean

is never larger than the standard deviation of the population. measures the variability of the sample mean from sample to sample. decreases as the sample size increases.

Whenever p = 0.8 and n is small, the binomial distribution will be

left-skewed.

An example for a hypothesis tested using ANOVA is:

μ1 = μ2 = μ3 = μ4

Our experiments with the sampling distribution of a sample proportion showed that the

mean is equal to the population proportion, π. variance of the sample proportion is equal to π(1 - π)/n. it takes a larger n for the sampling distribution to approach normality, especially when the true population proportion is either very low or very high. the distribution of the difference in sample proportions from two independent populations also approaches normality with increasing sample size.

The coefficient of multiple determination R2Y|X1,X2

measures the proportion of variation in Y that is explained by X1 and X2.

If a group of independent variables are not significant individually but are significant as a group at a specified level of significance, this is most likely due to

multicollinearity.

In a simple linear regression problem, r and b1

must have the same sign.

Which of the following variables is likely to be negatively skewed?

number of stop lights hit in travel on a poorly timed city road system.

In a one-way ANOVA, if the computed F statistic exceeds the critical F value we may

reject H0 since there is evidence of a treatment effect.

The process of using sample statistics to draw conclusions about true population parameters is called

statistical inference.

Testing for the existence of correlation in simple two variable regression is equivalent to

testing for the existence of the slope (β1).

A regression diagnostic tool used to study the possible effects of collinearity is

the VIF.
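
For reference, the VIF for variable Xj is commonly computed as 1/(1 - Rj2), where Rj2 comes from regressing Xj on the other independent variables. The sketch below applies that formula to hypothetical Rj2 values and uses the VIF > 5 rule of thumb quoted earlier in this set.

```python
def vif(r2_j):
    """Variance inflationary factor from the R2 of Xj regressed on the other X's."""
    return 1 / (1 - r2_j)

# Hypothetical auxiliary R2 values, for illustration only.
for r2_j in (0.0, 0.5, 0.8, 0.9):
    value = vif(r2_j)
    flag = "highly correlated" if value > 5 else "acceptable"
    print(f"Rj2 = {r2_j:.1f} -> VIF = {value:.2f} ({flag})")
```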

The residuals represent

the difference between the actual Y values and the predicted Y values.

The slope (b1) represents

the estimated average change in Y per unit change in X.

In a multiple regression problem involving two independent variables, if b1 is computed to be +2.0, it means that

the estimated mean of Y increases by 2 units for each increase of 1 unit of X1, holding X2 constant.

The Variance Inflationary Factor (VIF) measures the

extent to which the standard error of a given variable's coefficient is inflated because of linear relationships with other explanatory variables

The larger the spread or dispersion of data points around the mean:

the larger the coefficient of variation.

The coefficient of determination (r2) tells us

the proportion of total variation that is explained.

Assuming a linear relationship between X and Y, if the coefficient of correlation (r) equals -0.30,

the slope (b1) is negative.

In performing a regression analysis involving two numerical variables, we are assuming

the variation around the line of regression is the same for each X value.

The standard error of the estimate is a measure of

the variation around the sample regression line.

If the correlation coefficient (r) = 1.00, then

there is no unexplained variation.

A college entrance exam has a mean of 25 points and standard deviation of 3 points. In a large sample of persons completing the exam, 90% scored between 16 and 34 points. Relative to the normal distribution,

this distribution would be considered comparatively flat or platykurtic.

The logarithm transformation can be used

to overcome violations to the homoscedasticity assumption.

Why would you use the Tukey-Kramer procedure?

to test for differences in pairwise means

Which of the following will NOT change a nonlinear model into a linear model?

variance inflationary factor

