Quant 2 Final Modules 5, 6, 7

In a regression analysis, if SSE = 200 and SSR = 300, then the coefficient of determination is:

.6.
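
The arithmetic behind this answer can be checked with a short sketch. It uses the values given in the question and relies on the identity SST = SSR + SSE; the variable names are illustrative only.

```python
# Coefficient of determination from the sums of squares in the question.
sse = 200.0  # sum of squares due to error
ssr = 300.0  # sum of squares due to regression
sst = ssr + sse  # total sum of squares: SST = SSR + SSE

r_squared = ssr / sst  # R^2 = SSR / SST
print(round(r_squared, 4))  # 0.6
```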

In a multiple regression model, the error term ε is assumed to have a mean of:

0

The tests of significance in regression analysis are based on assumptions about the error term ɛ. One such assumption is that the error term ɛ is a random variable with a mean or expected value of:

0

Which of the following statements is false?

Regression analysis can be interpreted as a procedure for establishing a cause-and-effect relationship between variables.

R² =

SSR/SST

In multiple regression analysis, any observation with a standardized residual of less than _____ or greater than _____ is known as an outlier.

-2; 2
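
As a rough illustration of the -2/+2 rule, the sketch below computes standardized residuals and flags any outside that range. For brevity it assumes a simple (one-variable) regression rather than a multiple regression, and the data and the function name `standardized_residuals` are made up for the example.

```python
import math

def standardized_residuals(x, y):
    """Standardized residuals for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # least squares slope
    b0 = y_bar - b1 * x_bar      # least squares intercept
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Standard error of the estimate: sqrt(SSE / (n - 2))
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    # Leverage of each observation in simple regression
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverages)]

# Hypothetical data: every point lies on y = 2x except the fifth one.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 4, 6, 8, 20, 12, 14, 16, 18, 20]

z = standardized_residuals(x, y)
flagged = [i for i, zi in enumerate(z) if zi < -2 or zi > 2]
print(flagged)  # indices of observations flagged as outliers
```

With this data only the off-trend fifth observation (index 4) falls outside the -2 to +2 range.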

The mathematical equation relating the expected value of the dependent variable to the values of the independent variables, which has the form E(y) = β0 + β1x1 + β2x2 + ... + βpxp, is called:

a multiple regression equation.

The mathematical equation that explains how the dependent variable y is related to several independent variables and an error term, and has the form y = β0 + β1x1 + β2x2 + ... + βpxp + ε, is called:

a multiple regression model.

If two large independent random samples are taken from two populations, the sampling distribution of the difference between the two sample means:

can be approximated by a normal distribution.

If the coefficient of determination is a positive value, then the coefficient of correlation:

can be either negative or positive.

The value of the coefficient of correlation (r):

can be equal to the value of the coefficient of determination (r²).

The coefficient of determination:

cannot be negative.

If a significant relationship exists between x and y and the coefficient of determination shows that the fit is good, the estimated regression equation should be useful for:

estimation and prediction.

Observations with extreme values for the independent variables are called:

high leverage points.

The tests of significance in regression analysis are based on assumptions about the error term ɛ. One such assumption is that the error term ɛ follows a(n) _____ distribution for all values of x.

normal

The sampling distribution of p̄1 - p̄2 is approximated by a:

normal distribution.

A graph of the standardized residuals plotted against values of the normal scores that helps to determine whether the assumption that the error term has a normal probability distribution appears to be valid is called a:

normal probability plot.

In a multiple regression model, the values of the error term, ε, are assumed to be:

normally distributed.

When we conduct significance tests for a multiple regression relationship, the F test will be used as the test for:

overall significance.

Graphical representation of the residuals that can be used to determine whether the assumptions made about the regression model appear to be valid is called a:

residual plot.

The difference between the observed value of the dependent variable and the value predicted using the estimated regression equation is called a(n):

residual.

Since the multiple regression equation generates a plane or surface, its graph is called a:

response surface.

Suppose we have a t distribution based upon two sample means with unknown population standard deviations, which we are unwilling to assume are equal. When we calculate the appropriate degrees of freedom, we should:

round the calculated degrees of freedom down to the nearest integer.
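
A minimal sketch of that rounding step, assuming the usual unequal-variance degrees-of-freedom approximation (often called the Welch-Satterthwaite formula). The sample standard deviations and sizes are hypothetical, and `welch_df` is an illustrative name.

```python
import math

def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom when population variances are not assumed equal."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Hypothetical samples: s1 = 4 with n1 = 10, s2 = 6 with n2 = 15.
df_raw = welch_df(4, 10, 6, 15)
df_used = math.floor(df_raw)  # round DOWN, not to the nearest integer
print(round(df_raw, 2), df_used)
```

Here the raw value is just under 23, so rounding down gives 22 degrees of freedom, which yields a slightly larger t value and a more conservative interval.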

When σ is unknown, which of the following is used to estimate σ?

s

An F test, based on the F probability distribution, can be used to test for:

significance in regression.

When constructing a confidence or a prediction interval to quantify the relationship between two quantitative variables, what distribution do confidence and prediction intervals follow?

t distribution

Independent simple random samples are taken to test the difference between the means of two populations whose variances are not known, but are assumed to be equal. The sample sizes are n1 = 32 and n2 = 40. The correct distribution to use is the:

t distribution with 70 degrees of freedom.

We would like to know if the difference in the mean semester hours taken by the two groups of students is statistically significant at the ⍺ = .05 level. What statistical test is appropriate for answering this question?

t test for a difference in two means

In most applications of the interval estimation and hypothesis testing procedures, random samples with n1 ≥ 30 and n2 ≥ 30 are adequate. In cases where either or both sample sizes are less than 30:

the distribution of the populations becomes an important consideration.

If a residual plot of x versus the residuals, y - ŷ, shows a non-linear pattern, then we should conclude that:

the regression model is not an adequate representation of the relationship between the variables.

The tests of significance in regression analysis are based on assumptions about the error term ɛ. One such assumption is that the variance of ɛ, denoted by σ², is:

the same for all values of x.

In a multiple regression model, the variance of the error term, ε, is assumed to be:

the same for all values of x1, x2,..., xp.

In a regression analysis, an outlier will always increase:

the value of the correlation.

In multiple regression analysis:

there can be several independent variables, but only one dependent variable.

Suppose, after calculating an estimated multiple regression equation, we find that the value of R² is .9201. Interpret this value.

92.01% of the variability in y can be explained by the estimated regression equation.

Which of the following scenarios follows a matched sample design?

A teacher uses a pretest and then a posttest with her students to see how much they have improved.

Which of the following variables is categorical?

Gender

Influential observations always:

None of the above are correct.

Suppose a multiple coefficient of determination coming from a regression analysis with 50 observations and 3 independent variables is .8455. Calculate the adjusted multiple coefficient of determination.

R-Sq(adj) = 83.54%
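
The calculation can be reproduced with the standard adjustment formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), using the values from the question. The function name is illustrative.

```python
def adjusted_r_squared(r_squared: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

adj = adjusted_r_squared(0.8455, n=50, p=3)
print(f"{adj:.4f}")  # 0.8354
```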

If we are interested in testing whether the proportion of items in population 1 is larger than the proportion of items in population 2, then the:

alternative hypothesis should state p1-p2>0 .

The multiple regression equation based on the sample data, which has the form ŷ = b0 + b1x1 + b2x2 + ... + bpxp, is called:

an estimated multiple regression equation.

When working with regression analysis, an outlier is:

any observation that does not fit the trend shown by the remaining data.

Suppose a residual plot of x versus the residuals, y - ŷ, shows a nonconstant variance. In particular, as the values of x increase, suppose that the values of the residuals also increase. This means that:

as the values of x get larger, the ability to predict y becomes less accurate.

When studying the relationship between two quantitative variables, an interval estimate of the mean value of y for a given value of x is called a(n):

confidence interval.

When we use the estimated regression equation to develop an interval that can be used to predict the mean for ALL units that meet a particular set of given criteria, that interval is called a(n):

confidence interval.

In regression analysis, the variable that is being predicted is the:

dependent variable.

A variable used to model the effect of categorical independent variables is called a(n):

dummy variable.

The term in the multiple regression model that accounts for the variability in y that cannot be explained by the linear effect of the p independent variables is the:

error term, ε.

The model developed from sample data that has the form ŷ = b0 + b1x is known as the:

estimated regression equation.

In general, R² always _____ as independent variables are added to the regression model.

increases

The tests of significance in regression analysis are based on assumptions about the error term ɛ. One such assumption is that the values of ɛ are:

independent.

In a multiple regression model, the values of the error term, ε, are assumed to be:

independent of each other.

Regarding inferences about the difference between two population means, the alternative to the matched sample design, as covered in the textbook, is:

independent samples.

When we conduct significance tests for a multiple regression relationship, the t test can be conducted for each of the independent variables in the model. Each of those tests is called a test for:

individual significance.

An observation that has a strong influence or effect on the regression results is called a(n):

influential observation.

If a categorical variable has k levels, then:

k - 1 dummy variables are needed.
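
To make the k - 1 rule concrete, here is a small sketch that encodes a categorical variable by hand. One level is chosen as the baseline (all dummies equal to 0) and each remaining level gets its own 0/1 column. The helper `dummy_encode` and the region labels are hypothetical.

```python
def dummy_encode(values, baseline):
    """Return the non-baseline levels and one row of 0/1 dummies per value."""
    levels = sorted(set(values) - {baseline})  # k - 1 levels get a column
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Hypothetical categorical variable with k = 3 levels.
regions = ["East", "West", "South", "East"]
levels, rows = dummy_encode(regions, baseline="East")

print(levels)  # ['South', 'West']
print(rows)    # [[0, 0], [0, 1], [1, 0], [0, 0]]
```

The baseline level needs no column of its own because it is identified by all dummies being 0.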

Larger values of r² imply that the observations are more closely grouped about the:

least squares line.

The method used to develop the estimated regression equation that minimizes the sum of squared residuals is called the:

least squares method.
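
For the one-variable case, the least squares estimates have a closed form: b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1·x̄. The sketch below applies it to made-up data; `least_squares_line` is an illustrative name.

```python
def least_squares_line(x, y):
    """Intercept b0 and slope b1 that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(x, y)
print(round(b0, 4), round(b1, 4))  # 2.2 0.6
```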

What is an outlier?

An observation with a standardized residual less than -2 or greater than +2.

The tests of significance in regression analysis are based on several assumptions about the error term ɛ. Additionally, we make an assumption about the form of the relationship between x and y. We assume that the relationship between x and y is:

linear.

A researcher recruits 25 people to participate in a study on alcohol consumption and its interactions with Tylenol. The 25 participants had to come to a check-in center every day at 7:00 a.m. for one week. They were given various amounts of alcohol. Each day, each participant would flip a coin to determine if they also took Tylenol with their alcohol. They found that their BAC was 25% higher on days when they were given Tylenol with their alcohol than when they drank alcohol alone. This is an example of a(n):

matched sample design.

A company wants to identify which of the two production methods has the smaller completion time. One sample of workers is selected and each worker first uses one method and then uses the other method. The sampling procedure being used to collect completion time data is based on:

matched samples.

The term used to describe the case when the independent variables in a multiple regression model are correlated is:

multicollinearity.

The proportion of the variability in the dependent variable that can be explained by the estimated multiple regression equation is called the:

multiple coefficient of determination.

The study of how a dependent variable y is related to two or more independent variables is called:

multiple regression analysis.

When constructing a confidence or a prediction interval to quantify the relationship between two quantitative variables, the appropriate degrees of freedom are:

n - 2.

When developing an interval estimate for the difference between two sample means, with sample sizes of n1 and n2 :

n1 and n2 can be of different sizes.

Suppose we are constructing an interval estimate for the difference between the means of two populations when the standard deviations of the two populations are unknown. Suppose it can be assumed that the two populations have equal variances. If n1 is the size of sample 1 and n2 is the size of sample 2, we must use a t distribution with:

n1+n2-2 degrees of freedom.

The sampling distribution of p̄1 - p̄2 is approximated by a normal distribution when:

n1p1, n1(1-p1), n2p2, n2(1-p2) are all greater than or equal to 5.
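
This condition is easy to check mechanically. The sketch below is one way to do it; the function name and the example inputs are hypothetical, and in practice the sample proportions are typically used in place of the unknown p1 and p2.

```python
def normal_approx_ok(n1, p1, n2, p2, threshold=5):
    """True when n1*p1, n1*(1-p1), n2*p2, and n2*(1-p2) all reach the threshold."""
    counts = [n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)]
    return all(c >= threshold for c in counts)

print(normal_approx_ok(100, 0.10, 100, 0.50))  # True: counts are 10, 90, 50, 50
print(normal_approx_ok(20, 0.10, 100, 0.50))   # False: n1*p1 is only 2
```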

When completing a two-tailed hypothesis test about the difference between two population means, the

p-value must be doubled.

All things held constant, which interval will be wider: a confidence interval or a prediction interval?

prediction interval
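
One way to see why: in simple linear regression both intervals use the same t multiplier, but the prediction interval's standard error has an extra "1 +" under the square root to account for the variability of an individual observation. The sketch below compares the two standard errors; all input values are hypothetical.

```python
import math

def interval_standard_errors(s, n, x_bar, sxx, x_star):
    """Standard errors for the mean of y (confidence) and an individual y (prediction)."""
    core = 1 / n + (x_star - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(core)      # used for the confidence interval
    se_pred = s * math.sqrt(1 + core)  # used for the prediction interval
    return se_mean, se_pred

# Hypothetical regression summary: s = 2, n = 10, x-bar = 5.5, Sxx = 82.5.
se_mean, se_pred = interval_standard_errors(s=2.0, n=10, x_bar=5.5, sxx=82.5, x_star=7.0)
print(se_pred > se_mean)  # True
```

Because `1 + core` always exceeds `core`, the prediction interval is wider at every value of x.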

When studying the relationship between two quantitative variables, whenever we want to predict an individual value of y for a new observation corresponding to a given value of x, we should use a(n):

prediction interval.

When we use the estimated regression equation to develop an interval that can be used to predict the value for a specific unit that meets a particular set of given criteria, that interval is called a(n):

prediction interval.

The mathematical equation relating the independent variable to the expected value of the dependent variable, E(y) = β0 + β1x, is known as the:

regression equation.

In regression analysis, the equation in the form y = 𝛽0 + 𝛽1x + ε is called the:

regression model.

Dummy variables must always have:

values of either 0 or 1.

The matched sample design often leads to a smaller sampling error than the independent sample design. The primary reason is that in a matched sample design:

variation between subjects is eliminated because the same subjects are used for both treatments.

Regarding hypothesis tests about p̄1 - p̄2, the pooled estimate of p is a:

weighted average of p̄1 and p̄2.
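
A short sketch of the pooled estimate, where each sample proportion is weighted by its sample size: p̄ = (n1·p̄1 + n2·p̄2) / (n1 + n2). The sample sizes and proportions are hypothetical.

```python
def pooled_proportion(n1, p1, n2, p2):
    """Pooled estimate of p: sample proportions weighted by their sample sizes."""
    return (n1 * p1 + n2 * p2) / (n1 + n2)

# Hypothetical samples: 200 items with proportion .30, 300 items with proportion .20.
p_pooled = pooled_proportion(200, 0.30, 300, 0.20)
print(round(p_pooled, 4))  # 0.24
```

The result sits between the two sample proportions and closer to the one from the larger sample.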

In a regression analysis, the error term ε is a random variable with a mean or expected value of

zero

