# Study guide IDS 371 multiple choice

True or false? If we desire α = 0.10, then a p-value of 0.15 would lead us to reject the null hypothesis.

False

Which of the following is a characteristic of the t distribution?

It has a mean of zero. It is a symmetric distribution. It is similar to the z distribution when n is large.

Which is not correct regarding the estimated slope of the OLS regression line? a) It is divided by its standard error to obtain its t statistic. b) It may be regarded as zero if its p-value is less than α. c) It is chosen so as to minimize the sum of squared errors. d) It shows the change in Y for a unit change in X.

It may be regarded as zero if its p-value is less than α.

Which one of the following R commands creates an exponential trend model? a. tslm(jetblue.ts~trend) b. tslm(log(jetblue.ts)~trend) c. EMA(jetblue.ts,n=1,ratio=0.1) d. SMA(jetblue.ts,n=2)

tslm(log(jetblue.ts)~trend)
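An exponential trend is a linear trend in log(y), which is why the response is logged. The sketch below illustrates this with base R's lm() on simulated data. The series `y` and the index `period` are hypothetical stand-ins for jetblue.ts and the `trend` term that tslm() (from the forecast package) builds.

```r
# Simulated series growing about 5% per period (hypothetical data)
set.seed(1)
period <- 1:40
y <- 100 * exp(0.05 * period) * exp(rnorm(40, sd = 0.02))

# Exponential trend: regress log(y) on the time index
fit <- lm(log(y) ~ period)
b <- unname(coef(fit)["period"])

# exp(b) - 1 estimates the constant percentage growth rate per period
growth_rate <- exp(b) - 1
growth_rate
```

Because the slope is estimated on the log scale, it is converted back with exp() before being read as a growth rate.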

The F statistic and its p-value give a global test of significance for a multiple regression. a) True b) False

True

After testing a hypothesis regarding the mean, we decided not to reject H0. Thus, we are exposed to a) Type I error b) Type II error

Type II error.

The Central Limit Theorem implies that

the distribution of the mean is approximately normal for large n.

Which is always in the center of the confidence interval?

the sample mean

In multiple regression analysis, a dummy (binary) variable is a) an additional quantitative variable b) a variable with three or more values c) a variable with only two values d) a new regression coefficient

a variable with only two values

A slow drift of measurements either up or down from the process centerline suggests a) an instability b) a trend c) a cycle

trend

In correlation analysis, neither X nor Y is designated as the independent variable. a) True b) False

true

Which is not an assumption of ANOVA? a) Equal population sizes for groups b) Homogeneous treatment variances c) Independent sample observations d) Normality of the treatment populations

equal population sizes for groups

Analysis of variance is a technique used to test for a) equality of two or more variances b) equality of two or more means c) equality of more than two variances d) equality of a population mean and a given value

equality of two or more means

Which model assumes a constant percentage rate of growth? a) Linear b) Cubic c) Exponential d) Quadratic

exponential

The correlation coefficient r always has the same sign as b1 in Y = b0 + b1X. a) True b) False

true

True or false? A two-tailed hypothesis test for H0: μ = 20 at α = .05 is analogous to asking if a 95 percent confidence interval for μ contains 20.

true

True or false? An in-control process will always exhibit some common cause variation.

true

The ordinary least squares (OLS) method of estimation will minimize a) both the slope and the intercept b) only the intercept c) neither the slope nor the intercept d) only the slope

neither the slope nor the intercept

Which is not an assumption of least squares regression? a) Normal X values b) Nonautocorrelated errors c) Homoscedastic errors d) Normal errors

normal x values

Based on the following regression ANOVA table, what is the R^2?

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Regression | 4 | 1793.2356 | 448.3089 | 7.4854 |
| Residual | 45 | 2695.0996 | 59.8911 | |
| Total | 49 | 4488.3352 | | |

.3995
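The arithmetic behind this answer is R^2 = SSR/SST. A short sketch using the sums of squares from the ANOVA table (the variable names are just illustrative):

```r
ssr <- 1793.2356   # regression sum of squares
sse <- 2695.0996   # residual (error) sum of squares
sst <- ssr + sse   # total sum of squares

# Coefficient of determination
r_squared <- ssr / sst
round(r_squared, 4)   # 0.3995

# The F statistic in the table is the ratio of the two mean squares
f_stat <- (ssr / 4) / (sse / 45)
round(f_stat, 4)      # 7.4854
```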

Degrees of freedom for the between-group variation in a one-factor ANOVA with n1=4, n2=3, n3=7 would be

2

Find the Cp index for a process with USL = 550, LSL = 540, μ = 545, and σ = 0.75

2.22
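The Cp index compares the width of the specification range with six process standard deviations. A minimal sketch with the values from the question:

```r
usl   <- 550    # upper specification limit
lsl   <- 540    # lower specification limit
sigma <- 0.75   # process standard deviation

# Cp = (USL - LSL) / (6 * sigma)
cp <- (usl - lsl) / (6 * sigma)
round(cp, 2)   # 2.22
```

Note that Cp does not use μ at all, so it says nothing about whether the process is centered.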

3 randomly chosen pieces of 4 types of PVC pipe of equal wall thickness are tested to determine the burst strength (in pounds per square inch) under 3 temperature conditions. Total degrees of freedom for the ANOVA would be

35

Based on the following regression ANOVA table, how many predictors were used in the regression?

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Regression | 4 | 1793.2356 | 448.3089 | 7.4854 |
| Residual | 45 | 2695.0996 | 59.8911 | |
| Total | 49 | 4488.3352 | | |

4

In a simple bivariate regression with 50 observations, there will be ________ residuals.

50

In a regression with 7 predictors and 62 observations, how many degrees of freedom would the t test for each coefficient use? (d.f. = n − k − 1)

54

True or false? As n increases, the standard error increases.

false

Which of the following is most likely the cause of an oscillation in an SPC chart? a) Tool wear b) Temperature fluctuations c) Alternating samples from two machines d) A new worker

Alternating samples from two machines

In multiple regression analysis, testing the global null hypothesis that all regression coefficients are zero is based on a) a t statistic b) a z statistic c) an F statistic

An F statistic

A high variance inflation factor (VIF) indicates a significant predictor in the regression. a) True b) False

False

A binary (categorical) predictor should not be used along with nonbinary (numerical) predictors. a) True b) False

False

Which of the following is NOT a valid null hypothesis?

H0: μ ≠ 0

A sample of 17 ATM transactions shows a mean transaction time of 65 seconds with a standard deviation of 10 seconds. State the hypotheses to test whether the mean transaction time exceeds 60 seconds.

H0: μ ≤ 60, H1: μ > 60
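The question only asks for the hypotheses, but the right-tailed test can be carried through from the summary statistics. The sketch below assumes α = .05, which the question does not state.

```r
n     <- 17   # sample size
x_bar <- 65   # sample mean (seconds)
s     <- 10   # sample standard deviation (seconds)
mu_0  <- 60   # hypothesized mean under H0

# t statistic for H0: mu <= 60 versus H1: mu > 60
t_stat <- (x_bar - mu_0) / (s / sqrt(n))

# Right-tailed p-value with n - 1 degrees of freedom
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)

round(t_stat, 3)   # 2.062
p_value < 0.05     # TRUE at the assumed alpha of .05
```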

Which is not true of the coefficient of determination? a) It is calculated using sums of squares (e.g., SSR, SSE, SST). b) It is the square of the coefficient of correlation. c) It is negative when there is an inverse relationship between X and Y. d) It reports the percentage of the variation in Y explained by X.

It is negative when there is an inverse relationship between X and Y

Which of the following is not true of the standard error of the regression? a) It is a measure of the accuracy of the prediction. b) It is based on squared vertical deviations between the actual and predicted values of Y. c) It would be negative when there is an inverse relationship in the model. d) It is used in constructing confidence and prediction intervals for Y.

It would be negative when there is an inverse relationship in the model.

If the specification subgroup size is n = 4 and the known process parameters are μ = 2.75 and σ = 0.044, which are the control limits for the x-bar chart?

LCL = 2.684, UCL = 2.816
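These limits follow from μ ± 3σ/√n. A minimal sketch, assuming the usual 3-sigma control limits:

```r
mu    <- 2.75    # known process mean
sigma <- 0.044   # known process standard deviation
n     <- 4       # subgroup size

# Standard error of the subgroup mean
se <- sigma / sqrt(n)

# 3-sigma control limits for the x-bar chart
lcl <- mu - 3 * se
ucl <- mu + 3 * se
round(c(LCL = lcl, UCL = ucl), 3)   # 2.684 and 2.816
```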

A researcher is studying the effect of 10 different variables on a critical measure of business performance. A multiple regression analysis including all 10 variables is performed. What criterion could be used to eliminate 1 of the 10 variables? a) Multiple R^2 b) Largest p-value c) Smallest p-value d) Smallest regression coefficient

Largest P value

What statement is most nearly correct, other things being equal?

Quadrupling the sample size will cut the standard error in half
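Since the standard error of the mean is σ/√n, multiplying n by 4 divides it by 2. A quick check with made-up values (σ = 10, n = 25 are illustrative only):

```r
# Standard error of the sample mean
std_error <- function(sigma, n) sigma / sqrt(n)

se_small <- std_error(sigma = 10, n = 25)    # 2
se_large <- std_error(sigma = 10, n = 100)   # 1

se_large / se_small   # 0.5: quadrupling n halves the standard error
```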

If the trend model yt = a + bt + ct^2 is fitted to a time series, we would get a) R^2 that could be either higher or lower than the linear model b) no R^2 for this type of model because it is nonlinear c) R^2 that is at least as high as the linear model d) R^2 that could be lower than the linear model

R^2 that is at least as high as the linear model
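The linear trend is a special case of the quadratic one (c = 0), so adding the squared term cannot lower R^2. A small sketch on simulated data (the series is hypothetical):

```r
# Simulated series with a purely linear trend plus noise
set.seed(11)
period <- 1:30
y <- 50 + 2 * period + rnorm(30, sd = 5)

# Fit linear and quadratic trend models
r2_lin  <- summary(lm(y ~ period))$r.squared
r2_quad <- summary(lm(y ~ period + I(period^2)))$r.squared

c(linear = r2_lin, quadratic = r2_quad)
r2_quad >= r2_lin   # TRUE
```

A higher R^2 here does not mean the quadratic term is useful; its t test or adjusted R^2 is the better guide.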

Which is correct to find the value of the coefficient of determination (R^2)? a) SSR/SST b) 1 − SST/SSE c) SSR/SSE d) none of the above

SSR/SST

Which of the following is not true of seasonal dummy variables? a) Their number should be one less than the number of seasons. b) They are used to describe the seasonality in quarterly, monthly, or other increment time series. c) Their number should equal the number of seasons.

Their number should equal the number of seasons.

The effect of a binary predictor is to shift the regression intercept. a) True b) False

True

A widening pattern of residuals as X increases would suggest heteroscedasticity. a) True b) False

True

If R^2 = .36 in the model Sales = 268 + 7.37 Ads with n = 50, the two-tailed test for correlation at α = .05 would say that there is a significant correlation between Sales and Ads. a) True b) False

True (Given R^2 = 0.36, r = 0.6. t_calc = r × [(n − 2)/(1 − r^2)]^(1/2) = (.60) × [(50 − 2)/(1 − .36)]^(1/2) = 5.196 > t.025 = 2.011 for d.f. = 50 − 2 = 48.)
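The same calculation can be reproduced in R. The sketch takes r as the positive square root of R^2 because the fitted slope (7.37) is positive.

```r
r_squared <- 0.36
n <- 50
r <- sqrt(r_squared)   # 0.6, positive because the slope is positive

# Test statistic for H0: rho = 0
t_calc <- r * sqrt((n - 2) / (1 - r^2))

# Two-tailed critical value at alpha = .05 with n - 2 degrees of freedom
t_crit <- qt(0.975, df = n - 2)

round(c(t_calc = t_calc, t_crit = t_crit), 3)   # 5.196 and 2.011
abs(t_calc) > t_crit                            # TRUE: significant correlation
```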

Concerning confidence intervals, which statement is most nearly correct?

We use the Student's t distribution when σ is unknown.

When does multicollinearity occur in a multiple regression analysis? a) When the dependent variables are highly correlated b) When the regression coefficients are correlated c) When the independent variables are highly correlated d) When the independent variables have no correlation

When the independent variables are highly correlated

Multiple regression analysis is applied when analyzing the relationship between a) an independent variable and several dependent variables b) a dependent variable and several independent variables c) several dependent variables and several independent variables d) several regression equations and a single sample

a dependent variable and several independent variables

A time series is

a set of sequential observations of a variable over time

The Central Limit Theorem (CLT)

applies to any population

Given that "Price" is the response and "Chicago" and "Sqft" are the predictors, which of the following notations makes sure the interaction term is part of the regression model in R? a) Chicago+Sqft b) Chicago:Sqft c) Chicago*Sqft d) b) and c) e) All of the above

d) b) and c): both Chicago:Sqft and Chicago*Sqft include the interaction term
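One way to see which terms each formula produces is to expand it with terms(), which needs no data. This is only an inspection aid, not part of the original question.

```r
# Term labels generated by each formula notation
terms_plus  <- attr(terms(Price ~ Chicago + Sqft), "term.labels")
terms_colon <- attr(terms(Price ~ Chicago:Sqft), "term.labels")
terms_star  <- attr(terms(Price ~ Chicago * Sqft), "term.labels")

terms_plus    # "Chicago" "Sqft"                 (no interaction)
terms_colon   # "Chicago:Sqft"                   (interaction only)
terms_star    # "Chicago" "Sqft" "Chicago:Sqft"  (main effects plus interaction)
```

So `*` is shorthand for both main effects plus the interaction, while `:` adds the interaction term alone.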

Given H0: μ ≥ 18 and H1: μ < 18, we would commit a Type I error if we

conclude that μ < 18 when the truth is that μ ≥ 18.

A fitted multiple regression equation is Y = 12 + 3X1 − 5X2 + 7X3 + 2X4. When X1 increases 2 units and X2 increases 2 units as well, while X3 and X4 remain unchanged, what change would you expect in your estimate of Y?

decrease by 4
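The expected change in Y is the sum of each coefficient times the change in its predictor. A short sketch of that arithmetic:

```r
# Fitted slopes from Y = 12 + 3*X1 - 5*X2 + 7*X3 + 2*X4
b <- c(X1 = 3, X2 = -5, X3 = 7, X4 = 2)

# Changes in the predictors: X1 and X2 rise by 2, X3 and X4 are unchanged
delta_x <- c(X1 = 2, X2 = 2, X3 = 0, X4 = 0)

# Expected change in Y; the intercept cancels out
delta_y <- sum(b * delta_x)
delta_y   # -4
```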

For a given sample size, when we increase the probability of a Type I error, the probability of a Type II error

decreases

In regression, the variable predicted is called the a) regression variable b) independent variable c) dependent variable d) predictor

dependent variable

One-factor analysis of variance a. requires that the number of observations in each group be identical. b. has less power when the number of observations per group is not identical. c. is extremely sensitive to slight departures from normality.

has less power when the number of observations per group is not identical

Which statement is incorrect? a) Binary predictors shift the intercept of the fitted regression. b) If a qualitative variable has c categories, we would use only c − 1 binaries as predictors. c) If there is a binary predictor in the model, the residuals may not sum to zero. d) A binary predictor has the same t test as any other predictor.

if there is a binary predictor in the model the residuals may not sum to zero
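OLS residuals sum to zero whenever the model includes an intercept, and a binary predictor does not change that. A small check on simulated data (all variable names and values are hypothetical):

```r
# Simulated data with one binary and one numeric predictor
set.seed(5)
d <- rep(c(0, 1), each = 10)   # binary (dummy) predictor
x <- runif(20, 0, 10)          # numeric predictor
y <- 3 + 4 * d + 2 * x + rnorm(20)

fit <- lm(y ~ d + x)

# Sum of residuals is zero up to floating-point error
resid_sum <- sum(residuals(fit))
resid_sum
```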

Simple tests for nonlinearity in a regression model can be performed by a) deleting predictors one at a time b) squaring the standard error c) including squared predictors d) none of the above

including squared predictors

Mary used a sample of 68 large U.S. cities to estimate the relationship between Crime (annual property crimes per 100,000 persons) and Income (median annual income per capita, in dollars). Her estimated regression equation was Crime = 428 − 0.050 Income. If Income decreases by 1000, we would expect that Crime will a) remain unchanged. b) decrease by 50. c) increase by 50. d) increase by 428.

increase by 50

The time-series model Y = T * C * S * I a) is an additive model. b) is an exponential model. c) is a multiplicative model. d) is a polynomial model.

is a multiplicative model

The rejection region in a hypothesis test

is an area in the tail(s) of a sampling distribution

On a particular morning the length of time spent in the examination rooms is recorded for each patient seen by each physician at an orthopedic clinic. The data are recorded for four physicians. Suppose the physician names are stored in the "Physician" column and the times are recorded in the "Time" column. The data frame is attached to the R search path. What does tapply(Time,Physician,mean) do in RStudio? a. It runs a one-factor ANOVA test. b. It computes the overall average time for all physicians. c. It computes the average time for each physician.

it computes the average time for each physician
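A small illustration of tapply() on made-up clinic data. The data frame `clinic` and its values are hypothetical, and the columns are referenced with `$` rather than attaching the data frame.

```r
# Hypothetical examination times (minutes) for four physicians
clinic <- data.frame(
  Physician = c("A", "A", "B", "B", "C", "C", "D", "D"),
  Time      = c(20, 30, 15, 25, 40, 50, 10, 20)
)

# Apply mean() to Time separately within each Physician group
group_means <- tapply(clinic$Time, clinic$Physician, mean)
group_means            # A = 25, B = 20, C = 45, D = 15

# For contrast: the overall mean ignores the grouping
overall_mean <- mean(clinic$Time)
overall_mean           # 26.25
```

A one-factor ANOVA on the same data would instead use something like aov(Time ~ Physician, data = clinic).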

What does the following code do in R? glm(y~x,family=binomial(link="logit")) a) It creates a simple linear regression model. b) It creates a logistic regression model. c) It plots the fitted line for a logistic regression. d) It plots the fitted line for a simple linear regression model.

it creates a logistic regression model

Which is a characteristic of the variance inflation factor (VIF)? It is insignificant unless the corresponding t statistic is significant It indicates the predictor's degree of multicollinearity It measures the degree of significance of each predictor It reveals collinearity rather than multicollinearity

it indicates the predictors degree of multicollinearity

Which statement about α is NOT correct?

it is equal to β

Which is not true of the logistic regression model? a) It is nonlinear. b) It can best be fitted using the maximum likelihood method. c) Its predictions are either 0 or 1. d) It cannot yield predictions greater than 1

its predictions are either 0 or 1
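A logistic regression predicts a probability, which lies strictly between 0 and 1 rather than being exactly 0 or 1. The sketch below fits one on simulated data (the coefficients −0.5 and 1.2 are arbitrary choices for illustration).

```r
# Simulated binary response driven by one predictor
set.seed(42)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-0.5 + 1.2 * x))

# Logistic regression, fitted by maximum likelihood
fit <- glm(y ~ x, family = binomial(link = "logit"))

# Predicted probabilities on the response scale
p_hat <- predict(fit, type = "response")
range(p_hat)   # both endpoints fall inside (0, 1)
```

Turning these probabilities into 0/1 classifications requires a separate cutoff, such as 0.5.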

In a simple regression, which would suggest a significant relationship between X and Y? a) Large p-value for the estimated slope b) Large t statistic for the slope c) Large p-value for the F statistic d) Small t statistic for the slope

large t statistic for the slope

When the dependent variable is binary (0 or 1), we need a) stepwise regression. b) data transformation to improve conditioning. c) best subsets regression. d) logistic regression

logistic regression

Is a process capable if USL = 550, LSL = 540, μ = 542, and σ = 1.25? a) No, but very close b) No, clearly not capable c) Yes, just barely capable d) Yes, highly capable

no clearly not capable
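One common way to see why is the Cpk index, which the question itself does not name. It measures the distance from the mean to the nearer specification limit in units of 3σ. A sketch with the question's values:

```r
usl <- 550; lsl <- 540   # specification limits
mu <- 542                # process mean (off-center, closer to LSL)
sigma <- 1.25            # process standard deviation

# Cp ignores centering; Cpk uses the nearer specification limit
cp  <- (usl - lsl) / (6 * sigma)
cpk <- min(usl - mu, mu - lsl) / (3 * sigma)

round(c(Cp = cp, Cpk = cpk), 2)   # Cp = 1.33, Cpk = 0.53
```

Cp alone looks acceptable, but the mean sits only 1.6σ above the lower limit, so Cpk falls well below 1.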

Which is correct concerning a two-factor without replication ANOVA? a. No interaction effect is estimated. b. The interaction effect would have its own F statistic. c. The interaction would be insignificant unless the main effects were significant.

no interaction effect is estimated

Which is not assumed in ANOVA? a. Observations are independent. b. Populations are normally distributed. c. Variances of all treatment groups are the same. d. Population variances are known.

population variances are known

Which one of the following computes t score for 90% confidence interval for sample size 20 in R? a) qt(0.9,20) b) qt(0.95,20) c) qt(0.95,19) d) qt(0.8,19)

qt(0.95,19)
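A 90% interval leaves 5% in each tail, so the quantile needed is 0.95, and the degrees of freedom are n − 1 = 19. Written out step by step:

```r
n    <- 20     # sample size
conf <- 0.90   # confidence level

# Upper-tail quantile: 1 - alpha/2 = 0.95, with n - 1 degrees of freedom
t_star <- qt(1 - (1 - conf) / 2, df = n - 1)
round(t_star, 3)   # 1.729
```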

The width of a confidence interval for μ is not affected by

sample mean

The four components of a time series are which of the following? a) Cycle, seasonal, irregular, regular b) Month, cycle, seasonal, irregular c) Seasonal, cycle, irregular, trend d) Cycle, season, month, day

seasonal, cycle, irregular, trend

The critical value in a hypothesis test a) is calculated from the sample data. b) depends on the value of the test statistic. c) separates the acceptance and rejection regions. d) has to be between 0 and 1.

separates the acceptance and rejection regions

A level shift in a process is indicated when samples a) shift abruptly to a new mean b) vary more than expected c) drift slowly either upward or downward d) tend to alternate between high and low values

shift abruptly to a new mean

What does the following R code do? for (i in 1:10000) { means[i] <- mean(sample(3:6, 50, replace = TRUE)) } a) simulates a sampling distribution for population 3:6 for a sample size of 50. b) simulates a sampling distribution for population 3:6 for a sample size of 10,000. c) draws a histogram. d) finds the mean for population 3:6.

simulates a sampling distribution for population 3:6 for a sample size of 50.
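A runnable version of the loop is sketched below. The original snippet assumes `means` already exists, so the sketch allocates it first, and it sets a seed only to make the run repeatable.

```r
set.seed(123)
means <- numeric(10000)   # storage for 10,000 sample means

# Each pass draws a sample of size 50 from the population {3, 4, 5, 6}
for (i in 1:10000) {
  means[i] <- mean(sample(3:6, 50, replace = TRUE))
}

mean(means)   # close to the population mean, 4.5
sd(means)     # close to sigma / sqrt(50), about 0.158
```

Here 10,000 is the number of simulated samples, while 50 is the sample size. Calling hist(means) afterwards would show the roughly normal shape that the Central Limit Theorem predicts.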

Which of the following is NOT a characteristic of the F-test in a simple regression? a) It is a test for overall fit of the model. b) The test statistic can never be negative. c) It requires a table with numerator and denominator degrees of freedom. d) The F-test gives a different p-value than the t-test.

the f-test gives a different p value than the t test
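In a simple regression the F statistic equals the square of the slope's t statistic, so the two tests give the same p-value. A sketch on simulated data (values are arbitrary):

```r
# Simulated simple regression with a modest slope
set.seed(7)
x <- 1:30
y <- 5 + 0.2 * x + rnorm(30, sd = 4)

s <- summary(lm(y ~ x))

# t test for the slope
t_slope <- s$coefficients["x", "t value"]
p_t     <- s$coefficients["x", "Pr(>|t|)"]

# Overall F test: F value, numerator df, denominator df
fs  <- unname(s$fstatistic)
p_f <- pf(fs[1], fs[2], fs[3], lower.tail = FALSE)

c(t_squared = t_slope^2, F = fs[1])   # identical
c(p_from_t = p_t, p_from_F = p_f)     # identical
```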

Which of the following would be most useful in checking the normality assumption of the errors in a regression model? a) The t statistics for the coefficients. b) The F-statistic from the ANOVA table. c) The histogram of residuals.

the histogram of residuals

How does the confidence interval change as the population standard deviation, σ, increases? a) The interval gets wider as σ increases. b) The interval stays the same as σ increases. c) The interval gets narrower as σ increases.

the interval gets wider as σ increases

When comparing the 90 percent prediction and confidence intervals for a given regression analysis a) the prediction interval is narrower than the confidence interval. b) the prediction interval is wider than the confidence interval. c) there is no difference between the size of the prediction and confidence intervals. d) no generalization is possible about their comparative width

the prediction interval is wider than the confidence interval
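The prediction interval is wider because it covers the scatter of an individual Y value around the line as well as the uncertainty in the estimated mean. A sketch using predict() on simulated data (the data and the point x = 12 are arbitrary):

```r
# Simulated simple regression
set.seed(3)
x <- 1:25
y <- 10 + 1.5 * x + rnorm(25, sd = 3)
fit <- lm(y ~ x)

new_x <- data.frame(x = 12)

# 90% confidence interval for the mean of Y at x = 12
ci_90 <- predict(fit, new_x, interval = "confidence", level = 0.90)

# 90% prediction interval for an individual Y at x = 12
pi_90 <- predict(fit, new_x, interval = "prediction", level = 0.90)

ci_width <- ci_90[1, "upr"] - ci_90[1, "lwr"]
pi_width <- pi_90[1, "upr"] - pi_90[1, "lwr"]
pi_width > ci_width   # TRUE
```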

The Central Limit Theorem implies that a) the population will be approximately normal if n ≥ 30. b) the mean follows the same distribution as the population. c) the sampling distribution is approximately normal for large n.

the sampling distribution is approximately normal for large n

Statistical process control charts can measure a) both stability and capability b) the stability of the process c) the capability of the process d) neither stability nor capability

the stability of the process

Statistical process control charts can measure a. the stability of the process b. the capability of a process. c. both stability and capability.

the stability of the process

If a trend is given by yt = ae^(bt), assuming observations are positive, then a) the trend is exponentially decreasing b) the fitted trend value in period 0 is b c) the trend is exponentially increasing d) the trend is increasing if b > 0 and decreasing if b < 0

the trend is increasing if b > 0 and decreasing if b < 0

If the residuals from a fitted regression violate the assumption of homoscedasticity, we know that a) they are normally distributed. b) they are independent of one another. c) their variance is not constant. d) there are extreme outliers

their variance is not constant

In a multiple regression, all of the following are true regarding residuals except a) they are the differences between observed and predicted values of the response variable b) their sum always equals zero c) they may be used to detect heteroscedasticity d) they may be used to detect multicollinearity

they may be used to detect multicollinearity

After testing a hypothesis regarding the mean, we decided to reject H0. Thus, we are exposed to: a) Type I error. b) Type II error. c) either Type I or Type II error. d) neither Type I nor Type II error.

type I error

We could narrow a 90 percent confidence interval by

using a larger sample

The within-treatment variation reflects a) variation explained by factors included in the ANOVA model b) variation that is not part of the ANOVA model c) variation between individuals in different groups d) variation among individuals of the same group

variation among individuals of the same group

Variation "between" the ANOVA treatments represents a. Random variation. b. Variation explained by treatments. c. The effect of interaction. d. The effect of sample size

variation explained by treatments

Simple regression analysis means that a) we have only one explanatory variable. b) there are only two independent variables. c) the data are presented in a simple and clear way. d) we have only a few observations.

we only have one explanatory variable

In an ANOVA, when would the F-test statistic be zero? a. When there is no difference in the variances b. When the treatment means are the same c. When the observations are normally distributed d. The F-test statistic cannot ever be zero.

when the treatment means are the same
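When every group has the same sample mean, the between-group sum of squares is zero, so F = MSB/MSW is zero too. A tiny constructed example (the data are made up so that all three group means equal 2):

```r
# Three groups with identical means but nonzero within-group spread
g <- factor(rep(c("a", "b", "c"), each = 3))
y <- c(1, 2, 3,  1, 2, 3,  1, 2, 3)

anova_table <- summary(aov(y ~ g))[[1]]

# F statistic for the treatment effect
f_stat <- anova_table[["F value"]][1]
f_stat   # 0 (up to floating-point error)
```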

Which estimated multiple regression contains an interaction term? a) y = 47 − 12x1^2 + 5x2^2 b) y = 47 − 12x1 + 8x1^2 − 5x2 + 25x2^2 c) y = 47 − 12x1 + 8x1x2 − 5x2 d) none of the above

y = 47 - 12x1 + 8x1x2 - 5x2

Which estimated multiple regression allows a test for nonlinearity? a) y = 47 − 12x1 + 8x1x2 − 5x2 b) y = 47 − 12x1 + 5x2 + 13x3 c) y = 47 − 12x1 + 8x1^2 − 5x2 + 25x2^2

y = 47 − 12x1 + 8x1^2 − 5x2 + 25x2^2