Study guide IDS 371 multiple choice
True or false? If we desire α = 0.10, then a p-value of 0.15 would lead us to reject the null hypothesis.
False
Which of the following is a characteristic of the t distribution?
It has a mean of zero. It is a symmetric distribution. It is similar to the z distribution when n is large.
Which is not correct regarding the estimated slope of the OLS regression line? a) It is divided by its standard error to obtain its t statistic. b) It may be regarded as zero if its p-value is less than α. c) It is chosen so as to minimize the sum of squared errors. d) It shows the change in Y for a unit change in X.
It may be regarded as zero if its p-value is less than α.
Which one of the following R commands creates an exponential trend model? a. tslm(jetblue.ts~trend) b. tslm(log(jetblue.ts)~trend) c. EMA(jetblue.ts,n=1,ratio=0.1) d. SMA(jetblue.ts,n=2)
b. tslm(log(jetblue.ts)~trend)
The F statistic and its p-value give a global test of significance for a multiple regression. a) True b) False
True
After testing a hypothesis regarding the mean, we decided not to reject H0. Thus, we are exposed to a) Type I error b) Type II error
Type II error.
The Central Limit Theorem implies that
the distribution of the mean is approximately normal for large n.
Which is always in the center of the confidence interval?
the sample mean
In multiple regression analysis, a dummy (binary) variable is a) an additional quantitative variable b) a variable with three or more values c) a variable with only two values d) a new regression coefficient
a variable with only two values
A slow drift of measurements either up or down from the process centerline suggests a(n) a) instability b) trend c) cycle
trend
In correlation analysis, neither X nor Y is designated as the independent variable. a) True b) False
true
Which is not an assumption of ANOVA? a) Equal population sizes for groups b) Homogeneous treatment variances c) Independent sample observations d) Normality of the treatment populations
equal population sizes for groups
Analysis of variance is a technique used to test for a) equality of two or more variances b) equality of two or more means c) equality of more than two variances d) equality of a population mean and a given value
equality of two or more means
Which model assumes a constant percentage rate of growth? a) Linear b) Cubic c) Exponential d) Quadratic
exponential
The correlation coefficient r always has the same sign as b1 in Y = b0 + b1X. a) True b) False
true
True or false? A two-tailed hypothesis test for H0: μ = 20 at α = .05 is analogous to asking if a 95 percent confidence interval for μ contains 20.
true
True or false? An in-control process will always exhibit some common cause variation.
true
The ordinary least squares (OLS) method of estimation will minimize a) both the slope and the intercept b) only the intercept c) neither the slope nor the intercept d) only the slope
neither the slope nor the intercept
Which is not an assumption of least squares regression? a) Normal X values b) Nonautocorrelated errors c) Homoscedastic errors d) Normal errors
normal x values
Based on the following regression ANOVA table, what is the R2? Source df SS MS F Regression 4 1793.2356 448.3089 7.4854 Residual 45 2695.0996 59.8911 Total 49 4488.3352
.3995
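The arithmetic behind this answer can be checked with a few lines of Python using the values from the ANOVA table in the question. The variable names are chosen here for illustration.

```python
# Values copied from the regression ANOVA table in the question
ss_regression = 1793.2356
ss_residual = 2695.0996
ss_total = 4488.3352
df_regression = 4    # also the number of predictors
df_residual = 45

# R^2 is the share of total variation explained by the regression
r_squared = ss_regression / ss_total

# F is the ratio of the two mean squares
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

print(round(r_squared, 4), round(f_stat, 2))
```

The same table also gives the sample size: total df is n − 1 = 49, so n = 50.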
Degrees of freedom for the between-group variation in a one-factor ANOVA with n1=4, n2=3, n3=7 would be
2
Find the Cp index for a process with USL = 550, LSL = 540, μ = 545, and σ = 0.75
2.22
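The Cp index compares the width of the specification range with six process standard deviations. A short Python check of the value above:

```python
usl, lsl = 550, 540   # specification limits from the question
sigma = 0.75          # process standard deviation from the question

# Cp = (USL - LSL) / (6 * sigma); it ignores where the process is centered
cp = (usl - lsl) / (6 * sigma)
print(round(cp, 2))
```

Note that μ = 545 does not enter the Cp formula. The process mean matters only for the Cpk index.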
3 randomly chosen pieces of 4 types of PVC pipe of equal wall thickness are tested to determine the burst strength (in pounds per square inch) under 3 temperature conditions. Total degrees of freedom for the ANOVA would be
35
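The degrees of freedom for this two-factor design with replication can be broken down as follows. This is a sketch of the standard partition, with variable names chosen for illustration.

```python
pipe_types = 4     # levels of factor A
temperatures = 3   # levels of factor B
replicates = 3     # pieces tested per type/temperature combination

n_total = pipe_types * temperatures * replicates

df_total = n_total - 1
df_type = pipe_types - 1
df_temp = temperatures - 1
df_interaction = df_type * df_temp
df_error = pipe_types * temperatures * (replicates - 1)

print(df_total, df_type, df_temp, df_interaction, df_error)
```

The component degrees of freedom (type, temperature, interaction, error) add up to the total.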
Based on the following regression ANOVA table, how many predictors were used in the regression? Source df SS MS F Regression 4 1793.2356 448.3089 7.4854 Residual 45 2695.0996 59.8911 Total 49 4488.3352
4
In a simple bivariate regression with 50 observations, there will be ________ residuals.
50
In a regression with 7 predictors and 62 observations, the t test for each coefficient would use how many degrees of freedom?
54
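The rule used here is d.f. = n − k − 1, where k is the number of predictors and the extra 1 accounts for the intercept. A one-line check in Python:

```python
n_obs = 62        # observations
k_predictors = 7  # predictors in the model

# One degree of freedom is lost per predictor, plus one for the intercept
df_t_test = n_obs - k_predictors - 1
print(df_t_test)
```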
True or false? As n increases, the standard error increases.
false
Which of the following is most likely the cause of an oscillation in a SPC chart? a) Tool wear b) Temperature fluctuations c) Alternating samples from two machines d) A new worker
Alternating samples from two machines
In multiple regression analysis, testing the global null hypothesis that all regression coefficients are zero is based on a) A t statistic b) A z statistic c) An F statistic
An F statistic
A high variance inflation factor (VIF) indicates a significant predictor in the regression. a) True b) False
False
A binary (categorical) predictor should not be used along with nonbinary (numerical) predictors. a) True b) False
False
Which of the following is NOT a valid null hypothesis?
H0: μ ≠ 0
A sample of 17 ATM transactions shows a mean transaction time of 65 seconds with a standard deviation of 10 seconds. State the hypotheses to test whether the mean transaction time exceeds 60 seconds.
H0: μ ≤ 60, H1: μ > 60
Which is not true of the coefficient of determination? a) It is calculated using sums of squares (e.g., SSR, SSE, SST). b) It is the square of the coefficient of correlation. c) It is negative when there is an inverse relationship between X and Y. d) It reports the percentage of the variation in Y explained by X.
It is negative when there is an inverse relationship between X and Y
Which of the following is not true of the standard error of the regression? a) It is a measure of the accuracy of the prediction. b) It is based on squared vertical deviations between the actual and predicted values of Y. c) It would be negative when there is an inverse relationship in the model. d) It is used in constructing confidence and prediction intervals for Y.
It would be negative when there is an inverse relationship in the model.
If the specification subgroup size is n = 4 and the known process parameters are μ = 2.75 and σ = 0.044, which are the control limits for the x-bar chart?
LCL = 2.684, UCL = 2.816
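With known process parameters, the x-bar chart limits are μ ± 3σ/√n. A short Python check of the limits above:

```python
import math

mu = 2.75      # known process mean
sigma = 0.044  # known process standard deviation
n = 4          # subgroup size

# Three standard errors of the subgroup mean on each side of the centerline
margin = 3 * sigma / math.sqrt(n)
lcl = mu - margin
ucl = mu + margin

print(round(lcl, 3), round(ucl, 3))
```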
A researcher is studying the effect of 10 different variables on a critical measure of business performance. A multiple regression analysis including all 10 variables is performed. What criterion could be used to eliminate 1 of the 10 variables? a) Multiple R^2 b) Largest p-value c) Smallest p-value d) Smallest regression coefficient
Largest p-value
What statement is most nearly correct, other things being equal?
Quadrupling the sample size will cut the standard error in half
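This follows from the standard error formula σ/√n: multiplying n by 4 multiplies √n by 2. A minimal Python check, using arbitrary illustrative values for σ and n:

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean."""
    return sigma / math.sqrt(n)

sigma = 10.0   # arbitrary illustrative value
n = 25         # arbitrary illustrative value

ratio = standard_error(sigma, 4 * n) / standard_error(sigma, n)
print(ratio)
```

The ratio does not depend on the values chosen for σ or n.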
If the trend model yt = a + bt + ct^2 is fitted to a time series, we would get a) R^2 that could be either higher or lower than the linear model b) no R^2 for this type of model because it is nonlinear c) R^2 that is at least as high as the linear model d) R^2 that could be lower than the linear model
R^2 that is at least as high as the linear model
Which is correct to find the value of the coefficient of determination (R^2)? a) SSR/SST b) 1 - SST/SSE c) SSR/SSE d) none of the above
SSR/SST
Which of the following is not true of seasonal dummy variables? a) Their number should be one less than the number of seasons. b) They are used to describe the seasonality in quarterly, monthly, or other increment time series. c) Their number should equal the number of seasons.
Their number should equal the number of seasons.
The effect of a binary predictor is to shift the regression intercept. a) True b) False
True
A widening pattern of residuals as X increases would suggest heteroscedasticity. a) True b) False
True
If R2 = .36 in the model Sales = 268 + 7.37 Ads with n = 50, the two-tailed test for correlation at α = .05 would say that there is a significant correlation between Sales and Ads. a) True b) False
True (Given R2 = 0.36, r = 0.6. tcalc = r[(n − 2)/(1 − r^2)]^(1/2) = (.60)[(50 − 2)/(1 − .36)]^(1/2) = 5.196 > t.025 = 2.011 for d.f. = 50 − 2 = 48.)
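The test statistic in this answer can be reproduced in Python. The critical value 2.011 is taken from the answer itself rather than computed, since the standard library has no t quantile function.

```python
import math

r_squared = 0.36
n = 50

r = math.sqrt(r_squared)   # positive root, since the slope 7.37 is positive
t_calc = r * math.sqrt((n - 2) / (1 - r ** 2))

t_critical = 2.011         # t.025 for d.f. = 48, as quoted in the answer
print(round(t_calc, 3), t_calc > t_critical)
```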
Concerning confidence intervals, which statement is most nearly correct?
We use the Student's t distribution when σ is unknown.
When does multicollinearity occur in a multiple regression analysis? a) When the dependent variables are highly correlated b) When the regression coefficients are correlated c) When the independent variables are highly correlated d) When the independent variables have no correlation
When the independent variables are highly correlated
Multiple regression analysis is applied when analyzing the relationship between a) An independent variable and several dependent variables b) A dependent variable and several independent variables c) Several dependent variables and several independent variables d) Several regression equations and a single sample
a dependent variable and several independent variables
A time series is
a set of sequential observations of a variable over time
The Central Limit Theorem (CLT)
applies to any population
Given "Price" is the response and "Chicago" and "Sqft" are predictors. Which of the following notations ensures that the interaction term is part of the regression model in R? a) Chicago+Sqft b) Chicago:Sqft c) Chicago*Sqft d) b) and c) e) All of the above
d) b) and c): both Chicago:Sqft and Chicago*Sqft include the interaction term
Given H0: μ ≥ 18 and H1: μ < 18, we would commit a Type I error if we
conclude that μ < 18 when the truth is that μ ≥ 18.
A fitted multiple regression equation is Y = 12 + 3X1 − 5X2 + 7X3 + 2X4. When X1 increases 2 units and X2 increases 2 units as well, while X3 and X4 remain unchanged, what change would you expect in your estimate of Y?
decrease by 4
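The expected change in Y is the sum of each coefficient times the change in its predictor. The intercept drops out. A small Python check, with dictionary names chosen for illustration:

```python
# Slope coefficients from the fitted equation Y = 12 + 3X1 - 5X2 + 7X3 + 2X4
coefficients = {"X1": 3, "X2": -5, "X3": 7, "X4": 2}

# Changes in each predictor described in the question
changes = {"X1": 2, "X2": 2, "X3": 0, "X4": 0}

delta_y = sum(coefficients[name] * changes[name] for name in coefficients)
print(delta_y)
```

A negative result means the estimate of Y decreases by that amount.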
For a given sample size, when we increase the probability of a Type I error, the probability of a Type II error
decreases
In regression, the variable predicted is called the a) regression variable b) independent variable c) dependent variable d) predictor
dependent variable
One-factor analysis of variance a. requires that the number of observations in each group be identical. b. has less power when the number of observations per group is not identical. c. is extremely sensitive to slight departures from normality.
has less power when the number of observations per group is not identical
Which statement is incorrect? a) Binary predictors shift the intercept of the fitted regression. b) If a qualitative variable has c categories, we would use only c − 1 binaries as predictors. c) If there is a binary predictor in the model, the residuals may not sum to zero. d) A binary predictor has the same t test as any other predictor.
if there is a binary predictor in the model the residuals may not sum to zero
Simple tests for nonlinearity in a regression model can be performed by a) deleting predictors one at a time b) squaring the standard error c) including squared predictors d) none of the above
including squared predictors
Mary used a sample of 68 large U.S. cities to estimate the relationship between Crime (annual property crimes per 100,000 persons) and Income (median annual income per capita, in dollars). Her estimated regression equation was Crime = 428 - 0.050 Income. If Income decreases by 1000, we would expect that Crime will a) remain unchanged. b) decrease by 50. c) increase by 50. d) increase by 428.
increase by 50
The time-series model Y = T * C * S * I a) is an additive model. b) is an exponential model. c) is a multiplicative model. d) is a polynomial model.
is a multiplicative model
The rejection region in a hypothesis test
is an area in the tail(s) of a sampling distribution
On a particular morning the length of time spent in the examination rooms is recorded for each patient seen by each physician at an orthopedic clinic. The data are recorded for four physicians. Suppose the physician names are stored in the "Physician" column and times are recorded in the "Time" column. The data frame is attached to the R search path. What does tapply(Time,Physician,mean) do in R-Studio? a. It runs a one factor ANOVA test. b. It computes the overall average time for all physicians. c. It computes the average time for each physician.
it computes the average time for each physician
What does the following code do in R? glm(y~x,family=binomial(link="logit")) a) It creates a simple linear regression model. b) It creates a logistic regression model. c) It plots the fitted line for a logistic regression. d) It plots the fitted line for a simple linear regression model.
it creates a logistic regression model
Which is a characteristic of the variance inflation factor (VIF)? It is insignificant unless the corresponding t statistic is significant It indicates the predictor's degree of multicollinearity It measures the degree of significance of each predictor It reveals collinearity rather than multicollinearity
it indicates the predictors degree of multicollinearity
Which statement about α is NOT correct?
it is equal to B
Which is not true of the logistic regression model? a) It is nonlinear. b) It can best be fitted using the maximum likelihood method. c) Its predictions are either 0 or 1. d) It cannot yield predictions greater than 1
its predictions are either 0 or 1
In a simple regression, which would suggest a significant relationship between X and Y? a) Large p-value for the estimated slope b) Large t statistic for the slope c) Large p-value for the F statistic d) Small t statistic for the slope
large t statistic for the slope
When the dependent variable is binary (0 or 1), we need a) stepwise regression. b) data transformation to improve conditioning. c) best subsets regression. d) logistic regression
logistic regression
Is a process capable if USL = 550, LSL = 540, μ = 542, and σ = 1.25? a) No, but very close b) No, clearly not capable c) Yes, just barely capable d) Yes, highly capable
no clearly not capable
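Because the process mean is off-center here, the Cpk index is the relevant measure: it uses the distance from μ to the nearer specification limit. A short Python check follows. The threshold of 1 used to judge capability is a common rule of thumb, assumed here rather than stated in the question.

```python
usl, lsl = 550, 540
mu, sigma = 542, 1.25

# Cp looks only at the spec width; Cpk also accounts for centering
cp = (usl - lsl) / (6 * sigma)
cpk = min(usl - mu, mu - lsl) / (3 * sigma)

print(round(cp, 3), round(cpk, 3))
```

Cp alone would look acceptable, but Cpk falls well below 1 because the mean sits only 2 units (1.6 standard deviations) above the lower limit.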
Which is correct concerning a two-factor without replication ANOVA? a. No interaction effect is estimated. b. The interaction effect would have its own F statistic. c. The interaction would be insignificant unless the main effects were significant.
no interaction effect is estimated
Which is not assumed in ANOVA? a. Observations are independent. b. Populations are normally distributed. c. Variances of all treatment groups are the same. d. Population variances are known.
population variances are known
Which one of the following computes t score for 90% confidence interval for sample size 20 in R? a) qt(0.9,20) b) qt(0.95,20) c) qt(0.95,19) d) qt(0.8,19)
qt(0.95,19)
The width of a confidence interval for μ is not affected by
sample mean
The four components of a time series are which of the following? a) Cycle, seasonal, irregular, regular b) Month, cycle, seasonal, irregular c) Seasonal, cycle, irregular, trend d) Cycle, season, month, day
seasonal, cycle, irregular, trend
The critical value in a hypothesis test a) is calculated from the sample data. b) depends on the value of the test statistic. c) separates the acceptance and rejection regions. d) has to be between 0 and 1.
separates the acceptance and rejection regions
A level shift in a process is indicated when samples a) shift abruptly to a new mean b) vary more than expected c) drift slowly either upward or downward d) tend to alternate between high and low values
shift abruptly to a new mean
for(i in 1:10000){ means[i]<-mean(sample(3:6,50,replace=T)) } What does the above R code do? a) simulates a sampling distribution for population 3:6 for a sample size of 50. b) simulates a sampling distribution for population 3:6 for a sample size of 10,000. c) draws a histogram. d) finds the mean for population 3:6.
simulates a sampling distribution for population 3:6 for a sample size of 50.
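The R loop in the question draws 10,000 samples of size 50 (with replacement) from the values 3, 4, 5, 6 and stores each sample mean. A rough Python equivalent is sketched below. The seed is arbitrary and only makes the run repeatable.

```python
import random
from statistics import mean, stdev

random.seed(1)              # arbitrary seed, for repeatability only
population = [3, 4, 5, 6]   # same values as 3:6 in R

# 10,000 repetitions; each one is the mean of a sample of size 50
means = [mean(random.choices(population, k=50)) for _ in range(10_000)]

print(round(mean(means), 2))    # should be close to the population mean, 4.5
print(round(stdev(means), 3))   # should be close to sigma / sqrt(50)
```

The number 10,000 is the number of simulated samples, while 50 is the sample size, which is why option a) is correct and option b) is not.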
Which of the following is NOT a characteristic of the F-test in a simple regression? a) It is a test for overall fit of the model. b) The test statistic can never be negative. c) It requires a table with numerator and denominator degrees of freedom. d) The F-test gives a different p-value than the t-test.
the f-test gives a different p value than the t test
Which of the following would be most useful in checking the normality assumption of the errors in a regression model? a) The t statistics for the coefficients. b) The F-statistic from the ANOVA table. c) The histogram of residuals.
the histogram of residuals
How does the confidence interval change as population standard deviation, σ, increases? a) The interval gets wider as σ increases. b) The interval stays the same as σ increases. c) The interval gets narrower as σ increases.
the interval gets wider as σ increases
When comparing the 90 percent prediction and confidence intervals for a given regression analysis a) the prediction interval is narrower than the confidence interval. b) the prediction interval is wider than the confidence interval. c) there is no difference between the size of the prediction and confidence intervals. d) no generalization is possible about their comparative width
the prediction interval is wider than the confidence interval
The Central Limit Theorem implies that a) the population will be approximately normal if n ≥ 30. b) the mean follows the same distribution as the population. c) the sampling distribution is approximately normal for large n.
the sampling distribution is approximately normal for large n
Statistical process control charts can measure a) both stability and capability b) the stability of the process c) the capability of the process d) neither stability nor capability
the stability of the process
Statistical process control charts can measure a. the stability of the process b. the capability of a process. c. both stability and capability.
the stability of the process
If a trend is given by yt = ae^(bt), assuming observations are positive, then a) the trend is exponentially decreasing b) the fitted trend value in period 0 is b c) the trend is exponentially increasing d) the trend is increasing if b > 0 and decreasing if b < 0
the trend is increasing if b > 0 and decreasing if b < 0
If the residuals from a fitted regression violate the assumption of homoscedasticity, we know that a) they are normally distributed. b) they are independent of one another. c) their variance is not constant. d) there are extreme outliers
their variance is not constant
In a multiple regression, all of the following are true regarding residuals except a) they are the differences between observed and predicted values of the response variable b) their sum always equals zero c) they may be used to detect heteroscedasticity d) they may be used to detect multicollinearity
they may be used to detect multicollinearity
After testing a hypothesis regarding the mean, we decided to reject H0. Thus, we are exposed to: a) Type I error. b) Type II error. c) either Type I or Type II error. d) neither Type I nor Type II error.
type I error
We could narrow a 90 percent confidence interval by
using a larger sample
The within-treatment variation reflects a) variation explained by factors included in the ANOVA model b) variation that is not part of the ANOVA model c) variation between individuals in different groups d) variation among individuals of the same group
variation among individuals of the same group
Variation "between" the ANOVA treatments represents a. Random variation. b. Variation explained by treatments. c. The effect of interaction. d. The effect of sample size.
variation explained by treatments
Simple regression analysis means that a) we have only one explanatory variable. b) there are only two independent variables. c) the data are presented in a simple and clear way. d) we have only a few observations.
we have only one explanatory variable
In an ANOVA, when would the F-test statistic be zero? a. When there is no difference in the variances b. When the treatment means are the same c. When the observations are normally distributed d. The F-test statistic cannot ever be zero.
when the treatment means are the same
Which estimated multiple regression contains an interaction term? a) y = 47 - 12x1^2 + 5x2^2 b) y = 47 - 12x1 + 8x1^2 - 5x2 + 25x2^2 c) y = 47 - 12x1 + 8x1x2 - 5x2 d) none of the above
y = 47 - 12x1 + 8x1x2 - 5x2
Which estimated multiple regression allows a test for nonlinearity? a) y = 47 − 12x1 + 8x1x2 − 5x2 b) y = 47 − 12x1 + 5x2 + 13x3 c) y = 47 − 12x1 + 8x1^2 − 5x2 + 25x2^2
y = 47 − 12x1 + 8x1^2 − 5x2 + 25x2^2