DS101 Final answers


The manager of a warehouse monitors the volume of shipments made by the delivery team. The automated tracking system tracks every package as it moves through the facility. A sample of 25 packages is selected and weighed every day. On the basis of contracts with customers, the mean weight should be μ=22 pounds with σ=5 pounds. The standard error of the daily average is SE(Xbar)=1.

True
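
The card above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the usual formula SE(Xbar) = σ/sqrt(n); the function name `standard_error` is made up for illustration.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of a sample mean: sigma divided by sqrt(n)."""
    return sigma / math.sqrt(n)

# Values from the card: sigma = 5 pounds, n = 25 packages per day.
se = standard_error(5, 25)
print(se)
```

With σ=5 and n=25 the divisor is sqrt(25)=5, so the standard error comes out to 1 pound.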

The number of rows in a data table is indicated by the symbol n.

True

The empirical rule indicates that the range from ybar−s up to ybar+s holds​ two-thirds of the distribution of any numerical variable.

The statement is false because the empirical rule works well only when the distribution of the numerical variables is unimodal and symmetric.

If a variable X is associated with a variable​ Y, then Y is caused by X.

The statement is false. Association does not imply causation.

The 50th percentile is equivalent to Q1

The statement is false. The 50th percentile is equivalent to Q2

A large VIF (e.g., 10 or more) would indicate multicollinearity.

true

A line with positive slope describes a linear pattern with a positive direction.

true

A partial F test is used to assess when at least one variable in a subset of squared and interaction variables in the multiple regression model is significant.

true

A scatterplot graphically shows the relationship between two variables.

true

Adjusted R squared is less than regular R squared

true
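
One way to see why adjusted R squared sits below regular R squared is to compute it. The sketch below assumes the common formula 1 − (1 − R²)(n − 1)/(n − k − 1), with n cases and k explanatory variables; the numbers are hypothetical.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R squared for n cases and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical fit: R squared of 0.80 from 30 cases and 3 predictors.
r2 = 0.80
adj = adjusted_r2(r2, n=30, k=3)
print(adj < r2)  # the penalty for extra predictors pulls the value down
```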

By increasing the sample sizes from n=100 to n=​400, we can reduce the margin of error by​ 50%.

true
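
The 50% figure follows from the sqrt(n) in the denominator of the margin of error. A rough sketch, assuming a margin of error of the form z·s/sqrt(n) with z = 2 and a made-up standard deviation of 10:

```python
import math

def margin_of_error(s: float, n: int, z: float = 2.0) -> float:
    """Approximate 95% margin of error for a mean: z * s / sqrt(n)."""
    return z * s / math.sqrt(n)

# s = 10 is a hypothetical value; only the ratio matters here.
moe_100 = margin_of_error(10.0, 100)
moe_400 = margin_of_error(10.0, 400)
ratio = moe_400 / moe_100
print(ratio)
```

Quadrupling n doubles sqrt(n), so the margin of error is cut in half regardless of the value chosen for s.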

Half of the numbers in a correlation matrix are​ redundant, including the diagonal.

true

If all of the data lie along a single line with nonzero​ slope, then the rsquared of the regression is 1.​ (Assume the values of the explanatory variable are not​ identical.)

true

If the correlation between the explanatory variable and response is​ zero, then the slope will also be zero.

true

If zero lies inside the​ 95% confidence interval for μ​, then zero is also inside the​ 99% confidence interval for μ.

true
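
The nesting of confidence intervals can be illustrated numerically. This is a sketch under assumed values (an estimate of 1.5 with standard error 1.0, both hypothetical) using z-intervals built from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

def z_interval(estimate: float, se: float, confidence: float):
    """Two-sided z-interval: estimate plus or minus z * se."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return (estimate - z * se, estimate + z * se)

# Hypothetical estimate and standard error, chosen so that zero
# falls inside the 95% interval.
ci95 = z_interval(1.5, 1.0, 0.95)
ci99 = z_interval(1.5, 1.0, 0.99)
print(ci95, ci99)
```

Because the 99% interval uses a larger z multiplier around the same center, it contains the whole 95% interval, including zero.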

Prediction intervals get wider as you extrapolate outside the range of the data.

true

Regression predictions become less reliable as we extrapolate farther from the observed data.

true

Inverse of the log transformation

exp(x)

The correlation between sales and advertising when both are measured in millions of dollars is 0.65. The correlation remains the same if these variables are converted into thousands of dollars.

true

The correlation coefficient is unit-free.

true
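
A small check that rescaling units leaves the correlation unchanged. The data below are invented for illustration (they are not the sales and advertising figures from the card), and `pearson_r` is a hand-rolled helper rather than a library call.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values in millions of dollars.
sales_millions = [1.2, 2.5, 3.1, 4.8, 5.0]
ads_millions = [0.3, 0.4, 0.9, 0.8, 1.5]

r_millions = pearson_r(sales_millions, ads_millions)
# Convert both variables to thousands of dollars and recompute.
r_thousands = pearson_r([1000 * v for v in sales_millions],
                        [1000 * v for v in ads_millions])
print(r_millions, r_thousands)
```

The scale factors cancel between the numerator and the denominator, which is why r carries no units.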

The estimated value yhat=B0+B1x approximates the average value of the response when the explanatory variable equals x.

true

The explanatory variable defines the​ x-axis in a scatterplot.

true

The normal probability plot is a residual plot that checks the normality assumption.

true

The primary use of stepwise regression is to identify the most important ________ that should be included in the multiple regression model.

independent variables

A retailer maintains a Web site that it uses to attract shoppers. The average purchase amount is $80. The retailer is evaluating a new Web site that would, it hopes, encourage shoppers to spend more. Let μ represent the average amount spent per customer at its redesigned Web site. If the α-level of the test is α=0.05, then there is at most a 5% chance of incorrectly rejecting H0.

true

Shown in the bar chart of a categorical variable

marginal distribution

Smallest α-level for which a test rejects the null hypothesis

p-value

Indicates a statistically significant result

p-value < alpha

Measure of association that lies between 0 and 1

​Cramer's V

To estimate the value of p, the population proportion of successes, use the point estimate x.

False. To estimate the value of p, use the point estimate phat=x/n.

The value of Rsqr is 1 if data lie along a single line. Is it possible to fit a linear regression for which Rsqr is exactly equal to​ zero?

​Yes, it is possible to fit a linear​ regression, but the​ slope, b1​, will be zero.

Measure of association between two categorical variables that grows with increased sample size

​chi-squared

Occurs if the​ p-value is larger than α when H0 is false

Type 2 Error

The Web site of a photo processor allows customers to send digital files of pictures to be printed on​ high-quality paper with durable inks. When the inks used by the processor start to run​ out, the color mix in the pictures gradually degrades. Complete parts​ (a) and​ (b) below. ​(a) How could the facility use a quality control process to identify when it needed to switch ink​ cartridges? Will it be necessary to check every photo or only a sample of​ photos? ​(b) If the facility samples​ photos, would it do better to group the photos into a large batch and then calculate a mean and standard deviation​ (for the measured color​ mix) or only wait a little while before calculating the mean and​ SD?

(a) Inspect a sample of photos using a system that can assess the color mix. It is not necessary to check every photo as a sample is enough. (b) If the problem is easily​ detected, only a few photos need to be batched. If the problem is​ subtle, the facility needs a large batch.

Consider each situation given in parts​ (a) through​ (d). Identify the population and the​ sample, explain what the parameter p or μ ​represents, and tell whether a confidence interval can be created. If ​so, indicate what the interval would say about the parameter. (a) Identify the parameter. Choose the correct answer below. (b) Determine whether the confidence interval can be calculated. Select all that apply. (c)If a confidence interval can be​ created, indicate what the interval would say about the parameter. Choose the correct answer below.

(a) The parameter is​ p, the proportion of all people who recently bought new kitchen appliances that expressed dissatisfaction with the salesperson. (b)The confidence interval can be calculated because all assumptions and conditions have either been satisfied or it is reasonable to assume that they have been satisfied. (c) The interval would give a range of plausible values in which the parameter is likely to lie.

Which of the following processes would you expect to be under​ control, and which would you expect not to be under​ control? Explain briefly why or why not. ​(a) Daily sales at each checkout line in a supermarket ​(b) Number of weekday calls to a telephone help line ​(c) Monthly volume of shipments of video game software ​(d) Dollar value of profits of a new startup company

(a) This process would likely be under​ control, unless​ there's a special sale or weekend shopping surge. (b) This process would likely be under​ control, unless some problem was discovered in their​ product, causing a surge in calls. (c) This process would likely be out of​ control, due to surges during the holiday season. (d) This process would likely be out of​ control, as sales typically have strong upward​ (or downward) trends for a startup company.

The correlation coefficient may assume any value between

-1 and +1

Dummy variables take on the values of ________ and are used to model the effects of different levels of qualitative variables.

0 or 1

What does the coefficient of determination equal if r = 0.89?

0.7921

What is a control​ chart? Describe its use.

A control chart is a graphical device used for monitoring process​ variation, identifying when to take action to improve the​ process, and assisting in diagnosing the causes of process variation.

A sales manager for an advertising agency believes that there is a relationship between the number of contacts that a salesperson makes and the amount of sales dollars earned. What is the dependent variable?

Amount of sales dollars

Identifies the intercept in a fitted line

B0

Identifies the slope in a fitted line

B1

The SRM assumes that the model errors have this property. A. Heteroscedasticity B. Scatterplot of Y on X C. Homoscedasticity D. Random sample from a population E. Outlier F. ​Durbin-Watson statistic G. Leveraged H. Normal quantile plot of residuals I. Timeplot of residuals J. Plot of residuals on x

C

Use this plot to check for dependence in data over time. A. Normal quantile plot of residuals B. Plot of residuals on x C. Timeplot of residuals D. Scatterplot of y on x

C

Percentage variation described by a fitted line

r^2

To identify the presence of​ curvature, it can be helpful to begin by fitting a line and plotting the residuals from the linear equation. A. This statement is false. Residuals cannot be calculated if an association is not linear. B. This statement is false. Fitting a curve and plotting the residuals from the nonlinear equation will help identify the presence of curvature. C. This statement is false. Residuals do not indicate whether or not an association is linear. D. This statement is true.

D

Use this plot to check the linear enough condition. A. Normal quantile plot of residuals or a scatterplot of y on x B. Plot of residuals on x or a normal quantile plot of residuals C. Normal quantile plot of residuals D. Scatterplot of y on x or a plot of residuals on x E. Timeplot of residuals F. Scatterplot of y on x or a timeplot of residuals G. Timeplot of residuals or a plot of residuals on x

D

An analyst is trying to purchase a large tract of land. The current owner of the tract has already subdivided the land into separate building lots and has prepared the lots by removing some of the trees. The developer wants to forecast the value of each lot. From previous experience, she knows that the most important factors affecting the price of a lot are size, number of mature trees, and distance to the lake. She runs the following multiple regression model for her analysis: Price = β0 + β1 LotSize + β2 Trees + β3 Distance + ε. Identify the dependent and independent variables.

Dependent variable: Price; Independent variables: Lot Size, Trees, Distance

x represents the number of home theater systems sold per month at an electronics store

Discrete variable

Statistic used to detect dependence in sequences of residuals A. Heteroscedasticity B. Leveraged C. ​Durbin-Watson statistic D. Random sample from a population E. Scatterplot of Y on X F. Timeplot of residuals G. Outlier H. Normal quantile plot of residuals I. Plot of residuals on x J. Homoscedasticity

Durbin-Watson statistic
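
For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal sketch with invented residual sequences:

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals, listed in time order.
alternating = [1, -1, 1, -1]   # sign flips every step
clustered = [1, 1, -1, -1]     # neighbors tend to share a sign

dw_alternating = durbin_watson(alternating)
dw_clustered = durbin_watson(clustered)
print(dw_alternating, dw_clustered)
```

Values near 2 suggest no autocorrelation; values well below 2 point to positive autocorrelation and values well above 2 to negative autocorrelation.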

Use this plot to check the similar variances condition. A. Outlier B. Timeplot of residuals C. Heteroscedasticity D. Normal quantile plot of residuals E. Plot of residuals on x F. Durbin-Watson statistic G. Homoscedasticity H. Leveraged I. Random sample from a population

E

What is the name of the variable that is used to predict another variable?

Explanatory

Consider the following simple linear regression model: y=B0+B1x+epsilon . The random error term is ________.

Epsilon

s/sqrt(n) (s divided by the square root of n)

Estimated standard error of Ybar

Use this plot to check the nearly normal condition. A. Plot of residuals on x B. Leveraged C. Random sample from a population D. ​Durbin-Watson statistic E. Homoscedasticity F. Normal quantile plot of residuals G. Outlier H. Heteroscedasticity I. Timeplot of residuals J. Scatterplot of Y on X

F

Ninety-five percent​ z-intervals have the form of a statistic plus or minus 3 standard errors of the statistic.

False. Ninety-five percent​ z-intervals have the form of a statistic plus or minus 2 standard​ error(s) of the statistic.

Consider the following simple linear regression model: y=B0+B1x+epsilon When determining whether there is a negative linear relationship between x and y, the alternative hypothesis takes the form ________.

H1: B1<0

Which alternative hypothesis should be used to test the significance of a positive slope in a regression model?

H1: B1>0

Term that describes data with unequal error variation Outlier Heteroscedasticity Leveraged Homoscedasticity

Heteroscedasticity

Symbol for the standard deviation of the residuals

Se (standard error)

A histogram with a long right tail. Options: interquartile range, standard deviation, skewed, variance, z-score

Skewed

The square root of the variance.

Standard deviation

In​ practice, xbar and R-charts are used together to monitor a process.​ However, the​ R-chart should be interpreted before the xbar​-chart. ​Why?

The control limits of the xbar-chart are a function of​ R, meaning that if the process variation is out of statistical​ control, the control limits of the xbar​-chart have little meaning.

What does it mean to say​ "correlation does not imply​ causation"?

The fact that two variables are strongly correlated does not in itself imply a​ cause-and-effect relationship between the variables.

Suppose you were looking at the histogram of the incomes of all of the households in the United States. Do you think that the histogram would be bell​ shaped? Skewed to the left or​ right?

The histogram would be heavily​ right-skewed.

What requirements are necessary for a normal probability distribution to be a standard normal probability​ distribution?

The mean and standard deviation have the values of mean=0 and sigma=1

A summary of sales made in the quarterly report of a department store says that the average retail purchase was​ $125 with a margin of error equal to​ $15. What does the margin of error mean in this​ context?

The population average of sales is within​ $15 of the​ estimate, with some degree of confidence.

In the sample regression equation yhat=B0+B1x, what is yhat?

The predicted value of y, given a specific x value

Describe the range of values for the correlation coefficient.

The range of values for the correlation coefficient is −1 to​ 1, inclusive.

An accountant at a retail shopping chain accidentally calculated the correlation between the phone number of customers and their outstanding debt. He should expect to find a substantial positive correlation.

The statement is false. The accountant should expect to find no correlation.

In a​ scatterplot, the response is shown on the horizontal axis with the explanatory variable on the vertical axis.

The statement is false. The explanatory variable is shown on the horizontal axis with the response on the vertical axis.

​Cramer's V is 0 if the categorical variables are not associated.

The statement is true.
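
The contrast between chi-squared and Cramer's V can be sketched directly. The tables below are hypothetical, and the helpers assume the standard definitions: chi-squared sums (observed − expected)²/expected over the cells, and V = sqrt(chi-squared / (n · min(rows − 1, cols − 1))).

```python
import math

def chi_squared(table):
    """Chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def cramers_v(table):
    """Cramer's V: chi-squared rescaled to lie between 0 and 1."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_squared(table) / (n * k))

# Same proportions, ten times the sample size (hypothetical counts).
small = [[30, 20], [20, 30]]
large = [[300, 200], [200, 300]]
print(chi_squared(small), chi_squared(large))
print(cramers_v(small), cramers_v(large))
```

Chi-squared grows tenfold with the larger table while Cramer's V stays put, and a table with no association gives V = 0.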

As the size of a sample​ increases, the standard deviation of the distribution of sample means increases.

This statement is false. A true statement​ is, "As the size of a sample​ increases, the standard deviation of the distribution of sample means​ decreases."

All other things the​ same, a 90​% confidence interval is shorter than a 99​% confidence interval.

True

Auditors at a bank randomly sample 100 withdrawal transactions made at ATM machines each day and use video records to verify that authorized users of the accounts made the transactions. The system records the amounts withdrawn. The average withdrawal is typically​ $50 with SD​ $40. Deposits are handled separately. A histogram of the average withdrawal amounts made daily over the span of a month should cluster around​ $50.

True

Auditors at a bank randomly sample 100 withdrawal transactions made at ATM machines each day and use video records to verify that authorized users of the accounts made the transactions. The system records the amounts withdrawn. The average withdrawal is typically​ $50 with SD​ $40. Deposits are handled separately. A histogram of the daily standard deviations of the withdrawal amounts over the span of a month should cluster around​ $40.

True

Cases is another name for the columns in a data table.

False. Cases is another name for the rows of a data table; the columns are the variables.

If the 90​% confidence interval for the average purchase of customers at a department store is ​$40 to​ $120​, then ​$100 is a plausible value for the population mean at this level of confidence.

True

Occurs if the​ p-value is less than the α​-level when H0 is true

Type 1 error

Multicollinearity is suspected when ________.

there is a high Rsqr coupled with insignificant explanatory variables

When estimating a population​ mean, are you more likely to be correct when you use a point estimate or an interval​ estimate? Explain your reasoning.

You are more likely to be correct using an interval estimate because it is unlikely that a point estimate will exactly equal the population mean.

The forward selection method of stepwise regression

adds predictors one at a time starting with the best single predictor.

Maximum tolerance for incorrectly rejecting H0

alpha level

A multiple regression model includes (X1 X2). The term is called ________.

an interaction

Counts cases that match values of two categorical variables

cell

Shown in a stacked bar chart

conditional distribution

Table of​ cross-classified counts

contingency table

x represents the volume of milk taken from one cow for a day

continuous

Autocorrelation occurs when the residuals are

correlated among each other

For the multiple regression model yhat = 75 + 25x1 − 15x2 + 10x3, if we were to increase x2 by 5, holding x1 and x3 constant, the value of y will:

decrease on average by 75
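
The answer above is the x2 coefficient times the change in x2. A quick sketch using the fitted equation from the card; the starting values of x1, x2, and x3 are arbitrary.

```python
def predict(x1, x2, x3):
    """Fitted equation from the card: 75 + 25*x1 - 15*x2 + 10*x3."""
    return 75 + 25 * x1 - 15 * x2 + 10 * x3

# Raise x2 by 5 while holding x1 and x3 fixed (starting values are arbitrary).
change = predict(1, 7, 3) - predict(1, 2, 3)
print(change)
```

The difference is −15 × 5 = −75 whatever values x1 and x3 are held at.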

In the quadratic model y = β0 + β1x + β2x^2 + ε, a negative value of β1 indicates a downward concavity.

False. The concavity is determined by the sign of β2, not β1.

In regression, multicollinearity is considered problematic when two or more explanatory variables are ________.

highly correlated

In regression analysis, which shape of residual plot demonstrates homoscedasticity?

horizontal band

Like a stacked bar chart but respecting the area principle

mosaic plot

If the simple correlation coefficient between two independent variables is greater than 0.90, then ________ is considered to be severe.

multicollinearity

In multiple regression analysis, when the independent variables are highly correlated, this situation is called ________.

multicollinearity

The Variance Inflation Factor (VIF) is used to assess ________________.

multicollinearity
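
For a sense of scale, the VIF for predictor j is 1/(1 − Rj²), where Rj² comes from regressing that predictor on the other explanatory variables. The sketch below assumes that definition; the input values are hypothetical.

```python
def vif(r2_j: float) -> float:
    """Variance inflation factor from the R squared of x_j on the other predictors."""
    return 1 / (1 - r2_j)

# With only two predictors, R_j squared is their squared correlation.
vif_unrelated = vif(0.0)           # predictors uncorrelated
vif_collinear = vif(0.95 ** 2)     # hypothetical correlation of 0.95
print(vif_unrelated, vif_collinear)
```

A VIF of 1 means no inflation, and a correlation of 0.95 between two predictors already pushes the VIF past the common threshold of 10.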

The confidence interval estimate of the expected value of y for a given value x, compared to the prediction interval of y for the same given value of x and confidence level, will be:

narrower

In regression modeling, if you take a particular x value and plug it into a regression line equation, the result is a(n) ____________________.

point

Which process parameter is an x​bar-chart used to​ monitor?

process mean

What characteristic of a process is an R​-chart designed to​ monitor?

process variation

Simple linear regression analysis differs from multiple regression analysis in that ________.

simple linear regression uses only one explanatory variable

________ is an iterative variable selection procedure that allows an independent variable to be added to a multiple regression model in one iteration and deleted during the next iteration.

stepwise regression

Even if all the points on an xbar-chart fall between the control​ limits, the process may be out of control. Explain.

the process may be out of control because there may be nonrandom patterns of variation that have not yet broken through the control limits​ (or may never break​ through).

Autocorrelation is typically observed in ________.

time series data

The Cp statistic is used

to choose the best model in regression model-building.

Symbol for the explanatory variable in a regression

x

Which statistic is the best unbiased estimator for μ​?

xbar

Symbol for the response in a regression

y

Residual from an estimated regression equation

y − yhat
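
Residuals are easy to compute once a line has been fitted. This is a minimal least-squares sketch on made-up data; `fit_line` is an illustrative helper, not a library function.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for a simple regression."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]                  # values on the fitted line
residuals = [yi - fi for yi, fi in zip(y, fitted)]   # observed minus fitted
print(b0, b1, residuals)
```

Each residual is the observed response minus the value on the fitted line, and with an intercept in the model the residuals sum to zero up to rounding.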

Fitted value from an estimated regression equation

yhat

The number of standard deviations from the mean.

z-score
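
A z-score is computed by subtracting the mean and dividing by the standard deviation. A short sketch, borrowing the mean of 22 pounds and SD of 5 pounds from the warehouse card; the 27-pound package is a made-up observation.

```python
def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# A hypothetical 27-pound package, with mean 22 and SD 5.
z = z_score(27, 22, 5)
print(z)
```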

