Ch. 13

Pataasin ang iyong marka sa homework at exams ngayon gamit ang Quizwiz!

A response variable, Price, is defined as the selling price of a used car. Three predictor variables include, Age, the age of the car in years, Mileage, the mileage of the car in thousands of miles, and Cylinders, the number of engine cylinders. The estimated regression equation is: Price = 10,000 - 1800Age - 50Mileage + 1200Cylinders. By how much will the price be reduced for each additional 10,000 miles on the car?

$-500

Identify the four criteria for regression assessment.

Logic, Fit, Parsimony, Stability

True or false: Testing the coefficient at zero is the same as checking the coefficient confidence interval to see if it contains zero.

True

We can check for normality of errors by looking at

a normplot of residuals. a histogram of residuals.

We can check for heteroscedasticity by looking at

a plot of residuals against the X values.

The confidence interval for Y will be ___ prediction interval for Y.

narrower than the

Dependent errors are often found in

time-series data.

If two predictor variables, X1 and X2, are suspected of having an interaction effect on the response variable Y, we can test for this by adding the term _____ to the model.

β3x1x2

If a predictor variable, Xj, is suspected of having a quadratic relationship with the response variable Y, we can test for this by adding the term _____ to the model.

βjxj2

A response variable, Price, is defined as the selling price of a used car. Two predictor variables include Mileage, the mileage of the car in thousands of miles, and a binary predictor, Type, where Type = 1 if the car is a sedan or 0 if the car is not a sedan. The estimated regression equation is: Price = 12,000 - 60Mileage - 1500Type. Predict the price for a sedan with 50,000 miles.

$7500

A response variable, Price, is defined as the selling price of a used car. Three predictor variables include, Age, the age of the car in years, Mileage, the mileage of the car in thousands of miles, and Cylinders, the number of engine cylinders. The estimated regression equation is: Price = 10,000 - 1800Age - 50Mileage + 1200Cylinders. Predict the average price of a 2 year old car with 50,000 miles and 4 cylinders.

$8700

A standardized residual that is outside the range ____ or ____ would be considered unusual or an outlier.

(-2, +2) (-3, +3)

Each category of a qualitative variable can be converted to a binary variable by assigning the value ____________ or ________________ to indicate the presence or absence of the condition

0; 1

A multiple regression data set includes binary variables describing vehicle type using the four categories car, truck, SUV, and van. If the car, truck, and SUV variables are all equal to zero then the van binary variable would be equal to

1

A VIF greater than _____________ can be cause for concern.

10

Evan's Rule states that there should be at least ______________ observations for each predictor variable

10

If a qualitative variable has five categories then we need only ___ binary predictors.

4

Doane's Rule states that there should be at least ________________ observations for each predictor variable

5

In a study, SST = 1,000, SSE = 200. Find the coefficient of determination.

80%

True or false: A binary predictor variable is tested for significance using a different test statistic than used for a quantitative predictor variable.

False

True or false: Software packages such as Excel or MINITAB routinely report left-tail p-values when testing multiple regression coefficients.

False

How does the coefficient of determination help as a goodness of fit tool in regression analysis?

It gives the percentage of the variation in Y explained by the sample regression equation.

Match each regression criteria with the reasoning. Instructions

Logic - Is there an expectation that the predictor will help explain variation in the response? Fit - Does the overall regression show significant predictive ability? Parsimony - Does each predictor contribute significantly to the model? Stability - Are the predictors independent enough from each other so that the model is stable?

"When two explanations are otherwise equivalent, we prefer the simpler explanation." This is known as the principle of

Occam's Razor.

A response variable is defined as the selling price of a used car. Three predictor variables include the age of the car, the mileage of the car, and the number of cylinders. The proper estimated regression equation would be:

Price = b0 + b1Age + b2Mileage + b3Cylinders

What does SSR represent in regression analysis?

The amount of variation in Y that is explained.

What does SSE represent in regression analysis?

The amount of variation in Y that is left unexplained.

The three regression assumptions are

The errors have constant variance. The errors are independent. The errors are normally distributed.

The use of the standard error of regression, se, as a measure of goodness of fit of a model is best expressed by which of the following statements?

The smaller the value, the better the fit.

True or false: Taking a logarithm of a variable in a regression analysis can transform a nonlinear relationship between Y and X into a linear relationship between Y and log(X).

True

True or false: With multiple regression we're concerned about the degree of multicollinearity because almost any data set will have some level of multicollinearity.

True

A Variance Inflation Factor (VIF) can be calculated for each predictor using the formula

VIF = 1 / 1−Rj2

When writing a multiple regression equation the b0 term represents

a constant.

If c binary variables are created for a categorical predictor with c categories, the regression calculations will fail because we will have

a redundant predictor that causes perfect collinearity.

A residual is defined as

actual response - predicted response.

Suppose we define a qualitative variable called payment method and the categories are credit card, personal check, or cash. The binary variables defined are CC = 1 if pay by credit card (0 otherwise) and PC = 1 if pay by personal check (0 otherwise). If both CC = 0 and PC = 0 it means the payment method was by

cash

A measure of relative fit for a simple regression line is called the R² or __________ of __________

coefficient; determination

A _____________ interval for Y, the response variable, predicts the mean of Y whereas a _______________ interval for Y predicts the individual value for Y.

confidence; prediction

A correlation matrix can be used to identify possible _______________ between two predictor variables.

correlation

To check for collinearity a _____________ matrix can be used.

correlation

A response variable, Price, is defined as the selling price of a used car. Three predictor variables include, Age, the age of the car in years, Mileage, the mileage of the car in thousands of miles, and Cylinders, the number of engine cylinders. The estimated regression equation is: Price = 10,000 - 1800Age - 50Mileage + 1200Cylinders. If the estimated price of a used car is $8700 and the actual selling price is $9000, residual is

ei = $300

The population multiple regression model includes a response variable, a constant term, multiple explanatory variables, and an _______________ term

error

A response variable, Price, is defined as the selling price of a used car. Three predictor variables include, Age, the age of the car in years, Mileage, the mileage of the car in thousands of miles, and Cylinders, the number of engine cylinders. The estimated regression equation is: Price = 10,000 - 1800Age - 50Mileage + 1200Cylinders. The constant term, 10,000

has no meaningful interpretation in this model.

If the error terms in a regression analysis do not have constant variance we say the errors are

heteroscedastic

Variance inflation caused by multicollinearity can result

in difficulty identifying the contribution of each predictor. in wider confidence intervals for the coefficients of parameters than warranted. in untrustworthy t statistics for the coefficient estimates.

A parsimonious model is one that

is less complex.

A multiple regression model is preferred over a simple regression model because

it is possible that a predictor can appear unrelated to Y in a simple regression but can show significance when combined with another predictor. rarely does one predictor explain the variation in Y as well as several predictors.

When the predictor variables are related to each other rather than being independent we have a condition called

multicollinearity

A linear model that uses more than one predictor variable to describe a single response variable is called a _____________ regression model

multiple

A multivariate data set will have

n rows of observations k columns of X values a single column of Y values

A multivariate data set is a matrix of columns and rows where each row represents an

observation

Coefficient instability would be when X1 and X2 both show strong correlation with Y and

one or both of their coefficients are not significant.

Coefficient instability would be when X1 and X2 both show strong positive correlation with Y and

one or both of their coefficients are not significant. one or both of their coefficients are negative.

The R2 calculation is adjusted to reflect the inclusion of multiple predictors, in order to discourage the practice of ______________ by including insignificant predictors.

overfitting

We can check for dependent errors by

plotting the residuals in sequence and looking for a nonrandom trend.

se is an estimate for the standard deviation of the

regression errors

We test the three regression assumptions using the

residuals

A binary predictor variable, xk, is also called a ______________ variable because it adds a fixed constant, bk, to the response estimate when xk = 1.

shift

If the coefficient on the term x1x2 has a p-value less than alphas we would conclude that there is a ___________________ interaction effect between x1 and x2.

significant

We calculate a _________________ residual in order to spot unusual or outlier residual values.

standardized

VIF values greater than or equal to 10 are generally considered to show _____________ variance inflation.

strong

To test the null hypothesis H0: βj = 0 vs H1: βj ≠ 0 we use the test statistic

tcalc = bj−0 / sj

If the coefficient of an explanatory variable is hypothesized to be negative this would indicate

that as the explanatory increases the response variable decreases.

A response variable, Price, is defined as the selling price of a used car. Three predictor variables include, Age, the age of the car in years, Mileage, the mileage of the car in thousands of miles, and Cylinders, the number of engine cylinders. The estimated regression equation is: Price = 10,000 - 1800Age - 50Mileage + 1200Cylinders. The coefficient -1800 means

that for each year increase in a car's age, the price decreases by $1800 on average.

A high VIF for predictor j shows

that predictor j is strongly related to the other predictors.

If we fail to reject the null hypothesis that the coefficient βk = 0 then we conclude

that the predictor variable X is not associated with the response variable Y.

For a given regression model, the p-value for a binary variable coefficient is .028. If alpha is .05 then

the coefficient is significantly different from zero.

If the error terms in a regression analysis are not normally distributed

the confidence intervals for the parameters could be untrustworthy.

Non-normality of errors is considered a mild violation unless

the data has major outliers.

When developing a multiple regression model the analyst would like the model to be parsimonious which means

the model has only useful predictor variables.

Non-constant error variance is considered a serious violation because

the significance of the regression could be overstated.

Multicollinearity between the predictor variables causes

the variances of the coefficient estimates to become inflated.

True or false: Categorical variables can be included as predictor variables in regression analysis by using data coding.

true

True or false: R2adj is always less than R2.

true

A high residual value means the observation is far from the regression line in the ________________ direction

vertical

Identify the population multiple regression model.

y = β0 + β1x1 + β2x2 + ... + βkxk + ε


Kaugnay na mga set ng pag-aaral

Chapter 8 - Eating and sleep - week disorders

View Set

Voting and Public Opinion Exam 1

View Set

Understanding Core Azure Services (30-35%)

View Set

World Geography Chapter 25: India

View Set

The Neurological System (Part 1)

View Set

Med success genitourinary comprehensive exam

View Set

OWare- Essentials Of Communication 2. Language Characteristics

View Set

Community Ambulance Cancel Codes (800 codes)

View Set

Lecture Chapter 13 Brain and Spinal Cord

View Set

Fundamentals: Chapter 15: Critical Thinking in Nursing Practice

View Set