Weeks 5 and 6

Assuming a linear relationship between X and Y, if the coefficient of correlation (r) equals -0.30,

the slope (b1) is negative.

The sample correlation coefficient between X and Y is 0.375. The p-value is 0.256 when testing against the one-sided alternative H1: ρ > 0. To test against the two-sided alternative H1: ρ ≠ 0 at a significance level of 0.1, the p-value is:

(0.256)(2)

If the plot of the residuals is fan shaped, which assumption is violated?

Homoscedasticity.

If the Durbin-Watson statistic has a value close to 0, which assumption is violated?

Independence of errors
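To see why values near 0 or near 4 point to dependent errors, here is a minimal sketch of the Durbin-Watson statistic, D = Σ(e_i − e_(i−1))² / Σ e_i². The two residual series are made up for illustration and do not come from any scenario in this set.

```python
def durbin_watson(residuals):
    """Return D = sum((e_i - e_{i-1})^2) / sum(e_i^2) for residuals in time order."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals: long runs of the same sign (positive autocorrelation).
positive_runs = [1, 1, 1, 1, -1, -1, -1, -1]
# Hypothetical residuals: sign flips every period (negative autocorrelation).
alternating = [1, -1, 1, -1, 1, -1, 1, -1]

d_positive = durbin_watson(positive_runs)  # well below 2, toward 0
d_negative = durbin_watson(alternating)    # well above 2, toward 4
print(d_positive, d_negative)
```

Uncorrelated residuals give a D near 2; in either extreme the independence-of-errors assumption is in doubt.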

Which of the following assumptions concerning the probability distribution of the random error term is stated incorrectly?

The variance of the distribution increases as X increases

A multiple regression is called "multiple" because it has several explanatory variables. T/F

True

A zero population correlation coefficient between a pair of random variables means that there is no linear relationship between the random variables. T/F

True

The Y-intercept (b0) represents the

estimated average Y when X=0

The coefficient of multiple determination r²Y.12

measures the proportion of variation in Y that is explained by X1 and X2.

When an additional explanatory variable is introduced into a multiple regression model, the adjusted r² can never decrease. T/F

False

When an explanatory variable is dropped from a multiple regression model, the coefficient of multiple determination can increase. T/F

False

An economist is interested to see how consumption for an economy (in $ billions) is influenced by gross domestic product ($ billions) and aggregate price (consumer price index). The Microsoft Excel output of this regression is partially reproduced below. Referring to the scenario above, to test for the significance of the coefficient on aggregate price index, the p-value is:

0.8330

If a categorical independent variable contains 2 categories, then _________ dummy variable(s) will be needed to uniquely represent these categories.

1

If a categorical independent variable contains 4 categories, then _________ dummy variable(s) will be needed to uniquely represent these categories.

3
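A small sketch of the "categories minus one" rule from the two dummy-variable cards: one category serves as the baseline (all dummies 0) and every other category gets its own 0/1 column. The `dummy_code` helper and the `regions` variable are hypothetical, made up for illustration.

```python
def dummy_code(values, baseline):
    """Encode a categorical variable with c categories as c - 1 dummy (0/1) columns."""
    levels = sorted(set(values) - {baseline})  # every category except the baseline
    return [{f"is_{level}": int(v == level) for level in levels} for v in values]


regions = ["north", "south", "east", "west"]  # hypothetical 4-category variable
rows = dummy_code(regions, baseline="north")

for region, row in zip(regions, rows):
    print(region, row)
```

Each row has 3 columns for the 4 categories, and the baseline row is all zeros, so a fourth dummy would add no information.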

A real estate builder wishes to determine how house size (House) is influenced by family income (Income) and family size (Size). House size is measured in hundreds of square feet and income is measured in thousands of dollars. The builder randomly selected 50 families and ran the multiple regression. Partial Microsoft Excel output is provided below: Referring to the scenario above, what are the residual degrees of freedom that are missing from the output?

47

The marketing manager for a nationally franchised lawn service company would like to study the characteristics that differentiate home owners who do and do not have a lawn service. A random sample of 30 home owners located in a suburban area near a large city was selected; 11 did not have a lawn service (code 0) and 19 had a lawn service (code 1). Additional information available concerning these 30 home owners includes family income (Income, in thousands of dollars) and lawn size (Lawn Size, in thousands of square feet). The PHStat output is given below: Referring to scenario above, which of the following is the correct interpretation for the Income slope coefficient?

Holding constant the effect of lawn size, the estimated natural logarithm of the odds ratio of purchasing a lawn service increases by 0.0304 for each increase of one thousand dollars in family income.
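Because the slope is on the log-odds scale, exponentiating it gives the factor by which the estimated odds are multiplied. A minimal sketch using only the Income coefficient quoted in the answer above; the variable names are illustrative.

```python
import math

b_income = 0.0304  # change in ln(odds) per additional $1,000 of family income

# exp(b) converts an additive change in ln(odds) into a multiplicative change in odds.
odds_multiplier = math.exp(b_income)
percent_change_in_odds = (odds_multiplier - 1) * 100

print(f"odds multiplier per $1,000: {odds_multiplier:.4f}")
print(f"percent change in odds:     {percent_change_in_odds:.2f}%")
```

So, holding lawn size constant, each extra $1,000 of income raises the estimated odds of having a lawn service by roughly 3%.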

An economist is interested to see how consumption for an economy (in $ billions) is influenced by gross domestic product ($ billions) and aggregate price (consumer price index). The Microsoft Excel output of this regression is partially reproduced below. Referring to the scenario above, the p-value for the regression model as a whole is:

None of the selections

A real estate builder wishes to determine how house size (House) is influenced by family income (Income) and family size (Size). House size is measured in hundreds of square feet and income is measured in thousands of dollars. The builder randomly selected 50 families and ran the multiple regression. Partial Microsoft Excel output is provided below: Referring to the scenario above and allowing for a 1% probability of committing a type I error, what is the decision and conclusion for the test H0: β1 = β2 = 0 vs. H1: at least one βj ≠ 0, j = 1, 2?

Reject H0 and conclude that the 2 independent variables taken as a group have significant linear effects on house size.

A large national bank charges local companies for using their services. A bank official reported the results of a regression analysis designed to predict the bank's charges (Y), measured in dollars per month, for services rendered to local companies. One independent variable used to predict service charges to a company is the company's sales revenue (X), measured in millions of dollars. Data for 21 companies who use the bank's services were used to fit the model: Yi = β0 + β1Xi + εi. The results of the simple linear regression are provided below: Ŷ = 2,700 + 20X, SYX = 65, two-tail p-value = 0.034 (for testing β1). Referring to the scenario above, a 95% confidence interval for β1 is (15, 30). Interpret the interval.

You are 95% confident that mean service charge (Y) will increase between $15 and $30 for every $1 million increase in sales revenue (X).

If the correlation coefficient (r) = 1.00, then

all the data points must fall exactly on a straight line with a positive slope.

The Slope (b1) represents

the estimated average change in Y per unit change in X

An economist is interested to see how consumption for an economy (in $ billions) is influenced by gross domestic product ($ billions) and aggregate price (consumer price index). The Microsoft Excel output of this regression is partially reproduced below. Referring to the scenario above, one economy in the sample had an aggregate consumption level of $3 billion, a GDP of $3.5 billion, and an aggregate price level of 125. What is the residual for this data point?

$0.48 billion

An economist is interested to see how consumption for an economy (in $ billions) is influenced by gross domestic product ($ billions) and aggregate price (consumer price index). The Microsoft Excel output of this regression is partially reproduced below. Referring to the scenario above, what is the estimated mean consumption level for an economy with GDP equal to $4 billion and an aggregate price index of 150?

$2.89 billion

An economist is interested to see how consumption for an economy (in $ billions) is influenced by gross domestic product ($ billions) and aggregate price (consumer price index). The Microsoft Excel output of this regression is partially reproduced below. Referring to the scenario above, what is the predicted consumption level for an economy with GDP equal to $4 billion and an aggregate price index of 150?

$2.89 billion

The sample correlation coefficient between X and Y is 0.375. The p-value is 0.744 when testing against the one-sided alternative H1: ρ < 0. To test against the two-sided alternative H1: ρ ≠ 0 at a significance level of 0.1, the p-value is:

(1 - 0.744)(2)

A candy bar manufacturer is interested in trying to estimate how sales are influenced by the price of their product. To do this, the company randomly chooses 6 small cities and offers the candy bar at different prices. Using candy bar sales as the dependent variable, the company will conduct a simple linear regression on the data below (City: Price in $, Sales): River Falls: 1.30, 100; Hudson: 1.60, 90; Ellsworth: 1.80, 90; Prescott: 2.00, 40; Rock Elm: 2.40, 38; Stillwater: 2.90, 32. Referring to the scenario above, what is the estimated slope for the candy bar price and sales data?

-48.193

An economist is interested to see how consumption for an economy (in $ billions) is influenced by gross domestic product ($ billions) and aggregate price (consumer price index). The Microsoft Excel output of this regression is partially reproduced below. Referring to the scenario above, to test for the significance of the coefficient on aggregate price index, the value of the relevant t-statistic is:

-0.219

A candy bar manufacturer is interested in trying to estimate how sales are influenced by the price of their product. To do this, the company randomly chooses 6 small cities and offers the candy bar at different prices. Using candy bar sales as the dependent variable, the company will conduct a simple linear regression on the data below (City: Price in $, Sales): River Falls: 1.30, 100; Hudson: 1.60, 90; Ellsworth: 1.80, 90; Prescott: 2.00, 40; Rock Elm: 2.40, 38; Stillwater: 2.90, 32. Referring to the scenario above, what is the estimated mean change in the sales of the candy bar if price goes up by $1.00?

-48.193

The sample correlation coefficient between X and Y is 0.375. The p-value is 0.256 when testing against the two-sided alternative H1: ρ ≠ 0. To test against the one-sided alternative H1: ρ > 0 at a significance level of 0.1, the p-value is:

0.256 / 2

The sample correlation coefficient between X and Y is 0.375. The p-value is 0.256 when testing against the two-sided alternative H1: ρ ≠ 0. To test against the one-sided alternative H1: ρ < 0 at a significance level of 0.1, the p-value is:

1-0.256/2
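The four p-value cards with r = 0.375 all follow from the symmetry of the t distribution under H0. A minimal sketch of the conversions; the function names and the `same_direction` flag are illustrative, and the sketch assumes a test statistic with a symmetric null distribution.

```python
def two_sided_from_one_sided(p_one):
    """Two-sided p-value: twice the smaller tail area."""
    return 2 * min(p_one, 1 - p_one)


def one_sided_from_two_sided(p_two, same_direction):
    """One-sided p-value from a two-sided one.

    same_direction is True when the sample statistic falls on the side named by
    the one-sided alternative (e.g. r > 0 with H1: rho > 0), False otherwise.
    """
    half = p_two / 2
    return half if same_direction else 1 - half


print(two_sided_from_one_sided(0.256))                        # (0.256)(2)
print(two_sided_from_one_sided(0.744))                        # (1 - 0.744)(2)
print(one_sided_from_two_sided(0.256, same_direction=True))   # 0.256 / 2
print(one_sided_from_two_sided(0.256, same_direction=False))  # 1 - 0.256 / 2
```

With r = 0.375 positive, the "same direction" alternative is the upper-tail one, which is why halving applies there and the complement applies to the lower-tail alternative.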

An economist is interested to see how consumption for an economy (in $ billions) is influenced by gross domestic product ($ billions) and aggregate price (consumer price index). The Microsoft Excel output of this regression is partially reproduced below. Referring to the scenario above, when the economist used a simple linear regression model with consumption as the dependent variable and GDP as the independent variable, he obtained an r² value of 0.971. What additional percentage of the total variation of consumption has been explained by including aggregate prices in the multiple regression?

1.1

A candy bar manufacturer is interested in trying to estimate how sales are influenced by the price of their product. To do this, the company randomly chooses 6 small cities and offers the candy bar at different prices. Using candy bar sales as the dependent variable, the company will conduct a simple linear regression on the data below (City: Price in $, Sales): River Falls: 1.30, 100; Hudson: 1.60, 90; Ellsworth: 1.80, 90; Prescott: 2.00, 40; Rock Elm: 2.40, 38; Stillwater: 2.90, 32. Referring to the scenario above, what is SSX, the sum of squared deviations of price from its mean, for these data?

1.66

A candy bar manufacturer is interested in trying to estimate how sales are influenced by the price of their product. To do this, the company randomly chooses 6 small cities and offers the candy bar at different prices. Using candy bar sales as the dependent variable, the company will conduct a simple linear regression on the data below (City: Price in $, Sales): River Falls: 1.30, 100; Hudson: 1.60, 90; Ellsworth: 1.80, 90; Prescott: 2.00, 40; Rock Elm: 2.40, 38; Stillwater: 2.90, 32. Referring to the scenario above, what is the standard error of the regression slope estimate, Sb1?

12.650

A candy bar manufacturer is interested in trying to estimate how sales are influenced by the price of their product. To do this, the company randomly chooses 6 small cities and offers the candy bar at different prices. Using candy bar sales as the dependent variable, the company will conduct a simple linear regression on the data below (City: Price in $, Sales): River Falls: 1.30, 100; Hudson: 1.60, 90; Ellsworth: 1.80, 90; Prescott: 2.00, 40; Rock Elm: 2.40, 38; Stillwater: 2.90, 32. Referring to the scenario above, what is the standard error of the estimate, SYX, for the data?

16.299

A real estate builder wishes to determine how house size (House) is influenced by family income (Income) and family size (Size). House size is measured in hundreds of square feet and income is measured in thousands of dollars. The builder randomly selected 50 families and ran the multiple regression. Partial Microsoft Excel output is provided below: Referring to the scenario above, the observed value of the F-statistic is missing from the printout. What are the degrees of freedom for this F-statistic?

2 for the numerator, 47 for the denominator
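The degrees of freedom in the house-size cards follow directly from the sample size n = 50 and the number of independent variables k = 2. A minimal sketch; the function name is illustrative.

```python
def regression_degrees_of_freedom(n, k):
    """Degrees of freedom for a regression with n observations and k independent variables."""
    return {
        "regression": k,         # numerator df of the overall F test
        "residual": n - k - 1,   # denominator df of the overall F test
        "total": n - 1,
    }


df = regression_degrees_of_freedom(n=50, k=2)
print(df)
```

The regression and residual degrees of freedom always add up to the total, n − 1.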

A manager of a product sales group believes the number of sales made by an employee (Y) depends on how many years that employee has been with the company (X1) and how he/she scored on a business aptitude test (X2). A random sample of 8 employees provides the following: Referring to scenario above, for these data, what is the value for the regression constant, b0?

21.293

A candy bar manufacturer is interested in trying to estimate how sales are influenced by the price of their product. To do this, the company randomly chooses 6 small cities and offers the candy bar at different prices. Using candy bar sales as the dependent variable, the company will conduct a simple linear regression on the data below (City: Price in $, Sales): River Falls: 1.30, 100; Hudson: 1.60, 90; Ellsworth: 1.80, 90; Prescott: 2.00, 40; Rock Elm: 2.40, 38; Stillwater: 2.90, 32. Referring to the scenario above, if the price of the candy bar is set at $2, the predicted sales will be

65

A real estate builder wishes to determine how house size (House) is influenced by family income (Income) and family size (Size). House size is measured in hundreds of square feet and income is measured in thousands of dollars. The builder randomly selected 50 families and ran the multiple regression. Partial Microsoft Excel output is provided below: Referring to the scenario above, what fraction of the variability in house size is explained by income and size of family?

71.89%

A candy bar manufacturer is interested in trying to estimate how sales are influenced by the price of their product. To do this, the company randomly chooses 6 small cities and offers the candy bar at different prices. Using candy bar sales as the dependent variable, the company will conduct a simple linear regression on the data below (City: Price in $, Sales): River Falls: 1.30, 100; Hudson: 1.60, 90; Ellsworth: 1.80, 90; Prescott: 2.00, 40; Rock Elm: 2.40, 38; Stillwater: 2.90, 32. Referring to the scenario above, what percentage of the total variation in candy bar sales is explained by prices?

78.39%

A candy bar manufacturer is interested in trying to estimate how sales are influenced by the price of their product. To do this, the company randomly chooses 6 small cities and offers the candy bar at different prices. Using candy bar sales as the dependent variable, the company will conduct a simple linear regression on the data below (City: Price in $, Sales): River Falls: 1.30, 100; Hudson: 1.60, 90; Ellsworth: 1.80, 90; Prescott: 2.00, 40; Rock Elm: 2.40, 38; Stillwater: 2.90, 32. Referring to the scenario above, what is the percentage of the total variation in candy bar sales explained by the regression model?

78.39%
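The candy bar answers in this set (slope, SSX, Sb1, SYX, predicted sales at $2, and r²) can all be reproduced from the six data points with the textbook least-squares formulas. A minimal sketch using only the standard library; the variable names are illustrative.

```python
import math

# Price ($) and sales for the six cities, in the order given in the cards.
price = [1.30, 1.60, 1.80, 2.00, 2.40, 2.90]
sales = [100, 90, 90, 40, 38, 32]
n = len(price)

x_bar = sum(price) / n
y_bar = sum(sales) / n

ssx = sum((x - x_bar) ** 2 for x in price)                           # sum of squares of X
ssxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(price, sales))  # cross-products
sst = sum((y - y_bar) ** 2 for y in sales)                           # total sum of squares

b1 = ssxy / ssx            # least-squares slope
b0 = y_bar - b1 * x_bar    # least-squares intercept

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(price, sales))    # error sum of squares
ssr = sst - sse                                                      # regression sum of squares

r_squared = ssr / sst                # proportion of variation explained
s_yx = math.sqrt(sse / (n - 2))      # standard error of the estimate
s_b1 = s_yx / math.sqrt(ssx)         # standard error of the slope
predicted_at_2 = b0 + b1 * 2.00      # predicted sales at a price of $2

print(f"SSX={ssx:.2f}  b1={b1:.3f}  Sb1={s_b1:.3f}  SYX={s_yx:.3f}")
print(f"r^2={r_squared:.4f}  predicted sales at $2 = {predicted_at_2:.1f}")
```

The predicted sales at $2 equal the mean of sales because $2 is also the mean price, and the least-squares line always passes through the point of means.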

A large national bank charges local companies for using their services. A bank official reported the results of a regression analysis designed to predict the bank's charges (Y), measured in dollars per month, for services rendered to local companies. One independent variable used to predict service charges to a company is the company's sales revenue (X), measured in millions of dollars. Data for 21 companies who use the bank's services were used to fit the model: Yi = β0 + β1Xi + εi. The results of the simple linear regression are provided below: Ŷ = 2,700 + 20X, SYX = 65, two-tail p-value = 0.034 (for testing β1). Referring to the scenario above, interpret the estimate of σ, the standard deviation of the random error term (standard error of the estimate) in the model.

About 95% of the observed service charges fall within $130 of the least squares line.

A large national bank charges local companies for using their services. A bank official reported the results of a regression analysis designed to predict the bank's charges (Y), measured in dollars per month, for services rendered to local companies. One independent variable used to predict service charges to a company is the company's sales revenue (X), measured in millions of dollars. Data for 21 companies who use the bank's services were used to fit the model: Yi = β0 + β1Xi + εi. The results of the simple linear regression are provided below: Ŷ = 2,700 + 20X, SYX = 65, two-tail p-value = 0.034 (for testing β1). Referring to the scenario above, interpret the estimate of β0, the Y-intercept of the line.

$2,700 is the estimated mean monthly service charge when sales revenue is $0. It is not a minimum charge for all companies, and it has no practical interpretation if a sales revenue of $0 is not a realistic value.

The width of the prediction interval for the predicted value of Y is dependent on

All of the selections

A multiple regression is called "multiple" because it has several data points. T/F

False

In a particular model, the sum of the squared residuals was 847. If the model had 5 independent variables, and the data set contained 40 points, the value of the standard error of the estimate is 24.911. T/F

False
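The statement above is false because 24.911 is the mean square error, not its square root. A minimal sketch of the calculation, using the numbers from the card (SSE = 847, n = 40, k = 5); the function name is illustrative.

```python
import math


def standard_error_of_estimate(sse, n, k):
    """S_YX = sqrt(SSE / (n - k - 1)) for a model with k independent variables."""
    return math.sqrt(sse / (n - k - 1))


mse = 847 / (40 - 5 - 1)                              # mean square error
s_yx = standard_error_of_estimate(sse=847, n=40, k=5)

print(f"MSE  = {mse:.3f}")
print(f"S_YX = {s_yx:.3f}")
```

The value quoted in the question matches the MSE; the standard error of the estimate is its square root, a little under 5.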

Multiple regression is the process of using several independent variables to predict a number of dependent variables. T/F

False

The Durbin-Watson D statistic is used to check the assumption of normality. T/F

False

The coefficient of multiple determination is calculated by taking the ratio of the regression sum of squares over the total sum of squares (SSR/SST) and subtracting that value from 1. T/F

False

The total sum of squares (SST) in a regression model will never be greater than the regression sum of squares (SSR). T/F

False

A logistic regression model was estimated in order to predict the probability that a randomly chosen university or college would be a private university using information on mean total Scholastic Aptitude Test score (SAT) at the university or college and whether the TOEFL criterion is at least 90 (Toefl90 = 1 if yes, 0 otherwise.) The dependent variable, Y, is school type (Type = 1 if private and 0 otherwise). The PHStat output is given below: Referring to scenario above, which of the following is the correct interpretation for the Toefl90 slope coefficient?

Holding constant the effect of SAT, the estimated natural logarithm of the odds ratio of the school being a private school is 0.1928 higher for a school that has a TOEFL criterion that is at least 90 than one that does not.

A logistic regression model was estimated in order to predict the probability that a randomly chosen university or college would be a private university using information on mean total Scholastic Aptitude Test score (SAT) at the university or college and whether the TOEFL criterion is at least 90 (Toefl90 = 1 if yes, 0 otherwise.) The dependent variable, Y, is school type (Type = 1 if private and 0 otherwise). The PHStat output is given below: Referring to scenario above, which of the following is the correct interpretation for the SAT slope coefficient?

Holding constant the effect of Toefl90, the estimated natural logarithm of the odds ratio of the school being a private school increases by 0.0028 for each increase of one point in mean SAT score.

To explain personal consumption (CONS) measured in dollars, data is collected for: INC, personal income in dollars; CRDTLIM, $1 plus the credit limit in dollars available to the individual; APR, mean annualized percentage interest rate for borrowing for the individual; ADVT, per person advertising expenditure in dollars by manufacturers in the city where the individual lives; SEX, gender of the individual (1 if female, 0 if male). A regression analysis was performed with CONS as the dependent variable and CRDTLIM, APR, ADVT, and GENDER as the independent variables. The estimated model was What is the correct interpretation for the estimated coefficient for GENDER?

Holding the effect of the other independent variables constant, mean personal consumption for females is estimated to be $0.39 higher than males.

The marketing manager for a nationally franchised lawn service company would like to study the characteristics that differentiate home owners who do and do not have a lawn service. A random sample of 30 home owners located in a suburban area near a large city was selected; 11 did not have a lawn service (code 0) and 19 had a lawn service (code 1). Additional information available concerning these 30 home owners includes family income (Income, in thousands of dollars) and lawn size (Lawn Size, in thousands of square feet). The PHStat output is given below: Referring to scenario above, which of the following is the correct expression for the estimated model?

ln(estimated odds ratio) = -7.8562 + 0.0304 Income + 1.2804 LawnSize

A logistic regression model was estimated in order to predict the probability that a randomly chosen university or college would be a private university using information on mean total Scholastic Aptitude Test score (SAT) at the university or college and whether the TOEFL criterion is at least 90 (Toefl90 = 1 if yes, 0 otherwise.) The dependent variable, Y, is school type (Type = 1 if private and 0 otherwise). The PHStat output is given below: Referring to scenario above, which of the following is the correct expression for the estimated model?

ln(estimated odds ratio) = -3.9594 + 0.0028 SAT + 0.1928 Toefl90

A real estate builder wishes to determine how house size (House) is influenced by family income (Income) and family size (Size). House size is measured in hundreds of square feet and income is measured in thousands of dollars. The builder randomly selected 50 families and ran the multiple regression. Partial Microsoft Excel output is provided below: Referring to the scenario above, which of the independent variables in the model are significant at the 5% level?

Income and Size

An economist is interested to see how consumption for an economy (in $ billions) is influenced by gross domestic product ($ billions) and aggregate price (consumer price index). The Microsoft Excel output of this regression is partially reproduced below. Referring to the scenario above, the p-value for GDP is:

None of the selections

An economist is interested to see how consumption for an economy (in $ billions) is influenced by gross domestic product ($ billions) and aggregate price (consumer price index). The Microsoft Excel output of this regression is partially reproduced below. Referring to the scenario above, the p-value for the aggregate price index is:

None of the selections

A large national bank charges local companies for using their services. A bank official reported the results of a regression analysis designed to predict the bank's charges (Y), measured in dollars per month, for services rendered to local companies. One independent variable used to predict service charges to a company is the company's sales revenue (X), measured in millions of dollars. Data for 21 companies who use the bank's services were used to fit the model: Yi = β0 + β1Xi + εi. The results of the simple linear regression are provided below: Ŷ = 2,700 + 20X, SYX = 65, two-tail p-value = 0.034 (for testing β1). Referring to the scenario above, interpret the p-value for testing whether β1 exceeds 0.

There is sufficient evidence (at α = 0.05) to conclude that sales revenue (X) is a useful linear predictor of service charge (Y).

The Chancellor of a university has commissioned a team to collect data on students' GPAs and the amount of time they spend bar hopping every week (measured in minutes). He wants to know if imposing much tougher regulations on all campus bars to make it more difficult for students to spend time in any campus bar will have a significant impact on general students' GPAs. His team should use a t test on the slope of the population regression. T/F

True

The coefficient of multiple determination measures the proportion of the total variation in the dependent variable that is explained by the set of independent variables. T/F

True

The confidence interval for the mean of Y is always narrower than the prediction interval for an individual response Y given the same data set, X value, and confidence level. T/F

True

The interpretation of the slope is different in a multiple linear regression model as compared to a simple linear regression model. T/F

True

The slopes in a multiple regression model are called net regression coefficients. T/F

True

When an additional explanatory variable is introduced into a multiple regression model, the coefficient of multiple determination will never decrease. T/F

True

When an explanatory variable is dropped from a multiple regression model, the adjusted r² can increase. T/F

True

When r = -1, it indicates a perfect relationship between X and Y. T/F

True

You give a pre-employment examination to your applicants. The test is scored from 1 to 100. You have data on their sales at the end of one year measured in dollars. You want to know if there is any linear relationship between pre-employment examination score and sales. An appropriate test to use is the t test of the population correlation coefficient. T/F

True

If you wanted to find out if alcohol consumption (measured in fluid oz.) and grade point average on a 4-point scale are linearly related, you would perform a

t test for a correlation coefficient

In a multiple regression model, the value of the coefficient of multiple determination:

has to fall between 0 and +1.

Testing for the existence of correlation is equivalent to

testing for the existence of the slope (β1)

In a multiple regression problem involving two independent variables, if b1 is computed to be +2.0, it means that:

the estimated mean of Y increases by 2 units for each increase of 1 unit of X1, holding X2 constant.

The coefficient of determination (r²) tells you:

the proportion of total variation that is explained.

An interaction term in a multiple regression model may be used when

the relationship between X1 and Y changes for differing values of X2.

A dummy variable is used as an independent variable in a regression model when

the variable involved is categorical.

In performing a regression analysis involving two numerical variables, you are assuming

the variation around the line of regression is the same for each X value

The standard error of the estimate is a measure of

the variation around the sample regression line.

If the correlation coefficient (r) = 1.00, then

there is no unexplained variation

Data that exhibit an autocorrelation effect violate the regression assumption of independence. T/F

True

The Regression Sum of Squares (SSR) can never be greater than the Total Sum of Squares (SST). T/F

True

If the Durbin-Watson statistic has a value close to 4, which assumption is violated?

Independence of errors.

In a multiple regression model, which of the following is correct regarding the value of the adjusted r²?

It can be negative
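A short sketch of the usual adjusted r² formula, 1 − (1 − r²)(n − 1)/(n − k − 1), shows both behaviors tested in this set: it can go negative, and it can fall when a variable is added even though r² itself rises. All input values below are hypothetical, chosen only for illustration.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted r^2 for n observations and k independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical weak model: small r^2, few observations, several predictors.
weak_fit = adjusted_r_squared(0.05, n=10, k=3)

# Hypothetical case: adding a third variable nudges r^2 from 0.500 to 0.505.
before = adjusted_r_squared(0.500, n=20, k=2)
after = adjusted_r_squared(0.505, n=20, k=3)

print(f"weak fit: {weak_fit:.3f}")
print(f"before adding a variable: {before:.4f}, after: {after:.4f}")
```

The penalty for the extra variable outweighs the small gain in r², which is the same reason dropping a weak variable can raise the adjusted value.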

The least squares method minimizes which of the following?

SSE

If you have taken into account all relevant explanatory factors, the residuals from a multiple regression model should be random. T/F

True

The Y-intercept (b0) represents the

predicted value of Y when X=0

The variation attributable to factors other than the relationship between the independent variables and the explained variable in a regression analysis is represented by

error sum of squares.
