Module 5

Suppose we want to assign dummy variables to months (Jan-Dec) and day of week (Sun-Sat). How many dummy variables do we need?

17 For each category, we must use one fewer dummy variables than the number of options for that category. Since month and day of week are separate categories, we should subtract one for each category. Thus we would use 12-1=11 variables for month and 7-1=6 variables for day of week, giving a total of 17 dummy variables.
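The counting rule above can be sketched in a few lines of Python. This is only an illustration; the function name `count_dummies` is an assumption made for this example and does not come from the course material.

```python
def count_dummies(options_per_category):
    """Return the number of dummy variables needed: one fewer than the
    number of options in each category, summed over all categories."""
    return sum(n - 1 for n in options_per_category)


# Months (12 options) and days of the week (7 options)
print(count_dummies([12, 7]))  # 11 + 6 = 17
```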

The following are quantitative variables:

FreeIndependent, Group, Rate, SpecialEvent, TotalRewards, VIP, and Wholesale

The organizer of a late night street fair in a popular tourist city wants to analyze the relationship between daily revenue and the following variables: the number of male visitors, the number of female visitors, the number of retail stands, the number of food (and beverage) stands, and the number of performances that take place on a given night. The regression output table is provided below. Based on these results and using a 10% significance level, the organizer thinks he can improve the model. He wants to try removing at least one variable from the analysis to create and compare new models. Which variable or variables would you recommend that he consider removing from the regression model? SELECT ALL THAT APPLY.

Number of Male Visitors The p-value of "Number of Male Visitors", 0.2016, is greater than 0.1, so the organizer should consider removing this variable from the regression model. Number of Performances The p-value of "Number of Performances", 0.5412, is greater than 0.1, so the organizer should consider removing this variable from the regression model.

What are the bounds of the 95% confidence interval for the coefficient, ValentinesDay? Consult the regression output table above.

-761.38; -125.85 -761.38 is listed under "Lower 95%" for ValentinesDay, and -125.85 is listed under "Upper 95%" for ValentinesDay, so these are the bounds of the 95% confidence interval for the coefficient. The estimated coefficient is the center of the range and can be found under "Coefficients."

If the street fair organizer wanted to compare the explanatory power of the original model and the following new regression model, which value should he consult for the new model?

0.9225 It is important to use the Adjusted R2 to compare two regression models that have a different number of independent variables. 0.9225 is the Adjusted R2 of the new model.

Suppose we want to assign dummy variables to months (Jan-Dec). How many dummy variables do we need?

11 We always have one fewer dummy variable than the number of options. Since there are 12 months, there would be 11 dummy variables.

The following are qualitative variables:

2010, 2012, Christmas, NewYears, MemorialDay, PayDay, SuperBowl, and Thanksgiving

How many independent variables are there in the model Caesars uses? Consult the regression output table above.

38 There are 38 independent variables in this model. You could have found this either by counting the number of independent variables or by looking at the Regression df, which represents the number of independent variables.

If we wanted to compare the explanatory power of this model against a model that excludes the independent variables Christmas, Halloween, and MemorialDay, which value should we use? Consult the regression output table above.

39.94% We use the Adjusted R2 to compare the explanatory power of models with different numbers of independent variables.
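As a reminder of why Adjusted R2 is the right comparison, here is a minimal sketch of the standard adjustment, 1 - (1 - R2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. The formula is the textbook definition rather than something stated in the card, and the input values below are hypothetical, chosen only to show the penalty for adding a variable that does not improve R2.

```python
def adjusted_r2(r2, n_observations, n_independent_vars):
    """Adjusted R-squared: penalizes R-squared for each independent variable."""
    n, k = n_observations, n_independent_vars
    if n - k - 1 <= 0:
        raise ValueError("need more observations than independent variables + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical values: same R-squared, one extra independent variable
print(adjusted_r2(0.50, 101, 10) > adjusted_r2(0.50, 101, 11))  # True
```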

A sporting goods store manager wants to forecast annual sneaker revenues based on the type of sport (running, tennis, or walking), color (red, blue, white, black, or violet) and its target audience (men or women). How many independent variables should the manager include in her multiple regression analysis? Please enter your answer as an integer; that is with no decimal point.

7 Sales revenue is the dependent variable. Type of sport, color, and target audience are categorical variables which must be represented using dummy variables. Recall that it is necessary to use one fewer dummy variable than the number of options in a category. Thus, type of sport should be represented by 3-1=2 dummy variables, color should be represented by 5-1=4 dummy variables, and target audience should be represented by 2-1=1 dummy variable, for a total of 2+4+1=7 independent variables.
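To make the encoding concrete, the sketch below turns one sneaker into its seven dummy variables. Treating the first option in each list as the baseline (encoded as all zeros) is an assumption made for this example, as are the function names; any option could serve as the baseline.

```python
def encode_with_dummies(value, options):
    """Encode one categorical value as len(options) - 1 dummy variables.
    The first option is the baseline and is encoded as all zeros."""
    if value not in options:
        raise ValueError(f"unknown option: {value}")
    return [1 if value == option else 0 for option in options[1:]]


SPORTS = ["running", "tennis", "walking"]
COLORS = ["red", "blue", "white", "black", "violet"]
AUDIENCES = ["men", "women"]


def encode_sneaker(sport, color, audience):
    """Concatenate the dummy variables for all three categories."""
    return (encode_with_dummies(sport, SPORTS)
            + encode_with_dummies(color, COLORS)
            + encode_with_dummies(audience, AUDIENCES))


row = encode_sneaker("tennis", "violet", "women")
print(row)       # [1, 0, 0, 0, 0, 1, 1]
print(len(row))  # 7
```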

Net Relationship

A multiple regression model determines the net effect of an independent variable on a dependent variable. The net effect controls for all other factors (independent variables) included in the regression model. For example, in a regression model including both distance and house size as independent variables, the coefficient for house size controls for distance. That is, the regression determines the average change in selling price if a house's size increases by one square foot but its distance from Boston does not change. Coefficients in multiple regression are net with respect to variables included in the model and gross with respect to variables that are omitted from the model.

Gross Relationship

A single variable linear regression model determines the gross effect of an independent variable on a dependent variable. For example, the gross effect of house size on selling price is the average change in selling price when house size increases by one square foot. Since no other independent variables are included in the model, the coefficient for house size may pick up the effect of other factors related to selling price.

Two houses are the same size, but located in different neighborhoods: House B is five miles farther from Boston than House A. If the selling price of House A was $450,000, what would we expect to be the selling price of House B?

Approximately $396,000 Since the two houses are the same size, to predict the expected difference in selling prices we should use the net effect of distance on selling price (that is, the effect of distance on selling price controlling for house size). This value, -$10,840.04/mile, is found in the multiple regression model. House B is five miles farther from Boston than House A so House B's expected selling price is: House A's selling price+net effect of distance on selling price ≈ $450,000-$10,840.04(5 miles) ≈ $450,000-$54,200.20 ≈ $395,799.80

Assume we have created two single variable linear regression models and a multiple regression model to predict selling price based on HouseSize alone, DistancefromBoston alone, or both. The three models are as follows, where HouseSize is in square feet and DistancefromBoston is in miles:
SellingPrice=13,490.45+255.36(HouseSize)
SellingPrice=686,773.86-15,162.92(DistancefromBoston)
SellingPrice=194,986.59+244.54(HouseSize)-10,840.04(DistancefromBoston)
House A and House B are the same size, but located in different neighborhoods: House B is five miles closer to Boston than House A. If the selling price of House A is $450,000, what would we expect to be the selling price of House B?

Approximately $504,000 Since the two houses are the same size, to predict the expected difference in selling prices we should use -$10,840.04/mile, the net effect of distance on selling price (that is, the effect of distance on selling price controlling for house size), which can be found in the multiple regression model. House B is five miles closer to Boston than House A so House B's expected selling price is: House A's selling price+net effect of distance on selling price ≈ $450,000+$10,840.04(5 miles) ≈ $450,000+$54,200.20 ≈ $504,200.20
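The net-effect calculation can be checked with a short Python sketch. The coefficient is the distance coefficient from the multiple regression model; the function name and the sign convention (positive miles means farther from Boston) are assumptions made for this example.

```python
# Net effect of distance on selling price, in dollars per mile,
# taken from the multiple regression model
NET_DISTANCE_EFFECT = -10840.04


def expected_price(reference_price, extra_miles):
    """Expected price of a same-size house located `extra_miles` farther
    from Boston than the reference house (negative means closer)."""
    return reference_price + NET_DISTANCE_EFFECT * extra_miles


print(round(expected_price(450_000, -5), 2))  # 504200.2 (five miles closer)
print(round(expected_price(450_000, 5), 2))   # 395799.8 (five miles farther)
```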

For use in a linear regression model, categorize which of the following variables should be represented as dummy variables and which can be represented as quantitative variables.

Dummy variables: shoe color, number on an athlete's jersey, gender, ice cream flavor.
Quantitative variables: time to run a marathon, height, size of flat-screen television, hours spent studying CORe, calories in desserts.
Time to run a marathon, height, size of flat-screen television, hours spent studying CORe, and calories in desserts are quantitative variables. Shoe color, number on an athlete's jersey, gender, and ice cream flavor are categorical/qualitative variables and need to be transformed into dummy variables. Note that although athletes' jerseys have numbers, those values cannot be interpreted as real numbers. For example, Eli Manning's number is 10, whereas Peyton Manning's was 18. However, you can't interpret them to mean that Peyton is 80% more than Eli in some way.

Determine which variables are significant—at either the 99% or 95% confidence level—and which are not significant at either level. Make sure to choose the highest level of significance for each variable.

For a variable to be significant at the 99% confidence level, its p-value must be less than 1-0.99=0.01. Likewise, a variable is significant at the 95% confidence level if its p-value is less than 0.05. If the p-value of a variable is greater than 0.05, the variable is not significant at the 95% (or 99%) level.
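The thresholds above translate directly into a small classifier. This is a sketch; the function name is an assumption, and the p-value 0.03 in the usage lines is a hypothetical value added to show the 95% case (the other two p-values appear elsewhere in this set).

```python
def highest_significance(p_value):
    """Return the highest confidence level (99% or 95%) at which a
    variable with this p-value is significant."""
    if p_value < 0.01:      # 1 - 0.99
        return "99%"
    if p_value < 0.05:      # 1 - 0.95
        return "95%"
    return "not significant"


print(highest_significance(0.0033))  # 99%
print(highest_significance(0.03))    # 95% (hypothetical p-value)
print(highest_significance(0.1975))  # not significant
```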

A real estate developer has data on a number of U.S. National financial variables for each quarter from 1995 to 2001. The variables are housing starts (in thousands), the housing price index (a measure of average housing selling prices), unemployment rate, average disposable income, and home owner vacancy rates. A partial view of the data is below. If the developer wanted to create a regression model to predict housing starts from all the other financial variables, which of the following would be INDEPENDENT variables? (Select all that apply.)

House Price Index, Unemployment Rate, Disposable Income, and Home Owner Vacancy Rates are the independent variables used to create the regression model. Housing Starts (thousands) is the dependent variable used to create the regression model. Year and Quarter is not included as a dependent or independent variable.

Now let's look at the model which includes house size, distance from Boston, and lot size. Are all of the independent variables significant at the 5% significance level?

No The p-value for lot size is 0.1975, which is greater than 0.05, indicating that the relationship is not significant. You may also notice that the range indicated by the lower and upper bounds of the 95% confidence interval for lot size (-2.20, 10.13) contains zero. However, note that house size and distance from Boston are significant.
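The confidence-interval check is easy to express in code. A minimal sketch, with an assumed function name, using the lot size interval from this card and the ValentinesDay interval from the earlier card:

```python
def interval_contains_zero(lower, upper):
    """True if a confidence interval includes 0, meaning the coefficient
    is not significant at that confidence level."""
    return lower <= 0 <= upper


print(interval_contains_zero(-2.20, 10.13))      # True: lot size is not significant
print(interval_contains_zero(-761.38, -125.85))  # False: ValentinesDay is significant
```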

Single Variable Linear Regression (one independent variable)

SellingPrice=13,490.45+255.36(HouseSize) SellingPrice=686,773.86-15,162.92(DistancefromBoston)

Which model would we use to predict the price of a house that is 2,700 square feet?

SellingPrice=13,490.45+255.36(HouseSize) Since we have data about just one independent variable, we should use a single variable regression model. This is a single variable linear regression model, in which house size is the only independent variable.

Suppose we want to forecast selling price based on house size and distance from Boston. Which equation should we use to forecast the price of a house that is 2,700 square feet and 15 miles from Boston?

SellingPrice=194,986.59+244.54(HouseSize)-10,840.04(DistancefromBoston) Since we have data about two independent variables, house size and distance from Boston, we should use the multiple regression model with those two variables.

Create a lagged variable.

Step 1: Copy the advertising data in range C2:C11. Step 2: To create the lagged variable, paste the advertising data into the range D3:D12 in Column D, under the title "Previous Year's Advertising." That is, the value from C2 will be pasted into D3, from C3 into D4, and so on until the value in C11 is pasted into D12. For example, in D3, the value for 2005 Previous Year's Advertising will be the advertising expenditure for 2004, $35,000. When completed properly, Row 12 should contain only one observation (in D12). Since we do not have advertising data for 2003, we do not know Previous Year's Advertising for 2004; thus, D2 should be blank. Note: Rather than copying and pasting, you may also choose to link directly to cells (for example, cell D3 would contain the formula =C2).
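The same shift can be sketched outside the spreadsheet. In the Python sketch below, only the 2004 figure ($35,000) comes from the text; the other advertising values are hypothetical placeholders, and the function name is an assumption.

```python
def make_lagged(values):
    """Shift a series down by one period. The first entry is None because
    no prior-period value exists; the result is one entry longer than the
    input, just as the pasted range D3:D12 extends one row past C2:C11."""
    return [None] + list(values)


# 2004, 2005, 2006 (only the 2004 value is from the text)
advertising = [35000, 40000, 42000]
lagged = make_lagged(advertising)
print(lagged)  # [None, 35000, 40000, 42000]

# Keep only the years that have both a current and a previous-year value
complete = [(cur, prev) for cur, prev in zip(advertising, lagged)
            if prev is not None]
print(complete)  # [(40000, 35000), (42000, 40000)]
```

Dropping the first year here mirrors the spreadsheet, where row 2 cannot be used in a regression because D2 is blank.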

Create a regression model, including the residuals and residual plots, with lagged data.

Step 1: Select Data, then Data Analysis, then Regression. Step 2: Enter your Input Y range as B3:B11. (Notice that we cannot use the data for Sales in B2 since we do not have an entry for D2.) Step 3: Enter your Input X range as C3:D11. (Notice that we cannot use the data for Advertising for 2004 in C2 since we do not have an entry for D2. Moreover, we cannot use the data in D12 since we don't have data for other variables for 2014.) Step 4: Check the Residuals and Residual Plot boxes, but DO NOT check the Labels box. Click OK to start the regression analysis.

In order to create a regression model to analyze the relationship between housing starts and the other financial variables, which cell references should be entered?

The "Input Y Range" denotes the cell reference for the dependent variable, Housing Starts. The data of the dependent variable is in B1:B81. The "Input X Range" denotes the cell references for the independent variables: House Price Index, Unemployment Rate, Disposable Income, and Home Owner Vacancy Rates. The data of the independent variables is in C1:F81. Data contained in column A, Year and Quarter, are not included as a dependent or independent variable in the regression model.

Using the new model, forecast the daily revenue when there are 10 retail stands and 15 food stands open, and approximately 1,500 women visiting.

The expected daily revenue is B15+(1500*B16)+(10*B17)+(15*B18)=$49,485. You must link directly to values in order to obtain the correct answer.

Use the multiple regression model SellingPrice=194,986.59+244.54(HouseSize)-10,840.04(DistancefromBoston), where HouseSize is in square feet and DistancefromBoston is in miles, to predict the selling price of a house that is 1,500 square feet and 10 miles from Boston.

The expected selling price of a 1,500 square foot home that is 10 miles from Boston is B15+B16*1,500+B17*10=$453,397.59. You must link directly to the values in order to obtain the correct answer.

Use the single variable regression model with house size as the independent variable to predict the selling price of a house that is 2,700 square feet.

The expected selling price of a 2,700 square foot home is B2+B3*2700=$702,972.54. You must link directly to the values in order to obtain the correct answer. (B2: intercept coefficient; B3: house size coefficient.)

Use the multiple regression model with house size and distance from Boston as the independent variables to predict the selling price of a house that is 2,700 square feet and 15 miles from Boston.

The expected selling price of a 2,700 square foot home that is 15 miles from Boston is B2+B3*2700+B4*15=$692,646.51. You must link directly to the values in order to obtain the correct answer. (B2: intercept coefficient; B3: house size coefficient; B4: distance from Boston coefficient.)
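The forecasts in these cards all follow the same pattern: intercept plus each coefficient times its value. The Python sketch below uses the rounded coefficients printed in the equations, so its results differ by a few dollars from the spreadsheet answers, which presumably link to unrounded cells (hence the instruction to link directly to the values). The function name `predict` is an assumption.

```python
def predict(intercept, coefficients, values):
    """Evaluate a linear regression equation for one observation."""
    return intercept + sum(c * v for c, v in zip(coefficients, values))


# Single variable model: 2,700 square feet
print(f"{predict(13490.45, [255.36], [2700]):,.2f}")  # 702,962.45

# Multiple regression model: 2,700 square feet, 15 miles from Boston
print(f"{predict(194986.59, [244.54, -10840.04], [2700, 15]):,.2f}")  # 692,643.99

# Multiple regression model: 1,500 square feet, 10 miles from Boston
print(f"{predict(194986.59, [244.54, -10840.04], [1500, 10]):,.2f}")  # 453,396.19
```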

An airport shuttle company forecasts the number of hours its drivers will work based on the distance to be driven (in miles) and the number of jobs (each job requires the pickup and drop-off of one set of passengers) using the following regression equation: Travel time=-0.60+0.05(distance)+0.75(number of jobs) On a given day, Victor and Sofia drive approximately the same distance but Sofia has two more jobs than Victor. If Victor worked for 4 hours, for how long can the company expect Sofia to work? Please enter your answer rounded to one digit to the right of the decimal point. For example, if you think Sofia would work 236.7134 hours, enter 236.7.

The only difference between the workloads of the two drivers is the number of jobs each has; Sofia has two additional jobs. Therefore the company can expect Sofia to work the four hours Victor worked, plus an additional 0.75 hours for each of the two additional jobs, that is, 4+0.75(2)=5.5 hours.
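The reasoning can be verified numerically. In this sketch the regression equation is from the question, while the specific distance and job count (32 miles, 4 jobs) are hypothetical values chosen only because they give Victor roughly 4 hours; any pair with the same distance for both drivers shows the same 1.5-hour gap.

```python
def travel_time(distance_miles, jobs):
    """Forecast hours worked from the shuttle company's regression equation."""
    return -0.60 + 0.05 * distance_miles + 0.75 * jobs


# Net effect of two extra jobs, holding distance constant
victor_hours = 4.0
sofia_hours = victor_hours + 0.75 * 2
print(sofia_hours)  # 5.5

# Cross-check with a hypothetical workload: 32 miles and 4 jobs for Victor
difference = travel_time(32, 6) - travel_time(32, 4)
print(round(difference, 2))  # 1.5
```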

Which of the following independent variables are significant at the p < .05 level?

The p-value column in the bottom table gives the significance level of each variable. The only p-values that are less than .05 are for the Intercept (which we do not assess for significance) and ERA. Thus, ERA is the only independent variable that is significant at p < .05. Note also that ERA is the only independent variable with a 95% confidence interval that does not contain 0.
Significant: ERA
Not significant: Runs, Strikeouts, Completed Games
(runs: 0.01; era: -0.12; rest: 0)

The regression table below shows the relationship among selling price, distance from Boston, and lot size. Are both independent variables significant at the 5% significance level?

The spreadsheet below contains data about the current and lagged variables from the pop-culture blogger's tweets and the number of followers she gained that week. Create a regression model to predict the number of followers from the current week, the previous week, and the two weeks prior. Be sure to include the residuals and residual plots in your analysis.

You should only select rows with complete data and leave the Labels box unchecked. From the Data menu, select Data Analysis, then select Regression. The Input Y Range is B4:B18 and the Input X Range is C4:E18. You must check the Residuals and Residual Plots boxes so that you are able to analyze the residuals.

Based on the following partial regression output table, from which the information on the coefficients' t-statistics and p-values has been removed, which of the independent variables are significant at the 95% confidence level? SELECT ALL THAT APPLY.

Variable A The 95% confidence interval for the variable's coefficient does not contain 0, which indicates that Variable A is significant at the 95% confidence level. The p-value (not shown) of Variable A is 0.0001. Since it is less than 1-0.95=0.05, its value confirms that the variable is significant at the 95% confidence level. Variable D The 95% confidence interval for the variable's coefficient does not contain 0, which indicates that Variable D is significant at the 95% confidence level. The p-value (not shown) of Variable D is 0.0028. Since it is less than 1-0.95=0.05, its value confirms that the variable is significant at the 95% confidence level.

Is the relationship between selling price and house size significant at the 95% confidence level?

Yes Since the p-value for the independent variable (house size), 0.0000, is less than 0.05, we can be confident that the relationship between price and house size is significant. Recall that the p-value for the intercept does not determine the significance of the relationship between the dependent and independent variable, so even though the p-value for the intercept is greater than 0.05, we can still say that the relationship between price and house size is significant.

Are the relationships between selling price and house size, and between selling price and distance both significant at the 95% confidence level?

Yes The p-values for the independent variables (house size and distance), 0.0000 and 0.0033, respectively, are less than 0.05, so we can be confident that the relationships between price and house size, and between price and distance, are both significant.

Multiple Regression (two or more independent variables)

SellingPrice=194,986.59+244.54(HouseSize)-10,840.04(DistancefromBoston)

