OIS 3440: Final Exam
The sign on the intercept coefficient in a simple regression model will always be the same as the sign on the correlation coefficient.
False
A study was recently performed by the Internal Revenue Service to determine how much tip income waiters and waitresses should make based on the size of the bill at each table. A random sample of bills and resulting tips was collected. These data are shown as follows (Total Bill, Tip): ($126, $19), ($58, $11), ($86, $20), ($20, $3), ($59, $14), ($120, $30), ($14, $2), ($17, $4), ($26, $2), ($74, $16). Based upon these data, what is the approximate predicted value for tips if the total bill is $100?
$20.61
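The $20.61 answer can be checked by fitting the least-squares line by hand. The sketch below assumes an ordinary simple linear regression of tip on total bill; the variable names are illustrative and not part of the exam.

```python
# Bill/tip pairs from the question above.
bills = [126, 58, 86, 20, 59, 120, 14, 17, 26, 74]
tips = [19, 11, 20, 3, 14, 30, 2, 4, 2, 16]

n = len(bills)
mean_x = sum(bills) / n
mean_y = sum(tips) / n

# Least-squares slope and intercept.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bills, tips))
sxx = sum((x - mean_x) ** 2 for x in bills)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Predicted tip for a $100 bill.
predicted_tip = intercept + slope * 100
print(round(predicted_tip, 2))  # 20.61
```

The fitted slope is about 0.213, that is, roughly 21 cents of tip per extra dollar on the bill.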
A study was done in which the high daily temperature and the number of traffic accidents within the city were recorded. These sample data are shown as follows (High Temperature, Traffic Accidents): (91, 7), (56, 4), (75, 9), (68, 11), (50, 3), (39, 5), (98, 8). Given these data, the sample correlation is:
0.57
The following data for the dependent variable, y, and the independent variable, x, have been collected using simple random sampling (x, y): (10, 120), (14, 130), (16, 170), (12, 150), (20, 200), (18, 180), (16, 190), (14, 150), (16, 160), (18, 200). Compute the correlation coefficient.
0.89
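Both correlation answers above (0.57 for the temperature data and 0.89 for the x, y data) can be reproduced with a small helper. This is a minimal sketch of the sample correlation formula; `pearson_r` is a hypothetical name chosen for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# High temperature vs. traffic accidents.
temps = [91, 56, 75, 68, 50, 39, 98]
accidents = [7, 4, 9, 11, 3, 5, 8]
r_temp = pearson_r(temps, accidents)

# Independent variable x vs. dependent variable y.
x = [10, 14, 16, 12, 20, 18, 16, 14, 16, 18]
y = [120, 130, 170, 150, 200, 180, 190, 150, 160, 200]
r_xy = pearson_r(x, y)

print(round(r_temp, 2), round(r_xy, 2))  # 0.57 0.89
```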
A cell phone company wants to determine if the use of text messaging is independent of age. The following data have been collected from a random sample of customers (age group: regularly use text messaging, do not regularly use text messaging): Under 21: 82, 38; 21-39: 57, 34; 40 and over: 6, 83. To conduct a contingency analysis, the number of degrees of freedom is:
2
A forecasting model of the following form was developed:
3rd degree polynomial model
If a decision maker wishes to develop a regression model in which the University Class Standing is a categorical variable with 5 possible levels of response, then he will need to include how many dummy variables?
4
A cell phone company wants to determine if the use of text messaging is independent of age. The following data have been collected from a random sample of customers (age group: regularly use text messaging, do not regularly use text messaging): Under 21: 82, 38; 21-39: 57, 34; 40 and over: 6, 83. Based on the data above, what is the expected value for the "under 21 and regularly use text messaging" cell?
58
A cell phone company wants to determine if the use of text messaging is independent of age. The following data have been collected from a random sample of customers (age group: regularly use text messaging, do not regularly use text messaging): Under 21: 82, 38; 21-39: 57, 34; 40 and over: 6, 83. To conduct a contingency analysis, the value of the test statistic is:
88.3
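The three text-messaging answers (2 degrees of freedom, an expected count of 58, and a test statistic of 88.3) all come from the same contingency calculation. The sketch below assumes the standard chi-square test of independence; variable names are illustrative.

```python
# Observed counts: rows are age groups, columns are
# (regularly use text messaging, do not regularly use text messaging).
observed = [
    [82, 38],  # under 21
    [57, 34],  # 21-39
    [6, 83],   # 40 and over
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(df, expected[0][0], round(chi_square, 1))  # 2 58.0 88.3
```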
The following regression output is from a multiple regression model: The variables t, t2, and t3 represent t, t-squared, and t-cubed, respectively, where t is the indicator of time from periods t = 1 to t = 20. Which of the following best describes the type of forecasting model that has been developed?
A complete third-order polynomial model
The following regression output is available. Notice that some of the values are missing. [Figure: chapter 14c.jpg] Given this information, what is the standard error of the estimate for the regression model?
About 1.98
A recent study by a major financial investment company was interested in determining whether the annual percentage change in stock price for companies is linearly related to the annual percent change in profits for the company. The following data were determined for 7 randomly selected companies (% Change Stock Price, % Change in Profit): (8.4, 4.2), (9.5, 5.6), (13.6, 11.2), (-3.2, 4.5), (7, 12.2), (18.4, 12), (-2.1, -13.4). Based upon this sample information, what portion of variation in stock price percentage change is explained by the percent change in yearly profit?
About 49 percent
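The "about 49 percent" figure is the coefficient of determination, which in simple regression equals the squared sample correlation. A minimal sketch, with illustrative variable names:

```python
# Stock price change is the dependent variable, profit change the independent one.
stock_change = [8.4, 9.5, 13.6, -3.2, 7, 18.4, -2.1]
profit_change = [4.2, 5.6, 11.2, 4.5, 12.2, 12, -13.4]

n = len(stock_change)
mean_x = sum(profit_change) / n
mean_y = sum(stock_change) / n

sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(profit_change, stock_change))
sxx = sum((x - mean_x) ** 2 for x in profit_change)
syy = sum((y - mean_y) ** 2 for y in stock_change)

# R-squared in simple regression = r^2 = Sxy^2 / (Sxx * Syy).
r_squared = sxy ** 2 / (sxx * syy)
print(round(r_squared, 2))  # 0.49
```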
Consider the following partially completed computer printout for a regression analysis where the dependent variable is the price of a personal computer and the independent variable is the size of the hard drive. Based on the information provided, what is the F statistic?
About 69.5
In analyzing the relationship between two variables, a scatter plot can be used to detect which of the following?
All of the above
To determine the aptness of the model, which of the following would most likely be performed?
All of the above
The following multiple regression output was generated from a study in which two independent variables are included. The first independent variable (X1) is a quantitative variable measured on a continuous scale. The second variable (X2) is qualitative coded 0 if Yes, 1 if No. [Figure: chapter15f.jpg] Based on this information, which of the following statements is true?
All of the above are true.
The following regression output is available. Notice that some of the values are missing. [Figure: chapter 14c.jpg] Given this information, what percent of the variation in the y variable is explained by the independent variable?
Approximately 57 percent
A multiple regression is shown below for a data set of yachts where the dependent variable is the price of the boat in thousands of dollars. [Figure: chapter15e.jpg] Given this information, what percentage of variation in the dependent variable is explained by the regression model?
Approximately 68 percent
Consider the following partially completed computer printout for a regression analysis where the dependent variable is the price of a personal computer and the independent variable is the size of the hard drive. Based on the information provided, what is the estimate for the standard error of the estimate for the regression model?
Approximately 690.50
The editors of a national automotive magazine recently studied 30 different automobiles sold in the United States with the intent of seeing whether they could develop a multiple regression model to explain the variation in highway miles per gallon. A number of different independent variables were collected. The following regression output (with some values missing) was recently presented to the editors by the magazine's analysts: [Figure: chapter15d.jpg] Based on this output and your understanding of multiple regression analysis, what is the critical value for testing the significance of the overall regression model at a 0.05 level of statistical significance?
Approximately F = 2.50
An industry study was recently conducted in which the sample correlation between units sold and marketing expenses was 0.57. The sample size for the study included 15 companies. Based on the sample results, test to determine whether there is a significant positive correlation between these two variables. Use alpha = 0.05.
Because t = 2.50 > 1.7709, reject the null hypothesis. There is sufficient evidence to conclude there is a positive linear relationship between sales units and marketing expense for companies in this industry.
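The t = 2.50 in this answer comes from the usual test statistic for a correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The sketch below is a hedged check; `correlation_t` is a hypothetical helper name, and the critical value is simply the one quoted in the answer.

```python
from math import sqrt

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

t_stat = correlation_t(0.57, 15)

# One-tailed critical value for alpha = 0.05, df = 13, as quoted in the answer.
critical_t = 1.7709
reject_null = t_stat > critical_t
print(round(t_stat, 2), reject_null)  # 2.5 True

# The same helper reproduces the bank card in this set (r = 0.40, n = 20).
t_bank = correlation_t(0.40, 20)
print(round(t_bank, 4))  # 1.8516
```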
The billing department of a national cable service company is conducting a study of how customers pay their monthly cable bills. The cable company accepts payment in one of four ways: in person at a local office, by mail, by credit card, or by electronic funds transfer from a bank account. The cable company randomly sampled 400 customers to determine if there is a relationship between the customer's age and the payment method used. The following sample results were obtained: Based on the sample data, can the cable company conclude that there is a relationship between the age of the customer and the payment method used? Conduct the appropriate test at the alpha = 0.01 level of significance.
Because χ² = 50.3115 > 21.666, reject the null hypothesis. Based on the sample data, conclude that age and type of payment are not independent.
The editors of a national automotive magazine recently studied 30 different automobiles sold in the United States with the intent of seeing whether they could develop a multiple regression model to explain the variation in highway mileage per gallon. A number of different independent variables were collected. The following correlation matrix was developed: [Figure: chapter15c.jpg] If only one variable were to be brought into the model, which variable should it be if the goal is to explain the highest possible percentage of variation in the dependent variable?
Curb weight
Residual analysis is conducted to check whether regression assumptions are met. Which of the following is not an assumption made in simple linear regression?
Errors are linearly related to x.
A business with 5 copy machines keeps track of how many copy machines need service on a given day. It believes this is binomially distributed with a probability of p = 0.2 of each machine needing service on any given day. It has collected the following based on a random sample of 100 days (X, Frequency): (0, 28), (1, 38), (2, 22), (3, 7), (4 or 5, 5); Total: 100. Given this information, assuming that all expected values are sufficiently large to use the classes as shown above, the critical value for testing the hypothesis will be based on 5 degrees of freedom.
False
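The statement is false because a goodness-of-fit test with k classes and a fully specified distribution has k - 1 degrees of freedom, which is 4 here. A minimal sketch of the setup, assuming the binomial parameters given in the card (n = 5, p = 0.2) and taking the classes as given, as the card instructs:

```python
from math import comb

n_machines, p = 5, 0.2
days = 100

def binom_pmf(k):
    """Binomial probability of exactly k machines needing service."""
    return comb(n_machines, k) * p ** k * (1 - p) ** (n_machines - k)

# Classes as given in the card: 0, 1, 2, 3, and "4 or 5".
probs = [binom_pmf(0), binom_pmf(1), binom_pmf(2), binom_pmf(3),
         binom_pmf(4) + binom_pmf(5)]
expected = [days * q for q in probs]

# p is specified rather than estimated, so df = number of classes - 1.
df = len(expected) - 1
print(df)  # 4
```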
A major car magazine has recently collected data on 30 leading cars in the U.S. market. It is interested in building a multiple regression model to explain the variation in highway miles. The following correlation matrix (lower triangle) has been computed from the data collected, with columns in the order mileage, highway; mileage, city; Curb Weight; cylinders; Horsepower. Row mileage, highway: 1. Row mileage, city: 0.857550598, 1. Row Curb Weight: -0.739110566, -0.70765104, 1. Row cylinders: -0.694837149, -0.866135056, 0.596475711, 1. Row Horsepower: -0.549172956, -0.684199197, 0.293202385, 0.840347219, 1. If only one independent variable (ignoring city mileage) is to be used in explaining the dependent variable in a regression model, the percentage of variation that will be explained will be nearly 74 percent.
False
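The statement is false because 0.74 is roughly the size of the strongest correlation, not of the variation explained; a one-variable model explains r squared. A short check, with the correlations copied from the matrix in the question:

```python
# Correlations with highway mileage (city mileage excluded, as the card says).
corr_with_highway = {
    "Curb Weight": -0.739110566,
    "cylinders": -0.694837149,
    "Horsepower": -0.549172956,
}

# The best single predictor has the largest absolute correlation.
best_var = max(corr_with_highway, key=lambda v: abs(corr_with_highway[v]))
best_r_squared = corr_with_highway[best_var] ** 2
print(best_var, round(best_r_squared, 3))  # Curb Weight 0.546
```

So the best single-variable model explains about 55 percent of the variation, not 74 percent.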
A major car magazine has recently collected data on 30 leading cars in the U.S. market. It is interested in building a multiple regression model to explain the variation in highway miles. The following correlation matrix (lower triangle) has been computed from the data collected, with columns in the order mileage, highway; mileage, city; Curb Weight; cylinders; Horsepower. Row mileage, highway: 1. Row mileage, city: 0.857550598, 1. Row Curb Weight: -0.739110566, -0.70765104, 1. Row cylinders: -0.694837149, -0.866135056, 0.596475711, 1. Row Horsepower: -0.549172956, -0.684199197, 0.293202385, 0.840347219, 1. The analysts also produced the following multiple regression output using curb weight, cylinders, and horsepower as the three independent variables. Note, a number of the output fields are missing, but can be determined from the information provided. [Figure: chapter15b.jpg] If the analysts are interested in testing whether the overall regression model is statistically significant, the appropriate null and alternative hypotheses are: H0: β1 = β2 = β3; Ha: β1 ≠ β2 ≠ β3.
False
A research study has stated that the taxes paid by individuals are correlated at a 0.78 value with the age of the individual. Given this, the scatter plot would show points that fall on a straight line with a slope equal to 0.78.
False
A study was recently conducted in which people were asked to indicate which news medium was their preferred choice for national news. The following data were observed (age group: radio, television, newspaper): under 21: 30, 50, 5; 21-40: 20, 25, 30; 41 and over: 30, 30, 50. Given these data, if we wish to test whether the preferred news source is independent of age, the expected frequency in the radio/under 21 cell is 30.
False
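The statement is false because 30 is the observed count; the expected count under independence is row total times column total divided by the grand total. A brief sketch using the counts from the question:

```python
observed = {
    "under 21": {"radio": 30, "television": 50, "newspaper": 5},
    "21-40": {"radio": 20, "television": 25, "newspaper": 30},
    "41 and over": {"radio": 30, "television": 30, "newspaper": 50},
}

row_total = sum(observed["under 21"].values())                     # 85
col_total = sum(row["radio"] for row in observed.values())         # 80
grand_total = sum(sum(row.values()) for row in observed.values())  # 270

expected_radio_under_21 = row_total * col_total / grand_total
print(round(expected_radio_under_21, 2))  # 25.19
```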
If any of the observed frequencies are smaller than 5, then categories should be combined until all observed frequencies are at least 5.
False
If two variables are highly correlated, it not only means that they are linearly related, it also means that a change in one variable will cause a change in the other variable.
False
In a goodness-of-fit test, when the null hypothesis is true, the expected value for the chi-square test statistic is zero.
False
In developing a scatter plot, the decision maker has the option of connecting the points or not.
False
In multiple regression analysis, the residual is the absolute difference between the actual value of y and the predicted value of y.
False
The adjusted R² value can be larger or smaller than the R² value, depending on the data set.
False
A recent study of 15 shoppers showed that the correlation between the time spent in the store and the dollars spent was 0.235. Using a significance level equal to 0.05, which of the following is the appropriate null hypothesis to test whether the population correlation is zero?
H0 : ρ = 0.0
Which of the following statements is true?
If the confidence interval estimate for the regression slope coefficient, based on the sample information, crosses over zero, the true population regression slope coefficient could be zero.
A multiple regression is shown for a data set of yachts where the dependent variable is the price in thousands of dollars. [Figure: chapter15e.jpg] Given this information, which of the following is true regarding the slope coefficient for Age, where Age represents how many years old the yacht is?
On average the price of the yacht falls by $1778 per year
The editors of a national automotive magazine recently studied 30 different automobiles sold in the United States with the intent of seeing whether they could develop a multiple regression model to explain the variation in highway miles per gallon. A number of different independent variables were collected. The following regression output is the result of using a forward selection stepwise regression approach. [Figure: chapter15g.jpg] Based on the regression output, which of the following statements is true?
The R-square value increased when the second variable entered the model.
It is believed that the number of people who attend a Mardi Gras parade each year depends on the temperature that day. A regression has been conducted on a sample of years where the temperature ranged from 28 to 64 degrees and the number of people attending ranged from 8,400 to 14,600. The regression equation was found to be ŷ = 2378 + 191x. Which of the following is true?
The average change in parade attendance is an additional 191 people per one-degree increase in temperature.
The National Football League has performed a study in which the total yards gained by teams in games was used as an independent variable to explain the variation in total points scored by teams during games. The points scored ranged from 0 to 57 and the yards gained ranged from 187 to 569. The following regression model was determined: ŷ = 12.3 + 0.12x. Given this model, which of the following statements is true?
The average change in points scored for each increase of one yard will be 0.12.
Use the following regression results to answer the question below. [Figure: chapter 14b.jpg] Which of the following is true?
The correlation between x and y must be approximately -0.8851.
What does the term expected cell frequencies refer to?
The frequencies computed from H0
Which of the following is not an indication of potential multicollinearity problems?
The sign on the standard error of the estimate is positive.
A bank is interested in determining whether its customers' checking balances are linearly related to their savings balances. A sample of n = 20 customers was selected and the correlation was calculated to be +0.40. If the bank is interested in testing to see whether there is a significant linear relationship between the two variables using a significance level of 0.05, the value of the test statistic is approximately t = 1.8516.
True
A complete polynomial model contains terms of all orders less than or equal to the pth order.
True
A goodness-of-fit test can be used to determine whether a set of sample data comes from a specific hypothesized population distribution.
True
A model is a representation of an actual system.
True
A multiple regression model of the form y = β0 + β1x + β2x² + ε is called a second-degree polynomial model.
True
A study has recently been conducted by a major computer magazine publisher in which the objective was to develop a multiple regression model to explain the variation in price of personal computers. Three independent variables were used. The following computer printout shows the final output. However, several values are omitted from the printout. [Figure: chapter15a.jpg] Given this information, the regression model explains just under 70 percent of the variation in the price of personal computers.
True
A study was recently done in which the following regression output was generated using Excel. SUMMARY OUTPUT Given this output, we would reject the null hypothesis that the population regression slope coefficient is equal to zero at the alpha = 0.05 level.
True
A survey was recently conducted in which males and females were asked whether they owned a laptop personal computer. The following data were observed (Males, Females): Have Laptop: 120, 70; No Laptop: 50, 60. Given this information, the sample size in the survey was 300 people.
True
Given a sample of data for use in simple linear regression, the values for the slope and the intercept are chosen to minimize the sum of squared errors.
True
If it is known that a simple linear regression model explains 56 percent of the variation in the dependent variable and that the slope on the regression equation is negative, then we also know that the correlation between x and y is approximately -0.75.
True
If the correlation coefficient for two variables is computed to be a -0.70, the scatter plot will show the data to be downward sloping from left to right.
True
In a multiple regression model, each regression slope coefficient measures the average change in the dependent variable for a one-unit change in the independent variable, all other variables held constant.
True
In a university statistics course, a correlation of -0.8 was found between the number of classes missed and course grade. This means that the fewer classes students missed, the higher the grade.
True
In order to apply the chi-square contingency methodology for quantitative variables, we must first break the quantitative variable down into discrete categories.
True
It is possible for the standard error of the estimate to actually increase if variables are added to the model that do not aid in explaining the variation in the dependent variable.
True
Residuals are calculated by e = y - ŷ.
True
State University recently randomly sampled ten students and analyzed grade point average (GPA) and number of hours worked off-campus per week. The following data were observed (GPA, HOURS): (3.14, 25), (2.75, 30), (3.68, 11), (3.22, 18), (2.45, 22), (2.80, 40), (3.00, 15), (2.23, 29), (3.14, 10), (2.90, 0). In this study the independent variable is the number of hours worked off campus per week.
True
The Conrad Real Estate Company recently conducted a statistical test to determine whether the number of days that homes are on the market prior to selling is normally distributed with a mean equal to 50 days and a standard deviation equal to 10 days. The sample of 200 homes was divided into 8 groups to form a grouped data frequency distribution. If a chi-square goodness-of-fit test is to be conducted using an alpha = .05, the critical value is 14.0671.
True
The following regression model has been computed based on a sample of twenty observations: ŷ = 34.2 + 19.3x. Given this model, the predicted value for y when x = 40 is 806.2.
True
The scatter plot is a two dimensional graph that is used to graphically represent the relationship between two variables.
True
To check whether the regression assumption involving normality of the error terms is valid, it is appropriate to construct a normal probability plot. If this plot forms a straight line from the lower left-hand corner diagonally up to the upper right-hand corner, the error terms may be assumed to be normally distributed.
True
To describe the variable credit status, which has three levels (Excellent, Good, and Poor), we need to use two different dummy variables.
True
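A categorical variable with k levels needs k - 1 dummy variables, because the remaining level is represented by all dummies being zero. The sketch below illustrates this for the credit status card; `dummy_encode` is a hypothetical helper, and using "Poor" as the baseline level is an arbitrary assumption.

```python
def dummy_encode(value, levels):
    """Return k - 1 indicators; the last level is the baseline (all zeros)."""
    return [1 if value == level else 0 for level in levels[:-1]]

levels = ["Excellent", "Good", "Poor"]  # "Poor" is an arbitrary baseline choice

print(dummy_encode("Excellent", levels))  # [1, 0]
print(dummy_encode("Good", levels))       # [0, 1]
print(dummy_encode("Poor", levels))       # [0, 0]
```

The same rule gives 4 dummy variables for the 5-level University Class Standing card earlier in this set.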
To employ contingency analysis, we set up a 2-dimensional table called a contingency table.
True
When a pair of variables has a positive correlation, the slope in the regression equation will always be positive.
True
You are given the following sample data for two variables (Y, X): (10, 100), (8, 110), (12, 90), (15, 200), (16, 150), (10, 100), (10, 80), (8, 90), (12, 150). The sample correlation coefficient for these data is approximately r = 0.755.
True
Which of the following statements is correct? (a) A scatter plot showing two variables with a positive linear relationship will have all points on a straight line. (b) The stronger the linear relationship between two variables, the closer the correlation coefficient will be to 1.0. (c) Two variables that are uncorrelated with one another may still be related in a nonlinear manner. (d) All of the above are correct.
Two variables that are uncorrelated with one another may still be related in a nonlinear manner.
Standard stepwise regression
combines attributes of both forward selection and backward elimination.
In a chi-square contingency analysis, when expected cell frequencies drop below 5, the calculated chi-square value tends to be inflated and may inflate the true probability of ________ beyond the stated significance level.
committing a Type I error
The degrees of freedom for the chi-square goodness-of-fit test are equal to ________, where k is the number of categories.
k - 1
Based on the residual plot below, which of the following is correct? The above residual plot shows:
linearity and constant variance.
The assumption that the errors or residuals are independent is best checked by:
looking at a plot of the residuals versus time and checking for trends
Given the data below, one ran the simple regression analysis of Y on X (Y, X): (4, 2), (3, 1), (4, 4), (6, 3), (8, 5). The relationship between Y and X is
not significant at the alpha = 10 percent level.
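This answer can be checked with the t test for the slope, which in simple regression is equivalent to the t test for the correlation. In the sketch below the critical value 2.3534 is taken from a standard t table (two-tailed, alpha = 0.10, df = 3); it is not given in the card and is an assumption about which table value applies.

```python
from math import sqrt

y = [4, 3, 4, 6, 8]
x = [2, 1, 4, 3, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

# Correlation and its t statistic with n - 2 degrees of freedom.
r = sxy / sqrt(sxx * syy)
t_stat = r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Two-tailed critical value, alpha = 0.10, df = 3 (standard t table; assumed).
critical_t = 2.3534
significant = abs(t_stat) > critical_t
print(round(t_stat, 3), significant)  # 2.236 False
```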
Interaction exists in a multiple regression model when:
one independent variable affects the relationship between another independent variable and the dependent variable.
For a chi-square test involving a contingency table, suppose H0 is rejected. We conclude that the two variables are:
related.
A standardized residual is:
the ratio of each residual divided by an estimate for the standard deviation of the residuals.
A study published in the American Journal of Public Health was conducted to determine whether the use of seat belts in motor vehicles depends on ethnic status in San Diego County. A sample of 792 children treated for injuries sustained from motor vehicle accidents was obtained, and each child was classified according to (1) ethnic status (Hispanic or non-Hispanic) and (2) seat belt usage (worn or not worn) during the accident. The number of children in each category is given in the table below (Hispanic, Non-Hispanic): Seat belts worn: 31, 148; Seat belts not worn: 283, 330. Referring to these data, which test would be used to properly analyze the data in this experiment?
χ² test for independence in a two-way contingency table