ECON 203 FINAL
Exponential Smoothing is used to forecast
1 period ahead
When calculating the 8-period centered moving avg for a time series consisting of 100 days of gas prices, what is the number of periods for which you will not be able to obtain smoothed values?
8
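A quick way to check this count is to compute the centered moving average directly. The sketch below assumes the usual convention for an even window (average the two adjacent 8-period windows that straddle each period) and uses made-up prices, since the original data is not given.

```python
def centered_moving_average(values, k):
    """Centered moving average for an even window k.

    Returns a list as long as `values`, with None wherever
    no smoothed value can be computed.
    """
    half = k // 2
    n = len(values)
    smoothed = [None] * n
    for t in range(half, n - half):
        # Average the two k-period windows that straddle period t.
        first = sum(values[t - half:t + half]) / k
        second = sum(values[t - half + 1:t + half + 1]) / k
        smoothed[t] = (first + second) / 2
    return smoothed


prices = [3.00 + 0.01 * day for day in range(100)]  # hypothetical gas prices
smoothed = centered_moving_average(prices, 8)
missing = sum(value is None for value in smoothed)
print(missing)  # 8: four periods at each end have no smoothed value
```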
Which of the following statements is (are) generally true assuming a mound shaped distribution?
A positively skewed distribution could be centered on zero. The mean, median and mode are never the same if the distribution is skewed in either direction. A negatively skewed distribution will tend to have more extreme values on the left side of the distribution. The mean of a positively skewed distribution will be greater than its median. The mean of a positively skewed unimodal distribution will be greater than its mode.
Independent random samples of size n1= 5, n2= 5 produce 2 means and 2 SD's. Assume unequal variances. The test stat for testing this claim will be:
A t distribution
In a simple linear regression, if you were asked to test the overall validity of the model, you would
all of the above are equivalent
Smoothing methods ______ applicable to the time series because the size of new one-family houses in the South has an _________.
are not; upward linear trend
There seems to be bigger variation in the weight among males than the females. To test this claim, what should you use?
F test for the difference in variances
University officials claim that 19.6% of U of Lima students smoke. The student senate disagrees.
Midterm 1 Spring 2013 301
Quality control test
Midterm 1 Spring 2013 304
You decide to go to London as an exchange student for a yr
Midterm 1 Spring 2014 267
Your friend is convinced that restaurants located on Green st near campus earn the same as those downtown
Midterm 1 Spring 2014 276
A researcher wants to estimate whether commuting time increases when public transport does not work in a city
Midterm 1 Spring 2014 277 Midterm 1 Fall 2013 293
A recent news article claims that over 70% of smart phone owners are "heavy data users."
Midterm 1 Spring 2014 278
In order to remove the random variation from time series data we can:
Use exponential smoothing; use moving averages
List the steps involved in p-value calculation in the correct order.
1. Calculate the test statistic 2. Draw an arrow from the test statistic through the nearest rejection region 3. If H1 is two-sided, duplicate on the other side of the distribution
Smoothing methods
moving averages, weighted moving averages, & exponential smoothing
population parameter
p
Type 2 error
Not rejecting the null when it's false: Ho is false and we do not reject Ho
Small ω
provides a lot of smoothing.
The difference between y & y^ for a particular sample point (observation) is called a _________
residual
Which of the following are measures of association?
sample correlation coefficient, sample covariance
A cycle is a
wavelike pattern describing a long term behavior (for more than one year). Cycles are seldom regular, and often appear in combination with other components.
It has been computed that the 95% confidence interval is [144.4, 154.2] for the average exam score when a student spent 10 hours on average per week studying for the class. The 99% prediction interval for a student who spent 10 hours on average per week studying for the class will be
wider
Recomposition
will recreate the time series, based only on the trend and seasonal components—you can then deduce the value of the random component.
Which is true?
z .025 < t .025,30 < t .025,3
The variable that you are predicting or explaining
Dependent Variable
What regression violation would you check with a plot of the residuals vs the predicted value y^ (using XY (Scatter) in the Chart Wizard)?
Heteroskedasticity
Partial F-test for # of homicides
Ho: B3 = B4 = 0, H1: At least one Bi does not = 0 Fall 2013 Final
Assume you want to test whether the pop. variances were equal. Which of the following are true?
Ho: The variances are the same & H1: The variances are different.
Assume you want to test whether the pop. variances were unequal. Which of the following are true?
Ho: The variances are the same & H1: The variances are different.
You believe that the budgets of movies that win Oscars are no different from those who do not win.
Midterm 1 Fall 2013 292
Yell, a computer manufacturer, claims that their new ultra light notebook's battery can last 10 hours or more
Midterm 1 Fall 2013 294
A drug company wants to test the effectiveness of a new pill to reduce blood pressure. Midterm 1 Fall 2013 286
T=2.406 LCL= -.089 After new study, the reduction in blood pressure of the new drug is not larger than 1.5 at 5%
Autocorrelation can be a potential problem in a
time series
Line with most jagged appearance: brown circle symbol line must be the
time series.
In a linear regression in general, the test statistic for testing whether the slope of the regression line is non-zero is the same as the test statistic for testing whether the correlation coefficient is zero or not?
False
In a simple linear regression, the test statistic for testing whether the slope of the regression line is non-zero is the same as the test statistic for the overall test of the model?
False
If avoiding large errors is important _________ should be used.
MSE
In testing a hypothesis, statements for the null and alternative hypotheses as well as the selection of the level of significance should precede the collection and examination of the data.
True
If you removed the outlier from data set 1, what would you say about the relationship between the remaining X & Y values?
The slope of the regression line would be undefined.
In a simple linear regression, the p-value for testing whether the slope of the regression line is non-zero is the same as the p-value for testing whether the correlation coefficient is non-zero?
True
In a simple linear regression, the p-value for testing whether the slope of the regression line is non-zero is the same as the p-value for the overall test of the model?
True
In a simple linear regression, the test statistic for testing whether the slope of the regression line is non-zero is the same as the test statistic for testing whether the correlation coefficient is non-zero?
True
Considering that dL=.610 and dU=1.4, what would you conclude?
there is a negative autocorrelation
The portion of the universe that has been selected for analysis is called
a sample
A manager of a company that produces tires is worried that the 4 company plants are not manufacturing tires with exactly the same specifications and is especially concerned about the width of each tire. She gets a sample of 6 tires produced by each plant & records the difference in tenths of a millimeter between the width of each tire & the guideline. Can she conclude that the tires produced in the 3 plants differ? To help the manager with this concern, you would test for the ___________ using the Tools > Data Analysis tool ___________.
difference in means; Anova: Single Factor. Factor levels: Plants 1, 2 & 3. Ho: m1 = m2 = m3 and H1: at least 2 m's differ. Critical value = FINV(.05,3,20). Approximate center of distribution = 1.
In a moving avg
each observation receives the same weight.
It is claimed that students at the University of Illinois read more newspapers, on average, than students at the University of Michigan. If we were to test the claim, we would have
independent samples
We want to test whether juniors in the business school have higher average grades than sophomores. We collect data on 100 sophomores and 95 juniors. In this case, we have
independent samples
Marketing mix
refers to the different components that can be controlled in a marketing strategy to increase sales or profit. The name comes from a cooking-mix analogy used by Neil Borden in his 1953 presidential address to the American Marketing Association.
Which of the following are true?
If number of new businesses started & population density are removed from the regression model, R^2 can never increase
Which of the following is true regarding covariance?
If the two variables move in the same direction, the covariance is a positive number
Which of the following is true about analysis of variance (ANOVA)?
It conducts hypothesis test about population means using the F distribution
Trend component of a time series
Its gradual shift to higher or lower values over a long period of time, such as 10 or 20 years.
If avoiding large errors is not important _________ should be used.
MAE
Construct a 95% conf interval for the difference in means (m1 - m2)
(.0082, 1.3918) Midterm 1 Spring 2013 297
If t stat > critical (lower-tailed test, negative critical value)
Fail to reject
The moving average method
does not provide smoothed values (moving average values) for the first and last set of periods. The moving average method considers only the observations included in the calculation of the average value.
In a weighted moving avg
the weights on the observations are not equal
In ANOVA, the total amnt of variation WITHIN samples is measured by
the sum of squares of error (SSE)
Serious multicollinearity can be a potential problem when
there are multiple independent variables. If correlation is less than .8, there is no serious multicollinearity
You are told that the 95% conf interval for the return of a stock is (-.10; .16). Which could represent a 90% conf interval for the return of this stock with the same sample data?
(-.08; .14) Midterm 1 Fall 2013 282
When we computed the sample correlation coefficient between sale of rollerblades and the average daily temperature of a quarter, we got a value of r is: 0.5, then which of the following CANNOT be a value of the sample covariance?
-1
Consider the following conditions that can change a conf interval for a single pop mean w unknown pop SD. Which will increase the width of the conf interval holding all other variables constant?
1) Sample size decreases 2) Level of conf increases (going from 90% to 95%) 3) Sample SD increases Midterm 1 Spring 2014 279
You would like to predict cigarette consumption for the year 2005. You know that you can use exponential smoothing to create forecasts when the time series exhibits gradual (not a sharp) trend, has no cyclical effects, and has no seasonal effects, and when you are forecasting only one period beyond the existing data. You are given the data set on the cigarette consumption from 1965 to 2004. Answer the following questions.
1. From the scatter plot you can conclude that: the series exhibits a gradual (not a sharp) trend; the series does not appear to have a cyclical component; the series does not appear to have a seasonal component; time series forecasting with exponential smoothing is appropriate. 2. Using the smoothing constant of 0.8, the forecasted value for the cigarette consumption in 2005 is 2703.131459
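As a rough illustration of how such a one-period-ahead forecast is produced, here is a minimal sketch of simple exponential smoothing. It assumes the common textbook recursion S_1 = y_1 and S_t = w*y_t + (1 - w)*S_(t-1), with the last smoothed value used as the forecast. The consumption figures are made up, since the actual 1965-2004 data set is not reproduced here.

```python
def exponential_smoothing(series, w):
    """Return the exponentially smoothed series for smoothing constant w."""
    smoothed = [series[0]]  # assumed start: first smoothed value = first observation
    for y in series[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed


consumption = [2900.0, 2850.0, 2800.0, 2780.0, 2720.0]  # hypothetical values
smoothed = exponential_smoothing(consumption, 0.8)
forecast_next = smoothed[-1]  # forecast for the period after the data ends
print(forecast_next)
```

A smaller w puts more weight on the previous smoothed value, so the resulting series is smoother and reacts more slowly to new observations.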
What are the procedures to detect violations of the regression assumptions?
1. Non-normality of the error term is detected by examining the histogram of standardized residuals 2. Heteroskedasticity is detected by plotting residuals against the independent variable 3. Non-independence of the error term is detected by plotting residuals against time 4. Outliers are detected by plotting a scatter diagram of the y-variable against the x-variable
A statistics professor wanted to know whether time limits on quizzes affected the marks on the quiz. Accordingly, he took a random sample of economics statistics students and split them into five groups of 20 students each. All students took a quiz that involved simple manual calculations. Each group was given a different time limit. Group 1 was limited to 40 minutes; group 2, 45 minutes; group 3, 50 minutes; group 4, 55 minutes; and group 5, 60 minutes.
1. The algebraic form of the estimated equation of your model is SCORE = -2.2 + 0.55 TIME. 2. The value of the coefficient of determination is 0.743974422. 3. From the residual analysis, we can conclude that the errors are tolerably normal. 4. Also, you want to check if the required assumption #2 (homoscedasticity) is violated. We can conclude that the errors have heteroscedasticity. 5. The algebraic form of the estimated equation of your model is LNSCORE = 2.129582054 + 0.021715898 TIME. 6. Transformation of the dependent variable has increased the coefficient of determination. The estimated score on the quiz of a student that gets 55 minutes to solve the quiz is: 27.77000323
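The final estimate comes from plugging TIME = 55 into the log model and then undoing the log. A short check, using only the coefficients quoted in the answer (variable names are illustrative):

```python
import math

b0 = 2.129582054   # intercept of the LNSCORE model
b1 = 0.021715898   # slope on TIME
time_limit = 55    # minutes

ln_score = b0 + b1 * time_limit   # predicted natural log of the score
score = math.exp(ln_score)        # back-transform to the score scale
print(round(score, 2))            # about 27.77
```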
Meico insurance company offers car insurance contracts by directly contacting potential customers through phone calls. Sales agents make these phone calls & describe insurance policies. Weekly sales per agent follow a normal distribution with a mean of 15 thousand & SD of 5 thousand. If Meico wants to improve sales agents' incentives by offering a bonus to agents selling more than 25 thousand weekly, what proportion of sales agents qualify for this bonus?
2.5% (1-.95=.05) (.05/2)= .025 Midterm 1 Spring 2013 296
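The 2.5% figure uses the empirical rule (about 95% of a normal distribution lies within 2 SDs of the mean). The sketch below, which uses Python's standard library `statistics.NormalDist`, compares that approximation with the exact normal tail area; the variable names are illustrative.

```python
from statistics import NormalDist

sales = NormalDist(mu=15, sigma=5)   # weekly sales in thousands
z = (25 - 15) / 5                    # the bonus threshold is 2 SDs above the mean

empirical_rule_share = (1 - 0.95) / 2   # the shortcut used in the answer
exact_share = 1 - sales.cdf(25)         # exact upper-tail area

print(z, empirical_rule_share, round(exact_share, 4))
```

The exact tail area is a little under 2.5% because 95% coverage corresponds to 1.96 SDs rather than exactly 2.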
The manager of a chicken farm wants to investigate the possibility of replacing the current chicken food with a new generation of heavier chickens. There are two brands A & B and 2 pops. What is the value of the test stat needed to make a decision about the equality of the 2 pop variances? Which category of hypothesis test is most appropriate?
1.710 t-test for 2 pop difference in means w EQUAL unknown pop variances (If F-stat < critical, you fail to reject = equal variances) Midterm 1 Spring 2013 296
What percentage of variation of the dependent variable is explained by the model? a. 0.87 b. 19.196 c. 85.5 d. 87 e. 93.3
D
What is the value of test stat needed to prove brand A is superior?
2.016 Midterm 1 Spring 2013 297
Client earnings or losses at a casino are hump-shaped with a mean of -20 and variance of 100. What % of customers leave the casino not losing money?
2.5%
Weekly pizza sales of Pizza Hut and those of its main competitor (Papa Dell's) were recorded for 1 year. You are given the following graphs that were obtained by running a simple linear regression of Pizza Hut's sales on Papa Dell's sales. Which assumption is most likely to be violated given the above graphs? a. heteroskedasticity b. positive autocorrelation c. negative autocorrelation d. non-normality e. everything seems right
29
the blue square symbol line must be the
3 month moving avg
A random sample of variable X has a positively skewed distribution. Which of the following is (are) correct, if the largest observation of X is dropped from this sample? (i) Sample mean of X will decrease. (ii) Sample median of X might decrease. (iii) Sample mode of X will decrease. (iv) Extreme values cannot affect sample statistics. Select one: a. Only (i) and (ii) b. Only (i) c. Only (i), (ii) and (iii) d. Only (iv) e. None of the above
A
In your senior year you start looking for a job. You have broken down potential locations into three categories based on the population - large, medium and small. You are convinced that the starting salary in a large sized city will be greater than the starting salary in both a small sized city and a medium sized city, so you collect data on 65 cities total and perform an ANOVA to determine if there is some difference in pay among the three groups. The data is provided here. What is the null and alternative hypothesis? Select one: a. H0: μ1 = μ2 = μ3 vs HA: At least two μ's differ from one another b. H0: μ1 = μ2 = μ3 = 0 vs HA: At least one μ differ from 0 c. H0: μ1 - μ2 - μ3 = 0 vs H1: μ1 - μ2 - μ3 ≠ 0 d. H0: μ = -0 vs H1: μ ≠ 0 e. None of the above
A
Joe and Josh each got the same Sudoku puzzle book and recorded the time it took for each of them to do all 100 puzzles in the book. Assuming these puzzles are a random sample of all Sudoku puzzles, they proceed to perform a test to see if they can tell which of them is faster at Sudoku. Select one: a. t-test for difference in population means with dependent samples b. z-test for difference in population means with independent samples c. t-test for difference in population means with equal variances d. One-way ANOVA e. t-test for difference in population means with unequal variances
A
What is the most appropriate interpretation of a type II error? a. A type II error is when you incorrectly do not reject the null hypothesis b. A type II error is when you correctly reject the null hypothesis c. A type II error does not exist in statistical decision making d. A type II error is when you incorrectly reject the null hypothesis e. A type II error is when you correctly do not reject the null hypothesis
A
When using the analysis of variance (ANOVA) technique, you test for: Select one: a. the difference in means between two or more populations b. the difference in proportions between two or more populations c. the difference in variances between two or more populations d. only a and c e. none of the above
A
Which of the following strategies removes more random variations from certain time series data (resulting in a smoothing series)?
A 4 period centered moving avg
To compare the compensation rates of female & male executives, each female in a random sample of executives is paired w a male executive in a comparable position. The study uses a ________ design.
A matched sample
You decided to predict gasoline prices in different cities and towns in the United States for your modelling project. Your dependent variable is price of gasoline per gallon and your independent variables are per capita income (in dollars), number of firms manufacturing parts of automobiles in and around the city, number of new businesses started over the last year, population density of the city (in 100's of persons per sq. mile), percentage of local taxes on gasoline (in %), and the number of people using public transportation per 100 people. Note: the numbers below are randomly generated and will likely change when a new quiz is loaded. You collected a sample of 33 cities and obtained an SSR= 137.1484. Which of the following statements is (are) true if Se2 is 3.5602 ?
Adjusted R2=0.5041 If number of new businesses started and population density are removed from the regression model, R2 can never increase.
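The adjusted R2 can be recovered from the numbers in the question. This sketch assumes the six independent variables listed (k = 6) and the standard identities SSE = Se2 * (n - k - 1) and SST = SSR + SSE.

```python
n = 33              # cities in the sample
k = 6               # independent variables listed in the question
ssr = 137.1484      # regression sum of squares
se_squared = 3.5602 # Se2 = SSE / (n - k - 1)

sse = se_squared * (n - k - 1)
sst = ssr + sse
r_squared = ssr / sst
adjusted_r_squared = 1 - (sse / (n - k - 1)) / (sst / (n - 1))

print(round(r_squared, 3), round(adjusted_r_squared, 3))
```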
Which of the following statements are correct with regard to the coefficient of determination and the adjusted coefficient of determination?
Adjusted coefficient of determination can have a negative value while the coefficient of determination can never be less than zero. Coefficient of determination can never decrease when we introduce more independent variables into a regression model.
Single-factor ANOVA is ______________ test w/ a rejection region _____________.
Always a one-tailed; on the right end of the F distribution
The returns of 2 portfolios were recorded for 11 years. Portfolio 1 variance= 344, Portfolio 2 variance=111. Can you conclude at a 5% sig level that portfolio 1 is riskier (has a higher variance) than portfolio 2? The test stat for testing this claim will be:
An F distribution w/ 10 & 10 DOF. (Ratio of variances)
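For reference, the statistic itself is just the ratio of the two sample variances, with n - 1 degrees of freedom for each sample. A minimal sketch with the numbers from the question (the critical value is left out because it requires an F table or a statistics library):

```python
n1 = n2 = 11                 # years of returns per portfolio
var_1, var_2 = 344, 111      # sample variances

f_stat = var_1 / var_2       # larger hypothesized variance in the numerator
dof = (n1 - 1, n2 - 1)       # numerator and denominator degrees of freedom

print(round(f_stat, 3), dof)
```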
You believe that people in the US spend about the same on clothing as people in 5 European countries do. To test your belief, you collect data on the amount spent on clothing ($) for over 300 individuals in all 6 different countries (including the US). Which test can you use?
An F-test for the difference in means, or ANOVA (difference in means among more than 2 pops)
Which of the following is true regarding a t distribution?
As sample size tends to infinity, it is identical to the z distribution
Suppose that you are trying to figure out whether Martians spend more when they're out carousing with their buddies than when they party alone. You gather data on the spending of four groups of Martians who went out last weekend. (The Martian currency is the zorkmid.) What is the estimated slope coefficient? a. 1.25 b. 0.8 c. 1.14 d. -.8 e. -1.14
B
Suppose you have your eye on a two-bedroom apartment that has 2 bathrooms, its condition is 3, the quality of the neighborhood is 4, and it is located 1 mile from the Quad. How much should you expect to pay in rent? a. $272.414 b. $900.92 c. $968.897 d. $1036.874 e. $1358.228
B
Which of the following is a violation of the assumptions of the multiple linear regression model? a. ε follows a normal distribution with mean 0 b. the error terms are heteroskedastic c. the errors are independent over time d. the errors are not correlated with each other e. no serious multicollinearity is present in the independent variables
B
Suppose that you are testing the hypothesis that the slope coefficient is significantly different from zero. What is the probability of getting a test statistic at least as extreme as 28.250, assuming that the null hypothesis were true? a. 1- (1.88E-11) b. 1.88E-11 c. (1.88E-11)/2 d. 0.004 e. there is not enough information given to determine this
B, C
Given the following information, what do you conclude from the Durbin-Watson test for the presence of autocorrelation? n=35 k=7 dl=.857 du = 1.757 d= 1.65 α =.02 a. conclude that there is evidence of negative autocorrelation in the residuals b. conclude that there is evidence of positive autocorrelation in the residuals c. the test is inconclusive d. there is no evidence of autocorrelation in the residuals e. there is strong evidence of autocorrelation in the residuals
C
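The Durbin-Watson decision can be checked against the standard five-region rule. The function below is a sketch of that rule (the function name and return strings are illustrative, not from the course materials).

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """Classify a Durbin-Watson statistic using the usual dL/dU regions."""
    if d < d_lower:
        return "positive autocorrelation"
    if d <= d_upper:
        return "inconclusive"
    if d < 4 - d_upper:
        return "no evidence of autocorrelation"
    if d <= 4 - d_lower:
        return "inconclusive"
    return "negative autocorrelation"


# Values from the question: d = 1.65 falls between dL = .857 and dU = 1.757.
decision = durbin_watson_decision(1.65, 0.857, 1.757)
print(decision)  # inconclusive
```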
In a multiple linear regression, which of the following would be the best criterion for judging the overall validity of the model? a. SSR/SST > SSE/SST b. adjusted R2 >0.8 c. the p-value associated with the F-test is less than 0.05 d. R2 >0.8 e. the largest p-value associated with the individual t-tests is less than 0.05
C
Suppose that in a simple linear regression, the sample mean for the dependent variable is 1.5, and the sample mean for the independent variable is 4.2. You calculate your estimate for the slope coefficient and find that it's 0.7. The sample standard deviation of the independent variable is .3 and the sample covariance between the variables is 2. What is your estimate for the intercept of the regression line? a. -4.8 b. 3.15 c. -1.44 d. 22.22 e. 6.66
C
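The intercept follows from the fact that the least-squares line passes through the point of sample means, so b0 = ybar - b1 * xbar. Since the slope estimate is given directly, the standard deviation and covariance in the question are not needed for this step.

```python
y_bar = 1.5   # sample mean of the dependent variable
x_bar = 4.2   # sample mean of the independent variable
b1 = 0.7      # slope estimate given in the question

b0 = y_bar - b1 * x_bar   # intercept of the fitted line
print(round(b0, 2))       # -1.44
```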
A group of 12 security analysts provided estimates of the yr 2001 eps of Qualcomm, Inc. You have to test whether the pop. variance is less than .01. The test stat for testing this claim will be:
Chi-squared w/ 11 DOF
A company produces electric devices operated by a thermostatic control. The variance of temperature shud not exceed 3. For a random sample of 20, the sample variance of operating temperatures was 2.4. The test stat for testing this claim will be:
Chi-squared w/ 19 DOF
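The chi-squared statistic for a single variance is (n - 1) * s2 / sigma0_squared. A short sketch with the thermostat numbers (variable names are illustrative; the critical value would come from a chi-squared table with 19 DOF):

```python
n = 20                      # sample size
sample_variance = 2.4       # s2
hypothesized_variance = 3   # variance under Ho

chi2_stat = (n - 1) * sample_variance / hypothesized_variance
dof = n - 1

print(round(chi2_stat, 1), dof)
```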
One way to evaluate a teaching assistant's effectiveness is to examine the scores achieved by his or her students at the end of the semester. Mean score is of interest. however, variance also contains useful info. Variation=300 and class of 20, whose test scores had a variance of 380. The test stat for testing this claim will be:
Chi-squared w/ 19 DOF
Armani Pizza. You are given values of SSR=81702.499 and SSE=6331.901, and you know that there is a total of 10 observations. Based on these numbers, the
Coefficient of Determination is 0.928074696 and the Standard Error of the Estimate is 28.13338986
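Both numbers can be reproduced from SSR, SSE and n. The sketch assumes a simple linear regression (one independent variable), so the error degrees of freedom are n - 2.

```python
import math

ssr = 81702.499   # regression sum of squares
sse = 6331.901    # error sum of squares
n = 10            # observations

r_squared = ssr / (ssr + sse)             # share of variation explained
std_error = math.sqrt(sse / (n - 2))      # standard error of the estimate

print(round(r_squared, 4), round(std_error, 4))
```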
A measure of how well the model fits the data.
Coefficient of determination (r^2)
The traditional long-term, often irregular variation that's present in many economic & financial data series is best described by______
Cyclical component of the time series
Evaluate the overall validity of the model by performing the F-test. Knowing that the relevant critical value is 2.467, what is your conclusion? a. the F statistic is 0.055 and the linear regression model is valid b. the F statistic is 0.055 and the linear regression model is not valid c. the F statistic is 3.999 and the linear regression model is valid d. the F statistic is 18.172 and the linear regression model is valid e. the F statistic is 18.172 and the linear regression model is not valid
D
Rely on your recollection of Project II for the context and data used in this question. What is the coefficient on FOURBDRM? a. 0.314 b. 3.179 c. 12.732 d. 108.534 e. 110.879
D
Although often unpredictable, cycles need to be isolated. To identify cyclical variation we use the percentage of trend.
1. Determine the trend line (by regression). 2. Compute the trend value \hat{y}_t for each period t. 3. Calculate the percentage of trend as (y_t / \hat{y}_t) × 100.
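The percentage-of-trend steps can be sketched as follows. This assumes a linear trend fitted by ordinary least squares on the period index t = 1, 2, ..., n; the function name and sample series are illustrative.

```python
def percentage_of_trend(series):
    """Return 100 * y_t / trend_t for a least-squares linear trend."""
    n = len(series)
    periods = list(range(1, n + 1))
    t_bar = sum(periods) / n
    y_bar = sum(series) / n
    # Least-squares slope and intercept of y on t.
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in zip(periods, series))
             / sum((t - t_bar) ** 2 for t in periods))
    intercept = y_bar - slope * t_bar
    trend = [intercept + slope * t for t in periods]
    return [100 * y / y_hat for y, y_hat in zip(series, trend)]


series = [102.0, 98.0, 110.0, 118.0, 112.0, 130.0]  # hypothetical observations
pct = percentage_of_trend(series)
print([round(p, 1) for p in pct])
```

Values above 100 mark periods where the series sits above its trend line, and values below 100 mark periods below it.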
If you wanted to test whether the correlation coefficient (ρ) between childs' education and mothers' education was statistically different from zero, what information would you use? a. t-stat = 7.953, p-value = 3.57E-12 b. t-stat = 2.915, p-value = .004 c. t-stat = 1.126, p-value = 0.263 d. F-stat = 10.147, p-value = 7.23E-06 e. this test cannot be performed with the given information
E
The correct procedure to test for the existence of heteroskedasticity is by: Select one: a. Calculating the Durbin-Watson test statistic b. A histogram of the standardized residuals c. A scatter plot of residuals vs time d. Calculating the correlation between independent variables e. A scatter plot of the standardized residuals vs predicted returns
E
How do you check the normality of errors assumption?
Examine the histogram of standardized residuals.
A supermarket is offering a bonus prize to any cashier who is significantly more productive than the rest
Final Spring 2014 394
Actual values of time series
Fluctuate around its trend due to cyclical, seasonal, and irregular factors
Seasonal component of time series
Fluctuations that occur fairly regularly within each year
Equal Variances
If F-stat < critical, you fail to reject
Unequal Variances
If F-stat > critical, you reject
Which of the following statements is(are) true about the matched pairs t-test?
If a lot of variability is introduced into the system, then matched pairs should be used. A disadvantage of using matched pairs is that the design of the experiment does make it more expensive.
A time series can consist of four components.
Long - term trend (T). Cyclical effect (C). Seasonal effect (S). Random variation (R).
A hooligan walking erratically down the street tells you that the proportion of games won by Arsenal is at least 15% higher than Chelsea. Which graph?
Midterm 1 Spring 2014 268
As a farmer in Champaign, you want to plant the corn that will give you the highest yield per acre. Compute MST, F stat, critical values
Midterm 1 Spring 2014 268
December Effect
May reflect stock buyers' anticipation of the January effect. In recent years, there has been a rise in stock prices in the last week of December, between the Christmas & New Year holidays
The workers at a factory are discussing whether or not to unionize and hire you to evaluate
Midterm 1 Spring 2014 270
As a supervisor for the Food Safety Administration, you collect data on the # of violations found for a random sample of restaurants from cities to compare to towns
Midterm 1 Fall 2013 288
What regression assumption would you use a histogram of the residuals (using Data Analysis > Histogram) to check?
Normality of the errors
Which of the following is FALSE regarding the seasonal index?
None. they are all true pg 367 Final Fall 2014
Suppose you want to know if a training program that your company (an automobile factory) has introduced recently has had any effect on worker productivity. To test this you decide to gather data from both before and after the training session on the workers that have received the training. The data you gather is monthly data measured in terms of average cars per day per worker. Which of the following tests should be performed given only this information?
Paired sample t-test for difference in means
Which of the following is used to identify cyclical variation in time series?
Percentage of Trend
In a linear regression in general, if you were asked to test the overall validity of the model, you would
Perform the F-test
Which is NOT a part of a "generalized procedure" for developing a multiple regression model?
Proceed with model estimation even if the model lacks a theoretical basis.
The 4 P's of marketing
Product, price, place (or distribution), and promotion (proposed by E. Jerome McCarthy in 1960). Variables related to the 4 P's are called marketing mix variables.
You have run a model with 9 independent variables and then through the modeling process decide to exclude 4 of the original independent variables
R^2 will not increase; however, you are unsure what will happen to adjusted R^2 and Se
The equation that describes the relationship between the expected value of the dependent variable & the independent variable.
Regression Equation
If t stat < critical (lower-tailed test, negative critical value)
Reject
Which of the following is (are) not a typical way to exclusively remove random variations from time series data?
Seasonal Adjustment Trend Line Estimation
measuring seasonal effects
Seasonal variation may occur within a year or even within a shorter time interval. To measure the seasonal effects we construct seasonal indexes. Seasonal indexes express the degree to which the seasons differ from one another. There are numerous methods to calculate seasonal indexes. We'll use a computationally easy approach: Calculate a trend line Take the actual/predicted value Average the results by time period
Any variable that is measured over time in sequential order is called a time series. We analyze time series to detect patterns. The patterns help in forecasting future values of the time series.
TRUE
The trend component of a time series can be linear or non-linear. It is easy to isolate the trend component using linear regression. For linear trend use the model y = β0 + β1t + ε. For non-linear trend with one (major) change in slope use the quadratic model y = β0 + β1t + β2t^2 + ε
TRUE
Which of the following are measures of the linear relationship between two variables?
The covariance The coefficient of correlation
Which of the following is NOT a consequence of serious multicollinearity?
The SSE increases in value
Suppose you have a mound shaped, symmetrical distribution & now we add some extreme values on the LOWER tail. Which is true
The distrib wud be skewed to the LEFT
Which of the following is FALSE?
The mode changes when adding an extreme value to the right of the distribution
Which of the following are the characteristics of the population distribution?
The pop. distribution is centered on μ. The pop. distribution has a SD of σ.
Which of the following are the characteristics of the population distribution?
The population distribution is centered on μ. The population distribution has a standard deviation of σ.
Suppose we made an interval estimate for the mean of the population such as: [124.23, 175.34]. If we realize that the true population mean is 121.32, what should we conclude?
The procedure can still be valid, since we allow for a certain amount of error.
The Mean Absolute Deviation (MAD) of a set of data is the average distance between each data value and the mean.
The steps to find the MAD include: 1. find the mean (average) 2. find the difference between each data value and the mean 3. take the absolute value of each difference 4. find the mean (average) of these differences
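The four steps map directly onto a few lines of code. A minimal sketch with an illustrative data set:

```python
def mean_absolute_deviation(data):
    """Average absolute distance between each value and the mean."""
    mean = sum(data) / len(data)                    # step 1
    deviations = [abs(x - mean) for x in data]      # steps 2 and 3
    return sum(deviations) / len(deviations)        # step 4


mad = mean_absolute_deviation([2, 4, 6, 8])
print(mad)  # 2.0 (mean is 5; absolute deviations are 3, 1, 1, 3)
```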
Normalizing the ratios:
The sum of all the ratios must be 4, such that the average ratio per season is equal to 1. If the sum of all the ratios is not 4, we need to normalize (adjust) them proportionately. Suppose the sum of ratios equaled 4.1. Then each ratio will be multiplied by 4/4.1
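The proportional adjustment can be sketched as below, for quarterly data where the indexes should sum to 4. The ratios are made up so that they sum to 4.1, matching the example in the card.

```python
def normalize_seasonal_indexes(ratios):
    """Rescale ratios so they sum to the number of seasons (average of 1)."""
    seasons = len(ratios)
    factor = seasons / sum(ratios)   # e.g. 4 / 4.1 for quarterly data
    return [ratio * factor for ratio in ratios]


ratios = [1.10, 0.95, 1.05, 1.00]    # hypothetical quarterly ratios, sum = 4.1
indexes = normalize_seasonal_indexes(ratios)
print([round(index, 4) for index in indexes], round(sum(indexes), 4))
```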
The president of Tastee Inc, a baby food producer, claims that her company's product is superior to her competitor's because babies gain weight faster with her product. To test this claim, a survey was taken. Mothers were asked which baby food they were going to feed their babies and were asked to keep track over the next 2 months. 15 moms said they would feed their babies Tastee & 12 said the leading competitor. Test whether the Tastee president's claim is valid (do babies raised on Tastee weigh more on avg?). What is the FIRST appropriate test you need to perform in Excel?
Tools > Data Analysis > F-test 2 sample for variances. Then perform t-test assuming equal variances (pooled procedure) to test whether mean of Tastee is higher than competitor.
Suppose you want to know if a training program that your company (an automobile factory) has introduced recently has had any effect on worker mistakes. To test this you decide to gather data from both before and after the training session on the workers that have received the training. The data you gather is monthly data measured in terms of percentage of workers making mistakes during that month. Which of the following tests should be performed given only this information?
Z test for difference in proportions
In a random sample of 200 likely voters, 115 indicated that they believed President Bush was doing a good job as president. If we wanted to test whether more than 50% of likely voters believe he is doing a good job, what test would we use?
Z-test for population proportion
A one unit increase in the independent variable is associated with
a change in the dependent variable equal to b1(slope coeff)
A trend is
a long term relatively smooth pattern or direction, that persists usually for more than one year.
Random variation
comprises the irregular unpredictable changes in the time series. It tends to hide the other (more predictable) components.
One of the econ students decided that his grades are not good enough, but does not know how to improve them. He talked to one of his professors and the professor suggested that it really depends on how much one studies. So, the student collected data on 100 economics students and asked them how much they studied before an exam and what grade they got on that exam. Data is stored here. Run a regression where TIME is the independent variable and MARK is the dependent variable. Which of the following assumptions seems to be violated?
homoskedasticity
The process of using sample statistics to draw conclusions about population parameters is called
inferential statistics
The presence of a second-order term lifts the linearity restriction. If the 2nd order term is the highest order term for the variable,
its curvilinear relationship with the dependent variable is restricted to a parabola
MAE is
the avg of the absolute values of the forecast errors
A regression model places restrictions on the relationships btwn
the dependent variable & each of the independent variables.
When performing an analysis of variance (ANOVA) test, you obtain a test statistic of 0.6. Which of these is FALSE?
the test will tell which pairs of samples have diff means Midterm 1 Fall 2013 282
SSE
total amnt of variation within samples
Suppose you are smoothing random variation in quarterly US GDP data using the exponential smoothing technique and are debating btwn w=.4 & w=.8. What would be the difference btwn the resulting smoothed series s1 and s2?
w1<w2 implies that s1 will be smoother than s2 bcus s1 weighs actual observations less heavily than s2
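The effect of the smoothing constant can be checked with a short sketch. It assumes the usual recursion S_t = w*y_t + (1 - w)*S_(t-1) with S_1 = y_1, which is a common textbook convention rather than something stated in these notes. The series is invented, and "roughness" is a hypothetical helper that sums the absolute period-to-period changes.

```python
def exponential_smoothing(series, w):
    """S_1 = y_1, then S_t = w * y_t + (1 - w) * S_(t-1)."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed

def roughness(series):
    """Total absolute change between consecutive values (smaller = smoother)."""
    return sum(abs(b - a) for a, b in zip(series, series[1:]))

y = [10, 14, 9, 15, 8, 16, 7]           # hypothetical noisy series
s1 = exponential_smoothing(y, 0.4)      # weighs each new observation less
s2 = exponential_smoothing(y, 0.8)      # weighs each new observation more
print(roughness(s1) < roughness(s2))    # True
```

With w = 0.4 each new observation moves the smoothed value less, so s1 swings less than s2.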
A way to test difference in means between more than two groups is by using ANOVA. An equivalent test can be performed by use of regression analysis by using dummy variables for the different groups. You are asked to test whether the average temperature between four beach destinations is different (Cancun, Punta Cana, Bora Bora and Ibiza). Create the dummy variables omitting Cancun and regress Temperature on the dummies using this data. [link: beach.xls] What is the p-value of the test of overall significance of the model? Select one: a. 0.0137 b. 0.4260 c. 0.0252 d. 0.0274 e. 0.3443
A
Random samples from two binomial populations yielded the following statistics: The first sample proportion is 0.45, from a sample size of 100. The second sample proportion is 0.39 from a sample size of 100. Perform a two-sample z-test for proportions to test if the first population proportion is different than the second population proportion.
1. The value of the test statistic is: 0.859602383 2. The p value is: 0.390008262 3. At the 10% significance level we would fail to reject the null hypothesis and conclude there is insufficient evidence to claim that the population proportions differ. 4. For a 90% CI for the difference in population proportions, the lower and upper bounds are -0.054598028 and 0.174598028 respectively.
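The numbers in this card can be reproduced with the standard library alone. The sketch below assumes the usual approach: a pooled proportion for the test statistic (since H0 says the proportions are equal) and unpooled standard errors for the confidence interval. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test of H0: p1 = p2, using the pooled proportion."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

def diff_confidence_interval(p1, n1, p2, n2, level=0.90):
    """CI for p1 - p2, using the unpooled standard error."""
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se

z, p = two_proportion_z_test(0.45, 100, 0.39, 100)
low, high = diff_confidence_interval(0.45, 100, 0.39, 100, level=0.90)
print(z, p, low, high)
```

Because the p-value (about 0.39) exceeds 0.10 and the interval contains 0, both views lead to the same "fail to reject" decision.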
All Canadians have government-funded health insurance, which pays for any medical care they require. However, when traveling out of the country, Canadians usually require supplementary health insurance to cover the difference between the costs incurred for emergency treatment and what the government program pays. To cover for the difference, private insurance companies charge flat-rate weekly rates, regardless of age. However, they realized that older people frequently incur greater medical emergency expenses and decided to offer rates based on the age of the customer. To help determine the new rates, one insurance company gathered data covering the age and mean daily medical expenses of a random sample of 1348 Canadians.
1. Even before you analyze the residuals you know that this is cross sectional data and therefore, it cannot suffer from autocorrelation. 2. Non-normality of the error term is a problem in this distribution? true 3. Variance of the error term in this regression seems to be homoskedastic? false
A research project employing 22,000 American physicians was conducted to discover whether aspirin can prevent heart attacks. Half of the participants in the research took aspirin, and half took a placebo. In a three year period, 104 of those who took aspirin and 189 of those who took placebo had had heart attacks. Is aspirin effective in preventing heart attacks?
1. Our best estimate of the probability of having a heart attack for those that do not take aspirin is: 0.017181818 2. The value of the pooled proportion is: 0.013318182 3. The value of the test statistic is: -4.99915443 4. The p value is: 2.87912E-07 5. At the 5% significance level we would reject the null hypothesis and conclude that aspirin can prevent heart attacks. 6. For a 95% CI for the difference in population proportions, the lower and upper bounds are -0.010755099 and -0.004699446 respectively.
The President of Tastee Inc, a baby-food producer, claims that his company's product is superior to that of his leading competitor, because babies gain weight faster with his product. To test this claim, a survey was undertaken. Mothers of newborn babies were asked which baby food they intended to feed their babies. Those who responded Tastee or the leading competitor were asked to keep track of the babies' weight gains over the next two months. There were 15 mothers who indicated that they would feed their babies Tastee and 25 who responded they would feed their babies the product of the leading competitor.
1. The (first) appropriate test that he would have to perform by using Excel is Tools --> Data Analysis --> F-Test Two-Sample for Variances 2. What's the p value for this test? 0.304982807 3. At the 10% significance level, the decision would be to fail to reject the null hypothesis and proceed with the t-test for the difference in means assuming equal variances 4. p value for the t test? 2.22655E-05 5. At the 5% level of significance, you would fail to reject the null hypothesis, concluding that there is not enough evidence to claim that babies using Tastee gain more weight than babies using the product of the leading competitor? false 6. For a 95% CI for the difference in the mean weight of the two products, the lower and upper confidence limits are 3.05324106 and 8.093425607 respectively.
An investor who wants to invest money for his retirement plan is trying to determine whether mutual fund "Safe Squared" is safer (less risky) than mutual fund "You Bet". His judgement is to be based on the weekly return data for the two funds which is available here.
1. The appropriate test that he would have to perform by using excel is Tools --> Data Analysis --> F-Test Two-Sample for Variances 2. Can he conclude at 5% significance level that "Safe Squared" is safer than "You Bet"? yes
A regional fast food chain wants to ensure that their customers do not eat meat carrying E. coli bacteria. The preventive measure is to cook the meat at the required temp. Because of varying patty size & burner temperatures, meats cooked for the same length of time can have diff final internal temperatures. The health department is recommending that the newer, digitally controlled burners reduce the variation in the final internal temp. There are 2 models of digitally controlled burners to choose from. The restaurant chain will choose the model with the most consistent final internal temperature of meats cooked. They sample 11 batches of meat cooked by burner model 1, & 13 batches of meat cooked by burner model 2. They found s1² = 6.7 & s2² = 2.5.
1. The appropriate test that the restaurant chain owner would have to perform is the: F-test for the ratio of variances. 2. What's the value of the appropriate test statistic for this test? 2.68 3. What's the p value for this test? 0.109043655 4. Can the restaurant chain owner conclude at the 5% significance level that variances are unequal? no 5. For a 95% confidence interval, the lower and upper confidence limits are 0.794414707 and 9.704133895 respectively.
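The test statistic in this card is just the ratio of the two sample variances, with n - 1 degrees of freedom for each sample. A small sketch follows (the function name is hypothetical). The p-value and confidence limits need the F distribution, which this sketch does not compute.

```python
def variance_ratio_f(s1_sq, n1, s2_sq, n2):
    """F statistic for H0: equal variances, with numerator and denominator df."""
    f_stat = s1_sq / s2_sq
    return f_stat, n1 - 1, n2 - 1

f_stat, df_num, df_den = variance_ratio_f(6.7, 11, 2.5, 13)
print(round(f_stat, 2), df_num, df_den)  # 2.68 10 12
```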
A market research group is asked to conduct a survey on people's spending habits on food when they are away from home. They took a sample of 100 households and collected data on their average spending on food away from home (FAH), their average family income (in thousands of dollars per month, Income), the total number of hours the working members of the family work away from home (Hours), the size of the household (Size), whether the household is in a Rural Area (Rural, 1 if rural, 0 otherwise), whether the head of the household is African American (Black, 1 if African American, 0 otherwise) and finally a variable which is 1 if the head of the household has some college education and 0 otherwise (College).
1. The coefficient of determination for the regression of FAH on all the other variables is 0.243451597. 2. The adjusted coefficient of determination is 0.194642022. 3. Assuming that all the independent variables in the full model are significant, people from rural households, while away from home, spend on average more on food than people from the non-rural households. 4. The F-test statistic for testing the significance of the subset variables is 0.837335017. 5. The p-value of the test statistic for testing the significance of the subset variables is 0.504898014. 6. What is your conclusion: fail to reject the null hypothesis and proceed with using the reduced model 7. According to the final model, people from rural households, while away from home, spend on average the same amount on food as people from the non-rural households.
In a multiple regression model with p independent variables (y=Bo+B1x1+...+e) u have these 4 assumptions:
1. The error term is a random variable with a mean of zero (E(e) = 0) for all values of the independent variables x. 2. The variance of e, denoted by o^2, is the same for all values of the independent variables x1, x2, ..., xp. 3. The values of e are independent. 4. The error term e is a normally distributed random variable. The residuals plotted against the predicted value of y (y^) can be used to validate the assumptions of the multiple regression model.
It is believed that work experience should have a positive impact on the salary received by an employee. Given here is data on 14 randomly chosen employees from a firm. You are asked to estimate the impact of experience (in years) on salary (in thousands of dollars per month) and answer the following questions based on your EXCEL results.
1. The estimated slope coefficient of the regression of salary on experience is 1.613399136 2. The interpretation for the estimated slope coefficient is the estimated average salary of an employee will increase by 1.6134 thousand dollars per month with each additional year of experience. 3. Estimate the salary (in thousands of dollars) of a person with 5 years of experience (note that here you have to plug 5 into the regression equation that you have created): 0.624073171
It is believed that work experience should have a positive impact on the salary received by an employee. Given here is data on 14 randomly chosen employees from a firm. You are asked to analyse the impact of experience (in years) on salary (in thousands of dollars per month) and answer the following questions based on your EXCEL results.
1. The estimated slope coefficient of the regression of salary on experience is 1.613399136 2. The interpretation for the estimated slope coefficient is the estimated average salary of an employee will increase by 1.6134 thousand dollars per month with each additional year of experience. 3. The residual degrees of freedom are: 12 4. Estimate the salary (in thousands of dollars) of a person with 5 years of experience. 0.624073171 5. Calculate the 95% prediction interval for your estimated salary above, for a person who has exactly 5 years of experience. The upper bound of this prediction interval is: 11.82518033 6. Now calculate the 95% confidence interval for your estimated salary, for a group of people who on average have 5 years of experience. The upper bound of this confidence interval is: 3.767366956 Which one of the following is NOT an assumption or required condition in simple regression: the variance of the error term should increase with x
You want to know how the weather affects ticket sales at a ski resort. You have data on tickets sold and total snowfall and average temperature over Christmas week for 20 consecutive years. Run a multiple regression of ticket sales on snowfall and temperature and answer the following questions.
1. The plot of residuals against time suggests that there may be positive autocorrelation because the residuals show persistently negative and then persistently positive values. 2. The calculated value of the DW statistic is 0.593140311 3. From the Durbin-Watson statistic in Excel and given the lower critical value (dL) and upper critical value (dU) of 1.10 and 1.54, we can say that there is significant positive autocorrelation.
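The decision rule used here (and in the other Durbin-Watson cards) can be sketched as follows. The statistic is the sum of squared changes in consecutive residuals divided by the sum of squared residuals. The function names and the return strings are hypothetical, and only the test for positive autocorrelation is covered.

```python
def durbin_watson(residuals):
    """d = sum of (e_t - e_(t-1))^2 divided by sum of e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def dw_positive_autocorrelation_test(d, d_lower, d_upper):
    """Compare d with the tabulated critical values dL and dU."""
    if d < d_lower:
        return "positive autocorrelation"
    if d > d_upper:
        return "no evidence of positive autocorrelation"
    return "inconclusive"

# DW value and critical values quoted in the card above
print(dw_positive_autocorrelation_test(0.593140311, 1.10, 1.54))
```

Since 0.593 is below dL = 1.10, the rule reports positive autocorrelation, matching the card.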
The assumptions about the error term are:
1. The random errors have a mean of zero 2. They have the same variance 3. They are independent 4. They are normally distributed
They want you to do a test on the correlation coefficient (that is, a test to see if any linear relationship exists between two variables) between student population size and pizza sales, based on the Armani Pizza example. The data is provided here. Answer the following questions based on your findings.
1. The sample correlation coefficient between sales in thousands of dollars and the number of students in thousands is given by 0.963366334. If sales was measured in millions of dollars instead of thousands of dollars then the correlation coefficient would stay the same 2. Suppose you would like to test whether the population correlation coefficient between number of students and sales is 0. The value of the test statistic for testing this hypothesis is 10.16004421 and the p-value is 7.53839E-06. This test statistic and p-value are the same as the test statistic and p-value for the test of the regression slope coefficient. You can say that: a linear relationship does exist between pizza sales and the size of student population. 3. Now do the F-test for overall validity of the model (slide 30 on page 72). What is the value of the test statistic? 103.2264983 and the corresponding p-value is 7.53839E-06. This p-value is the same as the p-value for the individual t-test of the regression slope coefficient.
Suppose that the distribution of daily temp for Champaign is symmetric & mound shaped w a mean of 62 & SD of 30. What proportion of days were above freezing (32)?
84% Midterm 1 Fall 2013 282
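The 84% comes from the empirical rule: 32 is exactly one SD below the mean of 62, about 68% of values lie within one SD, so about 16% lie in each tail and 84% lie above 32. As a rough check, the sketch below treats the distribution as exactly normal, which is an assumption stronger than "symmetric & mound shaped".

```python
from statistics import NormalDist

temps = NormalDist(mu=62, sigma=30)        # assumed exactly normal for this check
share_above_freezing = 1 - temps.cdf(32)   # P(temp > 32), one SD below the mean
print(round(share_above_freezing, 2))      # 0.84
```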
Do a test on the correlation coefficient (that is, a test to see if any linear relationship exists between two variables) between the odometer mileage and used car prices, based on the Car Sale Price example. Partial results are provided here. Answer the following questions based on your findings.
1. The sample correlation coefficient between used car sale price and the odometer mileage is given by -0.806307604. 2. Suppose you would like to test whether the population correlation coefficient between odometer mileage and used car price is 0. The value of the test statistic for testing this hypothesis is -13.49465085 and the p-value is 4.44346E-24. 3. Now do the F-test for overall validity of the model (slide 30 on page 72). What is the value of the test statistic? 182.1056015. The numerator and denominator degrees of freedom for this F-test are respectively 1 and 98. The corresponding p-value is 4.44346E-24.
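The test statistic in item 2 follows from the correlation coefficient alone via t = r * sqrt((n - 2) / (1 - r^2)), and in simple regression the overall F statistic is the square of that t. A short sketch, assuming n = 100 (implied by the 98 denominator degrees of freedom); the function name is hypothetical.

```python
from math import sqrt

def correlation_t_stat(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 df."""
    return r * sqrt((n - 2) / (1 - r ** 2))

t = correlation_t_stat(-0.806307604, 100)
f = t ** 2   # overall F statistic in simple linear regression
print(t, f)
```

This is why the t-test on the slope, the test on the correlation, and the overall F-test all share the same p-value.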
It is believed that higher vacancy rates of office space reduce the rental rate of offices. An economist has run a simple linear regression where vacancy was the independent variable and rental rate was the dependent variable. Based on the following partial excel output, answer the following questions:
1. The sample used to generate the above excel output has 30 observations. 2. The coefficient of determination is 0.291145846 . 3. The value of multiple R is 0.539579323 and the sample correlation coefficient between vacancy and rent is -0.539579323 . 4. The sample estimate of the standard deviation of the population error term is 2.873201487 . In the ANOVA table, the value of SSE is 231.1480299 , while the value of MSE is 8.255286784
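The values in this card are linked by a few identities from the regression output: multiple R is the square root of R^2, MSE is the square of the standard error of the estimate, and SSE = MSE * (n - 2) in simple regression. A quick check using the numbers quoted above:

```python
from math import sqrt

n = 30                      # observations
r_squared = 0.291145846     # coefficient of determination
std_error = 2.873201487     # standard error of the estimate

multiple_r = sqrt(r_squared)   # always reported as non-negative
mse = std_error ** 2           # mean squared error
sse = mse * (n - 2)            # residual df = n - 2 in simple regression
print(multiple_r, mse, sse)
```

The sample correlation has the same magnitude as multiple R but takes the sign of the slope, which is negative here because higher vacancy goes with lower rent.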
You were asked to investigate the effect of average temperature of a particular day on profits from icecream sales in a Midwestern town. You are given the data on 15 daily average temperatures and thousands of dollars in icecream profits in the town. Run a simple linear regression of profits from icecream sales on the average daily temperature and answer the following questions.
1. The scatter plot of profit from icecream sales and temperature shows a weak positive relationship. 2. The calculated value of the Durbin-Watson statistic is 0.954888834 3. From the Durbin-Watson statistic in Excel and given the lower critical value (dL) and upper critical value (dU) of 1.08 and 1.36, we can say that there is significant positive autocorrelation.
Based on this data on the experience (in years) and salary (in thousands of dollars) of 51 employees of a firm, answer the following questions using your results of the sample linear regression of salary (dependent variable) on experience (independent variable) and the corresponding residual plot.
1. The scatterplot of salary on experience suggests that there is a positive curvilinear relationship between salary and experience. 2. Which one of the following seems to be an accurate representation of the residual plot. A U-shaped band. 3. The residual plots of regressing experience (dependent variable) on salary (independent variable) will be the opposite of the residual plot from a regression of salary on experience and exhibit an inverted-U shape band. 4. The value of R² you obtain from this regression is 0.784507379 5. Now add an additional independent variable, called "experience2". Create this variable by squaring all the values of experience and entering them into a new column. Now run a multiple regression with experience & experience2 as your independent variables, keeping salary as your dependent variable. The value for R² is 0.963849893
Recall the used car price example. Suppose that you have run a regression of car price (the dependent variable) on odometer mileage (the independent variable) and partial results are stored here.
1. The value of R² is 0.650131952. 2. The value of the standard error of the estimate (Se) is 303.1375029. 3. If you wanted to test if the slope is non-zero in this regression, the t-test statistic for the slope would be -13.49465085. 4. The p-value for the above test is 4.44346E-24. 5. After looking at the p-value corresponding to the above test statistic, your conclusion is that the slope is different than zero, the model is useful.
A car dealer wants to find the relationship between the odometer reading and the selling price of used cars. A random sample of 100 cars is selected, and the data recorded. The necessary data is here. Find the regression line.
1. The value of x̄ (mean odometer reading) is 36009.45. 2. The value of ȳ (mean selling price) is 14822.82. 3. The value of sx² is 43528689.66. 4. The value of cov(x,y) is (Note that what Excel gives you in a covariance matrix is population covariance and here you need sample covariance. When you get covariance from the Excel matrix, multiply it by n and divide by n-1 to get sample covariance.): -2712511.08. Then, 5. The value of b1 is -0.062315477. 6. Then the value of b0 is 17066.76607.
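The slope and intercept follow from the summary statistics via b1 = cov(x,y) / sx² and b0 = ȳ - b1 * x̄. The sketch below reads the first two values in the card as the sample means of odometer reading (x) and selling price (y). That reading is an inference, but it is consistent with the b0 quoted in item 6.

```python
x_bar = 36009.45         # mean odometer reading (assumed to be item 1)
y_bar = 14822.82         # mean selling price (assumed to be item 2)
var_x = 43528689.66      # sample variance of x
cov_xy = -2712511.08     # sample covariance of x and y

b1 = cov_xy / var_x          # slope
b0 = y_bar - b1 * x_bar      # intercept
print(b1, b0)
```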
Armani Pizza. You are given the following partial output.
1. The value of the standard error of the slope (Sb1) is 1.072263792 2. The value of the test statistic is: 10.16004421
The following sample statistics are given to you: The first sample proportion is 0.12, from a sample size of 400. The second sample proportion is 0.16 from a sample size of 400. Perform a two-sample z-test for proportions to test if the first population proportion is smaller than the second population proportion.
1. The value of the test statistic is: -1.630278292 2. The p value is: 0.051521347 3. At the 5% significance level we would fail to reject the null hypothesis and conclude there is insufficient evidence to claim that the first population proportion is smaller than the second population proportion. 4. For a 90% CI for the difference in population proportions, the lower and upper bounds are -0.080290521 and 0.000290521 respectively.
Assume that the distribution of final scores of golf players is normal w a SD of 3 & a median of 78, par=72. What % of players score higher than the par?
97.5 Midterm 1 Spring 2014 266
The following sample statistics are given to you: The first sample proportion is 0.368, from a sample size of 1000. The second sample proportion is 0.275 from a sample size of 1000. Perform a two-sample z-test for proportions to test if the first population proportion is greater than the second population proportion by more than 5%.
1. The value of the test statistic is: 2.068955805 2. The p value is: 0.019275117 3. At the 5% significance level we would reject the null hypothesis and conclude that the first population proportion is greater than the second population proportion by more than 5%. 4. For a 95% CI for the difference in population proportions, the lower and upper bounds are 0.052265224 and 0.133734776 respectively.
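When the null hypothesis specifies a nonzero difference (here 0.05), the proportions are not pooled, because H0 no longer says they are equal. A minimal sketch that reproduces the test statistic and one-sided p-value in this card (the function name is hypothetical):

```python
from math import sqrt
from statistics import NormalDist

def z_test_diff_exceeds(p1, n1, p2, n2, d0):
    """Upper-tail z-test of H0: p1 - p2 = d0 against H1: p1 - p2 > d0."""
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # unpooled standard error
    z = (p1 - p2 - d0) / se
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

z, p = z_test_diff_exceeds(0.368, 1000, 0.275, 1000, 0.05)
print(z, p)
```

The same unpooled standard error is what the 95% confidence interval in item 4 is built on.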
We are interested in understanding what factors impact box office ticket sales for movies. We postulate that the amount of money spent on production of the film is an important determinant. We have collected data on the amount of money spent on a movie and the box office returns made by the movie for 27 movies, both quantities are in millions of dollars. Answer the following questions based on the summary statistics given below:
1. What is the dependent variable in this case? the amount of box-office ticket sales dollars 2. Then the algebraic form of the equation is given by: = 12.82339491 + 1.139980482 x. Assume that in the above equation, the correct value for the intercept term was 5.5, and the correct value for the slope was .43. What would you expect the returns at the box office to be for a remake of "Varsity Blues", if 40 million dollars was spent on its production? 22.7
Now suppose that Mr. Corleone is going through a transitional period and would only like to severely punish the dealer if the dealer is cheating by a very substantial amount. If the winning percentages at the low stakes tables is greater than the winning percentages at the high stakes tables by more than 5%, then the dealer will be punished, otherwise he will just be fired.
1. What is the value of the test statistic for testing if the winning percentages at the low stakes tables is greater than the winning percentages at the high stakes tables by more than 5%? 1.870552005 2. Using a 5% significance level, what is the decision and conclusion? Reject the null and conclude that the difference in % wins is greater than 5%.
Michael Corleone believes that one of the new blackjack dealers he has hired in one of the family's Las Vegas casinos may be cheating when he deals at one of the high stakes (expensive) tables. Michael sends Pete Clemenza and Tom Hagen to record the number of times the dealer wins and loses at each of two different tables (ties are not included). One table has a $10 minimum bet (low stakes) and the other has a $1000 minimum bet (high stakes).
1. What type of test could you use to test Michael Corleone's belief? Z-test for difference in proportions 2. How many wins are observed on the $1000 minimum bet table? 39 3. What is the proportion of wins at the $10 minimum bet table? 0.57 4. What is the value of the test statistic to test whether the % of wins at the low stakes table is different than the % of wins at the high stakes table? 2.547623327 5. Using a 5% chance of committing a Type-I error, what is our decision and conclusion of the test? Reject the null and conclude that the dealer is cheating.
It has been a popularly held belief that surfing the web actually reduces your productivity at the workplace. You are appointed by the manager of Hole-in-One, a Swiss cheese distributing company, as a consultant to find out the statistical effect of average hours spent by the employees surfing the web on the productivity measured in amount of sales per week (in thousands of dollars). You were given 52 weeks of data. You ran a simple regression of sales (in thousands of dollars) on number of hours spent on the web per week (download the partial results from here). Answer the following questions based on your findings.
1. The sample correlation coefficient between sales in thousands of dollars and the numbers of hours spent surfing the web is given by 0.739070231. If sales was measured in millions of dollars instead of thousands of dollars then the correlation coefficient would stay the same 2. Suppose you would like to test whether the correlation coefficient between hours spent surfing the web and sales is 0 or not equal to 0. The value of the test statistic for testing this hypothesis is 7.758010017 and the p-value is 3.9439E-10. This test statistic and p-value are the same as the test statistic and p-value for the test of the regression slope coefficient. 3. From the information available from the regression output, the best point estimate for the standard deviation of the error term is 3.724559111. Also, the standard error of the slope coefficient is given by 0.107124209.
As an intern at the EPA for the summer, you are tasked with understanding what factors are most important in determining miles per gallon in the automobile industry. Your analysis will be used to help develop effective laws designed to raise average fuel efficiency. One factor you believe negatively impacts fuel efficiency is vehicle weight. You gather data and graph the relationship between these two variables. You are interested in developing a regression model which quantifies the relationship between your dependent variable (MPG) and the independent variable (vehicle weight). The data is stored here.
1. The scatterplot of MPG vs. Weight suggests that there is a positive curvilinear relationship between MPG and Weight. 2. Run a regression with MPG as your dependent variable and Weight as your independent variable. The standard error of the estimate is 2.531312933, while the p-value for the overall F-test is 6.39401E-11, indicating that at the 5% level of significance, the linear model is valid. 3. The adjusted R² is 0.721254632, while the p-value for the overall F-test is 1.08566E-11, indicating that at the 5% level of significance, the quadratic model is valid. Looking at the two independent variables, you can say that at the 10% level of significance, both Weight and Weight² are significant variable(s). 4. The adjusted R² is 0.731999118, while the p-value for the overall F-test is 2.69522E-11, indicating that at the 5% level of significance, the cubic model is valid. The value of adjusted R² has gone up in going from the quadratic model to the cubic model. 5. After examining the p-values for the highest order term in the quadratic and the cubic models, you would choose the quadratic model as your final model. 6. Using the final model, what would be the estimated MPG for a car that weighs 4000 pounds? 17.43884417 7. While a car that weighs 10,000 pounds would have an estimated MPG of? 48.04632291. Why would you not want to estimate MPG for a car that weighs 10000 pounds? There is no data in that range, so estimation should not take place for those large values.
line with the smoothest appearance: pink diamond symbol line must be
5 month moving avg
Right now the State of Illinois is debating whether casinos should be opened in the Chicago area. Proponents of casinos are saying that Chicago is not attractive to tourists because there are no casinos in the city. The opponents of casinos, on the other hand, claim that casinos negatively impact poor people in the community. You are hired by the State to investigate the issue and you collect data on 100 random people who live near casinos in other states. You ask them on how many times they went to a casino in the past year, the years of education (EDUCATION), their age (AGE), how many children they have (CHILDREN), and their income (INCOME). The adjusted R2 is? a. 0.410 b. 0.433 c. 0.567 d. 0.590 e. 2.190
A
Suppose you want to test whether more than 60% of UIUC students weigh more than 175 pounds. What would your null and alternative hypothesis look like? a. H0: p≤ 0.6; H1: p> 0.6 b. H0:μ≤60;H1:μ>60 c. H0: μ≥ 175; H1: μ< 175 d. H0: μ≤ 175; H1: μ> 175 e. H0: p≤ 1.75; H1: p> 1.75
A
What would happen to the standard error of the estimate and coefficient of determination if you introduce a second independent variable that is very important to explain profits? Select one: a. The standard error decreases and coefficient of determination increases b. They both decrease c. They both increase d. The standard error increases and coefficient of determination decreases e. They both remain unchanged
A For the standard error of the estimate, SSE would decrease more than the denominator would decrease because we add a variable. For the coefficient of determination, SSR would increase and SST remains unchanged.
Using data collected from local grocery stores, you would like to examine factors affecting the quantity of bread purchased by families. The data consist of the quantity of bread (QBread), family size (Size), family income (Income), family years of education (Education), the price of bread (PBread) and the price of meat (PMeat). The data is available here. [link: bread.xls] Regress QBread on Size, Income, Education, PBread and PMeat. Which of the following problems will never be encountered in this context? Select one: a. Autocorrelation b. Non-normality c. Heteroskedasticity d. Extreme outliers e. All are possible in this context
A Since the data is cross-section, there is no time component. For this reason, it is impossible to encounter autocorrelation in the data.
Perform a Durbin-Watson test. Based on critical values of dL=1.1250 and dU=1.7340, which of the following is true? Select one: a. The error term has positive autocorrelation, a trend term must be included b. There is non-normality, a trend term must be included c. There is no autocorrelation, no changes are necessary d. The error term has negative autocorrelation, a trend term must be included e. There is no autocorrelation, but a trend term must be included just in case
A The Durbin-Watson test statistic is 0.0925. Since DW< dL we conclude that there is positive autocorrelation and we must include a trend term.
Suppose that as an investor in the stock market, you're looking for a stock that's going to move in tandem with the overall market. You compute the returns for a stock you're looking at, and run a regression against a broad market index, obtaining the following regression output: What is the test statistic corresponding to the claim that this stock moves perfectly with the market? a. -0.571 b. 0.967 c. 1.009 d. 47.218 e. 2229.5 What is the F-statistic for the overall validity of the model? a. -0.571 b. 0.967 c. 1.009 d. 47.218 e. 2229.5
A E
It is believed that women's weight after pregnancy is less than before pregnancy. Using a sample where each of a randomly selected group of women is weighed before & after pregnancy, what shud u use to test this claim?
A paired sample t-test for difference in means (testing 1 pop for the difference btwn means under 2 diff conditions)
In an agricultural experiment, two expensive high-yield varieties of corn are to be tested and yield improvements are to be measured. The experiment is arranged so that each variety is planted in ten pairs of similar plots. Data is collected on yield increases obtained for these two varieties.
A paired sample t-test should be used since we have dependent samples.
In an agricultural experiment, 2 expensive high-yield varieties of corn are to be tested and improvements are to be measured to determine whether there's a difference in yield increase btwn them. Each variety is planted in 10 pairs of similar plots and data on yield increases is given for these 2 varieties. Which test should be used?
A paired sample t-test, because the samples are dependent.
A large discount chain compares performance of its credit managers in Ohio and Illinois by comparing the mean dollar amounts. 2 independent random samples are selected. 2 means and SD's are given. Assume unequal variances. The test stat for testing this claim will be:
A t distribution
A maker of high-fiber cereal is looking for additional material to market its product. The company interviewed 150 ppl, identified each person as a consumer or nonconsumer, and recorded the # of calories each person consumed. Assume the 2 pops have unequal variances. The test stat for testing this claim will be:
A t distribution
A political scientist is interested in comparing characteristics of students who don't and do vote in elections. 2 averages & 2 sample SD's are given and pop variances are equal. The test stat for testing this claim will be:
A t distribution
A publisher is interested in the effect on sales of producing college texts w a large # of data files. Publisher produces 20 texts and randomly selects 10 not to have a limit. The remaining 10 are to be produced w no more than 100 data files each. 2 averages & 2 sample SD's are given. 2 pops are normally distributed w the same variance and pop variances are equal. The test stat for testing this claim will be:
A t distribution
Furniture World, a manuf. of office furniture, is coming out w a new computer desk.. She has gathered samples of 25 for each design type. You can assume production time variances are equal. The test stat for testing this claim will be:
A t distribution
To determine whether a new model of steel-belted radial tire lasts longer than an existing model, Firestone installed 1 tire of each type on the rear wheels of 20 randomly selected cars. The # of miles driven by each driver is given. The test stat for testing this claim will be:
A t distribution
As a new intern at Yellow Pages, ur job is to call retailers & encourage them to advertise w/ the company in the future. You obtain a random sample of 45 this year, but not last year. U are interested in comparing their annual sales figures to see whether there is a clear increase in sales this yr vs. last year. Which test should be used?
A t test
Suppose u are the owner of a chain of pizza restaurants. U believe that the manager at the Champaign branch is doing poorly. U gather monthly data on profits for the Champaign branch. Which test wud be the most appropriate to use to test whether the avg monthly profit for Champaign is less than $20,000?
A t-test for the mean (testing 1 pop mean against a constant & don't have info on pop. variance)
Your company has recently introduced a training program to improve worker productivity to gain an edge over competitors. U decide to gather data on workers from ur plant. The data u gather are monthly measurements of avg cars per day per worker. Is ur company now more productive than the main competitor (w/ an avg productivity of 2.3 cars)? Which test shud be performed, given only this info?
A t-test for the mean (testing 1 pop mean against a constant & don't have info on pop. variance)
January Effect
A tendency for prices to increase (especially for small-capitalization stocks) in the first few weeks of January. One explanation is that investors sell poor-performing stocks at year-end to reduce their capital gains taxes, thus decreasing the prices. This prompts investors to buy undervalued stocks during the 1st few weeks of January, causing their prices to go up. *the most well known effect
A survey of 430 randomly chosen adults found that 58/222 men & 38/208 women purchased books online. Test whether men are more likely than women to buy books online.
A two sample z-test (2 pop difference in proportions)
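As a rough sketch of how this test works on the survey numbers above (58 of 222 men, 38 of 208 women), the statistic can be computed with the pooled proportion. The function names are illustrative only, and pooling assumes the null hypothesis of no difference between the two population proportions.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0, using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def upper_tail_p(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * (1 - math.erf(z / math.sqrt(2)))


z = two_proportion_z(58, 222, 38, 208)   # men vs. women
p_value = upper_tail_p(z)                # one-sided: men more likely than women
print(round(z, 3), round(p_value, 4))
```

The one-sided p-value is used because the claim is directional (men more likely than women).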
In October 2000, the US Department of Commerce reported the results of a large-scale survey on high school graduation. Researchers contacted more than 25,000 americans aged 24. 10,987 of the 12,460 men & 11,317 of the 12,678 women said they had diplomas. Which test should be used?
A two sample z-test (2 pop difference in proportions)
The following sample statistics are given to u: The 1st sample proportion is .251 from a sample of 1000. The 2nd sample proportion is .199 from a sample of 1000. Test whether the 1st pop proportion is greater than the 2nd pop proportion by more than 5%.
A two sample z-test (2 pop difference in proportions)
A research project employing 22,000 American physicians was conducted to discover whether aspirin can prevent heart attacks. Half took aspirin, half took placebo. In a 3 yr period, 121 of those who took aspirin & 115 of those who took placebo had heart attacks. Which test should be used?
A z test (2 pop difference in proportions)
Suppose u are the manager of a chain of pizza restaurants. U wud like to know whether the restaurant in Champaign is more or less profitable than the restaurant in Urbana. U gather daily profit data from both stores for a 6 month period March-August. Suppose the pop variance is known. Which test shud u use?
A z-test for the difference in means (comparing 2 pop means & know pop. variance)
The Internal Revenue Service (IRS) wud like to reduce the amnt of time it takes for the public to fill out their tax forms. IRS researchers produce a new form & also have the old standard form. For each form, they have a sample of 50 ppl fill it out & they record the amnt of time each persn takes to complete it. If they want to determine whether the new form is faster to fill out & the pop variance of both is known, wat test can they use?
A z-test for the difference in means (comparing the means of 2 diff pops & know the pop variance)
In a random sample of 200 likely voters, individuals answered "How wud u rate President Obama's job on the economy, where 1 is bad & 10 is excellent?" The avg=6.78 & SD=3.2. If you wanted to test whether the avg score were higher than 7, knowing that pop variance is 9, what test wud u use?
A z-test for the mean (question is about mean (rating) of 1 pop (likely voters) & u know the SD)
You decided to predict gasoline prices in different cities and towns in the United States for your modelling project. Your dependent variable is price of gasoline per gallon and your independent variables are per capita income (in dollars), number of firms manufacturing parts of automobiles in and around the city, number of new businesses started over the last year, population density of the city (in 100's of persons per sq. mile), percentage of local taxes on gasoline (in %), and the number of people using public transportation per 100 people. You collected a sample of 20 cities and obtained an SSR= 121.6308 along with Se2 = 4.4775 . Which of the following statements is (are) true?
Adjusted R2=0.5270 Suppose Se2 was 0, then adjusted R 2 would be 1. Suppose Se2 was 0, then R 2 would be 1. If number of new businesses started and population density are removed from the regression model, R2 can never increase.
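The adjusted R2 figure can be checked from the numbers in the question. A small sketch, assuming n = 20 cities and k = 6 independent variables as listed, and using Se2 = SSE/(n-k-1):

```python
n, k = 20, 6            # 20 cities, 6 independent variables
ssr = 121.6308          # explained variation (given)
se2 = 4.4775            # Se^2 = SSE / (n - k - 1) (given)

sse = se2 * (n - k - 1)             # unexplained variation
sst = ssr + sse                     # total variation
r2 = ssr / sst
adj_r2 = 1 - se2 / (sst / (n - 1))  # 1 - [SSE/(n-k-1)] / [SST/(n-1)]

print(round(r2, 4), round(adj_r2, 4))
```

If Se2 were 0, then SSE would be 0, so both R2 and adjusted R2 would equal 1, which matches the second and third statements.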
An automotive part must be machined to close tolerances to be acceptable to customers. The part maker is deciding btwn 2 diff machines to replace its old one. The company will buy the machine w/ the lower variance in length of the parts. Which test shud be used?
An F-test for the difference in variances (Comparing variances btwn 2 pops)
Random samples of 60 young adult men were taken in the US & in England. Each man was asked how many minutes of sports he watched daily. If u wanted to test whether the dispersion in minutes watched is different btwn both countries, assuming u only have this info, what type of test wud u use?
An F-test for the difference in variances (Dispersions= measured by variances)
Suppose u want to know whether a training program that ur company (car manufacturer) has introduced in one of the plants has had any effect on the dispersion of the # of worker mistakes. U gather data from 1 plant that received training & 1 that did not. The data are monthly reports of the total # of mistakes made by workers during that month. Which test shud be performed?
An F-test for the difference in variances (Dispersions= measured by variances)
Adjusted R^2 is the preferred measure of goodness of fit, compared to R^2, in multiple regression bcus:
An additional variable must substantially contribute to explaining SST for adjusted R^2 to go up
In which of the following can multicollinearity result?
An increased p-value for the individual t-tests A decrease in the absolute value of the individual t-test test statistics An inability to interpret the estimates of the slope coefficients An increased standard error of the individual slope coefficients
A study is undertaken to probe the varying levels of police brutality in the US in 1980 compared to 2000. A random sample of 200 US residents in 1980 & 350 in 2000 were interviewed. They were asked: "On a scale from 1-10, how well do police treat u?" U fail to reject the null hypothesis for the F-test. Which test shud u use?
An independent samples t-test for difference in means (equal variances) (Equal variances bcus u fail to reject null)
Suppose u are the owner of a chain of pizza restaurants. U want to expand operations to whichever of the cities of Decatur & Springfield has the highest avg income per capita. U collect data on a random sample of residents from both cities, asking wat their yearly income is. The p-value for the F-test=.231. Which test wud u use?
An independent samples t-test for difference in means (equal variances) Equal variances bcus p-value (.231) > .05 (Fail to reject null)
A large multi-plant corporation is concerned about safety at the workplace. The corp installed new safety equipment & wud now like to test how effective these changes have been in reducing hazards. 2 random samples of 15 plants are drawn. In the 1st, no changes took place; the 2nd had the equipment installed. The # of worker-hours lost due to injuries is recorded for each plant. The p-value of the F-test was .325. In order to test whether the equipment has proven effective, u need to perform:
An independent samples t-test for difference in means (equal variances) Equal variances bcus p-value (.325) > .05 (Fail to reject null)
An educational testing company claims that its course will improve SAT scores by more than 120 points on avg. U set out to test whether this is accurate. U gather a sample of SAT scores from 35 students who took the course & 48 students who did not. You reject the null hypothesis for the F-test. Which test shud u perform?
An independent samples t-test for difference in means (unequal variances) (Reject null hypoth means unequal variances)
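The last few cards all follow the same two-step logic: run the F-test for equal variances first, then pick the matching t-test. A minimal sketch of that rule, assuming a 5% significance level for the F-test (the function name is made up for illustration):

```python
def choose_mean_test(f_test_p_value, alpha=0.05):
    """Pick the two-sample t-test based on the F-test for equal variances."""
    if f_test_p_value > alpha:
        # Fail to reject H0 of equal variances
        return "t-test for difference in means (equal variances)"
    # Reject H0 of equal variances
    return "t-test for difference in means (unequal variances)"


print(choose_mean_test(0.231))   # pizza chain card
print(choose_mean_test(0.325))   # safety equipment card
```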
Calendar Effect
An observed pattern in stock prices based on the calendar (for ex: a rise or fall associated w a particular weekday or month)
Which of the following is NOT true regarding exponential smoothing?
As the smoothing constant (w) approaches 1, the series becomes smoother
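To see why that statement is false, here is a small sketch of exponential smoothing. It assumes the common convention S_1 = y_1 and S_t = w*y_t + (1-w)*S_(t-1); the sample data are made up for illustration.

```python
def exponential_smoothing(series, w):
    """Return the smoothed series S_t = w*y_t + (1 - w)*S_(t-1), with S_1 = y_1."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed


data = [10, 14, 9, 15, 11]                 # hypothetical observations
print(exponential_smoothing(data, 0.1))    # small w: heavy smoothing
print(exponential_smoothing(data, 1.0))    # w = 1: reproduces the raw series
```

With w = 1 the smoothed series equals the original, so a larger w means less smoothing, not more.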
Which of the following is true about a confidence interval
As type 1 error increases, confidence interval becomes tighter. To construct lower and upper confidence limits, standardized distribution values are converted to corresponding sampling distribution values
Which is true about the F test of overall significance of the model?
As unexplained variation increases, the F test statistic decreases
A consortium of colleges is interested in how spending by college students on textbooks each semester differs between students attending public (population 1) and private (population 2) universities. An expert claims that textbook expenditures each semester are higher in public universities. But you believe the opposite is true and would like to formally prove that textbook expenditures must be higher in private universities. You collect data on expenditure on books made by students selected randomly from public and private universities. The data is here. [link: bookspending.xls] What is the appropriate null and alternative hypothesis to prove that expenditures must be higher in private universities? Select one: a. H0: µ1 - µ2 = 0 vs. H1: µ1 - µ2 ≠ 0 b. H0: µ1 - µ2 = 0 vs. H1: µ1 - µ2 < 0 c. H0: µ1 - µ2 < 0 vs. H1: µ1 - µ2 = 0 d. H0: µ1 - µ2 = 0 vs. H1: µ1 - µ2 > 0 e. H0: µ1 - µ2 > 0 vs. H1: µ1 - µ2 = 0
B
An electrical contractor has observed that his accounts are paid on average in 40 days, with a standard deviation of 10 days. Estimate what proportion of the accounts will be paid between 50 and 60 days? Select one: a. 68% b. 13.5% c. 27% d. 95% e. 5%
B
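The 13.5% comes from the empirical rule: 50 and 60 days are 1 and 2 standard deviations above the mean, and (95% - 68%)/2 = 13.5%. A quick check, assuming payment times are approximately normal (the question only implies a mound-shaped distribution):

```python
import math


def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution with the given mean and SD."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


empirical_rule = (0.95 - 0.68) / 2
exact_normal = normal_cdf(60, 40, 10) - normal_cdf(50, 40, 10)
print(round(empirical_rule, 3), round(exact_normal, 4))
```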
Suppose that you have just been hired by Champaign Automotive and they want you to use your statistical skills to appraise used cars. You are given a list of cars they have sold in the past 6 months and the dataset contains price of each car (PRICE), its age (AGE), mileage (MILEAGE), how many miles it gets per gallon (MPG), whether it was a domestic or a foreign car (FOREIGN) and whether it had a leather interior (LEATHER). If a car was foreign, the variable FOREIGN has a value of 1 and 0 otherwise and if a car had leather interior, the variable LEATHER has a value of 1 and 0 otherwise. Which of the following represents the prediction equation for the above model prior to considering the possible deletion of one or more independent variables? a. PRICE = 10314.306*INTERCEPT - 286.341*AGE + 0.021*MILEAGE + 2.574*MPG + 483.485*FOREIGN + 381.974*LEATHER b. PRICE = 10314.306 - 286.341*AGE + 0.021*MILEAGE + 2.574*MPG + 483.485*FOREIGN + 381.974*LEATHER c. INTERCEPT = 10314.306 - 286.341*AGE + 0.021*MILEAGE + 2.574*MPG + 483.485*FOREIGN + 381.974*LEATHER d. PRICE = 476.056*INTERCEPT + 34.615*AGE + 0.002*MILEAGE + 15.205*MPG + 199.981*FOREIGN + 194.884*LEATHER e. PRICE = 476.056 + 34.615*AGE + 0.002*MILEAGE + 15.205*MPG + 199.981*FOREIGN + 194.884*LEATHER
B
Suppose you have some data on how many books different college libraries presently have in their collections. You figure that the longer the college has been around, the more books they have. Your data is drawn from similar colleges of different ages that were founded before Professor Petry was born. You run a regression of number of books against age of the college. Sure enough, you get positive numbers for the estimated slope and estimated intercept, and both are significant at the .05 level. Now you want to interpret the intercept. Which of the following statements best describes the situation? a. the estimated intercept gives our best guess of how many books a college had when it was founded b. we cannot interpret the estimated intercept as the number of books a college had when it was founded since we don't have colleges in our sample that were founded recently c. if we wanted to interpret the intercept this way, we would need to run a multiple regression d. if we wanted to interpret the intercept this way, we would have to do a partial F-test e. none of the above statements are correct
B
Which of the following is not a correct procedure for model selection when forecasting? Select one: a. Use some of the observations to develop several competing forecasting models. b. Use the model which generates the lowest MAD value if you want to avoid (even a few) large errors. c. Use the model which generates the lowest SSE value if you want to avoid (even a few) large errors. d. Run the models on the rest of the observations. e. Calculate the accuracy of each model using both MAD and SSE criterion.
B
Which of the following may be a(n) appropriate technique(s) for removing random variation from time series data? (i) Using moving average (ii) Regress the variable of interest on time in linear regression. (iii) Random variation is not removable, since it is not detectable. (iv) Use exponential smoothing. Select one: a. Only (iii) b. Only (i) and (iv) c. Only (i), (ii) and (iv) d. Only (i) e. Only (i) and (ii)
B
You are given quarterly data on the returns for the S&P 500. The data is available here. [link: SP500.xls] What is the exponentially smoothed value for 2005-quarter 1, using a smoothing constant of 0.7? Select one: a. 1.3063 b. 1.535 c. 1.1243 d. 0.8805 e. 2.0835
B
You would like to buy a cell phone, but want to avoid signing up with a carrier that is prone to poor connections. You proceed to investigate the matter by collecting data from 100 random students on campus that have cell phones. You collect information on the number of dropped calls in the past month, the average daily talk time, and the name of the carrier. Since data on the name of the carrier is categorical, you need to create dummy variables. If you know that the carriers in Champaign are Cingular, Verizon, T-mobile, ATT, and Sprint, how many dummy variables do you need to create? a. 3 b. 4 c. 5 d. 6 e. 7
B
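With 5 carriers, 4 dummies are enough because the omitted carrier becomes the baseline; a fifth dummy would be perfectly collinear with the intercept. A small sketch, where picking the first carrier as the baseline is an arbitrary choice made for illustration:

```python
carriers = ["Cingular", "Verizon", "T-mobile", "ATT", "Sprint"]


def make_dummies(value, categories):
    """Encode one observation as (number of categories - 1) 0/1 variables.

    The first category is treated as the baseline and gets no dummy.
    """
    _baseline, *others = categories
    return {name: int(value == name) for name in others}


print(make_dummies("Verizon", carriers))    # Verizon dummy is 1, the rest 0
print(make_dummies("Cingular", carriers))   # baseline: all dummies are 0
```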
You would like to study crude oil price fluctuations over time. You conjecture that this variable is affected by systematic calendar effects that arise due to demand shocks in certain times of the year. Select one: a. z-test for single population mean b. time series analysis c. z-test for difference in population means with independent samples d. t-test for difference in population means with unequal variances e. t-test for difference in population means with equal variances
B
According to the model, if an owner remodels the apartment and as a consequence its condition improves by 2, what should happen to the estimated rent? a. it should increase by $53.899 b. it should decrease by $53.899 c. it should increase by $105.798 d. it should decrease by $105.798 e. the estimated rent would not change
C
Disregarding the result of the previous question (assume you don't know whether variable LEATHER is important or not), if we removed variable LEATHER from the model, what would happen? a. we are sure that adjusted R2 will not increase b. we are sure that adjusted R2 will not decrease c. we are sure that R2 will not increase d. we are sure that R2 will not decrease e. none of the above
C
In a simple linear regression, which of the following statements best characterizes the regression line? a. it maximizes the number of data points that are close to the regression line b. it does the best job of predicting points that fall just outside the right end of the data set c. it minimizes the sum of squared vertical distances between the sample data and the regression line d. it minimizes the distance between the regression line and the point representing the sample means e. the slope of the regression line is equal to the correlation coefficient, and together with the sample means, this determines the intercept
C
Now your boss comes along and informs you that although the model that you have developed (FULL model) is fine, it could be improved. He suggests that variables AGE and CHILDREN are individually not important, so he wants you to develop a new model (REDUCED model) without those two variables. You know that variables AGE and CHILDREN are individually not important, but before you make any definitive conclusion to remove those variables from the FULL model, you want to test whether they as a group are significant and should, therefore, be kept. The REDUCED model output is given below. The test statistic for testing whether variables AGE and CHILDREN could be removed as a group from the FULL model is? a. -0.832 b. 0.416 c. 0.832 d. 18.172 e. 35.643
C
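The regression output for this card is not reproduced here, so the answer cannot be recomputed, but the partial F statistic has a standard form: F = [(SSE_reduced - SSE_full)/q] / [SSE_full/(n-k-1)], where q is the number of variables dropped and k is the number of independent variables in the FULL model. A sketch with clearly hypothetical numbers:

```python
def partial_f(sse_reduced, sse_full, q, n, k_full):
    """Partial F statistic for testing q variables as a group."""
    numerator = (sse_reduced - sse_full) / q
    denominator = sse_full / (n - k_full - 1)
    return numerator / denominator


# Hypothetical values, NOT taken from the card's regression output
f_stat = partial_f(sse_reduced=120.0, sse_full=100.0, q=2, n=53, k_full=2)
print(f_stat)
```

A small F (such as the 0.832 on this card) means dropping the variables barely raises SSE, so the group is not significant.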
Which of the following statements is true concerning confidence vs. prediction intervals for the estimate ŷ = b0 + b1x* where x* is a particular value for the independent variable (assume Sε is positive)? a. the 95% confidence interval will always be wider than the 95% prediction interval b. the 95% confidence interval will always be the same as the 95% prediction interval c. the 95% confidence interval will always be narrower than the 95% prediction interval d. all of the above statements are true e. cannot be determined
C
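The prediction interval is always wider because its standard error has an extra "1 +" under the square root, reflecting the variability of an individual observation around the regression line. A sketch of the two half-widths for simple regression; all input values are hypothetical, and the parameter names are chosen for this illustration:

```python
import math


def interval_half_widths(t_crit, s_e, n, x_star, x_bar, sxx):
    """Half-widths of the confidence and prediction intervals at x_star.

    sxx is the sum of squared deviations of x from its mean.
    """
    core = 1 / n + (x_star - x_bar) ** 2 / sxx
    ci = t_crit * s_e * math.sqrt(core)        # mean of y at x_star
    pi = t_crit * s_e * math.sqrt(1 + core)    # individual y at x_star
    return ci, pi


# Hypothetical inputs; 2.048 is the t critical value for 95% with 28 df
ci, pi = interval_half_widths(t_crit=2.048, s_e=3.0, n=30,
                              x_star=12.0, x_bar=10.0, sxx=250.0)
print(ci < pi)
```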
You are shopping for an apartment and want to know what is a fair rent. You gather data on a fairly large number of apartments in the Champaign-Urbana area and in addition to recording rent of each apartment, you gather data on the general characteristics of each apartment. The regression of the model that you have built is presented in the following table: Before we proceed with analyzing the above model, we should check the necessary assumptions on the error term. Which problem will not occur in this type of data? a. non-normality b. heteroskedasticity c. autocorrelation d. outliers e. multicollinearity
C
You would like to know whether college-aged men or women are more likely to own cars. You ask a random sample of 100 college students (48 men and 52 women) whether or not they own a car, noting the responses. Is either gender more likely to own a car than the other? Select one: a. t-test for difference in population means with equal variances b. t-test for difference in population means with dependent samples c. z-test for difference in population means with independent samples d. t-test for difference in population means with unequal variances e. One-way ANOVA
C
Hyundai has recently made a safety recall in the Midwest. They worry that a specific part is damaged as a result of salt usage to clear roads over the winter. If so, Hyundai will replace the part at no charge. However, the intensity of salt usage may be different from one state to the other among the 7 states they worry about the most. They would like to test whether the average number of replacements differs across different states. Select one: a. z-test for difference in population means with independent samples b. z-test for single population mean c. time series analysis d. one-way ANOVA e. t-test for difference in population means with unequal variances
D
Movie studios segment their markets by age. Two segments that are particularly important to this industry are teenagers and 20-30-year-olds. They are planning on their next project and want to decide whether the new movie should focus on the teenagers or on 20-30-year olds. Specifically, they want to test whether teenagers watch more movies than 20-30-year-olds per year. If that is true, they will focus on making more movies for teenagers.
Cannot tell yet because I don't know whether the two populations have equal or unequal variances, so first I need to perform the F-test. What is the value of the test statistic for the test you have selected in the previous question? 2.732120054. What is the p-value for this test? 5.14049E-10. Conclusion: reject the null hypothesis and conclude that the variances for teenagers and 20-30-year-olds are different. Which test are you going to use to test whether teenagers watch more movies than 20-30-year-olds per year? t-Test: Two-Sample Assuming Unequal Variances. What is the relevant point estimate? 3.877435065. What is the value of the test statistic? 2.284997053. What is the p-value for this test? 0.011513775. Conclusion: reject the null hypothesis and conclude that teenagers watch more movies than 20-30-year-olds.
All of the following are possible consequences of the presence of serious multicollinearity between the independent variables EXCEPT a. increased standard errors for the slope coefficients b. t-statistics are deflated c. difficulty in interpreting the slope coefficients d. confidence intervals for the slope coefficients are narrower e. high R2 value but insignificant t-statistics
D
In multiple regression analysis, if a plot of the standardized residuals versus predicted values displays a fan shape then this implies a violation of which of the following assumptions? a. the residuals follow a normal distribution b. the error term has mean zero c. the error term has different variance at different values of the independent variables d. the error term has the same constant variance e. the variance of the error term follows a normal distribution
D
Now test for the significance of the variable LEATHER in the above model. What is your conclusion at 5% significance? a. the relevant test statistic is 381.974 and the coefficient is significantly different than zero b. the relevant test statistic is 381.974 and there is insufficient evidence to claim that the coefficient is significantly different than zero c. the relevant test statistic is 2.001 and the coefficient is significantly different than zero d. the relevant test statistic is 2.001 and there is insufficient evidence to claim that the coefficient is significantly different than zero e. the relevant test statistic is 2.001 and there is not enough information provided to either reject or fail to reject the null hypothesis
D
Queen Latifa suspects that students at the University of Michigan spend more time reading newspapers than students at the University of Illinois. To test her claim, she gathers a sample of the average number of hours per week spent reading newspapers for 100 students at the University of Michigan and does the same for 100 students at the University of Illinois. Assuming that her choice of tests is limited to the following, which test should she use to examine her claim? a. Z-test for difference in proportions b. Z-test for difference in means c. paired sample t-test for difference in means d. t-test for difference in means assuming unequal variances e. F-test for difference in variances
D
Regardless of the conclusion from the previous question, which of the following is a possible remedy if autocorrelation is found? a. use time as the dependent variable b. multiply the dependent variable by two c. eliminate outlying observations d. add time as an independent variable e. drop at least one of the independent variables
D
Which of the following distributions is NOT symmetric about its mean? a. normal distribution b. standard normal distribution c. t-distribution d. chi-squared distribution e. more than one of the above distributions is not symmetric about its mean
D
Which of the following is the proper interpretation of the coefficient on EFFICIENCY? a. holding all other characteristics constant, rent of an efficiency apartment is approximately 39 dollars less than rent of any other type of apartment b. holding all other characteristics constant, rent of an efficiency apartment is approximately 39 dollars more than rent of any other type of apartment c. holding all other characteristics constant, we would expect rent of an efficiency apartment to be approximately 39 dollars more than rent of a one bedroom apartment d. holding all other characteristics constant, we would expect rent of an efficiency apartment to be approximately 39 dollars less than rent of a one bedroom apartment e. we would expect an efficiency apartment to have a positive relationship with rent
D
Which of the following best characterizes the null and alternative hypotheses of a test inspecting the overall validity of the model? Select one: a. H0: β2= 0 vs HA: β2≠ 0 b. H0: β1= β2 vs HA: β1≠ β2 c. H0: β1= β2= 0 vs HA: β1≠ β2≠ 0 d. H0: β1= β2= 0 vs HA: At least one βi ≠0 e. H0: β1= 0 vs HA: β1≠ 0
D
Meteorologists believe that the air pollution in Ahwaz (the world's most polluted city) is attributable to gasoline consumption and air temperature. They gather data on air pollution (PL), gasoline consumption (Gas), air temperature (Temp) over a 12 year period and run the following regression, which is estimated in terms of logarithms. The partial output is provided. A final observation is that minimum and maximum values for log(Gas) are 12.03 and 12.35, respectively. The researchers claim that more gasoline consumption increases pollution. The partial Excel output is available here. [link: pollution.xls] log(PL) = β0 + β1 log(Gas) + β2 log(Temp) + ε What are the values of coefficient of determination and adjusted-R2? Select one: a. 0.7693 and 0.7421 b. 0.0434 and 0.0531 c. 0.9912 and 0.9469 d. 0.9566 and 0.9469 e. We don't have enough information to compute those quantities.
D R2 = 1 - SSE/SST = 1 - 0.416232/9.587168 = 0.9566 and adjusted-R2 = 1 - (SSE/(n-k-1))/(SST/(n-1)) = 1 - (0.416/9)/(9.587/11) = 0.9469, with n = 12 and k = 2
Psychologists believe that a person's happiness adapts over time. In other words, that today's happiness depends negatively on yesterday's level of happiness. You are given a series of average happiness levels for the United States. The data is available here. [link: happiness.xls] Estimate an AR(1) model. What is the slope coefficient? Select one: a. -2.2851 b. 0.1655 c. -2.2851 d. -0.3782 e. 8.8860
D Use the ToolPak to regress the current period's happiness on the previous period's level (the lagged value is the independent variable). You will lose one observation when running this regression. The slope coefficient is -0.3782.
Which of the following is true about the F-test of overall significance of the model? (i) Large F statistics result from large total variation. (ii) As unexplained variation increases, the F-test statistic decreases. (iii) As explained variation increases, we tend to fail to reject the null hypothesis. (iv) A higher p-value for the test results from a higher F-test statistic. Select one: a. Only (iv) b. Only (i) and (iii) c. Only (i) d. Only (ii) e. Only (i) and (ii)
D The F-test statistic of the overall significance test is: F = MSR/MSE = (SSR/k)/(SSE/(n-k-1)). Therefore, if unexplained variation increases (SSE increases) the denominator increases, making the F-test statistic smaller. At the same time, if SSE increases, SSR decreases (since SST=SSR+SSE and SST is always the same). So the F-test statistic will also decrease because SSR decreases.
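The argument above can be checked numerically by holding SST fixed and raising SSE. The figures below (SST = 100, n = 30, k = 3) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def overall_f(sse, sst, n, k):
    """F = (SSR/k) / (SSE/(n-k-1)), with SSR = SST - SSE."""
    ssr = sst - sse
    return (ssr / k) / (sse / (n - k - 1))


# Hypothetical values: same SST, increasing unexplained variation
f_low_sse = overall_f(sse=20.0, sst=100.0, n=30, k=3)
f_high_sse = overall_f(sse=40.0, sst=100.0, n=30, k=3)
print(f_low_sse > f_high_sse)
```

Both effects push in the same direction: the numerator shrinks and the denominator grows as SSE rises.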
Disregarding the result of the previous question, if we rejected the null hypothesis when testing the significance of variables AGE and CHILDREN as a group, then we should conclude that: a. variables AGE and CHILDREN are individually significant, so we should proceed by using the REDUCED model b. variables AGE and CHILDREN are individually insignificant, so we should proceed by using the REDUCED model c. variables AGE and CHILDREN are insignificant as a group, so we should proceed by using the FULL model d. variables AGE and CHILDREN are insignificant as a group, so we should proceed by using the REDUCED model e. variables AGE and CHILDREN are significant as a group, so we should proceed by using the FULL model
E
Drew and Paul would like to compare how effective their discussion board hints are. In 2012, Drew picks the 6th week of the Fall semester, which has a class size of 649, and Paul picks the 6th week of the Spring semester, with class size of 871, to lead the discussion board. Students submit the same set of homework questions on the 6th week of both semesters. 157 students in the Fall semester and 282 students in the Spring semester earn full credit on the 6th week's homeworks. Select one: a. F-test for ratio of variance b. t-test for difference in population means with unequal variances c. z-test for proportions d. t-test for difference in population means with equal variances e. z-test for difference in proportions with zero hypothesized difference
E
What test should you perform to prove that the opposite is true? (allowing for a 10% chance of Type I error) Select one: a. F-test for ratio of variances b. t-test for difference in means assuming equal variances c. t-test for difference in proportions d. z-test for difference in means e. t-test for difference in means assuming unequal variances
E
Which of the following is always true? a. the covariance of a variable with itself is always equal to one b. the coefficient of correlation of a variable with itself is always equal to minus one c. the covariance between two variables is always positive d. the correlation coefficient between two variables is always positive e. the covariance between a variable and itself is equal to its variance
E
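The identity behind answer (e) can be verified directly. A small sketch with made-up data; `sample_covariance` is a helper written for this illustration, not a library function.

```python
from statistics import variance

def sample_covariance(x, y):
    """Sample covariance with the usual n - 1 denominator."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

data = [2.0, 4.0, 4.0, 7.0, 9.0]  # hypothetical observations
cov_xx = sample_covariance(data, data)
var_x = variance(data)
print(cov_xx, var_x)  # the two values agree
```

It follows that the correlation of a variable with itself is cov(x, x) / (s_x * s_x) = 1, which rules out options (a) and (b).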
You are considering making a substantial investment in a new company on punchstarter.com. You believe that projects with the most potential attract more investors. In order to test this belief, you select a random sample of 30 companies that requested investments on the website and collect data on profits one year after appearing on the website, and total number of investors. You are interested in measuring profit (y) as a function of the total number of investors (x). What is the value for the intercept? Select one: a. 600 b. 550 c. 0.6518 d. 941.0933 e. 158.9067
E
You want to estimate how world conflict affects international oil prices. You collect data on annual average price per barrel in dollars and the number of conflicts in that year. The data is available here. [link: oil.xls] Run a regression using Price as the dependent variable and Conflicts as the independent variable. What is the correct interpretation for the coefficient for Conflicts? Select one: a. The Price on average is $825.70 higher in a year with at least one conflict b. Increasing conflict by one unit decreases Price by $8.26 on average c. Increasing conflict by one unit decreases Price by $825.70 on average d. Increasing conflict by one unit increases Price by $825.70 on average e. Increasing conflict by one unit increases Price by $8.26 on average
E
Suppose you are estimating a wage regression, where salary is the dependent variable and age, years of education and a dummy variable for male are your independent variables. You are interested in measuring how salary differs between those who have at least a college education with those who have less than a college education. If a person is considered as having a college education when she has more than 12 years of education, how can you measure the difference in salary between college and non-college educated individuals? Select one: a. Re-estimate model interacting years of education with a dummy variable for college b. Re-estimate model replacing years of education with a dummy variable for college and one for no college c. Calculate the difference in predicted salary between an individual with 14 years of education and one with 7 years of education d. Multiply coefficient for years of education in original regression by 12 e. Re-estimate model replacing years of education with a dummy variable for college
E Since we always have to omit one category when including dummy variables in the regression, we re-estimate the model replacing years of education with a dummy variable for college (omitting no college).
When testing the overall validity of a linear regression model, which of the following is associated with a larger F-statistic? Select one: a. Larger total variations. b. Larger unexplained variations. c. A larger critical value of overall validity test. d. Smaller explained variations. e. None of the above.
E The F-statistic is the ratio of explained to unexplained variation. Hence it is not affected by the amount of total variation. Larger unexplained variation lowers the F-statistic. Smaller explained variation also lowers the F-statistic. The magnitude of the critical value is determined from the significance level (α) and relevant degrees of freedom. It is unrelated to the size of any test statistic.
Suppose you are the manager of a chain of pizza restaurants. You would like to know if the restaurant in Champaign is more or less sensitive to the number of students on campus than the restaurant in Urbana (in other words, if one store has larger fluctuations in profit as a result of having more students on campus). You gather daily profit data (measured in dollars and cents) from both stores for a six-month period from March until August. Given this information, which of the following tests would be most appropriate?
F test for difference in variances
The Internal Revenue Service (IRS) would like to reduce the amount of time it takes for the public to fill out IRS forms. They produce four new forms and also have the old standard form. They have a sample of 50 people each fill out one of the forms and record the amount of time each form took to complete. If they want to determine if any of the forms is faster to fill out, what test can they use?
F test for the difference in means.
A regional fast food chain wants to ensure that their customers don't eat meat carrying E. coli bacteria. There are 2 models of digitally controlled burners to choose from. The restaurant will choose the model w the most consistent final internal temp. They sampled 9 batches of meat cooked by burner model 1 & 9 batches by burner model 2. They found 2 variances (s^2): 9.9 & 2.7. Is there a difference in temp consistency? The appropriate test would be:
F-test for the ratio of variances
Recently a manufacturer decides to implement a newly introduced assembly line which is thought to decrease the variability of the time involved in producing the manufactured item. This way he will be able to predict more accurately the time needed to produce an item. Therefore he gets hold of the data where time is measured in minutes, of the two processes represented by old and new.
F-test two sample for variances At a 5% significance level he would reject his null hypothesis and conclude that the new process has decreased the variability of manufacturing time. For a two-sided test performed at the 5% significance level, the lower and upper CRITICAL VALUES are 0.672841664 and 1.486233767 respectively.
A car insurance company is interested in ur work & funds to study whether the avg # of tickets is diff for at least 2 of 7 diff car types in total. U realize that ANOVA was copied incorrectly.
F= 2.187 Midterm 1 Fall 2013 287
An instructor is interested in learning about the impact of the class textbook on his students' performances. He teaches 3 sections & assigns a diff textbook for each section. At end of semester, he records all the students' grades & tests whether the scores differ btwn sections. Which are factor levels, what are the null & altern. hypoth, DOF's, & test stat?
Factor levels: Textbook 1, 2 & 3 Ho: m1 = m2 = m3 and H1: at least 2 m's differ DOF's: k-1 (3-1) & n-k (122-3) Value of test stat F = MST/MSE = (SST/2)/(SSE/119) P-value = FDIST(F, df for treatments, df for error)
A farmer wants to know whether using diff brands of fertilizer makes a difference in her corn crop yields. There are 4 fertilizer brands & she applies each one in 10 diff locations with the same area & records crop yields. She runs a one-way analysis of variance in excel. Test whether the mean yields are different. Which of the following are factors, which are factor levels, what are the DOF's, and what is the test statistic?
Factor: fertilizer Factor levels: Brand 1, 2, 3 & 4 DOF's: k-1 (4-1) & n-k (4(10)-4) Value of test stat F = MST/MSE P-value = FDIST(F, df for treatments, df for error)
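The one-way ANOVA recipe on these cards (F = MST/MSE with k-1 and n-k degrees of freedom) can be sketched in a few lines. The data below are hypothetical and deliberately tiny; the farmer's 4 x 10 yields are not given on the card.

```python
def one_way_anova(groups):
    """Return (F, df_treatments, df_error) for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-treatments and within-treatments sums of squares.
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_treat, df_error = k - 1, n - k
    return (sst / df_treat) / (sse / df_error), df_treat, df_error

# Hypothetical yields for three treatments with three plots each.
f_value, df1, df2 = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_value, df1, df2)

# Degrees of freedom for the fertilizer card: k = 4 brands, n = 4 * 10 plots.
df_fertilizer = (4 - 1, 4 * 10 - 4)
```

The p-value step still needs an F distribution, e.g. Excel's FDIST(F, df1, df2) as the cards use.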
The director of graduate placement of an MBA program is interested in the impact of the diff majors on the # of job offers the MBA graduates receive. He randomly selects 30 students from each of the following 5 majors: Finance, Marketing, Accy, Econ, & Political Science and asks them the # of job offers received after graduation. Can he conclude that the mean # of job offers differs by major? Which of the following are factors, which are factor levels, and to perform the difference in means test for these data, u wud go to _______.
Factor: major Factor levels: marketing, finance, accy Tools > Data Analysis > Anova: Single Factor (For more than 2 pops: Anova: Single Factor For 2 pops, you can use Anova: Single Factor or one of the 2 sample t-tests)
Drew and Paul would like to compare how effective their discussion board hints are
Fall 2013 Final 418
Hyundai have recently made a safety recall in the Midwest. They worry that a specific part is damaged
Fall 2013 Final 418
Suppose you want to compare the effectiveness of two diff brands of flu shots
Fall 2013 Final 418
You would like to study crude oil price fluctuations over time. This variable is affected by systematic calendar effects
Fall 2013 Final 418
An airline manager wants to kno whether customers from many diff states in the US fly a diff amnt than others
Fall 2014 Final 373
For an article in the Daily Illini you are asked to test
Fall 2014 Final 373
In order to compare the effectiveness of Gatorade vs Powerade you convince 15 ppl to participate in ur exper
Fall 2014 Final 373
You want to predict the future prices of Apple stock using its current and past prices
Fall 2014 Final 373
You want to test if the proportion of Econ 202 students (say, group 1) who get A is higher than that of Econ 203 students (say, group 2). Suppose, p1 was determined to be .48 and p2 was .52, and the test statistic was determined to be -0.5660, then the figure that represents the p-value for this specific test is:
Figure 4 (arrow going to right)
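Because the alternative is H1: p1 - p2 > 0, the p-value is the area to the right of the test statistic, even though the statistic is negative. A quick sketch using the standard normal distribution from Python's standard library:

```python
from statistics import NormalDist

z = -0.5660                              # test statistic from the question
p_value = 1.0 - NormalDist().cdf(z)      # right-tail area
print(round(p_value, 4))
```

The area is well above 0.5, so the sample gives no support to the claim that group 1 has the higher proportion.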
An airline manager wants to know if this year's advertising campaign increased the loyalty of their customers
Final Spring 2014 394
As an analyst for an insurance company, u are interested in examining whether age influences the # of car accidents
Final Spring 2014 394
The format of the SAT exam has changed once more.
Final Spring 2014 394
You are given daily data on the temperature in Champaign for the last 50 yrs and are asked to forecast the temperature for July 4, 2015
Final Spring 2014 394
You are thinking of investing some savings in a mutual fund and are comparing 2 diff funds and will choose the one with the lowest volatility
Final Spring 2014 394
You worked as an intern at We Always Win Car Insurance Company last summer and noticed that individual car insurance premium depended very much on the age of the individual, the number of traffic tickets received by the individual and the population density of the city in which the individual lived.
From the regression output, 0.604292353 of the total variability in insurance premium could be explained by AGE, TICKET and DENSITY. Also the coefficient of determination adjusted for the degrees of freedom is 0.496372086 , the standard error of the estimate is 46.6128731 , the estimated average change in insurance premium for every ten additional tickets received is 367.8352132 dollars per month, while the estimated average change in insurance premium for every five additional years of age is -7.018193488 dollars per month.
Which of the following are parts of a "generalized procedure" for developing a multiple regression model?
Gather data for the variables in the model Proceed with model estimation using statistical software Draw scatter diagrams to determine whether a linear model is appropriate
Your fraternity roommate claims that males are smarter than females. To back up his claim, he used the information on average exam scores of the male students (population 1) and female students (population 2) collected from the Econ 203 course that he took last semester. The statistics are presented in the following Excel output: To test your roommate's claim, what should be your null and alternative hypotheses? (Hint: There are two correct ways to state the null hypothesis, and one correct way to state the alternative hypothesis that should be selected.) What type of test is used here? Using a 7% level of significance, what should be your decision and conclusion?
H0: The average score of male students and female students are the same. H0: The average score of male students is no higher than female students. H1: The average score of male students is higher than female students. pooled-variance t test for the difference in means Do not reject the null hypothesis. There is insufficient evidence to conclude that average score of male students is higher than that of female students.
A professor in the school of business wants to investigate the prices of new textbooks at two different bookstores, I.U.B. and T.I.S. The professor randomly chooses the required texts from 33 business school courses (same books at each store). In order to test the claim that the average prices at the two bookstores are different, suppose you have calculated the p-value of this test to be .0056. Use a significance level of .01. Which of the following are true for this test?
H0: The means are the same and H1: The means are different The test statistic is a t-statistic with 32 degrees of freedom Reject H0 implying that there is significant evidence to suggest that the average prices at the two stores are different
An instructor considers changing the difficulty of a specific test if the avg time students spend finishing the test is DIFFERENT than 72 min. Which of the following characterizes the hypothesis test, value of test stat? Which of the following is most accurate regarding p-value of the instructor's hypoth test?
Ho: m =72 vs H1: m does not = 72 t= -.832 p-value>.03 *look at t.15,8=1.108 -.832 t stat is less extreme than -1.108 Midterm 1 Spring 2013 297
You are a revenue manager for a low-cost airline. To maximize rev, u believe that charging a reduced fare while charging extra for lavatory use will earn more money than the normal fare. U randomly assign flights to these 2 diff schemas & collect data. Determine whether the variances are unequal.
Ho: var/var = 1, H1: var/var does not = 1 F=1.842 LCL= 1.014 t-test difference in means assuming unequal variances Midterm 1 Fall 2013 285
Which of the following best describes the null and alt. hypothesis for testing whether variances are UNEQUAL?
Ho: var/var =1 vs H1: var/var does not =1
As a market analyst for a car dealership, u are interested in measuring the effect of govt subsidy for fuel efficient cars on the proportion of SUVs sold. pop 1: 36/50, pop 2:52/100. U believe the govt subsidy reduced the proportion of SUVs sold by more than 5%. What's the hypoth, test stat & conclusion?
Ho:p1-p2 = .05 vs H1: p1-p2 > .05 Z=1.857 Reject the null, conclude that subsidy reduced SUV sales by more than 5% Midterm 1 Fall 2013 283
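The Z value on this card can be reproduced from the sample counts. A sketch, assuming the unpooled standard error is the intended one (the usual choice when the hypothesized difference is not zero); the variable names are illustrative.

```python
from math import sqrt

x1, n1 = 36, 50      # SUVs sold / cars sold, population 1
x2, n2 = 52, 100     # SUVs sold / cars sold, population 2
d0 = 0.05            # hypothesized difference under Ho

p1, p2 = x1 / n1, x2 / n2
# Unpooled standard error, since Ho does not claim p1 = p2.
std_err = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z = (p1 - p2 - d0) / std_err
print(round(z, 3))
```

Since H1 is p1 - p2 > .05, the rejection region is in the right tail, which is consistent with the card's decision to reject the null.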
Suppose you want to develop a model to predict pizza consumption for students at the University of Illinois. You have collected data from 100 different U of I students on the amount of money they spent on pizza (measured in dollars) in one semester, their income (measured in dollars), whether they live in a dorm or an apartment (dorm=1, not dorm=0), and how much time they spend in the library studying (measured in hours). Suppose you have run the appropriate regression with the correct variable as a dummy variable, and the coefficient on the variable DORM is estimated to be 3.7564. What is the correct interpretation of this coefficient?
Holding constant the effects of income and library study time, students living in a dorm spend on average $3.7564 more on pizza in a semester than students not living in dorms. Holding constant the effects of income and library study time, students living in an apartment spend on average $3.7564 less on pizza in a semester than students living in dorms.
Which of the following statements are true about the matched pairs t-test?
If a lot of variety is introduced into the system, then matched pairs shud be used. A disadvantage of using matched pairs is that design of the experiment makes it more expensive.
Which is an appropriate solution to solve autocorrelation in the error term?
Include a time variable in regression model
The variable that is doing the predicting or explaining
Independent Variable
Which of the following are the characteristics of the sampling distribution of the sample mean?
It has a SD of σ/√n. It is centered on μ. It can have a smaller SD than the pop. distribution. It changes with the sample size we use.
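The dependence on sample size is easy to see numerically. A minimal sketch, assuming a hypothetical population SD of 10:

```python
from math import sqrt

def standard_error(sigma, n):
    """SD of the sampling distribution of the sample mean."""
    return sigma / sqrt(n)

sigma = 10.0                          # hypothetical population SD
se_25 = standard_error(sigma, 25)
se_100 = standard_error(sigma, 100)
print(se_25, se_100)
```

Quadrupling the sample size halves the standard error, while the population SD itself does not change.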
Which of the following are the characteristics of the standard normal distribution?
It has a SD of 1. It is centered on zero.
Which of the following about the Chi-squared distribution is FALSE?
It's an approximation to the center of the distribution
A procedure used to develop an estimate of the regression equation that minimizes the sum of the squared errors.
Least Squares Method
Blue Mississippi Diamonds bought a new diamond cutting machine in order to produce higher quality cuts
Midterm 1 Fall 2013 288
For spring break you want to escape the cold to a warm destination, but are unsure where to go.
Midterm 1 Fall 2013 288
You want to prove to ur friends that soccer players from Argentina score more goals than players from Brazil
Midterm 1 Fall 2013 288
A sapphire glass manufacturer produced cellphone screens w a SD of .6 in its thickness.
Midterm 1 Fall 2013 289
The Governor of East Jersey is running for the Senate & wants to win by a landslide. He will block the bridge to Old York if he is not winning by 60%
Midterm 1 Fall 2013 290
Researchers are comparing the effectiveness of two drugs A and B for treating virus XX. Which describes the p-value, what is a 95% conf interval for the parameter of interest (m1-m2)? You run a hypothesis test with a null of Ho: m1 - m2 = 0. Which of the following are affected? t stat & p val
Midterm 1 Spring 2013 300
You want to study the effect of winter on student performance.
Midterm 1 Spring 2014 272
The Transport Security Admin (TSA) is deploying new scanners at high volume airports.
Midterm 1 Spring 2014 273
The venus fly trap is one of the few carnivore plants in the world & u are considering planting many in backyard
Midterm 1 Spring 2014 275
Jiffany's (a jewelry franchise) is reconsidering whether or not to open a store in downtown Champaign
Midterm 1 Spring 2014 276
The university will be testing a new crossing on 4th street btwn Wohlers & Busines.
Midterm 1 Spring 2014 276
You want to test whether Apple stocks are relatively riskier than Amazon stocks
Midterm 1 Spring 2014 276
Which of the following is true about type 2 error?
One minus type 2 error = power of the test
Which of the following is true if α (significance level) = .05?
One would reject the null incorrectly w 5% probability. α = prob of rejecting the null when it's true
The equation that describes the relationship btwn the independent variable & the dependent variable & an error term.
Regression Model
Type 1 Error
Reject Ho when true
Certain types of cars are thought to attract more police attention than others. U collect data on # of traffic tickets of 100 cars representing the 3 diff car types sold. Compute the sum of squares for treatments, sum of squares for errors, & correct F critical value.
SST=8.835 SSE=89.320 F.05,2,97 Midterm 1 Fall 2013 284
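The card stops at the critical value, but the test statistic follows from the same numbers. A short sketch using only the values stated above (k = 3 car types, n = 100 cars):

```python
sst, sse = 8.835, 89.320   # sums of squares from the card
k, n = 3, 100              # car types and total cars

mst = sst / (k - 1)        # mean square for treatments
mse = sse / (n - k)        # mean square for error
f_value = mst / mse
print(round(f_value, 3))
```

This value would then be compared with F.05,2,97, the critical value named on the card.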
A random sample of variable X has a positively skewed distribution. Which is correct if the largest is dropped from the sample?
Sample mean of X will decrease Sample median of X might decrease
Create indicator (dummy) variables White & Silver, valued at 1 or 0, derived from the color info. Then run a regression on Odometer, White, & Silver. Which of these are true?
Silver cars sell for, on avg, 149.48 more than cars of other colors except white Silver cars sell for, on avg, 115.62 more than white cars For every additional mile on a car's odometer, the avg decline in price is about 2.3 cents, holding other variables constant The fact that a car is white has no statistical significance
To produce a better forecast we need to determine which components are present in a time series. To identify the components present in the time series, we need first to remove the random variation. This can be easily done by using smoothing techniques. We will consider two possibilities: Moving average Exponential smoothing
TRUE
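Exponential smoothing can be sketched with the usual recursion S_1 = y_1 and S_t = w*y_t + (1-w)*S_(t-1). The series and the smoothing constant w = 0.5 below are hypothetical.

```python
def exponential_smoothing(series, w):
    """S_1 = y_1; S_t = w * y_t + (1 - w) * S_(t-1)."""
    smoothed = [float(series[0])]
    for y in series[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed

sales = [10, 12, 11, 15]               # hypothetical observations
smoothed = exponential_smoothing(sales, w=0.5)
forecast_next = smoothed[-1]           # one-period-ahead forecast
print(smoothed, forecast_next)
```

The last smoothed value serves as the forecast for the next period, which is why the method only looks one period ahead.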
Too much smoothing may eliminate patterns of interest. Here, the seasonality component is removed when using 5-period moving average. Too little smoothing leaves much of the variation, which disguises the real patterns.
TRUE
With even number of observations included in the moving average, the average is placed between the two periods in the middle. To place the moving average in an actual time period, we need to center it. Two consecutive moving averages are centered by taking their average, and placing it in the middle between them.
TRUE
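The centering step can be sketched as follows. The 100 "gas prices" are placeholder numbers; only the count of smoothed values matters here.

```python
def centered_moving_average(series, window):
    """Moving average; for an even window, average consecutive pairs to center."""
    averages = [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]
    if window % 2 == 1:
        return averages  # odd windows are already centered on a period
    return [(a + b) / 2 for a, b in zip(averages, averages[1:])]

gas_prices = [float(day) for day in range(100)]   # hypothetical 100 days
smoothed = centered_moving_average(gas_prices, 8)
lost_periods = len(gas_prices) - len(smoothed)
print(len(smoothed), lost_periods)
```

For an 8-period centered moving average on 100 observations, 8 periods (4 at each end) have no smoothed value.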
A governmental organization was interested in measuring the effect of schooling on mental ability of students, so they designed a particular IQ test for elementary students. The test was performed on a group of students right before school started in the fall (after a long and lazy summer). The population standard deviation on this IQ test is assumed to be 10 points. With a random sample of 74 students, the sample mean is found to be 103 with a sample standard deviation of 20 points. Can the researchers conclude that summer break has increased the variance in scores? What is the p-value for this test? Select one: a. 5.70709E-28 b. 4.30785E-07 c. 1.15046E-27 d. 0.0013 e. 7.30601E-07
Test stat is Chi-Squared The test stat is = (74-1)*(20^2)/(10^2) = 292. The p-value is the area measured from 292 to the right (since we are testing that the variance increased during the summer). The p-value is then =CHIDIST(292,73) = 5.70709E-28 The correct answer is: 5.70709E-28
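The test statistic above is plain arithmetic and can be reproduced as a check. The p-value step relies on Excel's CHIDIST and is not reproduced in this sketch.

```python
n = 74               # sample size
s = 20.0             # sample standard deviation
sigma0 = 10.0        # population standard deviation under the null

# Chi-squared statistic for a test on one variance: (n - 1) * s^2 / sigma0^2
chi_sq = (n - 1) * s ** 2 / sigma0 ** 2
df = n - 1
print(chi_sq, df)
```

A statistic of 292 on 73 degrees of freedom lies far in the right tail, which is why the p-value is so small.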
Without doing any mathematical calculations, which of the following is (are) definitely correct?
The 87% confidence interval for the population mean is narrower than the 90% confidence interval holding everything else constant. The 87% confidence interval for the population mean from a sample of 50 observations is wider than the 87% confidence interval for a sample of 100 holding everything else constant.
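Both statements can be illustrated with the width of a z-interval, 2 * z * σ/√n. This sketch assumes a known, hypothetical population SD of 10; the ordering of the widths does not depend on that value.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(confidence, sigma, n):
    """Width of a two-sided z-interval for the population mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)

sigma = 10.0                               # hypothetical population SD
w87_n50 = ci_width(0.87, sigma, 50)
w87_n100 = ci_width(0.87, sigma, 100)
w90_n100 = ci_width(0.90, sigma, 100)
print(w87_n50, w87_n100, w90_n100)
```

Lower confidence shrinks the critical value and a larger sample shrinks σ/√n, so either change narrows the interval.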
You would like to prove that on average the Big-10 scores less points per football game than the PAC-10. You take a random sample of 20 games for the PAC-10 (population 1) and a random sample of 15 games for the Big-10. The data is provided here.[link: BIG10.xls] Before you perform your analysis, you will need to test if the two variances are equal or not. What is the p-value for testing whether the two variances are equal or not? Select one: a. 0.4376 b. 0.0342 c. 0.7052 d. 0.3526 e. 0.2188
The F-statistic is: =VAR(A2:A21)/VAR(B2:B16) = 1.5082 To obtain the p-value, remember that: this is a two tailed test, the FDIST() function calculates the area from the F-stat to the right and the degrees of freedom are n1-1 and n2-1 (19 and 14). The p-value is: =2*(FDIST(1.5082, 19, 14)) = 0.4376 (OR use the toolpak to perform the test, the 2-tailed p-value is provided) The correct answer is: 0.4376
Most basketball fans wanted the 1998 NBA lockout to end soon, and so did the players and the owners of the teams -- so why wasn't it over sooner? You were hired as a statistician by HOOPS (Helping Out Organizers and Players of Sports) Inc. to sort out if there was any reason for that strike to go on.
The coefficient of determination is 0.13022095 , the standard error of the estimate is 4.884768386 , the estimated slope coefficient is 0.061003308 , the corresponding test statistic of a two-tail test on the significance of the slope parameter is 1.547732667 , the corresponding p-value of the test statistic for a two-tail test is 0.141236936. The 80% confidence interval for the SLOPE parameter is [ 0.008315521, 0.113691096 ], and the 95% confidence interval for the INTERCEPT parameter has a POINT ESTIMATE of 7.937526731 , plus and minus (the width) 9.44092044
It is believed in certain circles in Mongolia that salaries of top executives in companies have fallen out of sync with the actual profits made by the company.You were given a data set on the total salaries paid out to executives in 30 Mongolian companies and the profits made by them in 1999 (both in millions of dollars). Run a regression of salaries (the dependent variable) on profits (the independent variable) to see if executive salaries were really justified.
The coefficient of determination is 0.291145846 , the standard error of the equation is 2.873201487 , the estimated slope coefficient is -0.303797797 , the corresponding measured test statistic of a two-tail test on the significance of the slope parameter is -3.391219263 , the p-value corresponding to the measured test statistic for the two-tail test is 0.002089293 , the 80% confidence interval for the slope parameter is [ -0.421378735 , -0.186216858 ], and the 95% confidence interval for the intercept parameter is [ 18.29880608 , 22.98061064 ]
When the standardized residual plot displays a fan shape (Model A in the next question is a good example of a fan shape), which of the following assumptions is(are) violated?
The error term has the same constant variance. The dispersion of the error term is the same whatever the value of the independent variable is.
Which of the following are the required assumptions in a simple linear regression model?
The errors are not correlated. The error term has a normal distribution. The errors are independent. The variance of the error term is the same constant. The mean of the error term is zero.
Which best describes the condition of heteroskedasticity?
The errors have changing variance depending on x
One of the most common questions of prospective house buyers pertains to the average cost of heating in dollars (Y). To provide its customers with information on that matter, a large real estate firm used the following five variables to predict heating costs: the daily minimum outside temperature in degrees Fahrenheit (X1), the amount of insulation in inches (X2), the number of windows in the house (X3), the age of the furnace in years (X4), and the number of separate rooms in the house (X5). At a 10% level of significance, each of the independent variables is insignificant individually. However, when tested as a group, they are significant at a 10% level of significance. Which of the following assumptions is (are) violated?
The independent variables are not seriously correlated with each other.
Under which of the following conditions is multicollinearity likely to be causing problems with ur results?
The independent variables have one or more correlation coefficients about .8. The overall model F-test indicates a valid model, but the individual t-tests indicate that none of the independent variables is linearly related to the dependent variable.
Multicollinearity can be best described as the condition in which:
The independent variables in a regression have a high degree of correlation with one another
According to the criterion used in the least squares method, which of the 2 lines provides a better fit to the data?
The line u plotted that has a sum of the distances equal to 0.
A real estate agent is interested in estimating the value of a piece of lake front property. He believes that price is a function of Lot Size (1000s of square feet), Number of Mature Trees on the Lot, and Distance to the Lake (in yards). He has collected data on the basis of recent sales, which is provided here.
The overall significance F-test p-value is 0.001315371 Your measured p-values for the 3 independent variables are respectively 0.215633165 , 0.004499833 , 0.057676085 (0 Lot size, 1 Trees, 1 Distance) Is there a contradiction in the results between the F-test and the 3 t-tests for slope? No The largest correlation (in absolute value) you find is 0.285681687 Therefore, your conclusion should be that there is NO reason to suspect multicollinearity in this example.
You worked as an intern at We Always Win Car Insurance Company last summer and noticed that individual car insurance premiums depended very much on the age of the individual, the number of traffic tickets received by the individual, whether the car was a convertible and the population density of the city in which the individual lived.
The p-value for the partial F-test when the "group" of variables you are testing is made up of the variable DENSITY is (density is the variable you remove for your partial F-test): 0.174472482 The p-value for the t-test of the individual slope coefficient for DENSITY is 0.174472482
The returns of two portfolios were recorded for ten years. Portfolio 1 had a variance of returns of 295, while portfolio 2 had a variance of 105. Can we conclude at a 5% level of significance that portfolio 1 is riskier (has a higher variance) than portfolio 2?
The test statistic for testing this claim will have the following distribution: F distribution with 9 &9 dof The value of the test statistic for testing the hypothesis is 2.80952381 . The p-value associated with this test statistic is 0.069912873 Do not reject the null hypothesis and conclude that there is not enough evidence to claim that the variation in portfolio 1 is greater than in portfolio 2. Construct a 95% confidence interval for the true ratio of portfolio variances. The LCL is 0.697845973 . The UCL is 11.31112644
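The test statistic and degrees of freedom on this card follow directly from the two sample variances. A short sketch of that step only; the p-value and confidence limits need an F distribution (e.g. Excel's FDIST and FINV) and are not recomputed here.

```python
var_1, var_2 = 295.0, 105.0    # sample variances of the two portfolios
n_1 = n_2 = 10                 # ten years of returns each

f_value = var_1 / var_2        # larger claimed variance in the numerator
df_num, df_den = n_1 - 1, n_2 - 1
print(f_value, df_num, df_den)
```

Putting portfolio 1 in the numerator matches the one-sided claim that its variance is the larger one.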
A professor in the school of business wants to investigate the prices of new textbooks at 4 diff bookstores: IUT, BIS, Silos & Lord, & Orinoco. He randomly chooses the required texts from 33 business school courses (same books at each store). In order to test the claim that the avg prices differ in at least 1 of the stores, suppose u have a p-value of .034. Use a sig level of .05. Which are true for this test?
The test statistic is an F-statistic w/ 3 & 128 DOF. Ho: The means are all the same; H1: At least 2 means are different. Reject Ho, implying that there is significant evidence to suggest that there is a difference in prices btwn at least 2 of the stores.
Random component of time series
The unpredictable or unsystematic component of a time series
As the size of the randomly drawn sample decreases, what happens?
The variance of the sampling distribution of the sample mean increases & the variance of the pop distribution stays the same
Exponential smoothing as a prediction tool can only be used when
There is a gradual trend There are no cyclical trends There are no seasonal effects
The 95% confidence interval for the population average final exam score is [126.4, 195.5]. To test the claim the average final exam score of the population is 180 at a 15% level of significance, what will be your decision and conclusion?
There is not enough information to make any decision or conclusion
Does alcohol affect your ability to think? A random sample of 11 people was selected to study whether drinking alcohol increases the amount of time necessary to complete a puzzle. Under one scenario (population 1), each subject would drink a beverage that contains no alcohol; under another scenario (population 2), the same person would drink a beverage with a small amount of alcohol. Each time, duration to complete a puzzle was recorded. The data is available here. [link: puzzle.xls] What are the parameter of interest and the value test statistic for this hypothesis, respectively?
This is a matched pairs situation in which the parameter of interest is μD The correct answer is: μD and -3.49
Recomposing a time series
Use seasonal indices to apply the seasonal component to the trend-based forecast
Deseasonalizing a time series
Use seasonal indices to smooth or adjust the original time series so that it is corrected for seasonal fluctuations or imbalances
Which of the following is NOT a correct procedure for model selection when forecasting?
Use the model which generates the lowest MAD value if u want to avoid errors
Often the problem of nonconstant variance can be corrected. When unequal variance of e is suspected, which of the following methods might a statistician use to remedy the situation?
Use the natural log of y as the dependent variable instead of y
A local record store is getting ready for the holiday season and needs to come up with a reliable way to forecast what sales will be like in order to have the appropriate amount of cds available for holiday shoppers. The storeowner asks for your help and provides you with data of her store's sales over the past 10 years. Drop the 4th Quarter like we did in the previous homework exercise.
Using a model that includes indicator variables for the quarters, what is the test statistic for testing whether or not the first quarter has a significant effect on cd sales? -12.91217218; What would be the predicted value for the fourth quarter of 2001 using this model? 48.04626667
You have always been told that your car insurance rate will decrease when you become older. Let us check this claim out using the data on 15 persons' ages and their quarterly insurance premiums in dollars. Download the data from here and then run the appropriate regression to test the linear dependence of insurance rate for cars on ages at the 1% level of significance.
We want to test H0: β = 0 against alternative H1: β < 0; We want to test H0: ρ = 0 against alternative H1: ρ < 0; The p-value for the test is 0.170; We will fail to reject the null hypothesis. There is insufficient evidence to conclude that insurance premium depends negatively and linearly on age.
You have been given the following information involving a simple linear regression: Unexplained Variation=5306, Total Variation in y=80552, and Total Degrees of Freedom=16.
What is the proportion of the variation in y that is explained by the variation in x? 0.934129506; What are the Residual Degrees of Freedom? 15; What is the Standard Error of the Estimate? 18.8077998
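The three answers on this card follow from the usual simple-regression identities. A minimal sketch, assuming one explanatory variable (so the regression uses 1 degree of freedom); the variable names are chosen here for illustration.

```python
import math

# Values given on the card
sse = 5306.0      # unexplained variation (SSE)
sst = 80552.0     # total variation in y (SST)
total_df = 16     # total degrees of freedom (n - 1)

k = 1             # simple linear regression: one explanatory variable

r_squared = 1 - sse / sst                  # proportion of variation explained
residual_df = total_df - k                 # n - 2
std_error = math.sqrt(sse / residual_df)   # standard error of the estimate

print(r_squared, residual_df, std_error)
```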
According to a recent study, kids aged 12 to 17 watched an average of 3 hours of television per day. Suppose that the standard deviation is 1 hour and that the distribution of the time watching television has a bell-shaped distribution. Answer the following questions relying only on the Empirical Rule.
What percentage of kids aged 12 to 17 watches television between 2 and 3 hours per day? 34%; What percentage of kids aged 12 to 17 watches television between 1 and 4 hours per day? 81.5%; What percentage of kids aged 12 to 17 watches television more than 4 hours per day? 16%
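The percentages on this card come from splitting the 68% and 95% bands of the Empirical Rule in half. A short sketch of that arithmetic, using the mean and standard deviation given in the question; the helper `z` is an illustrative name.

```python
# Empirical-rule percentages for a bell-shaped distribution
WITHIN_1_SD = 68.0   # % of values within 1 SD of the mean
WITHIN_2_SD = 95.0   # % of values within 2 SDs of the mean

mean, sd = 3.0, 1.0  # hours of television per day, from the card

def z(x):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

# 2 to 3 hours is z = -1 to z = 0: half of the 1-SD band
pct_2_to_3 = WITHIN_1_SD / 2
# 1 to 4 hours is z = -2 to z = +1: half the 2-SD band plus half the 1-SD band
pct_1_to_4 = WITHIN_2_SD / 2 + WITHIN_1_SD / 2
# More than 4 hours is z > +1: half of what lies outside the 1-SD band
pct_above_4 = (100.0 - WITHIN_1_SD) / 2

print(z(2), z(1), z(4), pct_2_to_3, pct_1_to_4, pct_above_4)
```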
Assuming that the null hypothesis is true, the p-value is (Midterm 1 Fall 2013 282)
the probability of obtaining a test statistic at least as extreme as the one obtained
You have just performed a hypothesis test with a 5% level of significance. You have not rejected the null hypothesis that the population mean = 50. If you were to create a 90% confidence interval:
You are unsure if the 90% confidence interval includes 50: the 95% confidence interval contains 50, but the 90% interval is narrower.
A friend asks you for help but only remembers that the 90% conf interval is (.75,3.75).
You cannot decide the hypothesis test with the info provided Midterm 1 Fall 2013 289
In the Rose Bowl, Illinois lost the turnover statistic and ended up losing the game. Sports commentators stress all the time that winning the turnover statistic leads to winning games and claim that this happens more than 90% of the time. You are interested in proving this claim, so you randomly sample 46 games and find that in 42 of the 46 games the team with the positive turnover statistic ended up winning.
Z-test
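A rough sketch of the one-sample z-test for a proportion that this card points to, using the counts from the question. It assumes the hypotheses are H0: p = 0.90 against H1: p > 0.90 (the card does not spell them out), and it computes the normal p-value with the standard library's error function.

```python
import math

# Figures from the card
n = 46       # games sampled
wins = 42    # games won by the team with the positive turnover statistic
p0 = 0.90    # hypothesized proportion under H0 (assumed one-sided test)

p_hat = wins / n
# The standard error uses the hypothesized proportion p0, not p_hat
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se

# Upper-tail p-value from the standard normal CDF
p_value = 0.5 * (1 - math.erf(z / math.sqrt(2)))

print(round(z, 3), round(p_value, 3))
```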
Including time squared along with time as an explanatory variable in the regression model would allow for
a curvilinear trend (not linear)
Multicollinearity can result in:
a decrease in the absolute value of the individual t-test test statistics; an increased p-value for the individual t-tests; an increased standard error of the individual slope coefficients; an inability to interpret the estimates of the slope coefficients
The most basic of smoothing techniques is
a moving average. Most commonly it is calculated using an odd number of periods (k), with the moving average value being designated for time period t.
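A minimal sketch of a centered moving average with an odd number of periods k, where each average is assigned to the middle period t of its window. The function name and the price series are hypothetical; positions without a full window are left as None.

```python
def centered_moving_average(series, k):
    """Return a list aligned with `series`; positions without a full window hold None."""
    if k < 1 or k % 2 == 0:
        raise ValueError("this sketch handles odd k only")
    half = k // 2
    smoothed = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]   # k values centered on t
        smoothed[t] = sum(window) / k
    return smoothed

# Hypothetical data, for illustration only
prices = [10, 12, 11, 13, 15, 14, 16]
ma3 = centered_moving_average(prices, 3)

print(ma3)
```

With odd k, (k - 1) / 2 periods at each end of the series get no smoothed value.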
A summary measure that is computed to describe a characteristic of an entire population is called
a parameter
The universe or "totality of items or things" under consideration is called
a population.
If two variables are unrelated, the covariance will be
a smaller positive or smaller negative number than if they were related
A summary measure that is computed to describe a numerical characteristic from only a sample of the population is called
a statistic
Positive autocorrelation
the residuals show persistently negative then positive values
To remove most of the random variation but leave the seasonal effects
average the terms S_t × R_t for each season.
It has been computed that the 95% confidence interval is [144.4, 154.2] for the average exam score when a student spent 10 hours on average per week studying for the class. The 90% prediction interval for a student who spent 10 hours on average per week studying for the class will be
cannot be determined based on the provided information
High-school juniors and seniors usually take the ACT test as one of the requirements for the university application process. Kaplan, the company that specializes in helping students get good scores on the ACT exam claims that if you take their class, your score will improve. We want to test the claim and gather ACT scores of 101 students before they took the Kaplan course and then again after they took the Kaplan course. In this case, we have
dependent samples.
You would check the normality of errors assumption by
examining the histogram of standardized residuals
The seasonal component of the time-series
exhibits a short term (less than one year) calendar repetitive behavior.
When _______ of the % of trend alternate around 100%, cyclical effects are present
groups
The exponential smoothing method
provides smoothed values for all the time periods observed. When smoothing the time series at time t, exponential smoothing considers all the data available at t (y_t, y_{t-1}, ...)
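A minimal sketch of the recursion, assuming the common textbook form S_1 = y_1 and S_t = w·y_t + (1 - w)·S_{t-1} with smoothing constant w between 0 and 1. The function name and data are hypothetical.

```python
def exponential_smoothing(series, w):
    """S_1 = y_1; S_t = w * y_t + (1 - w) * S_{t-1} for t > 1."""
    if not 0 <= w <= 1:
        raise ValueError("w must lie in [0, 1]")
    smoothed = []
    for y in series:
        if not smoothed:
            smoothed.append(float(y))   # first smoothed value is the first observation
        else:
            smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed

# Hypothetical data, for illustration only
y = [10, 20, 30, 40]
s = exponential_smoothing(y, 0.5)
forecast_next = s[-1]   # the one-period-ahead forecast is the last smoothed value

print(s, forecast_next)
```

Because each S_t carries S_{t-1} forward, every earlier observation contributes to it with geometrically shrinking weight, and every observed period gets a smoothed value.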
Assume you are given two data sets of two variables each. The data sets are unrelated to one another. You are told that the covariance of the first data set is .1. You are told that correlation coefficient of the second data set is .45. Which data set has a stronger relationship between its variables?
impossible to tell with information provided
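The reason a covariance cannot be compared with a correlation is that covariance depends on the units of the variables, while correlation does not. A small sketch with hypothetical data: rescaling x by 100 multiplies the covariance by 100 but leaves the correlation unchanged.

```python
import statistics

def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Sample correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
x_scaled = [v * 100 for v in x]   # same relationship, different units

cov_small = covariance(x, y)
cov_big = covariance(x_scaled, y)

print(cov_small, cov_big, correlation(x, y), correlation(x_scaled, y))
```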
χ² distribution
is not centered at 1; only includes positive numbers; becomes more symmetrical as sample size increases
If the independent variable appears without a higher-order term,
its relationship with the dependent variable is restricted to be linear.
Which of the following is correct about a positively skewed distribution?
mean>median>mode
A large multi-plant corporation is concerned about safety at the workplace. They installed new safety equipment and would now like to test how effective these changes have been in reducing hazards. A random sample of 25 plants is drawn. The number of man-hours lost due to injuries in the month prior to the installation of new safety equipment and the month after is recorded. In order to test if the equipment has proven effective, we need to perform:
paired sample t-test for mean difference.
Which of the following are measures of central tendency?
sample mean, sample median, sample mode
Which of the following are measures of dispersion?
sample range, sample variance, sample standard deviation
A moving average
smooths out the fluctuations in the time series. The more periods in the moving avg, the greater the extent of the smoothing.
A local newspaper called 'Daily Chambana' has decided to conduct a survey to determine whether advertising in their ad pages improved sales. They took a sample of 40 retail stores in the Urbana-Champaign area who put an ad in year 2001 (population 1) but did not in year 2000 (population 2). The annual sales data for both years are available.
t-Test: Paired Two Sample for Means
A new sports drink is claimed to improve resistance level of athletes if drunk for a week. To test this, 50 athletes from different fields are randomly selected & resistance levels are recorded. They retake the resistance test after a week of drinking.
t-test for diff in pop means w dependent samples
Suppose you are the manager of a chain of pizza restaurants. You would like to know if the restaurant in Champaign is more or less profitable than the restaurant in Urbana. You gather daily profit data (measured in dollars and cents) from both stores for a six-month period from March until August. Suppose that both stores are located on campus and are thus affected by the number of students on campus. Given this information, which of the following tests would be most appropriate to use to test whether one store is more profitable than the other?
t-test for difference in means (equal variances)
Suppose you are the manager of a chain of pizza restaurants. You would like to know if the restaurant in Champaign is more or less profitable than the restaurant in Urbana. You gather daily profit data (measured in dollars and cents) from both stores for a six-month period from March until August. Suppose that you know that the Champaign store is located on campus and therefore does much more business during times when more students are present than when students are not present. The number of students on campus does not usually affect the Urbana store. Given this information, which of the following tests would be most appropriate to use to test whether one store is more profitable than the other?
t-test for difference in means (unequal variances)
What test applies to the Block Sign
t-test for difference in means for matched pairs Midterm 1 Spring 2013 307
A random sample of 60 young adult men was taken. Each person was asked how many minutes of sports they watched daily. If you wanted to test whether on average young adult men watched more than 60 minutes a day, what type of test would you use?
t-test for mean
Multicollinearity is likely to be causing problems with your regression results if:
the Overall Model F-test indicates a valid model but the individual t-tests indicate none of the independent variables are linearly related to the dependent variable; the independent variables have one or more correlation coefficients above .80
In a simple linear regression model, the least squares estimators for the intercept and slope of the population regression line are computed by minimizing
the SSE; the sum of squared residuals; the error sum of squares; the sum of squared discrepancies between the values of the dependent variable and its estimated conditional means; the sum of squared differences between the observed values of the dependent variable and its fitted values.
Residual in the regression of Y on X equals
the difference between the actual and predicted value of Y; the difference between the actual Y and b0+b1*X; the difference between observed and predicted value of Y
Which of the following is false?
the mode is always equal to the median
The seasonal indexes tell us
what is the ratio between the time series value at a certain season, and the overall seasonal average.
The error term e is the difference btwn _________. The residual is the difference btwn ______. The _______ is an approximation of the ________, assuming that the regression model is true.
y and E(y); y and ŷ; residual; error term
Annual sales for a pharmaceutical company are believed to change linearly over time. Based on the last 10 year sales records, measure the trend component. Start by renaming your years 1, 2, 3, etc.
y = 17.28 + .6254(11) = 24.1594
Females on average spend more on clothing and apparel than males. You want to find out whether more females spend at least 1/3 of their income on clothing and apparel than males do. Which of the following tests can you use?
z test for difference in proportions
A study is undertaken to probe the varying levels of police brutality in New York, New York and Kabul, Afghanistan. A random sample of 100 residents is selected from each city and are asked the following question: Do the police treat you well? Residents can answer yes or no. With the null hypothesis being that police brutality levels are the same in both cities, which of the following tests should be used?
z test for difference in proportions.
You would like to know whether college aged men or women are more likely to own ipods. You ask a random sample of 100 students (48 men & 52 women) whether or not they own an ipod. Is either gender more likely to own an ipod than the other?
z-test for diff in proportions (sample sizes: 48 men, 52 women)