BA
IQ scores are known to be normally distributed. The mean IQ score is 100 and the standard deviation is 15. What percent of the population has an IQ over 115?
100% - NORM.DIST(115, mean, sd, TRUE) = 100% - NORM.DIST(115,100,15,TRUE), or approximately 16%
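As a rough cross-check outside Excel, the same upper-tail calculation can be sketched in Python. This assumes the standard-library `statistics.NormalDist` class as a stand-in for NORM.DIST; the variable names are illustrative.

```python
from statistics import NormalDist

# IQ scores: mean 100, standard deviation 15 (values from the card)
iq = NormalDist(mu=100, sigma=15)

# Share above 115 = 1 minus the cumulative probability up to 115
share_over_115 = 1 - iq.cdf(115)
print(f"{share_over_115:.1%}")  # roughly 16%
```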
P(μ+2σ≤x)
2.5%
P(μ-σ≤x≤μ+σ)
68%
margin of error
CONFIDENCE.T(alpha, standard_dev, size)
IQ scores are known to be normally distributed. The mean IQ score is 100 and the standard deviation is 15. What percent of the population has an IQ between 85 and 105?
NORM.DIST(105,mean,sd,TRUE)-NORM.DIST(85,mean,sd,TRUE)
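The difference of two cumulative probabilities can be sketched the same way in Python, again assuming `statistics.NormalDist` in place of NORM.DIST; names are illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# P(85 <= x <= 105) = P(x <= 105) - P(x <= 85)
share_85_to_105 = iq.cdf(105) - iq.cdf(85)
print(f"{share_85_to_105:.1%}")  # roughly 47%
```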
p-value
t.test
standard_dev
SQRT(Variance)
P(μ-2σ≤x≤μ)
47.5%
If the expected production volume when there are 120 workers is approximately 131,958 units, which of the following equations would provide a reasonable estimate of the 68% prediction interval for the output of those 120 workers?
A reasonable estimate of the prediction interval is the point forecast (131,958) plus or minus the z-value times the standard error of the regression (14,994.93). As usual, the z-value is based on the desired level of confidence. Since we want a 68% prediction interval, the z-value is equal to one. Therefore 131,958±14,994.93 is the best option.
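A minimal sketch of that arithmetic in Python, using only the two figures quoted in the card; the variable names are illustrative.

```python
point_forecast = 131_958      # expected output for 120 workers
std_error = 14_994.93         # standard error of the regression
z = 1                         # about 68% of a normal distribution lies within 1 sd

# Approximate prediction interval: forecast plus or minus z * standard error
lower = point_forecast - z * std_error
upper = point_forecast + z * std_error
print(round(lower, 2), round(upper, 2))
```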
Coefficient of Variation
Coefficient of Variation=Standard Deviation/Mean
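A small sketch of the ratio in Python. The two data sets are hypothetical, chosen so that one is the other scaled by ten; the helper name `coefficient_of_variation` is also illustrative.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean."""
    return stdev(data) / mean(data)

# Hypothetical data: same relative spread, very different scales
small_scale = [10, 12, 14]
large_scale = [100, 120, 140]

cv_small = coefficient_of_variation(small_scale)
cv_large = coefficient_of_variation(large_scale)
print(round(cv_small, 4), round(cv_large, 4))
```

The standard deviations differ by a factor of ten, yet the two coefficients of variation match, which is why the ratio is useful for comparing data sets on different scales.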
y
Dependent variable. Expected value of y, the value we try to predict
Adjusted R²
It is important to use the Adjusted R² to compare two regression models that have a different number of independent variables.
null hypothesis
The null hypothesis is the opposite of the hypothesis you are trying to substantiate.
R-square value
R-square indicates what percentage of the variability in the dependent variable is explained by the regression line
When analyzing a residual plot, which of the following indicates that a linear model is a good fit?
Random spread of residuals around the x-axis
one-sided hypothesis tests
Test whether children who take a vitamin C supplement are less likely to become ill during flu season than children who do not. Test whether there has been an increase in the average amount spent per table since the restaurant changed its head chef.
two-sided hypothesis tests
Test whether there has been a change in average salary at a company since its merger with another company. Test whether the average number of clicks on a website has changed since the implementation of a new home page.
type 3
type 3 test (an unpaired test with unequal variances)
What can be concluded from the fact that the correlation coefficient between the acceptance rate at the top 100 U.S. MBA programs and the percent of students in those programs who are employed upon graduation is -0.32?
The correlation coefficient of -0.32 is negative, which indicates that, on average, as acceptance rate decreases, the percent of students employed upon graduation increases.
Which of the following would increase the width of the confidence interval?
Increasing the confidence level: Increasing the confidence level means that we must be more confident that the actual population mean lies within our range. The confidence interval must be wider to increase the likelihood that it captures the true population mean. Note that the confidence level determines the z-value, which in turn drives the width of the interval. Decreasing the sample size: Decreasing the sample size will result in a less accurate estimate and, therefore, a wider confidence interval. Note that n is in the denominator, so as n decreases, s/√n increases; that is, the width of the confidence interval increases.
A curious student in a large economics course is interested in calculating the percentage of his classmates who scored lower than he did on the GMAT; he scored 490. He knows that GMAT scores are normally distributed and that the average score is approximately 540. He also knows that 95% of his classmates scored between 400 and 680. Based on this information, calculate the percentage of his classmates who scored lower than he did.
Since GMAT scores are normally distributed, we know that P(μ-1.96σ ≤ x ≤ μ+1.96σ) = 95%. Thus, to find the standard deviation, subtract the lower bound from the mean and divide by 1.96. The standard deviation of the distribution is (mean-lower bound)/1.96 = (540-400)/1.96 = 71.4. (Note that because the normal curve is symmetrical, we could calculate the same value using (B3-B1)/1.96 = (680-540)/1.96 = 71.4). To find the cumulative probability, P(x ≤ 490), use the Excel function NORM.DIST(x, mean, standard_dev, TRUE). Here, NORM.DIST(B4,B1,71.4,TRUE) = NORM.DIST(490,540,71.4,TRUE) = 0.24, or 24%. Approximately 24% of his classmates scored lower than he did. You must link directly to the values in order to obtain the correct answer
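The two steps (back out the standard deviation, then take the cumulative probability) can be sketched in Python, assuming `statistics.NormalDist` as a stand-in for NORM.DIST. The unrounded standard deviation is used here, so the result may differ slightly in later decimals from the card's 71.4.

```python
from statistics import NormalDist

mean_score = 540
lower_bound = 400   # lower end of the middle 95%
his_score = 490

# 95% of a normal distribution lies within 1.96 sd of the mean
sd = (mean_score - lower_bound) / 1.96

# Cumulative probability P(x <= 490)
share_below = NormalDist(mu=mean_score, sigma=sd).cdf(his_score)
print(round(sd, 1), f"{share_below:.0%}")
```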
Which of the independent variables are significant at the 95% confidence level?
The 95% confidence interval for the variable's coefficient does not contain 0
An airport shuttle company forecasts the number of hours its drivers will work based on the distance to be driven (in miles) and the number of jobs (each job requires the pickup and drop-off of one set of passengers) using the following regression equation: Travel time=-0.60+0.05(distance)+0.75(number of jobs) On a given day, Victor and Sofia drive approximately the same distance but Sofia has two more jobs than Victor. If Victor worked for 4 hours, for how long can the company expect Sofia to work? Please enter your answer rounded to one digit to the right of the decimal point. For example, if you think Sofia would work 236.7134 hours, enter 236.7.
The only difference between the workloads of the two drivers is the number of jobs each has; Sofia has two additional jobs. Therefore the company can expect Sofia to work the four hours Victor worked, plus an additional 0.75 hours for each of the two additional jobs, that is, 4+0.75(2)=5.5 hours.
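A short Python sketch of the same reasoning. The regression equation is the one given in the card; the distance (62 miles) and Victor's job count (2) are hypothetical values picked only so that Victor's predicted time comes out to 4 hours.

```python
def travel_time(distance, jobs):
    """Regression equation from the card: hours as a function of miles and jobs."""
    return -0.60 + 0.05 * distance + 0.75 * jobs

# HYPOTHETICAL inputs chosen so that Victor's forecast is 4 hours
distance = 62
victor_jobs = 2

victor_hours = travel_time(distance, victor_jobs)
sofia_hours = travel_time(distance, victor_jobs + 2)  # same distance, two more jobs
print(round(victor_hours, 1), round(sofia_hours, 1))
```

Whatever distance and job count are assumed, the gap between the two forecasts is 2 × 0.75 = 1.5 hours, because the distance term cancels.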
A business school professor is interested to know if watching a video about the Central Limit Theorem helps students understand it. To assess this, the professor tests students' knowledge both immediately before they watch the video and immediately after. The professor takes a sample of students, and for each one compares their test score after the video to their score before the video. Using the data below, calculate the p-value for the following hypothesis test: H0: μ_after ≤ μ_before, Ha: μ_after > μ_before
The p-value of the one-sided hypothesis test is T.TEST(array1, array2, tails, type)=T.TEST(B2:B31,C2:C31,1,1), which is approximately 0.0128. You must designate this test as a one-sided test (that is, assign the value 1 to the tails argument) and as a type 1 (a paired test) because you are testing the same students on the same knowledge at two points in time. You must link directly to values in order to obtain the correct answer.
Quantitative Variables
Time to run a marathon, height, size of flat-screen television, hours spent studying CORe, and calories in desserts
IQ scores are known to be normally distributed. The mean IQ score is 100 and the standard deviation is 15. The top 25% of the population (ranked by IQ score) have IQ's above what value?
Use the properties of the normal distribution to solve this problem. Since you are only interested in the top 25%, calculate the IQ at which 75% of people are below. The Excel function NORM.INV(probability, mean, standard_dev) returns the inverse of a normal cumulative distribution function. Here, NORM.INV(0.75,mean,sd)=NORM.INV(0.75,100,15)=110 indicates that 75% of people have IQ's lower than 110. Hence, 25% of people have IQ's greater than 110.
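The inverse lookup can be sketched in Python, assuming `statistics.NormalDist.inv_cdf` as the counterpart of NORM.INV; names are illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# The top 25% starts where 75% of the population lies below
cutoff = iq.inv_cdf(0.75)
print(round(cutoff))  # about 110
```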
Suppose that you have a sample with a mean of 50. You construct a 95% confidence interval and find that the lower and upper bounds are 42 and 58. What does this 95% confidence interval around the sample mean indicate? Select all that apply.
We are 95% confident that the population mean lies between 42 and 58. The 95% confidence interval is a range around the sample mean. We can say that we are 95% confident that the true population mean is within this range, based on the methods we used to calculate the range. If we were to construct similar intervals for 100 samples drawn from this population, on average 95 of the intervals will contain the true population mean.
P(μ-2σ≤x≤μ-σ)
13.5%
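The rule-of-thumb percentages on the μ and σ cards in this set (2.5%, 68%, 47.5%, 13.5%) are roundings. A quick Python sketch, assuming `statistics.NormalDist` with a standard normal (μ = 0, σ = 1), shows the unrounded values.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, sd 1

upper_tail_2sd = 1 - z.cdf(2)           # P(mu + 2 sigma <= x)
within_1sd = z.cdf(1) - z.cdf(-1)       # P(mu - sigma <= x <= mu + sigma)
minus_2sd_to_mean = z.cdf(0) - z.cdf(-2)    # P(mu - 2 sigma <= x <= mu)
minus_2sd_to_minus_1sd = z.cdf(-1) - z.cdf(-2)  # P(mu - 2 sigma <= x <= mu - sigma)

for p in (upper_tail_2sd, within_1sd, minus_2sd_to_mean, minus_2sd_to_minus_1sd):
    print(f"{p:.2%}")
```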
If a standardized test has a mean score of 500 and standard deviation of 100, what percentage of test-takers score between 500 and 600?
34%. 600 is one standard deviation above the mean (600-500 = 100 = 1*100 = 1*stdev). We know that approximately 68% of the distribution is within 1 standard deviation of the mean, and the normal curve is symmetrical. Therefore 34% must fall between the mean and 1 standard deviation above the mean.
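A quick Python check of this figure, assuming `statistics.NormalDist` and a normally distributed test score; names are illustrative.

```python
from statistics import NormalDist

scores = NormalDist(mu=500, sigma=100)

# Share between the mean and one standard deviation above it
share_500_to_600 = scores.cdf(600) - scores.cdf(500)
print(f"{share_500_to_600:.0%}")  # about 34%
```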
A grocery store owner wants to analyze how weather, day of the week, and time of day are related to the number of transactions completed per hour. Which of the following hypothesis tests is NOT conducted in the multiple regression model that contains these variables?
A hypothesis test for the significance of day of the week on time of day, provided number of transactions remain constant
margin of error
CONFIDENCE.NORM(alpha, standard_dev, size)
A company randomly surveys 15 VIP customers and records their customer satisfaction scores out of a possible 100 points. Based on the data provided, calculate a 90% confidence interval to estimate the true satisfaction score of all VIP customers.
First calculate the mean and standard deviation of the sample, which you can do in Excel using either the descriptive statistics tool or the AVERAGE and STDEV.S functions. The mean and standard deviation are approximately 76.60 and 11.28 respectively. Since the sample size is only 15, use the function CONFIDENCE.T(alpha, standard_dev, size) to find the margin of error using the t-distribution. Here, this is approximately CONFIDENCE.T(0.1,11.28,15)=5.13. The lower bound of the 90% confidence interval is the mean minus the margin of error, 76.60-5.13=71.47. The upper bound of the 90% confidence interval is the mean plus the margin of error, 76.60+5.13=81.73. You must link directly to values in order to obtain the correct answer.
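The margin-of-error step can be sketched in Python as t × s/√n. The Python standard library has no inverse t-distribution, so the critical value below is hard-coded from a t-table (two-sided 90%, 14 degrees of freedom); that lookup is an assumption of this sketch, not part of the card.

```python
from math import sqrt

sample_mean = 76.60
sample_sd = 11.28
n = 15

# ASSUMPTION: critical t for a two-sided 90% interval with n - 1 = 14 df, from a t-table
t_crit = 1.761

margin = t_crit * sample_sd / sqrt(n)
lower = sample_mean - margin
upper = sample_mean + margin
print(round(margin, 2), round(lower, 2), round(upper, 2))
```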
multicollinearity
Multicollinearity means that two or more of the independent variables are collinear, meaning they are highly correlated. One or more of the independent variables may not be significant because the variable with which it is correlated serves as a proxy variable. Multicollinearity is typically not a problem when the model is being used for forecasting, especially if the predictive power of the model is increased by the additional variable(s). Multicollinearity affects the estimates of the coefficients, thereby distorting the net relationships. Multicollinearity can be reduced by increasing the sample size. Multicollinearity can be reduced by removing one or more of the collinear variables.
x
Independent variable. Used to help us predict the dependent variable
The manager of a furniture factory that operates a morning and evening shift seven days a week wants to forecast the number of chairs its factory workers will produce on a given day and shift. The production manager gathers chair production data from the factory and lists whether the production day was a weekday or a weekend (i.e., Saturday or Sunday), and whether the shift was in the morning or evening. Using the regression model, forecast the number of chairs that will be produced on a Thursday during the evening shift.
Intercept+1*weekday+0*evening shift
The linear relationship between two variables can be statistically significant but not explain a large percentage of the variation between the two variables. This would correspond to which pair of R^2 and p-value?
Low R-squared, Low p-value
Dummy Variables
Shoe color, number on an athlete's jersey, gender, and ice cream flavor are categorical/qualitative variables and need to be transformed into dummy variables.
R Square
The R Square value measures how much of the total variation in the dependent variable (in this case, revenue) is explained by the independent variable (in this case, away game). As shown in the regression output, the R-square value is 0.2252, or approximately 22.5%.
Which of the following formulas would calculate the statistic that is MOST APPROPRIATE for comparing the variability of two data sets with different distributions?
This is the formula for the coefficient of variation, the best statistic to compute to compare the variability of two data sets with different distributions. Dividing by the mean provides a measure of the distribution's variation relative to the mean.
Type I error
We incorrectly reject the null hypothesis when it is actually true (False positive)
Type II error
We incorrectly fail to reject the null hypothesis when it is actually not true (False negative)
z score
z = (x-μ)/σ
Z value
The z-score is z = (x-μ)/σ, where x is the raw score, μ is the population mean, and σ is the population standard deviation. As the formula shows, the z-score is simply the raw score minus the population mean, divided by the population standard deviation.
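The formula translates directly into a one-line Python function. The helper name `z_score` is illustrative, and the example reuses the IQ figures from earlier cards (mean 100, standard deviation 15).

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mu) / sigma

# An IQ of 115 is one standard deviation above the mean; 85 is one below
z_115 = z_score(115, mu=100, sigma=15)
z_85 = z_score(85, mu=100, sigma=15)
print(z_115, z_85)
```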