Business Analytics


Conditional Mean

=AVERAGEIF(range, criteria, average_range)
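As a rough analogue of AVERAGEIF outside Excel, the sketch below averages only the values whose matching entry equals a criterion. It assumes plain equality matching (Excel also accepts criteria such as ">50", which this sketch does not handle). The `average_if` name and the region/sales lists are hypothetical.

```python
# Hypothetical data: a region label and a sales figure for each row
regions = ["East", "West", "East", "North", "East"]
sales = [120.0, 95.0, 80.0, 110.0, 100.0]


def average_if(criteria_range, criterion, average_range):
    """Mean of the average_range values whose criteria_range entry equals criterion."""
    matches = [value for key, value in zip(criteria_range, average_range) if key == criterion]
    if not matches:
        # Excel's AVERAGEIF returns #DIV/0! when nothing matches
        raise ValueError("no rows match the criterion")
    return sum(matches) / len(matches)


east_mean = average_if(regions, "East", sales)
print(east_mean)  # 100.0
```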

multiple regression

Used to investigate the relationship between a dependent variable and multiple independent variables: $\hat{y} = a + b_1x_1 + b_2x_2 + \dots + b_kx_k$. Coefficients in multiple regression characterize relationships that are net with respect to the independent variables included in the model but gross with respect to all omitted independent variables.
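A minimal sketch of how the coefficients $a, b_1, \dots, b_k$ can be estimated by ordinary least squares, using only the Python standard library to solve the normal equations. The function name `fit_multiple_regression` and the toy data (generated exactly from $\hat{y} = 2 + 3x_1 - x_2$) are illustrative assumptions; in practice Excel's Regression tool or a statistics library does this for you.

```python
def fit_multiple_regression(xs, y):
    """Least squares coefficients [a, b1, ..., bk] for y-hat = a + b1*x1 + ... + bk*xk.

    xs: list of rows, each row holding the k independent-variable values.
    """
    # Design matrix with a leading column of 1s for the intercept a
    X = [[1.0] + list(row) for row in xs]
    p = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the augmented matrix
    m = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system, e.g. perfectly collinear variables")
        for r in range(col + 1, p):
            factor = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (m[i][p] - sum(m[i][j] * beta[j] for j in range(i + 1, p))) / m[i][i]
    return beta


# Hypothetical data generated from y = 2 + 3*x1 - 1*x2
xs = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [3, 7, 7, 11, 12]
coefs = fit_multiple_regression(xs, y)
print([round(c, 6) for c in coefs])
```

Each $b_i$ returned is a net coefficient: it holds the other included variables constant.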

What can be concluded from the fact that the correlation coefficient between the acceptance rate at the top 100 U.S. MBA programs and the percent of students in those programs who are employed upon graduation is -0.32?

The correlation coefficient, -0.32, is negative, which indicates that, on average, as the acceptance rate decreases, the percentage of students employed upon graduation increases.

Hidden variable

A hidden variable is one that is correlated with each of two variables that are not fundamentally related to each other. For example, the size of a fire leads to a call for more firefighters, and the size of the fire also generally leads to more damage. The number of firefighters does not lead to a greater amount of fire damage; both are driven by the hidden variable, the size of the fire.

Calculate the correlation coefficient between the acceptance rate at the top 100 U.S. MBA programs and the percent of students in those programs who are employed upon graduation.

CORREL(B2:B101,C2:C101)=-0.32. The correlation coefficient between the acceptance rate at the top 100 U.S. MBA programs and the percent of students that are employed upon graduation is approximately -0.32.
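For readers checking CORREL by hand, here is a sketch of the Pearson correlation coefficient it computes: the co-movement of the two variables divided by the product of their spreads. The `correl` name and the small data lists are hypothetical, not the MBA data set.

```python
import math


def correl(xs, ys):
    """Pearson correlation coefficient, the quantity Excel's CORREL returns."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations over the product of the deviation magnitudes
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical acceptance rates and employment rates for four programs
acceptance = [0.10, 0.20, 0.30, 0.40]
employed = [0.90, 0.85, 0.80, 0.70]
r = correl(acceptance, employed)
print(round(r, 3))  # negative: higher acceptance goes with lower employment here
```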

Before beginning a hypothesis test, an analyst specified a significance level of 0.10. Which of the following is true?

The significance level specifies how different the observed sample mean must be from the mean expected under the null hypothesis before we reject the null hypothesis. A significance level of 0.10 means that we reject the null hypothesis only if a sample mean at least as different as the one observed would occur less than 10% of the time if the null hypothesis were true.

multicollinearity

Multicollinearity occurs when there is a strong linear relationship among two or more of the independent variables. Indications of multicollinearity include seeing an independent variable's p-value increase when one or more other independent variables are added to a regression model. We may be able to reduce multicollinearity b

multicollinearity

Multicollinearity occurs when two or more independent variables are highly correlated. Multicollinearity is usually not an issue when the regression model is only being used for forecasting.

If an independent variable has a p-value of 0.07, which of the following could represent the Lower 95% and the Upper 95% for that variable?

The interval should contain 0. The p-value, 0.07, is greater than 0.05, so the independent variable is not significant at the 5% significance level. Therefore, the 95% confidence interval for the coefficient of the independent variable must include zero. The interval between -14.52 and 3.25 contains zero.
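The link between the p-value and the 95% interval can be sketched as follows. This is an approximation under a stated assumption: it uses the standard normal distribution (z about 1.96) instead of the t distribution that regression output uses, so small-sample results differ slightly. The coefficient and standard error values are hypothetical.

```python
from statistics import NormalDist


def ci_and_p_value(coef, std_error, confidence=0.95):
    """Normal-approximation confidence interval and two-sided p-value for a coefficient.

    Assumption: standard normal in place of the t distribution.
    """
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    lower = coef - z_crit * std_error
    upper = coef + z_crit * std_error
    z = coef / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return lower, upper, p_value


# Hypothetical coefficient whose p-value is about 0.07
lower, upper, p = ci_and_p_value(1.8, 1.0)
print(round(lower, 2), round(upper, 2), round(p, 3))  # interval straddles 0, p > 0.05
```

Whenever the two-sided p-value exceeds 0.05, the 95% interval contains 0, and vice versa.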

A journalist wants to determine the average annual salary of CEOs in the S&P 1,500. He does not have time to survey all 1,500 CEOs but wants to be 95% confident that his estimate is within $50,000 of the true mean. The journalist takes a preliminary sample and estimates that the standard deviation is approximately $449,300. What is the minimum number of CEOs that the journalist must survey to be within $50,000 of the true average annual salary? Remember that the z-value associated with a 95% confidence interval is 1.96.

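The formula for the minimum required sample size is $n \ge \left(\frac{zs}{M}\right)^2$, where $M$ is the desired margin of error for the confidence interval. Here, $\left(\frac{1.96 \times 449{,}300}{50{,}000}\right)^2 = 310.20$. Because the sample size must be a whole number at least this large, the journalist must survey at least 311 CEOs.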
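The same calculation as a small sketch; rounding up with `math.ceil` enforces the "at least" in $n \ge (zs/M)^2$. The `min_sample_size` name is hypothetical.

```python
import math


def min_sample_size(z, s, margin):
    """Smallest whole n satisfying n >= (z * s / margin) ** 2."""
    return math.ceil((z * s / margin) ** 2)


n = min_sample_size(1.96, 449_300, 50_000)
print(n)  # 311
```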

A researcher wants to select a random sample of consumers for a study. Generate a random ID number between 0 and 1,000 for each consumer in the spreadsheet.

Use the function =RAND()*1000 in cells A2:A30 to generate random numbers for each consumer.
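Outside Excel, the same uniform draw on [0, 1,000) can be sketched with Python's `random` module. The `random_ids` helper is hypothetical; passing a seed makes the IDs reproducible, whereas Excel's RAND() recalculates every time the sheet changes.

```python
import random


def random_ids(count, upper=1000, seed=None):
    """Uniform random IDs in [0, upper), like filling =RAND()*1000 down a column."""
    rng = random.Random(seed)
    return [rng.random() * upper for _ in range(count)]


ids = random_ids(29, seed=42)  # one ID per consumer in rows A2:A30
print(len(ids), min(ids) >= 0, max(ids) < 1000)
```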

A sporting goods store manager wants to forecast annual sneaker revenues based on the type of sport (running, tennis, or walking), color (red, blue, white, black, or violet) and its target audience (men or women). How many independent variables should the manager include in her multiple regression analysis?

Sales revenue is the dependent variable. Type of sport, color, and target audience are categorical variables, which must be represented using dummy variables. Recall that it is necessary to use one fewer dummy variable than the number of options in a category. Thus, type of sport should be represented by 3-1=2 dummy variables, color by 5-1=4 dummy variables, and target audience by 2-1=1 dummy variable, for a total of 2+4+1=7 independent variables.
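The counting rule above, sketched in Python. The omitted option in each category is the baseline, captured by the intercept.

```python
categories = {
    "sport": ["running", "tennis", "walking"],
    "color": ["red", "blue", "white", "black", "violet"],
    "audience": ["men", "women"],
}
# Each categorical variable needs (number of options - 1) dummy variables
dummies_needed = {name: len(options) - 1 for name, options in categories.items()}
total = sum(dummies_needed.values())
print(dummies_needed, total)  # {'sport': 2, 'color': 4, 'audience': 1} 7
```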

A college student is interested in testing whether business majors or liberal arts majors are better at trivia. The student gives a trivia quiz to a random sample of 30 business school majors and finds the sample's average test score is 86. He gives the same quiz to 30 randomly selected liberal arts majors and finds the sample's average quiz score is 89. The student finds that the p-value for the hypothesis test equals approximately 0.0524. What can be concluded at α = 5%?

Since the p-value, 0.0524, is greater than the significance level, 0.05, the student should fail to reject the null hypothesis and conclude that there is insufficient evidence of a difference between business and liberal arts majors' knowledge of trivia. The null hypothesis is that there is no difference between the two types of majors.

The owner of a custom shoe company is interested in determining whether employee lay-offs and the adoption of new technology affected the average number of shoes made per day. Before the lay-offs and implementation of new technology, the company produced, on average, 12,154 shoes per day. After the lay-offs, the owner took a random sample of 30 days and found that the firm was now producing, on average, 11,958 shoes per day. The owner finds that the p-value for the hypothesis test is approximately 0.5687. What can be concluded at the 80% confidence level?

Since the p-value, 0.5687, is greater than the significance level, 1 - 0.80 = 0.20, the owner should fail to reject the null hypothesis.
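Both conclusions above follow one decision rule, sketched here with a hypothetical `hypothesis_decision` helper: the significance level is 1 minus the confidence level, and we reject only when the p-value falls below it.

```python
def hypothesis_decision(p_value, alpha):
    """Compare a p-value to the significance level alpha (1 - confidence level)."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(hypothesis_decision(0.0524, 0.05))      # trivia example
print(hypothesis_decision(0.5687, 1 - 0.80))  # shoe example at the 80% confidence level
```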

A curious student in a large economics course is interested in calculating the percentage of his classmates who scored lower than he did on the GMAT; he scored 490. He knows that GMAT scores are normally distributed and that the average score is approximately 540. He also knows that 95% of his classmates scored between 400 and 680. Based on this information, calculate the percentage of his classmates who scored lower than him.

Because about 95% of a normal distribution lies within 1.96 standard deviations of the mean, the distance from the mean to either bound of the 400 to 680 range equals 1.96 standard deviations. Thus, to find the standard deviation, subtract the lower bound from the mean and divide by 1.96: (B1-B2)/1.96 = (540-400)/1.96 = 71.4. (Because the normal curve is symmetric, (B3-B1)/1.96 = (680-540)/1.96 = 71.4 gives the same value.) To find the cumulative probability, P(x ≤ 490), use the Excel function NORM.DIST(x, mean, standard_dev, TRUE). Here, NORM.DIST(B4,B1,71.4,TRUE) = NORM.DIST(490,540,71.4,TRUE) = 0.24, or 24%. Approximately 24% of his classmates scored lower than he did. Reference the cells directly rather than retyping rounded values to obtain the correct answer.
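The same steps can be sketched with Python's `statistics.NormalDist`, whose `cdf` method plays the role of NORM.DIST(..., TRUE). The z-value is included to show that the student sits 0.7 standard deviations below the mean.

```python
from statistics import NormalDist

mean = 540
sd = (mean - 400) / 1.96          # 95% of scores lie within 1.96 sd of the mean
gmat = NormalDist(mu=mean, sigma=sd)

z = (490 - mean) / sd             # z-value of the student's score
share_below = gmat.cdf(490)       # same quantity as NORM.DIST(490, 540, sd, TRUE)
print(round(sd, 1), round(z, 2), round(share_below, 2))  # 71.4 -0.7 0.24
```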

z-value

The z-value of a point $x$ is the distance $x$ lies from the mean, measured in standard deviations: $z = \frac{x - \mu}{\sigma}$.

single variable linear regression

Used to investigate the relationship between a dependent variable and one independent variable. A coefficient in a single variable linear regression characterizes the gross relationship between the independent variable and the dependent variable.

normal distribution

About 68% of the probability is contained in the range reaching one standard deviation away from the mean on either side, that is, $P(\mu - \sigma \le x \le \mu + \sigma) \approx 68\%$. About 95% of the probability is contained in the range reaching two standard deviations (1.96 to be exact) away from the mean on either side, that is, $P(\mu - 2\sigma \le x \le \mu + 2\sigma) \approx 95\%$. About 99.7% of the probability is contained in the range reaching three standard deviations away from the mean on either side, that is, $P(\mu - 3\sigma \le x \le \mu + 3\sigma) \approx 99.7\%$.
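These percentages can be checked numerically on the standard normal distribution (mean 0, standard deviation 1) with Python's `statistics.NormalDist`; the `share_within` helper is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def share_within(k):
    """Probability of landing within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 1.96, 2, 3):
    print(f"within {k} sd: {share_within(k):.4f}")
```

Note that exactly 2 standard deviations covers about 95.45%, which is why 1.96 is used for a 95% range.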

IQ scores are known to be normally distributed. The mean IQ score is 100 and the standard deviation is 15. What percent of the population has an IQ between 90 and 110?

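=NORM.DIST(110,B1,B2,TRUE)-NORM.DIST(90,B1,B2,TRUE), where B1 holds the mean (100) and B2 the standard deviation (15), returns approximately 0.495, or about 50%.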
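The same difference of cumulative probabilities, sketched with `statistics.NormalDist` in place of NORM.DIST:

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)
share_90_110 = iq.cdf(110) - iq.cdf(90)  # P(x <= 110) - P(x <= 90)
print(round(share_90_110, 3))  # 0.495
```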

Standard deviation in Excel

=STDEV.S(A2:A100)

Which statistic would be best for comparing the variability of two data sets with different distributions?

The coefficient of variation is the ratio of the standard deviation to the mean. The coefficient of variation should be used to compare data sets because it measures variability relative to the mean.
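A sketch of the coefficient of variation using `statistics.stdev`, which, like STDEV.S, computes the sample standard deviation. The two data sets are hypothetical: they have the same standard deviation (2) but very different means, so their relative variability differs.

```python
from statistics import mean, stdev


def coefficient_of_variation(data):
    """Sample standard deviation (like STDEV.S) divided by the mean."""
    return stdev(data) / mean(data)


small = [10, 12, 14]      # hypothetical data, mean 12
large = [100, 102, 104]   # hypothetical data, mean 102
cv_small = coefficient_of_variation(small)
cv_large = coefficient_of_variation(large)
print(round(cv_small, 4), round(cv_large, 4))
```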

The client asked why the mean of the data set is so much larger than the median. Which of the following is most likely true?

The distribution of the data is skewed to the right. When the distribution of data is skewed to the right, the mean is most likely greater than the median, because the extreme values in the right tail pull the mean towards them.
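A quick illustration on hypothetical right-skewed data: two large values pull the mean far above the median, while the median stays with the bulk of the observations.

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, a few are very large
revenues = [10, 12, 13, 15, 16, 18, 20, 250, 450]
print(round(mean(revenues), 2), median(revenues))
```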

Consider the four outliers in the 2012 revenue data: companies with revenue of $237 billion, $246 billion, $447 billion, and $453 billion. If we removed these companies from the data set, what would happen to the standard deviation?

The standard deviation would decrease. The standard deviation gives more weight to observations that are further from the mean. Therefore, removing the outliers would decrease the standard deviation.

An engineer designing a new type of bridge wants to test the stress and load bearing capabilities of a prototype before beginning construction. Her null hypothesis is that the bridge's stress and load capabilities are safe. Select which type of error would be worse.

A type II error (failing to reject a false null hypothesis) would be worse: the engineer deems the bridge safe and moves on to construction even though it is not actually safe. This is worse than a type I error, presuming that a safe bridge is unsafe.

An automotive manufacturer has developed a new type of tire that the research team believes to increase fuel efficiency. The manufacturer wants to test if there is an increase in the mean gas mileage of mid-sized sedans that use the new type of tire, compared to 32 miles per gallon, the historic mean gas mileage of mid-sized sedans not using the new tires. The automotive manufacturer should perform a _____________ hypothesis test to _____________.

One-sided, analyze a change in a single population. The manufacturer believes that the new tires change fuel efficiency in a single direction (i.e., that efficiency increases) and thus should use a one-sided hypothesis test. The automotive manufacturer is analyzing the change of a single population mean compared to the known historic population mean of gas mileage in mid-sized sedans.

A college student is interested in testing whether business majors or liberal arts majors are better at trivia. The student gives a trivia quiz to a random sample of 30 business majors and finds the sample's average score is 86. He gives the same quiz to 30 randomly selected liberal arts majors and finds the sample's average score is 89. What is the alternative hypothesis of this test?

$\mu_{Business} \ne \mu_{Liberal\ Arts}$. The alternative hypothesis is the claim that is being tested. Since the student wants to test whether there is a difference between business school majors' and liberal arts majors' trivia scores, the alternative hypothesis is that the mean scores are not equal.

An airport shuttle company forecasts the number of hours its drivers will work based on the distance to be driven (in miles) and the number of jobs (each job requires the pickup and drop-off of one set of passengers) using the following regression equation: Travel time = -0.60 + 0.05(distance) + 0.75(number of jobs). On a given day, Victor and Sofia drive approximately the same distance but Sofia has two more jobs than Victor. If Victor worked for 4 hours, for how long can the company expect Sofia to work?

5.5 hours. The only difference between the workloads of the two drivers is the number of jobs each has; Sofia has two additional jobs. Therefore the company can expect Sofia to work the four hours Victor worked, plus an additional 0.75 hours for each of the two additional jobs, that is, 4 + 0.75(2) = 5.5 hours.
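The regression equation as a function, to show that two extra jobs add 2 × 0.75 = 1.5 hours regardless of distance. The values for Victor (32 miles, 4 jobs) are hypothetical, chosen only because they reproduce his 4 hours.

```python
def travel_time(distance, jobs):
    """Hours predicted by the shuttle company's regression equation."""
    return -0.60 + 0.05 * distance + 0.75 * jobs


victor = travel_time(32, 4)  # hypothetical inputs giving 4 hours
sofia = travel_time(32, 6)   # same distance, two more jobs
print(round(victor, 2), round(sofia, 2))  # 4.0 5.5
```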

Suppose you actually want to calculate the mean annual compensation of the 83 banking CEOs. Which of the following Excel functions calculates the mean? SELECT ALL THAT APPLY.

=AVERAGE(B2:B84) calculates the mean of the banking CEOs' annual compensation. =SUM(B2:B84)/83 also calculates the mean: it sums the banking CEOs' annual compensation and divides that sum by 83, the number of data points. Both options are correct.

The owner of an ice cream shop wants to determine whether there is a relationship between ice cream sales and temperature. The owner collects data on temperature and sales for a random sample of 30 days and runs a regression to determine if there is a relationship between temperature (in degrees) and ice cream sales. The p-value for the two-sided hypothesis test is 0.04. How would you interpret the p-value?

If there is no relationship between temperature and sales, the chance of selecting a sample at least this extreme would be 4%.

If the street fair organizer wanted to compare the explanatory power of the original model and the following new regression model, which value should he consult for the new model?

It is important to use the Adjusted R² to compare two regression models that have a different number of independent variables.

When performing a hypothesis test based on a 95% confidence level, what are the chances of making a type II error?

It is not possible to tell without more information.

Which is the best estimate of the approximate amount of variability in Win Percentage that is explained by the regression model?

R-squared

A food truck operator has traditionally sold 75 bowls of noodle soup each day. He moves to a new location and after a week sees that he has averaged 85 bowls of noodle soup sales each day. He runs a one-sided hypothesis test to determine if his daily sales at the new location have increased. The p-value of the test is 0.031. How should he interpret the p-value?

There is a 3.1% chance of obtaining a sample with a mean of 85 or higher assuming that the true mean sales at the new location is still equal to or less than 75 bowls a day.

If the mean of a normally distributed population is -10 with a standard deviation of 2, what is the likelihood of obtaining a value less than or equal to -7?

To calculate the likelihood of obtaining a value less than or equal to -7, P(x ≤ -7), use the Excel function NORM.DIST(x, mean, standard_dev, TRUE). Here, NORM.DIST(-7,-10,2,TRUE) ≈ 0.93, so the likelihood is about 93%.
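The same cumulative probability with `statistics.NormalDist`; -7 lies 1.5 standard deviations above the mean of -10.

```python
from statistics import NormalDist

p = NormalDist(mu=-10, sigma=2).cdf(-7)  # equivalent of NORM.DIST(-7, -10, 2, TRUE)
print(round(p, 4))  # 0.9332
```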

IF statement

=IF(A2="For",1,0)

R-squared vs adjusted R-squared

Because R² never decreases when independent variables are added to a regression, it is important to adjust it when assessing and comparing the fit of a multiple regression model. The adjustment compensates for the increase in R² that results solely from increasing the number of independent variables. Adjusted R² is provided in the regression output. It is particularly important to look at Adjusted R², rather than R², when comparing regression models with different numbers of independent variables.
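A sketch of the standard Adjusted R² formula, $1 - (1 - R^2)\frac{n - 1}{n - k - 1}$, for $n$ observations and $k$ independent variables. The R² of 0.80 and $n = 30$ are hypothetical; holding R² fixed while raising $k$ shows the penalty for extra variables.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a regression with n observations and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than independent variables + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


print(round(adjusted_r2(0.80, 30, 1), 4))  # one independent variable
print(round(adjusted_r2(0.80, 30, 5), 4))  # same R^2, five independent variables
```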

Dummy variable

Multiple regression models allow us to include multiple dummy variables for categorical data, such as day of the week. A dummy variable is equal to 1 when the variable of interest fits a certain criterion and 0 otherwise. For example, a dummy variable for "Saturday" would equal 1 for observations relating to Saturdays and 0 for observations related to all other days.
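A sketch of dummy coding in Python. The first part mirrors filling an =IF(...,1,0) formula down a column; the hypothetical `make_dummies` helper then builds one dummy per category except a chosen baseline, matching the one-fewer-than-the-number-of-categories rule. The day labels are hypothetical data.

```python
days = ["Mon", "Sat", "Tue", "Sat", "Sun"]

# Like =IF(A2="Sat",1,0) filled down a column
saturday = [1 if d == "Sat" else 0 for d in days]


def make_dummies(values, baseline):
    """One 0/1 column per category except the baseline category."""
    levels = sorted(set(values) - {baseline})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


dummies = make_dummies(days, baseline="Mon")
print(saturday, sorted(dummies))
```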

Select the p-value(s) at which you would reject the null hypothesis for a two-sided test at the 90% confidence level.

To reject the null hypothesis at the 90% confidence level, the p-value must be less than 1-0.90=0.10. 0.0900 is less than 0.10 so we can reject the null hypothesis.

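Calculating the range of likely sample means using Excel

=CONFIDENCE.NORM(alpha, standard_dev, size) or =CONFIDENCE.T(alpha, standard_dev, size) returns the margin of error; the confidence interval is the sample mean plus or minus this margin. (=T.TEST(array1, array2, tails, type), by contrast, returns the p-value of a t-test.)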
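A sketch of the margin of error that CONFIDENCE.NORM returns, $z \cdot s / \sqrt{n}$, where $z$ is the standard normal value leaving $\alpha/2$ in each tail. The `confidence_norm` name and the sample mean, standard deviation, and size are hypothetical. CONFIDENCE.T uses a t value instead of $z$, which this sketch does not cover.

```python
import math
from statistics import NormalDist


def confidence_norm(alpha, standard_dev, size):
    """Margin of error z * s / sqrt(n), the quantity CONFIDENCE.NORM returns."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return z * standard_dev / math.sqrt(size)


sample_mean = 50  # hypothetical sample
margin = confidence_norm(0.05, standard_dev=10, size=100)
print(round(sample_mean - margin, 2), round(sample_mean + margin, 2))
```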

Assume we have created two single variable linear regression models and a multiple regression model to predict selling price based on HouseSize alone, DistancefromBoston alone, or both. The three models are as follows, where HouseSize is in square feet and DistancefromBoston is in miles:

SellingPrice = 13,490.45 + 255.36(HouseSize)
SellingPrice = 686,773.86 - 15,162.92(DistancefromBoston)
SellingPrice = 194,986.59 + 244.54(HouseSize) - 10,840.04(DistancefromBoston)

House A and House B are the same size, but located in different neighborhoods: House B is five miles closer to Boston than House A. If the selling price of House A is $450,000, what would we expect to be the selling price of House B?

Since the two houses are the same size, to predict the expected difference in selling prices we should use -$10,840.04/mile, the net effect of distance on selling price (that is, the effect of distance on selling price controlling for house size), which can be found in the multiple regression model. House B is five miles closer to Boston than House A so House B's expected selling price is: House A's selling price+net effect of distance on selling price ≈ $450,000+$10,840.04(5 miles) ≈ $450,000+$54,200.20 ≈ $504,200.20
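The same adjustment as a small sketch using the net coefficient from the multiple regression. The `price_after_move` name is hypothetical; moving closer means the distance changes by a negative amount, so the negative coefficient raises the price.

```python
NET_DISTANCE_EFFECT = -10_840.04  # dollars per mile, from the multiple regression model


def price_after_move(known_price, miles_closer):
    """Expected price of a same-size house located miles_closer miles nearer to Boston."""
    change_in_distance = -miles_closer
    return known_price + NET_DISTANCE_EFFECT * change_in_distance


print(round(price_after_move(450_000, 5), 2))  # 504200.2
```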

What is the difference between analyzing residual plots for single variable regression models and analyzing residual plots for multiple regression models?

Single variable regression plots give insight into the gross relationship between the independent and dependent variable, whereas multiple regression plots give insight into the net relationship, controlling for the other independent variables included in the regression model.

Calculate the coefficient of variation for the average driving distances of the PGA Tour.

Coefficient of Variation=Standard Deviation/ Mean. Entering =E6/E2 calculates the coefficient of variation, which is approximately 0.03.

The following data set provides the 2012 revenue (in billions of dollars) for the top 75 companies as declared by the Fortune 500 rankings. What revenue amount do 60% of the companies earn less than or equal to?

=PERCENTILE.INC(B2:B76,0.60) = $74.40 billion. Reference the data cells directly rather than retyping values to obtain the correct answer.
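A sketch of the linear interpolation PERCENTILE.INC performs: sort the data, locate position $k(n - 1)$ (zero-based), and interpolate between the neighboring values. The `percentile_inc` name and the test values are hypothetical; the Fortune 500 data is not reproduced here.

```python
import math


def percentile_inc(data, k):
    """Linear-interpolation percentile in the style of Excel's PERCENTILE.INC (0 <= k <= 1)."""
    if not data or not 0 <= k <= 1:
        raise ValueError("need non-empty data and 0 <= k <= 1")
    values = sorted(data)
    pos = k * (len(values) - 1)          # zero-based rank position
    lower = math.floor(pos)
    upper = min(lower + 1, len(values) - 1)
    frac = pos - lower
    return values[lower] + frac * (values[upper] - values[lower])


print(round(percentile_inc([1, 2, 3, 4, 5], 0.60), 4))  # 3.4
```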

What does R-square indicate?

R-square indicates what percentage of the variability in the dependent variable is explained by the regression line

Lagged variable

A lagged variable uses the value of a variable from an earlier period as an independent variable. The lag period is based on managerial insight and data availability. Including lagged variables has some drawbacks: each lagged variable decreases our sample size by one observation, and if the lagged variable does not increase the model's explanatory power, adding it decreases Adjusted R².
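A sketch of how a one-period lag costs an observation: each row pairs the current dependent value with the independent variable from `lag` periods earlier, so the first `lag` periods have no usable row. The `lagged_rows` name and the sales/advertising figures are hypothetical.

```python
def lagged_rows(y, x, lag=1):
    """Pair y[t] with x[t - lag]; the first `lag` periods are dropped."""
    if len(y) != len(x):
        raise ValueError("series must have equal length")
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return [(y[t], x[t - lag]) for t in range(lag, len(y))]


sales = [100, 120, 130, 125]   # hypothetical dependent variable
advertising = [10, 12, 15, 11]  # hypothetical independent variable
rows = lagged_rows(sales, advertising)
print(rows, len(rows))  # 3 usable rows from 4 periods
```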

A college student is interested in testing whether business majors or liberal arts majors are better at trivia. The student gives a trivia quiz to a random sample of 30 business school majors and finds the sample's average score is 86. He gives the same quiz to 30 randomly selected liberal arts majors and finds the sample's average score is 89. Using the data provided below, calculate the p-value for the following hypothesis test: $H_0: \mu_{Business} = \mu_{Liberal\ Arts}$, $H_a: \mu_{Business} \ne \mu_{Liberal\ Arts}$.

The p-value of the two-sided hypothesis test is T.TEST(array1, array2, tails, type)=T.TEST(A2:A31,B2:B31,2,3), which is approximately 0.0524. You must designate this test as a two-sided test (that is, assign the value 2 to the tails argument) and as a type 3 test (an unpaired test with unequal variances) because you are testing two different samples.
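A type 3 T.TEST is an unequal-variance (Welch) two-sample t-test. The sketch below computes its t statistic and Welch-Satterthwaite degrees of freedom with the standard library; turning these into a p-value requires the t distribution, which the standard library lacks (SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` is one option). The `welch_t` name and the small samples are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    va = variance(sample_a) / len(sample_a)  # squared standard error of mean A
    vb = variance(sample_b) / len(sample_b)  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1))
    return t, df


t_stat, dof = welch_t([1, 2, 3, 4], [2, 3, 4, 5])  # hypothetical scores
print(round(t_stat, 4), round(dof, 2))
```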

The owner of a custom shoe company is interested in determining whether employee lay-offs and the adoption of new technology affected the average number of shoes made per day. Before the lay-offs and implementation of new technology, the company produced, on average, 12,154 shoes per day. After the lay-offs, the owner took a random sample of 30 days and found that the firm was now producing, on average, 11,958 shoes per day. The owner finds that the p-value for the hypothesis test is approximately 0.5687. How would you interpret the p-value?

If the average number of shoes made per day is still 12,154, the likelihood of obtaining a sample with a mean at least as extreme as 11,958 is 56.87%. This follows because the null hypothesis is that the average number of shoes made per day has not changed, that is, it is still 12,154, and the p-value is computed assuming the null hypothesis is true.

IQ scores are known to be normally distributed. The mean IQ score is 100 and the standard deviation is 15. The top 25% of the population (ranked by IQ score) have IQs above what value?

The Excel function NORM.INV(probability, mean, standard_dev) returns the inverse of the normal cumulative distribution function. Here, NORM.INV(0.75,B1,B2) = NORM.INV(0.75,100,15) ≈ 110 indicates that 75% of people have IQs lower than 110. Hence, the top 25% of people have IQs above approximately 110.
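The inverse lookup with `statistics.NormalDist.inv_cdf`, which plays the role of NORM.INV:

```python
from statistics import NormalDist

cutoff = NormalDist(mu=100, sigma=15).inv_cdf(0.75)  # equivalent of NORM.INV(0.75, 100, 15)
print(round(cutoff, 2))  # 110.12
```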

Central Limit Theorem

States that if we take enough sufficiently large samples from any population, the means of those samples will be approximately normally distributed, regardless of the shape of the underlying population.
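A small simulation sketch of this idea, under assumed settings: the population is exponential with mean 10 (strongly right-skewed), and we draw 2,000 samples of size 50. The sample means center near 10, their spread is close to $10/\sqrt{50} \approx 1.41$, and about 95% of them fall within 1.96 standard deviations of their center, as a normal distribution would predict. The `sample_means` helper and all settings are hypothetical.

```python
import math
import random


def sample_means(draw, sample_size, num_samples, seed=0):
    """Means of num_samples independent samples, each of size sample_size."""
    rng = random.Random(seed)
    return [math.fsum(draw(rng) for _ in range(sample_size)) / sample_size
            for _ in range(num_samples)]


# Assumed population: exponential with mean 10 (rate 1/10), strongly right-skewed
means = sample_means(lambda rng: rng.expovariate(1 / 10), sample_size=50, num_samples=2000)

center = math.fsum(means) / len(means)
spread = math.sqrt(math.fsum((m - center) ** 2 for m in means) / (len(means) - 1))
# For a normal distribution about 95% of values fall within 1.96 sd of the mean
coverage = sum(abs(m - center) <= 1.96 * spread for m in means) / len(means)
print(round(center, 2), round(spread, 2), round(coverage, 3))
```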

