Business Analytics Final Exam
which of the following are assumptions of regression
- there is a population regression line - the response variable is normally distributed - the errors are probabilistically independent
the correlation value ranges from
-1 to +1
the percentage of variation explained (R^2) ranges from
0 to +1
as an investor in movies, you want to calculate the probability of investing in successful Hollywood movies. Given that the probability of a Hollywood movie being successful is 10% and you are investing in 10 total movies, what is the probability that AT LEAST 4 will be successful?
1-binom.dist(3,10,.1,true)
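A minimal Python sketch of the same complement-rule calculation, useful for checking the Excel result. The helper name binom_cdf is an assumption made for illustration; it is meant to mirror BINOM.DIST(k, n, p, TRUE).

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), i.e. the cumulative form of BINOM.DIST."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# "At least 4 successes" is the complement of "at most 3 successes"
p_at_least_4 = 1 - binom_cdf(3, 10, 0.1)
print(round(p_at_least_4, 4))  # about 0.0128
```

The key step is subtracting the cumulative probability at 3, not 4, because "at least 4" excludes the outcomes 0 through 3.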
Suppose it is known that the distribution of purchase amounts by customers entering a popular retail store is approximately normal with mean $75 and standard deviation $20. What is the probability that a randomly selected customer spends more than $45 at this store?
1-norm.dist(45,75,20,1)
if the average speed of cars on route 18 is 60 mph with a standard deviation of 5 mph, what is the probability that cars will be driving greater than 70 mph?
1-norm.dist(70,60,5,true)
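The two upper-tail normal cards above can be checked with a short Python sketch. It assumes the standard library's statistics.NormalDist as a stand-in for NORM.DIST with the cumulative flag set; the variable names are illustrative only.

```python
from statistics import NormalDist

# Purchase amounts: mean $75, standard deviation $20
purchases = NormalDist(mu=75, sigma=20)
p_more_than_45 = 1 - purchases.cdf(45)    # mirrors 1-norm.dist(45,75,20,1)

# Route 18 speeds: mean 60 mph, standard deviation 5 mph
speeds = NormalDist(mu=60, sigma=5)
p_faster_than_70 = 1 - speeds.cdf(70)     # mirrors 1-norm.dist(70,60,5,true)

print(round(p_more_than_45, 4), round(p_faster_than_70, 4))
```

In both cases the cumulative function gives the area to the left, so the "greater than" probability is one minus that area.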
Chanel has hired you to control the inventory of their signature bags. Graphing the monthly sales of hand bags in the past, you feel that the sales tend to follow a poisson distribution, with the average being 100 bags a month. What is the probability that at least 105 bags will be sold in a given month?
1-poisson.dist(104,100,true)
in regression analysis, the variables used to help explain or predict the response variable are called the
independent (explanatory) variables
the weakness of scatterplots is that they
do not actually quantify the relationships between variables
the ANOVA table splits the total variation into two parts. They are the
explained and unexplained variation
which of the following definitions best describes parsimony?
explaining the most with the least
a single variable X can explain a large percentage of the variation in some other variable Y when the two variables are
highly correlated
a scatterplot that exhibits a "fan" shape (the variability of Y increases as X increases) is an example of:
heteroscedasticity
another term for constant error variance is:
homoscedasticity
regression analysis asks
how a single variable depends on other relevant variables
when determining whether to include or exclude a variable in regression analysis, if the p-value associated with the variable's t-value is above some accepted significance value, such as 0.05, then the variable:
is a candidate for exclusion
in multiple regression, the constant
is the expected value of the dependent variable Y when all of the independent variables have the value zero
which of the following is true regarding regression error
it cannot be calculated from the observed data
the covariance is not used as much as the correlation because
it is difficult to interpret
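A small Python sketch of why covariance is harder to interpret than correlation: covariance changes when the units change, while correlation does not. The data and the helper names covariance and correlation are hypothetical, chosen only for illustration.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance rescaled by both standard deviations, so it lies in [-1, +1]."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data: spending in dollars and units sold
spend_dollars = [10.0, 20.0, 30.0, 40.0, 50.0]
units = [2.0, 4.0, 5.0, 4.0, 5.0]
spend_cents = [100 * s for s in spend_dollars]  # same data, different units

cov_dollars = covariance(spend_dollars, units)
cov_cents = covariance(spend_cents, units)
corr_dollars = correlation(spend_dollars, units)
corr_cents = correlation(spend_cents, units)
print(cov_dollars, cov_cents, round(corr_dollars, 4), round(corr_cents, 4))
```

Switching from dollars to cents multiplies the covariance by 100 but leaves the correlation unchanged.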
outliers are observations that
lie outside the typical pattern of points on a scatterplot
in regression analysis, if there are several explanatory variables, it is called:
multiple regression
a correlation value of zero indicates
no linear relationship
a scatterplot that appears as a shapeless mass of data points indicates
no relationship among the variables
if the average speed of cars on route 18 is 60 mph with a standard deviation of 5 mph, what is the probability that cars will be driving less than 50 mph?
norm.dist(50,60,5,true)
Suppose it is known that the distribution of purchase amounts by customers entering a popular retail store is approximately normal with mean $75 and standard deviation $20. What is the probability that a randomly selected customer spends less than $85 at this store?
norm.dist(85,75,20,1)
Suppose it is known that the distribution of purchase amounts by customers entering a popular retail store is approximately normal with mean $75 and standard deviation $20. What is the probability that a randomly selected customer spends between $65 and $85 at this store?
norm.dist(85,75,20,1)-norm.dist(65,75,20,1)
if the average speed of cars on route 18 is 60 mph with a standard deviation of 5 mph, what is the probability that cars will be driving between 80 and 90 mph?
norm.dist(90,60,5,true)-norm.dist(80,60,5,true)
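The lower-tail and "between" cards above follow one pattern: the cumulative value at the upper bound minus the cumulative value at the lower bound. A minimal Python sketch, assuming statistics.NormalDist as a stand-in for NORM.DIST with the cumulative flag set (variable names are illustrative):

```python
from statistics import NormalDist

speeds = NormalDist(mu=60, sigma=5)        # route 18 speeds in mph
purchases = NormalDist(mu=75, sigma=20)    # purchase amounts in dollars

# Lower tail: area to the left of the cutoff
p_slower_than_50 = speeds.cdf(50)

# Between two values: difference of two cumulative areas
p_between_65_and_85 = purchases.cdf(85) - purchases.cdf(65)
p_between_80_and_90 = speeds.cdf(90) - speeds.cdf(80)

print(p_slower_than_50, p_between_65_and_85, p_between_80_and_90)
```

Each call needs the value, the mean, and the standard deviation; the 80 to 90 mph range lies four to six standard deviations above the mean, so its probability is very small.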
if the average salary of a graduating senior is $70,000 with a standard deviation of $10,000, how much money must a senior be offered to be in the bottom 10% of salaries of the graduating seniors?
norm.inv(.1,70000,10000)
if the average salary of a graduating senior is $70,000 with a standard deviation of $10,000, how much money must a senior be offered to be in the top 20% of salaries of the graduating seniors?
norm.inv(.8,70000,10000)
Suppose it is known that the distribution of purchase amounts by customers entering a popular retail store is approximately normal with mean $75 and standard deviation $20. Find the dollar amount such that 80% of all customers spend AT LEAST this amount.
norm.inv(0.2,75,20) OR 2*75-norm.inv(0.8,75,20)
Suppose it is known that the distribution of purchase amounts by customers entering a popular retail store is approximately normal with mean $75 and standard deviation $20. Find the dollar amount such that 75% of all customers spend NO MORE than this amount.
norm.inv(0.75,75,20)
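The percentile cards above can be sketched in Python, assuming statistics.NormalDist.inv_cdf as a stand-in for NORM.INV. The variable names are illustrative. The argument is always the area to the LEFT of the cutoff, so "top 20%" uses 0.80 and "80% spend at least this amount" uses 0.20.

```python
from statistics import NormalDist

salaries = NormalDist(mu=70000, sigma=10000)
bottom_10_cutoff = salaries.inv_cdf(0.10)   # mirrors norm.inv(.1,70000,10000)
top_20_cutoff = salaries.inv_cdf(0.80)      # mirrors norm.inv(.8,70000,10000)

purchases = NormalDist(mu=75, sigma=20)
at_least_80pct = purchases.inv_cdf(0.20)    # 80% of customers spend at least this
# By symmetry around the mean, the 20th percentile is 2*mean minus the 80th
by_symmetry = 2 * 75 - purchases.inv_cdf(0.80)
no_more_75pct = purchases.inv_cdf(0.75)     # 75% of customers spend no more than this

print(round(bottom_10_cutoff, 2), round(top_20_cutoff, 2))
print(round(at_least_80pct, 2), round(by_symmetry, 2), round(no_more_75pct, 2))
```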
Chanel has hired you to control the inventory of their signature bags. Graphing the monthly sales of hand bags in the past, you feel that the sales tend to follow a poisson distribution, with the average being 100 bags a month. What is the probability that exactly 100 bags will be sold in a given month?
poisson.dist(100,100,false)
Chanel has hired you to control the inventory of their signature bags. Graphing the monthly sales of hand bags in the past, you feel that the sales tend to follow a poisson distribution, with the average being 100 bags a month. What is the probability that between 90 and 110 bags will be sold in a given month?
poisson.dist(110,100,true)-poisson.dist(89,100,true)
Chanel has hired you to control the inventory of their signature bags. Graphing the monthly sales of hand bags in the past, you feel that the sales tend to follow a poisson distribution, with the average being 100 bags a month. What is the probability that at most 95 bags will be sold in a given month?
poisson.dist(95,100,true)
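A minimal Python sketch of the three Poisson cards above. The helpers poisson_pmf and poisson_cdf are hypothetical names meant to mirror POISSON.DIST with the last argument FALSE and TRUE respectively; the log form is used only to avoid very large intermediate numbers.

```python
from math import exp, lgamma, log

def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean), computed on the log scale."""
    return exp(-mean + k * log(mean) - lgamma(k + 1))

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean)."""
    return sum(poisson_pmf(i, mean) for i in range(k + 1))

p_exactly_100 = poisson_pmf(100, 100)
# "Between 90 and 110" includes 90, so subtract the cumulative value at 89
p_between_90_and_110 = poisson_cdf(110, 100) - poisson_cdf(89, 100)
p_at_most_95 = poisson_cdf(95, 100)

print(p_exactly_100, p_between_90_and_110, p_at_most_95)
```

Because the Poisson distribution is discrete, the lower bound of an inclusive range is handled by subtracting the cumulative probability one step below it.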
the error term represents the vertical distance from any point to the
population regression line
in linear regression, the fitted value is the:
predicted value of the dependent variable
In linear regression, we can have an interaction variable. Algebraically, the interaction variable is the ________ of other variables in the regression equation.
product
in linear regression, we fit the least squares line to a set of values (or points on a scatterplot). the distance from the line to a point is called the:
residual
the percentage of variation (R^2) can be interpreted as the fraction (or percent) of variation of the
response variable explained by the regression line
________________ is/are especially helpful in identifying outliers
scatterplots
in choosing the "best-fitting" line through a set of points in linear regression, we choose the one with the:
smallest sum of squared residuals
the standard error of the estimate is essentially the
standard deviation of the residuals
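The cards above on fitted values, residuals, R^2, the least squares criterion, and the standard error of the estimate can be tied together in one small Python sketch. The data are hypothetical and the variable names are illustrative; the formulas are the usual ones for simple (one-variable) regression.

```python
from math import sqrt

# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least squares slope and intercept (minimize the sum of squared residuals)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
        sum((xi - mean_x) ** 2 for xi in x)
intercept = mean_y - slope * mean_x

fitted = [intercept + slope * xi for xi in x]         # predicted Y values
residuals = [yi - fi for yi, fi in zip(y, fitted)]    # observed minus fitted

sse = sum(e ** 2 for e in residuals)                  # unexplained variation
sst = sum((yi - mean_y) ** 2 for yi in y)             # total variation
r_squared = 1 - sse / sst                             # fraction of variation explained
std_error = sqrt(sse / (n - 2))                       # standard error of the estimate

print(slope, intercept, r_squared, std_error)
```

The n - 2 in the standard error reflects the two estimated coefficients (intercept and slope) in a simple regression.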
the test statistic in an ANOVA analysis is:
the F-statistic
in regression analysis, multicollinearity refers to
the explanatory variables being highly correlated
the adjusted R^2 adjusts R^2 for:
the number of explanatory variables in a multiple regression model
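A short Python sketch of the usual adjusted R^2 formula, 1 - (1 - R^2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of explanatory variables. The function name and the example numbers are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same R^2, same sample size, but one extra explanatory variable
one_variable = adjusted_r2(0.6, n=5, k=1)
two_variables = adjusted_r2(0.6, n=5, k=2)
print(round(one_variable, 4), round(two_variables, 4))
```

Holding R^2 fixed, adding an explanatory variable lowers the adjusted value, which is how the measure penalizes variables that do not improve the fit.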
correlation is a summary measure that indicates
the strength of the linear relationship between pairs of variables
in the regression analysis, the ANOVA table analyzes
the variation of the response variable Y
When the error variance is nonconstant, it is common to see the variation increase as the explanatory variable increases (you will see a "fan shape" in the scatterplot). There are two ways you can deal with this phenomenon. These are:
weighted least squares and a logarithmic transformation
the term autocorrelation refers to
time series variables usually being related to their own past values
in linear regression, a dummy variable is used:
to include categorical variables in the regression equation
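A minimal Python sketch of how a dummy variable encodes a category as 0/1, and how an interaction variable is formed as a product of two other variables. The data and variable names are hypothetical.

```python
# Hypothetical data: region is categorical, price is numeric
regions = ["east", "west", "west", "east"]
prices = [10.0, 12.0, 9.0, 11.0]

# Dummy variable: 1 if the observation is in the west region, else 0
west = [1 if r == "west" else 0 for r in regions]

# Interaction variable: the product of the dummy and the numeric variable
west_x_price = [d * p for d, p in zip(west, prices)]

print(west)          # [0, 1, 1, 0]
print(west_x_price)  # [0.0, 12.0, 9.0, 0.0]
```

A category with two levels needs only one dummy column; the level coded 0 serves as the reference category.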
a "fan" shape in a scatterplot indicates
unequal variance
Which of the following definitions best describes parsimony? A. Explaining the least with the most B. Explaining the most with the least C. Being able to predict the value of the response variable far into the future D. Being able to explain all of the change in the response variable
B. Explaining the most with the least
Which of these is especially helpful in identifying outliers. A. Scatterplots B. Linear regression C. Normal curves D. Regression analysis
A. Scatterplots
The weakness of scatterplots is that they: A. do not actually quantify the relationships between variables B. do not help identify linear relationships C. only help identify outliers D. can be misleading about the types of relationships they indicate
A. do not actually quantify the relationships between variables
In regression analysis, if there are several explanatory variables, it is called: A. multiple regression B. composite regression C. compound regression D. simple regression
A. multiple regression
The value set for alpha is known as: A. the significance level B. the error in the hypothesis test C. the rejection level D. the acceptance level
A. the significance level
Another term for constant error variance is: A. autocorrelation B. homoscedasticity C. multicollinearity D. heteroscedasticity
B. homoscedasticity
If a teacher is trying to prove that a new method of teaching economics is more effective than the traditional one, he/she will conduct a: A. confidence interval B. one-tailed test C. two-tailed test D. point estimate of the population parameter
B. one-tailed test
In linear regression, we can have an interaction variable. Algebraically, the interaction variable is the -------------- of other variables in the regression equation A. sum B. product C. Ratio D. Mean
B. product
In linear regression, we fit the least squares line to a set of values. The distance from the line to a point is called the A. fitted value B. residual C. correlation D. covariance
B. residual
The correlation value ranges from A. -2 to +2 B. 0 to +1 C. -1 to +1 D. None of the above
C. -1 to +1
Time series data often exhibits which of the following characteristics? A. homoscedasticity B. multicollinearity C. autocorrelation D. heteroscedasticity
C. autocorrelation
The test statistic in an ANOVA analysis is: A. the t-statistic B. the z-statistic C. the F-statistic D. the Chi-square statistic
C. the F-statistic
In linear regression, a dummy variable is used: A. to include hypothetical data in the regression equation B. to represent missing data in each sample C. to include categorical variables in the regression equation D. to represent residual variables
C. to include categorical variables in the regression equation
The hypothesis that an analyst is trying to prove is called the: A. elective hypothesis B. optional hypothesis C. null hypothesis D. alternative hypothesis
D. alternative hypothesis
Data collected from approximately the same period of time from a cross-section of a population are called: A. historical data B. time series data C. linear data D. cross-sectional data
D. cross-sectional data
The ANOVA table splits the total variation into two parts. They are the A. acceptable and unacceptable variation B. resolved and unresolved variation C. adequate and inadequate variation D. explained and unexplained variation
D. explained and unexplained variation
A scatterplot that appears as a shapeless mass of data points indicates: A. a linear relationship among the variables B. a curved relationship among the variables C. a nonlinear relationship among the variables D. no relationship among the variables
D. no relationship among the variables
In regression analysis, multicollinearity refers to: A. the response variables being highly correlated B. the response variables are highly correlated over time C. the response variable(s) and the explanatory variable(s) are highly correlated with one another D. the explanatory variables being highly correlated
D. the explanatory variables being highly correlated
The adjusted R square adjusts R square for: A. non-linearity B. outliers C. low correlation D. the number of explanatory variables in a multiple regression model
D. the number of explanatory variables in a multiple regression model
which of the following is an example of a nonlinear regression model?
- a quadratic regression equation - a logarithmic regression equation - a constant elasticity equation - the learning curve model
time series data often exhibits which of the following characteristics?
autocorrelation
Which of the following is not one of the assumptions of regression? a. There is a population regression line b. The standard deviation of the response variable increases as the explanatory variables increase c. The errors are probabilistically independent d. The response variable is normally distributed
b. The standard deviation of the response variable increases as the explanatory variables increase
forward regression
begins with no explanatory variables in the equation and successively adds one at a time until no remaining variables make a significant contribution.
as an investor in movies, you want to calculate the probability of investing in 3 successful Hollywood movies, given that the probability of a Hollywood movie being successful is 10% and you are investing in 10 total movies.
binom.dist(3,10,.1,false)
as an investor in movies, you want to calculate the probability of investing in successful Hollywood movies. Given that the probability of a Hollywood movie being successful is 10% and you are investing in 10 total movies, what is the probability that AT MOST 3 will be successful?
binom.dist(3,10,.1,true)
as an investor in movies, you want to calculate the probability of investing in successful Hollywood movies. Given that the probability of a Hollywood movie being successful is 10% and you are investing in 10 total movies, what is the probability that between 3 and 6 will be successful?
binom.dist(6,10,.1,true)-binom.dist(2,10,.1,true)
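The "exactly" and "between" binomial cards above can be checked with a short Python sketch. The helper names binom_pmf and binom_cdf are assumptions for illustration, mirroring BINOM.DIST with the last argument FALSE and TRUE respectively.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

p_exactly_3 = binom_pmf(3, 10, 0.1)
p_at_most_3 = binom_cdf(3, 10, 0.1)
# "Between 3 and 6" includes 3, so subtract the cumulative value at 2
p_between_3_and_6 = binom_cdf(6, 10, 0.1) - binom_cdf(2, 10, 0.1)

print(round(p_exactly_3, 4), round(p_at_most_3, 4), round(p_between_3_and_6, 4))
```

As with the Poisson cards, an inclusive lower bound of 3 means subtracting the cumulative probability at 2, not at 3.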
data collected from approximately the same period of time from a cross-section of a population are called:
cross-sectional data
