MKT 317 Quizzes 1-4
Sometimes when summary statistics are shown for a variable, we see an abbreviation "1st Qu." What does "1st Qu." represent? Average Smallest value after we have removed the outliers 25th percentile Smallest value
25th Percentile
For the model Y = 2 * 1.4^X Whenever X increases 3 units, the predicted value of Y increases: 5.488% 5.488 units 174% 174 units
Answer: 174% Explanation: The equation is an exponential model, so an absolute change in X produces a percentage change in Y. Each 3-unit increase in X multiplies Y by 1.4^3 = 2.744, which is an increase of about 174%. To check, plug in 3 for X, then 6 for X, and compare the two predictions.
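The check described above can be sketched in R. The function name predict_y is made up for illustration; only the equation comes from the question.

```r
# Exponential model from the question: Y = 2 * 1.4^X
# (predict_y is a hypothetical helper name, not part of the course material)
predict_y <- function(x) 2 * 1.4^x

# Ratio of the predictions when X goes from 3 to 6 (a 3-unit increase)
growth_factor <- predict_y(6) / predict_y(3)  # same as 1.4^3

# Turn the multiplier into a percentage increase
percent_increase <- (growth_factor - 1) * 100
print(round(percent_increase, 1))
```

Any other pair of X values that are 3 units apart gives the same ratio, which is why the interpretation is stated as a percentage.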
Suppose we have the following multiple linear regression model. The variables Online ad budget and Print ad budget are both given in dollars. Predicted number of units sold = 2000 + 1.6*(Online ad budget) + 1.3*(Print ad budget) When the online ad budget equals $3000, and the print ad budget equals $2000, then the predicted number of units sold equals _______ . 7400 9400 16500 None of the above / not enough information.
Answer: 9400 Explanation: Plug in the numbers and solve: 2000 + 1.6*3000 + 1.3*2000 = 2000 + 4800 + 2600 = 9400.
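As a quick check, the same plug-in calculation can be written as a small R function. The names predicted_units, online, and print_ad are illustrative assumptions; the coefficients are the ones given in the question.

```r
# Multiple linear regression equation from the question
# (function and argument names are hypothetical)
predicted_units <- function(online, print_ad) {
  2000 + 1.6 * online + 1.3 * print_ad
}

# Online ad budget = $3000, print ad budget = $2000
units <- predicted_units(online = 3000, print_ad = 2000)
print(units)
```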
Suppose you create a scatter plot where the x-axis uses the log scale and the y-axis uses a linear scale. Suppose the dots on this scatter plot do not follow a straight-line trend (the trend is not even close to a straight-line trend). From this information, we can conclude that ____________. A logarithmic model will not be an accurate model for this data set. An exponential model will not be an accurate model for this data set. A logarithmic model will be an accurate model for this data set. An exponential model will be an accurate model for this data set.
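Answer: A logarithmic model will not be an accurate model for this data set. Explanation: A logarithmic model shows up as a straight line when the x-axis uses a log scale and the y-axis uses a linear scale, so a trend that is nowhere near straight rules it out. This plot does not tell us anything about an exponential model, which would be checked with a log scale on the y-axis instead.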
Suppose we wish to create a model that provides the following interpretation: Whenever X increases $50, the predicted value of Y increases ___%. What type of model should we use? Logarithmic Exponential Power Law Linear
Answer: Exponential Explanation: In an exponential model, the log is applied to the Y variable and not to the X variable, so an absolute change in X (here, $50) corresponds to a percentage change in Y.
Suppose we create a multiple linear regression model with dependent variable Y and independent variables X1, X2, X3, X4, and X5. True or False: If none of the independent variables (X1, X2, X3, X4, X5) are strongly correlated with the variable Y, then we can conclude that there is little or no multicollinearity in the model.
Answer: False / Not Enough Information Explanation: Multicollinearity is about the independent variables being correlated with each other, not with Y. Weak correlation between each X and Y tells us nothing about that, so we don't have enough information to draw a conclusion.
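A small made-up example can illustrate the point. The vectors below are invented purely for illustration (they are not course data): x1 and x2 are almost identical to each other, yet neither is strongly correlated with y.

```r
# Invented illustration data, not from the course
x1 <- 1:10
x2 <- x1 + rep(c(0.1, -0.1), 5)  # nearly a copy of x1
y  <- rep(c(1, -1), 5)           # alternates, so it barely tracks x1 or x2

cor_x1_y  <- cor(x1, y)   # weak correlation with Y
cor_x2_y  <- cor(x2, y)   # weak correlation with Y
cor_x1_x2 <- cor(x1, x2)  # very strong correlation between the two X variables

print(c(cor_x1_y, cor_x2_y, cor_x1_x2))
```

Here a model using both x1 and x2 would have high multicollinearity even though neither variable is strongly correlated with y.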
Suppose we have the following multiple linear regression model. The variables Online ad budget and Print ad budget are both given in dollars. Predicted number of units sold = 2000 + 1.6*(Online ad budget) + 1.3*(Print ad budget). After controlling for the online ad budget, whenever the print ad budget increases $100, the predicted number of units sold ____ . Increases 130 Increases 160 Increases 290 None of the above / not enough information
Answer: Increases 130 Explanation: 1.3*100 = 130.
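One way to see what "controlling for the online ad budget" means is to hold that budget fixed and change only the print ad budget. In this sketch the function name and the fixed value of 3000 are arbitrary assumptions; any fixed online budget gives the same difference.

```r
# Regression equation from the question (names are hypothetical)
predicted_units <- function(online, print_ad) {
  2000 + 1.6 * online + 1.3 * print_ad
}

online_fixed <- 3000  # arbitrary value, held constant in both predictions

# Raise the print ad budget by $100 while the online ad budget stays the same
change <- predicted_units(online_fixed, 2100) - predicted_units(online_fixed, 2000)
print(change)
```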
Suppose you are looking at a graph and the y-axis is labeled with the following numbers (all appear to be equally-spaced on the graph): 1, 10, 100, 1000, 10000, 100000 What type of scale is being used on the y-axis? Linear Log Both of the Above None of the Above
Answer: Log Explanation: It is a log scale because each equally spaced step multiplies the value by 10: from 1 to 10, from 10 to 100, and so on.
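The equal spacing can be checked numerically. In this short sketch the variable name ticks is an assumption; the values are the axis labels from the question.

```r
# Axis labels from the question
ticks <- c(1, 10, 100, 1000, 10000, 100000)

# Gaps between labels in raw units: they keep growing, so the scale is not linear
raw_gaps <- diff(ticks)

# Gaps between labels after taking log base 10: all the same size
log_gaps <- diff(log10(ticks))

print(raw_gaps)
print(log_gaps)
```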
A large data set has a variable called "Pet owner," which records if an individual owns a pet. The possible values for the Pet owner variable are yes and no. What type of variable is Pet owner? Categorical Quantitative
Categorical
Which of the following is an example of descriptive statistics? Displaying a graph that shows the total number of LEGO sets sold at the East Lansing Meijer during the previous four holiday seasons. Using data from the previous holiday seasons to predict demand. In other words, using past data to estimate how many LEGO sets the East Lansing Meijer should have in stock for the upcoming holiday season.
Displaying a graph that shows the total number of LEGO sets sold at the East Lansing Meijer during the previous four holiday seasons.
Suppose we have the following model, where the variables Sales and Budget are given in dollars. Suppose we use a large data set to compute the following model: Predicted Sales (dollars) = 60,000 + 1.8*(Budget) When the budget equals $5,000, then the predicted sales equals $ ____ .
Plug 5000 into the Budget variable: 60,000 + 1.8*5000 = 69,000.
A large data set has a variable called "Sales," which records the exact amount, in dollars, of an individual transaction. What type of variable is Sales? Categorical Quantitative
Quantitative
Suppose we are told that the correlation coefficient between X and Y is -0.97 What does this mean? There is a strong correlation between X and Y, and when X increases, the average value of Y also increases (Positive). There is a weak correlation between X and Y. We are not given enough information to determine if the correlation between X and Y is strong or weak. There is a strong correlation between X and Y, and when X increases, the average value of Y decreases (Negative).
There is a strong correlation between X and Y, and when X increases, the average value of Y decreases (Negative).
Suppose we are told that the correlation coefficient between X and Y is 0.00003. What does this mean? We are not given enough information to determine if the correlation between X and Y is strong or weak. There is a strong correlation between X and Y, and when X increases, the average value of Y also increases. There is a strong correlation between X and Y, and when X increases, the average value of Y decreases. There is a weak correlation between X and Y.
There is a weak correlation between X and Y.
You will need RStudio to complete this question. In RStudio, there is a built-in data set named cars. One of the variables in the cars data set is dist. What is the median of the variable dist?
To find the answer, use the command "summary(cars)" to view summary statistics for the cars data set, then read the values for its two variables, "speed" and "dist".
You will need RStudio to complete this question. In RStudio, there is a built-in data set named cars. One of the variables in the cars data set is speed. What is the 3rd Quartile (3rd Qu.) of the variable speed?
To find the answer, use the command "summary(cars)" to view summary statistics for the cars data set, then read the values for its two variables, "speed" and "dist".
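As an alternative to reading the full summary table, the individual statistics can be pulled out directly. This is a sketch using base R functions; the variable names dist_median and speed_q3 are made up here.

```r
# cars is built into R; its columns are named speed and dist (lowercase)
dist_median <- median(cars$dist)

# quantile() with probs = 0.75 gives the 3rd quartile,
# using the same default method that summary() uses
speed_q3 <- quantile(cars$speed, probs = 0.75)

print(dist_median)
print(speed_q3)
```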
You will need to use RStudio for this question. In RStudio, there is a built in data set named "chickwts" The data set "chickwts" contains information recorded about chickens using two variables: weight and feed. For this question, we want to consider the weights of only a specific feed type. What is the 1st Quartile (1st Qu.) weight for chickens who were fed the soybean feed type? Answer with one decimal place.
To find the answer, use the command "with(chickwts, tapply(weight, feed, summary))" and read the 1st Qu. value for the soybean feed type.
You will need to use RStudio for this question. In RStudio, there is a built-in data set named chickwts. The data set chickwts contains information recorded about chickens using two variables: weight and feed. For this question, we want to consider the weights of only a specific feed type. What is the 3rd Quartile (3rd Qu.) weight for chickens who were fed the casein feed type? Answer with one decimal place.
To find the answer, use the command "with(chickwts, tapply(weight, feed, summary))" and read the 3rd Qu. value for the casein feed type.
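Another way to get the same kind of number is to subset the weights for one feed type first and then summarize only that subset. This is a sketch for the soybean group; the variable names soy and soy_q1 are assumptions.

```r
# Keep only the weights of chickens on the soybean feed
soy <- chickwts$weight[chickwts$feed == "soybean"]

# Full summary for just this group
print(summary(soy))

# 1st quartile on its own, rounded to one decimal place as the question asks
soy_q1 <- quantile(soy, probs = 0.25)
print(round(soy_q1, 1))
```

Replacing "soybean" with "casein" and 0.25 with 0.75 gives the 3rd quartile for the casein group.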
Suppose we have the following model, where the variables Revenue is given in dollars and Temperature is given in degrees Fahrenheit. Suppose we use a large data set to compute the following model: Predicted Revenue (dollars) = 41,500 + 7,500*(Temperature) Whenever the temperature increases 10 degrees Fahrenheit, the revenue increases $________ .
To find this answer, multiply the increase by the slope. Answer: 7500*10 = 75000.
In R, there is a built in data set named cars. What R command do we use to see a preview of the data? Print(cars) View(cars) Preview(cars) Show(cars)
View(cars)
Which of the following is an example of descriptive statistics? A master's degree program using past data to estimate what the starting salary will be for future graduates. A master's degree program calculating the average starting salary of students who graduated from their program last year.
A master's degree program calculating the average starting salary of students who graduated from their program last year.
Suppose we have a very large data set with 20 quantitative variables; there is a variable Y that we wish to predict, and 19 variables that can be used as independent variables (X1, X2, X3, ..., X19) How do we determine which of X1, X2, ..., X19 is the most strongly correlated with Y? Which of the following methods can we use to answer the question above? (a) Make a correlogram and look for the biggest dot that's in the row or the column labeled Y (except the dot on the diagonal). (b) Make a correlogram and look for the biggest dot anywhere in the correlogram (except the dots on the diagonal). (c) Make a multiple linear regression model that includes all 19 independent variables and select the variable that has the lowest p-value. Method (a) only. Method (b) only. Method (c) only. Methods (a) and (b), but not (c). Methods (a) and (c), but not (b). Methods (b) and (c), but not (a). All of methods (a), (b), and (c). None of methods (a), (b), or (c).
Answer: Method (a) only. Explanation: Method (b) does not work because the biggest dot in the whole correlogram may be between two X variables, which says nothing about how strongly either one is related to Y. Method (c) does not work because p-values do not measure how strongly X and Y are correlated, and p-values should not be compared to each other.
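Method (a) has a numeric counterpart: compute the correlation matrix and look only at the row for Y. The sketch below uses R's built-in mtcars data set as a stand-in, with mpg playing the role of Y; that choice is an assumption for illustration and is not from the quiz.

```r
# Correlation matrix for all quantitative variables (mtcars is a stand-in data set)
cor_matrix <- cor(mtcars)

# Keep only the row for Y (here mpg) and drop Y's correlation with itself,
# which is the equivalent of ignoring the dot on the diagonal
cor_with_y <- cor_matrix["mpg", colnames(cor_matrix) != "mpg"]

# The strongest correlation is the one with the largest absolute value
strongest <- names(which.max(abs(cor_with_y)))
print(strongest)
```

Using the absolute value matters because a strong negative correlation is just as strong as a positive one.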
Suppose we have the following multiple linear regression model. The variables Online ad budget and Print ad budget are both given in dollars. Predicted number of units sold = 2000 + 1.6*(Online ad budget) + 1.3*(Print ad budget) Overall, whenever the print ad budget increases $200, the predicted number of units sold ____ . Increases 260 Increases 320 Increases 580 None of the above / not enough information
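Answer: None of the above / not enough information. Explanation: The question says "overall" rather than "after controlling for the online ad budget." The coefficient 1.3 describes the effect of the print ad budget only when the online ad budget is held constant, so this model cannot tell us the overall change.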
Suppose we have the following multiple linear regression model. The variables Online ad budget and Print ad budget are both given in dollars. Predicted number of units sold = 2000 + 1.6*(Online ad budget) + 1.3*(Print ad budget) When the online ad budget equals $1000, then the predicted number of units sold equals _______ . 1600 3600 4900 None of the above / not enough information.
Answer: None of the above / not enough information. Explanation: We were not given a value for the print ad budget, and we cannot compute the predicted Y without values for both X variables.
True or False: Sometimes polynomial models can be created that model data from a sample very accurately, but are so overcomplicated that the model is not an accurate representation of trends in the larger population.
Answer: True Explanation: A high-degree polynomial can bend to fit the specific points in a sample almost perfectly (overfitting), but that close fit to the sample does not mean the model will carry over well to the larger population.
Suppose we create a multiple linear regression model. Suppose this model has high multicollinearity. True or False: Because this model has high multicollinearity, we must be careful about interpreting the coefficients of a model; multicollinearity can cause some misleading interpretations.
Answer: True Explanation: The interpretation of a slope is most reliable when: 1.) the model fits the data, 2.) the p-value is small (below 0.05), and 3.) there is no multicollinearity in the model. If any of these fails, the interpretation is likely to be inaccurate or not meaningful.
Suppose we have a very large data set with variables Sales, Temperature, Windspeed, Humidity, and Budget. We wish to create a model with the following interpretation: When comparing sales on days of the same temperature, an increase in budget by $1000 will increase the predicted sales by _______ . To create a model that will answer this question, how would we proceed? Use Y = Sales. Use two or more X-variables: include Temperature and Budget, and may additionally include Windspeed and Humidity. Use Y = Temperature. Use the X-variables Sales and Budget. Use Y = Sales. Use two or more X-variables: use Temperature and Budget, and may additionally include windspeed and humidity. Use Y = Sales. Use exactly two X-variables: Temperature and Budget. Use Y = Sales. Use X = Budget.
Answer: Use Y = Sales. Use exactly two X-variables: Temperature and Budget.
Suppose we have the model: Y = 4 * X^1.2. What interpretation would we have? (Some numbers are rounded below - select the best answer.) Whenever X increases 20%, the predicted value of Y increases 24% Whenever X increases 20%, the predicted value of Y increases 0.58 units. Whenever X increases 1 unit, the predicted value of Y increases 20% Whenever X increases 1 unit, the predicted value of Y increases 4 units.
Answer: Whenever X increases 20%, the predicted value of Y increases 24%. Explanation: This is a power law model, so a percentage change in X produces a percentage change in Y. The two percentages are not equal in general: when X is multiplied by 1.2, Y is multiplied by 1.2^1.2, which is about 1.24, an increase of about 24%.
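The power law calculation can be verified with a short R sketch. The function name predict_y and the starting value x0 = 10 are arbitrary assumptions; any positive starting value gives the same ratio.

```r
# Power law model from the question: Y = 4 * X^1.2
# (predict_y and x0 are hypothetical names chosen for illustration)
predict_y <- function(x) 4 * x^1.2

x0 <- 10  # arbitrary positive starting value

# Increase X by 20% and compare the two predictions
growth_factor <- predict_y(1.2 * x0) / predict_y(x0)  # same as 1.2^1.2

percent_increase <- (growth_factor - 1) * 100
print(round(percent_increase))
```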
In the equation below, "LN" represents the natural log. Suppose we create a trend line whose equation is Predicted Revenue (million dollars) = LN(30) + 1.5*(Budget, million dollars) What is the relationship between X and the predicted value of Y? Whenever the budget increases 2 million dollars, then the predicted revenue increases 3 million dollars. Whenever the budget increases 2 million dollars, then the predicted revenue increases 150% Whenever the budget increases 20%, then the predicted revenue increases 3 million dollars. Whenever the budget increases 20%, then the predicted revenue increases 150%
Answer: Whenever the budget increases 2 million dollars, then the predicted revenue increases 3 million dollars. Explanation: LN(30) is the log of a fixed number, not of a variable, so it is just a constant, approximately 3.40. The equation becomes: Predicted Revenue (million dollars) = 3.40 + 1.5*(Budget, million dollars), which is a linear model, so an absolute change in X gives an absolute change in Y: 1.5*2 = 3 million dollars.
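A brief R sketch of the same reasoning follows. The function name predict_revenue and the budget values 5 and 7 are assumptions chosen for illustration; note that log() in R is the natural log.

```r
# Trend line from the question; log() in R is the natural log (LN)
# (predict_revenue is a hypothetical helper name)
predict_revenue <- function(budget) log(30) + 1.5 * budget

# The intercept is just a constant
intercept <- log(30)
print(round(intercept, 2))

# Raise the budget by 2 (million dollars), from an arbitrary starting point of 5
change <- predict_revenue(7) - predict_revenue(5)
print(change)
```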
Suppose you have a data set with variables Supply and Demand. You would like to create a model that gives an interpretation of the form: "Whenever supply increases 10K units, the predicted demand decreases ___ units." How would you create a model in R that can be used to answer this question using the methods we've learned in this class? lm(Demand ~ Supply, data=DataName) lm(log(Demand) ~ Supply, data=DataName) lm(Demand ~ log(Supply), data=DataName) lm(log(Demand) ~ log(Supply), data=DataName)
Answer: lm(Demand ~ Supply, data=DataName)
Suppose you have a data set with variables Supply and Demand. You would like to create a model that gives an interpretation of the form: "Whenever supply increases 10K units, the predicted demand decreases ___ %" How would you create a model in R that can be used to answer this question using the methods we've learned in this class? lm(Demand ~ Supply, data=DataName) lm(log(Demand) ~ Supply, data=DataName) lm(Demand ~ log(Supply), data=DataName) lm(log(Demand) ~ log(Supply), data=DataName)
Answer: lm(log(Demand) ~ Supply, data=DataName)
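To show how the fitted model would be turned into a percentage interpretation, here is a minimal sketch on invented data. The Supply and Demand values are made up so that the true relationship is known, and converting the slope with exp() is one common approach, not necessarily the exact steps taught in the course.

```r
# Invented data: Demand shrinks by a constant percentage as Supply grows
Supply <- seq(10000, 100000, by = 10000)
Demand <- 5000 * exp(-0.00002 * Supply)
DataName <- data.frame(Supply, Demand)

# Exponential model: log on Y only
model <- lm(log(Demand) ~ Supply, data = DataName)
slope <- unname(coef(model)["Supply"])

# Percentage change in predicted Demand when Supply increases 10K units
percent_change <- (exp(slope * 10000) - 1) * 100
print(round(percent_change, 1))
```

A negative result means the predicted demand decreases by that percentage for each 10K-unit increase in supply.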
Suppose we wish to create a model that predicts the customer satisfaction score for call center based on the amount of time that customers wait on hold. If we were to create the model described above, what would we use as the Y-variable of the model? The Slope Amount of Time Customers Wait on Hold Customer Satisfaction Score The Intercept
Customer Satisfaction Score