MKT 317

Dr. Bronlyn creates a linear model in Tableau for the variables X and Y. She views details of the linear model, and Tableau provides the following information: a coefficients table (not reproduced here) and the R-squared for the model, 0.3063. To determine whether there is a statistically significant correlation between X and Y, we would look at the number ________ that appears in the output above, and conclude whether X and Y are correlated based on whether that number is greater than or less than 0.05. Please do not round your answer; give the exact number.

0.0480 #25 see picture

This question requires Tableau or Excel and uses the data set Exam3DataSS22 that is available in D2L (same data set as was used in previous exam questions). Question 4: The rows of data in the data set represent monthly sales, so if we compute the average of the variable sales, we are computing the average monthly sales. What are the average monthly sales in the North? _______

1,858,162

Suppose the odds it rains tomorrow are 3 to 10. This means that the probability it rains tomorrow is ______ % You may round your answer to the nearest whole number.

23
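Odds of "a to b" mean a favorable outcomes for every b unfavorable ones, so the probability is a / (a + b). A minimal sketch of that conversion (the function name is illustrative):

```python
def odds_to_probability(favorable: float, unfavorable: float) -> float:
    """Convert odds of `favorable` to `unfavorable` into a probability."""
    return favorable / (favorable + unfavorable)


p_rain = odds_to_probability(3, 10)
print(f"{p_rain:.4f} -> {round(p_rain * 100)}%")  # 0.2308 -> 23%
```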

Sapphires are gemstones that are very similar in hardness to diamonds. Both are common gemstones in jewelry; however, sapphires are significantly cheaper than similar-sized diamonds. Let's suppose we have a data set about sapphires. The variables are price (price of the sapphire, in dollars), carat (weight of the sapphire, measured in carats), and quality (ok, good, better, best). Suppose we use the R command below to create a regression model: lm( log(price) ~ carat + quality, data=SapphireSales ) The estimated coefficients and p-values associated with the model are provided below. When comparing sapphires of the same carat, on average the best-quality sapphires are _______ % more expensive than the ok-quality sapphires.

58 #24 see picture
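Because the model predicts log(price), a dummy-variable coefficient b for a quality level corresponds to a percent difference of (e^b - 1) × 100 relative to the baseline (ok) level. A minimal sketch, assuming a hypothetical coefficient since the coefficients table from the picture is not reproduced here:

```python
import math


def dummy_percent_difference(coef: float) -> float:
    """Percent difference in predicted Y for a dummy coefficient in lm(log(Y) ~ ...)."""
    return (math.exp(coef) - 1) * 100


# Hypothetical coefficient: ln(1.5) would mean "50% more expensive".
print(round(dummy_percent_difference(math.log(1.5)), 6))  # 50.0
```

For small coefficients, b × 100 is a rough approximation, but exponentiating is the exact conversion.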

Suppose we have the following multiple linear regression model below. Predicted Demand (millions) = 640 + 2.0(X) + 70(North) - 30(South) + 70(East) When X=60, the predicted demand in the North equals _______ .

830
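Each dummy variable is 1 for its own region and 0 otherwise, so for the North only the North term is switched on: 640 + 2.0(60) + 70 = 830. A small sketch of the same arithmetic; the question does not name the baseline region, so any region other than North, South, or East is assumed here to be the baseline:

```python
def predicted_demand(x: float, region: str) -> float:
    """Predicted Demand (millions) = 640 + 2.0(X) + 70(North) - 30(South) + 70(East).

    Any region other than North, South, or East is treated as the baseline.
    """
    north = 1 if region == "North" else 0
    south = 1 if region == "South" else 0
    east = 1 if region == "East" else 0
    return 640 + 2.0 * x + 70 * north - 30 * south + 70 * east


print(predicted_demand(60, "North"))  # 830.0
```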

In Module 3, we used the term "absolute change" several times throughout the module. What is an "absolute change?"

A change by addition or subtraction.

Suppose Dr. Bronlyn has a very large data set. There is a dependent variable named Y, as well as 40 independent variables, X1, X2, X3, etc. Which of the following models can Dr. Bronlyn use to determine if there is a statistically significant correlation between Y and X1?

A simple linear regression model whose only independent variable is X1.

What is a correlogram?

A visual way to represent the strength and direction of correlations for several pairs of variables.

Suppose we have the following data: Y = Revenue (dollars) X = Segment (A, B, or C) We would like to determine if there are any differences in average revenue when comparing the three segments. True or False: we can use logistic regression in this situation.

False

Suppose Dr. Bronlyn has a large data set. She uses several variables from that data set to create a multiple linear regression model of the form Ŷ = b0 + b1(X1) + b2(X2) + b3(X3). She computes the model in R and obtains the coefficients table below. True or False: From the output above, we can conclude that of the variables X1, X2, and X3, the variable most strongly correlated with Y is X1.

False see picture

Suppose Dr. Bronlyn has a very large data set about a retail organization. She creates a multiple linear regression model where the dependent variable is Revenue (dollars) and the independent variables are the Google, Facebook, TV, and YouTube Marketing Budgets (all measured in dollars). The coefficients table is given below. True or False: From the model output above, we can conclude that whenever the TV Marketing Budget increases $20, the predicted revenue will increase $50,000.

False or not enough information #23 see picture

Suppose we would like to know if the proportion of customers who prefer different ice cream flavors is different for different age groups. Our data set has 10 ice cream flavors and 4 age groups. We compute Pearson's Chi-Squared test, and obtain a p-value of 0.0002 What do we conclude?

We are 95% confident that the proportion of customers who prefer different ice cream flavors is different for different age groups.

Suppose we have the linear regression model below. Temperature is measured in Fahrenheit, and the value for Group is either A, B, or C. Predicted Revenue (thousand dollars) = 50 + 1.4(Temperature) + 4(Group A) + 10(Group B) Compute the predicted revenue for Group A.

We are not able to answer this question with the model provided.

Suppose we have data about a very large retail chain. The data contains information from 11600 customers. Some of the variables in the model are: UnitPrice: the average unit price of items purchased per customer. Sales: the total sales per customer (in dollars) Segment: A, B, or C Location: North, South, East, or West We create a model lm( log(Sales) ~ log(UnitPrice) + Segment ) and obtain the following results Select the best phrase that completes the following sentence: When we control for unit price, the predicted sales in Segment A ________ .

are about the same as in Segment C. #14 see picture

What type of ANOVA tests for an interaction?

factorial ANOVA

Suppose we have three variables: Customer Satisfaction Score (measured in points), Wait time (measured in minutes), and Segment (A, B, or C). Suppose we wish to complete the interpretation: "After controlling for segment, whenever the wait time increases 10%, the predicted customer satisfaction score decreases ____ points." What R command would we use to create a model that can complete the interpretation above? Note that the only differences between the options below are which variables use a log and which variables remain in their original format.

lm( CustomerSatisfaction ~ log(WaitTime) + Segment )
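With only the x-variable logged, a 10% increase in wait time changes the predicted score by b × ln(1.10) ≈ 0.0953 × b points, where b is the coefficient on log(WaitTime). A minimal sketch, assuming a hypothetical slope of -20 (no coefficients table is given for this question):

```python
import math


def level_log_change(slope: float, pct_increase_x: float) -> float:
    """Change in predicted Y (original units) for a pct_increase_x % increase in X,
    in a model of the form lm(Y ~ log(X) + ...)."""
    return slope * math.log(1 + pct_increase_x / 100)


# Hypothetical coefficient on log(WaitTime): -20 points.
print(round(level_log_change(-20, 10), 3))  # -1.906
```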

Suppose we have three variables: Customer Satisfaction Score (measured in points), Wait time (measured in minutes), and Segment (A, B, or C). Suppose we wish to complete the interpretation: "After controlling for segment, whenever the wait time increases 10%, the predicted customer satisfaction score decreases ____ %." What R command would we use to create a model that can complete the interpretation above? Note that the only differences between the options below are which variables use a log and which variables remain in their original format.

lm( log(CustomerSatisfaction) ~ log(WaitTime) + Segment )

Suppose we have a large data set. Some of the variables are listed below: ID (identification variable; each ID appears in the data set exactly one time), Supply (in thousands), Demand (in thousands), Location (North, South), and Segment (A, B, C, D). Suppose we wish to answer the question: "Is there a difference in predicted demand when comparing the North and South?" What model in R could answer this question?

lm(Demand ~ Location)

Suppose Dr. Bronlyn has a data set with variables Sales and Discount. The data is saved in a data set named MKT317ExampleData. She can use Tableau to compute the trend line: Predicted Sales = 30 + 22*(Discount). Suppose Dr. Bronlyn wants some additional statistical output, so she decides to create the same model in R. What commands would she use?

lm(Sales ~ Discount, data=MKT317ExampleData)

Suppose we have a large data set with three variables: Y, X, and Location, where Location is either North, South, East, or West. We create the following regression model, where North, South, and East are dummy variables: Predicted Y = 2 + 8*(X) + 8*(North) + 11*(South) + 2*(East) + 3*(North)*(X) + 4*(South)*(X) + 4*(East)*(X). Compute the predicted Y in the West when X = 20.

162
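West is the only location without a dummy variable, so every dummy and interaction term is 0 there and the prediction is 2 + 8(20) = 162. The sketch below builds each location's own intercept and slope from the model's terms:

```python
def predicted_y(x: float, location: str) -> float:
    """Predicted Y = 2 + 8X + 8N + 11S + 2E + 3NX + 4SX + 4EX (West is the baseline)."""
    n = 1 if location == "North" else 0
    s = 1 if location == "South" else 0
    e = 1 if location == "East" else 0
    intercept = 2 + 8 * n + 11 * s + 2 * e  # location-specific intercept
    slope = 8 + 3 * n + 4 * s + 4 * e       # location-specific slope on X
    return intercept + slope * x


print(predicted_y(20, "West"))  # 162
```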

Assume we wish to make conclusions at a 95% confidence level. When making computations for a statistical test, sometimes we calculated a test statistic and critical number (using 95% confidence), and other times we calculated a p-value. How are these related?

If the test statistic is larger than the critical number, then the p-value will be less than 0.05

Suppose Dr. Bronlyn has a data set named MKT317ExampleData. One of the variables is Revenue, which records the total revenue (in dollars) for a given day. The variable Revenue is an example of a ______ .

Quantitative variable

Suppose we create a "full" model to predict the variable Y, and we include 40 independent variables in the full model. Suppose the variable X4 has a medium/weak correlation with Y, and many of the other variables are much more strongly correlated with Y. If we create a reduced model, which of the following must be true?

We cannot make either of the conclusions above; whether X4 is included in the reduced model depends on the data and the method used to reduce the model.

Suppose we have the following variables for a data set about a food truck that sells cupcakes. CupcakeRevenue is measured in dollars. Temperature is measured in Fahrenheit. The weather variable has four values: nice, cloudy, rain (no thunder), or thunderstorms. We use the following model: lm(CupcakeRevenue ~ Temperature + Weather, data=CupcakeFoodTruckData). The coefficients table is below. Complete the following sentence: When comparing days with the same temperature, on average, the revenue on days when there is rain (no thunderstorms) is

about the same as days when there are thunderstorms. #28 see picture

Suppose you believe that a percentage change in X corresponds with an absolute change in Y. How would you use X and Y in the "lm" command in R?

lm(Y ~ log(X))

Suppose we wish to predict the chances that a sporting event will begin on time based on the temperature and if it's raining. Which method can we use?

logistic regression

Suppose we have two variables about individual customers: Items purchased last year (quantitative) and Location (North, South, East, West). Each customer is in exactly one location. What method can we use to determine whether there are differences in the average number of items purchased last year when comparing the four locations?

none of the above

Suppose we create a multiple linear regression model to predict the variable Y. The independent variables used are X and Segment, where Segment can either be A, B, C, or D. The output of the linear regression model is below: Suppose we would like to simplify this model to only look at Segment B. Predicted Y for segment B = (Intercept for Segment B) + (slope for Segment B)*X What is the value for the intercept for Segment B's trend line?

-20 #5 see picture

You will need R for this question. You do not need to import a data set into R. Question 10: Suppose you are interested in determining whether the proportion of people who are satisfied with their recent shopping experience depends on demographic group. A large survey was given, and the results are below. From the data above, we can conclude that there is a statistically significant relationship between demographic group and customer satisfaction level. We can reach this conclusion using a chi-squared test, which results in a p-value of ______ . Round your answer to include four or more decimal places.

0.006458 #10 see picture

This question requires Tableau or Excel and uses the data set Exam3DataSS22 that is available in D2L (same data set as was used in previous exam questions). Question 6: Let's use the data to predict values into the future using year and quarter. We would predict that the sum of sales in the fourth quarter of 2021 in the North (labeled as North 2021 Q4) is $_____ .

25,554,956

This question requires Tableau or Excel and uses the data set Exam3DataSS22 that is available in D2L (same data set as was used in previous exam questions). Question 7: (round your final answer to one or more decimal places) In the year 2020, _____ % of sales were from Segment A.

29.31

This question requires Tableau or Excel and uses the data set Exam3DataSS22 that is available in D2L: in the very bottom folder in the content area, which is named "Exam 3 Part 1 (take-home part)." Question 2: What is the sum of sales in Segment B in the third quarter (Q3) of 2019?

3,841,366

This question requires Tableau or Excel and uses the data set Exam3DataSS22 that is available in D2L: in the very bottom folder in the content area, which is named "Exam 3 Part 1 (take-home part)." Question 1: What is the sum of sales in Segment B?

71,345,910

Suppose Dr. Bronlyn uses Tableau to create a linear regression model for the variables X and Y. Tableau provides the following output: Trend line: Y = 7 + 1.5(X) p-value: 0.9222 R-squared: 0.0003 What can we say about the correlation between X and Y?

It is a very weak correlation.
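For a model with a single x-variable, the correlation coefficient is r = ±√(R-squared), taking the sign of the slope. Here √0.0003 ≈ 0.017, which is why the correlation is described as very weak. A minimal sketch (valid only for simple linear regression):

```python
import math


def correlation_from_r_squared(r_squared: float, slope: float) -> float:
    """In simple linear regression, r = sign(slope) * sqrt(R-squared)."""
    return math.copysign(math.sqrt(r_squared), slope)


print(round(correlation_from_r_squared(0.0003, 1.5), 4))  # 0.0173
```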

Suppose we create a regression model. From that model, we compute the predicted values of Y. In the plot below, we are given dots on a scatter plot representing the predicted value of Y and the observed value of Y. The dots on this scatter plot appear to approximately follow the diagonal line y=x. What can we conclude?

The model fits the data / the model is generally accurate at matching the trends in the data. #20 see picture

What is an informal way to think of a test statistic?

The signal-to-noise ratio for the alternative hypothesis

Suppose we create a regression model using the variables: Y, X, Segment (A or B). lm(Y ~ X*Segment ) Information about the coefficients of the model are given below. In the table above, we see the estimate of the coefficient for X:SegmentB equals 4. What does this mean?

The slope of the trend line for Segment B is 4 more than the slope of the trend line for Segment A. #18 see picture

Suppose we would like to estimate the monthly revenue for a specific small business. The variable Revenue is measured in dollars. We will use the independent variables: Advertising budget (measured in dollars) Quarter (Q1, Q2, Q3, or Q4) Suppose we use the command in R: lm(Revenue ~ Budget + Quarter, data=BronlynsData) The coefficients table from the model is given below: For a month in Q4 where the budget equals $5000, the predicted revenue equals $_______ .

$47,500 #10 see picture

In class, we learned that an independent t-test gives results equivalent to a linear regression model whose only independent variable is the "group" dummy variable (the variable that defines the groups being compared in the t-test). Suppose the value for "Group" is either A or B. Model <- lm(Y ~ Group, data=PretendData) Suppose the coefficients table of the linear regression model is below: What is the p-value for the t-test that determines whether there is a statistically significant difference in average Y-value when comparing Group A with Group B? Answer with four decimal places.

0.0270 #21 see picture

You will need R for this question. Question 12: Suppose we have counted the number of individuals who purchase a specific new item and separated them into segments. Use R to compute a test that compares the difference in population proportions between Segment A and Segment B. The p-value for this test equals _______ . Round your answer to include four or more decimal places.

0.2273 #12 see picture
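The comparison of two proportions can be sketched as a pooled two-proportion z-test. This is a minimal sketch with hypothetical counts (the counts in the picture are not reproduced). Note that R's prop.test applies a continuity correction by default, so this uncorrected version matches prop.test(..., correct = FALSE) rather than the default output:

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test, two-sided, no continuity correction."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: 45 of 100 buyers in Segment A, 30 of 100 in Segment B.
z, p = two_proportion_z_test(45, 100, 30, 100)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```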

For this question, we will use the SOSS_65 data set that was used in the Module 9 Lab. Question 9: Let's explore dog ownership based on the following three variables: Three Age Groups, which lists whether an individual is 18-29 years old, 30-49 years old, or 50 or older; Dog now, which lists whether an individual currently owns a dog; and Age of dog when adopted, which is a categorical variable describing the age of the dog when adopted. Among individuals in the data set who currently own a dog, how many people in the 50 or older age group indicated that they adopted a dog whose age was in the "birth-6 months" range? _____ Answer as a count (number of people in the data set) and not as a percentage.

146

Suppose we have the following ANOVA table: Complete the table above. The F-statistic equals ___________ . If your answer has decimals, answer with one or more decimal places.

4.32 #29 see picture
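Completing an ANOVA table comes down to dividing each sum of squares by its degrees of freedom to get the mean squares, then taking their ratio: F = MS(between) / MS(within). A minimal sketch with hypothetical table entries (the actual table is in the picture):

```python
def anova_f_statistic(ss_between: float, df_between: int,
                      ss_within: float, df_within: int) -> float:
    """F = (SS_between / df_between) / (SS_within / df_within)."""
    ms_between = ss_between / df_between  # mean square between groups
    ms_within = ss_within / df_within     # mean square within groups (error)
    return ms_between / ms_within


# Hypothetical entries: SS_between = 120 on 2 df, SS_within = 540 on 27 df.
print(anova_f_statistic(120, 2, 540, 27))  # 3.0
```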

Suppose we create a model in R to predict the revenue, in thousands, based on the value of a quantitative variable X. The model that we create is lm( log(revenue) ~ X ) We use the predict() command in R to compute a 95% confidence interval from this model when X = 50. The output is below: When X = 50, we're 95% confident that the revenue will be between ________ .

403 thousand and 1097 thousand #1 see picture
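Because the model predicts log(revenue), predict() returns interval bounds on the log scale, and exponentiating each bound converts them back to revenue (in thousands). For instance, hypothetical log-scale bounds of 6 and 7 back-transform as follows:

```python
import math


def back_transform_interval(lower_log: float, upper_log: float) -> tuple[float, float]:
    """Convert interval bounds for log(Y) back to the original units of Y."""
    return math.exp(lower_log), math.exp(upper_log)


lo, hi = back_transform_interval(6.0, 7.0)  # hypothetical log-scale bounds
print(round(lo), round(hi))  # 403 1097
```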

Suppose we would like to compare the average Revenue between North and South We run a t-test, and obtain the following output: Sample average in North: 80 Sample average in South: 75 p-value for t-test: 0.0035 This is equivalent to the linear regression model whose output is outlined below. What is the estimate of the coefficient for the Dummy Variable for North?

5 #22 see picture
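When the only x-variable is a 0/1 dummy, the least-squares slope equals the difference between the two group means, which is why the North coefficient is 80 - 75 = 5 when South is the baseline. A small check with hypothetical revenues chosen to have those group means:

```python
def ols_slope(x: list[float], y: list[float]) -> float:
    """Least-squares slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical revenues: North averages 80, South averages 75.
north = [78, 80, 82]
south = [74, 75, 76]
dummy = [1] * len(north) + [0] * len(south)  # North = 1, South = 0
revenue = north + south
print(ols_slope(dummy, revenue))  # 5.0
```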

Suppose we create a logistic regression model predicting the variable Churn based on a quantitative variable X. The value of Churn equals 1 for customers who churn (stop being a customer), and the value of churn equals 0 for customers who do not churn. Suppose we create a model using the commands glm(Churn ~ X, data=BronlynsData, family=binomial), and we obtain the coefficients table below: Suppose a specific customer has value X = 4. Compute the probability that this customer will churn.

50% #36 see question
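A logistic regression model predicts the log-odds b0 + b1(X); the probability is recovered with the inverse logit, 1 / (1 + e^-(b0 + b1 X)). A probability of 50% corresponds to log-odds of 0. A minimal sketch with hypothetical coefficients (the coefficients table is not reproduced here):

```python
import math


def churn_probability(b0: float, b1: float, x: float) -> float:
    """Inverse logit of the linear predictor from glm(..., family=binomial)."""
    log_odds = b0 + b1 * x
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical coefficients: intercept -2.0, slope 0.5.
print(churn_probability(-2.0, 0.5, 4))  # 0.5
```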

For this question, we will use the SOSS_65 data set that was used in the Module 9 Lab. Question 8: Let's explore dog ownership based on the following three variables: Three Age Groups, which lists whether an individual is 18-29 years old, 30-49 years old, or 50 or older; Dog now, which lists whether an individual currently owns a dog; and Dog Purebreed, which indicates whether their current dog is a purebred. Among individuals in the data set who currently own a dog, how many people in the 30-49 years age group indicated that their dog is a purebred? _____ Answer as a count (number of people in the data set) and not as a percentage.

59

Suppose Dr. Bronlyn has a data set named MKT317_PracticeData She uses R and Tableau to create the linear regression model with output given below: Equation of model: Predicted Y = 41 + 11.5(X) p-value for the intercept: 0.0409 p-value for the slope: 0.0069 R-squared for the model: 0.8549 From the output above, Dr. Bronlyn would conclude that ________ % of the variability of Y can be explained by this model.

85

Suppose a large data set includes information about the weights (measured in carats) and prices (measured in US dollars) of recent diamond sales. The data produce the linear model below, and the R-squared value for this model is 0.92 Predicted Price = -2,250 + 7,800(weight) What can we conclude from the R-squared value of 0.92?

92% of the variability in prices of recent diamond sales can be explained by the diamonds' weights.

In modules 2 and 3, we created scatter plots with trend lines in Tableau, and we were able to assess if the "model fits the data" based on how closely the trend line matched the general pattern of the dots in the scatter plot. Why did we need to learn more complicated methods to assess model accuracy in Module 7? *Recall that a model "fits the data" when the model's trend line follows the same general pattern/trend in the data.

Because the graphical methods in Module 2 and 3 only work when we have one quantitative x-variable; the plots we learned in Module 7 can be used to assess model accuracy even for models that have many quantitative x-variables.

Suppose we have independent data for the following variables: Y = Demand (number of units), X = Month (January, February, ..., November, December). Using these variables, we compute an ANOVA and obtain a p-value of 0.000004. True or False: We can conclude that the average demand is different in each month (and that no two months have the same average demand).

False

Suppose we have the following data: Y = Is a customer satisfied with their current purchase? (yes/no) X = Region (North, South, East, West) We would like to determine if customer satisfaction is independent of region. True or False: We can use linear regression in this situation.

False

Logistic regression can be used to:

Predict the probability that a shipment will arrive on time based on the shipping cost and the weather.

This evening, Dr. Wassink will have three exam grades for every student in the course: Exam 1, Exam 2, and Exam 3. If Dr. Wassink wants to know if there are any statistically significant differences in average exam grades (after controlling for individual variability), what method should she use?

Repeated Measures ANOVA

Suppose that there is a statistically significant interaction between region (North, South) and time (before, after) with respect to sales. What does this mean?

Sales change over time differently in the North than the South.

This semester in MKT 317, we learned about the R-squared associated with a linear regression model: Predicted Y = b0 + b1(X) In general, what is true about the R-squared?

The R-squared is allowed to be any number between 0 and 1.

The plot below shows the average value of Y in the North and South and for groups A, B, and C. If the image does not display properly, the table below lists the heights of the bars (average value of Y for each situation) What does the plot suggest?

The plot suggests there is not a strong interaction between location and group with respect to Y. #33 see picture

Suppose Dr. Bronlyn creates a linear regression model in Tableau. When she places her mouse over the linear trend line, Tableau provides the following output: Y = 10 + 20*(X) p-value: 0.0001 R-squared: 0.1120 What can we conclude?

There is a statistically significant correlation between X and Y, but that correlation is not very strong.

Suppose we have independent data for the following variables: Y = Customer Satisfaction Score (points), X = Time of day (Morning, Afternoon, Evening). Using these variables, we compute an ANOVA and obtain a p-value of 0.0007. True or False: We can conclude that the average customer satisfaction score is not the same for all times of day; there are at least two times of day that have different average customer satisfaction scores.

True

For this question, you will need to use R and a built-in data set named longley. Since this is a built-in data set, you do not need to download or import the data into R. This is the data set that we used in the Module 4 Lab Part 2 (however, this model is different from what we created in that lab). Use the data set named longley to create a linear model where the dependent variable (y-variable) is Employed and the independent variable (x-variable) is Population. When creating the model Predicted Employed = b0 + b1(Population), the slope, b1, equals ________ .

0.485

Suppose we have the following two variables: Income group, a categorical variable with 7 different possible income groups, and Location, a categorical variable with 3 different possible locations. We compute a Chi-Squared test to determine whether the proportion of individuals who live in different locations is related to income group. When we compute Pearson's Chi-Squared test, the "degrees of freedom" for this Chi-Squared test is ______ .

12
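For Pearson's Chi-Squared test on a contingency table, the degrees of freedom are (number of rows - 1) × (number of columns - 1), so 7 income groups and 3 locations give 6 × 2 = 12. A minimal sketch:

```python
def chi_squared_df(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for Pearson's Chi-Squared test of independence."""
    return (n_rows - 1) * (n_cols - 1)


print(chi_squared_df(7, 3))   # 12 (income groups x locations)
print(chi_squared_df(10, 4))  # 27 (ice cream flavors x age groups)
```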

Suppose we create the power law model Predicted Y = 5*(X)^(1.3) Whenever X increases 10%, the predicted value of Y increases _________ %.

13
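In a power law model Y = a·X^k, multiplying X by 1.10 multiplies the predicted Y by 1.10^k, whatever the value of a. With k = 1.3 that factor is about 1.1319, an increase of roughly 13%. A minimal sketch:

```python
def power_law_pct_change(exponent: float, pct_increase_x: float) -> float:
    """Percent change in predicted Y for a pct_increase_x % increase in X
    under a power law Y = a * X**exponent (the constant a cancels out)."""
    return ((1 + pct_increase_x / 100) ** exponent - 1) * 100


print(round(power_law_pct_change(1.3, 10), 2))  # 13.19
```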

This question requires Tableau or Excel and uses the data set Exam3DataSS22 that is available in D2L (same data set as was used in previous exam questions). Question 3: What is the sum of sales from the North in Segment B in 2018?

13,638,013

Suppose Dr. Bronlyn has a very large data set about a retail organization. She creates a multiple linear regression model where the dependent variable is Revenue (dollars) and the independent variables are the Google, Facebook, TV, and YouTube Marketing Budgets (all measured in dollars). The coefficients table is given below. When controlling for Google, TV, and YouTube marketing budget, whenever the Facebook budget increases $500, the predicted revenue increases _________ .

15000 #20 see picture

Suppose you are creating a multiple linear regression model to predict the average revenue for a proposed new product. Suppose you have a large data set with 8,590 rows of data and 3 variables. The three variables are described below. Dependent variable: Revenue generated from a product, measured in dollars. Independent variables: RandD, the Research and Development budget, measured in dollars; and Category (Product Category, a categorical variable). Each product has exactly one product category, and there are a total of 41 different product categories in the data. Suppose you create a multiple linear regression model (with no interaction terms) using the methods that we learned in class, with the R commands Model <- lm(Revenue ~ RandD + Category) and summary(Model). How many dummy variables will appear in the coefficients table in the model output?

40
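With the default coding in R, one level of a categorical variable is the baseline and every other level gets its own dummy variable, so 41 categories produce 40 dummies. A rough imitation of that rule, assuming hypothetical category names and that the baseline is the first level in sorted order (R's default for a factor built from text values):

```python
def treatment_dummies(levels: list[str], prefix: str = "Category") -> list[str]:
    """Names of the dummy variables created for a categorical variable:
    every level except the baseline (first in sorted order)."""
    baseline, *others = sorted(set(levels))
    return [f"{prefix}{level}" for level in others]


# Hypothetical category labels C01 ... C41.
categories = [f"C{i:02d}" for i in range(1, 42)]
print(len(treatment_dummies(categories)))  # 40
```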

Suppose Dr. Bronlyn creates the following model. In the equation below, ln represents the natural log. ln(Y) = ln(60) + 2(X) What describes the relationship between X and the predicted (average) value of Y?

An absolute change in X corresponds to a percentage change in the predicted value of Y.

Suppose we have the following data: Y = Revenue (dollars) X = Segment (A, B, or C) We would like to determine if there are any differences in average revenue when comparing the three segments. True or False: we can use a t-test in this situation.

False

Suppose Dr. Bronlyn has a very large data set. Some of the variables are listed below; the profit and all budget variables are given in dollars: Profit, R&D Budget, YouTube Marketing Budget, TV Marketing Budget, Google Marketing Budget, Discount Rate, and Temperature. Dr. Bronlyn wishes to create a model that will complete the following interpretation: After controlling for Discount Rate, Temperature, and Google Marketing Budget, whenever the R&D budget increases $5000, the predicted profit increases $___________ . Dr. Bronlyn can complete this interpretation by creating a model whose dependent variable is Profit. What independent variable(s) should she include in her model?

Include exactly the independent variables Discount Rate, Temperature, Google Marketing Budget, and R&D Budget.

Suppose we hear the interpretation "once we control for the discount rate, for every additional 5 degrees, the ice cream sales revenue increases $4,000." What does it mean to "control for discount rate?"

It means that we are comparing situations with the same discount rate (such as comparing days when the discount rate is 10%); we are incorporating the discount rate into the model and making a conclusion about the additional impact of temperature on ice cream revenue.

What is a fixed effects model?

It's a type of regression model that "controls for individual variability" when we have panel data.

Suppose we are viewing a plot showing the relationship between variables X and Y. The values on both axes increase by multiplication by 2, and the dots in the plot appear to follow a general straight-line trend. This plot is telling us that a(n) ____________ model is an appropriate way to model the relationship between X and Y.

Power law #26 see picture

Suppose you hear the interpretation "In general, a 15% increase in salary corresponds to an 8% increase in recreational spending." Suppose we know that this interpretation was created from one of the model types below. Which model type could produce such an interpretation?

Power law model

Suppose we have the following data: Y = Is an insurance claim fraudulent (yes/no) X = length of time the policy has existed (measured in years) Our goal is to predict the likelihood that an insurance claim is fraudulent based on the length of time the policy has existed. True or False: We can use logistic regression in this situation.

True

Suppose we have the following variables: Location (North or South), Segment (A or B), and Sales (dollars). Suppose in the North, the average sales are much higher in Segment A than Segment B; however, in the South, the average sales are about the same for both segments. True or False: This suggests that there is an interaction between location and segment with respect to sales.

True

Suppose Dr. Bronlyn used Tableau to create a power law regression model predicting the value of Y based on X. Using this model, we have the following two values: When X = 10, the predicted value of Y = 20 When X = 15, the predicted value of Y = 22 Using this power law model, what is a general interpretation that describes the relationship between how a change in X corresponds with a change in the predicted value of Y?

Whenever X increases 50%, the predicted value of Y increases 10%.
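For a power law Y = a·X^k, the ratio of two predictions is (X2/X1)^k, so the exponent can be recovered from two points as k = ln(Y2/Y1) / ln(X2/X1). With X going from 10 to 15 (a 50% increase) and Y from 20 to 22 (a 10% increase), the sketch below recovers k and confirms the interpretation:

```python
import math


def power_law_exponent(x1: float, y1: float, x2: float, y2: float) -> float:
    """Exponent k of Y = a * X**k passing through (x1, y1) and (x2, y2)."""
    return math.log(y2 / y1) / math.log(x2 / x1)


k = power_law_exponent(10, 20, 15, 22)
print(round(k, 4))                     # 0.2351
print(round((1.5 ** k - 1) * 100, 6))  # 10.0 (a 50% increase in X gives +10% in Y)
```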

Suppose Dr. Bronlyn has a large data set named MKT317_PracticeData (this is a fictional data set that you do not need to import into R). This data set has five variables: Y, X1, X2, X3, and X4. We have the following information: The correlation coefficient for Y and X1 equals -0.22 The correlation coefficient for Y and X2 equals 0.54 The correlation coefficient for Y and X3 equals -0.87 The correlation coefficient for Y and X4 equals 0.01 Among the variables X1, X2, X3, and X4 in the MKT317_PracticeData set, which variable has the strongest correlation with Y?

X3
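The strength of a correlation is judged by its absolute value, and the sign only gives the direction, so -0.87 is the strongest of the four. Using the correlations listed in the question:

```python
correlations = {"X1": -0.22, "X2": 0.54, "X3": -0.87, "X4": 0.01}

# Compare by absolute value; the sign indicates direction, not strength.
strongest = max(correlations, key=lambda name: abs(correlations[name]))
print(strongest)  # X3
```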

Suppose Dr. Nannerl is a music publicist (she manages several social media accounts for famous musicians). Dr. Nannerl is interested in knowing whether the number of social media reactions in 2021 is related to the number of downloads of the musician's music from Spotify. The first few rows of data are below: What type of data does Dr. Nannerl have?

cross-sectional data #25 see picture

Suppose we are interested in three variables: quantitative variables X and Y, as well as a Group variable (whose values are A or B). Suppose we create a scatter plot where we plot the X variable on the x-axis, the Y variable on the y-axis, and the dots are color-coded by Group (A or B). Trend lines for each of the two groups are added to the plot. Suppose the two trend lines are parallel but not equal (nor approximately equal); it appears that the trend line for Group B is shifted up a large amount from Group A's trend line. What model could we make in R that corresponds to the plot above (with two parallel trend lines)?

lm(Y ~ X + Group) big graph - see picture

Suppose Dr. Bronlyn has a large data set. She uses several variables from that data set to create a multiple linear regression model of the form Ŷ = b0 + b1(X1) + b2(X2) + b3(X3). She computes the model in R and obtains the coefficients table below. Whenever X1 increases 2 units, the predicted value of Y ________ .

none of the above; we cannot complete this interpretation with the information provided. see picture

You need R for this question. Question 11: Suppose we have counted the number of individuals who purchase a specific new item. We are 95% confident that the proportion of individuals in the population who will purchase the new item is between ___________________

38.8% and 43.6% #11 see picture

Suppose we have data about a very large retail chain. Some of the variables in the model are: UnitPrice: the average unit price of items purchased per customer. Sales: the total sales per customer (in dollars) Segment: A, B, or C Location: North, South, East, or West We create a model lm( log(Sales) ~ log(UnitPrice) + Segment ) and obtain the following results When the unit price equals 1, the predicted sales in Segment B equals about $_______ .

150 #15 see picture
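In lm( log(Sales) ~ log(UnitPrice) + Segment ), a unit price of 1 makes the log(UnitPrice) term vanish because ln(1) = 0, so the predicted sales in Segment B reduce to e^(intercept + Segment B coefficient). A minimal sketch with hypothetical coefficients (the real ones are in the picture):

```python
import math


def predicted_sales(b0: float, b_log_price: float, b_segment: float,
                    unit_price: float, in_segment: int) -> float:
    """Back-transformed prediction from lm(log(Sales) ~ log(UnitPrice) + Segment)."""
    log_sales = b0 + b_log_price * math.log(unit_price) + b_segment * in_segment
    return math.exp(log_sales)


# Hypothetical coefficients: intercept 3.0, log(UnitPrice) -0.5, SegmentB 0.2.
print(round(predicted_sales(3.0, -0.5, 0.2, unit_price=1, in_segment=1), 2))  # 24.53
```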

Suppose we would like to estimate the monthly revenue for a specific small business. The variable Revenue is measured in dollars. We will use the independent variables: Advertising budget (measured in dollars) Quarter (Q1, Q2, Q3, or Q4) Suppose we use the command in R: lm(Revenue ~ Budget + Quarter, data=BronlynsData) The coefficients table from the model is given below: For a month in Q1 where the budget equals $71,100, the predicted revenue equals $_______ .

156,650 #11 see picture
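Q1 is the baseline level of Quarter here (R puts the first factor level into the intercept), so a Q1 prediction uses only the intercept and the Budget slope. A sketch with hypothetical coefficients standing in for the pictured table:

```python
# Hypothetical coefficients for lm(Revenue ~ Budget + Quarter).
# Q1 is the baseline level, so it has no dummy row of its own.
coef = {"(Intercept)": 12000.0, "Budget": 1.5,
        "QuarterQ2": 3000.0, "QuarterQ3": -2000.0, "QuarterQ4": 8000.0}

def predict_revenue(budget: float, quarter: str) -> float:
    dummy = coef.get(f"Quarter{quarter}", 0.0)  # Q1 -> no dummy term
    return coef["(Intercept)"] + coef["Budget"] * budget + dummy

q1 = predict_revenue(71_100, "Q1")
print(q1)
```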

This question requires Tableau or Excel and uses the data set Exam3DataSS22 that is available in D2L (same data set as was used in previous exam questions). Question 5: Let's use the data to predict values into the future using year and quarter. We would predict that the sum of sales in the fourth quarter of 2021 (labeled as 2021 Q4) is $_____ .

31,745,800

Suppose you create an exponential model in R using the command lm(log(Y)~X) to get the coefficients table below. The coefficients table: If we wish to simplify this model to the format Y = a*b^X, then the value for "b" in this simplified format would equal _________ .

4.1 #17 see picture
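Converting the fitted exponential model: if log(Y) = b0 + b1·X with the natural log (R's default), then Y = e^b0 · (e^b1)^X, so a = e^b0 and b = e^b1. A minimal sketch with hypothetical coefficients:

```python
import math

# Hypothetical coefficients from lm(log(Y) ~ X).
intercept, slope = 0.7, 0.25

a = math.exp(intercept)  # Y = a * b**X
b = math.exp(slope)
print(round(a, 3), round(b, 3))
```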

Suppose Dr. Bronlyn has a large data set. From that data set, she uses variables Profit and Budget to create the linear model below, where Budget and Profit are both measured in dollars. Dr. Bronlyn uses R to obtain the information below: Equation of the model: Predicted Profit = 20,000 + 2.1(Budget). The p-value for the intercept equals 0.0107. The p-value for the slope equals 0.0060. The R-squared for the model equals 0.7010. Whenever the budget increases $200, the predicted profit increases $____________ .

420
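The intercept cancels when comparing two budgets, so the change is just the slope times the change in budget (2.1 × 200). A quick check, using an arbitrary starting budget of $5,000:

```python
def predicted_profit(budget: float) -> float:
    return 20_000 + 2.1 * budget

# Any starting budget gives the same difference; $5,000 is arbitrary.
change = predicted_profit(5_000 + 200) - predicted_profit(5_000)
print(round(change, 2))  # about 420
```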

Suppose Dr. Bronlyn has a large data set. From that data set, she uses variables Revenue and Temperature to create the linear model below, where Revenue is measured in thousand dollars and Temperature is measured in degrees Fahrenheit. Dr. Bronlyn uses R to obtain the information below: Equation of the model: Predicted Revenue (thousand dollars) = 150 + 7(Temperature). The p-value for the intercept equals 0.0015. The p-value for the slope equals 0.0029. The R-squared for the model equals 0.7109. When the temperature equals 85 degrees Fahrenheit, the predicted revenue equals _______ thousand dollars.

745
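Plugging the temperature into the fitted equation, as a quick check:

```python
def predicted_revenue(temp_f: float) -> float:
    """Predicted revenue in thousand dollars."""
    return 150 + 7 * temp_f

print(predicted_revenue(85))  # 150 + 595 = 745
```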

Suppose a large organization has stores in multiple locations. Some locations typically have very high sales, and others have much lower sales. We have the following variables: ID: store number. Revenue: annual revenue, in dollars. Time: time since the company was founded, in years (values are 0, 1, 2, 3, 4, 5, ...). Location: Rural, Small Town, Suburban, or Urban. We use the model lm( Revenue ~ Time + ID ). The output in R is very long; here are the first few rows of the coefficients table: After controlling for individual variability between stores, every year the predicted revenue increases $____________ .

8,225 #12 see picture

Suppose we create a logistic regression model predicting the probabilities associated with the variable "Win." The variable Win equals 1 if a team wins a game and 0 if it does not win the game. Suppose we create a model using the command glm(Win ~ X, data=BronlynsGameData, family=binomial) and obtain the coefficients table below: Suppose a specific team has value X = 19. Compute the probability that this team will win.

85 #24 see picture
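Logistic regression models the log-odds, so the probability comes from p = 1 / (1 + e^(-(b0 + b1·X))). A minimal sketch; the coefficients are hypothetical placeholders for the pictured table.

```python
import math

def win_probability(b0: float, b1: float, x: float) -> float:
    """Logistic model: log-odds = b0 + b1*x, probability = 1 / (1 + e^(-log-odds))."""
    log_odds = b0 + b1 * x
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical coefficients, not the exam's values.
p = win_probability(b0=-3.0, b1=0.2, x=19)
print(f"{p:.0%}")
```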

Suppose we have a large data set with three variables: Y, X, and Group, where Group can either equal A, B, or C. We create the following regression model, where GroupB and GroupC are dummy variables representing Group B and Group C. Predicted Y = 15 + 2*(X) + 6*(GroupB) + 6*(GroupC) + 5*(GroupB)*(X) + 2*(GroupC)*(X) Compute the predicted Y for Group B when X = 11

98
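The same calculation written out, with Group A as the baseline (both dummies equal 0). For Group B, GroupB = 1 and GroupC = 0, so the prediction is 15 + 2(11) + 6 + 5(11) = 98.

```python
def predicted_y(x: float, group: str) -> float:
    group_b = 1 if group == "B" else 0  # dummy variables; Group A is the baseline
    group_c = 1 if group == "C" else 0
    return (15 + 2 * x + 6 * group_b + 6 * group_c
            + 5 * group_b * x + 2 * group_c * x)

print(predicted_y(11, "B"))  # 98
```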

Suppose that Y is a quantitative variable, and Group is a categorical variable with two possible values: A or B. We compute the linear regression model lm(Y ~ Group, data=BronlynsData) The coefficients table in the model output is given below: For Group A, the average value of Y equals ________ .

986 #31 see picture
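Why the intercept answers this question: with one two-level categorical predictor, lm() treats Group A (the first level) as the baseline, so the (Intercept) row is the average Y in Group A and the GroupB row is the difference between the two group averages. A small least-squares sketch on hypothetical data confirms this:

```python
from statistics import mean

# Hypothetical data; Group A is the baseline (reference) level.
y_a = [40.0, 44.0, 48.0]
y_b = [50.0, 55.0, 60.0]
x = [0] * len(y_a) + [1] * len(y_b)  # the GroupB dummy variable
y = y_a + y_b

# Ordinary least squares for a single predictor.
x_bar, y_bar = mean(x), mean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

print(intercept, mean(y_a))             # intercept = average Y for Group A
print(slope, mean(y_b) - mean(y_a))     # GroupB coefficient = difference in averages
```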

Suppose Dr. Bronlyn has a data set named MKT317ExampleData, which contains data about a large retail organization. One of the variables is Transaction Type, which indicates if a transaction was at an in-person store or using the website. The possible values for the variable Transaction Type are either "in-person" or "online." What type of variable is Transaction Type?

Categorical variable

Suppose Dr. Bronlyn is doing an analysis of customer satisfaction scores for a large financial institution. She has data indicating customer incomes, customer location, and customer satisfaction scores. Suppose Dr. Bronlyn uses the data to conclude "there is an interaction between income and location with respect to customer satisfaction scores." What does this conclusion mean?

Changes in income will impact the predicted customer satisfaction scores differently in different locations.

Suppose Dr. Bronlyn has a very large data set about a retail organization. This data set is very large: it has several thousand rows of data, and has approximately 30 independent variables. Her goal is to create a simple linear regression model that predicts the Profit using the independent variable that is the most strongly correlated with Profit. What method should Dr. Bronlyn use?

Create a correlogram and create a model using the independent variable that is associated with the biggest dot in the row or column labeled "Y."

Suppose Dr. Bronlyn has a very large data set about a retail organization. This data set is very large: it has several thousand rows of data and approximately 30 independent variables. Her goal is to create a model that has a nice general interpretation for how a change in Budget impacts the change in Profit, and she wants to use the model that gives a nice general interpretation that is as accurate as possible. What method should Dr. Bronlyn use?

Create four plots with budget on the x-axis and profit on the y-axis, using every combination of the linear scale and log scale for each axis. Use the type of model (linear, exponential, logarithmic, or power) that corresponds to the plot with the most straight-line pattern in the dots.
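One way to make "most straight-line pattern" concrete is to transform the data for each of the four plots and compare the absolute Pearson correlations. This is a rough numeric stand-in for judging the plots by eye, not the course's stated method, and the data below are hypothetical.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def straightest_model(budget: list[float], profit: list[float]) -> str:
    """Return the model type whose transformed scatter plot is closest to a line."""
    log_b = [math.log(v) for v in budget]  # log scales need positive values
    log_p = [math.log(v) for v in profit]
    candidates = {
        "linear": (budget, profit),       # linear x, linear y
        "exponential": (budget, log_p),   # linear x, log y
        "logarithmic": (log_b, profit),   # log x, linear y
        "power": (log_b, log_p),          # log x, log y
    }
    return max(candidates, key=lambda name: abs(pearson_r(*candidates[name])))

# Hypothetical data generated from a power relationship: profit = 3 * budget**0.5
budget = [100.0, 400.0, 900.0, 1600.0, 2500.0]
profit = [3 * b ** 0.5 for b in budget]
print(straightest_model(budget, profit))
```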

Suppose Dr. Bronlyn has a very large data set about a retail organization. She creates a multiple linear regression model where the dependent variable is Revenue (dollars) and the independent variables are the Google, Facebook, TV, and YouTube marketing budgets (all measured in dollars). The coefficients table is given below. True or False: From the model output above, we can conclude that whenever the TV marketing budget increases $1, the predicted revenue will increase $20.

False or not enough information #24 see picture

Suppose we have a multiple linear regression model predicting Revenue (thousand dollars) using the following independent variables: RandD: Research and Development Budget (thousand dollars) Quarter (Q1, Q2, Q3, or Q4) Weather (nice, cloudy, or rainy). We use the R command lm(Revenue ~ RandD + Quarter + Weather, data=BronlynsData) and obtain the following coefficients table in the model output: Is the following statement true or false: Most revenue is from days when there is nice weather.

False or not enough information #30 see picture

Suppose we are interested in three variables: quantitative variables X and Y, as well as a Group variable (whose values are A or B). Suppose we create a scatter plot where we plot the X variable on the x-axis, the Y variable on the y-axis, and the dots are color-coded by Group (A or B). Trend lines for each of the two groups are added to the plot. Suppose the two trend lines are parallel but not equal (nor approximately equal); it appears that the trend line for Group B is shifted up a large amount from Group A's trend line. What can we say about the statement below: The plot above suggests that the average value of Y for group B is higher than the average value of Y for group A.

False or not enough information. see picture 2nd big graph

Suppose we create a multiple linear regression model to predict the variable Y. The independent variables used are X and Segment, where Segment can either be A, B, C, or D. The output of the linear regression model is below: Suppose we took the data set and created a scatter plot. We plot X on the x-axis, Y on the y-axis, and color-code by segment. We add the trend lines created from this model to the plot. What do the trend lines look like?

Four parallel increasing trend lines. #6 see picture

Suppose that after we complete this exam, Dr. Bronlyn releases grade statistics for Exam 1, such as the average and median grades for students taking MKT 317 in Spring 2022. After the exam has been completed, the exam statistics for Exam 1 in MKT 317 this semester are an example of

descriptive analytics / descriptive statistics

Suppose we wish to predict the number of people who will attend an MSU tailgating event based on the temperature, if it's raining, and which opposing team MSU is playing. Which method can we use?

Linear regression

Suppose we create the model below (and the model accurately describes the trend in the data): $\hat{Y} = 4 + 2X + 5X^2$. What can we say about the general trend/relationship between X and Y?

None of the above.

Suppose we are interested in three variables: quantitative variables X and Y, as well as a Group variable (whose values are A or B). Suppose we create a scatter plot where we plot the X variable on the x-axis, the Y variable on the y-axis, and the dots are color-coded by Group (A or B). Trend lines for each of the two groups are added to the plot. Suppose the two trend lines are parallel but not equal (nor approximately equal); it appears that the trend line for Group B is shifted up a large amount from Group A's trend line. Which of the following does the plot suggest?

The plot suggests that there is not an interaction between X and Group with respect to Y.

Suppose we are using a logistic regression model to predict the probability that a borrower repays their loan on time. We use the quantitative variable X as an independent variable and the R code glm(OnTime ~ X, data=BronlynsBankData, family=binomial). We get the coefficients table below: Suppose we would like to predict the probability that an individual who has X = 2 pays their loan on time. We use the table above to compute 5 - 2(3) = -1. What do we conclude?

The probability that this individual pays their loan back on time is between 1% and 50%. #27 see picture
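The arithmetic behind this conclusion: a negative log-odds value always gives a probability below 0.5, and the logistic function never reaches 0. Converting the log-odds of -1 (intercept 5, slope -3, X = 2, as in the question):

```python
import math

log_odds = 5 + (-3) * 2                      # = -1
probability = 1 / (1 + math.exp(-log_odds))  # logistic function
print(f"{probability:.1%}")                  # about 26.9%
```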

Suppose we have a multiple linear regression model that is not reduced. What does this mean?

There is at least one X-variable in the model that can be removed from the model without significantly reducing the accuracy.

Suppose a researcher was curious about the amount of time (in minutes) that people spend outside every day. The researcher gathered the following information: time spent outside yesterday (in minutes) currently own a dog (yes or no). What method can the researcher use to determine if there is a statistically significant difference in the average time spent outside yesterday when comparing people who currently own a dog with people who do not currently own a dog?

independent t-test
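A minimal sketch of the statistic behind an independent (two-sample) t-test, using Welch's version, which is also the default in R's t.test(). The data are hypothetical; turning the statistic into a p-value additionally needs the t distribution (for example, scipy.stats.ttest_ind with equal_var=False).

```python
import math
from statistics import mean, variance

def welch_t(group1: list[float], group2: list[float]) -> float:
    """Welch's two-sample t statistic (does not assume equal variances)."""
    se = math.sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / se

# Hypothetical minutes spent outside yesterday.
dog_owners = [65.0, 80.0, 45.0, 90.0, 70.0]
non_owners = [30.0, 50.0, 25.0, 40.0, 55.0]
t_stat = welch_t(dog_owners, non_owners)
print(round(t_stat, 3))
```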

Suppose we have the following variables for a data set about a food truck that sells cupcakes. CupcakeRevenue is measured in dollars. Temperature is measured in degrees Fahrenheit. The Weather variable has four values: nice, cloudy, rain (no thunder), or thunderstorms. We use the following model lm(CupcakeRevenue ~ Temperature + Weather, data=CupcakeFoodTruckData). The coefficients table is below. Complete the following sentence: When comparing days with the same temperature, on average, the revenue on nice weather days is $500 higher than the revenue _________ .

on days with thunderstorms. #29 see picture

Suppose a teacher gives a pre-test at the beginning of the course, and wishes to compare these pre-test scores with the final exam score to determine how much students have learned in the course. For every student, the teacher has two grades: pre-test and final exam. What method can the teacher use to determine if their course is effective (i.e. if after controlling for individual variability, there is a statistically significant change in scores between pre-test and final exam)?

paired t-test
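A paired t-test is a one-sample t-test on each student's difference in scores, which is how it controls for individual variability. A minimal sketch with hypothetical scores; in R, the corresponding call would be t.test(final, pre, paired = TRUE), where pre and final are hypothetical vector names.

```python
import math
from statistics import mean, stdev

def paired_t(before: list[float], after: list[float]) -> float:
    """Paired t statistic: a one-sample t-test on the per-student differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical pre-test and final exam scores for the same five students.
pre = [55.0, 60.0, 48.0, 70.0, 62.0]
final = [72.0, 75.0, 66.0, 80.0, 79.0]
t_stat = paired_t(pre, final)
print(round(t_stat, 2))
```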

Suppose Dr. Aries teaches a course about astronomy. Dr. Aries gives a pre-test at the beginning of the course and uses the final exam to assess students' knowledge at the end of the course. Dr. Aries has a data set that contains each student's name (ID), as well as two scores per individual: a pre-test score and a final exam score. To record these values, we use a Time variable to indicate pre-test vs. final exam, and the score represents the percentage grades. The first few rows of data are below: What type of data does Dr. Aries have?

panel data #26 see picture

Suppose Dr. Bronlyn has a data set named MKT317ExampleData. One of the variables in this data set is named "Revenue." What R command did we learn in class that would tell us the average value of Revenue in the MKT317ExampleData data set?

summary(MKT317ExampleData)

