Stats Quiz 5

Match each component of this equation with its role: Sales = 764 + 1323 * Prime + e. Roles: baseline, offset, response variable, categorical predictor variable, residual.

764: the baseline; 1323: the offset; Sales: response variable; Prime: categorical predictor variable; e: residual

Reasons to include an interaction term in our model include which of the following? (1) To estimate context-specific effects of some predictor variable on the outcome (y). (2) If the joint effect of two variables on the outcome can be correctly modeled as the sum of the main effects associated with each variable. (3) Looking at an ANOVA table suggests that an interaction term noticeably improves the predictive power of the model. a. 1 and 3 b. 2 and 3 c. All of the above (1, 2, and 3) d. 1 and 2

a. 1 and 3

Which of the following statements is true of dummy variables? (1) In general, a grouping variable with K categories produces K-1 dummy variables. (2) In a fitted model, the coefficient on a dummy variable represents the average value of the outcome (y) whenever the dummy variable is 1. (3) In a fitted model, the coefficient on a dummy variable represents the difference in the average outcome (y) between two conditions: when the dummy variable is 1, versus when the dummy variable is 0. a. 1 and 3 b. None of the above c. 1 and 2 d. 1 only

a. 1 and 3
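Statement (1) can be illustrated with a minimal sketch in Python. The helper `dummy_encode` and the three-level `plans` variable are hypothetical, introduced only for this example; the idea is that one category serves as the baseline and each remaining category gets its own 0/1 column.

```python
def dummy_encode(values, baseline):
    """Encode a categorical variable as K-1 dummy columns, omitting the baseline level."""
    levels = sorted(set(values))
    non_baseline = [lvl for lvl in levels if lvl != baseline]
    # One row per observation, one 0/1 entry per non-baseline level.
    rows = [[1 if v == lvl else 0 for lvl in non_baseline] for v in values]
    return non_baseline, rows

# Hypothetical grouping variable with K = 3 categories.
plans = ["basic", "prime", "student", "prime", "basic"]
columns, encoded = dummy_encode(plans, baseline="basic")

print(columns)   # the K - 1 = 2 dummy columns
print(encoded)   # baseline rows are all zeros
```

A row of all zeros stands for the baseline category, which is why K categories need only K-1 dummy variables.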

The Amazon e-commerce data science team fits a model to gain understanding of the extent to which having an Amazon Prime membership leads customers to buy more from the platform than customers without a Prime membership. The data set includes two variables: total sales revenue for a customer account and a dummy variable representing whether the customer had a Prime membership. The fitted model equation is: Sales=764+1323∗Prime+e What sales revenue do we predict for a customer who is a Prime member? a. 2087 b. 764 c. 559 d. 1323

a. 2087
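The arithmetic behind this answer can be sketched in Python. The function name `predicted_sales` is an assumption made for illustration; the two coefficients come from the fitted equation in the question, and the residual e is left out because a prediction uses only the fitted part.

```python
BASELINE = 764   # predicted sales when Prime = 0 (non-members)
OFFSET = 1323    # amount added when Prime = 1 (members)

def predicted_sales(prime):
    """Predicted sales for a customer; prime is the 0/1 dummy variable."""
    return BASELINE + OFFSET * prime

print(predicted_sales(1))  # Prime member: baseline plus offset
print(predicted_sales(0))  # non-member: baseline only
```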

After fitting a linear model it is common to conduct an analysis of variance. An ANOVA table: a. All of these answers are correct. b. helps us to understand the impact of each variable on the predictive power of a model. c. records changes in R2, which should increase with each sequential variable added to the model. d. helps us diagnose if the effect of a predictor variable is context-specific with respect to the value of another variable (i.e., detect interactions).

a. All of these answers are correct.

Which of the following statements about correlation are correct? a. Correlation ranges from -1 to 1 b. Correlation, like an ANOVA table, is order-dependent: in general, we would not expect the correlation of (X,Y) to be the same as the correlation of (Y,X). c. Correlation is measured in units of both the X1 and X2 variables. d. The sign of a correlation coefficient gives the direction (positive, negative) of the association between x and y.

a. Correlation ranges from -1 to 1 d. The sign of a correlation coefficient gives the direction (positive, negative) of the association between x and y.

Which of the following are accurate statements about R2? Select all correct answers. a. R2 ranges from 0 to 1. b. If a linear model produces a R2 equal to 0.77, it indicates that 23% of the variability in Y is predicted by factors other than variation in X. c. This measure tells us what fraction of variability in X is predictable in terms of Y. d. Larger values of R2 indicate more systematic variation in X that may be predicted by Y.

a. R2 ranges from 0 to 1. b. If a linear model produces a R2 equal to 0.77, it indicates that 23% of the variability in Y is predicted by factors other than variation in X.
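As a rough illustration of where R2 comes from, the sketch below fits a least-squares line to a small made-up data set and computes R2 as 1 - SSE/SST. The data and the helper names (`fit_line`, `r_squared`) are assumptions for this example only, not part of the quiz.

```python
def mean(values):
    return sum(values) / len(values)

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def r_squared(y, fitted):
    """Fraction of the variability in y that the fitted values account for."""
    y_bar = mean(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    return 1 - sse / sst

# Hypothetical toy data.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]

r2 = r_squared(y, fitted)
unexplained = 1 - r2  # share of variability in y not predicted by x
print(r2, unexplained)
```

The two shares always sum to 1, which is the reasoning behind answer b: an R2 of 0.77 leaves 23% of the variability in Y to other factors.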

In a linear regression model, we describe data by an equation: yi = β0 + β1*xi + ei. Which of the following is accurate for this equation? Select all correct answers. a. The predictor variable is represented by xi b. The slope of the line is β0 c. Model error is represented by ei d. The intercept of the line is β1

a. The predictor variable is represented by xi c. Model error is represented by ei

A sample of de-identified medical records provides measurements of cholesterol levels in milligrams per deciliter (mg/dL) as well as weight (in pounds) for 400 adults. A linear model is fitted to predict blood cholesterol levels from weight. What units does the model slope have? a. The slope is expressed in terms of mg/dL per pound. b. None of these answers is correct. c. The slope is expressed in terms of mg/dL per person. d. The slope is expressed in terms of pounds per dL.

a. The slope is expressed in terms of mg/dL per pound.

Professor Galton collects data on the heights of all the students in his class (in inches) as well as their fathers' heights. He regresses the heights of the students on the heights of their fathers using R Studio and calculates R2=0.25 What can we say about this result? (1) There is more systematic variation in Y than there is individual variation in Y. (2) About 75% of the variability in Y is predictable by the regression on X. (3) About 25% of the variability in Y is predictable by factors other than the regression on X. a. 1 and 2 b. None of the above c. 2 and 3 d. 1 and 3

b. None of the above

Professor Galton collects data on the heights of all the students in his class as well as their fathers' heights. He regresses the heights of the students on the heights of their fathers and calculates the following in R Studio: Coefficients: (Intercept) 33.893 Father_height: 0.514 Which of the following statements are correct based on the results of this model? a. We expect a 33.9-inch increase in a student's height for every one-inch increase in the height of their father. b. Student height is the response variable and father's height is the predictor variable. c. We expect a student with a father who is zero inches tall to have a height of 0.514 inches. d. All of these answers are correct.

b. Student height is the response variable and father's height is the predictor variable.
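To see why options a and c swap the two coefficients, the fitted line can be evaluated directly. This is a minimal Python sketch; the function name `predicted_student_height` and the example father height of 70 inches are assumptions for illustration.

```python
INTERCEPT = 33.893  # predicted student height (inches) when father's height is 0
SLOPE = 0.514       # change in predicted height per extra inch of father's height

def predicted_student_height(father_height):
    """Predicted student height in inches from the fitted coefficients."""
    return INTERCEPT + SLOPE * father_height

at_zero = predicted_student_height(0)
per_inch = predicted_student_height(71) - predicted_student_height(70)
print(at_zero)   # equals the intercept, not 0.514
print(per_inch)  # about 0.514 inches per inch, not 33.9
```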

In a simple groupwise model y = b0 + b1*x with a dummy-variable (0 or 1) predictor x, the coefficient b1 may be interpreted as: a. the predicted value of the outcome variable when the dummy variable is equal to 1. b. the differential effect on the outcome of having the dummy variable equal to 1, rather than 0. c. the improvement in R-squared that we get from including x in the model. d. the predicted value of the outcome variable when the dummy variable is equal to 0.

b. the differential effect on the outcome of having the dummy variable equal to 1, rather than 0.
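A small numerical check can make this interpretation concrete. The sketch below, on made-up data, fits y = b0 + b1*x by ordinary least squares and compares the coefficients with the two group means; the data and the helper `fit_simple_ols` are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def fit_simple_ols(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical toy data: x is a 0/1 dummy variable, y is the outcome.
x = [0, 0, 0, 1, 1, 1]
y = [10.0, 12.0, 14.0, 20.0, 22.0, 24.0]

b0, b1 = fit_simple_ols(x, y)
mean_0 = mean([yi for xi, yi in zip(x, y) if xi == 0])
mean_1 = mean([yi for xi, yi in zip(x, y) if xi == 1])

print(b0, mean_0)           # intercept matches the mean of the x = 0 group
print(b1, mean_1 - mean_0)  # slope matches the difference in group means
```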

The average salary of 30 top quarterbacks in the 2017 National Football League was just over $13,000,000. A linear regression to predict Salary from Total QBR (an overall measure of performance based on various game statistics) produced the following equation: Salary_i = 140,638 + 216,373 * TotalQBR_i. Tom Brady (then of the New England Patriots) had a Total QBR of 83 and was paid $14,000,000 in 2016. Which of the following can we conclude based on the results of this linear model? (1) In 2016, Brady was underpaid by more than $4 million, versus what we'd expect given his QBR. (2) Top NFL quarterback salaries are, on average, about $216,373 higher for every one-point increase in the player's Total QBR. (3) A quarterback with a Total QBR = 55 has a predicted salary of $12,041,153. a. 1 and 2 b. 1 and 3 c. All of the above (1, 2, and 3) d. 2 and 3

c. All of the above (1, 2, and 3)
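Statements (1) and (3) can be checked by plugging the quoted numbers into the fitted equation. In this Python sketch the function name `predicted_salary` is an assumption; the coefficients, the QBR values, and Brady's salary are taken from the question.

```python
INTERCEPT = 140_638
SLOPE = 216_373  # predicted salary change per one-point increase in Total QBR

def predicted_salary(total_qbr):
    """Predicted salary in dollars from the fitted line."""
    return INTERCEPT + SLOPE * total_qbr

brady_predicted = predicted_salary(83)
brady_residual = 14_000_000 - brady_predicted  # actual minus predicted

print(brady_predicted)       # 18099597
print(brady_residual)        # -4099597, i.e. paid about $4.1M less than predicted
print(predicted_salary(55))  # 12041153
```

A negative residual means the actual value falls below the model's prediction, which is what "underpaid versus what we'd expect" refers to.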

The Amazon data science team fits a "baseline/offset" model to learn if customers with a Prime membership buy more, on average, from the platform than customers without a Prime membership. They calculate R-squared to be 0.76: R2=0.76 This measure indicates which of the following? a. None of these answers is correct. b. Individual differences within customer groups are the primary source of variation in customer revenue. c. Differences between customer groups are the primary source of variation in customer revenue. d. Approximately 24% of the variability in customer revenue is explained by whether or not they have a Prime membership.

c. Differences between customer groups are the primary source of variation in customer revenue.

70.9

A data-science team at a large grocery chain observes the quantity of ice cream cartons sold at different levels of outside temperature each day. Their goal is to use a statistical model to understand how changes in temperature predict changes in consumer demand for ice cream, as measured by quantity sold on a given day. Select all correct answers below that describe this model. a. Quantity sold should be the predictor variable, and temperature should be the response variable. b. If the data scientists use a linear model, the model intercept represents what we'd expect the outside temperature to be if the ice cream sales quantity was exactly 0. c. Quantity sold should be the response variable, and temperature should be the predictor variable. d. If the data scientists use a linear model, the model intercept represents what we'd expect ice cream sales quantity to be if the outside temperature was exactly 0.

c. Quantity sold should be the response variable, and temperature should be the predictor variable. d. If the data scientists use a linear model, the model intercept represents what we'd expect ice cream sales quantity to be if the outside temperature was exactly 0.

See Image 2. You collect data on daily sales of pints of guacamole at a local grocery store. The jitter plot shows sales versus two variables: whether it's a weekend or a weekday, and whether there are free samples of guacamole. The orange dots show the group means for each situation. Which of the following statements are correct, in light of this picture? (1) The effect of offering free samples looks larger on a weekend than it does on a weekday. (2) The joint effect of the weekend and free sample variables on sales looks separable: that is, equal to the sum of the individual effects. (3) Our model for sales should include an interaction between free samples and weekend variables. a. 1 and 2 b. 2 and 3 c. All of the above (1, 2, and 3) d. 1 and 3

d. 1 and 3
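The reasoning can be sketched with numbers. The image is not reproduced here, so the group means below are hypothetical values chosen only to mimic the pattern the answer describes; they are not read from the plot. The sketch compares the observed joint mean with what a purely additive (no-interaction) model would predict.

```python
# Hypothetical group means (pints sold per day); NOT taken from the quiz image.
group_means = {
    ("weekday", "no_samples"): 20.0,
    ("weekday", "samples"): 25.0,
    ("weekend", "no_samples"): 30.0,
    ("weekend", "samples"): 50.0,
}

baseline = group_means[("weekday", "no_samples")]
weekend_effect = group_means[("weekend", "no_samples")] - baseline
samples_effect_weekday = group_means[("weekday", "samples")] - baseline
samples_effect_weekend = (
    group_means[("weekend", "samples")] - group_means[("weekend", "no_samples")]
)

# If the effects were separable, the joint mean would be the sum of main effects.
additive_prediction = baseline + weekend_effect + samples_effect_weekday
interaction = group_means[("weekend", "samples")] - additive_prediction

print(samples_effect_weekday, samples_effect_weekend)  # context-specific effects
print(additive_prediction, interaction)                # nonzero gap suggests an interaction
```

When the sample effect differs between weekdays and weekends, the additive prediction misses the joint mean, and the size of that gap is what an interaction term estimates.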

After fitting a linear model it is common to conduct an analysis of variance. In an ANOVA table: a. we track the standard deviation of the residuals, which should increase as variables are added sequentially. b. we track the change in R-squared (R^2), which should decrease as variables are added sequentially. c. we list all the fitted coefficients for all variables in the model. d. None of these answers is correct.

d. None of these answers is correct.

A researcher examines the movie ratings data set for a popular website and observes that the correlation between a movie's budget (in dollars) and the average viewer rating (0-10 points) is r = 0.26. In a report, the researcher reports the budget variable in millions of dollars (instead of the original data reported in dollars). How will this change in units affect the value of the correlation coefficient? a. The correlation will decrease. b. The correlation will increase slightly. c. The correlation will increase considerably. d. The correlation will not change.

d. The correlation will not change.
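The unit-free property can be checked numerically. The sketch below uses a hand-written Pearson correlation and made-up budgets and ratings (they are not the researcher's data, so the value of r will differ from 0.26); the point is only that rescaling dollars to millions of dollars leaves r unchanged.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: budgets in dollars and average viewer ratings.
budget_dollars = [5_000_000, 20_000_000, 50_000_000, 120_000_000, 200_000_000]
rating = [6.1, 5.8, 6.9, 6.4, 7.2]

budget_millions = [b / 1_000_000 for b in budget_dollars]

r_dollars = pearson_r(budget_dollars, rating)
r_millions = pearson_r(budget_millions, rating)
r_swapped = pearson_r(rating, budget_dollars)

print(r_dollars, r_millions, r_swapped)  # all three agree up to rounding error
```

The same check shows that the correlation of (X, Y) equals the correlation of (Y, X), so correlation is not order-dependent.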

Match each R function to its primary purpose in fitting linear models. Functions: lm, coef, rsquared, residual, geom_point, summary. Purposes: a. generate error terms for each observed data point b. calculate a measure of model c. produce a regression table with model results d. summarize the trend e. fit the model f. make a scatterplot

lm -> fit the model; coef -> summarize the trend; rsquared -> calculate a measure of model; residuals -> generate error terms for each observed data point; geom_point -> make a scatterplot; summary -> produce a regression table with model results
