GB 307

magnitude

As the cost of living index increases, the values in the groceries index also tend to increase. This indicates that the relationship has a positive magnitude. If the values of one increased as the other decreased, the magnitude would be negative. If there was no relationship between the two, the magnitude would be zero.

Given our regression results, by how much do we expect store sales to increase each week? Round your answer to the nearest dollar.

Use the coefficient estimate for Trend, not the intercept.

relationship from scatter plot

Linear relationship. Strong correlation (r = 0.97). Positive magnitude/direction. One outlier at (7,9).

variance

R^2

explaining R^2

The R^2 value is 83.19%. This means that 83.19% of the variance in meeting attendance is explained by the average number of Facebook shares.
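As a rough illustration of "share of variance explained", the sketch below fits a one-variable least-squares line and computes R^2 as 1 - SSE/SST. The data and the names `shares` and `attendance` are made up for illustration and are not the course data set.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical values, for illustration only.
shares = [1, 2, 3, 4, 5]
attendance = [2, 4, 5, 4, 5]

slope, intercept = fit_line(shares, attendance)
mean_att = sum(attendance) / len(attendance)

# SSE: variation left unexplained by the line. SST: total variation in y.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(shares, attendance))
sst = sum((y - mean_att) ** 2 for y in attendance)
r_squared = 1 - sse / sst
print(f"R^2 = {r_squared:.2%}")
```

In this toy data the line explains 60% of the variance in the outcome; the remaining 40% is left in the residuals.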

correlation

The strength of the correlation does not depend on which variable is plotted on the y-axis and which is on the x-axis. More generally, there is a single correlation value explaining the relationship between two metrics. If X is strongly correlated with Y, then Y is also strongly correlated with X. When someone tries to infer causality from correlation, this can lead to the challenge of reverse causality and is a major reason that correlation does not imply causality, regardless of the strength of the relationship.
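A quick way to check the symmetry claim is to compute the correlation both ways. The sketch below uses a hand-written Pearson correlation and made-up numbers (the names `ratings` and `check_ins` are hypothetical).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, for illustration only.
ratings = [3.1, 3.8, 4.0, 4.4, 4.9]
check_ins = [120, 180, 210, 260, 330]

r_xy = pearson_r(ratings, check_ins)   # ratings on the x-axis
r_yx = pearson_r(check_ins, ratings)   # axes swapped
print(r_xy, r_yx)
```

Swapping the arguments gives the same value, which is why the correlation alone cannot say which variable (if either) drives the other.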

Which of the following are true about best subsets and stepwise regression?

They can help sift through a large number of variables to build a predictive model. They will capitalize on any spurious correlation present in the data, leading to concerns about overfitting.

anomalous

This "anomalous" point is an outlier in our data. In this case, PPIT and RI are generally negatively related. However, Doha has both a relatively high PPIT and a high RI, meaning it deviates dramatically from the overall pattern in the data and is an outlier.

normality

This is a judgement call, as is common in real-world data. The residual distribution appears left skewed, which means it is not symmetric, and therefore cannot be normal. However, this skew is a result of just a few points, including the large positive residual associated with our outlier. Further, recall that the assumption is over the population errors, not the sample residuals. With so few points, it is possible that the errors are still normally distributed despite the residual distribution. The good news is that linear regression is relatively robust to violations of the normality assumption. Because of the ambiguity here, there were no points awarded for this question regardless of your answer.

Based on this regression, by how much does expected revenue increase each day? Round your answer to the nearest penny.

This is the definition of the coefficient estimate for Trend.

Now, what does the scatterplot tell you about the magnitude of the relationship? Is there a relationship at all? Is it positive or negative? What does it mean to have a positive or negative relationship? Interpret.

Yes, there appears to be a positive relationship. It means that when the average rating is high, the total number of check-ins tends to be high as well (and vice versa). What it DOES NOT necessarily mean is, "Higher average ratings cause more customers to come." A scatterplot does not say anything about whether it is a CAUSAL relationship; it just says that there is a relationship. Perhaps this relationship is in fact driven by a third variable, e.g. food quality. Higher food quality could lead to BOTH better ratings AND more customers. This is where we need to use our judgment and knowledge of how bars function in general. Therefore, think of this plot as a good starting point for further analysis.

causality

You should move to a large Californian city if you want to live longer. Performing household chores may boost brain health in the elderly. Exposure to display advertising increases the odds that an individual purchases from the advertiser by 10%.

Even when a relationship is not statistically significant, there is still a good chance that it is practically significant.

False. When a relationship between X and Y is statistically insignificant, we are saying that we can't discern an impact on Y of changing X. A relationship is practically significant when a reasonable change in X results in a managerially meaningful impact on Y. Obviously, it's hard to argue for the latter if you can't be confident that X has any impact on Y.

threshold

The threshold for statistical significance should be set relative to the cost of making a Type I error (i.e., saying that the independent and dependent variable are related when they are in fact independent). This cost will be highly dependent on the situation. For example, think about the pharmaceutical ad detailing all of the possible side effects that was shown in class. The efficacy of that drug should have been evaluated using a very low threshold for statistical significance, because if it didn't work but did cause all of those side effects the consequences of taking it to market could be dire.

Reject null

When the p-value is less than the significance threshold (i.e., when the result is statistically significant).

Based only on your analysis above, discuss how passing the Bechdel test relates to a film's revenue. Specifically, how does the effect vary for large budget vs. small budget films?

Films have higher revenue when they pass the Bechdel test. However, for small-budget films, revenue is higher for those that fail. For example, if the budget is $200,000, the film makes $30,946,362.20 more when it fails. Large-budget films have higher revenue when they pass the Bechdel test. Before the interaction term, passing the test decreases revenue. But after, there is a positive effect from the interaction term, increasing revenue.

You have run a regression based on the following model: ln(Ŷ) = α + β ln(X). The parameter estimate for β came back as 0.10. Which of the following is the correct interpretation of this parameter estimate?

For every one percent increase in X, we expect Y to increase by 0.10%.

Finally, regress the natural log of revenue on the natural log of budget. Interpret the slope parameter estimate. Note, it will be insufficient to say, "We expect a ____ unit change in the natural log of revenue for every one unit change in the natural log of budget." Hint: Remember the special interpretation that results from taking the natural log of both x and y in linear regression.

From this regression, we can interpret that a 1% change in budget is associated with an expected 0.87% change in revenue. The elasticity is 0.87.
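The elasticity reading of a log-log slope can be checked numerically. The sketch below builds made-up, noise-free data in which revenue is proportional to budget raised to the power 0.87 (the constant 3.0 and the budget values are assumptions for illustration), regresses ln(revenue) on ln(budget), and then converts the slope into the percent change in revenue implied by a 1% change in budget.

```python
import math

def ols_slope_intercept(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical budgets; revenue follows a power law with exponent 0.87.
budgets = [1e6, 5e6, 2e7, 8e7, 2e8]
revenues = [3.0 * b ** 0.87 for b in budgets]

log_budget = [math.log(b) for b in budgets]
log_revenue = [math.log(r) for r in revenues]
slope, intercept = ols_slope_intercept(log_budget, log_revenue)

# Percent change in revenue implied by a 1% increase in budget.
pct_change = (1.01 ** slope - 1) * 100
print(f"slope = {slope:.2f}, revenue change for +1% budget = {pct_change:.3f}%")
```

The implied percent change is very close to the slope itself, which is why the slope in a log-log model is read directly as an elasticity for small percentage changes.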

assumptions associated with linear regression

The population errors: have mean zero (Linearity); are probabilistically independent (Independence); are normally distributed (Normality); have equal variance (Equal Variance).

throw away or keep outlier

Ideally, you need to find out where this unusual observation comes from. In real life, data sets VERY often contain typos, etc. Plots like this are a great tool for detecting typos. If it is just a typo, then of course you need to delete it. In our case, this is not a typo, and represents a true outlier in our data. If you do want to remove it, the argument has to be along the lines of this data point being caused by some external factor that makes it irrelevant to the topic under study and would skew the results if left in. For example, think about the UNC Geography graduate net worth and Michael Jordan example in class. Regardless, remember that you should check whether your ultimate findings and recommendations are robust to its inclusion. If they aren't robust, that is, if your findings hinge on the presence of a single observation, you should think carefully about the confidence that you have in them.

In this regression, the expected change in revenue associated with passing the Bechdel test depends on the budget. Hint: If you are unsure, calculate the expected change in revenue from passing the Bechdel test for a film with a $10,000,000 budget, and then for one with a $20,000,000 budget.

In a linear regression model, only an interaction term allows the effect of one variable to depend on the value of another. We don't have any interaction terms here.
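The calculation suggested in the hint can be sketched as follows. All coefficient values here are hypothetical and are not the estimates from the course regression; they only show how the effect of passing behaves with and without an interaction term.

```python
def predict(budget, passes, b0, b_budget, b_pass, b_interaction=0.0):
    """Predicted revenue from a linear model with an optional budget x pass interaction."""
    return b0 + b_budget * budget + b_pass * passes + b_interaction * budget * passes

def effect_of_passing(budget, **coefs):
    """Expected change in revenue from passing (passes=1) versus failing (passes=0)."""
    return predict(budget, 1, **coefs) - predict(budget, 0, **coefs)

# Hypothetical coefficients, for illustration only.
no_interaction = dict(b0=1_000_000.0, b_budget=2.0, b_pass=-5_000_000.0)
with_interaction = dict(no_interaction, b_interaction=0.5)

for budget in (10_000_000.0, 20_000_000.0):
    print(budget,
          effect_of_passing(budget, **no_interaction),
          effect_of_passing(budget, **with_interaction))
```

Without the interaction term the effect of passing equals `b_pass` at every budget. With it, the effect is `b_pass + b_interaction * budget`, so it changes as the budget changes.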

error in MAPE

In this case, when Y_t = 0 the equation is undefined because the inner fraction has a zero denominator. Thus, we can only calculate MAPE when the observed outcome variable is never zero.
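A minimal sketch of this restriction, assuming the usual definition of MAPE as the average of |(Y_t - forecast_t) / Y_t| expressed in percent. The function name `mape` and the sample numbers are made up for illustration.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    if any(a == 0 for a in actual):
        # Each term divides by the observed value, so a zero makes MAPE undefined.
        raise ValueError("MAPE is undefined when an observed value is zero")
    n = len(actual)
    return 100 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))

# Hypothetical observed values and forecasts.
observed = [100, 200, 400]
forecast = [110, 180, 400]
print(mape(observed, forecast))

try:
    mape([100, 0, 400], forecast)
except ValueError as err:
    print(err)
```

Raising an error is one possible design choice; silently skipping the zero observations would change what the metric measures.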

largest to store

OLE Object

outliers

Outliers should definitely be removed if they represent inaccurate data (e.g., if someone fat fingered an entry). Outliers should generally be left in the data if they accurately represent underlying occurrences and are relevant to the question at hand. If there is potential disagreement about whether the point should be removed (which is generally the case when it is not flawed data), you should run your analysis with and without the point to determine whether your findings are dependent upon its inclusion. Hopefully, they are not.
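The "run it with and without the point" check can be sketched as below. The data are made up: six points that lie exactly on a line with slope 2, plus one hypothetical outlier at (7, 40).

```python
def ols_slope(x, y):
    """Least-squares slope for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data: the last point is the suspected outlier.
x = [1, 2, 3, 4, 5, 6, 7]
y = [2, 4, 6, 8, 10, 12, 40]

slope_with = ols_slope(x, y)
slope_without = ols_slope(x[:-1], y[:-1])
print(f"with outlier: {slope_with:.2f}, without: {slope_without:.2f}")
```

Here the estimated slope more than doubles when the single point is included, so any conclusion built on it would hinge on that one observation.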

If revenue and passing the Bechdel test were unrelated (i.e., independent), what is the probability that we would have gotten a parameter estimate for β_Bechdel at least as extreme as what we observed? Provide your answer as a percent rounded to the nearest integer.

P value

The inclusion of a lagged dependent variable is not without its limitations. Please discuss the implications that including a single day lag has on the usefulness of our model for forecasting hotel revenues.

The biggest concern regarding lagged variables is that you can't predict any farther ahead than the lagged variable goes back, unless you are willing to make predictions based on predictions (i.e., p-th order auto-regressive models). Because predictions based on predictions compound the associated error, there is no clear method for calculating the confidence interval associated with a prediction. However, because we have included a single day lag, we can only predict one period ahead unless we are willing to allow for this additional source of (unmeasurable) error. Both of these can create severe implications for business planning.
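To make "predictions based on predictions" concrete, the sketch below assumes a model of the form revenue_t = intercept + lag_coef * revenue_(t-1). The coefficient values and the last observed revenue are hypothetical, not estimates from the hotel data.

```python
def forecast_ahead(last_observed, steps, intercept, lag_coef):
    """Multi-step forecast for a one-lag model.

    Only the first step uses an observed value; every later step
    feeds the previous forecast back in as the lagged input.
    """
    forecasts = []
    previous = last_observed
    for _ in range(steps):
        previous = intercept + lag_coef * previous
        forecasts.append(previous)
    return forecasts

# Hypothetical coefficients and last observed daily revenue.
forecasts = forecast_ahead(last_observed=1000.0, steps=3, intercept=100.0, lag_coef=0.5)
print(forecasts)
```

Only the first element is a true one-period-ahead prediction. The second and third reuse earlier forecasts as inputs, so any error in the first forecast carries forward into them.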

The equal variance assumption says

The equal variance assumption says that the variance of the population errors is constant for all values of X. That is, the spread of the points should not differ greatly over values of X (i.e., there is no "fan" or "diamond" shape in the residual plot). There is a clear fan shape in the residuals, indicating the population errors likely violate the equal variance assumption.

The independence assumption says

The independence assumption says that the values of the population errors are independent. That is, knowing the value of one residual tells us nothing about the value of the next. If residuals are positively auto-correlated, then when the last residual was large, we expect the next to be large. If residuals are negatively auto-correlated, then when the last residual was large, we expect the next to be small. This rarely occurs with cross-sectional data, and is generally only a concern with time-series data.
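One simple diagnostic is the lag-1 autocorrelation of the residuals, sketched below with made-up residual series. A value near zero is consistent with independence; clearly positive or negative values suggest the patterns described above.

```python
def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation of a residual series."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    numerator = sum(centered[t] * centered[t - 1] for t in range(1, n))
    denominator = sum(c * c for c in centered)
    return numerator / denominator

# Hypothetical residuals: a slow drift (large follows large) ...
drifting = [3, 2.5, 2, 1, -1, -2, -2.5, -3]
# ... and a zig-zag (large follows small).
alternating = [2, -2, 2, -2, 2, -2]

r_drifting = lag1_autocorrelation(drifting)
r_alternating = lag1_autocorrelation(alternating)
print(f"drifting: {r_drifting:.2f}, alternating: {r_alternating:.2f}")
```

The drifting series gives a positive value and the alternating series a negative one, matching the two cases of auto-correlation.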

The linearity assumption says

The linearity assumption says that for all values of X, the population errors have mean zero. In the residual plot, this is violated if there exists a range of X values over which the residuals consistently fall above or below zero. This frequently occurs when the residuals follow some non-linear pattern. Hence, the "linearity" assumption.

databases

Validation rules help prevent erroneous data entry by defining the requirements for a valid entry. User profiles define what actions a user can take within a database, including which tables they can view and which they can edit.

Which of the following statements regarding Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE) is correct? Select all that apply.

When a model poorly fits the time-series data, the MAPE is large. Outliers have a greater impact on RMSE than on MAE. RMSE is a good measure of model accuracy, but its magnitude depends on the scale of the outcome variable.
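Two of these statements can be checked with a few lines of arithmetic. The forecast errors below are made up; the sketch compares how much MAE and RMSE grow when one error becomes an outlier, and how RMSE changes when the outcome is rescaled (for example, from thousands of dollars to dollars).

```python
import math

def mae(errors):
    """Mean Absolute Error of a list of forecast errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root Mean Square Error of a list of forecast errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical forecast errors, for illustration only.
typical = [2, -2, 2, -2]
with_outlier = [2, -2, 2, -20]   # same errors, but one is an outlier

mae_ratio = mae(with_outlier) / mae(typical)
rmse_ratio = rmse(with_outlier) / rmse(typical)
print(f"MAE grows {mae_ratio:.2f}x, RMSE grows {rmse_ratio:.2f}x")

# Rescaling the outcome rescales the errors, and RMSE with them.
rescaled = [1000 * e for e in typical]
print(rmse(typical), rmse(rescaled))
```

Because RMSE squares each error before averaging, the single large error inflates it more than it inflates MAE.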

relationship

When the values of PPIT are small, changes in RI have a large effect. However, when the values of PPIT are large, changes in RI have only a minor effect. If this were a linear relationship, the impact of changing PPIT would not depend on the level of RI.

can reject X as a value if the p-value is less than the critical value

Can reject zero as a possible value for β_Bechdel at a critical value of p ≤ 0.001.

Taking the natural log of a variable will reduce the variance when values are large, but increase it when values are near zero. This can generally help with heteroskedasticity of the type seen above.

take ln y

