Stat 146 Test Two
What are some pitfalls for linear regression?
1) Don't fit linear models to nonlinear associations 2) Correlation is not causation 3) Beware of outliers 4) Be cautious with regressions of aggregate data 5) Don't extrapolate
What do you call a decreasing trend?
A negative association or negative trend
What is the correlation coefficient?
A number that measures the strength of the linear association between two numerical variables
What do you call an increasing trend?
A positive association or positive trend
Predicted Time = 0.88 + (1.86 * Thousand Miles) Interpret the slope in the context of the problem. Select the correct choice below and fill in the answer box to complete your choice. (Round to two decimal places as needed.) A. For every additional thousand miles, on average, the time goes up by _____ hours. B. For every additional hour, on average, the number of miles goes up by _____ thousand.
A. For every additional thousand miles, on average, the time goes up by 1.86 hours.
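As a quick check of that interpretation, the sketch below plugs two mileages that differ by exactly one thousand miles into the equation from the question. The function name predicted_time is only an illustrative label, not something from the course.

```python
def predicted_time(thousand_miles):
    """Predicted time in hours, using the regression equation from the question."""
    return 0.88 + 1.86 * thousand_miles

# Two trips that differ by exactly one thousand miles.
difference = predicted_time(3) - predicted_time(2)

# The gap between the two predictions is the slope.
print(round(difference, 2))
```

Whatever pair of mileages is chosen, a gap of one thousand miles changes the prediction by the slope, which is why the slope is read as "hours per additional thousand miles."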
Explain how to find the slope and intercept for the displayed output. Choose the correct answer below. Simple linear regression results: Dependent Variable: Armspan Independent Variable: Height Armspan = -9.2324325 + 2.721622 * Height Sample size: 15 R(correlation coefficient) = 0.9215 R-sq = 0.84924175 Estimate of error of standard deviation: 3.532004 A. In the fourth line, the slope is multiplied by the Height, and the intercept is the constant. B. The value of R is the intercept, and slope is multiplied by the Height in the fourth line. C. The value of R is the intercept, and slope is the constant in the fourth line. D. In the fourth line, the slope is the constant, and the intercept is multiplied by the Height.
A. In the fourth line, the slope is multiplied by the Height, and the intercept is the constant.
If there is a positive correlation between number of years studying reading and thumb length (for children), does that prove that longer thumbs cause more studying of reading, or vice versa? Can you think of a hidden variable that might be influencing both of the other variables? Choose the correct answer below. A. It does not prove causation, because older children have longer thumbs and have studied reading longer. Longer thumbs do not cause an increase in years of studying. The hidden variable is age. B. It proves causation, because the two variables are related implying causation. Longer thumbs cause an increase in years of studying. The hidden variable is age. C. It proves causation, because the two variables are related implying causation. Longer thumbs cause an increase in years of studying. There is no hidden variable. D. It does not prove causation, because older children have longer thumbs and have studied reading longer. Longer thumbs do not cause an increase in years of studying. There is no hidden variable.
A. It does not prove causation, because older children have longer thumbs and have studied reading longer. Longer thumbs do not cause an increase in years of studying. The hidden variable is age.
What is a regression of aggregate data?
Aggregate data is what you get when you take the mean of each of several large groups and use those means to fit a regression line. For example, using the mean for each state gives you 50 data points instead of one data point per person.
Some investors use a technique called the "Dogs of the Dow" to invest. They pick several stocks that are performing poorly from the Dow Jones group (which is a composite of 30 well-known stocks) and invest in these. Explain why these stocks will probably do better than they have done before. Choose the correct answer below. A. The poor historical performance puts the stocks outside of the normal range of explanatory variables. Making predictions based on these values is extrapolation, which may result in predictions that increase in a nonlinear fashion. B. Part of the poor historical performance could be due to chance, and if so, regression toward the mean predicts that stocks turning in a lower-than-average performance should tend to perform closer to the mean in the future. In other words, they should increase. C. The poor historical performance puts the stocks outside of the normal range of explanatory variables. Making predictions based on these values is extrapolation, which may result in influential points which pull the predicted price of the stocks upward. D. The poor historical performance could be due to the stock's fundamental weakness, and if so, regression toward the mean predicts that stocks turning in a lower-than-average performance should tend to perform closer to the mean in the future. In other words, they should increase.
B. Part of the poor historical performance could be due to chance, and if so, regression toward the mean predicts that stocks turning in a lower-than-average performance should tend to perform closer to the mean in the future. In other words, they should increase.
The correlation between height and arm span in a sample of adult women was found to be r = 0.944. The correlation between arm span and height in a sample of adult men was found to be r = 0.859. Which association, the association between height and arm span for women, or the association between height and arm span for men, is stronger? Explain. Choose the correct answer below. A. The association between height and arm span for men is stronger because the value of r is farther from 0. B. The association between height and arm span for women is stronger because the value of r is farther from 0. C. The association between height and arm span for women is stronger because the value of r is closer to 0. D. The association between height and arm span for men is stronger because the value of r is closer to 0.
B. The association between height and arm span for women is stronger because the value of r is farther from 0.
If the correlation between height and weight of a large group of people is 0.65, find the coefficient of determination (as a percent) and explain what it means. Assume that height is the predictor and weight is the response, and assume that the association between height and weight is linear. Choose the correct answer below. A. The coefficient of determination is 65%. Therefore, 65% of the variation in height can be explained by the regression line. B. The coefficient of determination is 42.25%. Therefore, 42.25% of the variation in weight can be explained by the regression line. C. The coefficient of determination is 65%. Therefore, 65% of the variation in weight can be explained by the regression line. D. The coefficient of determination is 42.25%. Therefore, 42.25% of the variation in height can be explained by the regression line.
B. The coefficient of determination is 42.25%. Therefore, 42.25% of the variation in weight can be explained by the regression line.
Explain how to find the slope and intercept for the displayed output. Choose the correct answer below. Coefficients: Intercept = -9.232432; X Variable = 2.721622 A. The slope is the X Variable value divided by the Intercept value, and the intercept is the Intercept value. B. The slope is the X Variable value, and the intercept is the Intercept value. C. The slope is the Intercept value divided by the X Variable value, and the intercept is the Intercept value. D. The slope is not given, and the intercept is the Intercept value.
B. The slope is the X Variable value, and the intercept is the Intercept value.
Explain how to find the slope and intercept for the displayed output. Choose the correct answer below. LinReg y = a + bx a = -9.232432457 b = 2.721621622 r squared = 0.84924175 r = 0.921543135 A. The b-value is the intercept, and the a-value is the slope. B. The a-value is the intercept, and the b-value is the slope. C. The r-value is the intercept, and the b-value is the slope. D. The r-value is the intercept, and the a-value is the slope.
B. The a-value is the intercept, and the b-value is the slope.
How is the coefficient of determination related to the correlation, and what does the coefficient of determination show? A. The coefficient of determination is the square of the correlation, and it shows the strength and direction of the linear association between two variables. B. The coefficient of determination is the square root of the correlation, and it shows the strength and direction of the linear association between two variables. C. The coefficient of determination is the square of the correlation, and it shows the proportion of the variation in the response variable that is explained by the explanatory variable. D. The coefficient of determination is the square root of the correlation, and it shows the proportion of the variation in the response variable that is explained by the explanatory variable.
C. The coefficient of determination is the square of the correlation, and it shows the proportion of the variation in the response variable that is explained by the explanatory variable.
When describing two-variable associations, a written description should always include trend, shape, strength, and which of the following? Choose the correct answer below. A. The number of pairs in the data set B. The name of the person who gathered the data C. The context of the data D. All of the above
C. The context of the data
Explain how to find the slope and intercept for the displayed output. Choose the correct answer below. The regression equation is: Armspan = -9.2 + 2.72 * Height A. The slope is the number being multiplied by Height, and the intercept is the constant divided by the sample size. B. The slope is the constant, and the intercept is the number being multiplied by Height. C. The slope is the number being multiplied by Height, and the intercept is the constant. D. The slope is the constant, and the intercept is the number being multiplied by Height divided by the sample size.
C. The slope is the number being multiplied by Height, and the intercept is the constant.
The scatterplot shows the heights of mothers and daughters. Daughter = 25.53 + 0.628*Mother Interpret the slope. Choose the correct answer below. A. For each additional inch in the daughter's height, the average mother's height increases by about 0.628 inch. B. The height of the average mother is about 0.628 times the height of the daughter. C. The height of the average daughter is about 0.628 times the height of the mother. D. For each additional inch in the mother's height, the average daughter's height increases by about 0.628 inch.
D. For each additional inch in the mother's height, the average daughter's height increases by about 0.628 inch.
If the correlation between height and weight of a large group of people is 0.61, find the coefficient of determination (as a percent) and explain what it means. Assume that height is the predictor and weight is the response, and assume that the association between height and weight is linear. Choose the correct answer below. A. The coefficient of determination is 61%. Therefore, 61% of the variation in height can be explained by the regression line. B. The coefficient of determination is 37.21%. Therefore, 37.21% of the variation in height can be explained by the regression line. C. The coefficient of determination is 61%. Therefore, 61% of the variation in weight can be explained by the regression line. D. The coefficient of determination is 37.21%. Therefore, 37.21% of the variation in weight can be explained by the regression line.
D. The coefficient of determination is 37.21%. Therefore, 37.21% of the variation in weight can be explained by the regression line.
What is a regression line?
It is a tool for making predictions about future observed values.
What is the simplest shape for a scatterplot?
Linear, which is depicted by a straight line.
Does changing the order of variables change the r (correlation coefficient)?
No. Swapping the two variables does not change r, because r measures the strength of the linear association between the pair, which is the same in either order.
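The symmetry of r can be checked numerically. This is a minimal sketch that computes the correlation coefficient by hand; the helper name pearson_r and the height and arm span values are invented for illustration only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented sample values, in inches.
heights = [60, 62, 65, 68, 70]
arm_spans = [59, 63, 64, 69, 71]

r_xy = pearson_r(heights, arm_spans)   # height as x, arm span as y
r_yx = pearson_r(arm_spans, heights)   # order swapped
print(r_xy, r_yx)
```

The formula treats x and y identically, so the two printed values agree.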
Can you use the correlation coefficient with both linear and nonlinear scatterplots?
No. It is only appropriate for linear associations.
Does having a correlation coefficient close to +1 or -1 tell you whether the relationship is linear or nonlinear?
No, you must plot the data to see whether the association is linear or nonlinear.
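A small numerical illustration of why the plot is still needed: the sketch below uses points that lie exactly on the curve y = x^2 (chosen here purely as an example) and computes r for them.

```python
from math import sqrt

# Points on a curve, not a line: y = x squared for x = 1..10.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

r = sxy / sqrt(sxx * syy)
print(round(r, 3))  # close to 1 even though the shape is curved
```

The value of r is high because the points rise steadily, but only a scatterplot reveals that the shape is a curve.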
Can you use the correlation coefficient to show causation?
No. Correlation alone cannot show causation; that takes a controlled experiment, not observational data.
What do you call the y-variable?
Response variable, Predicted variable, Dependent variable
What is the coefficient of determination?
Simply put, it's the correlation coefficient squared. This is often called the r-squared. This is usually then multiplied by 100 to change it to a %.
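The arithmetic is short enough to sketch directly. The function name below is made up for illustration, and the two r values are the ones used in the height and weight questions earlier in this set.

```python
def coefficient_of_determination_percent(r):
    """Square the correlation coefficient and express it as a percent."""
    return r ** 2 * 100

for r in (0.65, 0.61):
    print(r, round(coefficient_of_determination_percent(r), 2))
```

Rounded to two decimal places, these give 42.25% and 37.21%, matching the answers to those questions.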
What does a curved line denote on a scatterplot?
A nonlinear association.
What do you call the x-variable?
The Explanatory variable, Predictor variable or Independent variable
What is the law of large numbers?
The law of large numbers is a principle of probability according to which the frequencies of events with the same likelihood of occurrence even out, given enough trials or instances. As the number of experiments increases, the actual ratio of outcomes will converge on the theoretical, or expected, ratio of outcomes.
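A simulated fair coin gives a feel for this. The sketch below is only an illustration: the seed value, the function name proportion_heads, and the trial counts are arbitrary choices.

```python
import random

random.seed(146)  # arbitrary seed so a rerun gives the same flips

def proportion_heads(n_flips):
    """Flip a simulated fair coin n_flips times; return the share of heads."""
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The empirical proportion should settle near the theoretical 0.5
# as the number of flips grows.
for n in (10, 100, 10_000, 100_000):
    print(n, proportion_heads(n))
```

Small runs can land far from 0.5, while the largest runs should sit very close to it.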
What are the two kinds of probabilities?
Theoretical probabilities, which are based on specific assumptions, and empirical probabilities, which are based on observation.
What does random mean?
There is no predictable pattern occurring.
What is an influential point?
This is a data point that can greatly affect your conclusion. You should analyze the data with and without the point and comment on how it changes the conclusion.
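The "with and without" comparison can be sketched as follows. The data values and the helper name fit_slope are invented for illustration; the last point is deliberately placed far from the rest.

```python
def fit_slope(points):
    """Least-squares slope for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

# Invented data: five points with an upward trend, plus one far-off point.
base_points = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]
suspect_point = (15, 0)

slope_without = fit_slope(base_points)
slope_with = fit_slope(base_points + [suspect_point])
print(slope_without, slope_with)
```

In this made-up example a single point is enough to turn a positive slope into a negative one, which is the kind of change worth commenting on.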
What does a large amount of scatter depict in a scatterplot?
This shows a weak association
How do you find the intercept for the regression line?
To find the intercept (a), first find the means of x and y and the slope b. Then a = (mean of y) - b * (mean of x).
What do we look for in a scatterplot?
We look at trend (similar to center), strength (similar to spread) and shape.
How do you find slope for the regression line?
b = r(sy/sx). The slope of the regression line is the ratio of the standard deviations of the two variables (sy over sx) multiplied by the correlation coefficient.
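The slope formula, together with the standard least-squares relation a = (mean of y) - b * (mean of x) for the intercept, can be sketched with Python's statistics module. The five data pairs are invented for illustration.

```python
from statistics import mean, stdev

# Invented data pairs.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x, mean_y = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations

# Correlation coefficient from its definition.
n = len(xs)
r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)

b = r * s_y / s_x          # slope: b = r(sy/sx)
a = mean_y - b * mean_x    # intercept: uses the means of x and y

print(round(a, 2), round(b, 2))
```

For these values the fitted line works out to y = 2.2 + 0.6x.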
What are the two categories that you can break the numerical category into?
discrete outcomes/variables; continuous outcomes/variables
What is a statistician's equation for a line?
y = a + bx, where a is the intercept, b is the slope, and y is the predicted value of the response variable.