Stat 146 Test Two

What are some pitfalls for linear regression?

1) Don't fit linear models to nonlinear associations
2) Correlation is not causation
3) Beware of outliers
4) Be wary of regressions of aggregate data
5) Don't extrapolate

What do you call a decreasing trend?

A negative association or negative trend

What is the correlation coefficient?

A number that measures the strength of the linear association between two numerical variables

What do you call an increasing trend?

A positive association or positive trend

Predicted Time = 0.88 + (1.86 * Thousand Miles) Interpret the slope in the context of the problem. Select the correct choice below and fill in the answer box to complete your choice. (Round to two decimal places as needed.) A. For every additional thousand miles, on average, the time goes up by _____ hours. B. For every additional hour, on average, the number of miles goes up by _____ thousand.

A. For every additional thousand miles, on average, the time goes up by 1.86 hours.
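As a quick check on this interpretation, the sketch below evaluates the card's equation at two distances one thousand miles apart. The difference between the two predictions equals the slope. The function name `predicted_time` is made up for illustration.

```python
def predicted_time(thousand_miles: float) -> float:
    """Predicted time in hours, using the equation from the card."""
    intercept = 0.88  # predicted hours at 0 thousand miles
    slope = 1.86      # extra hours per additional thousand miles
    return intercept + slope * thousand_miles


# Predictions one thousand miles apart differ by the slope.
difference = predicted_time(3) - predicted_time(2)
print(round(difference, 2))
```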

Explain how to find the slope and intercept for the displayed output. Choose the correct answer below.
Simple linear regression results:
Dependent Variable: Armspan
Independent Variable: Height
Armspan = -9.2324325 + 2.721622 * Height
Sample size: 15
R (correlation coefficient) = 0.9215
R-sq = 0.84924175
Estimate of error standard deviation: 3.532004
A. In the fourth line, the slope is multiplied by the Height, and the intercept is the constant. B. The value of R is the intercept, and slope is multiplied by the Height in the fourth line. C. The value of R is the intercept, and slope is the constant in the fourth line. D. In the fourth line, the slope is the constant, and the intercept is multiplied by the Height.

A. In the fourth line, the slope is multiplied by the Height, and the intercept is the constant.

If there is a positive correlation between number of years studying reading and thumb length (for children), does that prove that longer thumbs cause more studying of reading, or vice versa? Can you think of a hidden variable that might be influencing both of the other variables? Choose the correct answer below. A. It does not prove causation, because older children have longer thumbs and have studied reading longer. Longer thumbs do not cause an increase in years of studying. The hidden variable is age. B. It proves causation, because the two variables are related implying causation. Longer thumbs cause an increase in years of studying. The hidden variable is age. C. It proves causation, because the two variables are related implying causation. Longer thumbs cause an increase in years of studying. There is no hidden variable. D. It does not prove causation, because older children have longer thumbs and have studied reading longer. Longer thumbs do not cause an increase in years of studying. There is no hidden variable.

A. It does not prove causation, because older children have longer thumbs and have studied reading longer. Longer thumbs do not cause an increase in years of studying. The hidden variable is age.

What is a regression of aggregate data?

A regression of aggregate data fits the regression line to group summaries, such as the means of several large groups, instead of to individual observations. For example, using the mean for each state gives 50 data points instead of one data point per person.

Some investors use a technique called the "Dogs of the Dow" to invest. They pick several stocks that are performing poorly from the Dow Jones group (which is a composite of 30 well-known stocks) and invest in these. Explain why these stocks will probably do better than they have done before. Choose the correct answer below. A. The poor historical performance puts the stocks outside of the normal range of explanatory variables. Making predictions based on these values is extrapolation, which may result in predictions that increase in a nonlinear fashion. B. Part of the poor historical performance could be due to chance, and if so, regression toward the mean predicts that stocks turning in a lower-than-average performance should tend to perform closer to the mean in the future. In other words, they should increase. C. The poor historical performance puts the stocks outside of the normal range of explanatory variables. Making predictions based on these values is extrapolation, which may result in influential points which pull the predicted price of the stocks upward. D. The poor historical performance could be due to the stock's fundamental weakness, and if so, regression toward the mean predicts that stocks turning in a lower-than-average performance should tend to perform closer to the mean in the future. In other words, they should increase.

B. Part of the poor historical performance could be due to chance, and if so, regression toward the mean predicts that stocks turning in a lower-than-average performance should tend to perform closer to the mean in the future. In other words, they should increase.
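Regression toward the mean can be illustrated with a small simulation. This is only a sketch under made-up assumptions, not a model of real stocks: each hypothetical stock's performance in a period is a fixed "quality" value plus independent chance, and the number of stocks is far larger than the Dow's 30 so the effect is easy to see.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

NUM_STOCKS = 10_000  # hypothetical; chosen large to make the pattern clear

# Assumption: performance = underlying quality + chance, both standard normal.
quality = [rng.gauss(0, 1) for _ in range(NUM_STOCKS)]
period_one = [q + rng.gauss(0, 1) for q in quality]
period_two = [q + rng.gauss(0, 1) for q in quality]

# The "dogs": the 10% of stocks with the worst period-one performance.
ranked = sorted(range(NUM_STOCKS), key=lambda i: period_one[i])
dogs = ranked[: NUM_STOCKS // 10]

dogs_before = mean([period_one[i] for i in dogs])
dogs_after = mean([period_two[i] for i in dogs])

print(f"dogs, period one: {dogs_before:.2f}")
print(f"dogs, period two: {dogs_after:.2f}")
```

In this setup the dogs' average in the second period is still below the overall mean of 0, but closer to it than in the first period, because the bad luck that helped put them at the bottom does not repeat.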

The correlation between height and arm span in a sample of adult women was found to be r = 0.944. The correlation between arm span and height in a sample of adult men was found to be r = 0.859. Which association, the association between height and arm span for women, or the association between height and arm span for men, is stronger? Explain. Choose the correct answer below. A. The association between height and arm span for men is stronger because the value of r is farther from 0. B. The association between height and arm span for women is stronger because the value of r is farther from 0. C. The association between height and arm span for women is stronger because the value of r is closer to 0. D. The association between height and arm span for men is stronger because the value of r is closer to 0.

B. The association between height and arm span for women is stronger because the value of r is farther from 0.

If the correlation between height and weight of a large group of people is 0.65, find the coefficient of determination (as a percent) and explain what it means. Assume that height is the predictor and weight is the response, and assume that the association between height and weight is linear. Choose the correct answer below. A. The coefficient of determination is 65%. Therefore, 65% of the variation in height can be explained by the regression line. B. The coefficient of determination is 42.25%. Therefore, 42.25% of the variation in weight can be explained by the regression line. C. The coefficient of determination is 65%. Therefore, 65% of the variation in weight can be explained by the regression line. D. The coefficient of determination is 42.25%. Therefore, 42.25% of the variation in height can be explained by the regression line.

B. The coefficient of determination is 42.25%. Therefore, 42.25% of the variation in weight can be explained by the regression line.

Explain how to find the slope and intercept for the displayed output. Choose the correct answer below.
              Coefficients
Intercept     -9.232432
X Variable    2.721622
A. The slope is the X Variable value divided by the Intercept value, and the intercept is the Intercept value. B. The slope is the X Variable value, and the intercept is the Intercept value. C. The slope is the Intercept value divided by the X Variable value, and the intercept is the Intercept value. D. The slope is not given, and the intercept is the Intercept value.

B. The slope is the X Variable value, and the intercept is the Intercept value.

Explain how to find the slope and intercept for the displayed output. Choose the correct answer below.
LinReg
y = a + bx
a = -9.232432457
b = 2.721621622
r squared = 0.84924175
r = 0.921543135
A. The b-value is the intercept, and the a-value is the slope. B. The a-value is the intercept, and the b-value is the slope. C. The r-value is the intercept, and the b-value is the slope. D. The r-value is the intercept, and the a-value is the slope.

B. The a-value is the intercept, and the b-value is the slope.

How is the coefficient of determination related to the correlation, and what does the coefficient of determination show? A. The coefficient of determination is the square of the correlation, and it shows the strength and direction of the linear association between two variables. B. The coefficient of determination is the square root of the correlation, and it shows the strength and direction of the linear association between two variables. C. The coefficient of determination is the square of the correlation, and it shows the proportion of the variation in the response variable that is explained by the explanatory variable. D. The coefficient of determination is the square root of the correlation, and it shows the proportion of the variation in the response variable that is explained by the explanatory variable.

C. The coefficient of determination is the square of the correlation, and it shows the proportion of the variation in the response variable that is explained by the explanatory variable.

When describing two-variable associations, a written description should always include trend, shape, strength, and which of the following? Choose the correct answer below. A. The number of pairs in the data set B. The name of the person who gathered the data C. The context of the data D. All of the above

C. The context of the data

Explain how to find the slope and intercept for the displayed output. Choose the correct answer below. The regression equation is: Armspan = -9.2 + 2.72 * Height A. The slope is the number being multiplied by Height, and the intercept is the constant divided by the sample size. B. The slope is the constant, and the intercept is the number being multiplied by Height. C. The slope is the number being multiplied by Height, and the intercept is the constant. D. The slope is the constant, and the intercept is the number being multiplied by Height divided by the sample size.

C. The slope is the number being multiplied by Height, and the intercept is the constant.

The scatterplot shows the heights of mothers and daughters. Daughter = 25.53 + 0.628 * Mother Interpret the slope. Choose the correct answer below. A. For each additional inch in the daughter's height, the average mother's height increases by about 0.628 inch. B. The height of the average mother is about 0.628 times the height of the daughter. C. The height of the average daughter is about 0.628 times the height of the mother. D. For each additional inch in the mother's height, the average daughter's height increases by about 0.628 inch.

D. For each additional inch in the mother's height, the average daughter's height increases by about 0.628 inch.

If the correlation between height and weight of a large group of people is 0.61, find the coefficient of determination (as a percent) and explain what it means. Assume that height is the predictor and weight is the response, and assume that the association between height and weight is linear. Choose the correct answer below. A. The coefficient of determination is 61%. Therefore, 61% of the variation in height can be explained by the regression line. B. The coefficient of determination is 37.21%. Therefore, 37.21% of the variation in height can be explained by the regression line. C. The coefficient of determination is 61%. Therefore, 61% of the variation in weight can be explained by the regression line. D. The coefficient of determination is 37.21%. Therefore, 37.21% of the variation in weight can be explained by the regression line.

D. The coefficient of determination is 37.21%. Therefore, 37.21% of the variation in weight can be explained by the regression line.

What is a regression line?

It is a tool for making predictions about future observed values.

What is the simplest shape for a scatterplot?

Linear, which is depicted by a straight line.

Does changing the order of variables change the r (correlation coefficient)?

No. Swapping which variable is x and which is y has no effect on r, because r measures the strength of the linear association between the two variables and treats them symmetrically.
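The symmetry is easy to see when r is written as the sum of products of z-scores divided by n - 1: swapping the variables only swaps the two factors in each product. A minimal sketch, using made-up height and arm span values:

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)


heights = [60, 62, 65, 68, 70]   # made-up values, in inches
armspans = [59, 63, 64, 69, 71]  # made-up values, in inches

r_height_first = correlation(heights, armspans)
r_armspan_first = correlation(armspans, heights)
print(r_height_first, r_armspan_first)
```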

Can you use the correlation coefficient with linear and nonlinear scatterplots?

No. The correlation coefficient is only meaningful for linear associations.

Does having a correlation coefficient close to +1 or -1 tell you whether the relationship is linear or nonlinear?

No, you must plot/graph it out to see if it's linear or nonlinear.
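A small illustration of why the plot is needed: the made-up data below follow an exactly curved pattern (y = x squared), yet r still comes out at roughly 0.97. The helper `correlation` is written out here for illustration only.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson's r from sums of squared and cross deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = list(range(1, 11))    # 1, 2, ..., 10
ys = [x ** 2 for x in xs]  # an exactly quadratic (nonlinear) relationship

r = correlation(xs, ys)
print(round(r, 3))
```

The number alone would suggest a strong linear association; only the scatterplot reveals the curve.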

Can you use the correlation coefficient to show causation?

No. A correlation coefficient only describes association. The data are observational, not from a controlled experiment.

What do you call the y-variable?

Response variable, Predicted variable, Dependent variable

What is the coefficient of determination?

Simply put, it's the correlation coefficient squared. This is often called the r-squared. This is usually then multiplied by 100 to change it to a %.
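The arithmetic can be sketched in a couple of lines. The values 0.65 and 0.61 are the correlations from the practice questions in this set; the function name is made up for illustration.

```python
def coefficient_of_determination_percent(r: float) -> float:
    """Square the correlation coefficient and express it as a percent."""
    return r ** 2 * 100


for r in (0.65, 0.61):
    print(r, round(coefficient_of_determination_percent(r), 2))
```

For r = 0.65 this gives 42.25%, and for r = 0.61 it gives 37.21%.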

What does a curved line denote on a scatterplot?

A nonlinear association.

What do you call the x-variable?

The Explanatory variable, Predictor variable or Independent variable

What is the law of large numbers?

The law of large numbers is a principle of probability according to which the frequencies of events with the same likelihood of occurrence even out, given enough trials or instances. As the number of experiments increases, the actual ratio of outcomes will converge on the theoretical, or expected, ratio of outcomes.
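A simulated coin flip gives a feel for this. The sketch below assumes a fair coin (theoretical probability of heads 0.5) and uses a fixed seed so the run is repeatable; the function name `heads_proportion` is made up for illustration.

```python
import random


def heads_proportion(num_flips: int, seed: int = 0) -> float:
    """Simulate fair coin flips and return the observed proportion of heads."""
    rng = random.Random(seed)
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    return heads / num_flips


# The observed proportion tends to settle near 0.5 as the number of flips grows.
for n in (10, 1_000, 100_000):
    print(n, heads_proportion(n))
```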

What are the two kinds of probabilities?

Theoretical probabilities, which are based on specific assumptions, and empirical probabilities, which are based on observation.

What does random mean?

There is no predictable pattern occurring.

What is an influential point?

This is a data point that can greatly affect your conclusion. You should analyze the data with and without the point and comment on how it changes the conclusion.
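The with-and-without comparison can be sketched as follows. The data are made up: five points that lie exactly on y = 2x, plus one far-off point. The slope is computed with the usual least-squares formula, which gives the same value as b = r(sy/sx).

```python
def least_squares_slope(points):
    """Slope of the least-squares line through (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx


points = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]  # made-up, exactly y = 2x
influential = (10, 0)                               # made-up far-off point

slope_without = least_squares_slope(points)
slope_with = least_squares_slope(points + [influential])
print(slope_without, slope_with)
```

Here a single point is enough to flip the trend from positive to negative, which is why the conclusion should be reported both ways.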

What does a large amount of scatter depict in a scatterplot?

This shows a weak association.

How do you find the intercept for the regression line?

To find the intercept (a), first find the slope (b) and the means of the variables x and y. Then a = (mean of y) - b * (mean of x).

What do we look for in a scatterplot?

We look at trend (similar to center), strength (similar to spread) and shape.

How do you find slope for the regression line?

b = r(sy/sx), where sx and sy are the standard deviations of x and y. The slope of the regression line is the correlation coefficient multiplied by the ratio of the standard deviation of y to the standard deviation of x.
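Putting the pieces together, here is a minimal sketch that computes the slope from b = r(sy/sx) and then the intercept from the standard formula a = (mean of y) - b * (mean of x). The data values and function names are made up for illustration.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson's r using sample standard deviations."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cross / ((len(xs) - 1) * sx * sy)


def regression_line(xs, ys):
    """Return (intercept a, slope b) for the line y = a + bx."""
    r = correlation(xs, ys)
    b = r * stdev(ys) / stdev(xs)  # slope: b = r(sy/sx)
    a = mean(ys) - b * mean(xs)    # intercept: a = mean of y - b * mean of x
    return a, b


xs = [1, 2, 3, 4, 5]  # made-up predictor values
ys = [2, 4, 5, 4, 5]  # made-up response values

a, b = regression_line(xs, ys)
print(round(a, 2), round(b, 2))
```

For these values the line works out to y = 2.2 + 0.6x.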

What two categories can numerical variables be broken into?

Discrete outcomes/variables and continuous outcomes/variables

What is a statistician's equation for a line?

y = a + bx, where a is the intercept (the y-intercept), b is the slope, and y is the predicted value of the response variable.

