Statistics Chapter 5

lurking variable

A caution about correlation and regression... a variable not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among these variables.

extrapolation

A caution about correlation and regression... the use of a regression line for predictions outside the range of x-values that you used to obtain the line... Such predictions are often not accurate.

-0.5. r^2 tells us the fraction of the variation in final grades that is explained by number of absences, so r^2 = 0.25. Taking the square root of 0.25 gives 0.5. Since more absences are associated with lower final grades, r is negative. Hence, r = -0.5.

A college professor investigated the relationship between the number of absences and the final grade for the course. He found that more absences were associated with lower final grades. Number of absences explained 25% of the variation in final grades. The correlation coefficient for this relationship will be ______. Numeric answer needed.
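
As a quick check of the arithmetic on this card, here is a minimal Python sketch. The variable names are illustrative and do not come from the course material.

```python
import math

r_squared = 0.25   # 25% of the variation in final grades is explained
sign = -1          # more absences go with lower grades, so r is negative

# r has the magnitude sqrt(r^2) and the sign of the association.
r = sign * math.sqrt(r_squared)
print(r)  # -0.5
```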

(3, 72). A regression line always passes through the point (x̄, ȳ). So, the point (3, 72) must be on the regression line.

A college professor investigated the relationship between the number of absences and the final grade for the course. He found that more absences were associated with lower final grades. The average number of absences was 3 and the average final grade was 72. Thus, the point (___, ___) must be on the regression line.
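
The fact that the least-squares line passes through (x̄, ȳ) can be checked numerically. The sketch below uses made-up data, chosen only so that the mean number of absences is 3 and the mean grade is 72. These are not the professor's data.

```python
from statistics import mean

# Hypothetical data chosen so that mean absences = 3 and mean grade = 72.
absences = [0, 1, 3, 5, 6]
grades = [85, 80, 72, 65, 58]

x_bar, y_bar = mean(absences), mean(grades)

# Least-squares slope b and intercept a = y_bar - b * x_bar.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(absences, grades))
s_xx = sum((x - x_bar) ** 2 for x in absences)
b = s_xy / s_xx
a = y_bar - b * x_bar

# Predicting at x = x_bar returns y_bar (up to floating-point rounding).
prediction_at_mean = a + b * x_bar
print(x_bar, y_bar, round(prediction_at_mean, 6))
```

Because the intercept is defined as a = ȳ - b x̄, substituting x = x̄ into ŷ = a + bx always gives ȳ, whatever the data.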

prediction "error"

A residual is the remaining _______________ "_______" after we have calculated the regression line.

one, r

Along the regression line, a change of ____ standard deviation in x corresponds to a change of ____ standard deviations in y.

association

An ________________ between an explanatory variable x and a response variable y, even if it is very strong, is not by itself good evidence that changes in x actually cause changes in y.

0.8. If 64% of the variation is explained, then r^2 = 0.64. The correlation is the square root of this, and it is positive because height increases with age.

An investigation of the relationship between height of a child and age (in months) found that age explained 64% of the variation in height. Knowing this, the correlation between age and height for children is ___. Round your answer to the nearest tenth: X.X

influential

An observation is _______________ for a statistical calculation if removing it would markedly change the result of the calculation.

No

Are correlation and least-squares regression lines resistant to outliers and influential observations?

linear

Correlation and regression lines describe only __________ relationships.

ŷ = 20 + 4x. If Dan starts with $20 in his box, then the y-intercept is 20. Adding $4 per week gives a slope of 4. So the regression line is ŷ = 20 + 4x.

Dan's grandmother gave him $30 for his birthday. He decides to put $20 in a box for saving. Every week his parents give him $5 for his allowance. He decides to start adding $4 of his allowance to the box every week. What is the regression line for predicting the amount of money (ŷ ) that will be in his box from x weeks of adding to the box?
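
The line ŷ = 20 + 4x can be written as a small function to see how the intercept and slope behave. This is a minimal sketch, and the function name is made up for illustration.

```python
def predicted_savings(weeks):
    """Predicted dollars in the box after `weeks` weekly deposits."""
    intercept = 20  # dollars placed in the box at the start
    slope = 4       # dollars added each week
    return intercept + slope * weeks

print(predicted_savings(0))   # 20
print(predicted_savings(10))  # 60
```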

r. Interchanging x and y generally changes the slope and the y-intercept, but it does not change the correlation r.

Expressing the regression equation in terms of the x variable instead of the y variable will not cause the ____ to change.
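
To see this numerically, the sketch below fits y on x and then x on y for a small made-up data set. The helper `slope_and_r` and the data are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean

def slope_and_r(xs, ys):
    """Return the least-squares slope of ys on xs and the correlation r."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / s_xx, s_xy / sqrt(s_xx * s_yy)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 7, 7, 10]

slope_y_on_x, r_xy = slope_and_r(x, y)
slope_x_on_y, r_yx = slope_and_r(y, x)

print(slope_y_on_x, slope_x_on_y)  # the two slopes differ
print(r_xy, r_yx)                  # the two correlations agree
```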

To model how a response variable y changes as an explanatory variable x changes. A regression line models the linear relationship between an explanatory variable x and a response variable y. We often use this line to predict the value of y for a given value of x. Note: We do not use the regression line to predict the value of x for a given value of y. The explanatory variable x is used to predict the response variable y.

The general form for a linear equation is given as: y = a + bx. What is the purpose of this equation?

same sign

The slope and the correlation always have the ________ _______.

correlation, slope

There is a close connection between _______________ and the ________ of the least-squares line.

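True. Slope b = r (sy / sx), so increasing x by one standard deviation sx changes the predicted y by b × sx = r × sy, that is, r standard deviations of y. When r = 1, the change in the predicted y (in standard deviation units) is the same as the change in x.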

True or false? A change of one standard deviation in x corresponds to a change of r standard deviations in y.
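
The relationship b = r (sy / sx) can be verified on a small example. The data below are made up, and the variable names are chosen for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 7, 7, 10]

x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

r = s_xy / sqrt(s_xx * s_yy)   # correlation
sx, sy = stdev(x), stdev(y)    # sample standard deviations
b = r * sy / sx                # slope of the least-squares line

# Moving x up by one standard deviation moves the prediction by b * sx,
# which equals r * sy, that is, r standard deviations of y.
change_in_prediction = b * sx
print(change_in_prediction / sy, r)
```

The slope computed this way agrees with the direct least-squares formula `s_xy / s_xx`, which is why the two descriptions of the slope are interchangeable.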

False. A regression line of this form always uses a straight line to describe the relationship between x and y.

True or false? A regression model given in general form as y = a + bx may be used if a curved or straight line describes the relationship between the variables x and y.

True. The coefficient of x in the model tells us how y changes when x increases by 1 unit. Here, x represents week, so for each additional week, his saving increases by $8.

True or false? Dan has been saving money each week in a box under his bed. The equation that predicts how much money he has is ŷ = 20 + 8x, where x is the number of weeks. This equation tells us that each week his savings increase by $8.

False. The size of the slope has no impact on whether the prediction of y is accurate. The correlation between the x and y variables gives information on whether the prediction is accurate. The size of the slope depends on the units in which we measure the two variables.

True or false? If the value for slope is smaller than 0.5, then the least squares regression line will not give accurate predictions for y.

False. The correlation is r and the percent of variation explained by the model is r^2. If r = 0.25, r^2 is 0.25^2, or 0.0625 (6.25%).

True or false? In a model to relate price of a luxury house and the square footage, we found a correlation of 0.25. Square footage explained 25% of the variation in price.

True. The eyes can be fooled at times with graphs. Deleting the point and determining how it does or does not affect your results is the only way to really decide if it is influential.

True or false? The best way to decide if a point is really influential is to delete it and recompute all the statistics.
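
The delete-and-recompute check can be sketched in a few lines. The data are invented so that the last point is an outlier in the x direction, and `ls_slope` is a hypothetical helper, not a library function.

```python
from statistics import mean

def ls_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

# Hypothetical data: the last point (20, 5) is an outlier in x.
x = [1, 2, 3, 4, 5, 20]
y = [2, 4, 6, 8, 10, 5]

slope_all = ls_slope(x, y)                # fit with every point
slope_without = ls_slope(x[:-1], y[:-1])  # fit with the outlier removed

print(slope_all, slope_without)
```

In this example the first five points lie exactly on y = 2x, so removing the sixth point restores a slope of 2, while keeping it pulls the slope close to zero. A change that large marks the point as influential.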

True. Although other factors such as age and lot size affect the price, size can be considered to cause the increase in price. This is because the effect is seen repeatedly and has a plausible physical relationship (larger houses take more materials and more labor to build.)

True or false? Thinking simply, examining the relationship between the price of a house and the size of a house, we can say that size does cause price to increase.

zero

We can calculate the residual for each individual in the data set. Because of the mathematics of least-squares regression, the sum (and therefore the mean) of the residuals is always _______.
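
A short numerical check of this property, using made-up data and residual = observed y - predicted y:

```python
from statistics import mean

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 7, 7, 10]

x_bar, y_bar = mean(x), mean(y)
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))   # least-squares slope
a = y_bar - b * x_bar                         # least-squares intercept

# One residual per observation: observed y minus predicted y.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print([round(e, 3) for e in residuals])
print(round(sum(residuals), 9))  # zero up to floating-point rounding
```

Points above the line contribute positive residuals and points below contribute negative ones, and for a least-squares fit these cancel.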

residual plot

We hope to see no significant patterns -- i.e., random scatter that is evenly spread about the residual = 0 line -- in the ___________ _______.

A. A valid experiment. Lurking variables are always potential problems in observational studies. A valid experiment is necessary to draw conclusions about the explanatory variable causing changes in the response variable.

What type of study must be conducted in order to establish that the explanatory variable causes changes in the response variable? A. A valid experiment. B. A multistage random sample. C. A sample survey. D. An observational survey.

B. 98%. r^2 gives the percentage of variation in y that is explained by the least squares regression line. 98% is the largest of these r^2 values; it is associated with the line explaining the most variation in y.

Which one of the following r^2 values is associated with the line explaining the most variation in y? A. 84% B. 98% C. 76% D. 57%

D. y = 5.8 + 0.15x, r^2 = 0.9. This line has the highest r^2 and hence will give the best predictions.

Which regression line will give the best predictions? A. y = 0.3 + 150x, r^2 = 0.5 B. y = 100 + 150x, r^2 = 0.7 C. y = 0.3 + 5x, r^2 = 0.8 D. y = 5.8 + 0.15x, r^2 = 0.9

sign

The _______ of the residual reflects the observation's position relative to the regression line.

r

In the equation b = r (sy / sx), ____ is the correlation.

sx

In the equation b = r (sy / sx), ____ is the standard deviation of x.

sy

In the equation b = r (sy / sx), ____ is the standard deviation of y.

a

In the equation y = a + bx, ____ is the intercept, the value of y when x = 0.

y

In the equation y = a + bx, ____ is the predicted y-value.

b

In the equation y = a + bx, ____ is the slope.

x̄, ȳ

In the equation a = ȳ - b x̄, ____ and ____ are the sample means of their respective variables.

least-squares regression

Mathematically, the ________-__________ ______________ line of y on x is the line that minimizes the sum of the squared vertical deviations between the data points and the line.

positive, negative

Observations above the regression line have ____________ residuals. Observations that are below the regression line have ____________ residuals.

x

Points that are outliers in the ____ direction are often influential for the least-squares regression line.

strength

The correlation describes the ____________ of a straight-line relationship in a specific way: the square of the correlation r^2, is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.

regression

The distinction between explanatory and response variables is essential in _______________. If we reverse the roles of the two variables, we get a different least-squares line.

intercept

The equation a = ȳ - b x̄ is used to calculate the _____________ a.

slope

The equation b = r (sy / sx) is how we calculate the ________.

Yes. Weight can be a lurking variable. Just FYI, a strong association between being overweight and pain in the lower-body joints is consistently seen. Further, being overweight precedes problems with knees, hips, and ankles, and the cause is plausible: carrying the extra weight puts stress on the legs and feet.

Researchers interviewed a group of women with knee pain awaiting knee replacement surgery. They also interviewed a group of women from the same geographical area with no knee pain. These researchers reported that wearing high-heeled shoes caused the knee pain which required surgery. Could the women's weight, rather than high-heeled shoes, be a reasonable explanation for their knee pain?

strength, direction

Scatterplots and correlation coefficients are helpful in describing the ____________ and ______________ of the relationship between two quantitative variables.

ecological correlation

a caution about correlation and regression... a correlation based on averages rather than on individuals... typically stronger than correlation for individuals

residual plot

a scatterplot of the regression residuals against the explanatory variable... helps us assess how well a regression line fits the data... a good diagnostic tool for least-squares regression

regression line

a straight line that describes how a response variable y changes as an explanatory variable x changes... we often use this line to predict the value of y for a given value of x.

slope

b... the amount by which y changes when x changes by one unit.

greater

r^2 can never be _______ than 1, although it could be 1 in a perfect linear relationship with no scatter.

observed, predicted

residual = ___________ y - ___________ y

residual

the difference between an observed value of the response variable and the value predicted by the regression line

