MATH-164 - Chapter 4
The linear correlation coefficient is always between
−1 and 1, inclusive.
In a scatter diagram, the _______ variable is plotted on the horizontal axis and the _______ variable is plotted on the vertical axis.
Explanatory, response
The closer r is to +1, the _______ the evidence is of ________ association between the two variables.
Stronger, Positive
Two variables that are linearly related are negatively associated when above-average values of one variable are associated with below-average values of the other variable.
That is, two variables are negatively associated if, whenever the value of one variable increases, the value of the other variable decreases.
What does it mean to say that two variables are negatively associated?
There is a linear relationship between the variables, and whenever the value of one variable increases, the value of the other variable decreases.
What does it mean to say that two variables are positively associated?
There is a linear relationship between the variables, and whenever the value of one variable increases, the value of the other variable increases.
lurking variable
A variable other than x and y that simultaneously affects both variables, accounting for the correlation between the two. It is an explanatory variable that was not considered in the study but affects the value of the response variable. In addition, lurking variables are typically related to explanatory variables considered in the study.
positive linear correlation coefficient
means that the sum of the products of the z-scores for x and y must be positive.
response (dependent) variable
The variable of interest (it measures the outcome of a study). Its value can be explained by the value of the explanatory (or predictor or independent) variable.
Confounding
in a study occurs when the effects of two or more explanatory variables are not separated. Therefore, any relation that may exist between an explanatory and response variable may be due to some other variable or variables not accounted for in the study.
Match the linear correlation coefficient to the scatter diagram. The scales on the x- and y-axis are the same for each scatter diagram. (a) r=−1, (b) r=−0.049 (c) r=−0.0810
(a) Scatter diagram I. (b) Scatter diagram II. (c) Scatter diagram III.
Confounding Variables (CV)
Also known as extraneous variables. These variables cannot be controlled by the researcher and could influence any change in the dependent variable (DV). A confounding variable is a third variable that can distort the relation between the independent variable and the dependent variable, which biases the experiment. In a study, it is an explanatory variable that was considered but whose effect cannot be distinguished from that of a second explanatory variable.
Two variables that are linearly related are positively associated when above-average values of one variable are associated with above-average values of the other variable (or below-average values of one variable are associated with below-average values of the other variable).
That is, two variables are positively associated if, whenever the value of one variable increases, the value of the other variable also increases.
A pediatrician wants to determine the relation that may exist between a child's height and head circumference. She randomly selects 8 children, measures their height and head circumference, and obtains the data shown in the table. (a) If the pediatrician wants to use height to predict head circumference, determine which variable is the explanatory variable and which is the response variable.
(a) The explanatory variable is height and the response variable is head circumference. To draw the scatter diagram, plot height on the x-axis and head circumference on the y-axis.
The accompanying data represent the number of days absent, x, and the final exam score, y, for a sample of college students in a general education course at a large state university. Data as (number of absences x, final exam score y): (0, 88.6), (1, 85.6), (2, 82.7), (3, 80.7), (4, 77.5), (5, 73.6), (6, 63.5), (7, 71.2), (8, 66.1), (9, 66.7). Complete parts (a) through (e) below. (a) Find the least-squares regression line treating number of absences as the explanatory variable and the final exam score as the response variable. (b) Interpret the slope and the y-intercept, if appropriate. Choose the correct answer below and fill in any answer boxes in your choice. (c) Predict the final exam score for a student who misses five class periods. (d) Draw the least-squares regression line on the scatter diagram of the data. Choose the correct graph below. (e) Would it be reasonable to use the least-squares regression line to predict the final exam score for a student who has missed 15 class periods? Why or why not?
(a) ŷ = −2.707x + 87.8 (b) For every additional absence, a student's final exam score drops 2.707 points, on average. The average final exam score of students who miss no classes is 87.8. (c) ŷ = −2.707(5) + 87.8 = 74.27 (rounded to two decimal places). Residual for the student with 5 absences: observed − predicted = 73.6 − 74.27 = −0.67 (rounded to two decimal places). Is the final exam score above or below average for this number of absences? Below (73.6 is below the predicted average for 5 absences). (e) No, because 15 absences is outside the scope of the model (the given data only go up to 9 absences).
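The numbers on this card can be reproduced by coding the least-squares formulas directly. The following is a minimal Python sketch, assuming the ten (absences, score) pairs listed in the problem; the names `least_squares`, `absences`, and `scores` are illustrative and do not come from the course materials.

```python
from statistics import mean

# Data from the problem: number of absences (x) and final exam score (y).
absences = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
scores = [88.6, 85.6, 82.7, 80.7, 77.5, 73.6, 63.5, 71.2, 66.1, 66.7]


def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line for paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = least_squares(absences, scores)

# Part (c): prediction for 5 absences, then residual = observed - predicted.
predicted_5 = slope * 5 + intercept
residual_5 = 73.6 - predicted_5

print(f"slope={slope:.3f}, intercept={intercept:.1f}")
print(f"predicted={predicted_5:.2f}, residual={residual_5:.2f}")
```

The slope and intercept round to −2.707 and 87.8, and the residual for the student with 5 absences is negative, which agrees with the "below average" answer.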
scatter diagram (scatterplot)
A plot of paired (x, y) data with a horizontal x-axis and a vertical y-axis. The data are paired so that each value from one data set is matched with a corresponding value from a second data set. It helps determine whether there is some relationship between two variables. More precisely, it is a graph that shows the relationship between two quantitative variables measured on the same individual. Each individual in the data set is represented by a point in the scatter diagram. The explanatory variable is plotted on the horizontal axis (x), and the response variable is plotted on the vertical axis (y).
bivariate data
Consists of two variables, an explanatory and a response variable, usually quantitative; that is, data in which two variables are measured on an individual. For example, we might want to know whether the amount of cola consumed per week is related to a person's bone density. The individuals would be the people in the study, and the two variables would be the amount of cola consumed weekly and bone density.
Determine whether the following statement is true or false. If r is close to 0, then little or no evidence exists of a relation between the two quantitative variables.
False. A value of r close to zero does not imply no relation, just no linear relation.
T/F Which of the following is true of the least-squares regression line ŷ = b1x + b0?
The sign of the linear correlation coefficient, r, and the sign of the slope of the least-squares regression line, b1, are the same. The predicted value of y, ŷ, is an estimate of the mean value of the response variable for that particular value of the explanatory variable. The least-squares regression line always contains the point (x̄, ȳ). The least-squares regression line minimizes the sum of squared residuals.
The least-squares regression line minimizes the sum of the squared errors (or residuals).
This line minimizes the sum of the squared vertical distances between the observed values of y and those predicted by the line, ŷ (read "y-hat"). We represent this as "minimize Σ residuals²".
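Written out in symbols, the criterion and the line that satisfies it look as follows. This is the standard textbook formulation, not quoted from the course materials; here s_x and s_y denote the sample standard deviations of x and y, and r is the linear correlation coefficient.

```latex
\min_{b_0,\,b_1}\ \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2,
\qquad \hat{y}_i = b_1 x_i + b_0

b_1 = r\,\frac{s_y}{s_x},
\qquad
b_0 = \bar{y} - b_1\,\bar{x}
```

Because s_x and s_y are positive, b1 has the same sign as r, and the formula for b0 is what forces the line through the point (x̄, ȳ).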
Remember the idea of the slope of a line from algebra?
Two variables that are positively associated can be described by a line with positive slope; two variables that are negatively associated can be described by a line with negative slope.
A student at a junior college conducted a survey of 20 randomly selected full-time students to determine the relation between the number of hours of video game playing each week, x, and grade-point average, y. She found that a linear relation exists between the two variables. The least-squares regression line that describes this relation is ŷ = −0.0572x + 2.9205. (a) Predict the grade-point average of a student who plays video games 8 hours per week. (b) Interpret the slope. (c) If appropriate, interpret the y-intercept. (d) A student who plays video games 7 hours per week has a grade-point average of 2.65. Is the student's grade-point average above or below average among all students who play video games 7 hours per week?
(a) ŷ = −0.0572(8) + 2.9205 = 2.46 (b) For each additional hour that a student spends playing video games in a week, the grade-point average decreases by 0.0572 points, on average. (c) The mean grade-point average of students who do not play video games (x = 0) is 2.9205. (d) The student's grade-point average is above average for those who play video games 7 hours per week: ŷ = −0.0572(7) + 2.9205 = 2.52, and the observed 2.65 is greater than 2.52.
Is there a relation between the age difference between husbands and wives and the percent of a country that is literate? Researchers found the least-squares regression between age difference (husband age minus wife age), y, and literacy rate (percent of the population that is literate), x, is ŷ = −0.0424x + 8.2. The model applied for 17 ≤ x ≤ 100. Complete parts (a) through (e) below. (a) Interpret the slope. Select the correct choice below and fill in the answer box to complete your choice. (b) Does it make sense to interpret the y-intercept? Explain. Choose the correct answer below. (c) Predict the age difference between husband and wife in a country where the literacy rate is 43 percent. (d) Would it make sense to use this model to predict the age difference between husband and wife in a country where the literacy rate is 11%? (e) The literacy rate in a country is 98% and the age difference between husbands and wives is 2 years. Is this age difference above or below the average age difference among all countries whose literacy rate is 98%? Select the correct choice below and fill in the answer box to complete your choice.
(a) For every one-percentage-point increase in literacy rate, the age difference falls by 0.0424 years, on average. (b) No, it does not make sense to interpret the y-intercept because an x-value of 0 is outside the scope of the model. (c) 6.4 years: ŷ = −0.0424(43) + 8.2 = −1.8232 + 8.2 = 6.3768, or 6.4. (d) No, it does not make sense because an x-value of 11 is outside the scope of the model (11 is below the lower bound of 17). (e) Below: the average age difference among all countries whose literacy rate is 98% is 4.0 years, since ŷ = −0.0424(98) + 8.2 = 4.04, and the observed 2 years is less than that.
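Parts (b) and (d) both come down to checking whether x lies inside the range the model was fitted on. One way to make that check explicit is sketched below in Python; the function name `predict_age_difference` and the choice to raise `ValueError` are illustrative assumptions, while the equation and the bounds 17 and 100 are taken from the problem.

```python
def predict_age_difference(literacy_rate):
    """Predict husband-minus-wife age difference (years) from literacy rate (%).

    Uses the line y-hat = -0.0424x + 8.2 from the problem and refuses to
    extrapolate outside the stated scope 17 <= x <= 100.
    """
    low, high = 17, 100  # scope of the model as stated in the problem
    if not low <= literacy_rate <= high:
        raise ValueError(
            f"literacy rate {literacy_rate} is outside the scope "
            f"of the model ({low} to {high})"
        )
    return -0.0424 * literacy_rate + 8.2


# Part (c): a literacy rate of 43 percent is inside the scope.
print(round(predict_age_difference(43), 1))

# Part (d): 11 percent is outside the scope, so no prediction is made.
try:
    predict_age_difference(11)
except ValueError as err:
    print("No prediction:", err)
```

The same check explains part (b): x = 0 is also below 17, so the y-intercept has no meaningful interpretation here.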
A pediatrician wants to determine the relation that exists between a child's height, x, and head circumference, y. She randomly selects 11 children from her practice, measures their heights and head circumferences and obtains the accompanying data. Complete parts (a) through (e). (a) Find the least-squares regression line treating height as the explanatory variable and head circumference as the response variable. (b)Use the regression equation to predict the head circumference of a child who is 25 inches tall. (c) Compute the residual based on the observed head circumference of the 25-inch-tall child in the table. Is the head circumference of this child above average or below average? (d) Draw the least-squares regression line on the scatter diagram of the data and label the residual from part (c). Choose the correct graph below. Is the head circumference of this child above average or below average? (e) Notice that two children are 26.75 inches tall. One has a head circumference of 17.3 inches; the other has a head circumference of 17.5 inches. How can this be?
(a) The least-squares regression line is ŷ = 0.1863x + 12.3728. (b) The predicted value of the head circumference of a child who is 25 inches tall is 17.03 inches. (c) The residual based on the observed head circumference of the 25-inch-tall child is −0.13 inches (find the observed head circumference for the 25-inch-tall child in the table, then subtract the predicted 17.03). (d) Select the correct graph according to StatCrunch. Below average (a negative residual means the observation is below average). (e) For children who are 26.75 inches tall, head circumference varies.
Lyme disease is an inflammatory disease that results in a skin rash and flulike symptoms. It is transmitted through the bite of an infected deer tick. The following data represent the number of reported cases of Lyme disease and the number of drowning deaths for a rural county. Complete parts (a) through (c) below. (a) Draw a scatter diagram of the data. Choose the correct graph below. (b) Determine the linear correlation coefficient between Lyme disease and drowning deaths. (c) Does a linear relation exist between the number of reported cases of Lyme disease and the number of drowning deaths?
(a) Use the applets to graph; x is cases of Lyme disease, y is drowning deaths. (b) The linear correlation coefficient between Lyme disease and drowning deaths is r = 0.957. (c) The variables Lyme disease and drowning deaths are positively associated because r is positive and the absolute value of the correlation coefficient, 0.957, is greater than the critical value, 0.576. An increase in Lyme disease does not cause an increase in drowning deaths. The temperature and time of year are likely lurking variables.
A pediatrician wants to determine the relation that exists between a child's height, x, and head circumference, y. She randomly selects 11 children from her practice, measures their heights and head circumferences, and obtains the accompanying data. Data as (height in inches x, head circumference in inches y): (27.5, 17.8), (24.5, 17.3), (25.5, 17.3), (26, 17.8), (24.25, 17.1), (28, 17.9), (26.5, 17.6), (27.25, 17.8), (26, 17.5), (26, 17.7), (28, 17.8). Complete parts (a) through (g) below. (a) Find the least-squares regression line treating height as the explanatory variable and head circumference as the response variable. (b) Interpret the slope and y-intercept, if appropriate. First interpret the slope. Select the correct choice below and, if necessary, fill in the answer box to complete your choice. Interpret the y-intercept, if appropriate. Select the correct choice below and, if necessary, fill in the answer box to complete your choice. (c) Use the regression equation to predict the head circumference of a child who is 24.25 inches tall. (d) Compute the residual based on the observed head circumference of the 24.25-inch-tall child in the table. Is the head circumference of this child above or below the value predicted by the regression model? (e) Draw the least-squares regression line on the scatter diagram of the data and label the residual from part (d). Choose the correct graph below. (f) Notice that two children are 26 inches tall. One has a head circumference of 17.5 inches; the other has a head circumference of 17.7 inches. How can this be? (g) Would it be reasonable to use the least-squares regression line to predict the head circumference of a child who was 32 inches tall? Why?
(a) ŷ = 0.183x + 12.8 (b) For every inch increase in height, the head circumference increases by 0.183 in., on average. It is not appropriate to interpret the y-intercept. (c) ŷ = 0.183(24.25) + 12.8 = 17.24 in. (d) The residual for this observation is 17.1 − 17.24 = −0.14, meaning that the head circumference of this child is below the value predicted by the regression model. (f) For children with a height of 26 inches, head circumferences vary. (g) No, this height is outside the scope of the model (in the data, no child is taller than 28 inches).
Lyme disease is an inflammatory disease that results in a skin rash and flulike symptoms. It is transmitted through the bite of an infected deer tick. The following data represent the number of reported cases of Lyme disease and the number of drowning deaths for a rural county. Data as (month, cases of Lyme disease, drowning deaths): (J, 3, 0), (F, 1, 1), (M, 3, 2), (A, 4, 1), (M, 5, 2), (J, 15, 10), (J, 22, 16), (A, 13, 5), (S, 6, 3), (O, 5, 3), (N, 4, 1), (D, 1, 0). Critical values for the correlation coefficient, as n: critical value: 3: 0.997, 4: 0.950, 5: 0.878, 6: 0.811, 7: 0.754, 8: 0.707, 9: 0.666, 10: 0.632, 11: 0.602, 12: 0.576, 13: 0.553, 14: 0.532, 15: 0.514, 16: 0.497, 17: 0.482, 18: 0.468, 19: 0.456, 20: 0.444, 21: 0.433, 22: 0.423, 23: 0.413, 24: 0.404, 25: 0.396, 26: 0.388, 27: 0.381, 28: 0.374, 29: 0.367, 30: 0.361. Complete parts (a) through (c) below. (a) Draw a scatter diagram of the data. Choose the correct graph below. (b) Determine the linear correlation coefficient between Lyme disease and drowning deaths. (c) Does a linear relation exist between the number of reported cases of Lyme disease and the number of drowning deaths? Do you believe that an increase of Lyme disease causes an increase in drowning deaths? What is a likely lurking variable between cases of Lyme disease and drowning deaths?
(b) The linear correlation coefficient between Lyme disease and drowning deaths is r = 0.964 (rounded to three decimal places). (c) The variables Lyme disease and drowning deaths are positively associated because r is positive and the absolute value of the correlation coefficient, 0.964, is greater than the critical value, 0.576 (look up the critical value for the sample size, n = 12, in the table). An increase in Lyme disease does not cause an increase in drowning deaths. The temperature and time of year are likely lurking variables.
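The value of r and the comparison with the critical value can be checked with a short Python sketch. It assumes the twelve monthly (Lyme cases, drowning deaths) pairs from the problem and computes r as the sum of the products of the z-scores divided by n − 1; the names `correlation`, `lyme`, and `drownings` are illustrative.

```python
from statistics import mean, stdev

# Monthly data from the problem, January through December.
lyme = [3, 1, 3, 4, 5, 15, 22, 13, 6, 5, 4, 1]
drownings = [0, 1, 2, 1, 2, 10, 16, 5, 3, 3, 1, 0]


def correlation(xs, ys):
    """Sample linear correlation coefficient via z-scores."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    z_products = sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    )
    return z_products / (n - 1)


r = correlation(lyme, drownings)

# Critical value for n = 12, read from the table in the problem.
critical_value = 0.576
linear_relation = abs(r) > critical_value

print(f"r = {r:.3f}, linear relation: {linear_relation}")
```

The computed r is close to the 0.964 on the card and exceeds 0.576 in absolute value. That supports a positive linear relation, but it says nothing about causation, which is why the answer points to temperature and time of year as lurking variables.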
The linear correlation coefficient, or Pearson product moment correlation coefficient
is a measure of the strength and direction of the linear relation between two quantitative variables. The Greek letter ρ (rho) represents the population correlation coefficient, and r represents the sample correlation coefficient. We present only the formula for the sample correlation coefficient.
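The card does not reproduce the formula itself. For reference, the usual z-score form of the sample correlation coefficient is given below; it is the standard textbook definition, stated here as an assumption about what the course text presents. Here x̄ and ȳ are the sample means, s_x and s_y are the sample standard deviations, and n is the number of individuals.

```latex
r = \frac{1}{n-1}\sum_{i=1}^{n}
    \left(\frac{x_i-\bar{x}}{s_x}\right)
    \left(\frac{y_i-\bar{y}}{s_y}\right)
```

Each term is the product of the z-scores of x and y for one individual, so r is positive exactly when the sum of those products is positive.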
Suppose the line y=2.8333x−22.4967 describes the relation between the club-head speed (in miles per hour), x and the distance a golf ball travels (in yards), y. (a) Predict the distance a golf ball will travel if the club-head speed is 100 mph. (b) Suppose the observed distance a golf ball traveled when the club-head speed was 100 mph was 265 yards. What is the residual?
(a) The golf ball will travel 260.8 yards: ŷ = 2.8333(100) − 22.4967 = 283.33 − 22.4967 = 260.8. (b) The residual is 4.2 yards: residual = observed y − predicted y = 265 − 260.8 = 4.2.