2.5 Case Q→Q: Correlation Coefficient-r


Properties of r: We will now discuss and illustrate several important properties of the correlation coefficient as a numerical measure of the strength of a linear relationship.

1. The correlation does not change when the units of measurement of either one of the variables change. In other words, if we change the units of measurement of the explanatory variable and/or the response variable, this has no effect on the correlation (r). To illustrate this, below are two versions of the scatterplot of the relationship between sign legibility distance and driver's age:

Remember, to get the correlation values in the calculator, you must first turn Diagnostic On. This only has to be done once, unless you reset your calculator.

Press 2nd → Catalog (0). Scroll down to Diagnostic On and press Enter. Press Enter again; the screen should say Done.

Example: Statistics Courses

A statistics department is interested in tracking the progress of its students from entry until graduation. As part of the study, the department tabulates the performance of 10 students in an introductory course and in an upper-level course required for graduation. What is the relationship between the students' course averages in the two courses? Here is the scatterplot for the data:

Observe that the relationship between gestation period and longevity is linear and positive. Now we will compute the correlation between gestation period and longevity.

Choose Stat → Calc → 8:LinReg(a+bx). Select Xlist: L1 (longevity) and Ylist: L2 (gestation). Scroll down to Calculate and press Enter. A lot of information should be displayed on the screen. For right now, we are interested in the value r. SCATTER DIAGRAM: https://www.youtube.com/watch?v=oCHk4xjnH7o

Example: Highway Sign Visibility

Earlier, we used the scatterplot below to find a negative linear relationship between the age of a driver and the maximum distance at which a highway sign was legible. What about the strength of the relationship? It turns out that the correlation between the two variables is r = -0.793.

Remember that the correlation is only an appropriate measure of the linear relationship between two quantitative variables. First produce a scatterplot to verify that gestation and longevity are nearly linear in their relationship.

Enter the longevity in L1. Enter the gestation in L2, making sure each gestation is matched with the animal's corresponding longevity. To get the scatterplot: 2nd → statplot → 1. Use Xlist: L1 (longevity), Ylist: L2 (gestation). Press Zoom 9.

Comment: In the last activity, we saw an example where there was a positive linear relationship between the two variables, and including the outlier just "strengthened" it. Consider the hypothetical data displayed by the following scatterplot:

In this case, the low outlier gives an "illusion" of a positive linear relationship, whereas in reality, there is no linear relationship between X and Y.

Interpretation

Once we obtain the value of r, its interpretation with respect to the strength of linear relationships is quite simple, as this walkthrough will illustrate:

===== INSTRUCTIONS=====

Open the file Math 146 gestation data.xls. This data should still be entered in your calculator: Stat → Edit, then look at the data in L1 and L2. Scroll down to row 15 of the data. You will see that this row contains the values of the variables for the elephant. Delete the value for longevity for the elephant (40), and delete the corresponding value for gestation for the elephant (645). You have now removed the elephant's values from the data lists. Recompute the correlation between gestation and longevity: choose Stat → Calc → 8:LinReg(a+bx), select Xlist: L1 (longevity) and Ylist: L2 (gestation), then scroll down to Calculate and press Enter.

2. The top scatterplot displays the original data, where the maximum distance is measured in feet. The bottom scatterplot displays the same relationship, but with the maximum distance changed to meters. Notice that the Y-values have changed, but the correlations are the same. This is an example of how changing the units of measurement of the response variable has no effect on r, but as we indicated above, the same is true for changing the units of the explanatory variable, or of both variables. This might be a good place to comment that the correlation (r) is "unitless". It is just a number.

Our data describe a fairly simple curvilinear relationship: the amount of fuel consumed decreases rapidly to a minimum for a car driving 60 kilometers per hour, and then increases gradually for speeds exceeding 60 kilometers per hour. The relationship is very strong, as the observations seem to perfectly fit the curve.

Learn By Doing: Linear Relationships
In this activity we will:
- Learn how to compute the correlation (r).
- Practice interpreting the value of the correlation.
- See an example of how including an outlier can increase the correlation.

Recall the following example: The average gestation period, or time of pregnancy, of an animal is closely related to its longevity—the length of its lifespan. Data on the average gestation period and longevity (in captivity) of 40 different species of animals have been recorded.

Introduction

So far we have visualized relationships between two quantitative variables using scatterplots, and described the overall pattern of a relationship by considering its direction, form, and strength. We noted that assessing the strength of a relationship just by looking at the scatterplot is quite difficult, and therefore we need to supplement the scatterplot with some kind of numerical measure that will help us assess the strength. In this part, we will restrict our attention to the special case of relationships that have a linear form, since they are quite common and relatively simple to detect. More importantly, there exists a numerical measure that assesses the strength of the linear relationship between two quantitative variables with which we can supplement the scatterplot. We will introduce this numerical measure here and discuss it in detail. Even though from this point on we are going to focus only on linear relationships, it is important to remember that not every relationship between two quantitative variables has a linear form. We have actually seen several examples of relationships that are not linear. The statistical tools that will be introduced here are appropriate only for examining linear relationships, and as we will see, when they are used in nonlinear situations, these tools can lead to errors in reasoning.

Comment: Note that in both examples we supplemented the scatterplot with the correlation (r). Now that we have the correlation (r), why do we still need to look at a scatterplot when examining the relationship between two quantitative variables?

The correlation coefficient can only be interpreted as the measure of the strength of a linear relationship, so we need the scatterplot to verify that the relationship indeed looks linear. This point and its importance will be clearer after we examine a few properties of r.

Although the relationship is strong, the correlation r = -0.172 indicates a weak linear relationship. This makes sense considering that the data fails to adhere closely to a linear form:

The correlation is useless for assessing the strength of any type of relationship that is not linear (including relationships that are curvilinear, such as the one in our example). Beware, then, of interpreting the fact that "r is close to 0" as an indicator of a "weak relationship" rather than a "weak linear relationship." This example also illustrates how important it is to always "look at" the data in the scatterplot, since, as in our example, there might be a strong nonlinear relationship that r does not indicate. Since the correlation was nearly zero when the form of the relationship was not linear, we might ask if the correlation can be used to determine whether or not a relationship is linear.

The Correlation Coefficient—r

The numerical measure that assesses the strength of a linear relationship is called the correlation coefficient, and is denoted by r. We will:
- Give a definition of the correlation r,
- Discuss the calculation of r,
- Explain how to interpret the value of r, and
- Talk about some of the properties of r.

Q1

The purpose of this example was to illustrate how assessing the strength of the linear relationship from a scatterplot alone is problematic, since our judgment might be affected by the scale on which the values are plotted. This example, therefore, provides a motivation for the need to supplement the scatterplot with a numerical measure that will measure the strength of the linear relationship between two quantitative variables.

3. The correlation by itself is not enough to determine whether or not a relationship is linear. To see this, let's consider the study that examined the effect of monetary incentives on the return rate of questionnaires. Below is the scatterplot relating the percentage of participants who completed a survey to the monetary incentive that researchers promised to participants, in which we find a strong curvilinear relationship:

The relationship is curvilinear, yet the correlation r = 0.876 is quite close to 1. In the last two examples we have seen two very strong curvilinear relationships, one with a correlation close to 0, and one with a correlation close to 1. Therefore, the correlation alone does not indicate whether a relationship is linear or not. The important principle here is: Always look at the data!

4. The correlation is heavily influenced by outliers. The way in which the outlier influences the correlation depends upon whether or not the outlier is consistent with the pattern of the linear relationship. Even one extreme outlier can dramatically change the correlation coefficient, so it's important to look at the data.

VIDEO: https://www.youtube.com/watch?v=oCHk4xjnH7o

Link

https://www.youtube.com/watch?v=9FWnFN9ShYw

scatter language

https://www.youtube.com/watch?v=oCHk4xjnH7o

correlation coefficient (r)

A numerical measure of the strength and direction of a linear relationship between two quantitative variables.

Calculation:

r is calculated using the following formula:
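For reference, the standard definition of the sample correlation can be written as below. The notation is an assumption here: n is the number of (x, y) pairs, \bar{x} and \bar{y} are the sample means, and s_x and s_y are the sample standard deviations (computed with n - 1 in the denominator).

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```

Each factor in the sum is a standardized value (a z-score), which is why r carries no units.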


