Learning Path 11 EXPLORING TWO QUANTITATIVE VARIABLES: CORRELATION

When interpreting a correlation coefficient, the following guidelines can be used (but are not set in stone!):

Absolute value of r > 0.8: strong relationship
Absolute value of r between 0.6 and 0.8: fairly strong relationship
Absolute value of r between 0.5 and 0.6: moderate relationship
Absolute value of r between 0.3 and 0.5: fairly weak relationship
Absolute value of r between 0.1 and 0.3: weak relationship
Absolute value of r < 0.1: virtually no relationship
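As a rough sketch, the guidelines above can be written as a small Python function. The name `describe_strength` is made up for illustration, and the treatment of values that fall exactly on a boundary (assigned here to the stronger band) is an assumption, since the guidelines do not settle those cases.

```python
def describe_strength(r):
    """Return the rough verbal label for a correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength depends only on the absolute value
    if size > 0.8:
        return "strong"
    if size >= 0.6:   # assumption: a boundary value goes to the stronger band
        return "fairly strong"
    if size >= 0.5:
        return "moderate"
    if size >= 0.3:
        return "fairly weak"
    if size >= 0.1:
        return "weak"
    return "virtually none"


print(describe_strength(0.87))   # the Old Faithful value quoted in this set
print(describe_strength(-0.24))  # the sign does not affect the label
```

Because the cutoffs are only guidelines, a label from this function is a starting point for interpretation, not a verdict.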

Note: (outliers)

There is no rule about how outliers will affect the correlation coefficient; every situation is unique. Generally, though, if we see a strong relationship without the outliers, the outliers will make the relationship appear weaker (i.e., the correlation coefficient with outliers will be closer to 0). If we see a weak relationship without the outliers, the outliers will make the relationship appear stronger (i.e., the correlation coefficient with outliers will be closer to 1 or -1). Remember, never remove outliers just because they are outliers. Only remove outliers for a "non-statistical" reason; that is, only remove outliers if you find they are from a different population. If you cannot identify a different population to which the outliers belong, do an analysis with and without the outliers and report both.

If we see a weak relationship without the outliers, the outliers will tend to make the relationship appear stronger (i.e. correlation coefficient with outliers will be closer to 1 or -1) True False

True. If a graph shows a weak relationship, we would assume that the correlation coefficient is close to 0. With an extreme outlier, a line fit through the points gets "pulled" toward the outlier, which changes r in such a way as to indicate a stronger relationship.

properties of correlation coefficient

−1 ≤ r ≤ +1.
The sign of the correlation coefficient indicates the direction of the association: a negative sign indicates a negative relationship between the two variables, and a positive sign indicates a positive relationship. A correlation coefficient of zero indicates no linear relationship.
A correlation coefficient of +1 or -1 indicates a perfect linear association. Therefore, the closer the correlation coefficient is to +1 or -1, the stronger the relationship; equivalently, the closer the correlation coefficient is to 0, the weaker the relationship.
The correlation coefficient is a measure of the strength of a linear association. It should not be used when the relationship is non-linear!
The correlation coefficient does not make a distinction between the response variable and the explanatory variable. That is, the correlation of x with y is the same as the correlation of y with x.
There are no units for the correlation coefficient.
The correlation coefficient is sensitive to (i.e., affected by) outliers.
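Several of these properties can be seen directly from a calculation. The sketch below implements the usual sample Pearson formula by hand; the helper name `pearson_r` and the data are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Sample correlation: summed cross-deviations divided by the root of the
    product of the summed squared deviations of x and of y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)                         # swap the roles of the two variables
r_line = pearson_r(x, [3 - 2 * a for a in x])  # points exactly on a downward line

print(r_xy, r_yx)  # the same value either way
print(r_line)      # a perfect negative linear association
```

Swapping x and y leaves the formula unchanged because x and y enter it symmetrically, and the deviations in the numerator and denominator carry the same units, so the units cancel.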

State whether each is a legitimate correlation coefficient. 1.28 0 -0.5

1.28: no, r cannot be greater than 1. 0: yes. -0.5: yes, it lies between -1 and +1.

Important point!

As discussed in Learning Path 5, causation can never be inferred in observational studies because there could be confounding variables that explain any observed association. Therefore, even when we have a correlation coefficient that indicates a strong association, we do not want to infer causation by saying that changes in the explanatory variable cause the changes in the response variable!

Notation for correlation coefficient

If we have data on an entire population, we would use the Greek letter rho (ρ) to represent the population correlation coefficient. If we have data from a sample, which is typical, we would use the Latin letter r to represent the sample correlation coefficient.

Researchers observe that the average height of a kitchen cabinet in Germany is 6 feet above the ground and they also observe that the average torso length of women in Germany is 2 feet 10 inches. Furthermore, the researchers observe that the correlation coefficient between kitchen cabinet height and torso length is 0.894. Is it plausible to say that having higher kitchen cabinets has caused German women to have longer torsos? No, the researchers cannot imply causation because they conducted an observational study. Yes, the correlation coefficient tells us that the relationship between kitchen cabinet height and torso length is strong and positive therefore one increasing will cause the other one to increase. No, since the correlation coefficient is not equal to 1 we cannot say that there is a causal relationship.

No, the researchers cannot imply causation because they conducted an observational study.

Recall the Old Faithful example introduced on the previous page titled "The effect of deviations on the correlation coefficient." We saw a strong relationship between the duration of a current eruption and the wait time until the next eruption (r = 0.87). Does this mean that a longer eruption is the reason why we have to wait longer until the next eruption? Not necessarily Yes

Not necessarily. Since this was an observational study, causation cannot be inferred. In this example, there could be some other geological factor that is affecting both the duration and the wait time and is the real cause of the relationship we see.

correlation coefficient

We can assign a numerical value to the strength of the relationship

clusters present in data

When there are clusters in the data, the relationship between X and Y is being affected by some other factor that we may not know about, so it is best to analyze the relationship separately for each cluster rather than analyzing all the data together. In this way we can determine the correlation coefficient for each cluster and get a more precise estimate.

Is it possible to calculate the correlation coefficient for data that appear to have a non-linear relationship? Select the best answer. Yes - it is possible and perfectly reasonable to calculate the correlation coefficient when the relationship is curved as long as the two variables are quantitative Yes - it is possible BUT NOT REASONABLE to calculate the correlation coefficient when the relationship is curved because r should only be used to describe the strength and direction of a linear relationship. No - the formula will not return an r between -1 and 1 when the relationship is non-linear.

Yes - it is possible BUT NOT REASONABLE to calculate the correlation coefficient when the relationship is curved because r should only be used to describe the strength and direction of a linear relationship. The formula for r requires only the x-values, the y-values, and their summaries, so it is possible to calculate the correlation coefficient for any type of relationship, BUT it should only be used if a linear relationship exists.

A positive correlation means that as x increases, y shows a tendency to increase decrease stay the same

increase. If the correlation coefficient is positive, then the relationship has a positive direction (going up from left to right). This means that as x increases, the value of y tends to increase as well.

If a scatterplot shows a trend of points that are tightly packed around a straight line, pointing down as you move from left to right on the x-axis, the approximate value of the correlation coefficient is most likely between -1 and -0.7 it is impossible to tell without looking at a scatterplot greater than 1 less than -1 most likely between 0.7 and 1

most likely between -1 and -0.7. Since the description of the relationship is negative ("pointing down as you move from left to right on the x-axis"), the correlation coefficient must be negative. The description also mentions that the points are "tightly packed around a straight line," indicating a strong relationship. We would estimate r to be between -1 and -0.7.

A psychologist was interested in the relationship between number of hours of sleep per night and grade point average for middle school students. She surveyed every tenth student on an alphabetical list of students at a local middle school and found a correlation coefficient of 0.24. What is the notation for this correlation coefficient? ρ r

r. Since the psychologist surveyed only a subset of the population about which she wanted to draw a conclusion (all middle school students), the correlation she calculated is a sample correlation coefficient, which is denoted r.

Suppose the weights and fuel efficiencies of a sample of automobiles were measured. The weights were originally measured in kilograms and the correlation was found to be 0.78. Researchers wanted to display a scatterplot using pounds, instead of kilograms, as the units for car weights. What effect would this have on the correlation coefficient? (Note that 1 kilogram is 2.205 pounds) r would remain unchanged at 0.78 r would increase by a factor of 2.205 r would increase by 2.2% r would either increase or decrease by 2.2%

r would remain unchanged at 0.78. Since r has no units, changing the units of the data will have no effect on the correlation coefficient.

The correlation coefficient is a measure that describes the ___________ and ___________ of a ___________ relationship between two quantitative variables.

strength, direction, linear

A psychologist was interested in the relationship between number of hours of sleep per night and grade point average for middle school students. She surveyed every tenth student on an alphabetical list of students at a local middle school. Suppose the psychologist wanted to make a conclusion to only students at this local middle school. Which of the following best describes how she collected her data? simple random sample completely randomized experiment systematic random sample prospective observational study convenience sample

systematic random sample. Selecting every kth case from the population is an example of a systematic sample. Recall what makes a systematic sample random: which case to start with is randomly chosen.
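The selection procedure can be sketched in a few lines. The function name `systematic_sample`, the roster of 100 placeholder names, and the fixed seed are all assumptions for illustration; the psychologist's actual list is not given.

```python
import random


def systematic_sample(population, k, rng=None):
    """Take every k-th case, starting at a randomly chosen position among the first k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    rng = rng or random.Random()
    start = rng.randrange(k)  # the random start is what makes the sample random
    return population[start::k]


# Placeholder for the alphabetical list of students.
roster = [f"student_{i:03d}" for i in range(1, 101)]

# A seeded generator is used only so the example is repeatable.
sample = systematic_sample(roster, k=10, rng=random.Random(0))
print(sample)
```

With 100 names and k = 10, the sample always contains 10 students spaced exactly 10 positions apart; only the starting position changes from run to run.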

Note:

It is only appropriate to use the term correlation to describe the relationship between two quantitative variables. If one or both variables are not quantitative, the term "association" should be used instead of correlation.
The correlation coefficient should only be used if there is a linear relationship between the two quantitative variables. If the relationship is non-linear, the correlation coefficient can still be calculated, but it may give a very misleading representation of the strength of the relationship.
From the correlation coefficient, we will know both the strength AND direction of the relationship between the two quantitative variables, as long as it is a linear relationship.
Note: the words "relationship" and "association" are used interchangeably. Most of the time, we'll use the word "relationship", but occasionally, we'll use the word "association". They mean the same thing.

