ch. 7
the sum of the X scores times the sum of the Y scores
(ΣX)(ΣY)
the square of the sum of the X scores
(ΣX)²
the square of the sum of the Y scores
(ΣY)²
Cohen's (1988) Guidelines for Small, Medium, and Large Correlation Coefficients (negative range; positive range). Small: ____ to ____; ____ to ____. Medium: -.30 to -.49; .30 to .49. Large: -1.00 to -.50; .50 to 1.00.
-.29, -.10, .10, .29
Cohen's (1988) Guidelines for Small, Medium, and Large Correlation Coefficients (negative range; positive range). Small: -.29 to -.10; .10 to .29. Medium: ____ to ____; ____ to ____. Large: -1.00 to -.50; .50 to 1.00.
-.30, -.49, .30, .49
Correlation coefficients may range between __ and __. The closer to 1 (__ or __) the coefficient is, the stronger the relationship; the closer to 0 the coefficient is, the weaker the relationship.
-1, +1
Cohen's (1988) Guidelines for Small, Medium, and Large Correlation Coefficients (negative range; positive range). Small: -.29 to -.10; .10 to .29. Medium: -.30 to -.49; .30 to .49. Large: ____ to ____; ____ to ____.
-1.00, -.50, .50, 1.00
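The three Cohen cards above can be checked with a small lookup. This is only a sketch: the function name `cohen_strength` is made up for illustration, and the label for coefficients smaller than .10 in absolute value is an assumption, since the guideline ranges on the cards start at .10.

```python
def cohen_strength(r: float) -> str:
    """Label r using the Cohen (1988) ranges listed on the cards."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)  # strength depends on the absolute value, not the sign
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    # Assumption: the card ranges do not cover |r| < .10, so it gets its own label.
    return "below small"


for value in (0.85, -0.50, 0.32, -0.24, 0.05):
    print(value, cohen_strength(value))
```

Because the function works on the absolute value, -.35 and +.35 receive the same label; only the direction differs.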
T/F? There is no such thing as zero correlation
False
Spearman's rho (rs) is likewise an estimate of the _________ _______ rho, but denoted as ρs
population value
What kind of correlation is this an example of? In other words, the higher a person's childhood IQ, the longer we expect them to live.
positive correlation
Instead of an independent and dependent variable, we refer to X as the __________ variable and to Y as the _________ variable
predictor, criterion
To compute the proportion of variance accounted for, we find __. That is, the square of the Pearson correlation coefficient.
r²
One of the requirements for Pearson's r is that a __________ sampling method should be used.
random
if r = .45, r² = ....
.20
Your survey results showed that there was a correlation of +.36 between number of alcoholic drinks consumed per day and number of absences from work per year. Which one(s) of the following sets of arrows depicts that relationship? A. two arrows going up B. two arrows going down C. one arrow going up and one arrow going down.
A and B
the difference between each X score and its associated Y score (X - Y )
D
While r = .50 appears to be twice as strong as r = .25, it in fact explains four times as much variation (25% versus 6%). Much of the interpretive power of r therefore lies in the coefficient of determination. We should always bear in mind that if the value of r is lower than ___ the variation in x will explain less than 10% of the variation in y (and vice versa).
.31
The computational formula for the proportion of variance in Y that is not accounted for by a linear relationship with X is _ __
1 - r²
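A quick way to check the r² and 1 - r² cards is to compute both proportions for a few coefficients. The sketch below uses a hypothetical helper name (`variance_split`); the r values are the ones that appear on the cards.

```python
def variance_split(r):
    """Return (explained, unexplained) proportions of variance for a given r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    explained = r ** 2            # coefficient of determination, r²
    unexplained = 1 - explained   # proportion not accounted for, 1 - r²
    return explained, unexplained


for r in (0.70, 0.50, 0.45, 0.25):
    explained, unexplained = variance_split(r)
    print(f"r = {r:.2f}: r² = {explained:.2f}, 1 - r² = {unexplained:.2f}")
```

Squaring removes the sign, so r = -.45 and r = +.45 account for the same proportion of variance.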
A correlation coefficient of 0 indicates no linear relationship is present
Association
Your survey results showed that there was a correlation of -.24 between employment income and number of absences from work per year. Which one(s) of the following sets of arrows depicts that relationship? A. two arrows going up B. two arrows going down C. one arrow going up and one arrow going down.
C
T/F? A correlation coefficient distinguishes between independent and dependent variables.
False
refers to our ability to draw a straight line through the data points
Linear
These are which formula's requirements? Random sampling; continuous interval- or ratio-level data; normally distributed variables; an absence of significant outliers; a linear association between the variables.
Pearson's r
a correlation coefficient of +1 or -1 describes a perfectly consistent, maximum strength, linear relationship
Perfect Association
_______ is a logical extension of Pearson's r, going a step further by allowing us to make more specific predictions about values of y when we know values of x.
Regression
Pearson or Spearman? Score on depression index and score on self-esteem index
Spearman's Rho
These are the requirements for... Random sampling; both variables must be at least ordinal; the variables must increase monotonically with one another.
Spearman's rho
.50 is considered what kind of strength in correlation?
Strong
.85 is considered what kind of strength in correlation?
Strong
To use the regression equation, we first need to determine the slope (_) and the intercept (_).
b,a
In a linear relationship, as the X scores increase, the Y scores tend to ________ in only one direction
change
The term, _______ is synonymous with relationship
correlation
the descriptive statistic that summarizes and describes the important characteristics of a relationship
correlation coefficient
A ____ _____ that is relatively far from the majority of data points in a scatterplot is referred to as an outlier
data point
Correlation and regression are similar to one another in the sense that they both __________ _____________ between variables.
describe relationships
In a negative linear relationship, as the X scores ___________, the Y scores tend to ________
increase, decrease
In a positive linear relationship, as the X scores ________, the Y scores also tend to ________
increase, increase
The Pearson correlation coefficient describes the linear relationship between two _______ variables, two _____ variables, or one (same term) and one (same term) variable.
interval , ratio
The requirements for using Spearman's rho differ from Pearson's r in that the variables do not need to be _______/____ level, nor do they need to be linearly associated. However, they do need to meet the following requirements:
interval/ratio
The stronger the correlation and the larger the sample size, the more likely the coefficient is to be statistically significant. Even relatively weak Pearson's r or Spearman's rho values are likely to be significant if the sample size is reasonably ______
large
The sign of the correlation coefficient indicates the direction of a _________ relationship (either positive or negative)
linear
For example, if we found a correlation coefficient of +.35, we would say that there was a __________ _________ relationship between IQ and life expectancy.
moderate positive
When a set of x and y values increase or decrease together but never increase and subsequently decrease together, we say that they are
monotonic
inverse correlation is also known as...
negative
Pearson's r cannot detect ___-______ associations, which can be quantified by other measures such as Spearman's rank correlation.
non-linear
However, to be monotonic, scores cannot increase together and then decrease; those are ___-_________ associations, like the one shown by the scatterplot at the far right. In those cases, neither Pearson's r nor Spearman's rho is appropriate to use.
non-monotonic
If you were trying to draw a scatterplot of the relationship between Race and Salary, it would be impossible to determine a best fit line because it would be _____ to suggest, for example, that salary increases as race increases. Race cannot increase or decrease because it is nominal (categorical)
nonsensical
Where ŷ = ...... x = the value of the predictor variable; a = the intercept; b = the slope of the regression line, or its steepness
our predicted value of the response (or "outcome") variable;
Suppose that we used administrative data for 10 randomly drawn freshman students at a medium-sized college where the minimum requirement for admission is a score of 1200. We produced a scatterplot of the association between SAT scores and first-year GPA (Figure 14.11). Notice that as SAT scores rise, so do first year GPA scores. However, there is one notable exception to this pattern: a student who had a somewhat low SAT for that institution (1400) but earned a very high GPA of 3.9. This individual is an ______ in the sense of having a relatively low SAT score but a relatively high GPA.
outlier
Correlation coefficients are sensitive to ________.
outliers
outliers are extreme scores on a variable that can potentially distort your statistics so that they appear to be much higher or lower than they would otherwise be.
outliers
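The sensitivity to outliers described on these cards is easy to see numerically. The sketch below uses made-up data (five points on a perfect line plus one invented outlier) and a hypothetical helper `pearson_r` written in deviation-score form; it is an illustration, not data from the chapter.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from deviations around the two means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two X-Y pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores: a perfectly linear pattern...
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]
r_clean = pearson_r(xs, ys)

# ...and the same pattern with one extreme pair (high X, very low Y) added.
r_with_outlier = pearson_r(xs + [6], ys + [0])

print(f"without outlier: r = {r_clean:.2f}")
print(f"with outlier:    r = {r_with_outlier:.2f}")
```

A single extreme pair is enough to pull a perfect association down to a weak one, which is why a scatterplot is worth inspecting before r is computed.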
Pearson's r is a sample-based estimate of the __________ ________ rho, denoted by the Greek symbol ρ
population value
The Spearman rank-order correlation coefficient describes the linear relationship between two variables measured by _____ _______.
ranked scores
Spearman rank correlation is often described as a form of Pearson correlation in that its calculation is similar, but instead of using raw data values it uses ________ _______.
ranked values
Before we actually calculate correlation coefficients, we produce a ____________ in order to get an initial sense of what the association, if any, looks like
scatter plot
a graph that shows the location of each data point formed by a pair of X-Y scores
scatterplot
__________ ________________ refers to the probability that results as large as we observed are due to sampling error, or chance. This statement implies that there exists a degree of uncertainty that any relationships that we observe in a sample actually exist in the population from which the sample was drawn.
statistical significance
The coefficient of determination can be viewed not only as a measure of explained variation, but also as a measure of the _________ of the association. What might appear at first glance to be a relatively strong association has a way of diminishing in magnitude when r is squared. For example: if r = .70, r² = .49 (or 49%); if r = .50, r² = .25 (or 25%); if r = .25, r² = .06 (or 6%).
strength
The extent to which one value of Y is consistently paired with one and only one value of X
strength of a relationship
You can therefore have relatively strong positive or negative relationships, as well as relatively weak ones. A correlation coefficient of -.35 is equally as _____ as a correlation coefficient of +.35; they differ only in terms of their direction. Think of them in terms of being on a continuum, as shown here in Figure 14.3:
strong
The best fit line indicates not just linearity, but also direction, and a ballpark idea of its strength. The more closely the data points "hug" the best fit line, the________ the association
stronger
the point at which the regression line crosses the y axis when x=0. To calculate the intercept, we use the following formula:
the intercept
Where ŷ = our predicted value of the response (or "outcome") variable; x = the value of the predictor variable; a = ____ ________; b = the slope of the regression line, or its steepness
the intercept;
x̅ =
the mean of all x values in the distribution, and
ȳ =
the mean of all y values in the distribution
the steepness of the best fit line; it represents the amount of change in y for every unit of change in x. In that sense, the slope is a type of descriptive statistic.
the slope
Where ŷ = our predicted value of the response (or "outcome") variable; x = the value of the predictor variable; a = the intercept; b = ......
the slope of the regression line, or its steepness
b =
the slope, i.e., the steepness of the best fit line.
ŷ = bx + a, where ŷ = our predicted value of the response (or "outcome") variable; x = ..... a = the intercept; b = the slope of the regression line, or its steepness
the value of the predictor variable;
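The regression cards name the slope (b), the intercept (a), and the two means, but the formulas themselves are not shown on the cards. The sketch below assumes the standard least-squares formulas (slope = co-variation of x and y divided by the variation in x; intercept = ȳ - b·x̄). The function names and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (b, a): slope and intercept of the least-squares line y-hat = b*x + a."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two X-Y pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how much y changes, on average, for each one-unit change in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    b = sxy / sxx
    # Intercept: the predicted y when x = 0.
    a = mean_y - b * mean_x
    return b, a


def predict(x, b, a):
    """Predicted value of the criterion variable for a given predictor value."""
    return b * x + a


# Hypothetical predictor (x) and criterion (y) scores.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b, a = fit_line(xs, ys)
y_hat = predict(6, b, a)
print(f"b = {b:.2f}, a = {a:.2f}, predicted y at x = 6: {y_hat:.2f}")
```

With these made-up scores the slope is 0.6 and the intercept is 2.2, so each additional unit of x raises the predicted y by 0.6.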
But in terms of correlation, the word "negative" really means
to go apart
However, the fact that there is a relationship between ____ ___________ does not mean that changes in one variable cause the changes in the other variable
two variables
The remaining percentage (i.e., 100% - r², with r² expressed as a percentage) is the __________ variation. It means that this share of the variation in y is accounted for by some variable(s) other than x.
unexplained
The coefficient of determination (r²) refers to the amount of __________ in one variable that is explained or accounted for by the (same term) in another variable
variation
r² is also known as a measure of explained ______
variation
"Eyeballing" the association is also known as...
visual estimation
As the variability in the Y scores at each X becomes larger, the relationship becomes _______.
weaker
the sum of the X scores
ΣX
indicates you are to multiply each X score times its associated Y score and then sum the products
ΣXY
the sum of the squared X scores
ΣX²
the sum of the Y scores
ΣY
the sum of the squared Y scores
ΣY²
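The summation terms on these cards (ΣX, ΣY, ΣXY, ΣX², ΣY², (ΣX)², (ΣY)², and N, the number of pairs) are the ingredients of the usual computational formula for Pearson's r. The cards do not print the formula itself, so the sketch below assumes the standard form, r = [N·ΣXY - (ΣX)(ΣY)] / √([N·ΣX² - (ΣX)²][N·ΣY² - (ΣY)²]). The function name and data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from the computational (raw-score) formula."""
    n = len(xs)                                   # N: number of X-Y pairs
    if n != len(ys) or n < 2:
        raise ValueError("need at least two X-Y pairs of equal length")
    sum_x = sum(xs)                               # ΣX
    sum_y = sum(ys)                               # ΣY
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # ΣXY
    sum_x2 = sum(x * x for x in xs)               # ΣX² (square, then sum)
    sum_y2 = sum(y * y for y in ys)               # ΣY²
    numerator = n * sum_xy - sum_x * sum_y
    # sum_x ** 2 is (ΣX)²: sum first, then square.
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return numerator / denominator


# Hypothetical paired scores.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"r = {r:.2f}")
```

The comments mark the distinction the cards draw between ΣX² (the sum of the squared scores) and (ΣX)² (the square of the summed scores), which is the most common slip when working the formula by hand.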
summarizes a relationship by passing through the center of the scatterplot.
regression line
As you probably gathered, the concept of statistical significance does not mean whether or not a particular result is "important," "meaningful," "worth noting," and so on. Those things are known as....
substantive significance
For the same data as the previous question, do we reject or accept the null hypothesis at p<.05?
Reject
Pearson or Spearman? GPA and # of hours worked per week
Pearson's r
a correlation coefficient whose absolute value is less than 1 has less consistency in the Y scores at each value of X and, therefore, more variability among the Y scores at each value of X
Intermediate Association
correlation is not _________
causation
Correlational analysis requires scores from how many variables?
2
Suppose that you have access to hospital records on patient age and their number of days spent in hospital. Your most appropriate measure of association would likely be ________.
Pearson's r
.32 is considered what kind of strength in correlation?
Moderate
___________ refers to whether or not one set of scores tends to increase or decrease alongside another set
Monotonicity
is the number of pairs of scores in the data
N
The more paid work that parents do, the less sleep that they get.
Negative Correlation
.05 is considered what kind of strength in correlation?
Weak
Computational formula for Spearman's rs, where N is the number of pairs of ranks and D is the difference between the two ranks in each ____ pair
X-Y
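The card describes the terms of the Spearman formula (N pairs of ranks, D as the difference between the two ranks in each pair) without printing it. The sketch below assumes the standard computational form, rs = 1 - 6·ΣD² / [N(N² - 1)], which holds when there are no tied ranks. The helper names and data are hypothetical, and ties are rejected to keep the example short.

```python
def ranks(values):
    """Rank values from 1 (smallest) to N; tied scores are not handled here."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch assumes no tied scores")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(xs, ys):
    """Spearman's rs from rank differences: 1 - 6*sum(D^2) / (N*(N^2 - 1))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two X-Y pairs of equal length")
    rank_x = ranks(xs)
    rank_y = ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))  # ΣD²
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Hypothetical scores: mostly increasing together, with two swapped pairs.
rs = spearman_rs([10, 20, 30, 40, 50], [1, 3, 2, 5, 4])
print(f"rs = {rs:.2f}")

# A curved but strictly increasing pattern is still perfectly monotonic.
rs_curved = spearman_rs([1, 2, 3, 4], [1, 8, 27, 64])
print(f"rs (curved) = {rs_curved:.2f}")
```

The second example shows why Spearman's rho needs only a monotonic association: the raw scores follow a curve, but their ranks line up perfectly.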
Usually, each pair of __ scores is from the same participant.
XY
The ________ _____ of the correlation coefficient indicates the strength of the relationship
absolute value
Suppose that we calculated r = .21 (n=15). Do we reject or accept the null hypothesis at p<.01?
accept
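The cards do not say how the decision about the null hypothesis is reached. One common route, assumed here, is to convert r to a t statistic with N - 2 degrees of freedom, t = r·√(N - 2) / √(1 - r²), and compare it with a two-tailed critical value from a t table. The function name is hypothetical, and the critical values for 13 degrees of freedom are typed in from a standard table rather than computed.

```python
import math


def t_for_r(r, n):
    """t statistic for testing whether a sample r differs from zero (df = n - 2)."""
    if n < 3:
        raise ValueError("need at least three pairs of scores")
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Two-tailed critical t values for df = 13, copied from a standard t table.
CRITICAL_T_DF13 = {0.05: 2.160, 0.01: 3.012}

t = t_for_r(0.21, 15)
reject_at_01 = abs(t) > CRITICAL_T_DF13[0.01]
print(f"t = {t:.2f}, reject the null at .01: {reject_at_01}")
```

With r = .21 and n = 15, t falls well short of the critical value, which matches the card's answer that the null hypothesis is not rejected at the .01 level.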
