Ch. 3-Scatterplots and Correlation
What should you do to examine a scatterplot?
Look for the overall pattern and for striking deviations from that pattern. The overall pattern of a scatterplot can be described by the direction, form, and strength of the relationship. One important kind of deviation is an outlier.
How should you draw the axes of a scatterplot?
Both axes should be given the same emphasis, resulting in plots that are square rather than rectangular in shape. When creating or studying a scatterplot, keep in mind that our eyes can be fooled by changing the plotting scales or the amount of white space around the cloud of points.
How can you describe the overall pattern of a scatterplot?
By its direction, form, and strength of the relationship
How is correlation of averaged data different from correlation of raw individual data points?
Correlations based on averages are usually stronger than correlations based on raw individual data points, because averaging masks some of the individual-to-individual variation that makes up the scatter in the scatterplot.
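A small sketch of this effect, using made-up data (the group sizes, offsets, and the helper name correlation are assumptions for illustration only). Each group's individual offsets cancel when averaged, so the averaged points fall exactly on a line while the raw points do not.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation: sum of products of standardized values over n - 1."""
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

# Made-up data: 5 groups (x = 1..5), 4 individuals per group.
# Each individual's y is 2 * x plus an individual offset.
offsets = [-3, -1, 1, 3]
raw_x = [x for x in range(1, 6) for _ in offsets]
raw_y = [2 * x + e for x in range(1, 6) for e in offsets]

# Averaging within each group removes the individual-to-individual scatter.
avg_x = list(range(1, 6))
avg_y = [mean([2 * x + e for e in offsets]) for x in avg_x]

r_raw = correlation(raw_x, raw_y)
r_avg = correlation(avg_x, avg_y)
print(f"raw r = {r_raw:.3f}, averaged r = {r_avg:.3f}")
```

With these numbers the raw correlation is about 0.78 and the averaged correlation is 1, so a correlation computed from averages should not be read as describing individuals.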
If there is no explanatory-response distinction, where do variables go?
Either variable can go on the x-axis
How is the strength of a relationship determined?
By how closely the points follow a clear form.
When is a linear relation strong and weak?
If the points lie close to a straight line, it is strong. If the points are widely scattered about a line, it is weak.
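How do you find the correlation?
Suppose we have data on variables x and y for n individuals. The values for the first individual are x1 and y1, the values for the second individual are x2 and y2, and so on. Standardize each x value using the mean and standard deviation of x, and each y value using the mean and standard deviation of y. The correlation is the sum of the products of the standardized values, divided by n - 1: r = (1/(n - 1)) * sum of ((xi - xbar)/sx) * ((yi - ybar)/sy).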
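The calculation of r from paired data can be sketched in Python as follows. This is a minimal illustration using the standard textbook formula (standardize x and y, multiply, sum, divide by n - 1); the data values and the function name correlation are made up for the example.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of ((x_i - x_bar) / s_x) * ((y_i - y_bar) / s_y)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    total = 0.0
    for x, y in zip(xs, ys):
        z_x = (x - x_bar) / s_x      # standardized x value
        z_y = (y - y_bar) / s_y      # standardized y value
        total += z_x * z_y
    return total / (n - 1)

# Made-up data for n = 5 individuals: (x1, y1), (x2, y2), ...
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(f"r = {r:.3f}")
```

For these five points r comes out to about 0.775, a moderately strong positive linear relationship.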
What is correlation not able to distinguish?
It cannot distinguish between explanatory and response variables. It makes no difference which variable you call x and which you call y in calculating the correlation.
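A quick check of this symmetry, as a sketch with made-up data (the helper correlation is an illustrative name, not something from the chapter): swapping the roles of x and y leaves r unchanged.

```python
from math import isclose
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation: sum of products of standardized values over n - 1."""
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

# Made-up data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_xy = correlation(xs, ys)  # treat xs as "x" and ys as "y"
r_yx = correlation(ys, xs)  # swap the roles of the two variables
print(isclose(r_xy, r_yx))
```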
What is a linear relationship?
A relationship in which the points of the scatterplot follow a straight-line pattern.
Even after the relationship between the variables is established as linear, what should you keep in mind?
Correlation is not a complete summary of two-variable data. You should give the means and standard deviations of both x and y along with the correlation.
What does a positive r mean? Negative r?
It means there is a positive association between the variables. Negative r means there is a negative association.
What does correlation do and not do?
It measures only the direction and strength of the linear relationship between two variables. It does not describe curved relationships between variables, no matter how strong they are.
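To illustrate, here is a sketch with made-up data in which y is determined exactly by x through a curve (y = x squared), yet r is 0 because there is no linear trend. The helper name correlation is an assumption for the example.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation: sum of products of standardized values over n - 1."""
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

# A perfect but curved relationship: every y is exactly x squared.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curved = correlation(xs, ys)
print(f"r = {r_curved:.3f}")  # essentially zero despite the exact relationship
```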
How is correlation similar to mean and standard deviation?
Like the mean and standard deviation, it is not resistant: it is strongly affected by a few outlying observations. Be cautious about using r when there are outliers.
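A sketch of how much one outlier can matter, using made-up data (the outlier at (10, 0) and the helper name correlation are chosen only for illustration): a single added point flips the sign of r.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation: sum of products of standardized values over n - 1."""
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

# Made-up data with a positive association.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_without = correlation(xs, ys)

# The same data plus one outlying observation far from the pattern.
r_with = correlation(xs + [10], ys + [0])

print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```

Here r is positive without the outlier and negative with it, which is why a scatterplot should be checked before r is reported.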
What should you look for in any graph of data?
Look for the overall pattern and for striking deviations from that pattern.
What happens to r if we change the units of measurement of x, y, or both?
r does not change if we change the units of measurement. r itself has no unit of measurement; it is just a number.
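This can be checked with a small sketch. The heights and weights below are made up, and the helper name correlation is an assumption; the point is only that converting inches to centimeters and pounds to kilograms leaves r the same.

```python
from math import isclose
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation: sum of products of standardized values over n - 1."""
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

# Made-up heights (inches) and weights (pounds).
height_in = [60, 62, 65, 68, 70]
weight_lb = [110, 120, 140, 155, 165]

# The same measurements in centimeters and kilograms.
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.45359237 for w in weight_lb]

r_imperial = correlation(height_in, weight_lb)
r_metric = correlation(height_cm, weight_kg)
print(isclose(r_imperial, r_metric, rel_tol=1e-9))
```

Standardizing each value removes its units, which is why the rescaling cancels out.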
What values does r take? What do they mean?
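r is always between -1 and 1. Values of r near 0 indicate a very weak linear relationship. Values close to -1 or 1 indicate that the points lie close to a straight line. The extreme values r = -1 and r = 1 occur only in the case of a perfect linear relationship, when the points lie exactly along a straight line.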
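The extreme values can be seen in a short sketch with made-up data (the lines y = 3 + 2x and y = 3 - 2x and the helper name correlation are chosen only for illustration).

```python
from math import isclose
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation: sum of products of standardized values over n - 1."""
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

xs = [1, 2, 3, 4, 5]
rising = [3 + 2 * x for x in xs]   # points exactly on an upward-sloping line
falling = [3 - 2 * x for x in xs]  # points exactly on a downward-sloping line

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
print(isclose(r_rising, 1.0), isclose(r_falling, -1.0))
```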
What is the most useful graph for displaying the relationship between two quantitative variables?
Scatterplot
What is the goal of many studies?
To show that changes in one or more explanatory variables actually cause changes in a response variable.
Scatterplot
Shows the relationship between two quantitative variables measured on the same individuals. The values of one variable appear on the horizontal axis, and the values of the other appear on the vertical axis. Each individual in the data appears as the point in the plot fixed by the values of both variables for that individual.
Where should the explanatory variable be plotted on a scatterplot?
The x-axis
What do scatterplots display?
They display the direction, form, and strength of the relationship between two quantitative variables.
How should you create scatterplots? (What should you avoid?)
They should focus on the relationship and avoid extra blank space.
When are two variables positively associated?
Two variables are positively associated when above-average values of one tend to accompany above-average values of the other, and below-average values also tend to occur together.
When are two variables negatively associated?
When above-average values of one tend to accompany below-average values of the other, and vice versa.
What does the formula for correlation help us see?
When r is positive, there is a positive association between the variables. When r is negative, there is a negative association.
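One way to see why, sketched with made-up data: r is built from products of standardized values, and a product is positive whenever x and y are on the same side of their means. When most products are positive, r is positive. The variable names here are assumptions for the example.

```python
from statistics import mean, stdev

# Made-up data with a positive association.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 2, 5, 4, 7, 6]

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)

# One product per individual: positive when x and y are both above average
# or both below average, negative when they fall on opposite sides.
products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]

n_positive = sum(p > 0 for p in products)
r = sum(products) / (len(xs) - 1)
print(f"{n_positive} of {len(products)} products are positive, r = {r:.3f}")
```

Here four of the six products are positive and they outweigh the two negative ones, so r is positive (about 0.83).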
How do you add a categorical variable to a scatterplot?
You can use a different plot color or symbol for each category. Doing so lets us examine visually the effect of a third variable on the relationship between x/y.
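A minimal sketch of this idea, assuming matplotlib is available for the plotting step. The data, the category labels, and the function names group_by_category and plot_groups are all made up for illustration. The grouping step uses only the standard library; the plotting helper draws each category with its own marker and color on a square plot.

```python
from collections import defaultdict

def group_by_category(xs, ys, categories):
    """Split paired (x, y) data into one (xs, ys) pair of lists per category."""
    groups = defaultdict(lambda: ([], []))
    for x, y, c in zip(xs, ys, categories):
        groups[c][0].append(x)
        groups[c][1].append(y)
    return dict(groups)

def plot_groups(groups):
    """Draw one scatter series per category, each with its own marker."""
    import matplotlib.pyplot as plt  # imported here so grouping works without it
    fig, ax = plt.subplots(figsize=(5, 5))  # square plot: equal emphasis on both axes
    for marker, (label, (gx, gy)) in zip("os^D", sorted(groups.items())):
        ax.scatter(gx, gy, marker=marker, label=label)  # color cycles per call
    ax.set_xlabel("x (explanatory)")
    ax.set_ylabel("y (response)")
    ax.legend()
    return fig

# Made-up data: a third, categorical variable recorded for each individual.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3, 5, 4, 7, 6]
categories = ["A", "B", "A", "B", "A", "B"]

groups = group_by_category(xs, ys, categories)
print(groups)
```

Calling plot_groups(groups) and then saving or showing the returned figure would produce the scatterplot; the marker string "os^D" supports up to four categories in this sketch.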
In order to compute correlation, what kind of variables do you need?
Both variables must be quantitative so that the arithmetic in the formula for r makes sense.
Explanatory Variable
may explain or influence changes in a response variable (independent variable)
Response Variable
measures an outcome of a study (dependent variable)
Correlation (r)
measures the direction and strength of the linear relationship between two quantitative variables. It is usually written as r.