Scatterplots Part 2
standardization
allows us to compare correlations between data sets whose variables are measured in different units, or whose variables are different altogether
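A minimal sketch of standardization, assuming it means converting each value to a z-score (value minus mean, divided by the sample standard deviation). The `standardize` helper and the height data are hypothetical and used only for illustration.

```python
import math
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n - 1)
    return [(v - m) / s for v in values]


# Hypothetical heights, recorded once in centimetres and once in inches.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
heights_in = [h / 2.54 for h in heights_cm]

z_cm = standardize(heights_cm)
z_in = standardize(heights_in)

# After standardization the units drop out, so the two lists match
# up to floating-point rounding.
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(z_cm, z_in))
print(same)
```

Because the z-scores no longer carry units, values that started out in centimetres and in inches can be compared directly.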
categorical data
can be added to a scatterplot (for example, with a different color or symbol for each group) to reveal trends within categories
categorical data
has no inherent order on the x-axis; arranging the categories differently would give the plot a different form
perfect linear relationship
the extreme values r = -1 and r = 1, which occur only when the points lie exactly on a straight line
outlier
falls outside the overall pattern or relationship
weak relationship
for any x, you might get a wide range of y values
r
has no units and does not change when we change the units of measurement of x, y, or both
strength of linear relationship
increases as r moves away from 0 and toward -1 or 1
r less than 0
indicates a negative association
r greater than 0
indicates a positive association
r
is always a number between -1 and 1
correlation r
is calculated using the mean and standard deviation of both the x and y variables, so it is not resistant to outliers
correlation
makes no distinction between explanatory and response variables, and requires both variables to be quantitative
correlation r
measures the strength of the linear relationship between two quantitative variables
values of r
near 0 indicate a very weak linear relationship
strength
of the relationship between the two variables can be seen by how much variation, or scatter, there is around the main form
correlation r
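r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right), where \bar{x}, \bar{y} are the means and s_x, s_y the standard deviations of x and y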
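A hedged sketch of the calculation, assuming the usual textbook form of the formula: r is 1/(n - 1) times the sum, over all data points, of the product of the standardized x value and the standardized y value. The function name `pearson_r` is hypothetical.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Correlation r as the sum of z-score products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    # Sample standard deviations; a constant variable gives 0 here and the
    # division below then fails, since r is undefined in that case.
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    )
    return total / (n - 1)


# Points exactly on a rising line, and exactly on a falling line.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(round(r_up, 6), round(r_down, 6))
```

Swapping the two arguments leaves the result unchanged, which matches the fact that correlation does not distinguish explanatory from response variables.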
scatterplot
relationships require that both variables be quantitative (the order of data points is defined entirely by their value)
both variables
should be given a similar amount of space: the plot should be roughly square, and the points should fill the plot area (no large blank regions)
strong relationship
you can get a pretty good estimate of y if you know x