Chapter 4 Notes: Describing Bivariate Data


Pearson product-moment correlation coefficient

A measure of the strength of the linear relationship between two variables, usually referred to as Pearson's correlation or simply as the correlation coefficient. If the relationship between the variables is not linear, the correlation coefficient does not adequately represent the strength of the relationship between them.

The symbol for Pearson's correlation is "ρ" when it is measured in the population and "r" when it is measured in a sample. Because we will be dealing almost exclusively with samples, we will use r to represent Pearson's correlation unless otherwise noted.

Pearson's r can range from -1 to 1. An r of -1 indicates a perfect negative linear relationship between the variables, an r of 0 indicates no linear relationship, and an r of 1 indicates a perfect positive linear relationship. A scatter plot for which r = 0 shows no linear relationship between X and Y. With real data, you would not expect to get values of r of exactly -1, 0, or 1.

Pearson's correlation is symmetric: the correlation of X with Y is the same as the correlation of Y with X.

A critical property of Pearson's r is that it is unaffected by linear transformations. This means that multiplying a variable by a positive constant and/or adding a constant does not change the correlation of that variable with other variables. (Multiplying by a negative constant reverses the sign of r but leaves its magnitude unchanged.)

Galileo found that the relationship between release height and distance traveled is...

Non-linear: the points follow a curve shaped like a parabola rather than a straight line.

Computation of Pearson's correlation: general

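This is the variance sum law for correlated variables. In the population:

$$\sigma^2_{X \pm Y} = \sigma^2_X + \sigma^2_Y \pm 2\rho\,\sigma_X\sigma_Y$$

where ρ is the correlation between X and Y in the population. If the variances and the correlation are computed in a sample:

$$s^2_{X \pm Y} = s^2_X + s^2_Y \pm 2r\,s_X s_Y$$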

Scatter plot

A scatter plot of two variables shows the values of one variable on the Y axis and the values of the other variable on the X axis. Scatter plots are well suited for revealing the relationship between two variables.

Bivariate Data

Bivariate data is data for which there are two variables for each observation. For example, the ages of the husband and the wife in each of 10 married couples form bivariate data: each couple contributes two quantitative variables. We can learn much more by displaying bivariate data in a graphical form that maintains the pairing, namely a scatter plot.

Deviation Scores

Scores that are expressed as differences (deviations) from some value, usually the mean. Converting data to deviation scores typically means subtracting the mean from each score. Thus, the values 1, 2, and 3 in deviation-score form are computed by subtracting the mean of 2 from each value, giving -1, 0, and 1.
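A minimal sketch of the conversion, using a hypothetical helper `deviation_scores` (not from any library). The second example uses made-up values to show that deviation scores from the mean always sum to zero.

```python
def deviation_scores(values):
    """Return each value minus the mean of all values (hypothetical helper)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

print(deviation_scores([1, 2, 3]))                  # → [-1.0, 0.0, 1.0]

# Made-up values: deviations from the mean always sum to zero.
print(sum(deviation_scores([4, 8, 15, 16, 23, 42])))  # → 0.0
```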

Negative Association

There is a negative association between variables X and Y if smaller values of X are associated with larger values of Y and larger values of X are associated with smaller values of Y.

Linear Relationship

There is a perfect linear relationship between two variables if the points of a scatter plot fall on a straight line. The relationship is still linear even if the points diverge from the line, as long as the divergence is random rather than systematic. Not all scatter plots show linear relationships. Scatter plots that show linear relationships can differ in several ways, including the slope of the line about which the points cluster and how tightly they cluster about it.

Positive association

There is a positive association between variables X and Y if smaller values of X are associated with smaller values of Y and larger values of X are associated with larger values of Y.

Computation of Pearson's correlation: independent variables

We begin by computing the mean of X and subtracting this mean from every value of X. The new variable is called "x." The variable "y" is computed similarly from Y. (Notice that the means of x and y are both 0.) Next we create a new column by multiplying x and y.

If there were no relationship between X and Y, positive values of x would be just as likely to be paired with negative values of y as with positive values. This would make negative values of xy as likely as positive values, and the sum would be small.

If there is a positive relationship, high values of X are associated with high values of Y and low values of X are associated with low values of Y. Positive values of x are then paired with positive values of y, and negative values of x with negative values of y, so the products xy tend to be positive, resulting in a large positive total for the xy column.

Finally, if there were a negative relationship, positive values of x would be associated with negative values of y and negative values of x with positive values of y. This would lead to negative values of xy and a large negative total.

Pearson's r is then computed as:

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$

Restriction of range: when only the lower (or upper) half of a group (for example, of students) is considered, the range of scores is restricted, which generally leads to a smaller correlation. Conversely, selecting only the extreme values at both ends of a distribution generally increases the correlation.

Univariate Data

Data consisting of a single variable for each observation.

