CORRELATION

The degree of the relationship

A correlation also measures the "strength" of the relationship between X and Y. A correlation will have a value between -1 and +1. A correlation of 0 means that there is no linear relationship. A +1 means that there is a perfect positive correlation between the two variables, and a -1 means that there is a perfect negative correlation.

Linear Equation

Y = bX + a
b: slope, how much Y changes when X increases by 1 point.
a: Y-intercept, the value of Y when X = 0.

Outliers

Extreme datapoints - outliers - can have a dramatic effect on the value of a correlation.
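
A minimal sketch of this effect, using made-up scores (the data and the helper name `pearson_r` are illustrative assumptions, not part of the original notes). Five points with no linear trend give an r of about 0; adding one extreme point pushes r to about 0.97.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

# Made-up scores with no linear trend.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 1, 3]
r_without_outlier = pearson_r(x, y)

# The same scores plus a single extreme point at (20, 20).
r_with_outlier = pearson_r(x + [20], y + [20])

print(round(r_without_outlier, 3), round(r_with_outlier, 3))
```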

Phi Coefficient

Measures the association between two nominal variables. A measure of effect size for chi-square. Can only be used if both nominal variables have exactly two categories.
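
As a hedged sketch, phi for a 2x2 table with cells a, b (top row) and c, d (bottom row) can be computed from the standard formula (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)). The cell counts below are made up, and the function names are illustrative. The second function computes chi-square for the same table to show the effect-size link: phi squared equals chi-square divided by n.

```python
from math import sqrt

def phi_coefficient(a, b, c, d):
    """Phi for the 2x2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the same table."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# Made-up counts: rows are one nominal variable, columns the other.
a, b, c, d = 30, 10, 10, 30
n = a + b + c + d

phi = phi_coefficient(a, b, c, d)
chi2 = chi_square_2x2(a, b, c, d)
phi_from_chi2 = sqrt(chi2 / n)  # magnitude of phi recovered from chi-square

print(phi, chi2, phi_from_chi2)
```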

Interpreting Pearson

Now that we know how to compute a correlation, we need to consider how we interpret it. We already know the basics:
1) The direction of the relationship - positive or negative
2) The form of the relationship - linear or non-linear
3) The degree of the relationship - the "strength" of the relationship
But there are some additional things that we need to consider:
4) Correlations describe a relationship between two variables, but DO NOT explain why the variables are related
5) Correlations are greatly affected by the range of scores in the data
6) Extreme scores can have dramatic effects on correlations
7) When considering "how good" a relationship is, we really should consider r², not just r.

Corr. and causation

One of the most common errors made in interpreting correlations is to assume that a correlation necessarily implies a cause-and-effect relationship between the two variables. Simply stated: Correlation is NOT Causation!

Corr. and restricted range

Restricted range: the value of a correlation depends on the range of scores on your variables. E.g., the correlation between age and height is strongly positive when your subjects are aged 5-15 (children grow constantly), but it will be much weaker if you also include adults as subjects (adults' heights don't increase).
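
The age and height example can be sketched with invented numbers (every value below is made up for illustration, and `pearson_r` is an illustrative helper). The children-only sample gives an r close to 1, while the sample that also includes adults gives a clearly lower r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

# Invented data: children keep growing, adults' heights level off.
child_ages = [5, 7, 9, 11, 13, 15]
child_heights = [110, 122, 133, 144, 156, 167]   # cm
adult_ages = [25, 35, 45, 55, 65]
adult_heights = [170, 165, 175, 168, 172]        # cm

r_children = pearson_r(child_ages, child_heights)
r_all = pearson_r(child_ages + adult_ages, child_heights + adult_heights)

print(round(r_children, 3), round(r_all, 3))
```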

The sum of the products of deviations

SP = Σ(X − M_X)(Y − M_Y), where M_X and M_Y are the means of X and Y. This is the definitional formula. The sum of products (short for the sum of the products of corresponding deviation scores for two variables) measures the variability shared between two variables. To calculate SP, you first find the means of X and Y, then determine the deviation score for each X and each Y (how far each point is from its mean), then multiply each pair of deviation scores, and finally sum the products.
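
A short sketch of the three steps on made-up scores (the data and variable names are illustrative). The last line cross-checks the result against the usual computational shortcut, SP = ΣXY − (ΣX)(ΣY)/n, which is not stated in these notes but is algebraically equivalent.

```python
# Made-up paired scores.
x = [1, 2, 4, 5]
y = [1, 3, 3, 5]
n = len(x)

# Step 1: deviation scores for each X and each Y.
mean_x = sum(x) / n
mean_y = sum(y) / n
dev_x = [xi - mean_x for xi in x]
dev_y = [yi - mean_y for yi in y]

# Step 2: product of each pair of deviation scores.
products = [dx * dy for dx, dy in zip(dev_x, dev_y)]

# Step 3: sum the products.
sp_definitional = sum(products)

# Cross-check with the computational formula.
sp_computational = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

print(sp_definitional, sp_computational)
```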

SPEARMAN

Spearman's correlation coefficient is a statistical measure of the strength of a monotonic relationship between paired data. In a sample it is denoted by r_s and is by design constrained as follows: -1 <= r_s <= 1. It is used in two situations: (1) When one or both variables are measured on an ordinal scale. (2) When a researcher has ratio/interval data, knows that the relationship between the two variables is probably not linear, but wants to measure the consistency of their relationship.
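
One common way to obtain r_s is to convert each variable to ranks and then compute a Pearson correlation on the ranks. The sketch below assumes there are no tied scores (ties would need averaged ranks), and the data and helper names are made up. Y increases consistently with X but not in a straight line, so r_s is 1 while Pearson's r on the raw scores is below 1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

# Made-up data: monotonic but curved (Y = X squared).
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

r_pearson = pearson_r(x, y)
r_spearman = pearson_r(ranks(x), ranks(y))

print(round(r_pearson, 3), round(r_spearman, 3))
```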

Point-Biserial Correlation

Used when one variable is dichotomous and the other is interval/ratio. It is a special version of the Pearson correlation, computed with the dichotomous variable coded as two numbers (e.g., 0 and 1). This situation is very similar to a t-test, which has a dichotomous IV and an interval/ratio DV. Effect size = r².
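
A sketch of the link to the t-test, with made-up scores for two groups coded 0 and 1 (all names and numbers are illustrative). It computes the point-biserial r as an ordinary Pearson r, then an independent-samples t with pooled variance, and checks the standard identity r² = t² / (t² + df).

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sum_of_squares(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SS_X * SS_Y)."""
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sp / sqrt(sum_of_squares(xs) * sum_of_squares(ys))

# Made-up interval/ratio scores for two groups.
group_a = [2, 4, 6]    # coded 0
group_b = [6, 8, 10]   # coded 1

codes = [0] * len(group_a) + [1] * len(group_b)
scores = group_a + group_b
r_pb = pearson_r(codes, scores)

# Independent-samples t-test with pooled variance on the same data.
df = len(group_a) + len(group_b) - 2
pooled_variance = (sum_of_squares(group_a) + sum_of_squares(group_b)) / df
standard_error = sqrt(pooled_variance * (1 / len(group_a) + 1 / len(group_b)))
t = (mean(group_b) - mean(group_a)) / standard_error

r_squared_from_t = t ** 2 / (t ** 2 + df)

print(round(r_pb ** 2, 3), round(r_squared_from_t, 3))
```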

hypothesis

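Two-tailed: H0: ρ = 0, there is no relationship between X and Y in the population. H1: ρ ≠ 0, there is a relationship. (ρ is the population correlation; the sample r is used to test it.)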
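
These notes do not give the test statistic, so the following is only a sketch of one standard approach: convert r to t = r·sqrt(n − 2) / sqrt(1 − r²) with df = n − 2 and compare it with a critical value. The values of r and n are made up, and the critical value 2.048 (two-tailed, alpha = .05, df = 28) is taken from a standard t table.

```python
from math import sqrt

def t_from_r(r, n):
    """t statistic for testing the null hypothesis of no linear relationship (df = n - 2)."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Made-up sample result.
r = 0.5
n = 30

t = t_from_r(r, n)

# Assumed critical value: two-tailed, alpha = .05, df = 28.
critical_t = 2.048
reject_h0 = abs(t) > critical_t

print(round(t, 3), reject_h0)
```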

The least square solution

We are looking for a linear equation of the form: Ŷ = bX + a. For each value of X in the data, this equation determines the point on the line (Ŷ) that gives the best prediction of Y. The goal is to find the specific values of a and b that make this the best-fitting line, meaning the line that minimizes the sum of squared differences between each Y and its predicted Ŷ.
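
A minimal sketch of the least-squares solution on made-up scores, using the standard results b = SP / SS_X and a = M_Y − b·M_X (these formulas are not spelled out in the notes; the data and names are illustrative).

```python
# Made-up paired scores.
x = [1, 2, 4, 5]
y = [1, 3, 3, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# SP: sum of products of deviations; SS_X: sum of squared X deviations.
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)

b = sp / ss_x              # slope
a = mean_y - b * mean_x    # Y-intercept

# Predicted values (Y-hat) and squared prediction errors.
y_hat = [b * xi + a for xi in x]
sum_squared_error = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

print(b, a, round(sum_squared_error, 3))
```

Any other choice of a and b for these scores gives a larger sum of squared errors, which is what "best fitting" means here.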

the coefficient of determination

When considering "how good" a relationship is, we really should consider r², not just r. r² measures how much of the variability in one variable can be determined from its relationship with the other variable. For example, suppose that the correlation (r) between height and weight is 0.76. We can use this information to predict a person's weight if we know their height. Because the correlation is not perfect, the prediction may be off by a bit, but it will be close. The r² for this relationship is (0.76)² = 0.578, so 57.8% of the variability in weight can be accounted for by its relationship with height. If we have a perfect correlation (r = ±1.0), then r² = (1.0)² = 1.0, so 100% of the variance in Y can be accounted for by X.
r² = .01 = small effect = no confidence
r² = .09 = medium effect = somewhat confident
r² = .25 = large effect = very confident
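
The sketch below first reproduces the height and weight arithmetic, then uses a small made-up dataset (illustrative names and numbers) to show what "variability accounted for" means: r² equals 1 minus the leftover (residual) variability around the least-squares line divided by the total variability in Y.

```python
from math import sqrt

# The example from the notes: r = 0.76.
r_height_weight = 0.76
r_squared_height_weight = r_height_weight ** 2   # 0.5776, about 57.8%

# Made-up paired scores.
x = [1, 2, 4, 5]
y = [1, 3, 3, 5]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)

r = sp / sqrt(ss_x * ss_y)

# Least-squares line and the variability it leaves unexplained.
b = sp / ss_x
a = mean_y - b * mean_x
ss_residual = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))

proportion_explained = 1 - ss_residual / ss_y

print(round(r_squared_height_weight, 4), round(r ** 2, 3), round(proportion_explained, 3))
```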

Correlation

a statistical technique used to measure & describe a relationship between 2 variables. One main purpose: prediction. It is observational: the variables are measured as they occur naturally, without being manipulated.

Prediction

if we know that two variables are strongly related, then we may be able to predict the value of one based on the value of the other. e.g., if you know that ultrasound measurements of a baby's head are positively correlated with birth weight, then you can make an educated guess about the baby's birth weight by measuring the baby's head on an ultrasound.

Pearson correlation coefficient

is a measure of the strength of a linear association between two variables and is denoted by r. Basically, a Pearson correlation attempts to draw a line of best fit through the data of two variables, and the Pearson correlation coefficient, r, indicates how far away the data points are from this line of best fit (i.e., how well the data points fit this line). The products of the z-scores determine the strength and direction of the correlation.
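
Two equivalent ways to compute r, sketched on made-up scores (data and names are illustrative): from the sum of products, r = SP / sqrt(SS_X · SS_Y), and from z-scores, r = Σ(z_X · z_Y) / n. The z-score version below assumes z-scores based on the population standard deviation (dividing by n).

```python
from math import sqrt

# Made-up paired scores.
x = [1, 2, 4, 5]
y = [1, 3, 3, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Version 1: sum of products over the square root of SS_X times SS_Y.
r_from_sp = sp / sqrt(ss_x * ss_y)

# Version 2: mean of the products of z-scores (population SD, divide by n).
sd_x = sqrt(ss_x / n)
sd_y = sqrt(ss_y / n)
z_x = [(xi - mean_x) / sd_x for xi in x]
z_y = [(yi - mean_y) / sd_y for yi in y]
r_from_z = sum(zx * zy for zx, zy in zip(z_x, z_y)) / n

print(round(r_from_sp, 4), round(r_from_z, 4))
```

Pairs where both z-scores have the same sign add to r, and pairs with opposite signs subtract from it, which is how the products set the direction as well as the strength.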

Theory Verification

many theories will predict that a relationship exists between different variables. So you can then go out, collect some data, and see if such a relationship exists.

Regression

a statistical technique for finding the best-fitting straight line for a set of data

negative correlation:

the two variables tend to move in opposite directions. That is, as one gets larger, the other gets smaller.

positive correlation:

the two variables tend to move in the same direction. That is, as one gets larger, so does the other.

