Correlation

Kendall Rank Correlation

A nonparametric test of association that makes no assumptions about the distributions of the variables

Pearson Correlation Coefficient (r)

1. A measure of the sign and strength of a linear association between two variables 2. Indicates how far the data points fall from the best-fit line describing the relationship between the variables

Covariance - How to

1. Calculate the deviation of each score for the first variable (x) from its mean 2. Calculate the deviation of each score for the second variable (y) from its mean 3. Multiply these two deviations to obtain the cross-product deviations 4. Cov(x, y) = (∑(x_i - x̄)(y_i - ȳ))/(N - 1)
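The steps above can be sketched directly in Python; the data values here are invented for illustration:

```python
def covariance(x, y):
    """Sample covariance via the cross-product deviations."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Steps 1-3: deviations from each mean, multiplied pairwise
    cross_products = [(xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)]
    # Step 4: divide the summed cross-products by N - 1
    return sum(cross_products) / (n - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(covariance(x, y))  # 1.5 - positive: y tends to rise with x
```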

R^2

1. Coefficient of Determination - the proportion of total variance accounted for by the regression model 2. Ranges from 0 (none) to 1 (all) 3. R^2 = SS_M / SS_T

Direction of Causality

1. Correlation coefficients say nothing about which variable causes the other to change 2. Determining cause-effect relationships requires controlled experiments

What is a correlation?

1. Correlation refers to departure of two variables from independence 2. Quantifies extent to which two variables are related 3. Measures concurrent changes in the variables 4. Specifically, it refers to several types of relationships between variable values

Correlation: Scaled Covariance

1. Covariance of X and Y divided by the S.D. of X and the S.D. of Y 2. Quantifies the intensity of the association between the two variables 3. r = Cov(X, Y) / Sqrt((Variance X)(Variance Y)) 4. r = (∑(X_i - X̄)(Y_i - Ȳ)) / Sqrt(∑(X_i - X̄)^2 ∑(Y_i - Ȳ)^2)
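As a sketch, r can be computed straight from the formula above (example data invented; note the (N - 1) factors in the covariance and the two standard deviations cancel):

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    # The (N - 1) terms cancel, leaving sums of squared deviations
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0 - perfectly linear
```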

Measuring the Pearson correlation

1. Determine whether, as one variable increases, the other increases, decreases, or stays the same 2. This is done by calculating the covariance a) Determine how much each observation (x, y) deviates from the means b) If both variables deviate from their means by similar amounts, they are correlated

Measuring Correlations

1. Deviation: observed value - group mean 2. Deviations can be positive or negative 3. The deviations always sum to 0

F Test

1. F statistic = ratio of the model MS (regression variance) to the residual MS (error variance) 2. Ranges from 0 upward with no fixed maximum a) The larger the F value, the stronger the model 3. F = MS_M / MS_R

The third-variable problem

1. In any correlation, causality between the two variables cannot be assumed because other measured or unmeasured variables may be affecting the results a) Partial correlations can control for other measured variables b) Beware of unmeasured variables

Which nonparametric correlation to use...

1. Kendall's Tau works better with smaller samples 2. Spearman's Rank works better with ties in data

Describing a straight line

1. The model uses a linear relationship between X and Y 2. Y_i = b_0 + b_1X_i + Error a) Error - unexplained portion of the variation, assumed ~N(0, sigma^2) b) b_1 - gradient (slope) of the regression line; the direction/steepness of the relationship; the regression coefficient for the predictor variable c) b_0 - intercept (value of Y when X = 0); the point where the regression line crosses the Y axis

What is simple linear regression?

1. Most simple version of regression: a) One independent variable b) Linear relationship 2. Tests hypothetical models of a linear relationship between two variables: a) Dependent (outcome): Y axis b) Independent (driver): X axis

Do the two variables have to be measured in the same units?

1. No, the two variables can be measured in entirely different units 2. The calculations for Pearson's correlation coefficient are designed so the units of measurement do not affect the calculation

Can you use any type of variable for Pearson's correlation coefficient?

1. No, the two variables have to be measured on either an interval or ratio scale 2. Both variables don't need to be measured on the same scale

Correlation vs. Causation

1. One of the most common errors in statistics 2. Two variables may be correlated because one causes the other to change, or because both are correlated with a third variable 3. Causation implies correlation, but correlation does not imply causation a) Just because two events occur together does not mean one caused the other

Three-Way Correlations

1. Partial correlation: a) Measures the relationship between two variables, controlling for the effect that a third variable has on both of them 2. Semi-partial correlation: a) Measures the relationship between two variables, controlling for the effect that a third variable has on only one of them

Correlation vs. Regression

1. Pearson correlation is bidirectional, regression is not 2. Pearson correlation does not measure the slope of the best fit line, regression does

Variance

1. Quantifies how the scores of a single variable deviate from their mean 2. s^2 = (1 / (n-1)) ∑ (y_i - ȳ)^2 3. Sum of squared deviations from the mean / degrees of freedom 4. Based on the sum of squares

Linear Regression - Assumptions

1. Random sampling 2. Variables are either interval or ratio measurements 3. Normally distributed 4. Linear relationship between the two variables

Pearson correlation assumptions

1. Random sampling 2. Variables either interval or ratio measurements 3. Variables are normally distributed 4. The two distributions have equal variances 5. There is a linear relationship between the two variables

How good is the fit of the model?

1. The regression line is based on the observations a) Sum of Squares: sum of the squared deviations (squaring keeps the positive and negative deviations from cancelling) b) Mean Square: Sum of Squares divided by the degrees of freedom

Nonparametric Correlation - Hints

1. Relationship between variables is not linear 2. Only one coefficient: sign / strength of association 3. Coefficient must always be in the range from -1 to 1

Nonparametric Correlations

1. Spearman's Rho (Pearson's correlation on ranked data) 2. Kendall's Tau (better than Spearman's for small samples)
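Since Spearman's rho is just Pearson's r applied to ranked data, a minimal sketch (with tied values sharing the average of their ranks; data and function names are my own) might look like:

```python
import math

def rank(values):
    """1-based ranks; tied values share the average of their ranks."""
    s = sorted(values)
    # first rank of v is s.index(v)+1, last is s.index(v)+s.count(v); average them
    return [(s.index(v) + 1 + s.index(v) + s.count(v)) / 2 for v in values]

def pearson_r(x, y):
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    sxx = sum((a - xm) ** 2 for a in x)
    syy = sum((b - ym) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    # rho is Pearson's correlation computed on the ranks
    return pearson_r(rank(x), rank(y))

# Monotonic but nonlinear data still gives a perfect rank correlation
print(spearman_rho([1, 2, 3, 4], [1, 10, 100, 1000]))  # 1.0
```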

Pearson Correlation - Hints

1. The Pearson correlation does not take into consideration whether a variable is classified as the dependent or the independent variable 2. The Pearson correlation treats both variables equally (y and x axes)

Interpreting of Pearson Correlation

1. The correlation coefficient (r) indicates the strength and the direction of a linear relationship between two random variables a) -1 <= r <= 1 2. The coefficient of determination (r^2) indicates the % of the variance in one variable that is explained by the other a) 0 <= r^2 <= 1

What is a regression?

1. The generic term regression refers to methods that allow the prediction of the value of one (dependent) variable from another (independent) variable 2. Regression methods are based on various models of the relationship between the two variables

Calculating the Slope of the Best-Fit Line

1. The regression coefficient = the slope of the best-fit line 2. Covariance between X and Y divided by the variance of X 3. Quantifies the best-fit slope of the line relating the X and Y variables a) Y = a + bX + e 4. b = Cov(X, Y) / Var(X) = [(∑(X_i - X̄)(Y_i - Ȳ))/(n - 1)] / [(∑(X_i - X̄)^2)/(n - 1)]
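A sketch of the slope calculation (the intercept follows from the fact that the least-squares line passes through the point of means; the example data are invented):

```python
def fit_line(x, y):
    """Least-squares fit: slope b1 = Cov(X, Y) / Var(X), intercept b0 = ȳ - b1·x̄."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    cov_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_mean) ** 2 for xi in x) / (n - 1)
    b1 = cov_xy / var_x          # slope of the best-fit line
    b0 = y_mean - b1 * x_mean    # line passes through (x̄, ȳ)
    return b0, b1

b0, b1 = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])  # data lie on y = 2x + 1
print(b0, b1)  # 1.0 2.0
```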

Covariance vs. Variance

1. The variance is used to assess variability in one variable 2. Covariance is used to quantify variability shared by two variables

Kendall Correlation - Interpretation

The coefficient must be in the range -1 <= Tau <= 1 a) If the agreement between the two rankings is perfect (the two rankings are equal), Tau = +1 b) If the disagreement between the two rankings is perfect (one ranking is the reverse of the other), Tau = -1 c) If X and Y are independent, Tau = 0

What happens with tied pairs in Kendall's Correlation?

Tied pairs are discarded (counted as neither concordant nor discordant)

SS_M

Model Sum of Squares - the squared differences between the Y values predicted by the regression model and the mean of the Y data Sum of Squares: ∑(Y predicted - Y mean)^2 Mean Square: Sum of Squares / 1 Degrees of Freedom: 1 (linear model)

Calculating the covariance

Quantifies how the scores of two variables differ from their respective means

SS_R

Residual (Error) Sum of Squares - the squared differences between the Y values predicted by the regression model and the observed Y data Sum of Squares: ∑(Y_i - Y predicted)^2 Mean Square: Sum of Squares / d.f. Degrees of Freedom: sample size - 2

Kendall Correlation - Formulation

Tau = [(number of concordant pairs) - (number of discordant pairs)] / [0.5n(n-1)] Where: a) concordant pairs have the same relative rankings b) discordant pairs have different relative rankings c) n is the number of observations d) the total number of pairs compared = 0.5n(n-1)
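A direct sketch of the formula, checking every pair of observations (tied pairs discarded, as noted above; the data are invented):

```python
from itertools import combinations

def kendall_tau(x, y):
    """Tau = (concordant - discordant) / (0.5·n·(n-1))."""
    concordant = discordant = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        sign = (xi - xj) * (yi - yj)
        if sign > 0:
            concordant += 1   # pair ordered the same way on both variables
        elif sign < 0:
            discordant += 1   # pair ordered in opposite ways
        # sign == 0: tied pair, discarded from both counts
    n = len(x)
    return (concordant - discordant) / (0.5 * n * (n - 1))

print(kendall_tau([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0  (perfect agreement)
print(kendall_tau([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0 (perfect disagreement)
```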

SS_T

Total Sum of Squares - the squared differences between the observed Y values and their mean calculated from the data Sum of Squares: ∑(Y_i - Y mean)^2 Mean Square: Sum of Squares / d.f. Degrees of Freedom: sample size - 1
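The three sums of squares partition neatly (SS_T = SS_M + SS_R) and together yield the F ratio; a sketch with invented data (function name is my own):

```python
def regression_anova(x, y):
    """Fit Y = b0 + b1·X, then partition the total variance into model + residual."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
          / sum((xi - x_mean) ** 2 for xi in x))
    b0 = y_mean - b1 * x_mean
    predicted = [b0 + b1 * xi for xi in x]
    ss_m = sum((p - y_mean) ** 2 for p in predicted)            # model SS, d.f. = 1
    ss_r = sum((yi - p) ** 2 for yi, p in zip(y, predicted))    # residual SS, d.f. = n - 2
    ss_t = sum((yi - y_mean) ** 2 for yi in y)                  # total SS, d.f. = n - 1
    f = (ss_m / 1) / (ss_r / (n - 2))                           # F = MS_M / MS_R
    return ss_m, ss_r, ss_t, f

ss_m, ss_r, ss_t, f = regression_anova([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(ss_m, ss_r, ss_t, f)  # ≈ 3.6, 2.4, 6.0, 4.5 - and SS_M + SS_R = SS_T
```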

Mean

Xbar = (∑ x_i)/n

Correlation Coefficient (r)

r = Cov(x, y) / (s_x s_y) = (∑(x_i - x̄)(y_i - ȳ)) / ((N - 1) s_x s_y)

