Exam 2

Correlation

A quantity measuring the extent of interdependence of variable quantities. - Measures the linear relationship between X and Y. - We cannot conclude that X causes Y, nor that Y causes X.

Bivariate Data: Options

‣ Correlation - How strongly and in what way are two variables related? ‣ Regression - Use KNOWN information to predict UNKNOWN information - Mathematically defines a line that best represents the relationship between the two variables ‣ Hypothesis Testing - Information regarding Correlations and/or Regression used to test hypothesis.

Interpreting the Correlation Coefficient

‣ For HUMAN behaviour (because it is so variable), the general guidelines are as follows: - No relationship: |r| < .10 - Weak: |r| = .10 to .30 - Moderate: |r| = .30 to .50 - Strong: |r| > .50 - Correlations are bounded between -1 and +1
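
The guidelines above can be written as a small lookup. This is only a sketch: the function name `describe_r` is made up for illustration, and assigning a boundary value (exactly .10, .30 or .50) to the stronger category is an assumption, since the guidelines do not say which side a boundary belongs to.

```python
def describe_r(r: float) -> str:
    """Label the strength of r using the guidelines for human behaviour."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength ignores the sign; the sign gives direction
    if size < 0.10:
        return "no relationship"
    if size < 0.30:
        return "weak"
    if size < 0.50:
        return "moderate"
    return "strong"  # assumption: exactly .50 is counted as strong


for r in (0.05, -0.22, 0.41, -0.64):
    direction = "positive" if r > 0 else "negative"
    print(f"r = {r:+.2f}: {describe_r(r)}, {direction}")
```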

How do we REPORT Y'? And how do we COMPUTE Y'?

we REPORT the EQUATION as: Y'= 3.62 + 0.79(X) we COMPUTE the equation as: Y'= 3.6249 + 0.7917(4)= 6.7917 -> we report Y' as Y'= 6.79
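
A short sketch of why the card separates reporting from computing: the coefficients below are the ones on the card, and the variable names are just illustrative. Rounding the coefficients before computing shifts the answer in the second decimal place.

```python
# Coefficients from the card, kept at four decimals for computation
a_y = 3.6249   # intercept
b_y = 0.7917   # slope
x = 4          # given value of X

# Compute with the unrounded coefficients, round only when reporting
y_pred = a_y + b_y * x
print(f"Report the equation as: Y' = {a_y:.2f} + {b_y:.2f}(X)")
print(f"Report the prediction as: Y' = {y_pred:.2f}")

# Rounding first, then computing, gives a slightly different value
y_rounded_early = round(a_y, 2) + round(b_y, 2) * x
print(f"Rounded too early: Y' = {y_rounded_early:.2f}")
```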

Non-Linear/Curvilinear Relationships

‣ A correlation coefficient based on a LINEAR function underestimates the REAL relationship between variables in a curvilinear relationship. ‣ Computing a correlation without first graphing your data is dangerous as you may end up missing a very real relationship. ‣ Remember - the interocular trauma test (it's worth it)

Spearman rho (ρ or rs)

‣ Application - One or both variables are on an ordinal scale of measurement, or - A weak curvilinear relationship in score data, or - Heteroscedasticity in score data ‣ Why would we turn score data into ranks? - Converting score data to ranks reduces the influence of outliers (an extreme score simply becomes the highest or lowest rank) and therefore reduces variability in the data ‣ This is also known as Pearson Correlation on Ranks.
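
"Pearson Correlation on Ranks" can be sketched directly: rank each variable, then apply the Pearson formula to the ranks. The helper names and the scores are made up for illustration; tied scores are given the mean of the ranks they occupy.

```python
from math import sqrt


def mean_ranks(scores):
    """Rank scores from 1 (lowest); tied scores share the mean of their ranks."""
    ordered = sorted(scores)
    rank_of = {}
    for value in set(ordered):
        first = ordered.index(value) + 1      # first rank position held by value
        count = ordered.count(value)          # how many scores are tied at value
        rank_of[value] = first + (count - 1) / 2
    return [rank_of[s] for s in scores]


def pearson_r(x, y):
    """Pearson r = SP / sqrt(SSx * SSy)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Spearman rho computed as Pearson r on the ranks."""
    return pearson_r(mean_ranks(x), mean_ranks(y))


# Hypothetical scores; 50 is an outlier that simply becomes the top rank
x = [2, 4, 4, 7, 50]
y = [1, 3, 2, 5, 6]
print("ranks of X:", mean_ranks(x))
print("Pearson r on scores:", round(pearson_r(x, y), 2))
print("Spearman rho:", round(spearman_rho(x, y), 2))
```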

Point Biserial Correlation — rpb

‣ Application - X variable = Nominal Data (two groups) - Y variable = Score Data (Interval/Ratio) • Apply Pearson r formula: r = Cov/(SDx SDy) ‣ If participants are randomly sampled and the experimental variable is manipulated, the rpb describes the strength of the relationship between group membership (X) and behaviour (Y scores). Example: X: 1 = Control; 2 = Experimental; Y: Score on Happiness Test; rpb = .92 (try this on your own)
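
A minimal sketch of the point biserial as "Pearson r with a group code for X". The happiness scores below are made up for illustration (they are not the data behind the rpb = .92 on the card), and `pearson_r` is a hypothetical helper.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r = Cov / (SDx * SDy), using N in each denominator."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (sd_x * sd_y)


# Hypothetical data: 1 = Control, 2 = Experimental
group = [1, 1, 1, 2, 2, 2]
happiness = [3, 4, 5, 7, 8, 9]

r_pb = pearson_r(group, happiness)
print(f"rpb = {r_pb:.2f}")

# The codes are only labels: recoding the groups as 0/1 gives the same value
recoded = [g - 1 for g in group]
print(f"rpb with 0/1 codes = {pearson_r(recoded, happiness):.2f}")
```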

Error in Predictions: Standard Error of Estimate, Defining the Amount of Prediction Error

‣ Conceptual(Y-Y') and "Residuals(leftovers)-> how much error is still there" - The deviation, or distance, between each observed score (YO) from its predicted value on the regression line - Mathematical measure symbolized as:SDy-y' or SDy' - Called: "Standard deviation of prediction errors in linear regression" or simply standard error of estimate or error of prediction

Proportion of Unexplained Variability

- (1-r2)=.50 (Proportion of Unexplained Variability) - Approximately 50% of the P's behaviour (number of hours watching TV) has nothing to do with the P's POODTSE score. There are other, unrelated, unknown reasons for why each P watches TV. ‣ A proportion (0.50) of their TV watching behaviour is unexplained. ‣ (1-r2) provides information about error in prediction ‣ Thus, it is used in an alternate formula to compute standard error of estimate because both forms are a measure of unexplained error!! - This is why we can use as a measure of unexplained error.

Regression: Measuring Prediction in Y'

- Because of the correlation between X and Y, we can measure - How much better we can predict, and... - How much error in prediction still exists ‣ Related to the model: XO=XT + XE Data= Model + Error - what we observe= what we wanna explain + difference we cannot explain

From a Scatterplot we can discern the following information

- Direction: (+/-) - Shape: Linear(straight line) or Curvilinear (non-linear) - Strength: Weak - Moderate - Strong (a subjective measure)

Correlation Coefficient & Scales of Measurement If scale X variable interval/ratio and scale Y variable nominal, ordinal and interval/ratio use.....?

- If scale X variable is interval/ratio and scale Y variable is nominal use: Point Biserial - If scale X variable is interval/ratio and scale Y variable is ordinal use: spearman rho - If scale X variable is interval/ratio and scale Y variable is interval/ratio use: Pearson rxy

Correlation Coefficient & Scales of Measurement If scale X variable ordinal and scale Y variable nominal, ordinal and interval/ratio use.....?

- If scale X variable is ordinal and scale Y variable is nominal use: Spearman rho - If scale X variable is ordinal and scale Y variable is ordinal use: Spearman rho - If scale X variable is ordinal and scale Y variable is Interval/ratio use: Spearman rho

Correlation Coefficient & Scales of Measurement If scale X variable nominal and scale Y variable nominal, ordinal and interval/ratio use.....?

- If scale X variable nominal and scale Y variable nominal then use: Phi - if scale X variable nominal and scale Y variable ordinal use: Spearman rho - If scale X variable nominal and scale Y variable Interval/ratio use: Point biserial

Bivariate Data steps and what graph do you use

- Involves two variables (X & Y) - Graph: Scatterplot or Scatter Diagram of pairs of scores or values ‣ Steps for Computing Descriptive Statistics with Bivariate Data - 1st: Plot Data! - 2nd: Apply descriptive statistics (typically Mean and Standard Deviation) to EACH variable individually. - 3rd: Apply descriptive statistics to both variables together.

From a Correlation Coefficient

- Magnitude: Size (an objective measure based on computed r value) - Direction: (+/- ) - Shape: only if strong linear relationship.

But... is there a "SD of Regression"?

- NO! • We know how many observed Y scores there are (N), but how many Y' values are there? - A regression line contains infinitely many points, so there is no finite N and no way to compute an SD of a regression line

To describe a bivariate relationship, we focus on four different terms

- Shape - Direction - Strength - Magnitude

Regression

- Regression is all about prediction. - What does knowing something about X tell us about Y? - If I know someone's height, what can I predict about their weight? ‣ To understand regression, we start with a scatterplot and assume... - X & Y are both score data - The relationship between X and Y is linear - The Pearson r describes the relationship best ‣ For now, assume that X is our predictor (IV), & Y is our criterion (DV) - we can't use phi, point biserial, or Spearman's rho

- What is the effect on by (the slope) when SDy is greater than SDx? - What is the effect on by (the slope) when SDx is greater than SDy?

- As r gets closer to 0 the line flattens out; when r = 0 the slope = 0 (a flat line) - If SDx is greater than SDy the slope will be shallow (e.g., X scores spread over 0-100 while Y scores spread over 0-10) - If SDy is greater than SDx the slope will be steep, closer to vertical (e.g., Y scores spread over 0-100 while X scores spread over 0-10)

#5 — Heterogeneous Samples

- By combining 2 heterogeneous samples (for example, boys and girls together) you are mixing two subgroups, which could make it look like there is no correlation even though there is one within each subgroup - The reverse can also happen: each subgroup could have a weak correlation on its own, but combining them produces a strong correlation

Regression Lines: 5 Points

1. It represents the mean for bivariate data 2. Need only 3 points to plot both regression lines: To plot Y', use values for (0, ay) and (X mean, Y mean); to plot X', use values for (ax, 0) and (X mean, Y mean) 3. The regression lines intersect at the mean for X and Y 4. If the angle between the two regression lines is small, then the correlation between the two variables is high - if the correlation is r = +/- 1.00, then the regression lines overlap (perfect correlation) 5. As the value for r approaches 0, the angle between Y' & X' increases - if the correlation is 0.00, then the regression lines are at a 90° angle

Factors that Contribute to a Large r

1. Relationship is REAL and strong 2. Sampling Error: due to chance/randomness 3. Unmeasured Third Variable: something else is happening 4. Heterogeneous Sample: we've got subgroups - Impact of sub-groups 5. Sampling from a Restricted (truncated) range: measuring only part of the range rather than the full data gives a different correlation - Sampling from one part of a curvilinear relationship may produce a linear relationship even though the full relationship is a curve!

Factors that Contribute to a Small r

1. Relationship is REAL, but weak. 2. Sampling Error: randomness, due to chance 3. Unmeasured Third Variable. * 4. Non-Linearity: Relationship is curvilinear, correlations only detect linear relationships (Pearson r underestimates this relationship) 5. Heterogeneous samples: Impact of sub-groups 6. Sampling from a Restricted (truncated) range - Sampling a restricted part of a linear relationship results in a seemingly random pattern for the data because of low co-variability * 7. Heteroscedasticity in the data *= unique to small r problem

Regression Line: 9 Attributes

1. The regression line represents... - Bivariate data in a linear relationship - Predicted scores based on observed data 2. Defined by linear equation for a straight line - Theoretical equation: Y = a + bX - Prediction equation: Y' = ay + by(X) or X' = ax + bx(Y) 3. Two regression lines represent bivariate data - Y' (given X) and X' (given Y) 4. Does not predict values outside the range of data (cannot extrapolate) 5. Is the... - Best descriptor of bivariate data - "2-Dimensional Centre" for bivariate data - Mean of bivariate data 6. Reflects the Method (or Criterion) of Least Squares - Sum(Y-Y')² = a minimum - Gives the smallest value for error in prediction when based on a mathematically defined line for Y' 7. Always has some "error of prediction" present (unless r = +/- 1.00) - the closer the two regression lines are to each other, the higher the predictive power - Measured as "Standard Error of Estimate" - SDy-y' = SDy √(1 - r²) 8. Is a "Traveling Normal Distribution with a Moving Mean" 9. Allows separate measures of SSy, SSy'-ymean, SSy-y' - Total Variability: SSy = Sum(Y-Ymean)² - Explained Variability: SSy'-ymean = Sum(Y'-Ymean)² - Unexplained Variability: SSy-y' = Sum(Y-Y')² - SSy = SSy'-ymean + SSy-y'

Understanding Correlations (steps)

1. Plot the data in a scatterplot 2. Compute univariate statistics for both variables (X and Y): mean, median, mode, deviation scores (score minus the mean or median), AD (deviations from the mean always sum to 0) or MAD, and SD = √(SS/N) 3. You can draw quadrants using the mean on the Y axis and the mean on the X axis to help you see the direction of the relationship 4. Compute Bivariate Statistics: In univariate statistics, we computed how much each score deviated from its mean (the deviation score). ‣ In bivariate statistics, we compute the relationship between deviation scores to determine if scores deviate in the same or opposite direction. We compute this for each pair as (X - X mean) × (Y - Y mean)

Proportion of Explained Variability

Applied to our data set 30.0000/30.0000= 15.0429/30.0000 + 14.9585/30.0000 1= 0.5014 + 0.4986 - r2= .50 (proportion of explained variability); alternatively we can represent this as r2(100)= 50% ‣ Because of the strong association between a P's POODTSE score and number of hours P's watch TV talk shows (r=.71 ), we can explain 50% of this TV watching behaviour. ‣ r2 as a measure is easily understood and widely used... BUT...

Components of regression

Components of Regression ‣ For Regression (Conceptually) XO=XT + XE (Y-Ymean)= (Y'-Ymean) + (Y-Y') Total variability= explain variability + unexplained variability - total variability for a single variable= (Y-Ymean) - explain variability (Y'-Ymean) - unexplained variability (Y-Y') -> what still cannot be accounted for ‣ Numerically: 3 units of total variability in Y, made up of 2.38 units of explained variability through prediction, & 0.62 units of unexplained variability Y′

Computing Measures of Proportion of Variability

New terms: r² = proportion of Explained Variability; (1 - r²) = proportion of Unexplained Variability - SST/SST = SSR/SST + SSE/SST - Sum(Y-Ymean)²/Sum(Y-Ymean)² = Sum(Y'-Ymean)²/Sum(Y-Ymean)² + Sum(Y-Y')²/Sum(Y-Ymean)² - 1 = r² + (1 - r²) - Total proportion = Explained Proportion + Unexplained Proportion
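
As a worked check, the three SS values quoted for the TV-watching data set in these notes (SST = 30.0000, SSR = 15.0429, SSE = 14.9585) can be turned into proportions. The variable names are illustrative only. The quoted SSR and SSE sum to 30.0014 rather than exactly 30, so the two proportions add to 1 only after rounding.

```python
from math import sqrt

# Sum-of-squares values quoted in the notes
sst = 30.0000   # total variability, Sum(Y - Ymean)^2
ssr = 15.0429   # explained variability, Sum(Y' - Ymean)^2
sse = 14.9585   # unexplained variability, Sum(Y - Y')^2

r_squared = ssr / sst          # proportion of explained variability
unexplained = sse / sst        # proportion of unexplained variability, (1 - r^2)

print(f"r^2       = {r_squared:.4f}")
print(f"1 - r^2   = {unexplained:.4f}")
print(f"|r|       = {sqrt(r_squared):.2f}")
print(f"explained = {r_squared * 100:.0f}%")
```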

Intercept (a)

Raw Score Regression Coefficient (a) ‣ The intercept is also known as the constant in the equation. - We know that the line goes through the mean of X and Y - Count backwards along the slope and bring X all the way back to 0 ‣ It adjusts the placement of the line along the axis relative to the slope (b) and the mean of each variable. ‣ When using X to predict Y (computing intercept for Y') - ay is the point where the regression line intercepts the Y-axis: ay = (Y mean) - by(X mean) ‣ When using Y to predict X (computing intercept for X') - ax is the point where the regression line intercepts the X-axis: ax = (X mean) - bx(Y mean)

Slope (b)

Raw Score Regression Coefficient (b) ‣ A constant value: Y' changes proportionally across values of X ‣ For any given pair of predicted values... b = ΔY/ΔX = rise/run - rise measures change in Y and run measures change in X - b is constant and stays in our equation; only the X values we attach to it change ‣ Influenced by amount of variability in data - Angle of slope is adjusted by degree of variability in data ‣ Values (when computed from raw scores) - Can be positive or negative, matching the direction of the relationship (the sign of r) - Magnitude reflects values in data set - Heteroscedasticity in data "flattens" angle of the slope

Three Measures of Variability

Start with -> Sum(Y-Ymean)2= Sum(Y'-Ymean)2 + Sum(Y-Y')2 - SST (SSy) becomes SDy (the standard deviation of Y) - SSE (SSy-y') becomes SDy-y' (the standard error of prediction for Y') -When r=0, then SDy-y'= SDy, when r= +/- 1, Then SDy-y'=0 - already know how to compute SDy and SDy-y'
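
The standard error of estimate can be checked two ways: directly from the residuals, and with the shortcut SDy-y' = SDy √(1 - r²). The five score pairs below are made up for illustration (chosen so that the means are 3 and 7 and r is about -.64); they are not the original class data.

```python
from math import sqrt

# Hypothetical scores
x = [1, 3, 3, 4, 4]
y = [10, 8, 4, 7, 6]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

sd_x, sd_y = sqrt(ss_x / n), sqrt(ss_y / n)
r = (sp / n) / (sd_x * sd_y)

# Regression line Y' = ay + by(X)
b_y = r * sd_y / sd_x
a_y = mean_y - b_y * mean_x

# Direct route: SD of the prediction errors (Y - Y')
residuals = [yi - (a_y + b_y * xi) for xi, yi in zip(x, y)]
see_direct = sqrt(sum(e ** 2 for e in residuals) / n)

# Shortcut route: SDy * sqrt(1 - r^2)
see_shortcut = sd_y * sqrt(1 - r ** 2)

print(f"SDy     = {sd_y:.4f}")
print(f"SDy-y'  = {see_direct:.4f} (from residuals)")
print(f"SDy-y'  = {see_shortcut:.4f} (from SDy and r)")
```

Because r is not 0 here, SDy-y' comes out smaller than SDy, which is the improvement in prediction the card describes.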

Steps to Defining the Regression Line

Steps to Defining the Regression Line ‣ For defining Y' = ay + by(X): ay = intercept (where the line crosses the Y-axis), by = slope 1. Plot your data! Compute your mean and SD for each variable 2. Compute Pearson r (based on SD and covariance) 3. Compute the slope (b): by = r(SDy/SDx) 4. Compute the intercept (a): ay = (Y mean) - by(X mean) 5. Use values ay and by and a given value for X to find the Y' value 6. Use two sets of points to plot the regression line, Y' - For defining X' = ax + bx(Y), repeat steps 3-6 with X and Y swapped
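
Steps 1 to 6 can be sketched as follows (step 1's plot is left out). The five score pairs are made up for illustration, and the given value X = 2 is an arbitrary choice inside the observed range.

```python
from math import sqrt

# Hypothetical scores
x = [1, 3, 3, 4, 4]
y = [10, 8, 4, 7, 6]
n = len(x)

# Step 1: mean and SD for each variable
mean_x, mean_y = sum(x) / n, sum(y) / n
sd_x = sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
sd_y = sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)

# Step 2: Pearson r from the covariance and the SDs
cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
r = cov / (sd_x * sd_y)

# Step 3: slope
b_y = r * sd_y / sd_x

# Step 4: intercept
a_y = mean_y - b_y * mean_x

# Step 5: predicted Y' for a given X (must lie within the observed range of X)
given_x = 2
y_pred = a_y + b_y * given_x

# Step 6: two points that place the line on the scatterplot
point_intercept = (0, a_y)
point_means = (mean_x, mean_y)

print(f"Y' = {a_y:.2f} + ({b_y:.2f})(X)")
print(f"Y' for X = {given_x}: {y_pred:.2f}")
print("Plot through", point_intercept, "and", point_means)
```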

Covariance

The mean value of the product of the deviations of two variables from their respective means. - if the covariance is 0 then there is no correlation between two variables

Defining Relationships

The regression line reflects a mathematical, linear relationship between two variables. - Linear: Y = a + bX, where a = intercept and b = slope - A quadratic term produces a curve instead of a straight line

Formulae for Slope

We can't use pairs of points from the data set because of the variability in scores (we could, however, if r = +/- 1.00) - if r = 1.00 the correlation is perfect: every point falls on the line, so any two data points give the slope, which equals SDy/SDx (it equals 1.00 only when SDx = SDy) - if the correlation = 0 then the line is flat (slope = 0) ‣ We can, however, use a measure of the change in scores (SD) ‣ When using X to predict Y (computing slope for Y'): by = r(SDy/SDx) ‣ When using Y to predict X (computing slope for X'): bx = r(SDx/SDy)
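
A small sketch of the two slope formulas, using made-up scores and a hypothetical helper `slopes`. It also checks two consequences of the formulas: the product by × bx equals r², and with a perfect correlation the slope is SDy/SDx rather than automatically 1.

```python
from math import sqrt


def slopes(x, y):
    """Return (r, by, bx) using by = r(SDy/SDx) and bx = r(SDx/SDy)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x = sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    sd_y = sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    r = cov / (sd_x * sd_y)
    return r, r * sd_y / sd_x, r * sd_x / sd_y


# Hypothetical scores with an imperfect, negative relationship
r, b_y, b_x = slopes([1, 3, 3, 4, 4], [10, 8, 4, 7, 6])
print(f"r = {r:.2f}, by = {b_y:.2f}, bx = {b_x:.2f}")
print(f"by * bx = {b_y * b_x:.4f}, r^2 = {r ** 2:.4f}")

# Perfect correlation (Y = 3X): the slope is SDy/SDx = 3, not 1
r_perfect, slope_perfect, _ = slopes([1, 2, 3], [3, 6, 9])
print(f"r = {r_perfect:.2f}, by = {slope_perfect:.2f}")
```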

The regression line is always gonna go through what

the mean of x and the mean of y

Correlations and Test Reliability

‣ For test reliability (e.g., Test/Retest; inter-rater; internal consistency), we need much higher values for r. - The two sets of scores are observing the same thing - A score like -.70 would mean we are measuring opposites (e.g., depression and happiness), so we want positive correlations - Very Desirable: r = .89, .93, .96 - Poor Reliability: r < .70 ‣ Technically, we would not want negative correlations in this case! If the test and retest are taken at the beginning and at the end of the semester, it can be assumed that the intervening lessons will have improved the ability of the students (positive skew)

Computing a Universal Measurement of covariance

‣ Fortunately, we already have a measure that accounts for both the scale magnitude and the units of the scale. ‣ The standard deviation is in meaningful units, and is in the proper scale; e.g., the SD for GST is 1.10 units on the GST scale, and the SD for SSS is 2.00 units on the SSS scale ‣ Thus, to turn our measure of covariability into a UNIVERSAL measure, we can divide it by the product of the standard deviations. ‣ This is our Pearson r r= Cov/ SDx SDy - then its no longer in units - standardized constrained units between -1 and 1 - transformed to universal scale, standard interpretation

#7 — Heteroscedasticity + Homoscedasticity

‣ Homoscedasticity (A Good Thing) - Variability of Y scores remains constant across changing values of X (as we go from one X score to another); example: r = +.93 ‣ Heteroscedasticity (Not a Good Thing) - Variability of Y scores changes across increasing values of X - This increases SD, and when SD increases r goes down - Caused by a skew in one or both variables - The spread changes across the variable: e.g., as X goes from 1 to 5 the Y scores become more and more spread out - High value for SD in the variable with skew; example: r = +.67

Relationship Between r, a, and b

‣ If r (correlation) = 0 for Y' (predicting Y from X), by (slope) = 0 - regression line is parallel to X-axis, ay (Y-intercept) = Y mean ‣ If r (correlation) = 0 for X' (predicting X from Y), bx (slope) = 0 - regression line is parallel to Y-axis, ax (X-intercept) = X mean - if the correlation is 0 the lines go straight across the graph through the means ‣ As the correlation (r) increases, the value for b (slope) increases - If b is +, a moves below the mean and towards the origin (0, 0) - If b is -, a moves above the mean and away from the origin.

Properties of Covariance

‣ If the Cross Products (Covariance) vary consistently in the same direction across Ps, then the average amount of covariability is positive (+ve). ‣ If some pairs vary in the same direction, but some vary in the opposite direction, then the value for covariance is close to zero (0). ‣ If the Cross Products vary consistently in the opposite direction across Ps, then the average amount of covariability is negative (-ve) ‣ The magnitude (size) of covariance depends on the units of measurement, so it is difficult to compare across studies ‣ Furthermore, covariance is not in any meaningful units as it is based on the product of two different scales (such as GST and SSS) - We need to remove the units for universal meaning & interpretation - anything correlated with itself will always equal 1

Sample Size and Size of r

‣ Incorrect Assumption - The larger the sample size, the larger the value of r ‣ FACT: Size of N (or sample size n) is irrelevant to the magnitude of r; there is no relationship between correlation and sample size - Sample size just tells us how close our estimate is to the true population value - Example: (for grades in Ψ300A & Ψ300B) • Randomly sample n=10 grades & compute r; repeat 4 times • Randomly sample n=20 grades & compute r; repeat 4 times - A bigger n only widens the range of observed scores because more scores are sampled

Converting to Ranks

‣ Is this a good thing to do? - It reduces the skew in the distribution (which is good) but it is a problem because data that are ranks are less sophisticated. - Ranks only tell you the order in which Ps score, but do not tell you the magnitude of difference among the values. - Tied scores share the mean of the ranks they would occupy (e.g., two scores tied for ranks 3 and 4 each receive 3.5)

Pearson Product-Moment Correlation Coefficient

‣ Known as the "Pearson r" ‣ It is symbolized as rxy or more commonly as r ‣ Application - Bivariate data - Linear relationship between variables - Both variables measured on Interval or Ratio scales • The Pearson r only applies to score data-> rank would be like nominal ex. how many people ranked cats as their fave

What Are We Doing in Regression?

‣ Mathematically defining a line that best represents bivariate data in a linear relationship. ‣ This line is called the Regression Line (Y' or X') ‣ We define this line by adapting a theoretical equation that defines a linear relationship: Y' = ay + by(X) - How do we use this equation? - Define the two regression coefficients (a, b), then with a given value for X, solve the equation for Y' - Identify two pairs of data points that allow accurate placement of the regression line on the scatterplot.

Predicting Y Given X

‣ Objective - Apply equation to define the linear relationship • Must compute two regression coefficients (slope and intercept) - Y' is a predictor - Define a line that best describes bivariate data: the 'best-fitting line' or regression line • Sum(Y-Y')² is a minimum: no other line gives a smaller value, so it minimizes errors of prediction (deviations around the regression line) - Note: We can only predict scores within our observed range of data; we don't know what the data do outside that range • Interpolation (predicting between our observed minimum and maximum), not Extrapolation (guessing beyond them)

How to write a Formal Report

‣ One way of reporting... For the five 18-year-old participants, there was a strong, negative relationship, r = -.64, between a student's Galton Studmuffin Test (GST) score (X mean = 3.00, SD = 1.10) and scores on the Studmuffin Success Scale (SSS) (Y mean = 7.00, SD = 2.00). As scores on the GST increased, scores on the SSS decreased.

#4 — Non-Linearity

‣ Pearson r measures linear relationships ‣ Pearson r underestimates the true relationship between variables when data form a curvilinear relationship ‣ For a curvilinear relationship, use eta (η) -> applies to moderate to strong curvilinear relationships: η = rlinear + rquadratic + rcubic; e.g., .78 = .06 + .71 + .01 ‣ For the data to the right, r = +.12, which is a weak correlation, but obviously there is a fairly strong pattern to the data
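
A tiny illustration of how badly Pearson r can miss a curve. The scores are made up: Y is exactly X squared, so the relationship is perfect but U-shaped, and the linear correlation comes out as zero.

```python
from math import sqrt

# Hypothetical scores: a perfect U-shaped (quadratic) relationship
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]   # 4, 1, 0, 1, 4

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)

# Positive and negative cross products cancel, so the linear r is zero
r = sp / sqrt(ss_x * ss_y)
print(f"Pearson r = {r:.2f}")
```

This is why the notes insist on plotting the data first: the scatterplot shows the pattern that the single number hides.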

The Problems with Indices

‣ SSR is a squared value (about 15.04 units) of explained variability ‣ Problem!!! - Indices are not measures - Indices give no useful information ‣ Solution!! - Convert each SS term into a meaningful measure.

Other Applications of Pearson Correlation

‣ So far, we have only considered correlations for score (interval/ratio) data ‣ What about the other scales of measurement (nominal, ordinal)? - What happens if X is interval, and Y is ordinal? - What happens if both X and Y are nominal? ‣ Same formula applied, but unique name given to the correlation ‣ The symbol for r is adapted to reflect type of data ‣ Four most frequently reported correlations: Pearson r Point Biserial Spearman rho Phi

Variability Around the Regression Line Homoscedasticity & Heteroscedasticity in Regression

‣ Standard Error of Estimate - Provides objective support to subjective visual inspection of the data - When homoscedasticity is present... • Value for SDy is larger than the value for SDy-y' ‣ When heteroscedasticity is present... - Value for SDy is about the same as SDy-y' [what happens to r?] Our correlation is affected: r drops because the SD is inflated - There are options to fix heteroscedasticity (such as ranking data), but these limit our option to perform regression: if we fix the heteroscedasticity, we lose regression - r = Cov/(SDx SDy) -> the denominator becomes larger, so r and the slope decrease; the line flattens towards the mean of Y and loses predictive power

Sum of Products (SP) and Covariance (Cov)

‣ The Sum of Products is computed as SP= sum of (x minus x mean) times( y minus y mean) - The SP is an index of covariability. • "What was the pattern for how scores deviated (covaried) from each mean" • Did each Person deviate in the SAME direction or the OPPOSITE direction? ‣ The Covariance is simply computed as SP/N - The Cov is a measure of covariability • "what is the average extent for which scores on two variables correspondingly vary or COVARY from their respective means across the entire group of scores."

The Pearson r

‣ The numerator of the Pearson r measures the degree that two variables (X & Y) COVARY. ‣ The denominator adjusts this covariability by the amount of variability in each variable (SDx & SDy) ‣ The Pearson r transforms covariance to a universal scale - It gives information about the relationship between the variables regardless of the units of measure for each variable. r = Cov/(SDx × SDy) = -1.4000/(1.0954 × 2.0000) = -0.6390 = -.64
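
The chain SP -> Cov -> r can be traced in a few lines. The five GST/SSS score pairs below are made up for illustration: they were chosen so that the covariance and SDs match the values used in the computation above, but they are not the original data.

```python
from math import sqrt

# Hypothetical scores (GST = X, SSS = Y)
gst = [1, 3, 3, 4, 4]
sss = [10, 8, 4, 7, 6]
n = len(gst)

mean_x, mean_y = sum(gst) / n, sum(sss) / n

# Sum of Products: an index of covariability
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(gst, sss))

# Covariance: the average cross product, SP / N
cov = sp / n

# Standard deviations (population form, dividing by N)
sd_x = sqrt(sum((x - mean_x) ** 2 for x in gst) / n)
sd_y = sqrt(sum((y - mean_y) ** 2 for y in sss) / n)

# Pearson r: covariance divided by the product of the SDs
r = cov / (sd_x * sd_y)

print(f"SP  = {sp:.4f}")
print(f"Cov = {cov:.4f}")
print(f"SDx = {sd_x:.4f}, SDy = {sd_y:.4f}")
print(f"r   = {r:.4f}")
```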

The Spearman Sum of Differences Formula

‣ The original formulation for the Spearman rho is as follows: rs = 1 - 6(ΣD²)/[N(N² - 1)], where D is the difference between the two ranks for each pair. ‣ This formula, however, only works if the original data are integers (ranks) and there are no tied ranks in the data. That is, every data point is unique. - Otherwise it is not precise; the formula is now very uncommon ‣ Consequently, it is advisable to use the Pearson formula when computing Spearman rho as it is more accurate. - If there are no tied ranks, and the data are integers, then both formulae give the exact same answer.
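
The claim that the two formulae agree without ties, and drift apart with ties, can be checked on small made-up rank sets. The function names are illustrative only.

```python
from math import sqrt


def rho_sum_of_differences(rank_x, rank_y):
    """Spearman rho from the sum-of-differences formula."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


def rho_pearson_on_ranks(rank_x, rank_y):
    """Spearman rho as the Pearson formula applied to the ranks."""
    n = len(rank_x)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    ss_x = sum((a - mean_x) ** 2 for a in rank_x)
    ss_y = sum((b - mean_y) ** 2 for b in rank_y)
    return sp / sqrt(ss_x * ss_y)


# Hypothetical ranks with no ties: both formulae agree
no_ties_x, no_ties_y = [1, 2, 3, 4, 5], [2, 1, 4, 3, 5]
print(rho_sum_of_differences(no_ties_x, no_ties_y),
      rho_pearson_on_ranks(no_ties_x, no_ties_y))

# Hypothetical ranks with one tie (2.5, 2.5): the answers differ slightly
tied_x, tied_y = [1, 2.5, 2.5, 4, 5], [1, 2, 3, 4, 5]
print(rho_sum_of_differences(tied_x, tied_y),
      rho_pearson_on_ranks(tied_x, tied_y))
```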

Regression Line as a Minimum

‣ The value for Sum(Y-Y')², computed from a mathematically defined regression line, is smaller than a value computed from any other line. ‣ Called "Least Squares Regression Line" or "Best Fitting Line" because it minimizes the errors associated with prediction. - No other line produces a least-squares minimum ‣ We will define the regression line by "solving" (finding the values) for two regression coefficients, the intercept (a), and the slope (b). - Red line = Y' (the mathematically defined line); it is a minimum, and NO other line you can define will give a smaller value, thus it is our best predictor - Figure out its intercept and slope

Step 6: Plot the Regression Line

‣ To plot the regression line, we need two points. Fortunately, we already have these points. ‣ The first point we have is the intercept. - For Y', it is the Y-intercept, (0, ay) - For X', it is the X-intercept, (ax, 0) ‣ The second point we have is the mean of X and Y ‣ The regression lines for Y' and X' will ALWAYS pass through the means of X and Y, that is, (X mean, Y mean) ‣ Remember, the regression lines are limited to the range of our data. - If the correlation is perfect then X will perfectly predict Y and Y will perfectly predict X

Variability in Regression — Sum of Squares

‣ Variability is expressed mathematically as SS (indices of variability). SST = SSR + SSE: Sum(Y-Ymean)² = Sum(Y'-Ymean)² + Sum(Y-Y')² - Total variability: SST or SSy = Sum(Y-Ymean)² - Explained variability: SSR or SSy'-ymean = Sum(Y'-Ymean)² - Unexplained variability: SSE or SSy-y' = Sum(Y-Y')² - Remember it is variability or variation, NOT variance! - SS are indices, not measures
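
The partition SST = SSR + SSE can be verified numerically. The scores below are made up for illustration, and the regression line is fitted with the slope and intercept formulas used elsewhere in these notes (written here in the equivalent form by = SP/SSx).

```python
# Hypothetical scores
x = [1, 3, 3, 4, 4]
y = [10, 8, 4, 7, 6]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
ss_x = sum((xi - mean_x) ** 2 for xi in x)
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least-squares line: by = SP / SSx is the same value as r(SDy/SDx)
b_y = sp / ss_x
a_y = mean_y - b_y * mean_x
y_pred = [a_y + b_y * xi for xi in x]

sst = sum((yi - mean_y) ** 2 for yi in y)                 # total
ssr = sum((yp - mean_y) ** 2 for yp in y_pred)            # explained
sse = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))    # unexplained

print(f"SST = {sst:.4f}")
print(f"SSR = {ssr:.4f}")
print(f"SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
```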

Sample Size Summary

‣ When computing descriptive measures on a data set, size of N does not directly influence the magnitude of Pearson r (a measure of the strength of the linear relationship between two variables) ‣ When using a sample of data from a population of values and applying descriptive measures, the size of sample, n, does not directly influence the magnitude of Pearson r, as long as the sample was randomly sampled from the population. ‣ Only in hypothesis testing, a test of how meaningful the value of Pearson r is, will the size of N be important. ‣ When is the size of N directly relevant to r? - In hypothesis testing of sample r versus population ρ

Measuring Prediction in Y' when r ≠ 0

‣ When r ≠ 0 (i.e., r < 0 or r > 0) - Values on X are useful for predicting Y - Best predictor of Y: the regression line Y' (i.e., Y' ≠ Y mean) - Different scores on X predict unique Y-values. - We still make errors when predicting (unless r = +/- 1) - Prediction is more accurate because we use data from two sources (X & Y) - SDy-y' will be smaller than SDy - Measure of prediction error: SDy-y' = SDy √(1 - r²)

Measuring Prediction in Y' when r=0

‣ When r=0 - Values on X not useful for predicting Y.-> y'=y(mean)= all error - x is NOT a predictor when r=0-> there is no correlation at all between the two variables - Best predictor of Y: mean of the Y variable — i.e., Y'=Y(mean) thus Large error possible in prediction - Different scores on X predict the same Y-values - Measure of prediction error: SDy-y'=SDy

#6 — Truncated Range

‣ When we restrict (truncate) a range of data, we reduce the measure of covariance because of the low co-variability among the scores ‣ For example, let's assume that the true correlation between SAT Scores and IQ is r = + .93 ‣ If we restrict our range such that we only look at IQ scores greater than 130, then our correlation between SAT Scores and IQ is only: r = + .15 - If we look at the scatterplot to the right, the data look almost random
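
The effect of truncation can be seen in a small simulation. Everything here is an assumption made for illustration: the scores are randomly generated (IQ as a normal variable with mean 100 and SD 15, and a made-up SAT-like score built from IQ plus noise so that the full-range correlation is high), so the printed values will not reproduce the r = +.93 and r = +.15 quoted on the card.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson r = SP / sqrt(SSx * SSy)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


random.seed(1)  # fixed seed so the simulated data are repeatable

# Simulated (hypothetical) population of 5000 people
iq = [random.gauss(100, 15) for _ in range(5000)]
sat = [500 + 5 * q + random.gauss(0, 30) for q in iq]

r_full = pearson_r(iq, sat)

# Truncate the range: keep only people with IQ above 130
high = [(q, s) for q, s in zip(iq, sat) if q > 130]
r_high = pearson_r([q for q, _ in high], [s for _, s in high])

print(f"Full range:     n = {len(iq)}, r = {r_full:.2f}")
print(f"IQ > 130 only:  n = {len(high)}, r = {r_high:.2f}")
```

The relationship between the two simulated variables is the same in both cases; only the range of X being looked at has changed.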

Predicted Variability

‣ r: A standardized measure of association between X and Y that varies between -1 and +1 ‣ r²: The proportion of variability in Y accounted for by variability in X - It is a ratio (proportion) that varies between 0 and +1, so we can make comparisons between studies ‣ r²(100): Percent of variability in Y accounted for by variability in X - Often, the (100) is dropped, and a percent symbol (%) used instead. ‣ APA Journals - reported as percentage: r² = 98% - reported as proportion: r² = .98

