211 Test 4


Correlation coefficient example - 3 people: Person K, Zx = 0.5, Zy = -0.7; Person L, Zx = -1.4, Zy = -0.8; Person M, Zx = 0.9, Zy = 1.5

r = SIGMA (Zx)(Zy) / N 1. Get our cross-products. (0.5)(-0.7), (-1.4)(-0.8), (0.9)(1.5) 2. Add those cross-products up. 3. Divide by number of participants we have. -We see the lowest X went with the lowest Y and the highest X went with the highest Y. Middles went together. We should see some correlation. r = 0.71 Pretty high, approaching the +1 area. If you were to graph the data, we would see the dots going together a little bit.
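A minimal Python sketch of this calculation (numpy is assumed to be available; the z-scores are the three people's scores from this card):

import numpy as np

# Z-scores for Person K, L, and M on variables X and Y
zx = np.array([0.5, -1.4, 0.9])
zy = np.array([-0.7, -0.8, 1.5])

cross_products = zx * zy                 # (0.5)(-0.7), (-1.4)(-0.8), (0.9)(1.5)
r = cross_products.sum() / len(zx)       # r = SIGMA (Zx)(Zy) / N
print(round(r, 2))                       # 0.71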

Prediction/regression

- How psychologists construct mathematical rules to predict people's scores on one variable from knowledge of their score on another variable -Example: Predicting college grades from SAT scores. Universities use standardized testing like SAT to predict how a student is going to perform in college. -You can use the information provided by r to predict values of one factor, given known values of a second factor -Prediction is also called regression [used interchangeably].

Effect size for the chi-square test for independence: phi coefficient (φ)

-Used for a 2x2 contingency table -Square root of the sample's chi-square divided by the total number of people in the sample: φ = square root of (X^2 / N) -Cohen's conventions: small φ = 0.10, medium φ = 0.30, large φ = 0.50
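A minimal Python sketch of the phi calculation (the chi-square of 11.73 and N of 100 are borrowed from the study-guide item later in these notes):

import math

def phi_coefficient(chi_square, n):
    # phi = square root of (X^2 / N), for a 2x2 contingency table
    return math.sqrt(chi_square / n)

print(round(phi_coefficient(11.73, 100), 2))   # 0.34, a medium-to-large effect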

The Yerkes-Dodson Law

-A common way we see a curvilinear correlation; a real-life example of a curvilinear correlation. -Looks at a person's level of arousal and their performance on a task. -Can extend this when thinking about a person's level of stress and their level of performance as well. -Example: In a stats exam... -If you did not care a whole lot, did not pay much attention, and were not interested in the subject, your likelihood of doing well on the exam is low (low arousal goes with low performance). -If you did care and you wanted a good grade, you were more intentional and more interested in your studies, and you will likely have a higher performance rating (moderate/medium arousal goes with peak performance). -HOWEVER, the more and more stressed you get, the greater the potential that your performance will be impaired. -When a person has lower stress or lower arousal, their performance is typically weaker (not paying much attention, not interested); as they become more stressed/aroused, they reach their optimal level of performance (attention, interest, and motivation for performing well have all increased). However, once we get too aroused or too stressed about performance, we start to perform more poorly because of the experience of anxiety.

Interpreting a correlation and correlation recap and regression lines

-A correlation is strong and positive if highs on one variable go with highs on the other, and lows with lows -A correlation is strong and negative if lows go with highs, and highs with lows -There is no correlation if sometimes highs go with highs and sometimes with lows -Correlation coefficient describes the strength and direction of the relationship between 2 factors -Correlation coefficient (r) ranges from -1 to +1 -Closer to +/- 1, the stronger the correlation; the sign (+ or -) indicates the slope -Regression line: the best-fitting straight line for a set of data points. -Minimizes the distance all the data points fall from it. Can be drawn for + and - correlations. -Example image: the dots fall some average distance from the regression line. Graph 1: less consistent data due to the wide variability of the scatter. Graph 2: more consistent data, so it is easier for the line to go straight through the data. Graphs 1 and 2: positive correlations (positive r). Graphs 3 and 4: negative correlations (negative r)

Correlation and causality continued, reverse causality, confounding variables

-A significant correlation DOES NOT show that one factor causes change in a second factor. -Reverse causality is a problem that arises when the direction of causality between two factors can be in either direction (Mood can influence eating behaviors but we also know that EBs can influence mood or can work together in a cyclical fashion). -A confounding variable or third variable is an unanticipated variable that could be causing changes in one or more measured variables. -Parasympathetic Activity in our PNS could influence both mood and EBs.

Prediction in research articles -r = correlation coefficient -Beta = standardized regression coefficient (used for prediction).

-Bivariate prediction models [prediction of scores on 1 variable based on scores of one other variable] rarely reported -Multiple regression results commonly reported [prediction of scores on a criterion variable from scores on 2 or more variables] -Most of the time if you do some kind of prediction you bump up to multiple regression; here you can see a multiple regression that is predicting Avg. Intra-Group Effect Size at Post-Assessment. -Predictor variables [independent variables]: Beck Depression Inventory, Age, No. of Sessions, Duration of disorder -r = correlation coefficient -Beta = standardized regression coefficient. Shows how the outcome could be predicted from these 4 predictor variables following some type of treatment for agoraphobia.

Multiple correlation... -Multiple correlation coefficient (R) -Squared multiple correlation (R^2) -Effect size for multiple regression (R^2)

-Cannot use the corr. coeff. (r) or the standardized reg. coeff. (beta) -Multiple correlation coefficient (R) = correlation between the criterion variable and ALL the predictor variables taken together -A problem with the multiple correlation: it is typically smaller than all of those individual r's added together, due to the overlap that typically exists among the predictor variables. -For example, there is probably some overlap between number of hours of sleep and SES, depending on a person's potential work requirements, etc. We can't just add them all together, thus we SQUARE the R. -Squared multiple correlation (R^2) = proportionate reduction in error, or proportion of variance accounted for in the criterion variable by all the predictor variables taken together -Allows us to take that R and bring it down to something realistic about what might be happening. -Example: Add the correlations together for each of our variables and get an R [mult. corr. coeff.] of 0.4, then square that R to get 0.16. That 0.16 tells us about 16% of the variance in the criterion variable is predicted by all of the predictor variables we included. -R^2 is the measure of effect size for multiple regression: small = 0.02, medium = 0.13, large = 0.26
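A minimal numpy sketch of getting R and R^2 for a two-predictor regression; the sleep, SES, and mood numbers below are made up for illustration:

import numpy as np

# Hypothetical data: predict happy mood from hours of sleep and SES
sleep = np.array([5., 6., 7., 8., 6., 9.])
ses   = np.array([2., 3., 3., 4., 2., 5.])
mood  = np.array([3., 4.5, 5., 6., 3.5, 7.])

# Least-squares fit of mood on both predictors (plus a constant term)
X = np.column_stack([np.ones_like(sleep), sleep, ses])
coeffs, *_ = np.linalg.lstsq(X, mood, rcond=None)
predicted = X @ coeffs

ss_error = np.sum((mood - predicted) ** 2)
ss_total = np.sum((mood - mood.mean()) ** 2)
r_squared = 1 - ss_error / ss_total      # proportion of variance accounted for (R^2)
multiple_r = np.sqrt(r_squared)          # multiple correlation coefficient (R)
print(round(multiple_r, 2), round(r_squared, 2))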

Effect size for the chi-square test for independence: Cramer's phi (Cramer's φ)

-Contingency tables larger than 2x2 -Cramer's phi = square root of (X^2 / ((N)(DFsmaller))) -Square root of the chi-square divided by (number of participants)(df of the smaller side of the table) -Example: 2x3 contingency table, use the DF for the smaller side [which would be 2 categories, so DF = 2 - 1 = 1, in this example] -Could go up to 3x4, with 3 being our smaller side [DF = 3 - 1 = 2] -DFsmaller = NCategories on the smaller side - 1 -Example image: Table of Cohen's conventions for Cramer's Phi (Cramer's φ) -Smallest side of contingency table: 2, 3, 4 = 2, 3, 4 categories -When you increase the number of categories, you decrease the effect size conventions for the same size of an effect [small, medium, large] -Example: 4 categories [DF = 3] would give you a 0.17 for a medium effect size, whereas 3 categories [DF = 2] gives you a 0.21 for a medium effect size
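A minimal Python sketch of Cramer's phi; the chi-square, N, and table shape here are hypothetical:

import math

def cramers_phi(chi_square, n, smaller_side_categories):
    # Cramer's phi = square root of (X^2 / ((N)(DFsmaller)))
    df_smaller = smaller_side_categories - 1
    return math.sqrt(chi_square / (n * df_smaller))

# Hypothetical 3x4 table with N = 90: smaller side has 3 categories, so DFsmaller = 2
print(round(cramers_phi(12.0, 90, 3), 2))   # about 0.26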

Controversy of regression

-Controversy about how to judge the relative importance of each predictor variable in predicting the dependent variable/criterion variable. -One way to address that is to consider both the r's [corr. coeffs.] and the β's [stand. regression coeffs.]

Correlation and causality

-Correlation DOES NOT equal causality [one variable causing the other] ever. -People with non-straight sexual orientations (LGBTQ+) have higher levels of suicide attempts and completions: a correlation between non-straight identification and suicide attempts. -But a person's orientation does not cause them to have higher suicide rates - correlation does not equal causation. -There are lots of other factors happening that could be related. There could be confounding or third variables that we might not know of yet, for example discrimination, prejudice, and social isolation. Those are what actually lead to being more depressed and having more suicide attempts. -Example image: Correlation between exciting activities and marital satisfaction. Association between more exciting activities and higher marital satisfaction. Low EA goes with lower MS; high EA goes with higher MS. -We will never know the direction, or which one caused the other (chicken or the egg). -Engaging in EA could improve MS. -Could be that people who had higher MS wanted to engage in more EA together. -There could be a confounding variable that we are not even talking about. There could be a relationship between EA and MS, but maybe that is because of low work stress (with lower work stress, people are more likely to engage in EA and have the time and energy to do so; maybe low work stress is also related to higher levels of MS). -Other example: correlation between number of homicides and ice cream sales due to summertime (more people out of their houses; not because ice cream makes people more homicidal).

Correlation: statistical procedure vs. research design

-Correlation as a statistical procedure: calculating r-values and looking at significance of these r-values (correlation coefficients). -Correlation as a research design: any research design other than a true experiment (if we cannot randomly assign participants to different conditions). -Not necessarily statistically analyzed using correlation coefficient (correlation as a research design). -Some studies using experimental research designs WILL BE ANALYZED using a correlation coefficient (r).

Linear correlation

-Describes a situation where the pattern of dots fall roughly in a straight line. You could for the most part draw a straight line through those dots. -Example image: People who slept longer tended to have higher ratings of happy mood. People who slept fewer hours tended to have the lowest ratings of their happy mood. -The more hours someone slept, the more highly correlated that was going to be with having a happy mood.

Chi-Square distributions

-Our chi-square distributions are determined by DF. -DF = NCategories - 1 How we know where to look for our cutoff scores.

Standardized regression coefficient (beta)

-Expressed in standard deviation units -Shows the predicted amount of change, in standard deviation units, of the criterion variable if the value of the predictor variable increases by 1 SD. -Example: Sleep and happiness ratings using a scale of 1-8, while a later example uses a scale of 0-20. -If we want to compare these, we would want to standardize the coefficient. Reporting the regression coefficient as a standardized coefficient is similar to moving raw scores to z-scores: we want it standardized and expressed in standard deviation units. -Example: beta = 0.85; every 1 SD increase in the predictor variable (X) (sleep) corresponds to a 0.85 SD increase in the criterion variable (Y) (happy mood) -Example 2: Measuring aggression, where two different researchers measure aggression using a scale of 1-10 and 1-30. The slopes of their lines will look different because of the different ranges of scores. -Standardize the regression coefficient so we can compare across those studies. -With one predictor variable, the standardized regression coefficient is equal to the correlation coefficient (r). They are NOT THE SAME if there is more than one predictor variable (multiple regression).
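A minimal numpy sketch of standardizing a raw regression coefficient (beta = b multiplied by SDx / SDy); the sleep and mood numbers are made up, and with one predictor the result should match r:

import numpy as np

# Hypothetical raw scores: hours of sleep (X) and happy mood (Y)
sleep = np.array([5., 6., 7., 8., 9.])
mood  = np.array([2., 4., 5., 7., 8.])

# Raw-score regression coefficient: b = SP / SSx
sp   = np.sum((sleep - sleep.mean()) * (mood - mood.mean()))
ss_x = np.sum((sleep - sleep.mean()) ** 2)
b = sp / ss_x

beta = b * (sleep.std() / mood.std())        # standardized regression coefficient
r = np.corrcoef(sleep, mood)[0, 1]           # correlation coefficient
print(round(beta, 2), round(r, 2))           # the two values match with one predictor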

Scatter Diagram

-For graphing correlations. The graph associated with correlations. Steps for making a scattergram: 1. Draw axes and assign variables to them (the vertical axis = y-axis, the horizontal axis = x-axis). -We can assign, for example, the price of an item to the vertical axis and the quality of the item to the horizontal axis, and using both of those axes we'll be able to plot where an item falls according to those variables. 2. Determine the range of values for each variable and mark it on the axes. 3. Mark a dot for each item's/person's pair of scores. Example image of scatter diagram: -Graph 1: We've drawn our x-axis (hours slept last night) and our y-axis (happy mood ratings). -Graph 2: Put the range of values on each axis. -Graph 3: Take a person's pair of scores like (5, 2) and plot it. -Graph 4: Plot the remaining scores - look for the pattern of dots that might tell you something about how these data could be correlated.
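A minimal matplotlib sketch of these steps; the pairs of scores below are made up for illustration:

import matplotlib.pyplot as plt

# Hypothetical pairs of scores (hours slept last night, happy mood rating)
hours_slept = [5, 7, 8, 6, 6, 10, 5, 8]
happy_mood  = [2, 4, 7, 2, 3, 6, 1, 5]

plt.scatter(hours_slept, happy_mood)       # one dot per person's pair of scores
plt.xlabel("Hours slept last night")       # predictor on the horizontal (x) axis
plt.ylabel("Happy mood rating")            # criterion on the vertical (y) axis
plt.title("Scatter diagram")
plt.show()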

The effect [when multiplying those cross-products] on the correlation of different patterns of raw scores and z scores (The degree of linear correlation)

-How these patterns play out according to where the numbers are. Z scores can be positive or negative. -The higher the raw score, the more likely its Z score is to be positive; the lower the raw score, the more likely its Z score is to be negative. -Two high raw scores (+) -> Two positive Z scores -> A positive cross-product -> A positive correlation -Two low raw scores (-) -> Two negative Z scores -> A positive cross-product -> A positive correlation [High scores go with high scores, low scores go with low scores] -One high raw score (+), one low raw score (-) -> One positive and one negative Z score -> A negative cross-product -> A negative correlation -One low raw score (-), one high raw score (+) -> One positive and one negative Z score -> A negative cross-product -> A negative correlation [Lows go with highs, highs go with lows] -A middle raw score (about 0) and a high OR low raw score (+/-) -> A Z score of 0 and a +/- Z score -> A cross-product of 0 -> No correlation

Chi-Square Test for Independence: Independence

-Independence: No relationship exists between the variables in a contingency table. -Is there any relation between gender of the characters and whether they are adults/children? -Figure differences between observed and expected for each combination of categories; i.e. for each cell of the contingency table

The regression line

-Indicates the relation between predictor variable and predicted values of the criterion variable -Slope of the regression line (equals b, the raw-score regression coefficient). -Intercept of the regression line (equals a, the regression constant). -Steps on drawing the regression line: 1. Draw and label the axes for a scatter diagram 2. Figure predicted value on criterion variable for a low value on predictor variable - mark the point on graph 3. Repeat step 2 with a high value on the predictor variable (Example a low SAT and a high SAT with the predicted GPA for each) 4. Draw a line passing through the two marks -Example image: Step 1: Drawn and labeled axes Step 2: X=3 low value for predictor variable (number of hours slept last night) and plotted mood Step 3: X=11, higher value for predictor value Step 4: Drew line between 2 points -If we kept going with the line we would eventually get the point where it crosses the y-axis (a, the regression constant value). Can see and sort of look at the slope (b), it is pretty steep.

Observed and expected frequencies

-Key idea is the comparison of observed and expected frequencies in each category -Example: We might have the assumption that things are going to break evenly across categories, so 50% of 1 category and 50% of the other category. Then we'd actually collect data and see if that's what we actually observed in the participant pool. -Observed frequency: -# of people actually found in the study to be in a category or cell -Expected frequency: -# of people in the category expected if the null hypothesis were true

Correlation coefficients

-Numerically, instead of graphically through a scattergram, finding correlations/associations among variables. -Describes the degree and direction of linear correlation. -Such a measure is especially useful if it has the following properties: 1. Numeric on a standard scale for all uses 2. A positive number for positive correlations and a negative number for negative correlations 3. A zero for no correlation -Degree of linear correlation... 1. Figure the correlation using Z-scores. Z = (X - M) / SD, where SD = square root of SD^2 = square root of (SIGMA (X - M)^2 / N) = square root of (SS / N) 2. Cross-product of Z-scores -Multiply the Z score on one variable by the Z score on the other variable: (Zx)(Zy) 3. Correlation coefficient -Average of the cross-products of Z scores: SIGMA (Zx)(Zy) / N
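A minimal numpy sketch of these three steps, using made-up raw scores; the result is checked against numpy's own correlation function:

import numpy as np

def correlation_coefficient(x, y):
    # Step 1: convert raw scores to Z scores (population SD, i.e. sqrt(SS / N))
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    # Steps 2 and 3: average the cross-products of the Z scores
    return np.sum(zx * zy) / len(x)

# Hypothetical raw scores
x = np.array([3., 5., 6., 8., 9.])
y = np.array([2., 3., 5., 6., 9.])
print(round(correlation_coefficient(x, y), 2))
print(round(np.corrcoef(x, y)[0, 1], 2))     # should agree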

Chi-Square statistic

-Observed frequency (O) -Expected frequency (E) -Chi-square stat. (X^2) X^2 = SIGMA (O-E)^2/E -X^2 = Squared differences between observed and expected frequency distributions, then we divide those sq. diffs. by the expected frequencies and LASTLY add them all up. 1. (O-E)^2 2. (O-E)^2/E 3. SIGMA (O-E)^2/E

Chi-Square Test for Independence

-Oftentimes we are not just looking at one nominal variable like gender ... usually it's 2 or more at once. -Two nominal variables, each with several categories. E.g. gender AND age of characters [adult or child] -CONTINGENCY TABLE - a table showing the distribution of one variable in rows and another in columns, used to study the association between the two variables: -Seeing big discrepancies with adult men being the more likely option for these cereal box characters. -Of the 222 characters, 28 are child boys, 30 are child girls, 125 are adult men, and 39 are adult women.

Negative correlation

-Pattern of dots goes DOWN from left to right. -High scores on one variable (x-axis) go with low scores on the other variable (y-axis) -Low scores on one variable go with high scores on the other variable -Middle scores go with middle scores (same as positive correlation). -For positive or negative correlation, you always look from left to right and see if the dots go up (+) or down (-). -Negative correlation goes in the opposite direction of positive correlation. -Example image: Again, the upper graph is more strongly correlated, while the lower graph is more diffuse and thus has a weaker correlation. -The highest Y goes with the lowest X. The highest X goes with the lowest Y.

Positive correlation

-Pattern of dots goes UP from left to right. -High scores on one variable (x-axis) go with high scores on the other variable (y-axis) -Low scores on one variable go with low scores on the other variable -Middle scores go with middle scores -Example image: The upper graph is stronger (the dots are a little tighter together). You see the highest X goes with the highest Y. -Lower graph is more diffuse, a weaker correlation (more spread out).

Curvilinear correlation

-Pattern of dots is curved, not in a straight line. -Example image: Y-axis - Rate of substituting digits for symbols (if you had digits 1-5 assigned to different symbols, like 1 = triangle, 2 = circle, and you gave someone a pattern of symbols, how quickly could they substitute the digits for those symbols?). X-axis - Motivation ratings

Predictor and criterion variables and the questions answered by linear regression Q1: Is a pattern evident in a set of data points? [Similar to correlation in the last chapter] Q2: Does the equation of a straight line describe the pattern? Q3: Are the predictions made from this equation significant?

-Predictor variables are designated by X: variable with values that are known & can be used to predict values of another variable. The variable that does the prediction. -Criterion variables are designated by Y: variable with unknown values that can be predicted or estimated given known values of the predictor variable. The variable we are trying to estimate/know. -College GPA and SAT score example: Our SAT scores [predictor variable, X] predict college GPA [criterion variable, Y]. -We have the information of the SAT scores to predict the unknown information of college GPA. Q1: Is a pattern evident in a set of data points? [Similar to correlation in the last chapter] A1 [Method of answering the question]: Observe a graph of a set of data points to see what pattern emerges. Q2: Does the equation of a straight line describe the pattern? A2: Method of least squares/least squared error principle [line we draw through our dots, our regression line, how we get to that] Q3: Are the predictions made from this equation significant? A3: Analysis of regression

Linear prediction rule example

-Regression constant (a) = 0.3 -Regression coefficient (b) = 0.004 -What would we predict someone's GPA to be if their SAT score (X) was 700? -Y hat = 0.3 + (0.004)(700) = 3.1 -If you predict someone's GPA when their SAT is a 700, their predicted GPA would be a 3.1 -SAT scores/other standardized tests - new information is coming to light about how these tests are biased. People can still perform very well even if they have low SAT scores.
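A minimal sketch of this prediction rule as a Python function (the constants 0.3 and 0.004 come from this example):

def predict_gpa(sat, a=0.3, b=0.004):
    # Y hat = a + (b)(X)
    return a + b * sat

print(predict_gpa(700))   # 3.1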

Regression line example

-Regression constant (a) or our y-intercept equals 0.3, the point where our line crosses the y-axis. -X=700 on our predictor variable, Y=3.1 on our criterion variable.

The regression constant and regression coefficient

-Regression constant (a): Predicted raw score on the criterion variable when the raw score on the predictor variable (X) is 0. -Also known as the y-intercept (where the straight line crosses the Y-axis on the graph); the regression line we draw to better see the relationship in our correlations crosses the y-axis at this point. -Example: What GPA would we predict for someone with an SAT score of 0 (or who does really poorly), when college GPA is the y-axis and SAT is the x-axis? -Regression coefficient (b): How much the predicted criterion variable increases for every increase of 1 on the predictor variable -Also known as the slope of the straight line (the angle of the line, whether it is steep or not steep) -When X and Y change in the same direction, the slope is positive [line going up from left to right]; when X and Y change in opposite directions, the slope is negative [line going down from left to right].

Limitations of regression [When is regression inaccurate]

-Regression is inaccurate if... a. The correlation is curvilinear [regression works only for linear correlations - we cannot make a solid prediction with one constant if our correlation curves on us somewhere in the middle of our chart.] b. Restriction in range is present [Example: Not the entire range of a variable is taken into account in a study, like looking at intelligence but only using a college-student population. Our range in intelligence would be restricted in that study because there aren't going to be a lot of people who test at low or below-average intelligence in a college population.] c. Unreliable measures are used [if different people could share the similar characteristics we're looking at but have very different scores on the measures.] -Oftentimes researchers look for reliable and valid measures. Also, when collecting data, look at the correlation - see if there's any curvilinear pattern, any restriction of range, etc.

Correlation in research articles and correlation matrix

-Scatter diagrams are occasionally shown, but most of the time it is a correlation matrix: shows the correlation coefficients among several variables; a table in which the variables are named on top and along the side and the correlations AMONG THEM are all shown. -Some correlations are not necessarily significant. The higher a correlation is, the more likely it is to be significant, assuming you have an appropriate number of participants/items/observations. -Example image: Correlations between outside temp. and recorded behaviors of birds. -Though we're really looking at temp. and recorded behaviors, you can also look at how the recorded behaviors are potentially associated with one another. Look at the way in which each of the variables can be correlated with one another. -Panting and temperature: we see a higher, positive, linear correlation. The higher the temp., the more likely the bird is to be panting. -Asterisks indicate a significant correlation (as shown by a t test) at the 0.05 or 0.01 level. 0.333*: positive linear correlation significant at the 0.05 level. 0.365**: positive linear correlation significant at the 0.01 level for panting and yawning behaviors. -0.59: negative linear correlation [the inverse relationship of panting versus stretching behaviors], not significant

Chi-Square Test for Goodness of Fit Example

-X^2 is big [264.96], but we need to compare it back to our cutoff score. -DF = NCategories - 1 df = 2 [male or female] - 1 = 1 -Cutoff X^2 = 3.841 -REJECT the null hypothesis; the research hypothesis that the populations are different is supported. Characters on cereal boxes are more likely to be men than women.

Chi-Square Test for Goodness of Fit

-Single nominal variable we are looking at [a single category like gender] -Hypothesis-testing procedure that tells us how well our observed frequency distribution of a nominal variable fits the pattern we expected. -Hypothesis-testing procedure that examines how well an observed frequency distribution of a nominal variable fits some expected pattern of frequencies -Chi-square test involving levels of a single nominal variable -X^2 statistic: SIGMA (O-E)^2/E EXAMPLE: Character gender on cereal boxes [single nominal variable] [Rice Krispies people, Tony the Tiger, etc.] -Coders relied on clothing, hairstyle, facial features, name, etc. [Problematic today: not the best way to collect data on gender, but it's a simple example for this class. A lot of research looks at gender in a very binary way, dividing characters up into men/women or male/female.] -1,386 characters: 996 men (72%), 390 women (28%) -Null hypothesis: expect characters to be equally likely to be men or women (693 of each), 693:693 -Obviously, there is a discrepancy [between 996 and 693, and between 390 and 693]; is this discrepancy more than what we would expect just by chance for a sample of this size? -Is this a significant enough difference between what we expected and what we actually observed that we would say it seems like gender plays a role in cereal box character creation? -Example image: 1. O - E: 996 - 693 = 303; 390 - 693 = -303 2. (O-E)^2: (996-693)^2 = 91,809; (390-693)^2 = 91,809 3. (O-E)^2/E: 91,809/693 = 132.48; 91,809/693 = 132.48 4. SIGMA (O-E)^2/E: 132.48 + 132.48 = 264.96 X^2 is big [264.96], but we need to compare it back to our cutoff score.
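A minimal Python sketch of the cereal-box goodness-of-fit calculation using the observed and expected frequencies above:

# Observed: 996 men, 390 women; expected under the null: 693 of each
observed = [996, 390]
expected = [693, 693]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_square, 2))   # 264.96, compared against the df = 1 cutoff of 3.841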

Chi-Square Tests

-So far we've only used ratio and interval data for our variables ... -Hypothesis testing procedures for nominal variables (variables whose values are categories) -E.g., Country, region, religion, hair color, gender, race, ethnicity, classification/year in college -Focus on the number of people in different categories (frequencies)

General formula for the correlation coefficient

-Sum up the total cross-products and divide by the number of participants or data points in order to get the average of those cross-products. This gives us our r (correlation coefficient). -Positive perfect correlation: r = 1. A rating of 1 on the x-variable goes with a rating of 1 on the y-variable; a rating of 10 on the x-variable goes with a rating of 10 on the y-variable. -One variable can be perfectly predicted from the other variable's score if we have that perfect correlation (extremely rare). -The opposite is true for the negative perfect correlation. -Negative perfect correlation: r = -1. 1 goes with 10, 10 goes with 1, 2 goes with 8. -No correlation: r = 0. No relationship in the data; a perfect lack of relationship is 0. -The closer an r gets to 0, the less correlation; the closer to +/- 1, the stronger the association between those variables.

Least squared error principle/method of least squares

-The math we use to figure out the best linear prediction rule -Error: actual score minus the predicted score. We're using this formula to predict scores; we can get some actual scores and compare them. (error = Y minus Y hat) -The best prediction rule has the smallest sum of squared errors [taking all of the errors, squaring them, and adding them together] -Sum of squared errors: sum of squared differences between each predicted score and actual score on the criterion variable -Example image: If the predicted score is the actual score, your difference score will be 0, even after being squared and added into the sum of squared errors. -Rule 1 example: 2 (Y) - 7.10 (Y hat) = -5.10; (-5.10)^2 = 26.01; 26.01 + 24.21 + 15.37 + 7.51 + .19 + .04 = a sum of squared errors of 73.33, versus Rule 4's sum of squared errors of 6. We would choose Rule 4 as it has the smallest sum of squared errors; its predictions are much closer.
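A minimal Python sketch of comparing prediction rules by their sum of squared errors; the scores and the two (a, b) rules below are made up, not the ones from the card:

# Hypothetical predictor (X) and actual criterion (Y) scores
x_scores = [3, 5, 6, 8, 10, 11]
y_scores = [2, 3, 4, 6, 7, 8]

def sum_of_squared_errors(a, b):
    predicted = [a + b * x for x in x_scores]   # Y hat = a + (b)(X)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_scores, predicted))

print(sum_of_squared_errors(8, 0.18))      # a poor rule: large sum of squared errors
print(sum_of_squared_errors(-0.5, 0.78))   # a better-fitting rule: much smaller sum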

Finding the best linear prediction rule

-The thing about using regression lines and finding the best linear prediction rule is that you must figure out what the best values for the regression coefficient and constant are. -Possible to draw many lines based on many prediction rules. -Rule 1: a = 8, b = 0.18 Does not match the pattern of dots. -Rule 2: a = 4, b = 0 Flat, makes no sense of the dots -Rules 3 and 4 look close together, but because Rule 4 crosses through the most points, we would assume BY GLANCING that it is probably the best rule. -But we can also use math to figure out the best line to pick.

Significance/significance testing of the correlation coefficient

-Use t-tables for looking up the cutoff when testing the correlation coefficient. -t is used to determine the significance of a correlation coefficient. t = r / square root of [(1 - r^2) / (N - 2)] 1. r^2 2. 1 - r^2 3. N - 2 4. (1 - r^2)/(N - 2) 5. square root of step 4 6. r / step 5 -With a DF of N - 2. Example: r (correlation coefficient) = 0.71; t-cutoff (DF = N - 2 = 1), 0.05 significance, two-tailed test = +/- 12.706 [a high cutoff]. If your t-score is more extreme than the +/- 12.706, then you will reject the null. -The more participants/data you have, the lower your cutoff and the more likely you'll find a significant effect if there is one (the power the study will carry). -In other words, the higher your DF, the more data points you have, the lower your cutoff t-score. t = 1.01, less extreme than our cutoff -Fail to reject/retain the null -Though we found a strong correlation of r = 0.71, we will not say this is a significant correlation just yet (not meeting the critical threshold needed to say it is significant; we probably needed more participants to lower that cutoff).
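A minimal Python sketch of this t test using the three-person example from earlier in these notes (r = 0.71, N = 3, so df = 1):

import math

def t_for_correlation(r, n):
    # t = r / square root of [(1 - r^2) / (N - 2)], with df = N - 2
    return r / math.sqrt((1 - r ** 2) / (n - 2))

t = t_for_correlation(0.71, 3)
print(round(t, 2))   # about 1.01, less extreme than the +/- 12.706 cutoff, so retain the null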

The linear prediction rule

-Using raw scores, the prediction model is: Predicted raw score (on the criterion variable) = Regression constant + the product of (the raw score regression coefficient) x (the raw score on the predictor variable) Y hat [predicted raw score on the criterion variable] = a [regression constant] + (b)(X) [the product of the raw score regression coefficient x the raw score on the predictor variable] Y hat = a + (b)(X) EXAMPLE: College GPA [what we are trying to predict] = a + (b)(known SAT score) Mnemonics: A = regression constAnt B = regression coefficient, BE EFFICIENT like cheerleader be aggressive. 1. Figure the regression constant (a) 2. Figure the raw-score regression coefficient (b) 3. Find predicted raw score on the criterion variable (Y hat).

Multiple regression

-When we predict scores on a criterion variable but we use 2 or more predictor variables. -Multiple regression prediction models: each predictor variable (X) has its own regression coefficient (b). Just one regression constant, multiple regression coefficients. -Multiple regression formula with three predictor variables: Y hat = a + (b1, reg. coeff. for number of hours of sleep)(X1, number of hours of sleep) + (b2, reg. coeff. for SES)(X2, SES) + (b3, reg. coeff. for a third variable like number of minutes exercised/of joyful movement)(X3, number of minutes) -Example: Predict happy mood from number of hours of sleep but also including socioeconomic status [a second predictor variable or X]. Now we try to predict someone's rating of happy mood based on both the number of hours of sleep they got and their SES. Only one constant would still be reported (one place where the line intercepts the y-axis).
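A minimal sketch of a three-predictor prediction rule as a Python function; all the constants here are made up for illustration:

def predict_mood(hours_sleep, ses, minutes_exercise,
                 a=1.0, b1=0.5, b2=0.2, b3=0.01):
    # Y hat = a + (b1)(X1) + (b2)(X2) + (b3)(X3)
    return a + b1 * hours_sleep + b2 * ses + b3 * minutes_exercise

print(round(predict_mood(hours_sleep=7, ses=3, minutes_exercise=30), 2))   # 5.4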

Study guide continued: 1. Why do we convert raw scores to Z scores when figuring a correlation coefficient? 2. What is a scattergram and how do you interpret it? 3. How do you describe the pattern of the data in a positive correlation? In a negative correlation? In a curvilinear correlation? 4. What are the possible results of multiplying two Z scores when observing the correlation between variables? 5. How is the proportionate reduction in error figured in order to compare correlations? 6. Describe a correlation matrix.

1. It gives us a standardized value in which low scores are negative numbers and high scores are positive numbers. Tells us the direction of the correlation and is standardized in a way where we can compare it across different studies. 2. A graph that shows the relation of two variables through dots representing data points. No correlation: dots are randomly scattered with no clear line/relationship. Curvilinear: dots arranged in a curve going up/down. Positive correlation: dots in a line going up from left to right. Negative correlation: dots in a line going down from left to right. 3. Positive correlation: High scores go with high scores, low with low. Negative correlation: High scores go with low scores, low with high. Curvilinear: data points appear arched. 4. High score (+) x high score (+) = two positive Z scores = positive cross-product = positive correlation. Low score (-) x low score (-) = two negative Z scores = positive cross-product = positive correlation. High score (+) x low score (-) = one positive and one negative Z score = negative cross-product = negative correlation. 5. By squaring each correlation coefficient 6. The way people oftentimes report correlation in research articles. In a correlation matrix, variables are listed on top and along the left side, and the relationship between each pair of variables is listed in each cell. E.g., health factors listed and income factors listed: good health & high income in one cell, good health & low income, bad health & high income, bad health & low income.

Chapter 11 Review: 1. Why do we convert raw scores to Z score when figuring correlation coefficients? 2. How would you describe the following correlation coefficients: .2, -.87, 0, .9, .06? 3. What does the null hypothesis state when testing the significance of a correlation coefficient? 4.What is the formula for degrees of freedom when conducting a t test for the correlation coefficient? If we had a study with 35 individuals, what would the degrees of freedom be? 5.If there is a strong linear correlation between health & income, what is the direction of causality? 6.What is an outlier?

1. It gives us a standardized value in which low scores are negative numbers, and high scores are positive numbers. Helps us to understand the direction of the relationship between the variables. 2. -0.87 - strong negative linear correlation. 0.2 - weak/medium-strength positive linear correlation [closer to 0]. 0 - no linear correlation between the two variables. 0.9 - strong positive linear correlation [very strong association between the variables] [closer to +1]. 0.06 - weak positive linear correlation [very close to 0; our regression/correlation line would still be going up when looking from left to right, but it's still a very weak correlation]. 3. The true correlation in the population [those population variables] is 0. No correlation happening between those variables in the population. 4. DF = N - 2; 35 - 2 = 33 5. Can't say; health could cause income (greater/lesser earning potential); income could cause health (the more income, the better the access to health care or healthy activities and facilities); a third variable could be impacting both health and income. We do not necessarily know the direction of the causality in this relationship. It could go in either direction. Correlation does not equal causation. 6. Pay attention to outliers when looking at our correlations and our data, and the ways they can influence our data and the associations we are finding between variables. -A score that has an extreme value in relation to other scores in a distribution. Example: Number of hours spent studying and scores on a stats exam; positive linear correlation between more hours studied and doing better on the stats exam. -Let's say someone studied only one hour and got an A+, or the inverse: someone studied 8+ hours and got an F. Those data points are big outliers.

Patterns of correlation

1. No correlation (no discernable pattern) 2. Curvilinear correlation (the pattern curves) 3. Linear correlation 3a. Positive correlation 3b. Negative correlation When looking at a scatter diagram, we have different possible patterns we might see among the dots. -Example image: Some points it appears to have a pattern emerging, but for the most part, they are kind of scattered about randomly. Does not tell you much about the association between the two variables. -We would say both of these graphs show no correlation (no pattern).

Study Guide: 1. Of the total sample, what percentage were agnostics, atheists, and others? 2. What percentage of the middle class respondents were Catholic? 3. What religious identification are working class individuals most likely to be? Middle class individuals? 4. What is the phi coefficient for this study? 5. For the chi-square test for goodness of fit, what is the df? What is the null hypothesis? 6. What is the difference between the chi-square test for goodness of fit and the chi-square test for independence? 7. 1. Explain a chi-square test: a. what kind of variables do you have? Nominal variables b. What is the main idea of the test? c. What calculation is necessary for the statistic? d. What type of table is used? e. How would you describe the distribution? f. What is the primary debate regarding chi-square tests? 8. What is the null hypothesis when conducting a bivariate linear prediction? How do the correlation coefficient & regression coefficient relate in hypothesis test for a linear prediction rule? 9. How do you calculate error in regression? Why are regression errors squared? What is the name for adding these squares together? 10. What is a regression line? What is the minimum number of predicted points on a graph required for drawing a regression line?

1. Total sample: 100. Agnostic: 12 people, Atheist: 13 people, Other: 22 people. (12 + 13 + 22) / 100 = 47% 2. Middle class: 34, Catholic: 5. 5 / 34 = 14.7% 3. Protestant/Catholic/Other, Protestant 4. SQUARE ROOT of (X^2 / N) = SQUARE ROOT of (11.73 / 100) = 0.34 5. DF = NCategories - 1. Null: The proportion of people over categories breaks down the same for the two populations. 6. The goodness of fit test is limited to one variable, and the independence test is not. The independence test looks for a relationship between multiple nominal variables. Independence means there is no relationship between the variables. 7. a. Nominal b. Examine whether the pattern of observed frequencies fits the expected pattern of frequencies c. Take the difference between observed and expected frequencies for each cell; square the difference; divide by the expected frequency of each cell; sum across cells d. Contingency table: distribution of two nominal variables listed so that you have frequencies for each cell and the combination/totals e. Always greater than 0 and positively skewed f. Small expected frequencies (try to have at least 5x as many individuals as cells) 8. Null: the predicted amount of change on the criterion variable is 0 when the predictor variable increases by one standard deviation [an increase would happen on the predictor variable but there would be no change in the criterion variable because they are not related/correlated to each other]; Beta (standardized regression coefficient) = 0. For bivariate linear prediction, if the correlation coefficient is significant, the regression coefficient will be significant. 9. Error in regression = actual score on the criterion variable - predicted score on the criterion variable (Y - Y hat). We square regression errors because summing the positive and negative errors would cancel them out. Adding up the squared errors = the sum of squared errors. 10. A regression line shows the relationship between predictor and criterion variables. It can be positive/negative according to the relationship between the variables. A minimum of 2 predicted points is required to draw a regression line.

Chapter 12 Review: 1.If a professor wants to predict test grades from the hours a student studied, a. what is the predictor variable? b. What is the criterion variable? 2.What is a regression coefficient? 3.What is the regression constant? 4.An I/O psychologist studying adjustment to the job of new employees found that employees' amount of education (in number of years) predicts ratings by job supervisors two months later. If: -Regression Constant = 0.5 -Regression Coefficient = 0.4 -Individual has 10 years of education a. What is the predictor variable? b. What is the criterion variable? c. What is the linear prediction rule for this example? d. What is the predicted job rating for the employee in this example?

1a. The predictor variable [x] is the number of hours studied. The variable we're basing our prediction on. 1b. The criterion variable is the test grades. The variable we're trying to predict. 2. (b) In the linear prediction equation; indicates how many units of change is predicted in the criterion variable for each unit of change in the predictor variable. -Example: Number of hours studied and test grades - how much we're predicting a test grade is going to change for every additional hour a student studies. The slope of our line. 3. (a) in the linear prediction equation; the y-intercept; predicted score when the score on the predictor variable is 0; the number you always start with when calculating regression/prediction line. -Example: When a student studied 0 hours, what would we predict that their grade be on the exam (criterion variable) is going to be? That starting point - the constant, that's going to be added to the slope of that line/prediction. -When a student got a 0 on the SAT, what would we predict their GPA in college (criterion variable) to be? 4. We can work through this problem of what we would predict their job supervisor's rating of them to be. 4a. What variable are we using to predict another variable? Amount of education in number of years. Years of education. b. What variable are we actually trying to predict? Ratings by job supervisors c. Y hat = 0.5 + (0.4)(10) Y hat/criterion variable predicted value = regression constant/a + (regression coefficient/b) (predictor variable/x) d. 4.5 What we would expect this person's job rating to be two months into a job based solely on their number of years of education.

Correlation

Association between scores on two variables. Example: Age and score for coordination skills in children. Price and quality. Does coordination go up with age, or does it go up, peak, and then start falling back off? The price of an item and a quality rating of that item (the association between those two scores).

Assumptions for the chi-square test and controversies/limitations for the chi-square test

Assumptions: -No individual can be counted in more than one category/cell -No pre-post test measures Controversies: -Minimum acceptable frequency for a category or cell [If you do not have a lot of data points, small sample] -Small expected frequencies [general rule of thumb for the small expected frequencies] -At least 5 times as many individuals as categories (or cells) -Reduce power [Smaller your cells in frequency get, the lower your power] -Example: 2x2 -> 4 cells. Want to have at least 20 individuals in that study. 3x4 -> 12 cells. Want to have at least 60 individuals to run the X^2 analyses.

Chi-Square Test for Independence example

Step 1. Determine the expected frequencies -In a contingency table our expected freqs. are typically in parentheses next to the observed freqs. E = (R/N)(C) E = (Number in row / Total # of people)(Number in column) E = (58 / 222)(153) = 39.9 - expected frequency for the male child cell. E = (164 / 222)(153) = 113.1 - expected frequency for the male adult cell. E = (58 / 222)(69) = 18.0 - expected frequency for the female child cell. E = (164 / 222)(69) = 51.0 - expected frequency for the female adult cell. Step 2. Figure chi-square FOR EACH CELL X^2 = SIGMA (O-E)^2 / E a. (O-E)^2 per cell [in our contingency table we had 4 cells] b. (O-E)^2/E per cell c. SIGMA (O-E)^2/E across cells Step 3. Degrees of freedom DF = (NColumns - 1)(NRows - 1) Step 4. Compare chi-square to the cutoff X^2 [using our DF] Step 5. Reject/fail to reject the null hypothesis Example: X^2 = [(28 - 39.9)^2 / 39.9] + [(30 - 18)^2 / 18] + [(125 - 113.1)^2 / 113.1] + [(39 - 51)^2 / 51] X^2 = 3.55 + 8 + 1.25 + 2.82 X^2 = 15.62 -> Compare that back to our X^2 table using the DF [our cutoff X^2] DF = (2-1)(2-1), DF = 1 X^2 cutoff = 3.841 Our X^2 = 15.62 -Reject the null hypothesis. The research hypothesis that the two variables are not independent in the population is supported. That is, the proportion of characters that are children or adults is different for men and women characters.
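A minimal Python sketch of the whole calculation for the cereal-box contingency table, using the observed frequencies from the card above:

# Observed frequencies: (column, row) -> count
observed = {("male", "child"): 28, ("male", "adult"): 125,
            ("female", "child"): 30, ("female", "adult"): 39}

n = sum(observed.values())                           # 222
row_totals = {"child": 28 + 30, "adult": 125 + 39}   # 58 and 164
col_totals = {"male": 28 + 125, "female": 30 + 39}   # 153 and 69

chi_square = 0.0
for (col, row), o in observed.items():
    e = (row_totals[row] / n) * col_totals[col]      # E = (R / N)(C)
    chi_square += (o - e) ** 2 / e

print(round(chi_square, 2))   # about 15.62; df = (2-1)(2-1) = 1, cutoff 3.841, so reject the null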

Spot the pattern of correlation -Linear (positive or negative), curvilinear, or no correlation. -How diffuse/close together the dots are tell us the size of the correlation.

a. Linear, negative, large correlation (dots are close together). High Y, low X. Low Y, high X. b. Curvilinear c. Linear, positive, large correlation (tight, these dots are going together clearly a line going through them) Looking left -> right, dots go up (+) d. No correlation (lots of scatter, no clear pattern)

Finding the regression coefficient (b) and regression constant (a)

b = Sum of the cross-products of the deviation scores on the predictor and criterion variables (SP) / sum of squares for the X variable (SSx) a = Mean of variable Y - (b x mean of variable X)
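A minimal numpy sketch of these two formulas with made-up scores (same style as the sleep/mood examples above):

import numpy as np

# Hypothetical raw scores: hours slept (X, predictor) and happy mood (Y, criterion)
x = np.array([5., 6., 7., 8., 9.])
y = np.array([2., 4., 5., 7., 8.])

sp   = np.sum((x - x.mean()) * (y - y.mean()))   # sum of cross-products of deviation scores
ss_x = np.sum((x - x.mean()) ** 2)               # sum of squares for the X variable

b = sp / ss_x                  # regression coefficient (slope)
a = y.mean() - b * x.mean()    # regression constant (intercept)
print(round(b, 2), round(a, 2))   # 1.5 and -5.3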

Chi-Square Tests in research articles

χ2(2, n = 101) = 11.89, p < .005 χ2 = chi-square statistic = 11.89 2 = degrees of freedom used to find our cutoff score [e.g., DF = NCategories - 1 for a goodness-of-fit test] 101 = total number of participants included in the χ2 calculation p < .005 = the χ2/chi-square test is significant at the p < .005 level.

