PSYC 210
Functions of Regression
-predicting outcomes -must have at least two variables (predictor, outcome) -predictor and moderator help estimate the dependent variable
r squared effect size
-small 0.01 -medium 0.09 -large 0.25
r effect size
-small 0.10 -medium 0.30 -large 0.50
Regression analysis of humidity as the predictor variable and thunderstorms as the outcome yields a large correlation coefficient of 0.9. Researchers can safely conclude that humidity causes thunderstorms. TRUE FALSE
FALSE
estimated omega squared
(t squared minus 1)/ (t squared plus N1 plus N2 minus 1)
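A minimal sketch of the formula above in Python. The function name and the sample values (t = 2.5, 20 participants per group) are made up for illustration and do not come from the course.

```python
def estimated_omega_squared(t, n1, n2):
    """Estimated omega squared from a t statistic and the two group sizes."""
    return (t ** 2 - 1) / (t ** 2 + n1 + n2 - 1)

# Hypothetical values: t = 2.5 with 20 participants in each group.
omega_sq = estimated_omega_squared(2.5, 20, 20)
print(round(omega_sq, 3))
```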
Limitations of hypothesis testing
-"Chance or more than chance?" -Only protects against Type I Error -Other issues that we might want to know
How can we repair limitations of regression?
-"standardize" -Transform both X and Y into Z scores -means equal to 0 -standard deviations equal to 1
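Standardizing can be sketched in a few lines of Python. This assumes the population standard deviation (dividing by N); a course may use the sample version (N - 1) instead. The data and the helper name z_scores are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Convert raw scores to z scores: subtract the mean, divide by the SD."""
    m = mean(values)
    sd = pstdev(values)  # assumption: population SD (divides by N)
    return [(v - m) / sd for v in values]

x = [2, 4, 6, 8, 10]  # hypothetical raw scores
zx = z_scores(x)
print(round(mean(zx), 6), round(pstdev(zx), 6))
```

After the transformation the mean is 0 and the standard deviation is 1 (up to floating-point rounding), whatever the original units were.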
Three major types of effect size
-Cohen's d -r and r squared -omega squared
Assumptions of Regression/ Correlation
-Interval or ratio scales -Linear relationship -Normal distributions for X and Y -Homoscedasticity
Line of best fit
-Line that minimizes the difference between predicted Y (Y-hat) and the actual Y we observed -This is called the least squares criterion -The quantity minimized is the sum of squared differences between Y and Y-hat -Acts like a measure of central tendency for a scattered linear relationship
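One way to see the least squares criterion at work is to fit a line and check that nudging it makes the squared error larger. The sketch below uses the deviation-score form of the slope; the data and the helper names fit_line and sse are hypothetical.

```python
def fit_line(x, y):
    """Return (b, a) for the least squares line y-hat = b*x + a."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return b, a

def sse(x, y, b, a):
    """Sum of squared differences between observed Y and predicted Y-hat."""
    return sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]  # hypothetical predictor scores
y = [2, 4, 5, 4, 5]  # hypothetical outcome scores
b, a = fit_line(x, y)
best = sse(x, y, b, a)
print(round(b, 3), round(a, 3), round(best, 3))
print(best < sse(x, y, b + 0.1, a), best < sse(x, y, b, a + 0.1))
```

Any other slope or intercept gives a larger sum of squared differences, which is what "best fit" means here.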
Functions of multiple regression
-Measuring complex phenomena with multiple factors
Effect size
-Strength of relationship/ difference -single number -much like using the mean to describe a sample
What happens to a regression equation when standardized?
-The slope is rescaled into the correlation coefficient r, which stays between -1 and +1 -The intercept becomes 0 -The denominator changes to the square root of the product of [N times the sum of x-squared minus the square of the sum of x] and the same quantity for y
Limitations of Regression
-The regression line shows us the best line for describing the relationship between two variables, but... -It doesn't tell us the RELATIVE strength of the relationship -It doesn't allow us to truly compare relations between variables
Omega squared
-Variance in dependent variable attributable to independent variable (%) -Large differences, more variance due to groups
Why can correlations not be used to establish causation?
-X might cause Y -Y might cause X -A third variable might otherwise impact X and Y
confidence intervals
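-extension of hypothesis testing -for a significant result, a single interval (e.g., around a mean difference) cannot contain 0 -for two means to differ significantly, their two intervals cannot overlap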
y= bx + a
-linear equation (equation for a straight line) -a= intercept (predicted point when x=0) -b= the slope of the line
Correlation Coefficient
-the regression coefficient (b) when predicting Zy from Zx is represented by the letter r -typically written r(xy) and called the correlation coefficient -Ranges from -1 to +1 (0 means no linear relationship)
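The claim that the slope for predicting Zy from Zx equals r can be checked numerically. This is a sketch with hypothetical data; z scores here use the population SD, which is an assumption, and the helper names are illustrative.

```python
from math import sqrt
from statistics import mean, pstdev

def z_scores(values):
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

def slope(x, y):
    """Least squares slope for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def pearson_r(x, y):
    """Correlation: how much x and y covary relative to how much each varies."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
slope_z = slope(z_scores(x), z_scores(y))
print(round(r, 4), round(slope_z, 4))
```

The raw slope for these data is 0.6, but once both variables are standardized the slope matches r.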
Which of the following study scenarios would be best tested using a oneway ANOVA? a. A researcher wants to examine for differences in happiness level in fall, winter, spring, and summer b. A researcher wants to examine for differences in happiness level between males and females c. A researcher wants to predict happiness level as a function of age, political ideology, and education level d. A researcher wants to examine mortality rates as a function of happiness level
a. A researcher wants to examine for differences in happiness level in fall, winter, spring, and summer
What kind of variable should be used when using regression? a. Continuous b. Categorical c. Doesn't matter d. Interval
a. Continuous
You run a regression for child and parent height and obtain a slope of the line of best fit of 0.913. What does this mean? a. For every inch a child grows taller, the parent seems to get taller by 0.913 b. For every inch a child grows taller, the parent seems to get shorter by 0.913 c. This means there is not a significant relationship because the slope is less than one.
a. For every inch a child grows taller, the parent seems to get taller by 0.913
What does the regression line tell us? a. It tells you the relationship between the two variables b. It tells you the relative strength between the relationship of the variables c. It compares the relationship between 3 or more variables d. It is a graph of all the points in the data set
a. It tells you the relationship between the two variables
If I was interested in determining if two means differed and my 95% confidence interval ranged from 5.24 to 12.59, what would I conclude? a. Reject the null hypothesis because the confidence interval does not include the value of my null hypothesis (zero) b. Reject the null hypothesis because the confidence interval does not include the value of my sample means c. Fail to reject the null hypothesis because the confidence interval does not include the value of my null hypothesis (zero) d. Fail to reject the null hypothesis because the confidence interval does not include the value of the sample means
a. Reject the null hypothesis because the confidence interval does not include the value of my null hypothesis (zero)
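The decision rule in this card can be written as a small check: reject when the null value falls outside the interval. The function name decide is hypothetical; the two intervals are the ones that appear in these cards.

```python
def decide(lower, upper, null_value=0.0):
    """Reject the null hypothesis only if the interval excludes the null value."""
    if lower <= null_value <= upper:
        return "fail to reject"
    return "reject"

print(decide(5.24, 12.59))  # interval excludes zero
print(decide(-1.23, 3.45))  # interval includes zero
```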
A researcher runs a multiple regression model with satisfaction with life as the outcome and sense of community and self-efficacy as the predictors. They obtain the following results: Sense of community: b = 3.0, beta = .8, p = .02; Self-efficacy: b = 3.0, beta = .6, p = .0001. Which, if any, is the strongest predictor of satisfaction with life and why? a. Sense of community because it has a larger value of beta b. Self-efficacy because it has a smaller p-value c. They are equally strong because they have the same value of b d. There is not enough information to make a decision
a. Sense of community because it has a larger value of beta
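Why two predictors with the same b can have different betas: a standardized slope rescales b by the ratio of the predictor's SD to the outcome's SD. The sketch below assumes that conversion, and the standard deviations are hypothetical values chosen only so the betas match the ones in the card.

```python
def standardized_beta(b, sd_predictor, sd_outcome):
    """Convert an unstandardized slope to a standardized slope (beta)."""
    return b * sd_predictor / sd_outcome

sd_outcome = 7.5  # hypothetical SD of satisfaction with life
beta_community = standardized_beta(3.0, 2.0, sd_outcome)  # hypothetical SD = 2.0
beta_efficacy = standardized_beta(3.0, 1.5, sd_outcome)   # hypothetical SD = 1.5
print(round(beta_community, 2), round(beta_efficacy, 2))
```

Because b depends on each predictor's units, only the betas are on a common scale and can be compared directly.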
How would you interpret an interaction effect between sleep and test difficulty on test performance? a. The effect of sleep on test performance depends on test difficulty b. The effect of test difficulty on test performance depends on sleep c. The main effect of sleep and test difficulty are significant d. There are no main effects
a. The effect of sleep on test performance depends on test difficulty b. The effect of test difficulty on test performance depends on sleep
If I have a factorial design examining the impact of sleep and test difficulty on performance, how many factors are present? a. 1 b. 2 c. 3 d. Unable to tell without more information
b. 2
Which of the following study scenarios would be best tested using a factorial ANOVA? a. A researcher wants to test if sex (male, female) differ as a function of happiness level b. A researcher wants to test if there are differences in happiness level as a function of sex (male, female) and marital status (married, single) c. A researcher wants to test if happiness levels differ between single and married people d. A researcher wants to test what the strongest predictor of happiness is
b. A researcher wants to test if there are differences in happiness level as a function of sex (male, female) and marital status (married, single)
What would you do if your null hypothesis was that there were no mean differences between two groups and you had a 95% confidence interval of -1.23 to 3.45? a. Reject the null hypothesis b. Fail to reject the null hypothesis c. I hate "What do you know Wednesdays..."
b. Fail to reject the null hypothesis
A researcher runs a multiple regression model with anxiety as the outcome and number of cups of coffee and number of hours slept as the predictors. The unstandardized slope for cups of coffee was estimated to be -0.2. What is a correct interpretation of this number? a. For each additional cup of coffee consumed, anxiety is expected to decrease by 0.2 points. b. For each additional cup of coffee consumed anxiety is expected to decrease by 0.2 points, holding constant number of hours slept. c. There is a weak, negative correlation between cups of coffee consumed and anxiety. d. Coffee consumption explains 20% of the variance in anxiety in the negative direction.
b. For each additional cup of coffee consumed anxiety is expected to decrease by 0.2 points, holding constant number of hours slept.
How does ANOVA "work?" a. It conducts multiple t-tests to determine which means are different. b. It partitions variance into two meaningful types - between and within group. c. It tests the extent to which variables covary relative to how variables vary on their own.
b. It partitions variance into two meaningful types - between and within group.
In relation to a simple analysis of variance, which of the following statements is NOT true? a. SStotal = SSbetween groups + SSwithin groups b. MStotal = MSbetween groups + MSwithin groups c. MSbetween groups = MSwithin groups when the null hypothesis is true d. All are true
b. MStotal = MSbetween groups + MSwithin groups
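A small numeric check of the partition, using three hypothetical groups of three scores each. It illustrates that sums of squares add up to the total while mean squares do not.

```python
from statistics import mean

groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]  # hypothetical scores
scores = [s for g in groups for s in g]
grand_mean = mean(scores)

# Between groups: group means around the grand mean, weighted by group size.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Within groups: each score around its own group mean.
ss_within = sum((s - mean(g)) ** 2 for g in groups for s in g)
# Total: each score around the grand mean.
ss_total = sum((s - grand_mean) ** 2 for s in scores)

df_between = len(groups) - 1
df_within = len(scores) - len(groups)
ms_between = ss_between / df_between
ms_within = ss_within / df_within
ms_total = ss_total / (len(scores) - 1)
f_ratio = ms_between / ms_within

print(ss_between, ss_within, ss_total)
print(ms_between, ms_within, ms_total, f_ratio)
```

Here SSbetween + SSwithin equals SStotal, but MSbetween + MSwithin does not equal MStotal, because each mean square divides by a different number of degrees of freedom.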
Why do you think you get different results when different combinations of independent variables are added into a multiple regression model? a. Multiple regression allows you to ignore the effects of the other variables b. Multiple regression takes into account overlapping variance between predictors c. Wait, you don't get the same results if you add in different predictors?!?
b. Multiple regression takes into account overlapping variance between predictors
If your regression line ends up being horizontal this means that: a. The correlation is large b. No correlation c. Cannot be determined
b. No correlation
Pairwise comparisons made after finding that F is significant and used to determine which pairs of means are significantly different are known as: a. Omnibuses b. Post hoc tests c. Effect size measures d. Planned contrasts
b. Post hoc tests
How do you know if you have a small, medium, or large effect? a. Small effects are always less than 0.1, medium effects are always around 0.25, and large effects are always around 0.9 b. Statisticians have set criteria for what is small, medium, and large c. If an effect is significant, the effect size is large
b. Statisticians have set criteria for what is small, medium, and large
Conducting a oneway ANOVA with only two groups is actually the same as conducting a: a. Z test b. T-test c. Multiple regression d. Chi-square
b. T-test
In regression, residual refers to: a. The difference between the predicted value and the overall mean b. The difference between the predicted value of an outcome and the actual observed value c. The difference between an actual outcome score and the overall mean d. The slope of the regression line
b. The difference between the predicted value of an outcome and the actual observed value
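A short sketch of residuals for a line that is assumed to be already fitted. The data, the slope 0.6, and the intercept 2.2 are hypothetical, and the sign convention (observed minus predicted) is an assumption; some texts reverse it.

```python
x = [1, 2, 3, 4, 5]  # hypothetical predictor scores
y = [2, 4, 5, 4, 5]  # hypothetical observed outcomes
b, a = 0.6, 2.2      # assumed least squares slope and intercept for these data

predicted = [b * xi + a for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]
print([round(e, 2) for e in residuals])
print(round(sum(residuals), 6))
```

For a least squares line the residuals sum to zero (up to rounding), so the line passes through the middle of the scatter.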
In a 2-way analysis of variance, when we say we have a MAIN EFFECT, we mean that: a. The main source of variance in our experiment is produced by Factor A and B b. The marginal means for the levels of Factor A after collapsing over Factor B are significantly different from one another c. The effect of one independent variable on a dependent variable is significant, taking into account the second independent variable d. The effect of one independent variable on a dependent variable differs as a function of a second independent variable
b. The marginal means for the levels of Factor A after collapsing over Factor B are significantly different from one another
What information does r-squared in the regression model give you? a. The overlapping variance in the predictors b. The proportion of variance in the outcome accounted for (or explained by) the combination of predictors c. Leftover (unexplained) variance in the outcome d. The significance of the predictors in the overall model
b. The proportion of variance in the outcome accounted for (or explained by) the combination of predictors
For the previous example: I have a significant main effect of sleep. What does this mean? a. There is a difference in test performance as a function of sleep taking into account test difficulty b. There is a difference in test performance as a function of sleep averaging across test difficulty c. There is a difference in test performance as a function of test difficulty taking into account sleep d. There is a difference in test performance as a function of test difficulty taking into account sleep
b. There is a difference in test performance as a function of sleep averaging across test difficulty
When would you use R^2? a. When you want to know how strong the correlation is b. When you want to know what proportion of the variance can be accounted for by your model c. When you want to know the slope of the line d. When you want to minimize error
b. When you want to know what proportion of the variance can be accounted for by your model
Oneway ANOVA is useful when: a. You are examining for differences on a continuous outcome and you have two groups b. You are examining for differences on a continuous outcome and you have more than two groups c. You want to examine a bivariate relation between two variables d. You are interested in relationship strength
b. You are examining for differences on a continuous outcome and you have more than two groups
How do you determine which independent variable is the strongest predictor in multiple regression? a. b (slope) b. beta (standardized slope) c. significance d. r-squared
b. beta (standardized slope)
Regression must have at least __________ variables. a. one b. two c. three d. no minimum amount of variables
b. two
Regression analysis provides us with a 'line of best fit.' What do we mean by line of best fit? a. A line that can allow us to calculate an exact outcome for a given predictor. b. A line that captures all data points c. A single line that best represents the relation between X and Y d. A line that minimizes residuals
c. A single line that best represents the relation between X and Y d. A line that minimizes residuals
When we run a 2-way factorial analysis of variance, we are partitioning the ______________ into three meaningful parts. a. Total variance b. Within group variance c. Between groups variance d. Interaction variance
c. Between groups variance
What effect size is most suitable for a t-test? a. r b. r-squared c. Cohen's d d. omega-squared
c. Cohen's d
Correlation coefficients can be impacted by a number of factors. Which of the following statements is true? a. Curvilinear effects will strengthen a correlation b. Heteroscedasticity (difference of variances) will improve interpretability of a correlation c. Extreme scores can inflate a correlation d. Restricting range will generally inflate a correlation
c. Extreme scores can inflate a correlation
You run a regression on data for drinking and test performance. Your results are a line of best fit with slope -0.754, a correlation coefficient of 0.366, and a p value of 0.001. Interpret these results. a. For every unit increase in drinking, test performance increases by 0.754. The effect size is medium, and the results are significant. b. For every unit increase in drinking, test performance increases by 0.366, the effect size is large, and the results are significant. c. For every unit increase in drinking, test performance decreases by 0.754. The effect size is medium, and the results are significant. d. For every unit increase in drinking, test performance increases by 0.366, the effect size is large, and the results are insignificant.
c. For every unit increase in drinking, test performance decreases by 0.754. The effect size is medium, and the results are significant.
What does F-ratio tell us in simple language? a. Out of two variables, which one is affecting the outcome more significantly b. Where on the normal distribution the difference between two means falls c. How large the differences between groups are when you take into account the differences within individual groups d. How many standard deviations from the mean your sample is
c. How large the differences between groups are when you take into account the differences within individual groups
What does R-squared tell us? a. R-squared tells us the unstandardized slope. b. R-squared is the correlation coefficient. c. R-squared tells us the proportion of variance or difference in your outcome that is associated or predicted by your predictor value. d. Both b and c.
c. R-squared tells us the proportion of variance or difference in your outcome that is associated or predicted by your predictor value.
Which best describes what regression allows us to do? a. Regression allows us to determine if the mean of two populations is the same . b. Regression allows us to determine if the mean of two samples is the same. c. Regression allows us to predict one variable from another. d. Regression allows us to do all of the above.
c. Regression allows us to predict one variable from another.
What does the term 'residual' refer to in regression? a. The slope of the line b. The Y intercept c. The difference between a predicted and actual outcome d. The predicted outcome derived from the regression analysis
c. The difference between a predicted and actual outcome
What's the deal with sums of squares? a. They can be used to compute an F ratio b. They are an indicator of average reliability c. They are the summed total of squared differences d. They give us an interpretable number for between and within groups variance
c. They are the summed total of squared differences
MSwithin groups is an estimate of error variance: a. Only when the null hypothesis is true b. Only when the null hypothesis is false c. Under all circumstances d. Under no circumstances
c. Under all circumstances
How would you interpret a 95% confidence interval around a population parameter of zero? a. We are 95% confident that our results are significant (because the interval contains zero) b. We are 95% confident our results are not significant (because the interval contains zero) c. We are 95% confident that the interval captures the true population mean (zero) d. We are 95% confident that the mean is zero
c. We are 95% confident that the interval captures the true population mean (zero)
What is FALSE about correlation and causation? a. X might cause Y b. Y might cause X c. Correlation implies causation d. none of the above
c. Correlation implies causation
What is a major limitation of regression? a. it is not statistically significant b. it does not involve predictors and outcomes c. it doesn't tell us the relative strength of the relationship
c. it doesn't tell us the relative strength of the relationship
How does regression allow us to predict one variable from another? a. it doesn't b. slope to give us an estimator of the relation between two or more variables c. line of best fit to give us an estimator of the relation between two or more variables d. t-tests are used to give us an estimator of the relation between two or more variables
c. line of best fit to give us an estimator of the relation between two or more variables
What measure of effect size is preferred for multiple regression? a. Cohen's d b. r c. r-squared d. Omega-squared
c. r-squared
Which is NOT an assumption of regression/correlation a. Linear model b. Interval/ratio scales c. Homoscedasticity d. A large value of N
d. A large value of N
If I used a 2x3 design, how many cells or conditions would I have? a. 2 b. 4 c. 5 d. 6
d. 6
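The number of cells is the product of the number of levels of each factor. A quick way to list them, using hypothetical level names for the sleep and test difficulty example:

```python
from itertools import product

sleep_levels = ["low sleep", "high sleep"]        # factor A: 2 levels (hypothetical)
difficulty_levels = ["easy", "medium", "hard"]   # factor B: 3 levels (hypothetical)

cells = list(product(sleep_levels, difficulty_levels))
for cell in cells:
    print(cell)
print(len(cells))
```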
In multiple regression, we can use standardized regression coefficients (betas) to: a. See which predictor is the strongest/best predictor. b. See which outcome is the best. c. Give us an indicator of relationship strength. d. Both A and C
d. Both A and C
In a 2-way analysis of variance, when we say we have an INTERACTION, we mean that: a. The main source of variance in our experiment is produced by Factor A and B b. The marginal means for the levels of Factor A after collapsing over Factor B are significantly different from one another c. The effect of one independent variable on a dependent variable is significant, regardless of the second independent variable d. The effect of one independent variable on a dependent variable differs as a function of a second independent variable
d. The effect of one independent variable on a dependent variable differs as a function of a second independent variable
r-squared can be interpreted as: a. The Y unit increase for a one unit increase in X b. The slope of the line between X and Y c. The proportion of unexplained variance between X and Y d. The proportion of variance in Y that is accounted for /explained by X
d. The proportion of variance in Y that is accounted for /explained by X
A 95% confidence interval around a given sample statistic can be interpreted as: a. We are 95% confident the statistic is significant b. We are 95% confident in the generalization of our findings c. We can reject the null hypothesis d. We are 95% confident the true population parameter is between the upper and lower bounds of the interval
d. We are 95% confident the true population parameter is between the upper and lower bounds of the interval
A correlation coefficient of -1.32 tells us: a. There is a very strong negative relationship between X and Y b. There is a very strong positive relationship between X and Y c. The relationship between X and Y is not linear d. We have calculated the correlation coefficient incorrectly
d. We have calculated the correlation coefficient incorrectly
The least squares criterion is also known as a. what maximizes residuals b. what minimizes the residuals c. the line of best fit d. both b and c
d. both b and c
Cohen's d
the size of the difference between two means, standardized (expressed in standard deviation units)
Cohen's d effect size
small .2 medium .5 large .8 *values may exceed 1*
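A sketch of Cohen's d for two independent groups. It assumes the common pooled-standard-deviation version with N - 1 variances, which may differ from the formula used in class; the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group1, group2):
    """Mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)

group1 = [5, 6, 7, 8, 9]  # hypothetical scores
group2 = [3, 4, 5, 6, 7]
d = cohens_d(group1, group2)
print(round(d, 3))
```

With these made-up scores d comes out above 1, which shows that d is not capped at 1 the way r is.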
Slope formula
b = [N(sum of x times y) - (sum of x)(sum of y)] / [N(sum of x-squared) - (square of the sum of x)], where N is the number of X, Y pairs
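A sketch of the computational slope formula, with N (the number of pairs) multiplying the first term in both the numerator and the denominator, checked against the deviation-score form. The data and function names are hypothetical.

```python
def slope_computational(x, y):
    """b = [N*sum(xy) - sum(x)*sum(y)] / [N*sum(x^2) - (sum(x))^2]"""
    n = len(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x, sum_y = sum(x), sum(y)
    sum_x_sq = sum(xi ** 2 for xi in x)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x_sq - sum_x ** 2)

def slope_deviation(x, y):
    """Same slope from deviations around the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
b_comp = slope_computational(x, y)
b_dev = slope_deviation(x, y)
print(round(b_comp, 3), round(b_dev, 3))
```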
r(xy)
the degree to which x and y covary relative to how much they each vary by themselves. Example: what you do with friends vs. what you do alone
r squared
the proportion of the total variation in a dependent variable explained by an independent variable
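For a single predictor, this proportion can be computed two ways that should agree: squaring r, or taking 1 minus unexplained variation over total variation. A sketch with hypothetical data:

```python
from math import sqrt

x = [1, 2, 3, 4, 5]  # hypothetical predictor scores
y = [2, 4, 5, 4, 5]  # hypothetical outcome scores
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
ss_total = sum((yi - mean_y) ** 2 for yi in y)  # total variation in Y

# Least squares line and the variation it leaves unexplained.
b = sxy / sxx
a = mean_y - b * mean_x
ss_error = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))

r = sxy / sqrt(sxx * ss_total)
r_squared_from_r = r ** 2
r_squared_from_ss = 1 - ss_error / ss_total
print(round(r_squared_from_r, 3), round(r_squared_from_ss, 3))
```

Both routes give 0.6 for these made-up scores, meaning the predictor accounts for 60% of the variation in the outcome.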