HCS 202 Final Exam
Multiple Regression: what are these often used to create?
Predict variance in the dependent variable based on two or more predictors; these are often used to create prediction equations
Least squares criterion:
Prediction errors are squared and the best-fitting regression line is the one that has the smallest sum of squared errors
Simple Linear Regression:
One predictor variable is used to predict a case's score on another variable
what is R^2?
Proportion of variance in the DV (outcome) that is explained by the predictor(s) in the model; the proportion of shared variance
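To make the "shared variance" idea concrete, here is a minimal sketch in Python (numpy assumed; the scores are invented for illustration only):

```python
import numpy as np

# Hypothetical example scores (not from the course materials)
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])   # predictor
y = np.array([1.0, 3.0, 4.0, 8.0, 9.0])   # outcome (DV)

r = np.corrcoef(x, y)[0, 1]  # Pearson's r
r_squared = r ** 2           # proportion of variance in y shared with x

print(f"r = {r:.3f}, R^2 = {r_squared:.3f}")
```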
Assumptions of Chi-Square:
- Random sample - Independence of observations - Expected frequencies (at least 5 per cell)
Assumptions of simple regression:
- Random samples - Independence of observations - Normality - Linearity
Repeated Measures ANOVA:
Same people measured three or more times (pretest, posttest, follow-up)
simple linear regression is also known as...
Simple bivariate regression
p-values and multiple regression
The p-value for the slope will not be the same as the p-value of the model
2 coefficients:
a = y-intercept; b = slope
expected frequencies:
all cells must have expected frequencies of at least 5. (not robust)
if R is negative...
as x increases, y decreases (negative correlation)
if R is positive ....
as x increases, y increases (positive correlation)
Degrees of Freedom of chi square:
(R - 1)(C - 1), where R = number of rows and C = number of columns in the contingency table (e.g., a 2 × 3 table has (2 - 1)(3 - 1) = 2 degrees of freedom)
One-way ANOVA:
- Difference test to see if the mean is the same across three or more groups - One categorical grouping variable and one continuous dependent variable - Categorical group has three or more groups
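A minimal sketch of this test in Python (scipy assumed; the three groups' scores are invented):

```python
from scipy import stats

# Hypothetical continuous DV scores for three groups
group_a = [82, 75, 90, 68, 77]
group_b = [70, 65, 72, 60, 66]
group_c = [88, 91, 79, 85, 93]

# One-way ANOVA: are the group means the same?
f_stat, p = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p:.3f}")
```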
Spearman's Rank Order Correlation Coefficient examines the relationship between: (2)
- two ordinal-level variables - one ordinal-level variable and one interval/ratio-level variable
Correlation coefficient is bound between ....
-1 and +1 (negative values indicate a negative relationship)
To show Cause and effect, what 3 things must occur?
1. relationship must be shown 2. show that the cause happened before the effect 3. there is an absence of other factors that would affect both variables being measured
chi square is also known as....
Chi-square test of association Chi-square test for contingency tables
what is the residual?
Difference between the actual (observed) value and the predicted value: residual = observed - predicted
non parametric test: examples:
Does not have strict assumptions about the shape of the distribution - Not as powerful; less likely to detect effects - Used with nominal- or ordinal-level dependent variables. Examples: - Chi-square goodness-of-fit test - Chi-square test of independence - Spearman rank-order correlation coefficient - Mann-Whitney U test
what is the mann-whitney u test an equivalent to?
Equivalent of the independent-samples t test
b = slope is.... - positive slope: - negative slope: - when is slope interpreted? - comparisons of p-values if the model has one predictor:
Expected change in y for every one-unit increase in x - Positive slope: expect y to increase by the value of the slope - Negative slope: expect y to decrease by the value of the slope - Interpret the slope ONLY if its p-value is significant - A model with one predictor will have the same p-value for the model and for the slope
a = y-intercept is.... - is p-value interpretable? - when do u interpret y-intercept?
Expected level of y when x is equal to zero (the expected level of the DV when the IV equals zero) - The p-value for the y-intercept is NOT interpretable; only the slope's p-value is - Interpret the y-intercept no matter what
what is the only design that can show cause and effect?
Experimental design
When is the Mann-Whitney U test used?
Fallback test when the non-robust assumptions of the independent-samples t test have been violated
how to interpret slope of multiple regression
For every one-unit increase in [predictor], the [DV] will increase/decrease by [the number given], after controlling for [the other predictors]
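As a hedged illustration of "after controlling for," here is a sketch using Python with statsmodels (the variable names and data are hypothetical, not from the course):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical data: two predictors and one continuous DV
hours_studied = rng.uniform(0, 10, 50)
sleep_hours = rng.uniform(4, 9, 50)
exam_score = 50 + 3 * hours_studied + 2 * sleep_hours + rng.normal(0, 5, 50)

X = sm.add_constant(np.column_stack([hours_studied, sleep_hours]))
model = sm.OLS(exam_score, X).fit()

# Each slope is the expected change in the DV for a one-unit increase
# in that predictor, holding the other predictor constant.
print(model.params)   # [intercept, slope_hours, slope_sleep]
print(model.pvalues)  # interpret a slope only if its p-value is significant
```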
experimental designs:
Independent Variable (IV): manipulated - Dependent Variable (DV): measured outcome - Random assignment is used to form groups
hypotheses of multiple regression:
Null: (The variables) are not predictors of the dependent variable. Alternative: (The variables) are predictors of the dependent variable
Hypotheses of regression:
Null: The independent variable is not a significant predictor for the dependent variable Alternative: The independent variable is a significant predictor for the dependent variable
Hypotheses of Pearson's R: null - alternative -
Null: no relationship between both variables, the correlation = 0 Alternative: there is a relationship/correlation between both variables, the correlation ≠ 0
interval numbers
Numbers that can identify and rank, with equal distances between values, but with no absolute zero
Nominal Numbers:
Numbers used for identification only (ex: sports jersey)
what statistic is used to test a linear relationship between two continuous variables?
Pearson's r
normality of correlations coefficient:
both variables are treated as dependent variables, so normality must be tested for both
Correlation DOES NOT equal
causation
Dependent / Paired Samples T-Test: also known as paired samples, Within-subject, Matched pairs, Repeated-measures
compares two sample means from the same group of subjects (test and retest, or before and after); used to compare the means of two dependent samples
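A minimal usage sketch in Python (scipy assumed; the before/after scores are made up):

```python
from scipy import stats

# Hypothetical before/after scores from the same subjects
before = [10, 12, 9, 14, 11, 13]
after = [12, 14, 10, 17, 12, 15]

# Paired (dependent) samples t test
t, p = stats.ttest_rel(before, after)
print(f"t = {t:.2f}, p = {p:.3f}")
```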
When two variables are related, they are said to be...
correlated
if there is no relationship....
correlation coefficient of ZERO
correlational designs:
examine relationships between variables without manipulating them
Pearson's Correlation Coefficient:
measures the degree of linear relationship between two continuous variables
what is the prediction and reality?
model is prediction, actual data is reality
Regression can be used to test what 4 things?
moderation, mediation, continuous and categorical predictors
how to describe strength of a relationship?
more linear = stronger; more scattered = weaker
Non-linear relationships:
not a straight-line relationship (Pearson's r cannot be used)
Variables in relationship tests are not ... but rather
not manipulated by the researcher, they are measured.
linearity:
Not robust. The relationship between the two variables in the population is a linear one - The rate of change is constant - Look at a scatterplot to check the trend
ratio numbers
numbers that can identify and rank, with equal distances between values, and with an absolute zero
ordinal numbers
numbers that describe position or order (first, second . . .)
inferential statistics
numerical data that allow one to generalize: to infer from sample data the probability of something being true of a population. Observations are used to draw conclusions about a population
what are the types of frequencies of chi square? define them
observed frequency: how many people were actually in each category of the data set EXPECTED FREQUENCY: what we expect if the null is true → no effect; the distribution should be even
Outliers:
points on the graph that fall outside the range of where the rest of the data fall; they can falsely change the apparent shape of the relationship
Calculating R:
r = sum of [(X - Mx)(Y - My)] / square root of (SSx × SSy), where X and Y = a case's scores on variables X and Y; Mx and My = the means of X and Y; SSx and SSy = the sums of squared deviation scores for X and Y
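The same calculation as a short Python sketch (numpy assumed; the X and Y scores are invented):

```python
import numpy as np

# Hypothetical scores on variables X and Y
X = np.array([3.0, 5.0, 6.0, 8.0, 10.0])
Y = np.array([2.0, 4.0, 7.0, 7.0, 11.0])

dev_x = X - X.mean()              # deviation scores for X
dev_y = Y - Y.mean()              # deviation scores for Y
ss_x = np.sum(dev_x ** 2)         # sum of squared deviations (SSx)
ss_y = np.sum(dev_y ** 2)         # sum of squared deviations (SSy)

r = np.sum(dev_x * dev_y) / np.sqrt(ss_x * ss_y)
print(f"r = {r:.3f}")             # matches np.corrcoef(X, Y)[0, 1]
```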
APA format of Pearson's R
r(df) = ._ _ _, p = ._ _ _ - No leading zeros for either - Both should be reported to three decimal places
how to report r statistic:
r = ._ _ _ (do not include a leading zero)
Effect Size:
r^2, interpreted as the % of shared variability. Based on the r value, you can determine the strength (how close |r| is to 1) and direction (positive or negative)
Assumptions of Pearson's R:
- Random samples - Independence of observations - Normality - Linearity
quasi-experimental designs:
research designs involving the manipulation of the independent variable but lacking either random assignment to groups or a control group
Interpretation of r2:
small effect: 1% medium effect: 9% large effect: >25%
smaller the value of chi-square
smaller the difference between the observed and expected
descriptive statistics
statistics that summarize the data collected in a study Meaningful interpretations of data
Calculating Chi-Square:
sum of (observed frequency - expected frequency)^2 / expected frequency
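A minimal Python sketch of this formula (numpy assumed; the observed counts are invented, and the expected counts assume an even distribution under the null, as on the expected-frequency card):

```python
import numpy as np

# Hypothetical observed counts across four categories
observed = np.array([18, 12, 15, 5])

# Expected counts under the null: an even distribution
expected = np.full(4, observed.sum() / 4)

chi_square = np.sum((observed - expected) ** 2 / expected)
print(f"chi-square = {chi_square:.2f}")
```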
Two-Way ANOVA:
tests the effect of Two different independent categorical variables on One continuous dependent variable
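A hedged sketch of this design in Python with statsmodels (the two IVs, their levels, and the scores are all hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Hypothetical data: two categorical IVs and one continuous DV
df = pd.DataFrame({
    "method": np.repeat(["A", "B"], 20),
    "setting": np.tile(["online", "in_person"], 20),
    "score": rng.normal(70, 10, 40),
})

# Two-way ANOVA: main effect of each IV plus their interaction
model = smf.ols("score ~ C(method) * C(setting)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```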
The variables vary together but this doesn't necessarily mean .....
that one causes a change in the other, other factors may influence individual variables to change
As correlation becomes stronger....
the correlation coefficient gets closer to -1 or +1 (farther from zero)
cumulative percentage
the percentage of cases at or before any given value of the variable
Cumulative Frequency:
the sum of the frequencies for that class and all previous classes
Restriction of Range:
there is a full range of possible values showing a correlation in the population, BUT only a small portion appears on the scatterplot, which does not represent the correlation well
what do correlations ONLY guarantee?
there is an association between the two variables
as explained variability increases ....
unexplained variability decreases, so the result is more likely to be significant
Ungrouped vs grouped data
ungrouped is used for smaller amounts of data grouped is used for larger amounts of data
spearman's can be used when ...
assumptions for Pearson's r have been violated
when do you interpret coefficients?
when model is statistically significant
formula for calculating regression line:
y = bX + a, where y = predicted value of y; b = slope; X = the value of x for which one wants to predict y; a = y-intercept of the regression line
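A hedged example of fitting and using this line in Python (scipy assumed; the data are invented):

```python
from scipy import stats

# Hypothetical paired scores
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

result = stats.linregress(x, y)
b, a = result.slope, result.intercept    # b = slope, a = y-intercept

new_x = 6                                # value of x to predict from
predicted_y = b * new_x + a              # y = bX + a
print(f"y = {b:.2f}X + {a:.2f}; predicted y at X={new_x}: {predicted_y:.2f}")
```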
APA format of chi-square:
χ2(1, N = 50) = 3.93, p = .047
APA format of multiple regression:
R^2 = ._ _ _, F(df1, df2) = F ratio, p = ._ _ _
Least Squares:
The best prediction is the one that yields the smallest errors between predicted outcomes and actual outcomes; the predicted model will show the least amount of error
Chi-Square test of independence:
Nonparametric test used to determine whether 2 or more samples of cases differ on a categorical (nominal or ordinal) level dependent variable; involves two categorical variables
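For illustration, a minimal call in Python (scipy assumed; the 2 × 2 table of observed counts is made up):

```python
import numpy as np
from scipy import stats

# Hypothetical 2 x 2 contingency table of observed counts
table = np.array([[20, 10],
                  [15, 25]])

chi2, p, df, expected = stats.chi2_contingency(table)
print(f"chi2({df}) = {chi2:.2f}, p = {p:.3f}")
```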
Single Sample T-Test:
Compares a sample mean to a known population mean when the population standard deviation is unknown COMPARES TWO MEANS OF SAMPLE AND POPULATION
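A minimal usage sketch in Python (scipy assumed; the sample scores and population mean are hypothetical):

```python
from scipy import stats

# Hypothetical sample scores and a known population mean
sample = [98, 102, 105, 99, 101, 104, 100]
population_mean = 100

# Single-sample t test: compare the sample mean to the population mean
t, p = stats.ttest_1samp(sample, population_mean)
print(f"t = {t:.2f}, p = {p:.3f}")
```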
Spearman's Rank Order Correlation Coefficient:
Nonparametric version of the Pearson correlation coefficient
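For illustration only, a minimal call in Python (scipy assumed; the two sets of ranks are made up):

```python
from scipy import stats

# Hypothetical ordinal-level data (e.g., two judges' rankings)
judge_a = [1, 2, 3, 4, 5, 6]
judge_b = [2, 1, 4, 3, 6, 5]

rho, p = stats.spearmanr(judge_a, judge_b)
print(f"rho = {rho:.3f}, p = {p:.3f}")
```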
larger the value of chi-square if:
Larger the difference between the observed and expected
why is it better to have multiple predictors?
Multiple predictors can allow us to get a better sense of the effect of our variables of interest = more accuracy = increasing R^2 (proportion of variance in DV that is accounted for by the predictors in the model)
Degrees of Freedom:
N - 2, where N = number of cases. Not given in the output; you need to calculate it
Mann-Whitney U Test:
Nonparametric test used to compare 2 independent samples on an ordinal-level dependent variable that utilizes ranking
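A minimal usage sketch in Python (scipy assumed; the two groups' scores are hypothetical):

```python
from scipy import stats

# Hypothetical ordinal scores from two independent groups
group_1 = [3, 5, 4, 6, 7, 5]
group_2 = [2, 3, 1, 4, 3, 2]

u, p = stats.mannwhitneyu(group_1, group_2)
print(f"U = {u}, p = {p:.3f}")
```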
Standard Error of the Estimate:
Standard deviation of the residual scores, a measure of error in regression
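One common formula for this, consistent with the df = N - 2 card, is s_est = square root of [sum of (observed - predicted)^2 / (N - 2)]. A short Python sketch (numpy and scipy assumed; data invented):

```python
import numpy as np
from scipy import stats

# Hypothetical paired data
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.0, 4.1, 5.8, 8.3, 9.6])

fit = stats.linregress(x, y)
predicted = fit.slope * x + fit.intercept
residuals = y - predicted                       # observed - predicted

n = len(y)
se_estimate = np.sqrt(np.sum(residuals ** 2) / (n - 2))
print(f"standard error of the estimate = {se_estimate:.3f}")
```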
Parametric Test: examples:
Statistical test for use with interval- or ratio-level (continuous) dependent variables, for which assumptions about the shape of the population distribution must be met (the data must be normally distributed). Examples: t, F, r statistics
Distribution of R:
Symmetric distribution centering near zero (similar to t) Bound from -1 to 1 (different from t)
Model Statistics: this yields what 3 results?
Tell us whether the model fits the data (whether the model is statistically significant): R^2, F, and the p-value
what do scatterplots show us?
Tells us DIRECTION - Tells us STRENGTH
coefficients of multiple regression are the same as simple regression, but what do you write for the slope?
The unique change in y for every one unit change in each x variable AND after controlling for other predictors in the model (you would write out each predictor)
when is simple linear regression used?
Used only with a statistically significant Pearson r
Independent Samples T-Test:
Used to compare the mean of one sample to the mean of another sample - COMPARES TWO SAMPLE MEANS
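A minimal usage sketch in Python (scipy assumed; the two samples' scores are hypothetical):

```python
from scipy import stats

# Hypothetical scores from two independent samples
sample_1 = [23, 27, 25, 30, 26, 28]
sample_2 = [20, 22, 19, 24, 21, 23]

# Independent-samples t test: compare the two sample means
t, p = stats.ttest_ind(sample_1, sample_2)
print(f"t = {t:.2f}, p = {p:.3f}")
```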
what kind of model is ideal?
Want to have the simplest model that explains the most variance (use predictors that explain the variance the best)