Statistics Final Exam
When to use (think about) one-tailed tests?
-One-sided tests are acceptable only if the outcome variable can change in only one direction
-A smaller effect is statistically significant in a one-tailed test
-Use two-tailed tests unless you have concrete reasons for using a one-tailed test
-Preregistration helps (you must not perform a two-tailed analysis, obtain non-significant results, and then try a one-tailed test to see if that is statistically significant)
If there is no variation in scores (X1 = X2 = X3... = Mx) what will SD be?
0
Box plots:
A boxplot is a standardized way of displaying the dataset based on a five-number summary: the minimum, the maximum, the sample median, and the first and third quartiles.
Histograms:
A histogram is the most commonly used graph to show frequency distributions.
The paired t-test vs. 1-sample t-test:
A paired t-test simply calculates the difference between paired observations (i.e., before and after), and then performs a 1-sample t-test on the differences.
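A minimal sketch (made-up before/after scores) showing this equivalence: the paired t-test and the one-sample t-test on the difference scores return the same t and p.

```python
# Paired t-test vs. 1-sample t-test on the differences (hypothetical data)
import numpy as np
from scipy import stats

before = np.array([10, 12, 9, 14, 11, 13])   # hypothetical pre-test scores
after  = np.array([12, 14, 10, 15, 13, 13])  # hypothetical post-test scores

paired = stats.ttest_rel(before, after)              # paired t-test
one_sample = stats.ttest_1samp(before - after, 0.0)  # 1-sample t-test on the differences

print(paired.statistic, paired.pvalue)
print(one_sample.statistic, one_sample.pvalue)       # identical t and p
```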
Confidence interval:
A range of values within which we expect the true population value to fall with (1 − ⍺) probability. It is up to us to decide what level of confidence we desire; in psychology, the most common is the 95% confidence interval, where alpha = .05. We can be 95% confident the CI contains the population parameter, meaning that in the long run, when we draw many samples and compute a confidence interval for the statistic each time, 95% of those intervals (19 out of 20) will include the population parameter.
T-Test-
A t test is an inferential statistic used to determine if there is a significant difference between the means of two groups
Dependent variable (DV):
A variable thought to be affected by changes in an independent variable. Also called outcome variable.
Independent variable (IV):
A variable thought to be the cause of some effect.
Which of the following is NOT true about power in ANOVA? a. A separate power estimate is required for each omnibus effect. b. Power provides the probability of finding a significant result in a study assuming the null hypothesis to be false. c. Power can be examined as a function of effect size. d. Power is influenced by the chosen α-level for a study.
A. You don't need a separate power estimate for each omnibus effect. Omnibus tests are a kind of statistical test. They test whether the explained variance in a set of data is significantly greater than the unexplained variance, overall.
Z-scores:
Are simply the distance of a score from the mean in standard deviations. They allow you to compare things that are not on the same scale, as long as they are both normally distributed (e.g., heights of people might range from eighteen inches to eight feet). Wide ranges make it difficult to compare data, so we "standardize" the normal curve, setting it to have a mean of zero and a standard deviation of one. Z-scores can be translated into percentages. The larger the absolute value of the z-score, the less likely the score is; a smaller percentage of scores fall further away from the mean.
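A minimal sketch (made-up exam scores) of standardizing raw scores into z-scores and converting a z-score into a percentage of the normal curve.

```python
# Standardizing scores and converting z to a percentile (illustrative data)
import numpy as np
from scipy import stats

scores = np.array([55, 60, 65, 70, 75, 80, 90])
z = (scores - scores.mean()) / scores.std(ddof=1)   # distance from the mean in SD units
print(z)

# Percentage of the normal curve falling below z = 1.0
print(stats.norm.cdf(1.0))   # ~0.84, i.e., about the 84th percentile
```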
When do we use t-tests?
Are two sample means different? Or, what is the probability that they are the same? Or, what is the probability that two observed samples of data came from the same distribution? E.g.: Do students learn more from video or live lectures? Is the transmission rate of COVID-19 different in countries above versus below the equator? Is there a difference in reaction times between two experimental conditions?
Assumptions about Population (ANOVAs):
Assumption of normality Assumption of homogeneity of variance
Why do we square the deviation scores when calculating SD?
Because the sum of the deviations from the mean always equals zero... And zero divided by anything always equals zero... And that is meaningless
Between-subject design vs. Within-subject design:
Between-subject: 1. One independent variable (IV) 2. Two groups: treatment vs. control 3. One dependent variable (DV) -> Did the mean DV significantly differ between the two groups? (Is the difference in the measured outcome between the two groups likely to occur due to chance?)
Within-subject: 1. One DV is measured (pre-test) 2. A treatment is applied, or time passes 3. The DV is measured again (post-test) -> Did the mean DV significantly differ between pre-test and post-test? (Is the difference in the measured outcome from time 1 to time 2 likely to occur due to chance?)
t-distribution vs. normal distribution
Both smooth, symmetric, mean=0, BUT t-distribution has thicker tails and is used for smaller sample sizes
Variable types:
Categorical Quantitative
Independent Samples t test-
Compares two means based on independent data. E.g., data from different groups of people. ● When both samples are randomly selected, we can make inferences about the populations.
Dependent t-test-
Compares two means based on related data. E.g., Data from the same people measured at different times. Pre and post scores
Range:
Distance between the min score and max score
Unimodal:
Distribution has a single peak
Multimodal:
Distribution has two or more peaks Multiple modes often indicate distinct groups
Bimodal:
Distribution has two peaks
Categorical variables:
Entities are divided into distinct categories. Nominal: there are two or more unordered categories (e.g., whether someone is an omnivore, vegetarian, vegan, etc.; whether someone has cancer or not; whether someone is dead or alive). Ordinal: similar, but the categories have a logical order (e.g., grades; freshman, sophomore, junior, senior).
Quantitative variables:
Entities get a distinct score. Discrete: results from counting (e.g., number of people in the room; number of heartbeats). Continuous: results from measuring, and can take any value within a range (e.g., height; speed).
Effect size-eta squared vs. partial eta squared:
Eta squared is the effect size for ANOVA. In ANOVAs with more than one IV (e.g., a 2-way ANOVA), we might already know that some variance is due to another IV. We may not care about that variance much, and want to partial it out and see how much of the remaining variability is explained by our factor of interest; this is partial eta squared. It differs from eta squared in that it looks not at the proportion of total variance that a variable explains, but at the proportion of variance that a variable explains that is not explained by the other variables in the analysis.
ANOVA (One-way, Repeated measures, Factorial):
F(df1,df2) = ___, p=___. A small probability is obtained when the statistic is sufficiently large, indicating that the set of means differ significantly from each other.
Type I Error =
Finding a significant difference in the sample that actually doesn't exist in the population → Denoted ɑ (alpha)
Type II Error =
Finding no significant difference in the sample when one actually exists in the population → Denoted β (beta)
Calculate SD step-by-step:
For each individual score, calculate its deviation from the mean. Sum the squares of the deviations. Divide by the total number of observations minus 1 to get the variance. Take the square root of the variance to get the SD.
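A minimal sketch (made-up scores) of these steps, checked against NumPy's built-in sample SD.

```python
# Sample standard deviation, step by step (illustrative data)
import numpy as np

x = np.array([4.0, 7.0, 6.0, 3.0, 5.0])
deviations = x - x.mean()            # step 1: deviation of each score from the mean
ss = np.sum(deviations ** 2)         # step 2: sum of the squared deviations
variance = ss / (len(x) - 1)         # step 3: divide by n - 1 to get the variance
sd = np.sqrt(variance)               # step 4: square root gives the SD

print(sd, np.std(x, ddof=1))         # the two values match
```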
Scatter plots:
Help us look at correlations: relationships between two variables, with one variable on the x-axis and the other on the y-axis.
When studying the effects of alcohol on motor coordination, one group of people (given a moderate dose of alcohol) is compared with another group (given no alcohol). IV: DV: Design: Test:
IV: Alcohol dosage; DV: Motor coordination; Design: Independent groups/between groups; Test: Independent samples/between groups t-test
A student who was taking driving lessons asked his driving instructor what number of lessons (on average) were taken by those who passed their driving test the first time. The instructor told him that the company average was 23 lessons. The student decided to see how this average compared to a sample of 6 of his friends. IV: DV: Design: Test:
IV: Friends vs. population DV: Number of lessons Design: Single sample Test: Single sample t-test
Identify the IV, DV, experimental design, and the appropriate statistical test: a. When studying the effects of a new memory enhancing drug on the memory test scores of Alzheimer's patients, a group of patients is tested before and after administration of the drug. IV: DV: Design: Test:
IV: Time (pre and post drug administration) DV: Memory test scores Design: Repeated measures Test: Repeated measures t-test
Central Limit Theorem:
If you have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement continuously, then the distribution of the sample means will be approximately normally distributed!
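A minimal sketch of the theorem: even when the population is clearly non-normal (uniform, here), the means of repeated random samples pile up in an approximately normal distribution. The population and sample sizes are arbitrary illustration values.

```python
# Central Limit Theorem simulation (illustrative values)
import numpy as np

rng = np.random.default_rng(0)
population = rng.uniform(0, 10, size=100_000)       # a clearly non-normal population

# Repeatedly draw samples of n = 50 (with replacement) and record each sample mean
sample_means = [rng.choice(population, size=50, replace=True).mean()
                for _ in range(5_000)]

print(np.mean(sample_means))   # close to the population mean (~5)
print(np.std(sample_means))    # close to sigma / sqrt(n), the standard error
```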
Example of Cohen's d used in a study:
Imagine we did a study in which we got two groups of 10 heterosexual young men to go up to a woman that they found attractive and either engage her in conversation (group 1) or sing her a song (group 2). We measured how long it was before the woman ran away.
● If we wanted to quantify the effect between the singing and conversation groups, how might we do it? A fairly simple thing to do would be to take the difference between means:
○ The conversation group had a mean of 12 minutes (before the woman ran away), and the singing group a mean of 10 minutes.
○ So, the effect of singing compared to conversation is 10 − 12 = −2 minutes. This is an unstandardized effect size.
○ Singing had a detrimental effect on how long the woman stayed, by 2 minutes.
● That's fairly easy to compute and understand, but it has two small inconveniences:
○ First, the difference in means will be expressed in the units of measurement for the particular study. In this example, that isn't really an inconvenience because minutes mean something to us: we can imagine what an extra 2 minutes with someone would be like, and how 2 minutes compares to the amount of time we usually spend talking to random people.
○ However, if we'd measured what the women thought of the men rather than how much time they spent with them, interpretation is trickier: 2 units of 'thought' or 'positivity' is less tangible to us than 2 minutes of time.
○ Second, although the difference between means gives us an indication of the effect, it does not tell us about the variability in the measure. Is 2 minutes of time a lot or a little relative to the 'normal' amount of time spent talking to strangers?
● We can remedy both of these problems by dividing the difference between means by the standard deviation to get a standardized effect size. This is Cohen's d.
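A minimal sketch of the calculation described above: the group means (12 and 10 minutes) come from the example, but the standard deviations are made-up values used only to illustrate standardizing the −2 minute difference with a pooled SD.

```python
# Cohen's d for the singing vs. conversation example (SDs are hypothetical)
import numpy as np

mean_conversation, mean_singing = 12.0, 10.0
sd_conversation, sd_singing = 3.0, 3.5      # made-up SDs for illustration
n1 = n2 = 10

# Pooled standard deviation across the two groups
sd_pooled = np.sqrt(((n1 - 1) * sd_conversation**2 + (n2 - 1) * sd_singing**2)
                    / (n1 + n2 - 2))

d = (mean_singing - mean_conversation) / sd_pooled
print(d)   # roughly -0.6: singing reduced time by about 0.6 standard deviations
```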
Assumptions for independent t-test and ANOVA:
Independence: -data from different participants are independent, which means that the behavior of one participant does not influence the behavior of another -cases must be a member of only one of the two groups (otherwise, it is a dependent test). Normally distributed data Homogeneity of variance
All ANOVAs have two things in common:
1. Independent variable(s). 2. Measured using either the same or different participants. If the same participants are used, we typically use the term repeated measures; if different participants are used, we use the term independent.
T distribution-
Normal distributions are used when the population distribution is assumed to be normal. The T distribution is similar to the normal distribution, just with fatter tails. Both assume a normally distributed population. Think of it as a distribution for the NULL hypothesis.
How do different ANOVA's measure IVs?
One-way -> One IV measured using different participants
Two-way -> Two IVs measured using different participants
Two-way repeated-measures ANOVA -> Two IVs, both measured using the same participants
Two-way mixed ANOVA -> Two IVs, one measured using different participants and one measured using the same participants
Three-way -> Three IVs, all of which are measured using different participants
the only equation you will need to memorize:
Outcome_i = (Model) + error_i
Skew:
Positive skew or negative skew - the data is pulled towards one side of the distribution. Not normally distributed.
What is a hypothesis?
Prediction of the study. There can be many different hypotheses in one study: H1 Experimental hypothesis: a statement about how the world works, which your experiment is designed to test. H2 Alternative hypothesis: another statement about a specific alternative way that the world works. H3, H4, ...: further alternative hypotheses. H0 Null hypothesis: what we would assume to be true if we don't find support for our hypotheses (no difference, no effect, etc.).
Interquartile Range:
Quartile 3 - Quartile 1. The middle 50% of the data. (example in study guide).
The shape of the t-distribution is determined by?
Sample size (smaller samples → thicker tails). There is a different t-distribution for each sample size (degrees of freedom); for sample sizes of about 100 or more, the t-distribution is more-or-less a normal distribution.
Power on SALE (how to remember what you need to do to increase power):
Sample size - increase it
Alpha level - increase it (but doing so increases the risk of a Type I Error)
Larger effects - focus on larger effects
Error variance - decrease it
Different ways to plot data:
Scatter plot Histogram Box Plot
Common methods of measurement:
Self-report (surveys) Naturalistic Observation Performance based Case studies
General approach in null hypothesis testing:
Specify the hypotheses (null, alternatives)
Specify the significance level (alpha)
Calculate the test statistic
-Z-score
-Pearson r
-t-value
Look up the appropriate probability for the statistic
Reject or fail to reject the null hypothesis
{In every case, we are using the mean, standard deviation, and sample size to calculate a test statistic, and looking up the probability associated with that test statistic}
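A minimal sketch of these steps with a one-sample t-test on made-up data (SciPy assumed available): H0 is that the population mean equals 23 (echoing the driving-lesson example above), and alpha = .05.

```python
# Null hypothesis testing workflow for a one-sample t-test (illustrative data)
from scipy import stats

lessons = [20, 19, 25, 21, 18, 22]   # hypothetical scores from 6 friends
alpha = 0.05                          # step 2: significance level

t_stat, p_value = stats.ttest_1samp(lessons, popmean=23)   # step 3: test statistic

print(t_stat, p_value)                # step 4: probability for the statistic
if p_value < alpha:                   # step 5: decision
    print("Reject H0: the sample mean differs significantly from 23")
else:
    print("Fail to reject H0")
```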
Which measure of variance is most influenced by outliers?
Standard deviation For normal distributions, all measures can be used. The standard deviation and variance are preferred because they take your whole data set into account, but this also means that they are easily influenced by outliers. For skewed distributions or data sets with outliers, the IQR is the best measure.
Spread/Diversity:
Standard deviation (SD or s) - Each data point can be seen as a deviation from the mean (X-M). Think of the standard deviation as the average or typical deviation from the mean. Sensitive to outliers. ● Small SD means the mean is a good representation of the data. ● Large SD means the mean is not a good representation of the data.
Types of ANOVAs:
The One-way Groups Design (ANOVA) The Two-way Groups Design (ANOVA) Repeated Measures ANOVA
Reliability: (test-retest vs. inter-rater)
The ability of the measure to produce the same results under the same conditions. Test-retest reliability: does the measure produce the same results each time it is taken by the same individual? Inter-rater reliability: do two raters agree on the same score for the measure? If so, their two scores would be highly correlated and there would be strong inter-rater reliability.
Useful definition of power =
The degree to which we can detect treatment effects (includes main effects, interactions, etc.) when they exist in the population Power analyses put the emphasis on researchers' ability to find effects that exist, rather than on the likelihood of incorrectly finding effects that don't exist
Measurement error:
The discrepancy between the actual value we're trying to measure, and the number we use to represent that value
"Technical" definition of power =
The probability of correctly rejecting a false H0. Mathematically, this works out to 1 − β (where β = Type II error = the probability of retaining a false H0).
What is the width of your CI determined by?
The width of your CI is determined by sample size, variability within your sample, and the confidence level you choose (e.g., 95% vs. 99%).
Alternative hypothesis:
There is a difference between the population mean from which group A is sampled and the population mean from which group B is sampled
Alternative hypothesis for a t-test:
There is a difference between the population mean from which group A is sampled and the population mean from which group B is sampled.
Null hypothesis:
There is no difference between the population mean from which group A is sampled and the population mean from which group B is sampled
Null hypothesis for a t-test:
There is no difference between the population mean from which group A is sampled and the population mean from which group B is sampled.
One Sample t Test:
To compare a single sample mean to a population mean when the population standard deviation is not known
Factorial ANOVA:
To compare four or more groups defined by multiple variables in a factorial research design.
One-Way ANOVA:
To compare two or more sample means when the means are from a single-factor between-subjects design.
Repeated Measures ANOVA:
To compare two or more sample means when the means are from a single-factor within-subjects design.
Paired Samples t Test:
To compare two sample means when the samples are from a single-factor within-subjects design.
Effect size:
To what degree is the Null hypothesis false? How much of the DV can be controlled, predicted, or explained by the IV?
Different shapes of the distribution:
Unimodal Bimodal Multimodal Skew
Test tails: one or two?
We usually frame the alternative hypothesis as "the means are not equal" (μA ≠ μB) -ex. the average income of males is not equal to the average income of females. But sometimes we frame it directionally, as μA > μB or μA < μB -ex. the average income of males is larger than the average income of females -ex. the average income of males is smaller than the average income of females
Summary:
We have two populations: 1 & 2. Take samples 1 & 2 from the two populations. To compare their means:
-calculate the mean difference
-divide by the SE
*for unpaired: use the pooled standard deviation when calculating the SE
*for paired: use the standard deviation of the mean differences to calculate the SE of the difference
-get the t-value
-find the p-value for the t-value
*this is the probability of finding the mean difference we got (or larger), assuming the two populations from which we drew our samples have identical means
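A minimal sketch (made-up scores) of the unpaired version of this procedure: the mean difference divided by an SE based on the pooled standard deviation, checked against SciPy's independent-samples t-test.

```python
# Independent t-test "by hand" vs. SciPy (illustrative data)
import numpy as np
from scipy import stats

group1 = np.array([12.0, 15.0, 11.0, 14.0, 13.0])
group2 = np.array([10.0, 9.0, 12.0, 11.0, 8.0])
n1, n2 = len(group1), len(group2)

# Pooled SD, then SE of the difference between means
sd_pooled = np.sqrt(((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1))
                    / (n1 + n2 - 2))
se_diff = sd_pooled * np.sqrt(1 / n1 + 1 / n2)

t = (group1.mean() - group2.mean()) / se_diff
p = 2 * stats.t.sf(abs(t), df=n1 + n2 - 2)     # two-tailed p-value

print(t, p)
print(stats.ttest_ind(group1, group2))         # same t and p
```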
Validity:
Whether an instrument measures what it set out to measure
Can you tell if the result is significant by merely looking at the CI? How?
Yes. If 0 falls outside the CI, the result is significant. If 0 falls within the CI, the true population value could plausibly be 0, and therefore the result is not significant; if you were to run the experiment again, there would be a good chance of finding no effect.
A researcher conducts a study whereby the IVs of motivation (low, medium, high) and prior skill level (low, medium, high) are crossed with one another. This design would: a. Be a two-way between groups factorial design. b. Contain 9 factors. c. Be a 2 x 3 between participants factorial design. d. Have 6 cells.
a.
A researcher conducts a study whereby the IVs of motivation (low, medium, high) and prior skill level (low, medium, high) are crossed with one another. This design would: a. Be a two-way between groups factorial design. b. Contain 9 levels of the IVs. c. Be a 2 x 3 between participants factorial design. d. Have 6 cells.
a.
If a histogram of your data has a single mode but is not symmetrical around this mode, then you have: a. Skew b. Leptokurtosis c. Platykurtosis d. Multi-modality
a.
In a between groups ANOVA, which of the following is NOT true of the F-ratio? a. It assesses the distribution of group means around the grand mean relative to the distribution of individual scores around the group mean. b. Larger treatment terms will always produce larger F values. c. It is calculated as MS treatment / MS error. d. It is the ratio of between groups variance to within groups variance.
a.
In a repeated measures ANOVA, SSerror can contain: a. The experimental (residual) error b. Participant (individual) differences c. The participant effects d. All of the above
a.
In a within participants factorial design, individual differences are calculated through: a. Error variance. b. Between treatment variance. c. Between-participant variance. d. Within participant variance.
a.
In within participants designs, the treatment factors are __________, while participants are __________. a. Random factors; a fixed factor. b. Within participants variance; between participants variance. c. Fixed factors; a random factor. d. Both (b) and (c).
a.
The treatment effect for a 1-way between groups and 2-way within participants factorial design is: a. Between groups variance and within groups variance, respectively. b. Within groups variance and within groups variance, respectively. c. Between groups variance and between participants variance, respectively. d. Between groups variance and between treatments variance, respectively.
a.
What figure would help show the results of a correlation best? a. Scatterplot b. Boxplot c. Violin graph d. None of the above
a.
What is the key factor in deciding whether to perform an independent groups t-test versus an independent groups ANOVA? a. Number of levels for the IV b. Whether I have the population variance or just the sample variance c. I want to increase my power d. I want to infer something back to the population of interest
a.
Which are the symbols for population variance and standard deviation of a sample? a. σ2, s b. σ, s2 c. μ, X d. μ, s
a.
Which of the following is NOT an assumption of between groups ANOVA? a. Categorical scaling for the DV (i.e., groups). b. Homogeneity of variance. c. Independence of observations. d. Normality of treatment populations.
a.
Which of the following measures of central tendency typically has the smallest value in a positively-skewed distribution? a. Mode b. Standard Deviation c. Median d. Mean
a.
Which of these describes SStotal? a. It represents how much each person's raw score differs from the average score for all participants (i.e., the grand mean). b. It represents how much each condition's mean differs from the mean for all the conditions together. c. It represents how much each person's score in one condition differs from the mean score for that condition. d. It represents how much the mean of each participant's scores across all the conditions differ from the grand mean.
a.
Which of these is an accurate description of the plotted results? **(look in study guide for graph) a. No main effect of type of day, but a main effect of weather qualified by an interaction. b. No main effects, but a significant interaction c. No main effect of weather, but a main effect of type of day qualified by an interaction d. A main effect of type of day and a main effect of weather, but no interaction
a.
A _______ is an example of a descriptive statistic, while a _______ is an inferential statistic. a. mean, t-statistic b. mean, standard deviation c. t-statistic, variance d. t-statistic, correlation coefficient
a. Descriptive statistics summarize the characteristics of a data set. Inferential statistics allow you to test a hypothesis or assess whether your data is generalizable to the broader population.
Hypothesis testing focuses more on __________ while power analysis focuses more on __________. a. Type 1 error; type 2 error. b. Type 2 error; type 1 error. c. Evaluating probabilities in the population; evaluating probabilities in the given sample. d. Both (a) and (c).
a. The power of a hypothesis test is the probability of not committing a Type II error.
You are reading a paper and they report the correlation between conscientiousness and grades as r(18) = .48. With your awesome knowledge of statistics, what can you conclude? a. As conscientiousness increases, grades increase. b. As conscientiousness increases, grades decrease significantly. c. There is no significant relationship. d. You do not have enough information to determine anything
a. ±0.1 = small, ±0.3 = medium, ±0.5 = large
Compared to a between-groups ANOVA design with the same N, a within-groups ANOVA design should deliver: a. Lower Type 1 error. b. Higher Type 1 or Type 2 error (depending on alpha). c. Lower Type 2 error. d. No expected difference in error rate.
c. A within-subjects design has greater power than a between-subjects design with the same N because individual variation has been removed from the error term; greater power means a lower Type 2 error rate. (The Type 1 error rate is set by α, so it does not change with the design.)
In a normal distribution, approximately what percentage of scores is between one-half a standard deviation below the mean and two-thirds of a standard deviation above the mean? a. 44.01% b. 56.31% c. 43.69% d. 55.39% e. 50%
a. Using z ≈ 0.67 for two-thirds of a standard deviation: the proportion of scores below z = 0.67 is .7486 and the proportion below z = −0.50 is .3085, so the proportion between them is .7486 − .3085 = .4401 ≈ 44.01%.
State whether each of the following statements is true or false. a. Although we typically calculate a 95% CI, we can calculate CIs for any level of confidence desired. T/F b. Compared to a 95% CI, a 99% CI will be longer. T/F c. For a given level of confidence, we find the z score corresponding to our selected percentage of cases in the normal distribution, then use that z to help calculate the MoE. T/F
a. T
b. T (a 99% CI is longer/wider than a 95% CI, because a higher confidence level requires a larger critical value)
c. T. The margin of error (MoE) is the half-width of the CI: it tells you how far your sample result is expected to differ from the true population value.
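A minimal sketch (made-up data) of statement (c): find the z for the chosen confidence level, compute the margin of error, and build the CI around the sample mean.

```python
# z-based confidence interval via the margin of error (illustrative data)
import numpy as np
from scipy import stats

x = np.array([4.2, 5.1, 6.3, 5.8, 4.9, 5.5, 6.0, 5.2])
confidence = 0.95

z_crit = stats.norm.ppf(1 - (1 - confidence) / 2)   # ~1.96 for 95% confidence
se = x.std(ddof=1) / np.sqrt(len(x))                # standard error of the mean
moe = z_crit * se                                    # margin of error

print(x.mean() - moe, x.mean() + moe)               # lower and upper CI limits
```

(With a sample this small, a t critical value would normally be used instead of z; z is shown here only to mirror the statement in the card.)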
A 2-way between groups factorial design was used to investigate the effects of caffeine dosage (zero, small, moderate, large) and amount of sleep (0-4 hours, 4-8 hours, 8+ hours) on driving reaction times. What would be the degrees of freedom for caffeine dosage, amount of sleep and the Dosage x Sleep interaction, respectively? a. 3, 2, 6 b. 3, 2, 11 c. 4, 3, 11 d. 4, 3, 12
a. df for caffeine dosage = 4 − 1 = 3; df for amount of sleep = 3 − 1 = 2; df for the Dosage x Sleep interaction = 3 × 2 = 6.
A key difference between a within participants and between groups factorial design is: a. Within participants designs include individual differences in the error term, while between groups designs partial individual differences out of the error term. b. Within participants designs assess all within groups variability as treatment variance, while between groups designs evaluate all between groups variability as treatment variance. c. Both (a) and (b). d. None of the above.
b.
A true experimental design is one where: a. A correlation must be used to analyse the results. b. Random assignment to study conditions is used. c. No inference of causality can be made. d. Naturalistic data is gathered and then categorised to form experimental groups.
b.
A type I error and a type II error, respectively, are when the researcher: a. retains a false null hypothesis; rejects a true null hypothesis b. rejects a true null hypothesis; retains a false null hypothesis c. has a confound affecting the study; has biased samples d. has biased samples; has a confound affecting the study e. sets power levels too low, sets power levels too high
b.
Effect size estimates are useful because: a. The magnitude of the effect size will change with N. b. Any effect will be significant if you have a large enough sample size, but this doesn't necessarily indicate that it is important. c. The p-statistic provides a key indication of the applied value of a finding. d. Both (b) and (c).
b.
I want to know if a group differs significantly from the population. I know the population mean, as well as the group mean, variance, and sample size. What test should I run? a. A paired samples t-test. b. A one sample t-test. c. A z-test. d. B or C.
b.
If a relationship, or difference, between variables is statistically non-significant, it means that: a. the null hypothesis is false b. the null hypothesis is true c. the alternative hypothesis is false d. the relationship, or difference, is unlikely to be real in the population e. the relationship, or difference, is likely to be real in the population
b.
If a researcher sets the alpha level at 5% and finds a statistically significant result, it is most likely that: a. p = .05 b. p < .05 c. p > .05 d. the researcher has made a type I error e. the researcher has made a type II error
b.
In a one-way ANOVA, which of the following is an estimate of the variation between individuals' scores and the mean for their group/condition? a. MStreatment b. MSerror c. MStotal d. MSbetween
b.
In between groups ANOVA, the F-value is: a. Calculated as SS treatment / SS error. b. A ratio of treatment to error variance. c. Used to determine the significance of follow-up main and simple comparisons. d. Calculated for all main effects, interactions and error.
b.
We use a measure of sample characteristics called a _______ to infer the characteristics of a population _______. a. parameter, statistic b. statistic, parameter c. mean, distribution d. size, median
b.
What is the best way to describe "between-groups variance" (BG variance) and "within-groups variance" (WG variance)? a. BG variance describes the distribution of individual scores around the group means, whereas WG variance describes the distribution of individual scores around the regression line b. WG variance describes the distribution of individual scores around the group means, whereas BG variance describes the distribution of group means around the grand mean c. BG variance measures error variance and WG variance measures systematic treatment variance
b.
What percentage of scores fall below the 23rd percentile? a. 22% b. 23% c. 0.23% d. 2.3%
b. For instance, if a child is at the 23rd percentile for height, the child is taller than 23% of children their age.
Which of the following factors does NOT increase power? a. Decreased error variance. b. Increased α-level. c. Increased sample size. d. Increased magnitude of the treatment effect.
b.* (ask TA) The greater the error variance (or the standard deviation), the less the power. As the sample size gets larger, the test statistic grows, so we are more likely to reject a false null hypothesis and power increases. But note: by the "Power on SALE" mnemonic, increasing sample size, increasing α, focusing on larger effects, and decreasing error variance all increase power, so arguably none of the options fails to increase power.
If r = .50, the proportion of variance shared between the two variables is: a. .50 b. .25 c. .75 d. 0
b. Square r to find the proportion of shared variance in correlations: .50² = .25. "Proportion of variance" is a generic term meaning a part of the total variance. For example, the total variance in any system is 100%, but there might be many different causes of that variance, each with its own proportion.
A researcher is interested in the effects of IQ (below average, average, above average) and job demands (low, high) on intellectual job performance. She expects that when job demands are high, higher IQ will lead to better job performance due to the individual possessing the necessary cognitive resources to complete the work successfully. In contrast, when job demands are low, higher IQ is anticipated to lead to lower job performance due to boredom setting in and cognitive attention being diverted away from the tasks at hand. What omnibus test results are needed to lend support to these predictions? a. Main effect of IQ. b. Main effect of IQ, main effect of job demands. c. IQ x Job Demands interaction. d. Main effect of IQ, main effect of job demands, IQ x Job Demands interaction.
c.
According to the structural model for 2-way ANOVA, a participant's score is a function of ___ elements. a. 3 b. 4 c. 5 d. 6
c.
Effect sizes in a between groups factorial design: a. Can be compared across studies to determine the most important treatment effect. b. Are expressed in terms of variance. c. Both (a) and (b). d. Neither (a) nor (b)
c.
If a 2-way between groups ANOVA reveals a significant main effect of both IV 1 and 2 but no IV1 x IV2 interaction, this indicates that: a. The effect of IV 1 differs across levels of IV 2. b. The effect of IV 2 differs across levels of IV 1. c. Both (a) and (b). d. The group means for IV 1 are the same at each level of IV 2
c.
In a 1-way between groups ANOVA, if the between groups variance is 0, this would indicate: a. The F-statistic could not be significant (i.e., < 1). b. All participants provided the same score in their respective treatment groups, which was the group mean. c. There was no error variance. d. The null hypothesis is rejected.
c.
In a between groups factorial design, the treatment and error terms: a. Are used to infer causality. b. Are calculated as the average sums of squares of group means from the grand mean, and the average sums of squares of cell means from the grand mean, respectively. c. Represent systematic and unsystematic differences in scores, respectively. d. Are the within groups variance and between groups variance, respectively
c.
The variance in a 2-way ANOVA is partitioned into ___ sources. a. 2 b. 3 c. 4 d. 5
c.
When considering a within-subjects design, which of the following problems would be the hardest to solve? a. Error variance b. Statistical assumptions (violations of compound symmetry) c. Methodological issues (sequencing effects) d. Sample size
c.
Which of the following is NOT an assumption of within participants factorial designs? a. Normality of the population distribution. b. Homogeneity of variance. c. Independence of observations. d. Compound symmetry of the factor variance-covariance matrix.
c.
Which of the following is not an advantage of non-parametric tests? a. They can analyze severely skewed data b. They can be used when you have a non-normal distribution due to a small sample c. They are generally more sensitive d. They reduce the effects of extreme outliers
c.
With an alpha level set at 5%, rejecting the null hypothesis means that: a. the null hypothesis is false b. the null hypothesis is true c. there is less than a 5% chance that the null hypothesis is true d. the alternative hypothesis is true e. any of the above could be true
c. An α of 0.05 indicates that you are willing to accept a 5% chance that you are wrong when you reject the null hypothesis.
The Null Hypothesis (H0) states: a. There is a relationship between the IV and DV. b. There is a relationship between the IV and DV but it is not a causal relationship. c. The obtained results are what we would expect if chance is the only factor at play. d. None of these.
c. In general the null hypothesis states that there is no change, no difference, no effect, and otherwise no relationship between the independent and dependent variables.
Which of the following is not an example of a non-directional conceptual hypothesis? a. The mean finger tapping rate is significantly different between first-year psychology students and the population of neurologically intact people. b. Time spent daydreaming (in minutes) does differ between this class and the general population. c. Alcohol consumption will have a significant negative impact on driving ability as measured by the driving simulator task. d. Appreciation of statistics-related humour is significantly different between Intro to Stats students and the general population.
c. Sometimes called a two-tailed test, a test of a nondirectional alternative hypothesis does not state the direction of the difference, it indicates only that a difference exists.
Assuming there is a linear relationship between the number of alcoholic drinks consumed and the number of errors committed in the driving simulator, what is the most appropriate correlation analysis for this data? a. Spearman's Rho b. Galton's G c. Pearson's r d. t-test e. it is not possible
c. Spearman's rho is used to understand the strength of the relationship between two variables; the variables can be continuous or ordinal and should have a monotonic relationship. The difference between the Pearson and Spearman correlations is that Pearson is most appropriate for measurements taken on an interval scale, while Spearman is more appropriate for measurements taken on ordinal scales. The Pearson correlation coefficient is the most widely used; it measures the strength of the linear relationship between normally distributed variables.
Why is the analysis of variance (ANOVA) an omnibus test? a. It sets a .05 level of significance (alpha) b. The IV has more than 2 levels c. It analyzes all the scores in the dataset d. It guards against familywise error
c. Omnibus means that the test tells us, overall, if variance explained by the model is proportionally greater than the error variance - this is our F ratio (or 'F statistic/value')
Imagine you are conducting an independent sample/between groups t-test with 12 participants in condition A and 13 participants in condition B. If your alpha level is set at .05, and your obtained critical t value = −2.96, would you accept or reject the null hypothesis? a. Accept b. Not enough information c. Reject d. The statistician must have made a mistake because you can't have a negative t value.
c. Reject. With df = 12 + 13 − 2 = 23, the two-tailed critical t at α = .05 is about ±2.07. The obtained t of −2.96 is more extreme than the critical value (the negative sign just reflects the direction of the difference), so p < .05 and we reject the null hypothesis.
A Strength of Political Affiliation x Group Size interaction on behavioral intentions to join a political protest was found to be significant. This suggests that: a. The effect of political affiliation strength on intentions to participate in a political protest changes depending on group size. b. The influence of group size on intentions to participate in a political protest varies according to political affiliation strength. c. Any main effect of strength of political affiliation or group size would be better explained by the interaction. d. All of the above.
d.
A researcher interested in binge spending gives participants who are high vs low in impulsivity $10 and tells them they can choose to purchase objects from the experimenter's "store" or keep the money and take it home. He exposes them to high vs low temptation by offering brand name vs no name products. The amount in $ spent is shown below. Assume that there is no error of measurement and that all observed differences on the graph are significant. Which of the following statements is FALSE about the results shown in the graph? **(look in study guide) a. There is no significant interaction. b. There is a main effect of impulsivity. c. There is a main effect of temptation. d. There is a significant simple effect of impulsivity for low temptation.
d.
Engaging sophisticated study methodologies is most relevant to which technique aimed at increasing a study's power? a. Focusing on large effect sizes. b. Increasing the sample size. c. Increasing the α-level. d. Decreasing the error variance.
d.
Error in a between groups factorial design consists of: a. Random/ chance variation. b. Individual differences. c. Unmeasured variables in the design. d. All of the above.
d.
If the null hypothesis is false and the researcher declares a relationship in the data to be statistically significant in the opposite direction to the direction in the population, which of the following has occurred? a. Type I error b. Type II error c. Type III error d. Correct decision e. None of the above
d.
If the null hypothesis is true, then the power of the test will be affected by: a. the significance level b. the sample size c. the effect size d. all of the above e. none of the above
d.
In a 2-way between groups ANOVA, results revealed a main effect of Anxiety, F(2, 135) = 4.21, p = .012, and a main effect of Amount of Study, F(4, 135) = 5.18, p = .008, on students' final exam marks (out of 50). How many levels of Amount of Study, overall study conditions and total number of participants were there, respectively? a. 4, 8, 135 b. 4, 8, 150 c. 5, 15, 135 d. 5, 15, 150
d.
In a 2-way between groups factorial design, which of the following is NOT true: a. All omnibus and follow-up tests are assessed against a pooled error term. b. Variance from the 3 original omnibus tests adds to give the total treatment variance. c. There is an assumption of independence of observations. d. Individual differences are partitioned out of the error term.
d.
In the long run, we expect about 95% of CIs to capture the population mean. However, this depends on the assumption that... a. The data have not been selected in a way that could bias the sample mean. b. Random sampling was used. c. The variable measured is normally distributed. d. All of the above.
d.
Janet has a z-score of 1 on her most recent exam. This means... a. That her percentile score will be > 50% b. That she has scored exactly 1 SD better than the mean c. That she has done better than average d. All of the above e. None of the above
d.
The research question posed is "the effect of sleep deprivation on driving ability is qualified by stimulant ingestion (e.g., consuming coffee)". Which omnibus test(s) is/are required to confirm this? a. Main effect of sleep deprivation. b. Main effect of stimulant ingestion. c. Sleep Deprivation x Stimulant Ingestion interaction. d. Both (a) and (c).
d.
What is another name for the standard (or standardized) score? a. Standard deviation b. Percentile c. t-score d. Z-score
d.
Which of the following is NOT one of the advantages of using a 2-way between groups factorial design over two separate 1-way designs assessing the same two IVs as the 2-way? a. It requires fewer participants overall to achieve the same power. b. It reduces the size of the error term. c. It can examine the two factors simultaneously in the same study, as well as assess for an interactive effect. d. It can examine the generalisability of the effects of both factors.
d.
Which of the following is NOT true of the error in a within participants design? a. It is the deviation of individual scores from their overall individual mean. b. It is the unsystematic effect of the treatment across participants. c. It is the interaction of the within factor and participants. d. It is the degree to which the treatment effect changes depending on the participant completing the treatment condition.
d.
Which of the following is true of the independent (IV) and dependent (DV) variables? a. The DV is manipulated by the experimenter and the IV is out of the experimenter's control. b. Both the IV and DV are under the experimenter's control. c. The experimenter changes the DV to see what happens to the IV. d. The IV is manipulated and the DV is out of the experimenter's control.
d.
Which of the following represents the strongest relationship between X and Y? a. +.90 b. -.01 c. +.50 d. -.96
d.
You conduct a study with two groups (control and treatment). You find: Mdiff = 0. From this, which of the following must be true? a. Mcontrol = Mtreatment b. The two groups did exactly the same (on average) c. The CI for Mdiff will include 0 d. All of the above
d.
A researcher conducts a study with 120 participants, making use of α = .05 and β = .30. What is the probability that she will find a significant result if it exists? a. .30 b. .35 c. .65 d. .70
d. 1 − β = power (the probability of finding an effect if there is an effect). Alpha is also known as the level of significance; it represents the probability of obtaining your results due to chance. Beta (β) is the probability of a Type II error in a statistical hypothesis test, i.e., the probability of accepting the null hypothesis when it is false. Beta is directly related to the power of a test: power is how likely a test is to distinguish an actual effect from one you could expect to happen by chance alone, and beta plus power always equals 1.
A researcher runs a study and finds non-significant results. A subsequent power analysis revealed that the researcher had sufficient power. This suggests: a. A type 1 error occurred. b. A type 2 error occurred. c. The anticipated treatment effect does not exist in the sample. d. Both (b) and (c).
d. (ask TA... wouldn't it just be c?) Statistical power is the probability that a hypothesis test will find an effect if there is an effect to be found. A power analysis can be used to estimate the minimum sample size required for an experiment, given a desired significance level, effect size, and statistical power.
A linear model
i.e., Outcome_i = b0 + b1*group_i + error_i. Most of the models that we use in psychology to describe data tend to be linear models: -correlation -t-test -ANOVA -regression. A linear model is simply a model that is based upon a straight line -we are trying to summarize our observed data in terms of a straight line
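A minimal sketch (made-up scores) of this idea: with a dummy-coded group variable (0 = control, 1 = treatment), the fitted slope b1 is the difference between the group means, and its p-value matches the independent t-test, illustrating that the t-test is a linear model.

```python
# The t-test as a linear model: Outcome_i = b0 + b1*group_i + error_i (illustrative data)
import numpy as np
from scipy import stats

group = np.array([0, 0, 0, 0, 1, 1, 1, 1])                        # dummy-coded IV
outcome = np.array([3.1, 2.8, 3.5, 3.0, 4.2, 4.6, 3.9, 4.4])      # DV

fit = stats.linregress(group, outcome)
print(fit.intercept, fit.slope, fit.pvalue)   # b0 = control mean, b1 = mean difference

t_test = stats.ttest_ind(outcome[group == 1], outcome[group == 0])
print(t_test.statistic, t_test.pvalue)        # same p-value as the slope
```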
What does 'due to chance' indicate?
i.e. is the difference in the measured outcome between the two groups likely to occur due to chance? It indicates that we assume there is no difference between the means to begin with...what is the probability of observing the mean difference we find given that we don't expect a difference?
What do specific types of t-tests do?
independent t-test: -compares two means based on independent data -i.e., data from different groups of people Dependent t-test: -compares two means based on related data -i.e., data from the same people measured at different times -data from 'matched' samples Significance testing: -testing the significance of Pearson's correlation coefficient -Testing the significance of b in regression
Normal Distribution:
A symmetrical bell-shaped distribution with most data points near the center. It can be concisely described by only two parameters: mean and SD. Note that the mean and SD vary across different data and samples, which results in narrower or wider curves.
Which measure of central tendency is most influenced by outlier?
mean
Central tendency:
mean: - The average of the data points. Highly sensitive to outlier mode: - Most frequently occurring data point. Not sensitive to outliers median: - the middle point in a distribution, the score at which half of the responses are above and the other half below- Splits the sample in half. Not sensitive to outliers.
Calculating between groups variance:
SS_between = n Σ (X̄_j − X̄_grand)² = participants per group × sum of squared differences between the group means and the grand mean = estimate of between-groups variability
A wider CI means
our estimate of the true population parameter is less precise
A narrower CI means
our estimate of the true population parameter is more precise
Your CI will exclude the null value (i.e., zero), if (and only if)
p < .05 (i.e., if your test is significant)
The t-test for one sample measured twice is called?
paired or dependent t-test
Power:
Probability of detecting an effect, if there is a true effect present to detect. In null hypothesis testing, we specify a significance level alpha; if the p-value < alpha, we infer the effect exists, i.e., if the probability of finding the results is less than alpha (assuming the null hypothesis is true), then we have a significant effect.
t-value is sensitive to?
sample size
Effect size (r):
small: .10 medium: .30 large: .50
Effect size (d):
small: .20 medium: .50 large:.80
Rationale for the t-test:
t = (observed difference between sample means - expected difference between population means (if null hypothesis is true)) / (estimate of the standard error of the difference between two sample means)
T test (One Sample, Independent Samples, Paired Samples)
t(df)=___, p = ___. A small probability is obtained when the statistic is sufficiently large, indicating that the two means significantly differ from each other.
Some intuition:
The t-statistic is like a z-score for means (i.e., the distance between means in SE units): t = (observed mean − test value) / (standard error of the mean). The t-statistic ratio compares how far away our sample mean is from the null-hypothesis mean to how far we might expect a sample mean to be, given the variability demonstrated in our sample (SE)... again, distance from the mean in SD units (in this case SE, which is the SD of the sampling distribution). Larger t-statistics indicate that our sample mean is more extreme, compared to what we would expect to observe if the null hypothesis is true. As before, only the 5% most extreme means do we consider unlikely to occur by chance, so we use the same rule: if p < .05, reject the null hypothesis (the result is significant).
The larger your sample size (N),
the narrower your CI will be, by default
R2 (Coefficient of Determination):
The proportion of variance in Y accounted for by X
The more variability within your sample,
the wider your CI will be
Independent Samples t Test:
to compare two sample means when the samples are from a single-factor between-subjects design
The t-test for two different samples is called?
unpaired or independent t-test
(Subjective) rules of thumb to describe size of correlation:
±0.1 = small ±0.3 = medium ±0.5 = large
Calculating within groups variance:
SS_within = Σ (X_ij − X̄_j)² = sum of squared differences between individual scores and their group mean = estimate of within-groups variability
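A minimal sketch (made-up data, three groups of four) combining the between-groups and within-groups variance formulas above into an F-ratio and an eta squared, checked against SciPy's one-way ANOVA.

```python
# One-way ANOVA "by hand" vs. SciPy (illustrative data, equal group sizes)
import numpy as np
from scipy import stats

groups = [np.array([4.0, 5.0, 6.0, 5.0]),
          np.array([7.0, 8.0, 6.0, 7.0]),
          np.array([9.0, 10.0, 11.0, 10.0])]

grand_mean = np.mean(np.concatenate(groups))
k = len(groups)            # number of groups
n = len(groups[0])         # participants per group

ss_between = sum(n * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)            # weight by between-groups df
ms_within = ss_within / (k * (n - 1))        # weight by within-groups df
F = ms_between / ms_within
eta_squared = ss_between / (ss_between + ss_within)   # proportion of variance explained

print(F, eta_squared)
print(stats.f_oneway(*groups))               # same F value
```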
Overall point of power:
● Aim = to achieve .80 power (minimum optimal level) → an 80% chance that you will find a significant effect, if it exists in the population
● Can conduct a priori power analyses before you run your study → if you want to know how many participants you need in your study in order to find the effects you're looking for
● You can also conduct post-hoc power analyses after your study to help strengthen your argument:
→ If you didn't find significant results, but you wanted to: can argue that you didn't have sufficient power
→ If you didn't find significant results, and that's good: can argue that you did have sufficient power
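A minimal sketch of an a priori and a post hoc power calculation for an independent-groups t-test, assuming the statsmodels package is available; the effect size (d = 0.5) and group size (30) are just illustrative values.

```python
# A priori and post hoc power for an independent-groups t-test (illustrative values)
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# A priori: participants per group needed for .80 power, d = 0.5, alpha = .05
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                   alternative='two-sided')
print(n_per_group)   # roughly 64 participants per group

# Post hoc: power actually achieved with 30 participants per group
print(analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=30))
```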
Assumption of homogeneity of variance:
● Assumption of Homogeneity of Variance: Treatment populations have the same variance
Assumption of normality:
● Assumption of Normality: Treatment populations are normally distributed
The Two-way Groups Design (ANOVA):
● Basically the same as the One-way Groups Design, but we have two independent variables, each with 2 or more levels
How variance is partitioned in a between-groups TWO-WAY ANOVA:
● Between-groups variance (i.e., treatment variance) from the one-way ANOVA is partitioned into 3 further components:
→ Variance due to the first factor (IV1)
→ Variance due to the second factor (IV2)
→ Variance due to the INTERACTION (IV1*IV2)
● ERROR variance is still anything that cannot be accounted for by the between-groups variance (i.e., error is WITHIN-GROUPS variance)
● The ERROR TERM is pooled & used to calculate an F-ratio for each effect (i.e., any main effects, interactions & follow-up tests)
→ So we use the same error term for every omnibus & follow-up test in a between-groups ANOVA
T-test Steps:
● Calculate the mean difference
● Standardize it (i.e., convert it to a format that allows comparison with other distributions for which we have probabilities)
● Conduct a significance test (obtain a p-value to see whether your difference was due to chance or due to a real difference in the population); this consists of comparing the observed average difference (Mdiff) to the t-distribution to see whether or not both samples came from the same distribution
● Or look up the critical t-value for your degrees of freedom (based on sample size, e.g., N − 1 for a one-sample or paired test) at the relevant level of significance (alpha) and see if your t-value is above the critical value (this means your p-value would be smaller than the desired alpha level - usually .05 in psychology)
Cohen's d:
● Cohen's d is a standardized effect size for differences between group means. ● For the unstandardized effect size, you just subtract the group means. ● To standardize it, divide that difference in group means by the standard deviation. ● It's an appropriate effect size to report with t-tests.
Conceptual underpinnings of ANOVAs:
● Estimate of between-groups variability
● Estimate of within-groups variability
● Weight each variability estimate by the # of observations used to generate the estimate ("degrees of freedom")
● Compare the ratio (for a one-way ANOVA)
● When the F ratio is > 1, the treatment effect (variability between groups) is bigger than the "error" variability (variability within groups). Or more specifically:
○ The sum of the squared differences between the group means and the grand mean, times the number of people in each group, divided by the number of groups minus 1, is bigger than the sum of the squared differences between the observations and the group means, divided by (the number of observations in each group minus 1) times the # of groups
What you need to know to describe a correlation:
● Form of the relationship (linear vs. non-linear - hint: shouldn't be using r if non-linear!) ● Direction of the relationship (positive, negative, or zero) ● Strength and size of the relationship (weak, strong)
Covariance:
● How much each score deviates from the mean on each variable. ● If both variables deviate from their means in the same way at the same time, they are likely to be related
Why you might care about power:
● I have done a study and I want to report the power of my significant effect. → Need to calculate observed power (post hoc power) ● I have done a study and did not find a significant effect. But I know the mean difference exists in the population. How can I increase power to detect it? → Need to calculate predicted power (a priori power) ● I am designing a study. I want to be sure that I have enough power to detect my predicted effects. → Need to calculate predicted power (a priori power)
Assumptions about Sample (ANOVAs):
● Independence: Samples are independent, no two measures are drawn from the same participant ○ The exception to that is Repeated Measures ANOVA. Then, we have the Sphericity assumption, which is the condition where the variances of the differences between all combinations of related groups (levels) are equal.
Partitioning of Variance:
● Like most statistical procedures we use, ANOVA is all about partitioning variance ● We want to see if variation due to our experimental manipulations or groups of interest is proportionally greater than the rest of the variance (i.e., that is not due to any manipulations etc) ● Do participants' scores (on some DV) differ from one another because they are in different groups of our study, more so than they differ randomly and due to unmeasured influences?
ANOVA is an omnibus test used to compare three or more sample means:
● Omnibus means that the test tells us, overall, if the variance explained by the model is proportionally greater than the error variance - this is our F ratio (or 'F statistic/value')
● To put it another way, the ANOVA looks for any difference (anywhere) between the means, but not where those differences lie - that's why we need to follow up a significant ANOVA/F test with pairwise comparisons (i.e., t-tests). The only difference between the t-tests we do as pairwise comparisons and regular t-tests is that we use corrections when we do pairwise comparisons; these are corrections for familywise error.
● At its core, when we do an ANOVA we want to see if variation due to our experimental manipulations or groups of interest is proportionally greater than the rest of the variance (i.e., variance that is not due to any manipulations, etc.)
Overview of Effect Sizes:
● One of the problems we identified with null hypothesis significance testing (i.e., relying on a p value alone) was that significance (p) does not tell us about the importance of an effect. ● The solution to this criticism is to measure the size of the effect that we're testing in a standardized way. ● When we measure the size of an effect (be that an experimental manipulation or the strength of a relationship between variables) it is known as an effect size. ● An effect size is simply an objective and (usually) standardized measure of the magnitude/strength of the observed effect. ● The fact that the measure is standardized just means that we can compare effect sizes across different studies that have measured different variables, or have used different scales of measurement (so an effect size based on speed in milliseconds could be compared to an effect size based on heart rates) - just like with z-scores!
Power as a function of effect size (d):
● Power is closely related to effect size
● Effect size is determined by some of the factors we looked at previously (μ0 − μ1 and σ), where d = (μ1 − μ0) / σ
● d indicates roughly how many standard deviations apart the means are
Repeated Measures ANOVA:
● Repeated measures ANOVA has multiple observations of one participant/case over time, where the DV is continuous and the same for each measurement. Studies that investigate either (1) changes in mean scores over three or more time points, or (2) differences in mean scores under three or more different conditions.
Factors affecting power:
● Significance level (ɑ) → Relaxed ɑ = more power ● Sample size (N) → Bigger N = more power ● Mean differences, μ0 - μ1 → Larger differences = more power ● Error variance (σe2 or MSerror) → Less error variance = more power
Type 1 and Type 2 Errors:
● Significant differences are defined with reference to a criterion or threshold (i.e., acceptable rate) for committing Type I Errors: typically set at .05! ● Hypothesis testing pays little attention to Type II Error ● Concept of power shifts focus to Type II Error
The One-way Groups Design (ANOVA):
● The one-way independent groups design has a single IV with 3+ levels, each experienced by an independent group of participants ● The one-way dependent group design has a single group of participants who are assessed more than 2 times (e.g. pre, post, follow-up)
Eta squared (η2):
● We can use eta squared, η², as an effect size measure in ANOVA.
● This effect size is just r² by another name and is calculated by dividing the effect of our treatment, SS_M, by the total amount of variance in the data, SS_T: η² = SS_M / SS_T.
● It is the proportion of total variance explained by an effect.
● E.g., if you run an ANOVA and your main effect has a corresponding eta squared of 0.14 (a large effect), it means that 14% of the variance in whatever you are measuring (frontal lobe size in crabs) can be accounted for by the main effect of sex (or species, whatever you are interested in at the time).
● The simplest way to measure the proportion of variance explained in an analysis of variance is to divide the sum of squares between groups by the sum of squares total; this ratio is the proportion of variance explained, called eta squared or η².
● Guidelines: η² = 0.01 indicates a small effect; η² = 0.06 indicates a medium effect; η² = 0.14 indicates a large effect.
How variance is partitioned in a between-groups ONE-WAY ANOVA:
● We compare the ratio of between-groups variance to within-groups variance ● Given that there is always within-group "error" variance in the sample, we want to know whether there is more between-groups variation than expected based on this error variance ● We want to be sure that differences between groups are not due simply to error!
"How strong the linear relationship is between two variables" Measured by correlation coefficient (r)
● r can only range from -1 to +1 ● Positive values = positive correlation ● Negative values = negative correlation ● Zero = no relationship ● r can be used to measure degree of reliability and validity
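A minimal sketch (made-up data) of computing r, its p-value, and R² with SciPy.

```python
# Pearson's r and R-squared (illustrative data)
import numpy as np
from scipy import stats

hours_studied = np.array([1, 2, 3, 4, 5, 6, 7, 8])
exam_score    = np.array([52, 55, 61, 60, 68, 70, 75, 79])

r, p = stats.pearsonr(hours_studied, exam_score)
print(r, p)        # r near +1 = strong positive linear relationship
print(r ** 2)      # R-squared: proportion of variance in Y accounted for by X
```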