Psych 250: Final Exam
What's the issue with Limitation 2?
How do you explain to a client that there is no difference when clearly there is a difference?
Step 4 of a Hypothesis Test
- Decision - We make a decision about the null hypothesis - Two options: reject or fail to reject - Critical value - If the obtained value falls in the alpha (rejection) region, reject - If the obtained value falls in the confidence region, fail to reject
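The Step 4 decision rule can be sketched in Python for a z-test whose critical value was already looked up in Step 2. Two things here are illustrative assumptions, not part of the course notes: the function name `decide`, and the convention that a negative critical value signals a "less than" (lower-tail) test.

```python
def decide(z_obtained: float, z_critical: float, tails: int = 2) -> str:
    """Return "reject" or "fail to reject" for a z-test decision."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    if tails == 2:
        # Non-directional: the alpha region sits in both tails.
        in_alpha_region = abs(z_obtained) >= abs(z_critical)
    elif z_critical > 0:
        # Directional "more than": the alpha region is the upper tail.
        in_alpha_region = z_obtained >= z_critical
    else:
        # Directional "less than": the alpha region is the lower tail.
        in_alpha_region = z_obtained <= z_critical
    return "reject" if in_alpha_region else "fail to reject"


print(decide(2.40, 1.96))              # reject (obtained value in the alpha region)
print(decide(1.20, 1.96))              # fail to reject (in the confidence region)
print(decide(-1.80, -1.645, tails=1))  # reject (lower-tail directional test)
```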
What is the advantage of using an ANOVA?
- Allows us to control experiment-wise alpha
What is a post hoc test?
- Analysis after the Hypothesis Test - Latin for "after this", in stats "this" refers to a rejected null hypothesis
What graph is used to detect outliers? Are they important or an annoyance? Link to the Ghost Map
- Box plot - They are annoying. Outliers distort the data and pull the mean toward them, which is what causes skew. - The Ghost Map was John Snow's map of where people died, which helped him find a pattern in the cholera outbreak. He found that many of the people who used one local water pump had contracted cholera.
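The box plot's outlier rule can be sketched with Tukey's fences. Any value below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR is flagged. The 1.5 multiplier is the usual box-plot convention. Quartiles from `statistics.quantiles` (default "exclusive" method) may differ slightly from a textbook or calculator method. The data are made up.

```python
from statistics import mean, median, quantiles


def find_outliers(data, k=1.5):
    """Return values outside the box-plot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _q2, q3 = quantiles(data, n=4)  # three cut points split the data into quarters
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


scores = [2, 3, 3, 4, 5, 5, 6, 7, 30]  # hypothetical data with one extreme value
print(find_outliers(scores))           # [30]
print(round(mean(scores), 2), median(scores))  # 7.22 5: the outlier pulls the mean above the median
```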
What is the mathematical cost for doing an ANOVA?
- Can tell us IF a difference exists but cannot tell us where or how many differences exist
How was probability used in medicine?
- Evidence-based medicine: determining whether a treatment is effective by using statistics to standardize outcomes and compare treatments - NNT (number needed to treat) & NNH (number needed to harm)
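NNT and NNH come from simple arithmetic on event rates. NNT = 1 / absolute risk reduction, and NNH = 1 / absolute risk increase, each conventionally rounded up to a whole patient. This sketch assumes counts from a two-group trial. The function names and trial numbers are hypothetical.

```python
import math
from fractions import Fraction


def nnt(control_events, control_n, treated_events, treated_n):
    """Number needed to treat = 1 / absolute risk reduction, rounded up."""
    arr = Fraction(control_events, control_n) - Fraction(treated_events, treated_n)
    if arr <= 0:
        raise ValueError("treatment did not lower the event rate")
    return math.ceil(1 / arr)


def nnh(treated_harms, treated_n, control_harms, control_n):
    """Number needed to harm = 1 / absolute risk increase of a side effect, rounded up."""
    ari = Fraction(treated_harms, treated_n) - Fraction(control_harms, control_n)
    if ari <= 0:
        raise ValueError("treatment did not raise the harm rate")
    return math.ceil(1 / ari)


# Hypothetical trial: 40/200 untreated vs 30/200 treated patients have a heart attack,
# while 25/200 treated vs 5/200 untreated patients report nausea.
print(nnt(40, 200, 30, 200))  # 20: treat 20 people to prevent one heart attack
print(nnh(25, 200, 5, 200))   # 10: one extra case of nausea for every 10 treated
```

Exact fractions avoid floating-point round-off pushing a result like 20 up to 21 when rounding up.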
Explain a Cohen's d.
- Examines the degree of separation between distributions. - Separation must be understood in both absolute (effect size) and relative (distributional kurtosis) ways.
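A common way to compute Cohen's d is to divide the difference between two group means by their pooled standard deviation. This sketch assumes two independent groups and uses made-up scores. Cohen's rough benchmarks are 0.2 (small), 0.5 (medium), and 0.8 (large).

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group1, group2):
    """Mean difference expressed in pooled-standard-deviation units."""
    n1, n2 = len(group1), len(group2)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)


treatment = [5, 6, 7, 8, 9]  # hypothetical scores
control = [3, 4, 5, 6, 7]
print(round(cohens_d(treatment, control), 2))  # 1.26: the means sit about 1.26 SDs apart
```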
What's the issue with Limitation 1?
- In research you need to be 95% confident to publish your findings - That is our threshold for claiming an effect is statistically significant (AKA unlikely due to chance) - This could encourage changing the data to meet the threshold
How was probability used in finance?
- In stocks: Stock brokers can't prove that a stock price will rise or fall but they use probability to suggest what will happen. - In mutual funds: Probability plays a role in evaluating the potential risks and returns associated with investing in mutual funds. - In insurance: Insurance companies use probability to calculate the likelihood of certain events occurring and to determine the cost of providing coverage for those events.
What did Florence Nightingale do to change statistics?
- Nightingale's Rose - Using graphs to explain statistics more efficiently
Why did Plato and many others dislike probability? What changed this attitude (in London)?
- Plato and others disliked probability because they were searching for absolute truth in life, and probability means uncertainty. - The London coffee houses became places to talk about probability. This led to stocks, mutual funds, and insurance.
Step 1 of a Hypothesis Test
- Starting Point - Hypothesis statement - Question or statement/claim (in words) - Only 3 questions - Directional (2): more than or less than - Non-directional (1): different
Step 2 of a Hypothesis Test
- The Critical Value - Critical values • Set your alpha (or confidence) • Confidence is defined as 1-(alpha) • Are you running a 1 (directional) or 2 (non-directional) tailed test? • Based on your question & Ha (alternate hypothesis) • Determine & justify your test • z-test • Use z when • Sample is large OR • The parameter σ is known (given) • Find your critical value(s) • Critical values ALWAYS come from tables
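Critical values come from z tables, but the tabled values can be reproduced from the inverse of the standard normal CDF. This sketch uses Python's `statistics.NormalDist` (Python 3.8+). The helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist


def z_critical(alpha=0.05, tails=2):
    """Positive critical z that cuts off alpha in one tail or alpha/2 in each of two tails."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 (directional) or 2 (non-directional)")
    confidence_side = 1 - alpha / tails  # area to the left of the upper critical value
    return NormalDist().inv_cdf(confidence_side)


print(round(z_critical(0.05, tails=2), 2))  # 1.96 (use ±1.96)
print(round(z_critical(0.05, tails=1), 3))  # 1.645
print(round(z_critical(0.01, tails=2), 2))  # 2.58
```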
Step 3 of a Hypothesis Test
- The Obtained Value - Test statistic • Computed or obtained value • The Zobtained • General format - obtained value = effect size/standard error
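For a one-sample z-test, the general format becomes z = (M − μ) / (σ / √n). The numerator is the observed difference (the effect), and the denominator is the standard error. This sketch uses hypothetical numbers (μ = 100, σ = 15, n = 36, M = 105). The helper name is also hypothetical.

```python
from math import sqrt


def z_obtained(sample_mean, mu, sigma, n):
    """Obtained z = (M - mu) / standard error, where standard error = sigma / sqrt(n)."""
    standard_error = sigma / sqrt(n)
    effect = sample_mean - mu
    return effect / standard_error


print(z_obtained(105, 100, 15, 36))  # 2.0: the sample mean is 2 standard errors above mu
```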
Explain operational definitions. Do they need to be correct? Example (Is a bee a fish?)
- To make a construct measurable. - No, they don't have to be correct, just measurable. California's endangered species law protects birds, mammals, fish, amphibians, reptiles, and plants, but not insects. Bees are endangered, so California operationally defined bees as fish (its legal definition of fish includes invertebrates) so they could be protected.
Why do we use the letter r for this statistic? What do we use to represent the parameter?
- We use the letter r because the correlation represents a relationship - To represent the population parameter we use ρ (rho)
What is the possible range for the F ratio (lowest to highest value)?
0 to infinity
What are the three reasons we study statistics?
1. Better decisions 2. Better arguments 3. Better relationships
What are 2 things we can do to reduce sampling error?
1. Increase your sample size 2. Ensure randomness
What are the three key concepts in statistics?
1. Sampling 2. Variability 3. Probability
What are the 3 equally plausible interpretations for a significant correlation?
1. Variable x could cause variable y 2. Variable y could cause variable x 3. Spurious relationship (a third variable could cause both)
What does the p-value or area represent?
A computed tail-area. It's the area in the tail beyond the obtained value.
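The tail area can be computed from the standard normal CDF. This sketch assumes a z statistic. For a one-tailed test, it also assumes the obtained value falls in the predicted direction. The helper name `p_value` is hypothetical.

```python
from statistics import NormalDist


def p_value(z, tails=2):
    """Area in the tail(s) beyond the obtained z."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    one_tail_area = 1 - NormalDist().cdf(abs(z))
    return one_tail_area * tails


print(round(p_value(1.96), 3))          # 0.05
print(round(p_value(2.0, tails=1), 4))  # 0.0228
```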
How is post hoc different from a priori tests?
An a priori test is a planned comparison that does not depend on a previous decision; a post hoc test is run only after the null hypothesis has been rejected.
Limitation 1
Absolute - Hypothesis tests are absolute, producing an all-or-none decision (reject or fail to reject)
Omnibus Test
Any statistical test of significance where two or more conditions are tested
Limitation 2
Artificial - The null hypothesis is artificial because it states that there is no difference at all
What does it mean to be data-informed as opposed to data-driven?
Being data-driven puts data at the forefront of decision-making, while being data-informed uses data as a valuable input, but not the only input, in the decision-making process.
Address/explain between group and within group variance
Between group: effect variability Within group: error variability
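Between-group and within-group variability are the two pieces of a one-way ANOVA. This sketch assumes independent groups and uses made-up scores. `one_way_anova` is a hypothetical helper, and the resulting F would still be compared with a critical value from an F table.

```python
from statistics import mean


def one_way_anova(*groups):
    """Split variability into between (effect) and within (error) parts and return F."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    group_means = [mean(g) for g in groups]

    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: how far each score sits from its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_ratio


print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # SS between = 54, SS within = 6, F = 27
```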
What is a data transformation?
Change the number but keep the essence
Analytic Frequency Probability
Classical probability is for when the population info is known and it's a parameter. Address source: population Coefficients: 0 - 1 Interpretation: Likelihood of event 0 = impossible 1 = certain
Who developed covariance? What goal were they trying to achieve? Use a graph to illustrate.
Descartes because he wanted a visual representation of variance (relationship graph)
How do you interpret a correlation coefficient in terms of direction and strength?
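Direction: the sign of r (positive means both variables rise together; negative means one rises as the other falls). Strength: the absolute value of r (closer to 1 is stronger, closer to 0 is weaker). Draw a scatter plot to see both.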
How can graphs cause problems?
Graphs can be misleading. A graph of gun deaths in Florida made it seem like the Stand Your Ground law reduced gun deaths, partly because its y-axis was flipped. Even a real change could have many causes, not just the law.
A graph is a type of data transformation. How are they helpful? How can they cause problems?
Graphs make it easier for people to understand the data. Graphs give meaning to numbers.
Explain the null and alternative hypothesis for the chi-square
H0: no preference, fo = fe • Note: the null hypothesis assumes all categories are equally preferred; the distributions are the same • Ha: preference, fo ≠ fe
What assumption SHOULD be met for an ANOVA?
Homogeneity of Variance
Explain the homogeneity/heterogeneity of variance assumption for ANOVA.
Homogeneity of variance means the groups being compared have roughly equal variances; heterogeneity means they do not. It is an important assumption in ANOVA, and violating it can lead to incorrect conclusions, so check it with an appropriate test (e.g., the Levene test) before conducting the ANOVA.
What is an eta-square?
If we reject the null hypothesis, we need to explain the percentage of variance accounted for by the treatment: eta-square = SS(between) / SS(total)
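Eta-square follows directly from the sums of squares. This sketch uses made-up three-group data. `eta_squared` is a hypothetical helper name.

```python
from statistics import mean


def eta_squared(*groups):
    """Proportion of total variability accounted for by the treatment (group membership)."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total


print(eta_squared([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # 0.9: treatment accounts for 90% of the variance
```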
Explain the difference between independent and dependent sampling
Independent sampling: Between multiple sample groups Dependent sampling: Within a subject/group
What's the issue with Limitation 3?
Just because something is statistically significant, does not make it meaningful.
Limitation 3
Magnitude - Hypothesis testing does not address how to interpret the magnitude of an effect.
Explain sampling error
No sample is a perfect representation of the population, so there is always the possibility of error. Sampling error = parameter - statistic, or μ - M
Can you ever prove anything with statistics? Explain.
No, we can't prove anything. We can only suggest the likelihood of something happening.
What is the post hoc for a chi-square?
Pair-wise Chi-Square
Link to Aristotle's 3 elements of data-informed decision making
Pathos - Appeal to emotion - Graphs, testimonials, etc. Ethos - Appeal to doing the "right" thing - Ethics, morals, etc. Logos - Appeal to logic - Data, computation, etc.
Explain pooled and separate variance t-tests
Pooled = Equal variance (fail to reject F) Separate = Unequal variance (reject F)
What is the purpose of a regression?
Regression allows us to predict someone's score on a Y variable if we know their score on an X variable.
Relative Frequency Probability
Relative frequency probability is for when the population info is unknown and it's a statistic. Address source: Sample Coefficients: 0 - 1 Interpretation: Confidence = 0% to 100%
What are the 2 types of dependent sampling techniques?
Repeated measures: Does taking stats change your well-being? Paired samples: Within straight couples, are males or females more committed to their relationship?
Explain the terms significant and "statistically significant"
Significant: Great or important; worthy of attention. Statistically Significant: UNLIKELY due to chance
Explain the concept of covariance.
Spread of data around more than one reference point (how two variables vary together around their own means)
What is the difference between test-wise and experiment-wise alpha? Illustrate with an example.
Test-wise alpha = alpha for each effect (comparison) Experiment-wise = total (sum) alpha for all possible effects in a study
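As an example, comparing 3 groups with separate t-tests means 3 pairwise comparisons, each run at a test-wise alpha of .05. Summing the test-wise alphas gives the upper bound .15. If the tests were independent, the exact chance of at least one Type I error would be 1 − (1 − α)^c. This sketch (helper name hypothetical) shows both values for 3 and 4 groups. This inflation is what an ANOVA is meant to control.

```python
from math import comb


def experiment_wise_alpha(test_wise_alpha, n_comparisons):
    """Return (summed alpha, exact alpha assuming independent tests)."""
    summed = test_wise_alpha * n_comparisons  # upper bound on the chance of any false positive
    exact_if_independent = 1 - (1 - test_wise_alpha) ** n_comparisons
    return summed, exact_if_independent


for k_groups in (3, 4):
    c = comb(k_groups, 2)  # number of pairwise t-tests among k groups
    summed, exact = experiment_wise_alpha(0.05, c)
    print(k_groups, c, round(summed, 3), round(exact, 3))
# 3 3 0.15 0.143
# 4 6 0.3 0.265
```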
What is the difference between a critical and obtained value?
The critical value is a fixed threshold used to determine statistical significance, whereas the obtained value is the actual result computed from the sample data. The obtained value is compared to the critical value to determine whether to reject or fail to reject the null hypothesis.
How do we evaluate the linear equation (hint residual oval)?
The residual oval represents the variance space (area) around the regression line • An r-square close to 1 suggests low variability (leptokurtic) • An r-square close to 0 suggests high variability (platykurtic) • The total area in the oval is 1 • We can then compute the probability of different events occurring based on the area of interest
What is the purpose of the hypothesis test for the correlation?
The purpose of a hypothesis test for correlation is to determine whether there is a statistically significant relationship between two variables.
What is the purpose of the F test for variances?
The purpose of the F test for variances is to determine whether the differences in variances between two or more populations are statistically significant.
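The F test for two variances divides one sample variance by the other. A common textbook convention puts the larger variance on top so that F ≥ 1, and the result is then compared with a critical value from an F table. This sketch uses hypothetical data, and the helper name is also hypothetical.

```python
from statistics import variance


def f_for_variances(sample1, sample2):
    """Ratio of the larger sample variance to the smaller one."""
    v1, v2 = variance(sample1), variance(sample2)
    return max(v1, v2) / min(v1, v2)


print(f_for_variances([2, 4, 6, 8, 10], [4, 5, 6, 7, 8]))  # 4.0 (variances 10 and 2.5)
```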
Why should we look at our data and not just interpret the correlation?
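Very different patterns (curves, clusters, outliers) can produce the same correlation, so looking at the data (e.g., a scatter plot) helps avoid misinterpretations that lead to errors.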
What is the purpose of a hypothesis test? (in general terms)
To make better decisions
Under what conditions do we conduct a chi-square analysis?
To see if there is a statistically significant difference between an observed and expected frequency distribution for nominal data • Observed: sample data • Expected: computed data
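The chi-square statistic sums (fo − fe)² / fe across categories, with df = number of categories − 1. This sketch tests the "no preference" null using a hypothetical taste test in which 60 people choose among 3 colas. The helper name is also hypothetical.

```python
def chi_square_gof(observed, expected=None):
    """Goodness-of-fit chi-square; with no expected counts, assume equal preference."""
    total = sum(observed)
    if expected is None:
        # H0: no preference, so every category expects the same count.
        expected = [total / len(observed)] * len(observed)
    chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    df = len(observed) - 1
    return chi2, df


print(chi_square_gof([30, 15, 15]))  # (7.5, 2)
```

With df = 2, the tabled critical value at α = .05 is 5.99, so an obtained 7.5 would lead us to reject H0 (fo ≠ fe).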
What are outliers?
Very large or small numbers, numbers that don't fit the data stream
Clearly explain the difference between α (Type I) and β (Type II) decision errors. Use an example to support your answer.
Type 1 - Alpha - Occurs when you reject a null hypothesis that is true (you claim there is a difference but there is not one) - False positive (you detect something that is not there) Type 2 - Beta - Occurs when you fail to reject a null hypothesis that is false (you claim there is NO difference but there is one) - False negative (you do not detect something that is there) Example: COVID tests. A Type 1 error is a positive test for someone who is not infected; a Type 2 error is a negative test for someone who is infected.
What does it mean when the F ratio is above 1?
When the F ratio is greater than 1, the between-group variance is larger than the within-group variance, which suggests a possible treatment effect. The null hypothesis is rejected only if F also exceeds the critical value.
What does it mean when the F ratio is below 1?
When the F ratio is less than 1, the between-group variance is smaller than the within-group variance, so there is no evidence of a treatment effect and we fail to reject the null hypothesis (this does not prove the null is true).
Scheffe
When to use: - Heterogeneity of variance Advantages: - High threshold for detection - If it finds a difference, you know it exists Limitations: - Potential inflation of beta error - Very few restrictions but it will miss a lot of differences
Tukey HSD (Honest Significant Difference)
When to use: - Homogeneity of variance Advantages: - Sensitive, meaning a low threshold for detection; it will find differences if they exist Limitations: - Works best with equal sample sizes; needs homogeneity of variance (fail to reject the Levene test for variance) - Potential inflation of alpha error
When do you use the Pearson?
You use the Pearson correlation when both x and y are quantitative, the relationship is linear, and there are no outliers (Pearson is non-resistant, meaning outliers distort it).
When do you use the Spearman?
You use the Spearman correlation when x and y are qualitative (ranked, ordinal) data, the relationship is not linear, or there are outliers (Spearman is resistant, meaning outliers have little effect on it).
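Spearman's correlation is Pearson's r computed on ranks. That is why it resists outliers and only needs a consistent (monotonic) trend rather than a straight line. This hand-rolled sketch uses made-up data, and the function names are hypothetical. Tied values get the average of their ranks.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their spreads."""
    mx, my = mean(x), mean(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / (sqrt(ss_x) * sqrt(ss_y))


def ranks(values):
    """Rank from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j share the mean rank
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Pearson's r applied to the ranks of x and y."""
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 100]  # hypothetical data: a steady rise, then one outlier
print(round(pearson_r(x, y), 3))     # 0.743: the outlier drags Pearson down
print(round(spearman_rho(x, y), 3))  # 1.0: the ranks still rise perfectly together
```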
Why are data transformations important? Give examples
Your raw score is 20 on an exam. How did you do? Out of 20 possible points: - 100%, you rock! - The raw 20 has the same essence as 100% Out of 80 possible points: - 25%, struggle bunny. - The raw 20 has the same essence as 25% This shows raw data being transformed into percents, which are easier to understand.
What does the word spurious mean?
a line of reasoning that appears valid or logical but may not be
What is a pattern?
a non-random sequence suggesting an underlying cause
What is the purpose of a correlation?
determine relationship between 2 variables
Explain the r2 analysis.
r² represents variance explained: the proportion (percent) of the variance in Y that is accounted for by X
What is the F ratio?
ratio of the between-groups population variance estimate to the within-groups population variance estimate
Define the term robust
works under many conditions
What is the linear equation?
ŷ = a + bx
Explain slope. Why do we call the slope the impact factor?
ŷ = the predicted value of y. The hat represents a guess. a = the y-intercept, where the line crosses the y-axis. b = the slope (rise over run), the impact factor: it tells how much ŷ changes for every one-unit change in x. x = the predictor ("known") value. The equation is used to model the relationship between two variables and make predictions based on that relationship.
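The least-squares line can be fitted with b = SP / SSx and a = My − b·Mx, where SP is the sum of products of deviations and SSx is the sum of squared deviations of x. r² (variance explained) falls out of the same sums. This sketch uses made-up data. `fit_line` and `predict` are hypothetical helper names.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept a, slope b, and r-square for y-hat = a + b*x."""
    mx, my = mean(x), mean(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of products of deviations
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    b = sp / ss_x       # slope (impact factor): change in y-hat per one-unit change in x
    a = my - b * mx     # intercept: y-hat when x = 0
    r_squared = sp ** 2 / (ss_x * ss_y)  # proportion of variance in y explained by x
    return a, b, r_squared


def predict(a, b, x):
    """Predicted y-hat for a known x."""
    return a + b * x


a, b, r_squared = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 2), round(b, 2), round(r_squared, 2))  # 2.2 0.6 0.6
print(round(predict(a, b, 6), 2))                     # 5.8
```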