2022: Oral Exam
How do you interpret the confidence interval around a statistic?
--A CI expresses the degree of certainty we have in our statistic by giving a range within which the population mean plausibly falls. It answers the question: "given the data I have, if I sampled the population a large number of times, what values are likely?" For example, with a 95% confidence interval, we can say that if we replicated the experiment over and over and computed a 95% confidence interval for each replication, then 95% of those intervals would contain the true population mean. OR: "We have good reason to believe the true mean lies in this interval, because 95% of the time such intervals contain the true mean." Any value outside that CI would be rejected as a hypothesized population mean at the corresponding alpha level.
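To see the coverage interpretation concretely, here is a minimal simulation sketch (the population parameters and the scipy-based CI are my own illustration, not from the course): draw many samples from a population with a known mean and count how often the 95% CI captures it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_mean, true_sd, n, reps = 100, 15, 30, 10_000

covered = 0
for _ in range(reps):
    sample = rng.normal(true_mean, true_sd, n)
    # 95% CI for the mean, using the t distribution since the SD is estimated
    lo, hi = stats.t.interval(0.95, df=n - 1,
                              loc=sample.mean(), scale=stats.sem(sample))
    covered += lo <= true_mean <= hi

print(f"coverage: {covered / reps:.3f}")  # lands near 0.95
```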
What is statistical power? How can we increase it?
--Power is 1 − β, where β is the probability of a Type II error. Statistical power is your ability to correctly detect an effect (the probability of correctly rejecting a false null hypothesis, i.e., of finding support for the alternative hypothesis when it is true). There are several ways to increase power (the sample-size route is illustrated in the sketch below):
i. Increase the effect size studied (a larger mean difference means more signal)
ii. Set a more lenient alpha, which lowers the critical value
iii. Increase the sample size, which shrinks the standard error
iv. Reduce the SD by reducing measurement noise
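A minimal sketch of point iii, assuming a one-sample t-test and a made-up true effect of d = 0.5: power is estimated as the share of simulated studies that correctly reject a false null.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
effect, alpha, reps = 0.5, 0.05, 5_000   # assumed true effect: Cohen's d = 0.5

for n in (20, 50, 100):
    rejections = 0
    for _ in range(reps):
        sample = rng.normal(effect, 1.0, n)   # H0 (mean = 0) is false here
        rejections += stats.ttest_1samp(sample, 0.0).pvalue < alpha
    print(f"n = {n:3d}   power ~ {rejections / reps:.2f}")
```

Power climbs toward 1 as n grows, which is exactly the "increase sample size" lever.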
Describe the process of conducting null hypothesis significance testing.
1. Define H0 (typically the opposite of what we expect to be true) and H1 (the effect we expect to find, which gains support if we reject the null). 2. Choose your α level (the threshold at which p counts as statistically significant, i.e., the Type I error rate you are comfortable with; often 0.05). 3. Collect data. 4. Define your sampling distribution using your null hypothesis and either known population parameters or estimates of them from your sample. 5. Calculate the probability of your data, or more extreme data, under the null (to get this probability, you'll need to calculate some kind of test statistic, such as a z, t, or chi-square). 6. Compare your probability (p-value) to your α level and decide whether your data are "statistically significant" (reject the null) or not (fail to reject the null).
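A worked sketch of steps 4-6 for a one-sample t-test, using made-up scores and a hypothetical H0 of μ = 100 (my own illustration):

```python
import numpy as np
from scipy import stats

# Made-up scores; H0: mu = 100, H1: mu != 100, alpha = .05
scores = np.array([104, 98, 112, 101, 95, 108, 110, 99, 103, 107])
alpha = 0.05

# Steps 4-5: test statistic, then the probability of data this extreme under H0
t_stat = (scores.mean() - 100) / stats.sem(scores)
df = len(scores) - 1
p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-tailed

# Step 6: compare p to alpha
print(f"t({df}) = {t_stat:.2f}, p = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```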
What are null and alternative hypotheses?
A null hypothesis (H0) is a statement of no effect. So for example, a null hypothesis may state that there is no relationship between X and Y, or that people in a sample group are no different from the rest of the population. According to probability theory, our sample space must cover all possible elementary events. So, we also create an alternative hypothesis (H1), which is every possible event not represented by our null hypothesis. Thus, the alternative hypothesis is a statement of some effect (e.g., this sample group is different from the rest of the population, people who receive this intervention perform better than those who don't, people with this score perform worse on this task, etc.). Typically, the alternative hypothesis is what you actually think is true, or hope to find support for; you test it against what you think isn't true, hoping to reject the null.
What is the difference between a population and a sample?
A population is the entire group that you want to draw conclusions about. A sample is the specific group that you will collect data from.
What is an "assumption," and why is it important to evaluate assumptions?
An assumption is a requirement that must be met before you can conduct your analysis; assumptions determine in part which tests to run, and whether you will be able to correctly draw conclusions from your analysis. Example: when using a one-sample t-test, you need to be able to assume both...
--Normality: the sampling distribution of the mean is normally distributed, which we can assume when 1) the population distribution is normal, or 2) we have a big sample.
AND
--Independence: the observations are independent of one another, so collecting a score from Participant A doesn't tell me anything about what Participant B will say.
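One quick way to evaluate the normality assumption, sketched on made-up data (the Shapiro-Wilk test is just one common choice, not the course's prescribed method):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(50, 10, 40)          # made-up scores

# Shapiro-Wilk test: H0 = the data were drawn from a normal distribution
w, p = stats.shapiro(sample)
print(f"Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")
# A small p would flag a normality violation; independence, by contrast,
# has to be ensured by the study design rather than tested after the fact.
```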
Why do we study statistics?
Because we cannot measure entire populations but sometimes need to make accurate inferences about a population. Statistics allows us to make claims about a population based on probability; it is data-driven and less biased than anecdotal claims about populations or groups of people. From slides: it provides an essential aid in signal detection, provides a universal language for communicating findings, and is required for competent evaluation of others' work.
What is the relationship between type I and type II error?
Both occur within NHST. They are mutually exclusive, meaning they can't happen on the same test (a Type I error is only possible when the null is true; a Type II error is only possible when it is false). Their rates are inversely related: as the rate of Type I error (α) goes up, the rate of Type II error (β) goes down. A Type I error (false positive) occurs if an investigator rejects a null hypothesis that is actually true in the population; a Type II error (false negative) occurs if the investigator fails to reject a null hypothesis that is actually false in the population.
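The inverse relationship can be demonstrated by simulation; this sketch (my own illustration, with an assumed true effect of 0.4 for the "H0 false" worlds) shows Type I error tracking α while Type II error shrinks as α is loosened:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps, effect = 25, 5_000, 0.4

for alpha in (0.01, 0.05, 0.10):
    type1 = type2 = 0
    for _ in range(reps):
        null_true = rng.normal(0.0, 1.0, n)      # world where H0 is true
        null_false = rng.normal(effect, 1.0, n)  # world where H0 is false
        type1 += stats.ttest_1samp(null_true, 0.0).pvalue < alpha
        type2 += stats.ttest_1samp(null_false, 0.0).pvalue >= alpha
    print(f"alpha = {alpha:.2f}   Type I ~ {type1/reps:.3f}   Type II ~ {type2/reps:.3f}")
```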
What is construct validity, and how can we establish it?
Constructs are broad concepts or topics for a study that are often not directly observable: latent variables that we somehow have to define and understand. Construct validity is the validity of the inference that our operationalization of units, treatments, observations, or settings represents well the construct it is assumed to represent. It is whether your test measures what it claims to measure. Does our operationalization actually measure the underlying construct we want it to? Is it a good proxy for what we are actually hoping to measure? To establish construct validity, we use theory and previously conducted tests to identify constructs that should be related to the construct of interest, and then test for convergent and divergent validity. E.g., if we are trying to capture "self-esteem," does confidence actually measure or relate to that construct?
Name one of the major critiques of NSHT, how it has affected the culture of science, and one way we can try to mitigate the concerns.
In NHST we either reject or retain the null, and only one can be true. A major critique is the high prevalence of false positives, which has fueled a replication crisis. Research has been prioritized toward finding statistically significant results (p < 0.05), because you are much more likely to get published when rejecting the null hypothesis (publication bias). This matters for researchers' careers, as publication is how you advance and ultimately get tenure in academia. This leads to p-hacking: selecting and tweaking data until your p-value is small enough. It also leads us to adopt the thinking that any statistically significant finding is true and does not require replication. When a measure becomes the target, it ceases to be a good measure. To mitigate this, many researchers have turned to open science practices that make research transparent, such as publishing registered reports specifying how you will collect and analyze your data BEFORE the data are collected.
How is internal validity different from external validity?
Internal validity is the validity of the inference that your X and Y variables are causally related (as opposed to some alternative explanation for the significant effect you find, such as testing effects or maturation, i.e., some other variable like the passage of time). External validity is the validity of the inference that a causal relation between operations generalizes to other units, treatments, observations, or settings (think ecological validity or temporal validity here: real-world settings and other times).
How is probability used in (frequentist) statistics?
Probability in frequentist statistics is the long-run frequency of repeatable events. It is used to talk about how often events would occur if you repeated them an infinite number of times.
--The real implication here is that any one result of a study should not be taken as conclusive; we need to replicate it multiple times to really be able to draw any strong conclusions. Otherwise, we don't know whether our first result was a fluke.
--It is objective: the probability of an event is necessarily grounded in the world.
--The only way probability statements can make sense is if they refer to (a sequence of) events that occur in the physical universe, i.e., something that is repeatable.
--There is some debate about this perspective, since we live in a world where infinitely repeated events never or rarely actually occur.
--People like the frequentist view: null hypothesis testing is built around it, and it is considered objective because everyone calculates probability the same way and events are observable.
Example: you couldn't assert, under frequentist statistics, that the probability of America winning the World Cup this year is 20%, because it isn't a repeatable event.
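The long-run-frequency idea in a few lines, using a fair coin as the stand-in repeatable event (my own illustration): the running proportion of heads converges toward the probability, 0.5, as the number of repetitions grows.

```python
import numpy as np

rng = np.random.default_rng(0)
flips = rng.random(100_000) < 0.5                        # fair-coin flips
running = np.cumsum(flips) / np.arange(1, flips.size + 1)

for k in (10, 100, 1_000, 100_000):
    print(f"after {k:>6} flips: proportion of heads = {running[k - 1]:.4f}")
```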
What is a sampling distribution?
The distribution of a statistic across an infinite number of samples, often the distribution of sample means. A frequency distribution generated by taking repeated random samples from a population and computing a simple statistic for each sample.
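A sampling distribution of the mean can be built by brute force; this sketch (the skewed exponential population is an assumption of mine, chosen to make the CLT visible) repeatedly samples and records each sample's mean:

```python
import numpy as np

rng = np.random.default_rng(5)
population = rng.exponential(scale=2.0, size=100_000)   # a skewed population

# Repeatedly draw n = 30 and record the mean of each sample
sample_means = np.array([rng.choice(population, size=30).mean()
                         for _ in range(10_000)])

print(f"population mean:          {population.mean():.3f}")
print(f"mean of sample means:     {sample_means.mean():.3f}")
print(f"SD of means (std. error): {sample_means.std():.3f}")
# The sample means pile up around the population mean and look roughly
# normal, even though the population itself is skewed.
```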
Why and how do we use a normal or t distribution when studying means?
The normal and t distributions are used in NHST to describe and compare means against the mean of a population: most observations fall close to the population mean, while the rest make up the tails on either side. A normal distribution is a bell-shaped, symmetrical distribution. It is continuous, and the area under the curve equals 1. It is fully described by two parameters, the mean and standard deviation, which tell us how many SDs a given case falls from the mean. Many variables are normally distributed. The t distribution is used in situations where you think the data follow a normal distribution but you don't know the population standard deviation and must estimate it from the sample. It has heavier tails and is used for smaller sample sizes; it is the sampling distribution of the t statistic (e.g., a difference between means standardized by an estimated standard error). As sample size increases, it converges on the normal distribution. Assumptions: homogeneity of variance, independence, normality.
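The heavier tails and the convergence on the normal are easy to check numerically (a small sketch of my own, comparing two-tailed probabilities beyond 2 units from the center):

```python
from scipy import stats

# Two-tailed probability of falling more than 2 units from the center
print(f"normal:       {2 * stats.norm.sf(2):.4f}")
for df in (5, 30, 100):
    print(f"t (df = {df:3d}): {2 * stats.t.sf(2, df):.4f}")
# The t puts more probability out past 2 than the normal does,
# shrinking toward the normal value as df (sample size) grows.
```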
What is a p-value and how is it used in research?
The p-value is the probability of obtaining your data (or more extreme data), given that the null hypothesis is true. This value is the proportion of the area under the curve of the sampling distribution that is as extreme as, or more extreme than, the test statistic value. So in research, a very small p-value (p < 0.05, typically, in psychological science) means that such extreme observed data would be very unlikely under the null hypothesis. When your p-value is less than the alpha value, your result is said to be "significant," meaning you found data that support (but do not prove) your alternative hypothesis.
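One property worth internalizing, sketched below as my own illustration: when the null really is true, p-values are uniformly distributed, so about 5% of tests come out below .05 purely by chance. That is exactly the Type I error rate that α controls.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)

# Run 10,000 "studies" in which the null (mean = 0) is true every time
pvals = np.array([stats.ttest_1samp(rng.normal(0, 1, 30), 0.0).pvalue
                  for _ in range(10_000)])

# p-values are uniform under a true null, so ~5% dip below .05 by chance
print(f"share of p < .05: {(pvals < 0.05).mean():.3f}")
```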
What kind of errors can you make under NHST, and what do they mean? Which is worse?
Type I: finding a significant result when in reality there is no effect, or rejecting a null hypothesis when you should fail to reject it (a false positive, related to your alpha level; it means your alpha was too generous). Type II: failing to reject the null and failing to find a significant result when in reality the null is false and there is a real relationship (a false negative, related to your experiment's power; it means the study was underpowered). Which is worse? I would posit that, traditionally, Type I is considered worse. This error is easily published and can mislead people in a public way, as well as influence anyone using your findings to conduct or inform their own research (e.g., concluding that a treatment works when it doesn't could be particularly detrimental). Type II is also bad because it might discourage you from pursuing research on a real difference you could have found (e.g., finding a cure for cancer but concluding that it doesn't actually work).
What does it mean for something to be statistically significant?
When the probability of obtaining your data (or more extreme data) under the null, i.e., your p-value, is less than the alpha value, your result is said to be "significant," meaning you found data that support (but do not prove) your alternative hypothesis.