Data Analysis Homework
Steps of Null Hypothesis Significance Testing
1. Restate the research question as hypotheses about populations. The null hypothesis describes no difference between population 1 and population 2; the research hypothesis describes a difference between population 1 and population 2.
2. Determine the characteristics of the comparison distribution under the null hypothesis. Population 2 becomes the comparison distribution, representing the situation in which the null is true.
3. Determine the critical cutoff that corresponds to an alpha (significance) level of .05. For a one-tailed test this marks the most extreme 5% of scores in one tail of the distribution; for a two-tailed test, 2.5% in each tail. (This reflects the logic of critical thresholds.)
4. Determine the sample's score on the comparison distribution: convert the sample data into a standardized score so it can be located on the comparison distribution. (This relates to the logic because it shows whether the score is extreme enough to pass the critical cutoff.)
5. Compare the sample's standardized score with the critical cutoff and decide whether to reject or fail to reject the null. If the score is more extreme than the cutoff, reject the null; if it is not, fail to reject the null. (Significance leads to the inference that population 1 is different from population 2.)
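The five steps can be sketched in Python for the simplest case, a one-sample z test. This is a minimal sketch under stated assumptions: the population mean and SD are known, and the one-tailed version predicts a higher sample mean. The function name `z_test_steps` and the example numbers are hypothetical and are not taken from the course materials.

```python
from math import sqrt
from statistics import NormalDist


def z_test_steps(sample_mean, mu, sigma, n, alpha=0.05, tails=1):
    """Walk through the five NHST steps for a one-sample z test.

    Assumes mu and sigma of the comparison population are known, and that a
    one-tailed test (tails=1) predicts the sample mean is HIGHER than mu.
    """
    # Step 1 is verbal: H0 says population 1 = population 2 (mean mu).
    # Step 2: comparison distribution = distribution of means under H0,
    # centered on mu with standard deviation sigma / sqrt(n).
    sigma_m = sigma / sqrt(n)
    # Step 3: critical cutoff that leaves alpha in the tail(s).
    cutoff = NormalDist().inv_cdf(1 - alpha / tails)
    # Step 4: locate the sample on the comparison distribution as a z score.
    z = (sample_mean - mu) / sigma_m
    # Step 5: compare the sample's z with the cutoff and decide.
    extreme = z > cutoff if tails == 1 else abs(z) > cutoff
    decision = "reject H0" if extreme else "fail to reject H0"
    return z, cutoff, decision


z, cutoff, decision = z_test_steps(sample_mean=105, mu=100, sigma=15, n=25)
print(round(z, 2), round(cutoff, 3), decision)  # prints: 1.67 1.645 reject H0
```

The same data tested two-tailed (`tails=2`, cutoff of about 1.96) would fail to reject the null. This is why the one- versus two-tailed choice in step 3 has to be made before looking at the data.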
What is a sampling distribution and what role does it play in inferential statistics when N>1?
A sampling distribution (here, a distribution of means) is built by drawing a sample of N scores from the parent population, computing that sample's mean, and plotting it. This is repeated, in theory infinitely many times, and the resulting distribution of means tends to be approximately normal in shape. The samples come from a parent population in which the null hypothesis is true. The sampling distribution is important for inferential statistics when N > 1 because we need to compare like with like. It is mathematically incorrect to compare the mean of a sample of 20 to a distribution of individual scores (N = 1). Instead, we compare apples to apples by locating that sample of 20 on a distribution of means, the sampling distribution.
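A sampling distribution can be approximated by simulation. The sketch below uses a finite number of samples as a stand-in for the "infinite" process. It assumes a normal parent population with hypothetical values (mu = 50, sigma = 10) and samples of N = 20. The function name is also hypothetical.

```python
import random
from statistics import mean, stdev


def sampling_distribution_of_means(mu, sigma, n, n_samples, seed=0):
    """Approximate a sampling distribution: repeatedly draw a sample of size n
    from a normal parent population and keep only each sample's mean."""
    rng = random.Random(seed)
    return [
        mean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(n_samples)
    ]


means = sampling_distribution_of_means(mu=50, sigma=10, n=20, n_samples=5000)
print(round(mean(means), 1))   # close to the population mean, 50
print(round(stdev(means), 2))  # close to sigma / sqrt(N) = 10 / sqrt(20), about 2.24
```

The spread of the means is much narrower than the spread of individual scores (sigma = 10). This is why a sample mean has to be located on this distribution rather than on the distribution of single scores.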
F ratio of 1 F ratio >1
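An F ratio of about 1 represents no real difference between means. When the null hypothesis is true, the between-groups and within-groups estimates both reflect only noise, so their ratio hovers around 1. Because F is a ratio of two variance estimates, it can never be negative. The F distribution is therefore positively skewed, with most values bunched near 1 and a long right tail. An F ratio greater than 1 suggests there could be some real differences between group means. Only an F well above 1 can signal a difference between means, so we look for F > 1 and evaluate it in the right tail. The larger the F ratio, the stronger the evidence that between-group differences are real rather than sampling error.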
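The shape of the F distribution under the null can be illustrated by simulation. This sketch is a simplification and an assumption on my part: it divides two independent sample variances drawn from the same normal population, which is a stand-in for the ANOVA case where both estimates measure only noise. The function name and sample sizes are hypothetical.

```python
import random
from statistics import mean, median, variance


def null_variance_ratios(n1, n2, reps, seed=1):
    """Ratios of two independent sample variances from the SAME normal
    population, i.e. a situation where the null hypothesis is true."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        a = [rng.gauss(0, 1) for _ in range(n1)]
        b = [rng.gauss(0, 1) for _ in range(n2)]
        ratios.append(variance(a) / variance(b))
    return ratios


ratios = null_variance_ratios(n1=10, n2=30, reps=4000)
print(min(ratios) >= 0)  # True: a ratio of variances is never negative
print(round(median(ratios), 2), round(mean(ratios), 2))  # mean above median: long right tail
```

The ratios cluster near 1, and the mean sits above the median. That gap is the signature of the positive skew described above.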
Example
An example from my area of research would involve aggression scores and treatment groups. Each sample will have within-group variation due to general differences among people. For instance, there will be within-group variation in the control, CBT, and CBT + drug groups, because all people are different and will react slightly differently to any given treatment. However, if a treatment were to work, one would also see between-groups variation in my study, where there is a large difference between the means of two or more groups. For instance, if CBT was the best treatment, one would see a significant difference between the CBT mean and the control mean. This would indicate a real difference in the population due to the treatment, not just differences among people.
A formal paragraph statistically and substantively interpreting the results
At a .05 alpha level, we have sufficient evidence to reject the null hypothesis that the average physical aggression score is the same for participants who are given no treatment, who are given CBT for aggression, or who are given CBT + drug for aggression; F(2, 117) = 18.53, p < .001. Based on our study, we can conclude that the average physical aggression score is not the same for participants who are given no treatment, CBT for aggression, or CBT + drug for aggression. Follow-up tests are required to determine which groups are responsible for the significant omnibus finding.
Why do post hoc tests exploit the data?
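Because post hoc comparisons are chosen after looking at the sample, they capitalize on whatever pattern that particular sample happens to show. A sample can be unrepresentative or skewed by chance. Testing the differences that merely look largest raises the risk of a false positive (concluding that an effect is significant when it really is not); ego depletion is often cited as an example.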
Between groups variation and within groups variation in an F ratio
Between-groups variation is the difference in means between the comparison groups. It is the metaphorical "signal" in the statistical experiment, showing the degree to which there is a true difference between the comparison groups. Within-groups variation is the variation within one particular sample, that is, the variation among the people in that sample. This is the metaphorical "noise" in the statistical experiment that could hide a true difference among means. Just because population means are different does not mean within-group variation goes away. Because of within-group variation, sample means can differ from one another through sampling error alone, even when the population means are the same.
What you need to know for NHST with independent and dependent t tests
Know the comparison distribution, the test statistic, the degrees of freedom (see notes), and the research question each test can answer.
Independent: compares two independent group means at the same ecological level; the groups can be natural or experimentally manipulated.
The mean of the comparison distribution for z and t tests captures the null hypothesis: no difference, or zero.
Independent: the comparison distribution is a distribution of differences between means, centered on no difference between groups.
Dependent: calculate a difference score for each participant (change over time). The comparison distribution is a distribution of mean difference scores (made up of change scores), centered on no change over time.
Dependent samples t test
Conducting a dependent samples t test with a sample of prisoners from LA. The prisoners will complete a pre-measure of how many coping strategies they use to deal with anger. An anger cognitive behavioral treatment will then be administered, followed by a post-measure of the same count. This test will be used because we are comparing the same group twice, and the scores are dependent on each other because they come from the same participants. It also identifies whether the treatment was successful in increasing the number of coping strategies used to deal with anger. The dependent measure is a count of the different coping strategies employed, ranging from 1 to 10. The variable is approximately normally distributed for both pre and post scores. Null hypothesis: µ1 = µ2. Research hypothesis: µ1 < µ2. This t test is also appropriate because we are estimating the population standard deviation rather than knowing it.
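The dependent samples t test reduces each participant's pair of scores to one difference score. It then tests whether the mean difference departs from 0. This is a minimal sketch with a hypothetical function name and five hypothetical participants, not the study data.

```python
from math import sqrt
from statistics import mean, stdev


def dependent_t(pre, post):
    """Dependent (paired) samples t test on difference scores.

    Comparison distribution: distribution of mean difference scores,
    centered on 0 (no change) under the null hypothesis.
    """
    if len(pre) != len(post):
        raise ValueError("each participant needs a pre and a post score")
    diffs = [b - a for a, b in zip(pre, post)]  # post minus pre, one per person
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)  # population SD is estimated from the sample
    t = mean(diffs) / se
    return t, n - 1


# Hypothetical coping-strategy counts for five participants (not study data).
pre = [3, 4, 2, 5, 4]
post = [6, 7, 5, 8, 6]
t, df = dependent_t(pre, post)
print(f"t({df}) = {t:.2f}")  # prints: t(4) = 14.00
```

This sketch subtracts pre from post, so an increase gives a positive t. A negative t, such as t(49) = -31.89 in the formal write-up, is consistent with subtracting the other way (pre minus post). That choice flips only the sign, not the conclusion.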
Independent samples t-test
Conducting an independent samples t test with LA prisoners and their measure of trait anger, with the added variable of gender. The study aims to identify whether trait anger differs between a male prisoner sample and a female prisoner sample. Gender is a naturally occurring grouping, so it is not manipulated here. The two groups do not overlap, and they are at the same ecological level. Female prisoners are coded as 1 and male prisoners as 2. This test is also appropriate because we are estimating the population standard deviation and comparing two independent group means against each other. In terms of the metaphor, here we would be comparing a snowstorm to a snowstorm. The outcome variable of trait anger ranges from 1 to 10 and is approximately normally distributed. Null hypothesis: µ1 = µ2. Research hypothesis: µ1 < µ2.
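The independent samples t test compares the difference between two group means with the spread of the distribution of differences between means. This sketch assumes equal population variances and uses the pooled-variance version of the test. The function name and the tiny data sets are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group1, group2):
    """Independent samples t test using the pooled variance estimate.

    Comparison distribution: distribution of differences between means,
    centered on 0 (no group difference) under the null hypothesis.
    """
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its own df.
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    se_diff = sqrt(pooled / n1 + pooled / n2)  # SD of the distribution of differences
    t = (mean(group1) - mean(group2)) / se_diff
    return t, df


# Hypothetical trait-anger scores: group 1 = female, group 2 = male (not study data).
female = [2, 4, 6]
male = [5, 7, 9]
t, df = independent_t(female, male)
print(f"t({df}) = {t:.2f}")  # prints: t(4) = -1.84
```

With group 1 coded as female, a negative t means the female mean is lower. That is the direction predicted by the research hypothesis µ1 < µ2.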
Continuous random variable sample space Random phenomenon/statistical experiment
Continuous random variable: a quantitative variable (its values are numbers), such as a t score or z score, that is the outcome of a statistical experiment. The distance between any two observed values can be divided a theoretically infinite number of times. It is plotted on the x axis, which represents the sample space.
Sample space: the x axis; all possible outcomes or events.
Random phenomenon/statistical experiment: (1) all possible outcomes can be prespecified, (2) there is more than one possible outcome, and (3) the outcome is not known in advance (it is determined in part by chance).
Discrete Continuous
Discrete: The number of times a participant delivers an "electric shock" to the confederate. A participant cannot deliver half of a shock; they either give a shock or do not. It is therefore a discrete variable, counted in whole numbers. Continuous: Time on a Stroop task, used to test reaction time. It is continuous because it can be divided as finely as you want, such as into milliseconds.
Explain MCPs and when they would be appropriate Dunn-Bonferroni Dunnett
Dunn-Bonferroni: This would be appropriate if the comparison was a priori (planned), meaning I decided to conduct it before data collection began. It is appropriate for both pairwise and complex comparisons. I peeked at the group means before deciding on these comparisons (I knew the group means from week 4's homework). As a result, none of the comparisons of CBT vs. CBT + drug, control vs. CBT, or control vs. the average of CBT and CBT + drug would be appropriate for the Dunn-Bonferroni MCP.
Dunnett: This would be appropriate if the comparisons were a priori (planned) before looking at the data and were all pairwise. It is also only appropriate when each group mean is compared to a reference group, in other words, when every comparison is anchored to the same reference group. Suppose I had made the decision before the experiment started and were only doing the two pairwise comparisons of CBT vs. CBT + drug and CBT vs. control. In that case, Dunnett would be appropriate. However, adding the complex comparison of control vs. the average of CBT and CBT + drug would make it inappropriate. That comparison does not use the same reference group, and it is a complex rather than pairwise comparison. I also peeked at the data (from the week 4 homework), which also disqualifies this MCP.
How to interpret all parts of an F ratio write-up
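F(dfb, dfw) = F ratio, p value.
df between (dfb): the degrees of freedom for the between-groups variance estimate, a - 1 (the number of groups minus 1).
df within (dfw): add up the df of all groups (n - 1 for each group).
The F ratio is the test statistic located on the F distribution with those df. It compares the between-groups variance estimate with the within-groups estimate to see whether there is a real between-groups difference.
Interpretation (e.g., p = .03): there is a 3% likelihood of getting a test statistic this extreme if the null hypothesis is in fact true. Strictly speaking, this is not a 3% chance of a false positive (saying something is real when it is not); the false positive rate is set by alpha.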
Provide a formal paragraph interpreting MCPs
Follow-up tests were required to determine which groups are responsible for the significant omnibus finding. In particular, after looking at the group means, I made a post hoc decision to conduct two pairwise comparisons: CBT vs. CBT + drug, and CBT vs. control. As a result, either the Tukey or the Scheffé multiple comparison procedure was appropriate to adjust our familywise alpha level so it remains at the target level of alpha = .05. We compared the relative power of the two MCPs, and since Scheffé is less powerful, Tukey was chosen. At a .05 alpha level, we fail to reject the null hypothesis that the average physical aggression score is the same for participants who are given CBT and participants who are given CBT + drug (p = .42). Based on the Tukey correction, we do not have sufficient evidence to conclude that those given CBT (M = 20.88, SD = _) differ in physical aggression from those given CBT + drug (M = 23.35, SD = _).
Describing the first figures in the 4.1 handout
From the 4.1 handout, the first figure illustrates within-group variation. The figures on the left side show very low within-group variation, since all of the means are relatively close to the population mean. (The sample means are the dots, the population mean is the triangle, and all three population means are the same.) In the figures on the right, all the population means are again the same; however, there is much more within-group variation, meaning the population distributions are wider. As a result, the sample mean in each panel is farther away from the actual population mean (triangle). The wider distributions mean there is much more noise obscuring the samples taken. That noise would make it harder to tell whether there is an actual between-groups difference in the population, since there is so much more within-group variance in each sample taken to estimate the population.
GLM
General linear model (GLM): a broadly applicable statistical model for linear relations among variables. It is the underlying system of equations for regression-based statistical models and the foundational framework that allows us to run tests. ANOVA is a restricted case of regression, restricted to categorical explanatory variables. Y_ij (i and j are subscripts) is a common symbol in GLM notation for a one-way ANOVA. Y is the outcome variable score, and the subscripts identify the score: i is the ith participant and j is the jth group.
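For reference, textbooks commonly write the one-way ANOVA model in GLM notation as shown below. The grand mean µ, group effect α_j, and error term ε_ij are standard textbook symbols, not symbols defined elsewhere in these notes. Each score is split into the grand mean, the effect of being in group j, and that individual's error.

```latex
Y_{ij} = \mu + \alpha_j + \varepsilon_{ij}, \qquad
\alpha_j = \mu_j - \mu, \qquad
\varepsilon_{ij} = Y_{ij} - \mu_j
```

In this notation, the one-way ANOVA null hypothesis is that every group effect α_j equals 0, so every group mean µ_j equals the grand mean µ.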
MCPs
If choosing between two appropriate MCPs, you can let the data speak for themselves and choose the one that gives the lowest p value (the more powerful procedure). Impose MCPs after a significant omnibus finding, and only if we conduct more than one follow-up test. Multiple follow-up tests inflate the alpha level. Each comparison on its own gets the full 5% alpha, so as tests are added the familywise alpha creeps up, and we need to bring it back down to 5%. That is what MCPs adjust for. MCPs are not the follow-up tests themselves; the follow-up test is the test itself (e.g., an independent means t test).
Based on Lane and Sandor (2009), write a one- to two-paragraph critique of a peer-reviewed journal article
In 2013, Psychological Science published an article examining the effects of transcranial direct current stimulation on anger rumination (Kelly, Hortensius, & Harmon-Jones, 2013). The depression rumination and anger rumination literatures conflict over left versus right cortical activation. The researchers therefore aimed to resolve this debate and determine whether anger rumination was increased by left or right frontal cortical activation. Their main analyses revealed a significant difference in mean state rumination scores between two pairs of conditions: the sham stimulation vs. the increase in relative right frontal cortical activity, and the increase in relative left frontal cortical activity vs. the increase in relative right frontal cortical activity. There was also a significant difference in mean ruminative thoughts for the same two pairs of conditions. The authors depicted these results in two bar graphs (one for state rumination and one for ruminative thoughts). The graphs displayed standard error bars, with asterisks at the top of each graph to indicate significance. This was better than many graphs because it displayed inferential information, but it is not the most efficient way to display the data for understanding. Specifically, displaying means with standard error bars only supports inferences about individual means (Lane & Sandor, 2009). Further, standard error bars can be particularly problematic because the standard error of each mean does not directly show the standard error of the difference between means, which makes differences among means hard to judge (Lane & Sandor, 2009).
For instance, in Figures 1a and 1b of Kelly et al.'s article, it is hard to assess the overlap in standard error bars between conditions. This makes it difficult to determine whether there were significant differences. In addition, the significance bars added at the top of the graphs make them visually overwhelming. Lane and Sandor (2009) also recommend avoiding these because they imply a dichotomous decision to reject the null and neglect confidence intervals. To improve these graphs, I would display the information as box plots. This would let readers view the medians easily, along with new information about the range and interquartile range, making the conditions easier to compare than with the standard error bars of the original bar graphs. I would also add two more graphs, one for state rumination and one for ruminative thoughts, showing the confidence intervals of the two significant differences directly. The first difference is between the sham stimulation and the increase in relative right frontal cortical activity. The second is between the increase in relative left frontal cortical activity and the increase in relative right frontal cortical activity. Changing the original graphs into box plots and adding confidence intervals would increase the readability and effectiveness of this information.
Explain the logic of null hypothesis significance testing (NHST)
NHST is utilized by analyzing a sample in order to make inferences about a population. In essence, one can draw a conclusion based on a sample of n=1 if a score is extreme enough and exceeds the critical threshold of your alpha level. If it is extreme enough and passes the critical cutoff point, the result is statistically significant. Significance allows you to make an inference about the population.
How does calculating a z score for a single score relate to inferential statistics?
A z score lets us use a single score to make an inference about a population. The first step in making such an inference is to calculate a z score from the participant's raw score. Then one can compare that z score to the z distribution, using a z table, to decide whether the score goes beyond the critical threshold set by the alpha level. If it does, one can reject the null hypothesis and infer that this score is significantly different from the general population, meaning that person/group/snowflake is significantly different from the population/snowstorm. If it does not, one would fail to reject the null: there is not enough evidence to conclude that the score/snowflake differs from the population/snowstorm. Inferential statistics is the use of samples to make inferences about populations.
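For a single score, the whole inference is one z calculation plus a comparison with the cutoff. This sketch uses Python's `statistics.NormalDist` in place of a printed z table and assumes a two-tailed test. The function name and the values for the score and population (mu = 5, sigma = 1.2) are hypothetical.

```python
from statistics import NormalDist


def single_score_z(x, mu, sigma, alpha=0.05):
    """z score for one raw score, its two-tailed p value, and the decision."""
    z = (x - mu) / sigma
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed, abs(z) > cutoff


# Hypothetical: a participant scores 8 where the population has mu = 5, sigma = 1.2.
z, p, significant = single_score_z(8, mu=5, sigma=1.2)
print(round(z, 2), round(p, 3), significant)  # prints: 2.5 0.012 True
```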
Independent Dependent
Independent: Cognitive behavioral therapy (CBT) for anger management. Dependent: Aggressive behavior. CBT would explain the change in the outcome, aggressive behavior, so it is the independent variable. It is explanatory in nature, meaning it is the aspect that explains the variation in outcome scores and the aspect we are manipulating. Aggressive behavior is an outcome variable in nature: it would be a product of the explanatory variable and would therefore vary based on the CBT treatment. Aggressive behavior is the consequence and has variation in scores, and that variation is what we try to explain through science.
Interval Ratio
Interval: The distance between any two sequential values is the same. An example is temperature on the Fahrenheit scale, where 0 does not mean an absence of temperature. Interval variables are numeric and quantitative. Ratio: Aggression measured through the amount of hot sauce allocated to a confederate to eat. Hot sauce allocation has been used as a dependent measure of aggression. The participant (as a punishment to the confederate) chooses however much hot sauce they would like the confederate to eat, and the weight in grams is the total score. The score can be as low as 0 g (an absence of hot sauce, and thereby an absence of aggression), which makes it a ratio variable. The amount of hot sauce is also divided into equal intervals (in this case grams) to measure how high someone's aggression is. Both interval and ratio variables are numeric in nature because the values are numbers.
Logic of Null Hypothesis Significance Testing NHST
The logic needs to be applied flexibly to our own research once the steps are done. We often compare two populations at the same ecological level or at different ecological levels (e.g., UCI students vs. the general population). The comparison is made on a distribution where the null hypothesis is true, because the null hypothesis is what we test. We test it by seeing how far the sample departs from the mean, that is, how far into the tail of the distribution it falls. If it departs far enough from the mean of the null hypothesis distribution, we are allowed to reject the null. Because this is based on a single study, rejecting the null calls it into question rather than proving it false. This logic is the bedrock of inferential statistics, even after the structure of the steps is taken away.
Mediator Moderator
Mediator: Investigating the mediating role of anger rumination in the association between trait anger and aggression (whether anger rumination accounts for an indirect effect of trait anger on aggression). In this example, we would test whether trait anger affects anger rumination, which then affects aggression. Trait anger could partly lead to rumination, which would then increase aggression. This is a partial mediation model. Moderator: Investigating the moderating role of gender in the relations among anger, anger rumination, and aggression, to see whether anger rumination has more or less of an effect for women or for men. Gender as a moderator could change the relationship between anger rumination and aggression, in either the strength or the direction of the relation. For example, I would hypothesize that women ruminate more than men, which could indirectly explain their increase in aggression. Men, who would ruminate less, would show aggression that is more a direct result of anger.
Types of variables and examples Nominal Ordinal
Nominal: Also called categorical. Example: gender, which I would use to look at gender differences in anger rumination. Genders cannot be placed in a rank order or expressed as ratios, so they are categorical and nominal in nature. Ordinal: Numeric, quantitative. Example: a Likert-type pain rating scale, used with committed samples when investigating whether anger (and thereby aggression) is affected by pain. This would be used in a clinical sample and would be correlational in nature, because it is unethical to manipulate pain. It is numeric because the values are numbers. The scale can be ordered, so a 7 is greater than a 2, but the intervals are not equal. For example, the distance between a 9 and a 10 on the pain rating could be much larger than the distance between a 1 and a 2.
Interpreting formal results One sample t test Dependent samples t test Independent samples t test
One sample t test
At a .05 alpha level, we have sufficient evidence to reject the null hypothesis that the average trait anger score of LA prisoners and people in general is the same, t(99) = 3.32, p = .001. Based on our study, we can conclude that LA prisoners are significantly angrier on average (M = 5.67) than people in general (µ = 5).
Dependent samples t test
At a .05 alpha level, we have sufficient evidence to reject the null hypothesis that the average number of coping strategies used by LA prisoners to deal with anger does not change from before to after an anger cognitive behavioral therapy; t(49) = -31.89, p < .001. Based on our study, we can conclude that LA prisoners' coping strategies significantly increase from before (M = 3.74, SD = 1.32) to after (M = 7.00, SD = 1.13) an anger cognitive behavioral therapy, with an average difference of 3.26 in the number of coping strategies used.
Null hypothesis: the average difference score of LA prisoners is the same as the average difference score of people who do not change from before to after the intervention.
Independent samples t test
At a .05 alpha level, we have sufficient evidence to reject the null hypothesis that the average trait anger score of LA female prisoners and LA male prisoners is the same, t(98) = -13.87, p < .001. Based on our study, we can conclude that LA male prisoners are significantly angrier on average (M = 7.09, SD = 1.09) than LA female prisoners (M = 3.79, SD = 1.28).
Probability theory P or f(x) 0 <= p <= 1 pdf
P or f(x): the y axis. For a continuous variable, f(x) is the height (density) of the curve, and the probability of a range of x values is the area under the curve over that range. 0 <= p <= 1: probabilities always fall between 0 and 1. pdf: probability density function, a model whose precise curve is defined mathematically. Integral calculus gives the area under that curve, and thus a precise probability, for any interval of x values.
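The difference between the height of a pdf and a probability can be shown with Python's `statistics.NormalDist`. This sketch uses the standard normal purely as an example pdf. The narrow curve (sigma = 0.1) is a hypothetical case chosen to show that a density can exceed 1 even though probabilities cannot.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # standard normal, used only as an example pdf

# Height of the curve (density) at x = 0: a density, not a probability.
print(round(std_normal.pdf(0), 4))  # 0.3989

# Probability = area under the curve over an interval, computed from the CDF.
print(round(std_normal.cdf(1) - std_normal.cdf(-1), 4))  # 0.6827, i.e. P(-1 <= Z <= 1)

# A very narrow curve has density above 1 at its peak, yet every area stays between 0 and 1.
narrow = NormalDist(mu=0, sigma=0.1)
print(round(narrow.pdf(0), 2))  # 3.99
```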
What does a p value mean?
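P is the likelihood of getting a test statistic at least this extreme if the null hypothesis is true. It is often described as the probability that the experiment got it wrong, that is, committed a Type I error (saying a result is significant when it really is not, a false positive). Strictly speaking, though, p is not the probability of a false positive. The Type I error rate is set by alpha, the probability of rejecting the null when it is true.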
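The relationship between p and alpha can be checked by simulating many hypothetical experiments in which the null hypothesis is true. This sketch assumes a two-tailed z test with a known population (mu = 100, sigma = 15) and samples of 30. The function names and values are hypothetical. About 5% of these null-true experiments come out "significant" at alpha = .05, and that long-run rate is the Type I error rate.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def two_tailed_p(z):
    """Probability of a z at least this extreme (either tail) if H0 is true."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def type_i_error_rate(n_experiments, n=30, alpha=0.05, seed=2):
    """Simulate experiments where H0 is TRUE and count how often p < alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(100, 15) for _ in range(n)]  # population mu really is 100
        z = (fmean(sample) - 100) / (15 / sqrt(n))
        rejections += two_tailed_p(z) < alpha  # True counts as 1
    return rejections / n_experiments


print(round(two_tailed_p(1.96), 3))  # 0.05
print(type_i_error_rate(4000))        # close to alpha = .05
```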
Parametric Non Parametric
Parametric: A parameter is a fixed quantity describing a population. Parametric tests make assumptions about the population, specifically about population parameters. Examples: the t test, the F test, and every test in this class. Non-parametric: does not make those assumptions about population parameters.
Predictor Criterion
Predictor: Anger rumination Criterion: Aggression To see if the amount of anger rumination a person experiences can predict or account for their level of aggression. Anger rumination is the explanatory and predictor variable because it is the thing that explains the variation in the outcome variable. Aggression is the criterion and outcome variable because it is the aspect that varies based on the explanatory variable, aggression is the consequence or product of anger rumination. Since nothing is being manipulated, this would constitute a predictor/criterion relationship with regression.
p(S)=1
The probability of the entire sample space is 1: adding up the probabilities of all possible outcomes (events) in the sample space gives 1. In other words, 100% of scores fall in this distribution.
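For a continuous distribution, p(S) = 1 means the total area under the pdf is 1. This sketch approximates that area numerically with the midpoint rule. The helper name and step count are hypothetical choices, and the standard normal is only an example.

```python
from statistics import NormalDist


def area_under_pdf(dist, lo, hi, steps=100_000):
    """Approximate the integral of a pdf from lo to hi with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(dist.pdf(lo + (i + 0.5) * width) for i in range(steps)) * width


# Standard normal, integrated far enough into both tails to cover essentially all of it.
total = area_under_pdf(NormalDist(), -10, 10)
print(round(total, 6))  # 1.0
```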
Statistical vs practical significance
Simply crossing the statistical significance threshold does not mean the result is practical or meaningful, because a large sample size alone can make a result significant. The variance of the distribution of means is $\sigma_M^2 = \sigma^2 / N$, so $\sigma_M = \sqrt{\sigma^2 / N}$. The variance of the distribution of means therefore gets smaller when we divide by a bigger N. The test statistic is $z = (M - \mu_M) / \sigma_M$, so if the denominator is smaller, the test statistic is bigger, even when the difference between M and $\mu_M$ stays the same.
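The effect of N on z can be shown by holding a tiny raw difference constant and varying only the sample size. This sketch assumes a known population (mu = 100, sigma = 15) and a hypothetical sample mean of 101. The function name is also hypothetical.

```python
from math import sqrt


def z_for_mean(sample_mean, mu, sigma, n):
    """z = (M - mu_M) / sigma_M, with mu_M = mu and sigma_M = sqrt(sigma**2 / n)."""
    sigma_m = sqrt(sigma ** 2 / n)
    return (sample_mean - mu) / sigma_m


# Hypothetical: a 1-point difference (M = 101 vs mu = 100) on a scale with sigma = 15.
print("difference in SD units:", round((101 - 100) / 15, 3))  # 0.067 at every N
for n in (25, 400, 2500):
    print(n, round(z_for_mean(101, 100, 15, n), 2))
# 25 0.33, 400 1.33, 2500 3.33: only the largest N clears a two-tailed 1.96 cutoff
```

The difference in SD units never changes, so the practical size of the effect is the same in all three cases. Only the statistical verdict changes with N.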
What are standard(ized) scores and why are they valuable?
Standardized scores are in SD units, since the formula for a z score is (x - µ) / σ. It is much easier to compare standard scores than raw scores, especially across different scales or studies, because standard scores put all data on the same playing field. Different studies and scales use all different types of raw score metrics (1-100, 20-50, etc.), so it is difficult to compare unlike scores. Standardization converts all raw scores to the same metric. All of the information in the raw score is conserved in the standard score, which makes comparison easier. Standardized scores are also valuable because once you convert a raw score into a standard score, you can use the z distribution to derive more information. The benefit of the z distribution is that it is perfectly symmetrical and unimodal, and we know its mean (0) and standard deviation (1). It follows the 34-14-2 rule: about 34% of scores fall between the mean and 1 SD, another 14% fall between 1 and 2 SD, and about 2% fall beyond 2 SD (on each side of the mean). This is important because, with this rule and a z table, you can calculate how many scores fall above or below a certain point. This is integral to determining whether a score is statistically significant in null hypothesis significance testing, and standardized scores allow us to determine those results.
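Both uses of standard scores can be sketched briefly: putting two differently scaled measures on one metric, and checking the 34-14-2 rule against the exact normal curve. This sketch treats each list of scores as a whole population (hence `pstdev`), which is an assumption. The function name and score lists are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def z_scores(raw):
    """Convert raw scores to z scores: (x - mean) / SD, treating raw as the population."""
    m, sd = mean(raw), pstdev(raw)
    return [(x - m) / sd for x in raw]


# Two hypothetical scales with different raw metrics land on the same z metric.
print([round(z, 2) for z in z_scores([20, 30, 40])])     # [-1.22, 0.0, 1.22]
print([round(z, 2) for z in z_scores([200, 300, 400])])  # [-1.22, 0.0, 1.22]

# The 34-14-2 rule, checked against the standard normal distribution.
std = NormalDist()
print(round(std.cdf(1) - std.cdf(0), 2))  # 0.34: mean to 1 SD
print(round(std.cdf(2) - std.cdf(1), 2))  # 0.14: 1 SD to 2 SD
print(round(1 - std.cdf(2), 2))           # 0.02: beyond 2 SD (one side)
```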
F ratio in general: what does it mean?
The F ratio is the estimate of between-groups variation divided by the estimate of within-groups variation. The between-groups estimate reflects both within-group differences (noise) and any true group difference, while the within-groups estimate reflects noise alone. Dividing one by the other formally weighs the two so we can see whether there is a real difference beyond the noise. The fact that both are estimates is important: if we already knew the population values, there would be no point in conducting these tests. Building the ratio from estimates is what lets us use it to make inferences about our intended population. In short, the ratio (between/within) matters because it accounts for the contribution of error from within-group variance, so that a large F points to a real between-groups difference.
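The F ratio can be computed directly from its two ingredients. This minimal one-way ANOVA sketch uses a hypothetical function name and three tiny hypothetical groups, not study data. It computes the between-groups ("signal") and within-groups ("noise") sums of squares, divides each by its df, and takes the ratio.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    scores = [x for g in groups for x in g]
    grand_mean = fmean(scores)
    group_means = [fmean(g) for g in groups]
    k, n_total = len(groups), len(scores)

    # Between groups ("signal"): how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups ("noise"): how far each score sits from its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


# Three hypothetical groups (e.g., control, CBT, CBT + drug); not study data.
f, dfb, dfw = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"F({dfb}, {dfw}) = {f:.2f}")  # prints: F(2, 6) = 27.00
```

With the same df formulas, F(2, 117) corresponds to three groups and 120 participants in total.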
A one-way ANOVA: why it is appropriate for this study
The example study is trying to identify if physical aggression can be reduced with cognitive behavioral therapy (CBT) or cognitive behavioral therapy with a drug (CBT + drug). This is important to identify to see what is the best treatment option for people who are high in physical aggression—a more nuanced view of understanding the slight differences between treatment options—in order to help prevent violent outbursts. The explanatory variable groups in the investigation are 1-control (no treatment given for aggression), 2- CBT for aggression, and 3- CBT + drug for aggression. The outcome variable is the Buss Perry Physical Aggression score. A one-way ANOVA is the correct test for this analysis because we are comparing 3 independent group means against one another and a one-way ANOVA is used when you compare two or more group means. This test is also relevant because we have one categorical explanatory variable (the different treatment types) and one continuous outcome variable (the score on Buss Perry's Physical Aggression Questionnaire).
Describing the second figures in the 4.1 handout
The second figure on the 4.1 handout demonstrates between-groups variation. The three figures on the left all have the same population means. Although there is some within-group variation, the means of each group are still relatively close to the population mean (triangle). The three figures on the right, however, do not have the same population means at all, as shown by the triangles (means) and the distributions being in completely different places. There is still some within-group variation, as shown by the dots and the differences among scores. Even so, there is a large difference between the group means, with the second group mean in particular lying much farther to the right, which depicts actual between-groups variation.
t-test for independent means
The test statistic is standardized, so it is in SD units. It tells us how far the sample result is from the mean of the comparison distribution, where the mean is always 0 and each unit represents one standard deviation. How extreme the test statistic is determines whether population 1 is significantly different from population 2 (which is also the comparison distribution the test statistic is tested against). If the test statistic is more extreme than the critical cutoff one sets, we reject the null and conclude that the two populations are different.
Mathematically, why does Ψ = 0 when the null hypothesis is true?
This canceling out occurs in two settings, the population and the sample. In the population, it is needed so that Psi makes sense. You can express the null hypothesis in two ways: µ1 = µ2 or Ψ = 0. The canceling out is explained mathematically by the equation $\Psi = \sum_{j=1}^{a} c_j \mu_j$. Starting from group 1 and cycling up to the last group (a), sum the products of each group's contrast coefficient and population mean. The contrast coefficients act as weights that balance the scale: they sum to zero, and the coefficient attached to each mean decides which comparison is being made. When the null hypothesis is true, every population mean is the same value µ, so $\Psi = \mu \sum_{j=1}^{a} c_j = 0$. The population means therefore decide whether the scale becomes unbalanced; if it does, the null hypothesis is false. In a sample we set up the same scale, but now it weighs sample means against sample means and produces an estimate of psi, $\hat{\Psi}$. Conceptually the coefficients serve the same purpose. However, if psi is unbalanced in the sample, you must adopt the language of a sample: you reject or fail to reject the null depending on whether the estimate of psi is large enough.
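The balancing-scale idea can be checked numerically. This sketch computes a contrast estimate as the sum of coefficient times mean, rejecting coefficient sets that do not sum to zero. The function name and the unequal means (30, 20, 24) are hypothetical and not taken from the study.

```python
def contrast_estimate(coefficients, means):
    """Psi-hat = sum of c_j * M_j across groups; coefficients must sum to zero."""
    if abs(sum(coefficients)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    return sum(c * m for c, m in zip(coefficients, means))


# When H0 is true every group mean is equal, so the weights cancel: Psi = 0.
equal_means = [10.0, 10.0, 10.0]
print(contrast_estimate([1, -1, 0], equal_means))       # 0.0
print(contrast_estimate([1, -0.5, -0.5], equal_means))  # 0.0

# Hypothetical unequal means (control, CBT, CBT + drug): the scale tips.
means = [30.0, 20.0, 24.0]
print(contrast_estimate([0, 1, -1], means))       # -4.0: CBT vs CBT + drug (pairwise)
print(contrast_estimate([1, -0.5, -0.5], means))  # 8.0: control vs average of both treatments (complex)
```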
Tukey Scheffé
Tukey: This MCP can be used for planned or post hoc comparisons, but only for pairwise comparisons. Suppose I compared only CBT vs. CBT + drug and CBT vs. control. This MCP would still be appropriate, even though I looked at the group means first and decided to run these tests post hoc. However, if I wanted to include control vs. the average of CBT and CBT + drug, Tukey would no longer qualify, since that adds a complex comparison.
Scheffé: This MCP can be used for planned or post hoc comparisons and for pairwise or complex comparisons. Because it can be used under any research scenario, it is the most conservative and least powerful MCP. I made the decision after looking at the data (post hoc), and I want to run both the pairwise and complex comparisons (CBT vs. CBT + drug, control vs. CBT, and control vs. the average of CBT and CBT + drug). Scheffé would therefore be the only appropriate procedure that satisfies all the requirements of these follow-up tests.
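The selection rules for the four MCPs discussed in these notes can be summarized as a small decision function. This is a hypothetical helper that encodes only the rules as stated here (planned vs. post hoc, pairwise vs. complex, shared reference group). It is not a complete guide to choosing MCPs.

```python
def appropriate_mcps(planned, pairwise_only, same_reference_group=False):
    """Return MCPs that fit a follow-up scenario, following the rules in these notes.

    planned: comparisons were chosen before looking at the data (a priori).
    pairwise_only: every comparison is between two single group means.
    same_reference_group: every pairwise comparison shares one anchor group.
    """
    options = []
    if planned:
        options.append("Dunn-Bonferroni")  # planned; pairwise or complex
        if pairwise_only and same_reference_group:
            options.append("Dunnett")      # planned; pairwise against one reference group
    if pairwise_only:
        options.append("Tukey")            # planned or post hoc; pairwise only
    options.append("Scheffé")              # any scenario; most conservative
    return options


# Post hoc: CBT vs CBT + drug and CBT vs control (pairwise, shared reference group).
print(appropriate_mcps(planned=False, pairwise_only=True, same_reference_group=True))
# Post hoc, adding control vs average of CBT and CBT + drug (a complex comparison).
print(appropriate_mcps(planned=False, pairwise_only=False))
```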
When and why do we use multiple comparison procedures (MCPs)?
We use multiple comparison procedures after a significant initial "omnibus" test, and whenever we run more than one comparison test. For instance, in a one-way ANOVA, if we have sufficient evidence to reject the null hypothesis, we have found evidence of a difference in means between at least two of the groups we tested. Because a one-way ANOVA is an omnibus test, it does not tell us where the difference(s) occurred. This is when and why we need multiple comparison procedures: they are the procedures applied to the follow-up tests of the omnibus test, and they tell us exactly where the significant difference(s) occurred. Running multiple follow-up comparisons inflates the familywise alpha, so MCPs adjust the tests so that the familywise alpha across all follow-up tests stays at the level we choose, most often alpha = .05.
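The inflation can be made concrete with a small sketch. It assumes the follow-up tests are independent (only an approximation, since follow-up comparisons share data), and it uses the Bonferroni adjustment purely because its arithmetic is the simplest; Bonferroni is a different MCP from Tukey or Scheffé.

```python
# Familywise alpha for c follow-up tests, assuming (as an approximation) that the tests are independent.

def familywise_alpha(alpha_per_test, c):
    """Probability of at least one Type I error across c independent tests."""
    return 1 - (1 - alpha_per_test) ** c

def bonferroni_alpha(alpha_family, c):
    """Per-comparison alpha that keeps the familywise alpha at or below alpha_family."""
    return alpha_family / c

c = 3  # e.g., CBT vs. CBT + drug, CBT vs. control, CBT + drug vs. control
adjusted = bonferroni_alpha(0.05, c)
print(f"Familywise alpha, no adjustment:    {familywise_alpha(0.05, c):.4f}")
print(f"Bonferroni per-comparison alpha:    {adjusted:.4f}")
print(f"Familywise alpha after adjustment:  {familywise_alpha(adjusted, c):.4f}")
```

Running three unadjusted tests at .05 gives roughly a 14% chance of at least one false rejection, which is the inflation the notes describe.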
Explaining 4.1 handout graphs
Wider distribution: more within-group variance (more spread), so samples can be drawn from a wider range of scores, which produces more sampling error. Smaller/narrower distribution: less within-group variance (less spread), so samples are drawn from a narrower range of scores, which produces less sampling error.
Example
As an example from my area of research, suppose we conducted a one-way ANOVA on aggression with three treatment groups: control, CBT, and CBT + drug. If we obtained F = 1.07, the result would not be significant. An F close to 1 means the between-group variation is about the same size as the within-group variation, so the group means differ by no more than we would expect from sampling error alone, and we have no evidence of a true difference between them. However, if we ran the same test and obtained F = 39.43, this would be a significant result. It would mean that the between-group variation was much greater than the within-group variation, providing evidence of a true difference between at least two of the treatment groups.
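A from-scratch sketch of the F ratio makes the between/within logic visible. The aggression scores are invented for illustration only, and the function name is hypothetical; a real analysis would normally use a statistics package.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way between-subjects ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    a = len(groups)            # number of levels (groups)
    n_total = len(all_scores)  # total number of participants
    # Between-group variation: spread of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: spread of scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = a - 1, n_total - a
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical aggression scores, in the order: control, CBT, CBT + drug
similar_means = [[5, 7, 6, 8, 4], [6, 5, 7, 6, 7], [7, 5, 6, 4, 8]]
different_means = [[8, 9, 7, 8, 9], [6, 5, 7, 6, 6], [4, 3, 5, 4, 4]]

f_small, _, _ = one_way_f(similar_means)
f_large, df_b, df_w = one_way_f(different_means)
print(f"Similar group means:   F = {f_small:.2f}")
print(f"Different group means: F = {f_large:.2f} with df = ({df_b}, {df_w})")
```

With these made-up numbers the first F falls below 1, while the second comes out at roughly 39 because the group means are far apart relative to the spread within each group.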
Fac, a, levels or groups, p-value, alpha level, a priori
a: the number of levels or groups ("levels" and "groups" refer to the same thing). p-value: the precise probability assigned to our test statistic, that is, the probability of obtaining a test statistic at least as extreme as ours if the null hypothesis were true. alpha level: the threshold probability that our p-value must fall at or below for our results to be statistically significant. a priori: before the fact, planned. We should choose the alpha level (the finish line) before conducting the study.
s squared
The sample variance, used as an estimate of the population variance.
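A short sketch of s² using hypothetical scores. Dividing by N - 1 rather than N is what makes s² an unbiased estimate of the population variance; Python's standard `statistics.variance` uses the same N - 1 formula, while `statistics.pvariance` divides by N.

```python
from statistics import pvariance, variance

def sample_variance(scores):
    """s^2: sum of squared deviations from the sample mean, divided by N - 1."""
    n = len(scores)
    m = sum(scores) / n
    return sum((x - m) ** 2 for x in scores) / (n - 1)

scores = [4.0, 6.0, 5.0, 7.0, 3.0]  # hypothetical sample
print(sample_variance(scores))  # divides by N - 1
print(variance(scores))         # standard library, also N - 1
print(pvariance(scores))        # divides by N: treats these scores as the whole population
```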
event (A) 0 ≤ p(A) ≤ 1 continuous random variable
event (A): One of the outcomes in a sample space; one of the values of a random variable; a subset of outcomes. We can assign a specific probability to it. Tied to the x-axis of a probability distribution. Answers "How likely is it that this event will occur?" Combining this term with the sample space creates the probability distribution, where one assigns probabilities to each event. 0 ≤ p(A) ≤ 1: One rule of probability. The probability of any event is between 0 and 1 (0% and 100% likely to occur). For a discrete variable these probabilities are the heights along the y-axis; for a continuous variable they are areas under the curve. continuous random variable: A random variable is a variable whose value is the outcome of a random phenomenon or statistical experiment. For a continuous random variable the outcome is continuous: its sample space is infinitely divisible. Examples: normal distribution, z distribution, F distribution. The sample space is the range of values this type of variable can take.
Probability theory (look at picture in notes), long-run relative frequency, sample space
long-run relative frequency: The proportion of times a certain event occurs (its relative frequency) over repeated statistical experiments. It underlies the classic examples such as a dice throw or a coin toss. So a point on the distribution curve reflects the long-run relative frequency of that specific event (for a continuous variable, strictly the area under the curve around that point). sample space (S): The list or set of all possible outcomes of a statistical experiment; the range of values of a random variable. It is comprised of all the possible events. Combining this term with event creates the probability distribution, where one assigns probabilities to each event. The x-axis reflects the sample space.
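A quick simulation illustrates the long-run relative frequency idea with a coin. The fair coin and the fixed seed are assumptions of this sketch: as the number of tosses grows, the proportion of heads settles toward .5.

```python
import random

def relative_frequency_of_heads(n_tosses, seed=1):
    """Proportion of heads in n_tosses simulated tosses of a fair coin."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

for n in (10, 100, 1_000, 100_000):
    print(f"{n:>7} tosses: relative frequency of heads = {relative_frequency_of_heads(n):.4f}")
```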
Notation of NHST
Need to know what each step means, what the notation is, and what it means; look through the handouts for these.
When would these tests be appropriate? One sample t test
one sample t test: In the general population, a score of 5 on this measure of anger is average, but it is not known how much people vary on this trait. In this study, we are interested in whether LA County prisoners have higher trait anger scores than the general population. The range for the anger measure in the general population is 1-10, with an average of 5. This is the appropriate test because we know the population mean (5) but not the population standard deviation, so we must estimate the variance from our sample. In the general population, anger is assumed to be normally distributed: some people have very little or very high anger, while most people fall in the middle. This is also the appropriate test because we are comparing one sample group mean with the much bigger scale of the general population, and one-sample t-tests are used in this case. Consider the metaphor of a snowflake and a snowstorm: the one sample group mean is the snowflake, and the large, expansive general population is the snowstorm. The outcome variable is continuous. Null hypothesis: μ1 = μ2 (strictly, μ1 ≤ μ2 for this one-tailed test); research hypothesis: μ1 > μ2.
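The one-sample t test for this scenario can be sketched as follows. The ten anger scores are hypothetical, and 1.833 is the standard table value for df = 9, alpha = .05, one-tailed (matching the research hypothesis μ1 > μ2).

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_ONE_TAILED_DF9 = 1.833  # t table: df = 9, alpha = .05, one-tailed

def one_sample_t(scores, mu):
    """t for a single sample: (sample mean - population mean) / estimated standard error."""
    n = len(scores)
    se = stdev(scores) / sqrt(n)  # s estimates the unknown population SD
    return (mean(scores) - mu) / se, n - 1

anger = [7, 6, 8, 5, 7, 9, 6, 7, 8, 6]  # hypothetical prisoner anger scores
t, df = one_sample_t(anger, mu=5)       # 5 = known general-population mean
decision = "reject H0" if t > T_CRIT_ONE_TAILED_DF9 else "fail to reject H0"
print(f"t({df}) = {t:.2f}: {decision}")
```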
p(S) = 1, Event (A), LRRF
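p(S) = 1: All of the area under the curve (the entire sample space) adds up to 1. Event (A): Each t score is an outcome, and events defined by t scores (for example, obtaining a t at least as extreme as ours) can each be assigned a probability. For a continuous distribution like t, find that probability by locating the event on the x-axis and taking the area under the curve in the relevant region; the height on the y-axis is the probability density, not a probability itself. LRRF: long-run relative frequency. Use it as the title for the probability theory figure; this figure falls under the long-run relative frequency interpretation of probability.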
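Numerically integrating a density shows that the total area is 1 (p(S) = 1) and that the probability of an event is an area. The sketch uses the standard normal curve as a stand-in for a t curve (an assumption made so the formula stays short) and a simple midpoint rule; the area beyond ±10 is negligible.

```python
from math import exp, pi, sqrt

def normal_pdf(x):
    """Height of the standard normal curve at x."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def area(pdf, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the area under pdf between lo and hi."""
    width = (hi - lo) / steps
    return sum(pdf(lo + (i + 0.5) * width) for i in range(steps)) * width

total = area(normal_pdf, -10, 10)        # whole sample space: p(S)
upper_tail = area(normal_pdf, 1.96, 10)  # event: a score at or beyond 1.96
print(f"Total area: {total:.6f}")
print(f"Area beyond 1.96: {upper_tail:.4f}")
```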
probability density function statistical experiment or random phenomenon
probability density function: Used to create the distribution of a continuous random variable. It is more complicated to create because of the nature of the variable, and it relies on integral calculus. It gives the density (the height of the curve) at each possible value in the sample space; the probability that the variable falls within a range of values is the area under the curve over that range. This is the type we will mostly be using for our distributions. The area under the curve beyond our test statistic gives the precise probability of observing sample data at least this extreme if the null were true. statistical experiment or random phenomenon: An experiment that can have more than one possible outcome, where each possible outcome can be specified in advance; the set of those outcomes is the sample space. Whichever event we get from the sample space happens by chance. In NHST, this is what allows us to compare the sample to a sampling distribution drawn from the population. If a test statistic exceeds the critical cutoff point determined by the alpha level, a result that extreme would be unlikely to occur by chance if the null were true.
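The difference between density and probability, and the p-value as a tail area, can be checked with the standard library's error function. The sketch assumes a z (standard normal) comparison distribution and a hypothetical test statistic of 2.5; for a t test the same logic applies with the t curve.

```python
from math import erf, erfc, exp, pi, sqrt

def normal_pdf(z):
    """Height of the standard normal curve at z (a density, not a probability)."""
    return exp(-z * z / 2) / sqrt(2 * pi)

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def two_tailed_p(z):
    """Area in both tails beyond |z|: the p-value for a two-tailed z test."""
    return erfc(abs(z) / sqrt(2))

print(f"Height at z = 0: {normal_pdf(0):.4f}")                 # a density
print(f"P(-1 < Z < 1): {normal_cdf(1) - normal_cdf(-1):.4f}")  # an area, so a probability
z_obs = 2.5  # hypothetical test statistic
p = two_tailed_p(z_obs)
print(f"p = {p:.4f}: {'reject' if p <= 0.05 else 'fail to reject'} H0 at alpha = .05")
```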
Test statistic
A statistic produced from our sample data on which we conduct an inferential test. Because it is a statistic based on the sample, it is also a sample statistic. How? Calculate the test statistic: convert the raw sample mean into a standardized score (the difference between the sample mean and the comparison distribution's mean, in standard-error units), plot the test statistic along the comparison distribution, and assign a probability value to that test statistic.
t-test for a single sample t-test for dependent means
t-test for a single sample: The test statistic is standardized, so it is in (estimated) standard-error units, and its comparison distribution depends on the degrees of freedom. The test statistic is how far the sample mean falls from the mean of the comparison distribution, where the mean is always 0 and each unit is one estimated standard error. How extreme the test statistic is tells us whether this sample is significantly different from the target population it is tested against. If the test statistic surpasses the critical cutoff point one sets, then it is significantly different, and one can reject the null hypothesis that the two populations are the same. t-test for dependent means: The test statistic is standardized, so it is in (estimated) standard-error units. It is how far the sample's mean difference score falls from the mean of the comparison distribution, where the mean is always 0 and each unit is one estimated standard error. In this case, the comparison distribution is of mean difference scores, so how extreme the test statistic is tells us whether the sample's mean difference score is significantly different from the mean difference score of the target population (0). If the test statistic surpasses the critical cutoff point one sets, then it is significantly different.
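A sketch of the dependent-means t test, using hypothetical pre- and post-treatment aggression scores for eight participants. Each participant contributes one difference score, the comparison distribution is centered on 0, and 2.365 is the standard table value for df = 7, alpha = .05, two-tailed.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_TWO_TAILED_DF7 = 2.365  # t table: df = 7, alpha = .05, two-tailed

def dependent_t(before, after):
    """t for dependent means: mean difference score divided by its estimated standard error."""
    diffs = [a - b for b, a in zip(before, after)]  # one difference score per participant
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)   # SD of the difference scores / sqrt(N)
    t = (mean(diffs) - 0) / se    # 0 = mean of the comparison distribution under H0
    return t, n - 1

pre = [10, 12, 9, 11, 13, 10, 12, 11]  # hypothetical scores
post = [8, 11, 9, 9, 10, 9, 10, 10]
t, df = dependent_t(pre, post)
decision = "reject H0" if abs(t) > T_CRIT_TWO_TAILED_DF7 else "fail to reject H0"
print(f"t({df}) = {t:.2f}: {decision}")
```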
Comparison distributions for: t-test for a single sample
t-test for a single sample: • The t distribution for a single sample is still unimodal and symmetrical, but it has thicker tails than the z distribution (it is leptokurtic). This is different from the z distribution because we are now estimating the population variance and need to build in room for error. The thicker tails make it harder to pass the finish line. The t distribution is actually a family of t distributions that vary with sample size, using degrees of freedom (N - 1) as a proxy. The bigger the sample size, the better the sample variance estimates the population variance, so the t distribution gets closer and closer to a normal distribution as its tails thin out. • The comparison distribution is a distribution of means of samples (of the same size N) drawn from the target population, standardized into t scores; its mean is 0. • The comparison distribution is related to the null hypothesis because it is the distribution we would expect if the null hypothesis were true.
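The thicker tails, and their thinning as df grows, can be checked numerically. This sketch integrates the standard Student's t density beyond 1.96 (the z cutoff for alpha = .05, two-tailed) for a few illustrative df values; the chosen df values and the integration settings are assumptions of the sketch.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def normal_pdf(z):
    return exp(-z * z / 2) / sqrt(2 * pi)

def upper_tail(pdf, cutoff, hi=100.0, steps=100_000):
    """Midpoint-rule area under pdf from cutoff to hi (area beyond hi is negligible here)."""
    width = (hi - cutoff) / steps
    return sum(pdf(cutoff + (i + 0.5) * width) for i in range(steps)) * width

tails = {df: 2 * upper_tail(lambda x, d=df: t_pdf(x, d), 1.96) for df in (5, 30, 120)}
normal_tail = 2 * upper_tail(normal_pdf, 1.96)
for df, area in tails.items():
    print(f"df = {df:>3}: P(|t| > 1.96) = {area:.4f}")
print(f"normal   : P(|z| > 1.96) = {normal_tail:.4f}")
```

Every t curve puts more than 5% of its area beyond ±1.96, which is why t cutoffs are larger than 1.96; the excess shrinks as df increases.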
Psi and contrast coefficients
Under the umbrella of MCPs and follow-up tests: when conducting follow-up tests, we need hypotheses (null and research) for each test conducted. Null: μ1 = μ2. Translate the null into coefficients assigned to the means: μ1 - μ2 = 0. If the null hypothesis is true, the two means are the same, so subtracting them should give 0. This is why the coefficients must add up to 0: when all the population means are equal, coefficients that sum to 0 guarantee that the weighted sum (Ψ) is 0. From this you can assign the coefficients: (1)μ1 + (-1)μ2 = 0, so the coefficient for μ1 is 1 and the coefficient for μ2 is -1.
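Computing the estimate of psi from sample means is just a weighted sum. The group means below are hypothetical and the function name is illustrative; the check that the coefficients sum to 0 enforces the rule described above.

```python
from math import isclose

def psi_hat(coefficients, group_means):
    """Estimate of psi: sum of (contrast coefficient x sample mean) over the groups."""
    if not isclose(sum(coefficients), 0.0, abs_tol=1e-12):
        raise ValueError("contrast coefficients must sum to 0")
    return sum(c * m for c, m in zip(coefficients, group_means))

# Hypothetical sample means in the order: control, CBT, CBT + drug
means = [8.2, 6.0, 4.0]
pairwise = [0, 1, -1]       # CBT vs. CBT + drug
complex_ = [1, -0.5, -0.5]  # control vs. the average of the two CBT groups

print(psi_hat(pairwise, means))
print(psi_hat(complex_, means))
print(psi_hat(complex_, [5.0, 5.0, 5.0]))  # equal means: the scale balances at 0
```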
Explain what the test statistic means: z-test for a single score z-test for a single sample
z-test for a single score: The test statistic is standardized, so it is in SD units. It is how far the score falls from the mean of the comparison distribution, where the mean is always 0 and each unit is 1 standard deviation. How extreme the test statistic is tells us whether the single score is significantly different from the target population it is tested against. If the test statistic surpasses the critical cutoff point one sets, then it is significantly different. This test is used when one knows both the mean and the standard deviation of the population. z-test for a single sample: The test statistic is standardized, so it is in standard-error units. It is the distance of the sample mean from the mean of the comparison distribution, where the mean is always 0 and each unit is 1 standard error. How extreme the test statistic is tells us whether the sample mean is different enough from the target population to suggest that we should reject the null hypothesis. If the test statistic surpasses the critical cutoff point one sets, then it is significantly different.
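The two z formulas differ only in the standard deviation used: σ for single scores, and the standard error σ_M = σ/√N for sample means. The population values (μ = 100, σ = 15) and the observed score and sample mean are hypothetical, and 1.96 is the standard two-tailed cutoff for alpha = .05.

```python
from math import sqrt

Z_CRIT_TWO_TAILED = 1.96  # alpha = .05, two-tailed

def z_single_score(x, mu, sigma):
    """z for one score: distance from the population mean in SD units."""
    return (x - mu) / sigma

def z_single_sample(sample_mean, mu, sigma, n):
    """z for a sample mean: distance from mu in standard-error units."""
    sigma_m = sigma / sqrt(n)  # SD of the distribution of means
    return (sample_mean - mu) / sigma_m

z1 = z_single_score(130, mu=100, sigma=15)
z2 = z_single_sample(106, mu=100, sigma=15, n=25)
for label, z in (("single score", z1), ("single sample", z2)):
    verdict = "reject H0" if abs(z) > Z_CRIT_TWO_TAILED else "fail to reject H0"
    print(f"{label}: z = {z:.2f}, {verdict}")
```

A 6-point difference in the mean of 25 scores is as extreme (z = 2.00) as a 30-point difference for one score, because means vary much less than single scores.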
Comparison distributions for: z-test for a single score z-test for a single sample
z-test for a single score: • The comparison distribution is comprised of single scores, since this is a z-test for a single score. The comparison distribution must match the level of the data collected (a single score is compared with a distribution of single scores). • The mean of the comparison distribution is zero. • The comparison distribution is related to the null hypothesis because it is the distribution we would expect if the null hypothesis were true. • For a z-test for a single score, that distribution is the standardized z distribution: normal in shape, unimodal, and symmetrical. We also know the population mean and population variance, so we are able to make the comparison. z-test for a single sample: • The comparison distribution for a z-test for a single sample is a distribution of means (a sampling distribution). • It is made up of the means of samples of the same size N drawn from the target population. It is normal in shape (assuming a normal population or a large enough N), unimodal, and symmetrical, with a mean of 0 once standardized, and its standard deviation is the standard error of the mean, σ_M = σ/√N. Because the population variance is known, there is no need to build in room for estimation error, so a t distribution is not needed. • The comparison distribution is related to the null hypothesis because it is the distribution we would expect if the null hypothesis were true.
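The distribution of means can be approximated by simulation: draw many samples of size N from a population where the null is true and record each sample mean. The normal population with μ = 100 and σ = 15, N = 25, and the number of repetitions are assumptions of this sketch; the spread of the simulated means should come out close to σ/√N = 3.

```python
import random
from math import sqrt

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Means of `reps` random samples of size n from a normal(mu, sigma) population."""
    rng = random.Random(seed)
    return [sum(rng.gauss(mu, sigma) for _ in range(n)) / n for _ in range(reps)]

means = simulate_sample_means(mu=100, sigma=15, n=25, reps=10_000)
center = sum(means) / len(means)
spread = sqrt(sum((m - center) ** 2 for m in means) / len(means))
print(f"mean of sample means = {center:.2f} (population mean 100)")
print(f"SD of sample means   = {spread:.2f} (sigma / sqrt(N) = {15 / sqrt(25):.2f})")
```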
comparison distribution for t-test for dependent means
• For dependent means, the comparison distribution is a distribution of mean difference scores, built from an infinite number of samples of participants, each measured at two different time points (or, theoretically, on two questions). It is still unimodal and symmetrical, but it has thicker tails than the z distribution (it is leptokurtic). This is different from the z distribution because we are now estimating the population variance and need to build in room for error, just like the other t-test distributions. According to the null hypothesis for this test, there is no difference, so the population mean difference score is 0, and the mean of the comparison distribution is always 0. Because this is a t distribution, the thicker tails make it harder to pass the finish line. Its shape also depends on sample size: because the same participants provide both scores, there is one difference score per participant, and the degrees of freedom are N - 1 (N = number of participants). The bigger the sample size, the better the estimate of the population variance. • The comparison distribution is related to the null hypothesis because it is the distribution we would expect if the null hypothesis were true. To create it, one repeatedly draws samples of difference scores from the target population, each the same size as the sample being compared, calculates the mean difference score of each sample, and plots it on a probability distribution; doing this an infinite number of times creates the comparison distribution.
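The construction described above can be mimicked by simulation. Each repetition draws a sample of N = 10 difference scores from a population where the null is true (normal, mean 0, SD 1; these are assumptions of the sketch), standardizes the mean difference by its own estimated standard error, and records the resulting t. About 5% of these t values should land beyond 2.262, the df = 9 two-tailed cutoff for alpha = .05.

```python
import random
from math import sqrt

T_CRIT_TWO_TAILED_DF9 = 2.262  # t table: df = 9, alpha = .05, two-tailed

def simulate_null_t(n, reps, seed=0):
    """t values for `reps` samples of n difference scores with a true mean difference of 0."""
    rng = random.Random(seed)
    t_values = []
    for _ in range(reps):
        diffs = [rng.gauss(0, 1) for _ in range(n)]
        m = sum(diffs) / n
        s = sqrt(sum((d - m) ** 2 for d in diffs) / (n - 1))
        t_values.append(m / (s / sqrt(n)))
    return t_values

t_values = simulate_null_t(n=10, reps=20_000)
rejection_rate = sum(abs(t) > T_CRIT_TWO_TAILED_DF9 for t in t_values) / len(t_values)
print(f"Proportion of null t values beyond the cutoff: {rejection_rate:.3f}")
```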
comparison distribution for t-test for independent means
• The comparison distribution in a t-test for independent means is a sampling distribution of differences between two group means. This is different from the z distribution because we are now estimating the population variance and need to build in room for error, just like the other t-test distributions. The null hypothesis states there is no difference between the means of the two independent groups, so the comparison distribution has a mean of 0. Its shape also depends on sample size, but because a variance is estimated from each of the two groups, the degrees of freedom are N - 2, where N is the total number of participants across both groups (that is, (n1 - 1) + (n2 - 1)). The bigger the sample size, the better the estimate of the population variance. • The comparison distribution is related to the null hypothesis because it is the distribution we would expect if the null hypothesis were true. To create it, one repeatedly draws two samples from the target population, each the same size as the groups being compared, calculates the difference between the two sample means, and plots it on a probability distribution; doing this an infinite number of times creates the comparison distribution.
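A pooled-variance sketch of the independent-means t test. The two groups' aggression scores are hypothetical, the test assumes equal population variances (the usual assumption behind df = N - 2), and 2.306 is the standard table value for df = 8, alpha = .05, two-tailed.

```python
from math import sqrt

T_CRIT_TWO_TAILED_DF8 = 2.306  # t table: df = 8, alpha = .05, two-tailed

def independent_t(group1, group2):
    """Pooled-variance t for the difference between two independent group means."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss1 = sum((x - m1) ** 2 for x in group1)
    ss2 = sum((x - m2) ** 2 for x in group2)
    df = n1 + n2 - 2                                 # (n1 - 1) + (n2 - 1) = N - 2
    s2_pooled = (ss1 + ss2) / df                     # pooled estimate of the population variance
    se_diff = sqrt(s2_pooled / n1 + s2_pooled / n2)  # SD of the distribution of differences
    return (m1 - m2 - 0) / se_diff, df               # 0 = mean of the comparison distribution under H0

cbt = [6, 5, 7, 6, 6]      # hypothetical scores
control = [8, 9, 7, 8, 9]
t, df = independent_t(cbt, control)
decision = "reject H0" if abs(t) > T_CRIT_TWO_TAILED_DF8 else "fail to reject H0"
print(f"t({df}) = {t:.2f}: {decision}")
```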
Conceptually, why does Ψ = 0 when the null hypothesis is true?
• We estimate psi the way we estimate a population mean (i.e., with sample data). If the null hypothesis is true, psi equals zero; if the research hypothesis is true, psi does not equal zero. This is because psi is essentially a "balancing" act. If the null hypothesis is true, the population means are all equal, and because the contrast coefficients sum to zero, the weighted means cancel each other out (there is no difference), making psi zero.