Advanced Stats - Chapter 2

Standard error (standard error of the mean)

*The standard deviation of sample means.
*Could be calculated by taking the difference between each sample mean and the overall mean, squaring those differences, adding them up, dividing by the number of samples, and then taking the square root of that value.
*A large standard error (relative to the sample mean) means that there is a lot of variability between the means of different samples, so the sample we have might not be representative of the population.
*A small standard error indicates that most sample means are similar to the population mean, so our sample is likely to be an accurate reflection of the population.
*In the real world we cannot collect hundreds of samples, so we rely on an approximation: the sample standard deviation divided by the square root of the sample size.
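
As a rough illustration (not from the text), here is a minimal Python/NumPy sketch contrasting the repeated-sampling definition of the standard error with the usual single-sample approximation, s/√n; all values are made up:

```python
import numpy as np

rng = np.random.default_rng(42)
population = rng.normal(loc=100, scale=15, size=100_000)

# In practice: approximate the standard error from one sample as s / sqrt(n)
sample = rng.choice(population, size=50)
se_estimate = sample.std(ddof=1) / np.sqrt(len(sample))

# By definition: the standard deviation of many sample means
sample_means = [rng.choice(population, size=50).mean() for _ in range(5_000)]
se_by_definition = np.std(sample_means)

print(f"SE estimated from one sample:   {se_estimate:.3f}")
print(f"SE as SD of 5,000 sample means: {se_by_definition:.3f}")
```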

Effect size measures

*There are several effect size measures that can be used: 1) Cohen's d 2) Pearson's r 3) Glass's Δ (delta) 4) Hedges' g 5) Odds ratio/risk ratio
*Pearson's r is a good, intuitive measure, except when group sizes are very different.

Bonferroni correction

*Used to ensure that the cumulative Type I error remains below .05.
*Achieved by dividing the α-level by the number of tests and applying that stricter criterion to each individual test.
*Controls the familywise error rate.
*Does cause us to lose statistical power.
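
A minimal sketch of the idea; the five p-values below are hypothetical:

```python
# Hypothetical p-values from five tests conducted on the same data
p_values = [0.008, 0.021, 0.049, 0.003, 0.112]

alpha = 0.05
corrected_alpha = alpha / len(p_values)  # Bonferroni: test each at alpha / k

for p in p_values:
    verdict = "significant" if p < corrected_alpha else "not significant"
    print(f"p = {p:.3f} -> {verdict} at corrected alpha = {corrected_alpha:.3f}")
```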

Central limit theorem

*Tells us that as samples get large (usually 30 or greater) the sampling distribution of the mean is approximately normal, with a mean equal to the population mean.
*When the sample is small (less than 30) the sampling distribution is not normal: it has a different shape, known as the t-distribution.
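
A small simulation sketch (NumPy assumed, values made up) showing the theorem at work on a deliberately skewed population:

```python
import numpy as np

rng = np.random.default_rng(0)
# A skewed (exponential) population with mean 1
population = rng.exponential(scale=1.0, size=100_000)

for n in (5, 30, 200):
    means = [rng.choice(population, size=n).mean() for _ in range(2_000)]
    # As n grows, the sample means cluster more tightly around the population mean
    print(f"n = {n:>3}: mean of sample means = {np.mean(means):.3f}, "
          f"SD of sample means = {np.std(means):.3f}")
```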

Degrees of freedom

*Relate to the number of observations that are free to vary.
*The number of scores used to compute a total, adjusted for the fact that we're trying to estimate a population value (e.g., n − 1 when the sample mean is used to estimate the population mean).

Calculating confidence intervals in small samples

*For small samples the sampling distribution is not normal; it has a t-distribution.
*To construct a CI in a small sample we use the critical value of t (with n − 1 degrees of freedom) in place of the value from the normal distribution (1.96 for a 95% CI).
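
A minimal sketch, assuming SciPy is available; the scores are hypothetical:

```python
import numpy as np
from scipy import stats

scores = np.array([4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 4.4, 5.8])  # hypothetical data
n = len(scores)
mean = scores.mean()
se = scores.std(ddof=1) / np.sqrt(n)

# Critical t for a 95% CI with n - 1 degrees of freedom (instead of z = 1.96)
t_crit = stats.t.ppf(0.975, df=n - 1)
ci = (mean - t_crit * se, mean + t_crit * se)
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}), using t_crit = {t_crit:.3f}")
```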

Meta-analysis

*Involves computing effect sizes for a series of studies that investigated the same research question, and taking an average of those effect sizes.
*In meta-analysis each effect size is weighted by its precision before the average is computed, so that larger, less variable studies contribute more.
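
A sketch of one common weighting scheme (a fixed-effect, inverse-variance average); the study effect sizes and standard errors here are made up:

```python
import numpy as np

# Hypothetical effect sizes (d) and their standard errors from five studies
effects = np.array([0.42, 0.31, 0.55, 0.18, 0.47])
ses     = np.array([0.10, 0.22, 0.15, 0.30, 0.12])

# Weight each effect by its precision (1 / variance): precise studies count more
weights = 1 / ses**2
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1 / np.sum(weights))
print(f"Pooled effect: {pooled:.3f} (SE = {pooled_se:.3f})")
```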

Calculating error

*A deviation is the difference between the mean and an actual data point. *Deviations can be calculated by taking each score and subtracting the mean from it.

All or none thinking

*A major problem with NHST is that it encourages all-or-none thinking: if p < .05 then an effect is significant, but if p > .05 it is not.
*The dogmatic application of the .05 rule can mislead us.

Sample

*A smaller (but hopefully representative) collection of units from a population used to determine truths about that population. -the bigger the sample, the more likely it is to reflect the whole population.

Effect sizes

*A standardized measure of the size of an effect.
-Standardized = comparable across studies.
-Not as reliant on the sample size.
-Allows people to objectively evaluate the size of an observed effect.
*Cohen's benchmarks: d = 0.2 (small), d = 0.5 (medium), d = 0.8 (large). (The figures of 1%, 9% and 25% of variance explained belong to the corresponding r benchmarks of .10, .30 and .50; see "The correlation coefficient" below.)
**The size of an effect should be placed within the research context.
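
For illustration, a minimal sketch of one common pooled-standard-deviation variant of Cohen's d; the data are simulated:

```python
import numpy as np

def cohens_d(group1, group2):
    """Cohen's d using the pooled standard deviation (one common variant)."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    pooled_sd = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (np.mean(group1) - np.mean(group2)) / pooled_sd

rng = np.random.default_rng(1)
control = rng.normal(100, 15, size=40)
treatment = rng.normal(108, 15, size=40)  # true difference of about d = 0.5
print(f"d = {cohens_d(treatment, control):.2f}")
```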

One tailed test

*A statistical model that tests a directional hypothesis

Two tailed test

*A statistical model that tests a non-directional hypothesis

Modern approaches to theory testing

*The APA set up a task force to produce guidelines for reporting data in its journals.
-It acknowledged the limitations of NHST but didn't abandon it.
*It requires scientists to report useful things like CIs and effect sizes so that research findings can be evaluated without dogmatic reliance on p-values.

Parameters

*Estimated from the data (rather than being measured directly) and (usually) constants believed to represent some fundamental truth about the relations between variables in the model.
*Examples: the mean/median; correlation and regression coefficients.

Confidence intervals

*Boundaries within which we believe the true value of the parameter (e.g., the population mean) will fall.
*If the interval is narrow, the sample mean must be very close to the true mean.
*If the interval is wide, the sample mean could be very different from the true mean, indicating that it is a bad representation of the population.
*A 95% CI for the mean is a range of scores constructed such that the population mean will fall within this range in 95% of samples.
**The CI is not an interval within which we are 95% confident that the population mean will fall: for any particular interval, the population mean either is or is not inside it.

Familywise/experimentwise error rate

*Error rate across statistical tests conducted on the same data.

The basic principles of NHST

*A blend of Fisher's idea of using the probability value p as an index of the weight of evidence against a null hypothesis, and Jerzy Neyman and Egon Pearson's idea of testing a null hypothesis against an alternative hypothesis.
1) We assume that the null hypothesis is true (i.e., there is no effect).
2) We fit a statistical model to our data that represents the alternative hypothesis and see how well it fits (in terms of the variance it explains).
3) To determine how well the model fits the data, we calculate the probability (the p-value) of getting that "model" if the null hypothesis were true.
4) If that probability is very small (.05 or less) we conclude that the model fits the data well and that our initial prediction was right: we gain confidence in the alternative hypothesis.
**This process only works if we make our predictions BEFORE we collect the data.

T-distribution

*A family of probability distributions that change shape as the sample size gets bigger (when the sample is very big, it has the shape of a normal distribution).

Sampling distribution

*The frequency distribution of sample means (or of whatever parameter you're trying to estimate) computed from repeated samples of the same population.
*Tells us about the behavior of samples from the population: it is centered at the same value as the mean of the population.
*The average of all sample means is the population mean.

Null hypothesis significance testing (NHST)

*The most commonly taught approach to testing research questions with statistical models.
*It arose out of two different approaches: 1) Ronald Fisher's idea of computing probabilities to evaluate evidence 2) Jerzy Neyman and Egon Pearson's idea of competing hypotheses.

Types of hypothesis (Neyman & Pearson)

*Neyman & Pearson believed that scientific statements should be split into testable hypotheses: 1) Alternative hypothesis (or experimental hypothesis): the effect will be present 2) Null hypothesis: the effect is absent.

Type I error

*Occurs when we believe that there is a genuine effect in our population when, in fact, there isn't.
*Its probability is the α-level (usually .05).

Type II error

*Occurs when we believe that there is no effect in the population when, in reality, there is.
*Its probability is the β-level (often .2).

Fisher's p-value

*Only when there is a 5% chance (a probability of .05) of getting the data we have if no effect exists are we confident enough to accept that the effect is genuine.
*Fisher's basic point was that you should calculate the probability of an event and evaluate this probability within the research context.
*Although Fisher felt that p = .01 would be strong evidence to back up a hypothesis and p = .20 would be weak evidence, he never said p = .05 was in any way a special number.

The correlation coefficient

*Pearson's r is a measure of the strength of the relationship between two variables.
*Also used as an effect size:
-r = .10 (small effect): explains 1% of the total variance
-r = .30 (medium effect): accounts for 9% of the total variance
-r = .50 (large effect): accounts for 25% of the total variance
*The size of the effect should be placed within the research context.
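
A quick sketch, assuming SciPy, showing r and the variance it explains (r²) on simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)  # a built-in linear relationship plus noise

r, p = stats.pearsonr(x, y)
print(f"r = {r:.2f}, p = {p:.4f}, variance explained (r^2) = {r**2:.1%}")
```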

Test statistics

*The ratio of systematic variance (the effect) to unsystematic variance (the error): if our model is good, we'd expect it to explain more variance than it leaves unexplained.
*A statistic for which the frequency of particular values is known.
*The more variation our model explains compared to the variance it can't explain, the bigger the test statistic will be and the more unlikely it is to occur by chance.
*As test statistics get bigger, the probability of them occurring (if the null hypothesis is true) gets smaller.
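
To make the "signal over noise" ratio concrete, here is a sketch computing an independent-samples (Welch) t-statistic by hand and checking it against SciPy; the data are simulated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a = rng.normal(10.0, 2.0, size=30)
b = rng.normal(11.0, 2.0, size=30)

# Test statistic = effect / error: the difference between means
# divided by the standard error of that difference
se_diff = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
t_manual = (b.mean() - a.mean()) / se_diff

t_scipy, p = stats.ttest_ind(b, a, equal_var=False)
print(f"manual t = {t_manual:.3f}, scipy t = {t_scipy:.3f}, p = {p:.4f}")
```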

Sample size and statistical significance

*Sample size affects the standard error and hence the significance.
1) Sample size affects whether a difference between samples is deemed significant: larger samples have more power to detect effects.
2) Even a difference of practically zero can be deemed significant if the sample size is big enough, as the simulation below illustrates.
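
A simulation sketch of point 2 (data made up): the raw difference stays trivially small, but p shrinks as n grows:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
true_diff = 0.02  # a practically trivial difference between the groups

for n in (50, 1_000, 100_000):
    a = rng.normal(0.0, 1.0, size=n)
    b = rng.normal(true_diff, 1.0, size=n)
    t, p = stats.ttest_ind(a, b)
    print(f"n = {n:>6}: p = {p:.4f}")  # tends toward significance as n grows
```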

Sampling variation

*Samples will vary because they contain different members of the population.

Directional hypothesis

*States that an effect will occur, but it also states the direction of the effect.

Non-directional hypothesis

*States that an effect will occur, but it doesn't state the direction of the effect.

Error bars

*The CI is usually displayed using error bars.
*Error bars can represent the standard deviation or the standard error, but more often than not they show the 95% CI of the mean.
*Useful because if the 95% CI bars of any two means do not overlap, we can infer that these means are from different populations: they are significantly different.
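
A minimal plotting sketch, assuming Matplotlib and SciPy and using made-up group data:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(5)
groups = {"Control": rng.normal(10, 3, 40), "Treatment": rng.normal(12, 3, 40)}

means, half_widths = [], []
for scores in groups.values():
    se = scores.std(ddof=1) / np.sqrt(len(scores))
    means.append(scores.mean())
    half_widths.append(stats.t.ppf(0.975, df=len(scores) - 1) * se)  # 95% CI

plt.bar(list(groups.keys()), means, yerr=half_widths, capsize=8)
plt.ylabel("Mean score (error bars: 95% CI)")
plt.show()
```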

Statistical Power

*The ability of a test to detect an effect: the probability that a given test will find an effect, assuming one exists in the population.
*Power is tied to the Type II error rate (β): the power of a test can be expressed as 1 − β.
*We usually aim for power of .8, i.e., an 80% chance of detecting an effect if one genuinely exists.
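
As an illustration (assuming the statsmodels package), a power analysis asking how many participants per group an independent-samples t-test needs to detect a medium effect with the conventional power of .8:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Solve for the per-group sample size needed to detect d = 0.5
# with power = .80 at alpha = .05
n_needed = analysis.solve_power(effect_size=0.5, power=0.8, alpha=0.05)
print(f"About {n_needed:.0f} participants per group")  # roughly 64
```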

Population

*The collection of units (be they people, plankton, plants, cities, suicidal authors, etc) to which we want to generalize a set of findings or a statistical model.

Fit

*The degree to which a statistical model represents the data collected. *If our model is a poor fit of the observed data then the predictions we make from it will be equally poor.

The mean as a statistical model

*The mean is a hypothetical value (i.e. it doesn't have to be a value that actually exists in the data set). -it is a model created to summarize the data. *As such, the mean is a simple statistical model.

The mean

*The mean is the value from which the (squared) scores deviate least (it has the least error).

Method of least squares

*The principle of estimating parameters by minimizing the sum of squared errors.
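
A small numeric sketch (made-up scores) showing that, among all candidate single-value "models", the mean is the one that minimizes the sum of squared errors:

```python
import numpy as np

scores = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 8.0])

# Sum of squared errors for a grid of candidate single-value models
candidates = np.linspace(scores.min(), scores.max(), 601)
ss = [np.sum((scores - c) ** 2) for c in candidates]

best = candidates[np.argmin(ss)]
print(f"Value minimizing SS: {best:.2f}, sample mean: {scores.mean():.2f}")
```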

Confidence intervals and statistical significance

1) 95% confidence intervals that just about touch end to end represent a p-value of approximately .01 for testing the null hypothesis of no difference.
2) If there is a gap between the upper end of one 95% CI and the lower end of another, then p < .01.
3) A p-value of about .05 is represented by moderate overlap between the bars.

The power of a statistical test depends on the following:

1) How big the effect actually is, because bigger effects are easier to spot.
2) How strict we are about deciding that an effect is significant: the stricter the criterion (a smaller α), the harder it is to find an effect.
3) The sample size: larger samples have less sampling error.

Reasons why one-tailed tests are not a good idea:

1) If the result of a one-tailed test is in the opposite direction to what you expected, you cannot and must not reject the null hypothesis!
2) Relatedly, one-tailed tests are appropriate only if a result in the opposite direction to that expected would lead to the same action as a non-significant result.
3) Finally, one-tailed tests encourage cheating.
**Bottom line: use one-tailed tests only if you have a very good reason to do so.

Building statistical models

1) Scientists build statistical models of real-world processes in an attempt to predict how these processes operate under certain conditions. 2) The statistical model we build must represent the data collected (the observed data) as closely as possible. *Testing hypotheses involves building statistical models of the phenomenon of interest.

What can we conclude from statistical significance testing? (pg 75)

1) Does a significant result mean the effect is important? -No: statistical significance is not the same thing as practical importance, because the p-value from which we determine significance is affected by sample size.
2) Does a non-significant result mean that the null hypothesis is true? -No: a non-significant result tells us only that the effect is not big enough to be detected; it doesn't tell us that the effect is zero.
3) Does a significant result mean that the null hypothesis is false? -No.

Use the total error

1) We could just take the deviations between the mean and each data point and add them up, but we would get 0 because some are positive and some are negative: the deviations cancel out.
2) Therefore, we square each deviation.
3) Adding these squared deviations gives the sum of squared errors (SS).

We can use the _________________ and the ______________ to assess the fit of a model.

sum of squared errors; mean squared error *Large values relative to the data indicate a lack of fit.
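
A minimal sketch tying these together on made-up scores: raw deviations sum to zero, squaring them gives the SS, and dividing by the degrees of freedom gives the mean squared error:

```python
import numpy as np

scores = np.array([3.0, 5.0, 6.0, 6.0, 10.0])
mean = scores.mean()

deviations = scores - mean        # raw deviations sum to zero
ss = np.sum(deviations ** 2)      # sum of squared errors (SS)
ms = ss / (len(scores) - 1)       # mean squared error (here, the variance)

print(f"sum of deviations = {deviations.sum():.1f}")
print(f"SS = {ss:.1f}, mean squared error = {ms:.2f}")
```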

