Module 9: Inferential Statistical Methods
Standard error of the mean
aka: SEM. The standard deviation of a sampling distribution of the mean. The smaller the SEM (that is, the less variable the sample means), the more accurate the means are as estimates of the population value. SEM = standard deviation / √(sample size), so increasing the sample size increases the accuracy of the estimate.
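A minimal sketch of this formula in Python (the sample values are invented for illustration; scipy.stats.sem performs the same calculation):

```python
import numpy as np

scores = np.array([72, 85, 78, 90, 66, 81, 74, 88.0])  # hypothetical sample

sd = scores.std(ddof=1)            # sample standard deviation
sem = sd / np.sqrt(len(scores))    # SEM = SD / sqrt(n)
print(f"SD = {sd:.2f}, SEM = {sem:.2f}")
```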
Multiple comparison procedures
aka: post hoc tests. Statistical analyses whose function is to isolate the differences between group means that are responsible for rejecting the overall ANOVA null hypothesis.
The alternate hypothesis
aka: the research hypothesis (HA). Hypothesizes that there is a relationship between the variables.
McNemar's test
appropriate when the proportions being compared are from 2 paired groups (e.g., when a pretest-posttest design is used to compare changes in proportions on a dichotomous variable)
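A sketch of the classic (uncorrected) McNemar statistic, which depends only on the two discordant cells of the paired 2×2 table; the counts below are invented:

```python
from scipy.stats import chi2

# Discordant cells of a paired 2x2 table: participants who changed
# from pretest to posttest, one count for each direction of change
b, c = 15, 6  # hypothetical counts

stat = (b - c) ** 2 / (b + c)   # McNemar chi-squared statistic, df = 1
p = chi2.sf(stat, df=1)         # upper-tail probability
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```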
Inferential statistics
based on the laws of probability, provide a means for drawing conclusions about a population, given data from a sample
A critical region
by selecting a significance level, researchers establish a decision rule: the null hypothesis is rejected when the computed test statistic falls at or beyond the critical limits, because such a result is improbable if the null hypothesis is true
A one-sample t-test
can be used to compare the mean of a single group to a hypothesized value
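For example, with scipy (the scores and the hypothesized value of 100 are invented for illustration):

```python
from scipy import stats

sample = [103, 98, 110, 105, 95, 108, 102, 99]   # hypothetical scores
t, p = stats.ttest_1samp(sample, popmean=100)    # H0: population mean = 100
print(f"t = {t:.2f}, p = {p:.4f}")
```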
Nonparametric tests
do not estimate parameters and involve less restrictive assumptions about the shape of the variables' distribution than parametric tests. aka: distribution-free statistics. Most useful when data cannot in any manner be construed as interval-level, when the distribution is markedly non-normal, or when the sample size is very small.
Parametric tests
involve estimation of a parameter, require measurements on at least an interval scale, and involve several assumptions, such as the assumption that the variables are normally distributed in the population. More powerful than nonparametric tests, and generally preferred.
Point estimation
involves calculating a single descriptive statistic to estimate the population parameter. Point estimates convey no information about margin of error, so inferences about the accuracy of the parameter cannot be made with point estimation.
The Kruskal-Wallis test
nonparametric test based on assigning ranks to the scores of the various groups; used when the number of groups is greater than two and a one-way test for independent samples is desired
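A quick sketch with scipy (three hypothetical independent groups):

```python
from scipy import stats

g1 = [12, 15, 11, 14]   # hypothetical scores
g2 = [18, 20, 17, 19]
g3 = [13, 16, 12, 15]

h, p = stats.kruskal(g1, g2, g3)   # rank-based one-way test
print(f"H = {h:.2f}, p = {p:.4f}")
```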
Sum of squares within groups
or SSW. The sum of squared deviations of each individual score from its own group's mean. Indicates variability attributable to individual differences, measurement error, and so on.
Sum of squares between groups
or the SSB. The sum of squared deviations of individual group means from the overall grand mean for all participants. SSB reflects variability in scores attributable to the independent variable.
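To make the decomposition concrete, a small sketch computing both sums of squares by hand (group scores are invented):

```python
import numpy as np

groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 8.0, 9.0]),
          np.array([5.0, 6.0, 7.0])]   # hypothetical data

grand_mean = np.concatenate(groups).mean()
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)            # within groups
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # between groups
print(f"SSW = {ssw:.2f}, SSB = {ssb:.2f}")
```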
Two broad classes of statistical tests
parametric and nonparametric
Degrees of freedom
refers to the number of observations free to vary about a parameter; abbreviated df
Type I error
rejecting a null hypothesis that is true; a false-positive conclusion
The null hypothesis
states that there is no relationship between the variables. It cannot be demonstrated directly that the research hypothesis is correct, but, using theoretical sampling distributions, it can be shown that the null hypothesis has a high probability of being incorrect. Researchers seek to reject the null hypothesis through various statistical tests.
Test for dependent groups
the appropriate statistical test in a crossover design, since the same people are used in all conditions
A sampling distribution of the mean
the basis of inferential statistics; a theoretical frequency polygon constructed from the means of multiple samples drawn from a population. Because a sampling distribution of means is normally distributed, 68% of the sample means fall within 1 SEM of the population mean.
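A simulation sketch (the population and sample size are chosen arbitrarily) showing that the SD of the sample means approximates σ/√n; because the population here is deliberately skewed, it also illustrates the central limit theorem entry below:

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=10, size=100_000)   # non-normal population

n = 50
means = [rng.choice(population, size=n).mean() for _ in range(5_000)]
print(f"SD of sample means = {np.std(means):.3f}")
print(f"sigma / sqrt(n)    = {population.std() / np.sqrt(n):.3f}")
```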
The Mann-Whitney U test
the nonparametric analog of an independent-groups t-test; involves assigning ranks to the two groups of scores. The sums of the ranks for the two groups can be compared by calculating the U statistic.
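For example (hypothetical scores for two independent groups):

```python
from scipy import stats

treatment = [7, 9, 6, 8, 10]   # hypothetical scores, group 1
control   = [4, 5, 6, 3, 5]    # hypothetical scores, group 2

u, p = stats.mannwhitneyu(treatment, control, alternative="two-sided")
print(f"U = {u}, p = {p:.4f}")
```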
A binomial distribution
the probability distribution of the number of "successes" (e.g., heads) in a sequence of independent yes/no trials (e.g., coin tosses), each of which yields "success" with a specified probability
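A worked example with scipy, for 10 fair coin tosses:

```python
from scipy import stats

# Probability of exactly 7 heads in 10 fair tosses
print(stats.binom.pmf(7, n=10, p=0.5))   # ~0.117

# Probability of 7 or more heads (upper tail)
print(stats.binom.sf(6, n=10, p=0.5))    # ~0.172
```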
Sampling error
the tendency for statistics to fluctuate from one sample to another
Central limit theorem
the theory that when samples are large, the theoretical distribution of sample means tends to follow a normal distribution, even if the variable itself is not normally distributed in the population. With small sample sizes, you cannot rely on the central limit theorem, so probability values could be wrong if a parametric test is used.
Two-tailed tests
used in most hypothesis-testing situations; both tails of the sampling distribution are used to determine improbable values
Parameter estimation
used to estimate a parameter (for example, a mean, a proportion, or a mean difference between groups). Can take two forms: point estimation or interval estimation.
The chi-squared test
used to test hypotheses about group differences in proportions, as when a crosstabs table has been created. Computed by comparing observed frequencies (i.e., values observed in the data) with expected frequencies; expected frequencies are the cell frequencies that would be found if there were no relationship between the two variables. Enables us to decide whether a difference in proportions of this magnitude is likely to reflect a real treatment effect or only chance fluctuations.
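A quick sketch with scipy, using an invented 2×2 crosstabs table (note that scipy applies Yates' continuity correction to 2×2 tables by default):

```python
import numpy as np
from scipy import stats

# Hypothetical crosstabs: rows = groups, columns = outcome
observed = np.array([[30, 20],
                     [18, 32]])

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
print("expected frequencies:\n", expected)
```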
Fisher's exact test
used to test the significance of differences in proportions when the total sample size is small or when there are cells with small expected frequencies (5 or fewer)
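For example (an invented table with small cell counts):

```python
from scipy import stats

table = [[3, 9],
         [8, 2]]   # hypothetical 2x2 table

odds_ratio, p = stats.fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.4f}")
```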
The Wilcoxon signed-rank test
used when ordinal-level data are paired (dependent); involves taking the difference between paired scores and ranking the absolute differences
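A quick sketch (hypothetical paired scores):

```python
from scipy import stats

pre  = [10, 12, 9, 14, 11, 13]   # hypothetical paired scores
post = [12, 15, 10, 16, 13, 16]

w, p = stats.wilcoxon(pre, post)   # ranks the absolute paired differences
print(f"W = {w}, p = {p:.4f}")
```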
Interval estimation
useful because it indicates a range of values within which the parameter has a specified probability of lying. With interval estimation, researchers construct a confidence interval (CI) around the estimate; the upper and lower limits are the confidence limits. Calculating confidence limits involves using the SEM.
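A sketch of a 95% CI around a sample mean, built from the SEM and the t distribution (data invented):

```python
import numpy as np
from scipy import stats

sample = np.array([103, 98, 110, 105, 95, 108, 102, 99.0])  # hypothetical

mean = sample.mean()
sem = stats.sem(sample)                          # SD / sqrt(n)
t_crit = stats.t.ppf(0.975, df=len(sample) - 1)  # two-sided 95% critical value

lower, upper = mean - t_crit * sem, mean + t_crit * sem
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```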
Test for independent groups
when comparisons involve different people (e.g., men vs. women), the study uses a between-subjects design, and a test for independent groups is appropriate
Friedman test
when multiple measures are obtained from the same subjects, the Friedman test, an analysis of variance by ranks, can be used
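For example (hypothetical scores for the same 5 subjects at three time points):

```python
from scipy import stats

t1 = [10, 12, 13, 9, 11]   # hypothetical repeated measures
t2 = [12, 14, 15, 11, 12]
t3 = [14, 15, 17, 12, 14]

chi2, p = stats.friedmanchisquare(t1, t2, t3)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```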
Statistically significant
when researchers calculate a test statistic that is beyond the critical limit. Significant means that the obtained results are not likely to have been the result of chance, at a specified level of probability.
Statistical inference consists of 2 techniques:
1. Estimation of parameters
2. Hypothesis testing
The process of testing hypotheses
1. Select an appropriate test statistic
2. Establish the level of significance
3. Select a one-tailed or two-tailed test
4. Compute a test statistic
5. Determine the degrees of freedom
6. Compare the test statistic with a tabled value (these steps are sketched in code below)
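A sketch of the full process by hand for an independent-groups t-test at α = .05 (data invented; the critical value comes from scipy's t distribution rather than a printed table):

```python
import numpy as np
from scipy import stats

group_a = np.array([103, 98, 110, 105, 95.0])   # hypothetical data
group_b = np.array([99, 94, 101, 96, 92.0])

# Steps 1-5: choose the t statistic, compute it, and find df
n1, n2 = len(group_a), len(group_b)
pooled_var = (((n1 - 1) * group_a.var(ddof=1) +
               (n2 - 1) * group_b.var(ddof=1)) / (n1 + n2 - 2))
t = (group_a.mean() - group_b.mean()) / np.sqrt(pooled_var * (1/n1 + 1/n2))
df = n1 + n2 - 2

# Step 6: compare with the critical value (alpha = .05, two-tailed)
t_crit = stats.t.ppf(0.975, df)
print(f"t = {t:.2f}, df = {df}, critical value = {t_crit:.2f}")
print("reject H0" if abs(t) > t_crit else "retain H0")
```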
Researchers can make 2 types of statistical error:
1. Rejecting a true null hypothesis (a Type I error)
2. Accepting a false null hypothesis (a Type II error)
Negative result
A nonsignificant result means that an observed result could reflect chance fluctuations. When the null hypothesis is retained (i.e., when the results are nonsignificant), negative results are usually inconclusive and difficult to interpret: a nonsignificant result indicates only that the outcome could have occurred by chance, and provides no evidence that the research hypothesis is or is not correct.
Analysis of variance
ANOVA, the parametric procedure for testing differences between means when there are 3 or more groups; computes the F-ratio statistic. Decomposes total variability in a dependent variable into two parts: 1. variability attributable to the independent variable and 2. all other variability, such as individual differences, measurement error, and so on. Variation between groups is contrasted with variability within groups to yield an F-ratio; when differences between groups are large relative to variation within groups, the probability is high that the independent variable is related to, or has caused, group differences.
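A quick sketch of a one-way ANOVA with scipy (the same kind of hypothetical groups as in the sums-of-squares entries above):

```python
from scipy import stats

g1 = [4.0, 5.0, 6.0, 5.5]   # hypothetical independent groups
g2 = [7.0, 8.0, 9.0, 8.5]
g3 = [5.0, 6.0, 7.0, 6.5]

f, p = stats.f_oneway(g1, g2, g3)   # F = MS-between / MS-within
print(f"F = {f:.2f}, p = {p:.4f}")
```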
Main effects vs. interaction effects
A main effect is the effect of one independent variable on the dependent variable, considered on its own. An interaction concerns whether the effect of one independent variable is consistent for all levels of a second independent variable.
One-way vs Two way vs Repeated-measures ANOVA
One-way ANOVA: when there is one independent variable with 3 or more groups
Two-way ANOVA: when there is more than one independent variable
Repeated-measures ANOVA (RM-ANOVA): when there are multiple means being compared over time
Statistical tests to measure the magnitude of bivariate relationships and to test whether the relationship is significantly different from zero includes...
Pearson's r for continuous data
Spearman's rho and Kendall's tau for ordinal-level data
The phi coefficient and Cramér's V for nominal-level data
A point-biserial correlation coefficient when one variable is dichotomous and the other is continuous
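A sketch showing each coefficient computed with scipy (all data invented):

```python
from scipy import stats

x = [1.2, 2.4, 3.1, 4.8, 5.0, 6.3]   # hypothetical continuous variable
y = [2.0, 2.9, 3.5, 5.1, 4.8, 6.0]   # hypothetical continuous variable
d = [0, 0, 1, 1, 0, 1]               # hypothetical dichotomous variable

print(stats.pearsonr(x, y))          # continuous data
print(stats.spearmanr(x, y))         # ordinal (rank-based)
print(stats.kendalltau(x, y))        # ordinal (rank-based)
print(stats.pointbiserialr(d, y))    # dichotomous vs. continuous
```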
Test statistics
Researchers compute test statistics with their data. For every test statistic, there is a related theoretical distribution. The value of the computed test statistic is compared to the critical limits for the applicable distribution.
Level of significance
Researchers control the risk of a Type I error by selecting a level of significance, which signifies the probability of incorrectly rejecting a true null hypothesis. The 2 most frequently used significance levels (referred to as alpha, or α) are .05 and .01. The minimum acceptable level for α usually is .05; a stricter level (.01) may be needed when the decision has important consequences.
The mean square
The variance is conventionally referred to as the mean square or MS
Paired t-test
When the means for two sets of scores are not independent, researchers should use a paired t-test (a t-test for dependent groups).
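For example (hypothetical pretest and posttest scores for the same people):

```python
from scipy import stats

pre  = [10, 12, 9, 14, 11, 13]   # hypothetical pretest scores
post = [12, 15, 10, 16, 13, 16]  # posttest scores, same participants

t, p = stats.ttest_rel(pre, post)   # t-test for dependent (paired) groups
print(f"t = {t:.2f}, p = {p:.4f}")
```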
One-tailed test
When researchers have a strong basis for a directional hypothesis, they sometimes use a one-tailed test. In one-tailed tests, the critical region of improbable values is in only one tail of the distribution: the tail corresponding to the direction of the hypothesis. One-tailed tests are less conservative, so it is easier to reject the null hypothesis with a one-tailed test; their use is controversial.
Type II error
a false-negative conclusion; accepting a false null hypothesis
T-test
a parametric procedure for testing differences in group means; can be used when there are 2 independent groups and also when the groups are dependent (paired)
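The independent-groups version with scipy (invented data), the library counterpart of the by-hand walkthrough in the hypothesis-testing entry above:

```python
from scipy import stats

group_a = [103, 98, 110, 105, 95]   # hypothetical scores
group_b = [99, 94, 101, 96, 92]

t, p = stats.ttest_ind(group_a, group_b)   # independent-groups t-test
print(f"t = {t:.2f}, p = {p:.4f}")
```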