Exam #2, Chapters 8,12,13,14
random sampling
a subset of a statistical population in which each member of the population has an equal probability of being chosen
critical value
actual score (e.g. z or t value) that falls at the alpha level and marks the boundary of the rejection region
central limit theorem
application? info required? describes the sampling distribution of the mean without actually taking many sample means. if you know the M and SD of the population and N, it gives you the M and SD of the sampling distribution of the mean and the shape of that distribution, which approaches a normal distribution as N increases (rule of thumb: N>30 = approximately normal)
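A minimal simulation sketch of the idea on this card, using only the Python standard library. The population (scores 1 to 100), the sample size of 36, and the function name `simulate_sample_means` are made up for illustration; the point is only that the SD of many simulated sample means lands close to population SD/rootN.

```python
import random
import statistics
from math import sqrt

def simulate_sample_means(population, n, draws=5000, seed=0):
    """Draw `draws` random samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)]

# Hypothetical, deliberately non-normal (uniform) population of scores 1..100
population = list(range(1, 101))
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population SD

n = 36
means = simulate_sample_means(population, n)

predicted_se = sigma / sqrt(n)          # what the central limit theorem predicts
observed_se = statistics.pstdev(means)  # SD of the simulated sample means

print("population mean vs. mean of sample means:", round(mu, 2), round(statistics.fmean(means), 2))
print("predicted vs. observed standard error:", round(predicted_se, 2), round(observed_se, 2))
```

The two numbers on each printed line should be close but not identical, because the simulation itself is subject to sampling error.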
alpha level
area under the curve of the rejection region, the probability of Type I error in any hypothesis test
rejection region
area of the sampling distribution containing values that would have an extremely low probability of occurring if the null hypothesis were true
lower the N, flatter the distribution, DF=N-1
basic idea of the shape of the t distributions and how it varies by degrees of freedom
2, 1
how many critical values are there for a 2-tailed test? one tailed?
compare the data to the prediction, and decide whether to retain or reject the null
how to make a decision
H0:muD = 0 H1:muD does not equal 0 (2 tailed) H1:muD>0 (one tailed positive) H1:muD<0 (one tailed negative) (MuD=Mu1-Mu2)
hypothesis notation for a paired t-test
H0:mu1-mu2=0 (null) H1:mu1-mu2 does not equal 0 (2-tailed) H1:mu1-mu2<0 (one tailed negative) H1:mu1-mu2>0 (one tailed positive)
hypothesis notation for independent samples t-test
Mu1-Mu2
in all cases Mu subscript D =_________________
degrees of freedom
is a correction factor, says that given a particular sample mean only this many observations are really free to vary
the prediction of the researcher (research question); the data itself
decision on one or two tailed test is based on __________________________ NOT __________________
95%
default confidence interval
N-1
degrees of freedom calculation for a one-sample t-test
(N1+N2)-2
degrees of freedom calculation for independent t-tests
independent samples t-test formula
denominator often written as standard error of the difference
z-test: known population M and SD; denominator is population SD/rootN; can compare z-value with critical value using normal distribution table. t-test: hypothesized M, unknown population SD; population SD estimated using sample SD; denominator is sample SD/rootN; cannot use Appendix E.10 to compare t-values with critical values
differences between Z-test and t-test
discrete probability distribution
distribution of a variable that can only take on certain values; tells you the probability of each possible outcome. x axis = possible values, y axis = probability of each value. e.g. number of times heads will be flipped on 2 flips
continuous probability distribution
distribution of a variable that can take on any value within a range; gives the probability that a score falls in a certain slice of the distribution. x axis = possible values, y axis = probability density (the area under the curve over a slice gives the probability). e.g. maternal age at first childbirth
standard error
first symbol (before the = sign)
row = calculated degrees of freedom, column = alpha level and one vs. 2 tailed test (if the statistic is extreme enough to be in the rejection region, reject the null and retain the alternative, etc.). need to know whether the test is 1 tailed or 2 tailed. look at the df (one-tailed alpha level is .05)
how to find critical values using appendix E.6
one-sample t-test
used when the population variance is unknown; compares a sample mean to a hypothesized population value (the population mean under the null hypothesis); computed in a similar way to the z-statistic, but we use the sample SD to estimate the population SD; need to know target sample mean, target sample SD, target sample size, hypothesized mean
z-test
useful when comparing a sample mean to a population and when you know the population mean and the population variance (OR SD); convert the sample mean to a z score using the central limit theorem, then find the probability of getting that z score using area under the curve; must know population mean and SD
each tail of alpha for a 2-tailed test
(1-C)/2
area under 1 extreme
1 tailed alpha level
alpha level
1-C
confidence level
1-alpha =.95, "I have 95% confidence that the mean is between this particular score and that particular score".
sum of the area under both extremes
2 tailed alpha level
one-sample, independent
2 types of confidence intervals
"...95% CI [bottom z*, top z*]
APA style confidence interval
confidence level
C
population mean of difference scores for paired t-test
Mu with a subscript D
population mean of score set #1 for paired t-test
Mu1 (first one without the squiggle)
population mean of score set #2 for paired t-test
Mu2 (second one after the minus sign without the squiggle)
number of difference scores
N subscript D
standard deviation of difference scores
S subscript D
standard error of difference scores
S subscript Dbar
paired t-test formula
SD is OVER rootN
standard error
SD of the sampling distribution of the mean
sample size, larger
__________ matters in a sampling distribution; _______________ samples will have means closer to the population mean
sampling distribution
allows us to quantify sampling error more precisely, which we need for more precise judgement; the probability distribution of a given statistic under repeated sampling from the population; a smoothed histogram that shows how often you get different results when sampling from a population; we are most interested in the sampling distribution of the mean, which is a distribution of sample means, not of individual scores; plots relative frequency of sample means as a histogram; numbers refer to population parameters; can be created from any statistic
1-data are sampled from a normally distributed population 2-difference scores are approximately normal
assumptions of a paired t-test
1-data are independently sampled from a normally distributed population, but the test is robust to violations of the normality assumption 2-mean and SD of the population are known
assumptions of a z-test
sample comes from a non-special population
assumptions of hypothesis testing
1-data are independently sampled from a normally distributed population, but the test is robust to violations of the normality assumption
assumptions of one-sample t-test
1- data are independently sampled 2- from a normally distributed population (normality assumption) 3- variance of the two groups is the same in the population (homogeneity of variance assumption)
assumptions of the independent samples t-test
confidence intervals
calculate a range of potential population means based on our sample data, asks what population means are plausible, use critical values to give a range of means where the null is retained, always 2 tailed
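A small sketch of a one-sample confidence interval, assuming the usual form M +/- (critical t * standard error). The scores are invented, and the critical value 2.365 (two-tailed, alpha = .05, df = 7) is typed in by hand as it would be read from a standard t table; `one_sample_ci` is a hypothetical helper name, not something from the course materials.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_ci(scores, t_crit):
    """Return (lower, upper) bounds of the confidence interval for the population mean.

    t_crit is the two-tailed critical t for the chosen confidence level and
    df = N - 1, looked up in a t table by the caller.
    """
    n = len(scores)
    m = mean(scores)
    standard_error = stdev(scores) / sqrt(n)  # sample SD / rootN
    margin = t_crit * standard_error
    return m - margin, m + margin

# Hypothetical data: 8 scores, so df = 7
scores = [5, 7, 8, 6, 9, 7, 8, 6]
lower, upper = one_sample_ci(scores, t_crit=2.365)
print(f"95% CI [{lower:.2f}, {upper:.2f}]")
```

Any hypothesized population mean that falls inside the printed interval would be retained by the matching two-tailed one-sample t-test at the same alpha level.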
proof (proves), these findings show support for
can never use the word _____________ when stating non-statistical or statistical evidence. Instead, you use phrases like ______________________.
z-test formula
change X to Xbar, and the denominator to (little)sigma/the square root of N
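Written out, the change described on this card would look roughly like the following, assuming the usual symbols (X for a single score, Xbar for the sample mean, mu and sigma for the population mean and SD):

```latex
% z score for a single raw score X
z = \frac{X - \mu}{\sigma}

% z-test for a sample mean: X becomes \bar{X}, the denominator becomes the standard error
z = \frac{\bar{X} - \mu}{\sigma / \sqrt{N}}
```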
one-sample confidence interval
confidence interval of population mean (mu, one-sample t-test is the corresponding statistical test)
independent confidence interval
confidence interval of population mean difference (mu1-mu2, independent samples t-test is the corresponding statistical test), constructs a range of possible values of mean difference
z score that cuts off the upper 0.05 (5%), z score that cuts off 0.025 (2.5%) on either side
critical value of z for a 1 tailed test and a 2 tailed test
+1.65 or -1.65, depending on the predicted direction (more precisely 1.645)
critical values of z for a 1 tailed test
+/- 1.96
critical values of z for a 2 tailed test
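The two critical z values above can be checked without a table. This is a sketch using `NormalDist` from Python's standard `statistics` module; the helper name `z_critical` is made up here.

```python
from statistics import NormalDist

def z_critical(alpha=0.05, tails=2):
    """Return the positive critical z for a given alpha level (tails is 1 or 2)."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    tail_area = alpha / tails  # a 2-tailed test splits alpha across both extremes
    return NormalDist().inv_cdf(1 - tail_area)

print("1 tailed:", round(z_critical(0.05, tails=1), 3))
print("2 tailed:", round(z_critical(0.05, tails=2), 3))
```

For a 2 tailed test the rejection region uses both the positive and negative value; for a 1 tailed test only the sign matching the predicted direction is used.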
italicize notation letters but NOT numbers; include degrees of freedom in parentheses; if t falls in the rejection region, write "p<.05" or report exact p (if using SPSS); if it does not, either report exact p if available or write "n.s." (meaning not significant). e.g. "t(11)=0.53, n.s." (with italics), e.g. "t(7)=-2.50, p<.05"
how do you report the results of a t-test in APA style?
Xbar2>Xbar1, evidence in favor
if you have a negative mean difference of independent confidence intervals...
reject, retain
in statistical hypothesis testing, you either ________________ or _____________ the null hypothesis
1- need info: M and SD from 2 independent samples 2- appropriate test: independent samples t-test 3- t=(X1bar-X2bar)/root(S1^2/N1+S2^2/N2) 4- df=(N1+N2)-2 5- two tailed test, .05 alpha level, use Appendix E.6 6- compare calculated t statistic to critical values (2 tailed = 2 critical values, positive and negative; if t is more extreme than either critical value, reject the null)
independent samples t-test steps
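The steps on this card can be sketched in a few lines of Python. The two groups of scores are invented, `independent_t` is a hypothetical helper name, and the formula is the one given in step 3 with df = (N1+N2)-2 from step 4; the critical value still has to be looked up in a t table.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_1, group_2):
    """Return (t, df) for an independent samples t-test.

    Follows the card: t = (X1bar - X2bar) / root(S1^2/N1 + S2^2/N2),
    df = (N1 + N2) - 2.
    """
    n1, n2 = len(group_1), len(group_2)
    # Standard error of the difference (the denominator of the formula)
    se_difference = sqrt(variance(group_1) / n1 + variance(group_2) / n2)
    t = (mean(group_1) - mean(group_2)) / se_difference
    return t, (n1 + n2) - 2

# Hypothetical scores from two unrelated groups
group_1 = [4, 6, 5, 7, 8]
group_2 = [2, 3, 5, 4, 1]
t, df = independent_t(group_1, group_2)
print(f"t({df}) = {t:.2f}")
```

With df = 8, a standard t table gives two-tailed critical values of about +/- 2.306 at alpha = .05, so a t more extreme than that would lead to rejecting the null.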
central limit theorem notation
muXbar = mean of a sampling distribution of the mean; (little)sigmaXbar = SD of the sampling distribution of the mean (standard error)
the 95%
multiplying t.05*standard error in the confidence interval equation gives us...
non-statistical hypothesis testing
must weigh evidence for and against 2 hypotheses (e.g. guilty, innocent), makes decisions based on the evidence presented
Type I error
reject the null when the null is true, true in nature but decide it isn't true (convicting an innocent person)
the alternative hypothesis is retained; data this extreme would be unlikely (probability below the alpha level) to occur by random chance alone if the null were true
rejecting the null means...
0.05
rejection region on a 1 tailed test z test (1 tailed significance number)
0.025
rejection region on a 2 tailed test z test (2-tailed significance number)
non-directional tests
rejects the null if the observed mean is substantially higher or lower than the population mean (2-tailed); use if you are not sure which direction the difference will go
directional tests
rejects the null only if the mean difference is in the expected direction (one-tailed because only values in 1 tail count towards rejecting the null); use when you predict a difference in one specific direction
the range of means based on alpha level and standard error, for a given alpha level it gives you the likely range of population means, confidence interval is the area that retains the null, alpha level is the area that rejects the null (rejection regions)
relationship between alpha level and confidence interval
confidence level and alpha MUST ALWAYS sum to 1 because of the way we define hypothesis testing (confidence level = retain null, alpha level = reject null, therefore alpha level + confidence level = 1); default confidence level is 95%, default alpha level is 5%
relationship between alpha level and confidence level
Type II error
retains the null when the null is false, didn't get a mean difference significant enough to reject the null hypothesis (letting a guilty person go)
difference scores
score #1 - score #2, transforms 2 sets of related scores into one set of scores that can be used to calculate significance
target sample
specific single sample being studied
1-establish what info you have 2-decide if the t-test is appropriate (if yes, continue) 3-calculate the t-statistic using formula 4-calculate degrees of freedom 5- use appendix E.6 to find critical values (row = df, column = alpha level and one vs. 2 tailed test) 6-compare your t-statistic to critical value (if more extreme than critical value reject null, if not retain null)
steps of a one-sample t-test
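A rough sketch of steps 3 and 4 on this card in Python. The scores and the hypothesized mean of 6 are made up, and `one_sample_t` is a hypothetical helper name; the critical value in steps 5 and 6 still comes from a t table.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(scores, hypothesized_mean):
    """Return (t, df) for a one-sample t-test."""
    n = len(scores)
    standard_error = stdev(scores) / sqrt(n)  # sample SD estimates the population SD
    t = (mean(scores) - hypothesized_mean) / standard_error
    return t, n - 1

# Hypothetical target sample and hypothesized population mean
scores = [5, 7, 8, 6, 9, 7, 8, 6]
t, df = one_sample_t(scores, hypothesized_mean=6)
print(f"t({df}) = {t:.2f}")
```

For df = 7, a standard t table gives two-tailed critical values of about +/- 2.365 at alpha = .05; a t that is less extreme than that means the null is retained.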
1- establish alpha level, rejection region, and critical values (critical values = z score) 2- find the standard error (population SD/rootN) 3- convert observed sample mean to z-score 4- compare calculated z score to the critical value 5- reject or retain the null
steps of a z-test
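The five steps can be followed in code as a rough sketch. All numbers in the example call are invented, `z_test` is a hypothetical helper name, and for a 1 tailed test this sketch assumes the predicted direction is positive.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05, tails=2):
    """Return (z, critical z, reject?) for a z-test of one sample mean.

    For tails=1 this sketch assumes H1 predicts a mean ABOVE pop_mean;
    a negative prediction would compare z against -z_crit instead.
    """
    # Step 1: alpha level and tails determine the critical value
    z_crit = NormalDist().inv_cdf(1 - alpha / tails)
    # Step 2: standard error = population SD / rootN
    standard_error = pop_sd / sqrt(n)
    # Step 3: convert the observed sample mean to a z score
    z = (sample_mean - pop_mean) / standard_error
    # Steps 4-5: compare to the critical value and decide
    reject = abs(z) > z_crit if tails == 2 else z > z_crit
    return z, z_crit, reject

# Hypothetical values: sample of 36 with mean 105, population mean 100, SD 15
z, z_crit, reject = z_test(sample_mean=105, pop_mean=100, pop_sd=15, n=36)
print(round(z, 2), round(z_crit, 2), "reject" if reject else "retain")
```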
1-state hypothesis 2-set criteria for your decisions 3-collect data and compute sample statistics 4-make decisions
steps of creating a sample distribution
1-determine direction or no direction of the test 2-set the alpha level 3-create sampling distribution 4-find critical values to draw rejection regions 5-reject/retain the null hypothesis based on whether the target sample is in the rejection region
steps of hypothesis testing
1-subtract X2 from X1 for each pair to get the difference scores, then find the mean difference (mean of difference scores) and SD of difference scores 2-establish what info you have (add in hypothesis notation) 3-decide if the t-test is appropriate (if yes, continue) 4-calculate the t-statistic using formula 5-calculate degrees of freedom 6- use appendix E.6 to find critical values (row = df, column = alpha level and one vs. 2 tailed test) 7-compare your t-statistic to critical value (if more extreme than critical value reject null, if not retain null)
steps to a paired t-test
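A short sketch of steps 1, 4, and 5 in Python, assuming the paired formula is the mean of the difference scores divided by (SD of difference scores / root of the number of difference scores). The pre and post scores are invented and `paired_t` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(scores_1, scores_2):
    """Return (t, df) for a paired t-test on two related sets of scores."""
    if len(scores_1) != len(scores_2):
        raise ValueError("paired scores must come in equal-length sets")
    # Step 1: difference scores (score #1 - score #2)
    diffs = [a - b for a, b in zip(scores_1, scores_2)]
    n_d = len(diffs)
    # Standard error of the difference scores: S_D / root(N_D)
    standard_error = stdev(diffs) / sqrt(n_d)
    t = mean(diffs) / standard_error  # H0: muD = 0
    return t, n_d - 1

# Hypothetical pre-test and post-test scores for the same 5 people
pre = [10, 12, 9, 11, 13]
post = [12, 15, 10, 14, 14]
t, df = paired_t(pre, post)
print(f"t({df}) = {t:.2f}")
```

With df = 4, a standard t table gives two-tailed critical values of about +/- 2.776 at alpha = .05. The t here is negative only because post scores are higher and the subtraction is score #1 - score #2.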
confidence interval formula one sample
t.05
sampling error
the idea that if we take many random samples from the same population, the statistics we obtain will vary across samples; introduces error in a statistical sense because the values of a statistic in a sample differ from the population value
beta level
the probability of Type II error in any hypothesis test
alternative hypothesis
the research hypothesis, opposite of the null, the effect you are trying to show is not due to chance
paired t-test
understanding, application, info required, use when we have pairs of related/dependent observations (e.g. pre and post tests, pairs from significant others/family, longitudinal data), analyzes difference scores
independent samples t-test
understanding, application, info required, research situations; most common test in actual research; unrelated groups/variables being measured; compares 2 unrelated sample means; want to know if samples could have come from populations with the same mean (if true, difference in group means is due only to sampling error)
standard error of the difference
used in an independent t-test as the denominator of the formula
standard alpha level
usually set at .05. Assuming that the null hypothesis is true, this means we may reject the null only if the observed data are so unusual that they would have occurred by chance at most 5% of the time
z-test, one-sample t-test
what are the 2 types of one group/sets of data tests?
paired t-test, independent samples t-test
what are the 2 types of two group/sets of data tests?
make the alpha level 0.01, find the critical values for an alpha level of 0.01 based on the calculated degrees of freedom, and proceed with that new t value
what do you do if question asks for 99% instead of 95%?
null hypothesis
what we are trying to reject, no association between 2 variables, hypothesis of no difference between groups or no relationship, rejection of this is desirable because it means we have an effect
could mean an advantage for 1 group over the other, represents a range in which you would retain the null
when the confidence interval is "...95% CI [-, +]" in independent confidence interval formulas
They allow us to rule out the null hypothesis and support the alternative; they help create rejection regions, which let us reject the idea that the data are due to random chance
why are sampling distributions important for hypothesis testing?
critical value
z*