Exam 1
Confidence interval
an interval believed to contain the population parameter: a range of values around the sample statistic
if p<0.05
the result is statistically significant at the 0.05 level, so we reject the null hypothesis
the critical z-score for a CI depends on
confidence level
find the proportion of the normal curve that corresponds to z > 1.50 a) p = 0.9332 b) p = 0.5000 c) p = 0.4332 d) p = 0.0668
d
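A quick check of this tail area, as a minimal sketch using Python's standard-library `statistics.NormalDist`. The helper name `upper_tail` is illustrative and not from the course materials.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist(mu=0.0, sigma=1.0)

def upper_tail(z):
    """Proportion of the standard normal curve lying above z."""
    return 1.0 - std_normal.cdf(z)

p_above = upper_tail(1.50)
print(round(p_above, 4))  # 0.0668
```

The other answer choices are the area below z = 1.50 (0.9332), half the curve (0.5000), and the area between the mean and z = 1.50 (0.4332).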
descriptive statistics vs inferential statistics
descriptive statistics: describe and summarize the sample data (mean, std). inferential statistics: use sample data to make inferences about the population (t-test, test of a correlation coefficient)
cluster
set of data points grouped together
CI for a proportion: what percent of 18-22 year old Americans report being "very happy"? GSS data: 35 of n=164 say they are very happy. 31% for all ages say they are very happy. use a 95% CI
1.00 - 0.95 = 0.05 --> split across two tails: 0.05/2 = 0.025 in each tail --> z = 1.96. sample proportion = 35/164 = 0.213; se = sqrt(0.213(0.787)/164) = 0.032. 95% CI: 0.213 ± 1.96(0.032) = 0.213 ± 0.063, so the limits of the 95% CI are 0.15 and 0.28. We are 95% confident the population proportion who are "very happy" is btw 0.15 and 0.28
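The arithmetic for this interval can be reproduced with a short sketch. The function name `proportion_ci` is an illustrative assumption; the formula is the usual large-sample interval, sample proportion ± z × sqrt(p(1 - p)/n).

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n                          # sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)        # standard error
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical z
    margin = z * se
    return p_hat, se, (p_hat - margin, p_hat + margin)

p_hat, se, (lower, upper) = proportion_ci(35, 164, 0.95)
print(round(p_hat, 3), round(se, 3))      # 0.213 0.032
print(round(lower, 2), round(upper, 2))   # 0.15 0.28
```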
95% CI for difference btw two mean reaction times. Example (reaction times): Group 1: Ȳ1 = 585.2 milliseconds, s1 = 89.6, n = 32; Group 2: Ȳ2 = 533.7, s2 = 65.3, n = 32
51.4 ± 39.2, or (12.2, 90.6). We are 95% confident that the population mean reaction time (μ1) for Group 1 is between 12.2 and 90.6 milliseconds higher than the population mean reaction time (μ2) for Group 2.
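A minimal sketch of how the standard error and margin arise from the group summaries. Two assumptions are not stated on the card: the function name `diff_means_ci`, and a t critical value of about 2.00 for df = 32 + 32 - 2 = 62. The rounded means shown give a difference of 51.5 rather than 51.4, so the card's figure presumably comes from unrounded data.

```python
import math

def diff_means_ci(mean1, s1, n1, mean2, s2, n2, t_crit):
    """CI for mu1 - mu2 from two independent samples."""
    diff = mean1 - mean2
    # Standard error of the difference between two independent sample means.
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    margin = t_crit * se
    return diff, se, (diff - margin, diff + margin)

diff, se, (lower, upper) = diff_means_ci(585.2, 89.6, 32, 533.7, 65.3, 32, t_crit=2.00)
print(round(diff, 1), round(se, 1))       # 51.5 19.6
print(round(lower, 1), round(upper, 1))   # 12.3 90.7
```

Because the whole interval lies above zero, the data suggest a real difference between the two population means.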
CI comparing two proportions (independent samples): "Have you engaged in unplanned sexual activities because of drinking alcohol?" 1993: 19.2% yes of n = 12,708; 2001: 21.3% yes of n = 8783. What is the 95% CI for the change in proportion saying "yes"?
95% CI for change in population proportion: 0.021 ± 1.96(0.0056) = 0.021 ± 0.011, or (0.01, 0.032).
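The standard error of 0.0056 comes from adding the two samples' variances. A sketch of the calculation, with the function name `diff_props_ci` as an illustrative assumption:

```python
import math

def diff_props_ci(p1, n1, p2, n2, z=1.96):
    """Large-sample CI for p2 - p1 from two independent samples."""
    diff = p2 - p1
    # Variances of the two sample proportions add for independent samples.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z * se
    return diff, se, (diff - margin, diff + margin)

diff, se, (lower, upper) = diff_props_ci(0.192, 12708, 0.213, 8783)
print(round(diff, 3), round(se, 4))       # 0.021 0.0056
print(round(lower, 3), round(upper, 3))   # 0.01 0.032
```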
effect size: for two groups Example: Therapy A, mean = 20; Therapy B, mean = 40; Average (pooled) standard deviation, s = 9.35
Cohen's d = (40 - 20)/9.35 = 2.1. Mean for therapy B is about two standard deviations larger than the mean for therapy A.
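The same effect size as a small sketch. The function name `cohens_d` is illustrative, and the pooled standard deviation is taken as given on the card rather than computed from raw data.

```python
def cohens_d(mean_a, mean_b, pooled_sd):
    """Standardized difference between two group means."""
    return (mean_b - mean_a) / pooled_sd

d = cohens_d(mean_a=20, mean_b=40, pooled_sd=9.35)
print(round(d, 1))  # 2.1
```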
what do confidence levels tell us
Confidence levels tell us the long-term rate at which a certain type of confidence interval will successfully capture the parameter of interest.
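The "long-term rate" idea can be illustrated by simulation: draw many samples, build a 95% CI from each, and count how often the interval captures the true mean. This is only a sketch under assumed values (a normal population with mean 100 and standard deviation 15, known sigma, n = 25); none of these settings come from the course materials.

```python
import math
import random
from statistics import NormalDist

def coverage_rate(mu, sigma, n, confidence, trials, seed):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)   # sigma is treated as known here
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - margin <= mu <= sample_mean + margin:
            hits += 1
    return hits / trials

rate = coverage_rate(mu=100, sigma=15, n=25, confidence=0.95, trials=2000, seed=1)
print(rate)  # should land close to 0.95
```

Any single interval either contains the parameter or it does not; the 95% describes the procedure over many repetitions.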
in general, the hypotheses for a significance test for a difference of means are
Null hypothesis (no difference, "no effect"): H0: μ2 - μ1 = 0 (μ1 = μ2). Alternative hypothesis: Ha: μ2 - μ1 ≠ 0 (μ1 ≠ μ2)
Parameter vs. Statistic
Parameter: a characteristic or measure of a POPULATION Statistic: a characteristic or measure of SAMPLE
Membership in MENSA requires a score of 130 on the Stanford-Binet 5 IQ test, which has μ = 100 and σ = 15. What proportion of the population qualifies for MENSA? a) p = 0.0228 b) p = 0.9772 c) p = 0.4772 d) p = 0.0456
a
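A sketch of the two steps behind this answer: convert the cutoff to a z-score, then take the upper-tail area. It uses the standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)   # Stanford-Binet 5 scale from the question

z = (130 - iq.mean) / iq.stdev      # z-score of the MENSA cutoff
p_qualify = 1 - iq.cdf(130)         # proportion scoring above 130

print(z, round(p_qualify, 4))  # 2.0 0.0228
```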
Which is an example of a parameter: a) mean score of all US students on a measure of life satisfaction b) mean score of a randomly selected sample of CofC students on a measure of life satisfaction c) mean grade of one math class, taken as a sample from the population of all math classes
a
box plots
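a descriptive graph of a quantitative variable (median, quartiles, outliers); side-by-side box plots can compare it across the categories of a categorical variable, e.g. family income by happiness level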
what is the key assumption behind inferential statistics
the sample is representative of the population
which is an example of a t-test for paired samples: a) post-test - pre-test score b) whether first year graduate salaries differed based on gender c) compare cholesterol levels in 1952 and cholesterol levels in 1962 for each subject. d) both a and c
d
independent vs dependent variable
independent: manipulated/controlled by the experimenter. dependent: observed; used to assess the effect of the manipulation (treatment)
se for difference btw two means Example (reaction times): Group 1: Ȳ1 = 585.2 milliseconds, s1 = 89.6, n = 32 Group 2: Ȳ2 = 533.7, s2 = 65.3, n = 32
mean difference = 51.4 se = 19.6
what happens to the mean when the distribution is skewed
the mean is pulled in the direction of the longer tail, relative to the median, b/c the mean is sensitive to "outliers". thus the mean isn't a good measure of center for skewed data.
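A tiny illustration of the pull on the mean. The numbers are hypothetical data made up for this sketch, with one large value creating a long right tail.

```python
from statistics import mean, median

# Hypothetical data: roughly symmetric, then the same data plus one outlier.
symmetric = [2, 3, 3, 4, 4, 5]
right_skewed = symmetric + [30]

print(mean(symmetric), median(symmetric))                 # 3.5 3.5
print(round(mean(right_skewed), 2), median(right_skewed)) # 7.29 4
```

One extreme value roughly doubles the mean while the median barely moves.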
central tendency
a measure of the center/average of a distribution; the mean and median are two ways of finding it. in a symmetric distribution (e.g. the normal bell-shaped curve), mean = median
correlation coefficient (r)
measures the strength and direction of association btw two numerical variables
CI comparing two means (independent samples): the df would be
n1 + n2 - 2 (total sample size minus 2)
if you had a proportion of 0.114 and the CI is (-0.0151, 0.237), is it statistically significant?
no; the interval includes zero, so "no difference" is still a plausible value
define sampling error, why it is always present, and how you minimize it
the numerical difference btw a sample statistic and the population parameter. always present b/c a sample isn't identical to the population. minimize it by taking a random sample from the pop (a larger sample also reduces it)
common classifications for variables would be all of the following except: a) categorical b) quantitative c) discrete d) continuous e) parameter
parameter
in a study looking at how tutoring impacts test scores, what would be the dependent variable and the independent variable?
dependent variable = participants' test scores; independent variable = tutoring
Population vs sample
population: all units of interest; its mean, variance, etc. are parameters. sample: a subset of units from the population; its mean, variance, etc. are statistics
what is the benefit to using random sampling ?
reduces statistical bias. statistical bias = when a statistic/statistical model is unrepresentative of the population
calculate the variability: Life Satisfaction scores in a sample of college students (n = 9): y = 2, 3, 7, 5, 6, 7, 5, 6, 4
s = 1.7
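The steps behind s = 1.7, written out as a sketch: find the mean, square each deviation from it, divide the sum by n - 1, and take the square root. The variable names are illustrative.

```python
import math

scores = [2, 3, 7, 5, 6, 7, 5, 6, 4]
n = len(scores)

mean = sum(scores) / n                             # 45 / 9 = 5.0
squared_devs = [(y - mean) ** 2 for y in scores]   # squared distance from the mean
variance = sum(squared_devs) / (n - 1)             # 24 / 8 = 3.0 (sample variance)
s = math.sqrt(variance)                            # sample standard deviation

print(mean, variance, round(s, 1))  # 5.0 3.0 1.7
```

Dividing by n - 1 rather than n gives the sample variance, which is the version used for inference about the population.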
what is variability?
how spread out the data are; measured by the sample variance (never negative) and the standard deviation
best way to get unbiased statistics
simple random sampling: each possible sample of size n has the same chance of being selected
distribution of z-scores
standard normal distribution
what is preferred for skewed distributions
the median is preferred, b/c the mean is sensitive to "outliers"
if a 95% CI for a difference includes zero
the result is not statistically significant at the 0.05 level (zero, i.e. no difference, is a plausible value)
CI for a population mean: Anorexia study. Weight measured before and after treatment; y = weight at end - weight at beginning (y = change in weight). For n = 17 girls receiving "family therapy": y = 11.4, 11.0, 5.5, 9.4, 13.6, -2.9, -0.1, 7.4, 21.5, -5.3, -3.8, 13.4, 13.1, 9.0, 3.9, 5.7, 10.7. Mean = 7.265, Standard Deviation = 7.157. calculate the 95% CI
use the t-distribution b/c the population standard deviation is unknown and is estimated by the sample std. Standard error: se = s/sqrt(n) = 7.157/sqrt(17) = 1.736. Since n = 17, df = 16, and the t critical value for 95% confidence is 2.12. 95% CI for population mean weight change: 7.265 ± 2.12(1.736) = 7.265 ± 3.68, or (3.58, 10.94). We are 95% confident that µ is between 3.58 and 10.94 pounds (the common interpretation). Technically: with repeated random samples, 95% of the 95% CIs would contain µ. Is ours one of the 95%? We hope so; there is a 5% chance that an interval built this way misses µ.
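The interval can be recomputed from the raw weight changes. In this sketch the critical value 2.12 is the table value quoted on the card (df = 16), hard-coded because Python's standard library has no t-distribution; the variable names are illustrative.

```python
import math
from statistics import mean, stdev

weight_change = [11.4, 11.0, 5.5, 9.4, 13.6, -2.9, -0.1, 7.4, 21.5,
                 -5.3, -3.8, 13.4, 13.1, 9.0, 3.9, 5.7, 10.7]

n = len(weight_change)
y_bar = mean(weight_change)          # sample mean
s = stdev(weight_change)             # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)                # standard error of the mean
t_crit = 2.12                        # t table value for 95% confidence, df = n - 1 = 16

lower, upper = y_bar - t_crit * se, y_bar + t_crit * se
print(round(y_bar, 3), round(s, 3), round(se, 3))  # 7.265 7.157 1.736
print(round(lower, 2), round(upper, 2))            # 3.58 10.94
```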
if the p-value is greater than 0.05
we fail to reject the null hypothesis, the result is not "statistically significant"
What do we use statistics for?
we use statistics to make inferences about parameters
central limit theorem
when n >= 30, the distribution of sample means is approximately normal, and it gets closer to normal as n gets larger. the mean of the distribution of sample means equals the population mean, so the sample mean is an unbiased estimate of the pop. mean
if you had a proportion of 0.484 and the CI is (0.3995, 0.570), is this statistically significant?
yes, because zero isn't included in the interval
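A simulation sketch of this idea. The population here is an assumption chosen for illustration: a strongly right-skewed exponential distribution with mean 1 and standard deviation 1. Even so, the means of samples of size 30 cluster around the population mean with spread close to sigma/sqrt(n).

```python
import random
from statistics import mean, stdev

def sample_means(n, num_samples, seed):
    """Means of repeated random samples from a skewed (exponential) population."""
    rng = random.Random(seed)
    return [mean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(num_samples)]

means = sample_means(n=30, num_samples=2000, seed=0)
center = mean(means)    # should be near the population mean, 1.0
spread = stdev(means)   # should be near 1 / sqrt(30), about 0.18

print(round(center, 2), round(spread, 2))
```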
CI for a proportion: what percent of 18-22 year old Americans report being "very happy"? GSS data: 35 of n=164 say they are very happy. 31% for all ages say they are very happy. use a 99% CI
z-score is now 2.58 b/c 1.00 - 0.99 = 0.01 --> 0.01/2 = 0.005 in each tail --> z = 2.58. 99% CI: 0.213 ± 2.58(0.032) = 0.213 ± 0.083, limits (0.13, 0.30). note that greater confidence requires a wider CI
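To see why greater confidence widens the interval, the critical z-value can be computed for several confidence levels. In this sketch the helper name `critical_z` is illustrative, and the standard error 0.032 is the rounded value for the sample proportion 35/164.

```python
from statistics import NormalDist

def critical_z(confidence):
    """Two-sided critical z: leaves (1 - confidence) / 2 in each tail."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

se = 0.032  # rounded standard error of the sample proportion 35/164

for level in (0.90, 0.95, 0.99):
    z = critical_z(level)
    print(f"{level:.0%} CI: z = {z:.2f}, margin = {z * se:.3f}")
```

The margin of error grows with the confidence level because a larger share of the normal curve has to fall between -z and +z.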
when do you use z-test vs t-test
use a z-test when the population standard deviation is known; use a t-test when the population standard deviation is unknown and must be estimated from the sample (this matters most when the sample size is less than 30)