STA2023 Exam 2
z scores for the 90, 95, 99, 92, and 97% confidence levels; what do you do if you don't know the z score?
1.645, 1.96, 2.576 (often rounded to 2.58), 1.75, 2.17. Subtract the confidence level from 1 and divide by two to get the area in one tail, then find that tail area in the body of the z table and read off the z score.
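The table lookup above can also be done in code. A minimal sketch using Python's standard library; the function name z_critical is made up for illustration:

```python
from statistics import NormalDist

def z_critical(confidence):
    """z score that leaves (1 - confidence) / 2 in each tail of the standard normal."""
    tail = (1 - confidence) / 2              # area in one tail
    return NormalDist().inv_cdf(1 - tail)    # z with that much area to its right

for level in (0.90, 0.95, 0.99, 0.92, 0.97):
    print(level, round(z_critical(level), 3))
```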
what percentiles would u use if u wanted to compute the 90% CI w bootstrap
5th and 95th
The distribution of a statistic is called
A sampling distribution of the statistic
What are the two types of statistical inferences
Confidence intervals and significance tests
X in sampling
Count of successes, binomial
Confidence intervals
Give a region that is likely to contain a parameter. Used when we have no preconceived notion of what the parameter should be; we simply want to estimate it
random variable
a numerical measurement of the outcome of a random phenomenon.
a sampling distribution refers to the distribution of
a sample statistic
Significance tests
statistical tests that show how likely it is that a study's results occurred merely by chance. 1. Check which claim about the population is supported by the data. 2. Someone proposes a value for a population parameter. We disagree with that value, so we take a sample to see whether the data support that claim or what we believe is true. 3. The vocabulary is elaborate, but the basic idea behind it is simple
difference between t and z tables
t tables are more spread out (fatter tails and lower peaks) than z tables
how to read the notation for a t distribution
t(right tail area) EX: t(.01) is a t-score with a probability .01 to the right
a smaller spread means?
that we have more values of the estimate closer to the parameter being estimated
what two things does the margin of error depend on
the confidence level we want and the standard error of our estimator
margin of error depends on
the confidence level we want, and the standard error of our estimator
distribution of a random variable (last digit of a phone number) vs. the avg of these random variables
the distribution of a random variable is expected to be uniform and discrete. the distribution of the avg of the random variables should be centered around the mean, more bell shaped, and a lot less discrete
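A small simulation can make this contrast visible. This is only a sketch: it assumes last digits are uniform on 0 to 9, and the sample size of 30 and the 5000 repetitions are arbitrary choices.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed only so the sketch is repeatable

n = 30          # assumed sample size
reps = 5000     # assumed number of repeated samples

# One last digit: assumed uniform and discrete on 0..9
digits = [random.randint(0, 9) for _ in range(reps)]

# Average of n last digits, repeated many times
averages = [mean([random.randint(0, 9) for _ in range(n)]) for _ in range(reps)]

print("single digits: mean", round(mean(digits), 2), "st dev", round(stdev(digits), 2))
print("averages:      mean", round(mean(averages), 2), "st dev", round(stdev(averages), 2))
```

Both lists are centered near 4.5, but the averages are far less spread out; a histogram of averages would look bell shaped while the single digits look flat.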
Sampling distribution
the distribution of values taken by the statistic in all possible samples of the same size from the same population; the distribution of the sample statistic (p hat)
sampling distribution of x bar
the distribution of values taken by the statistic in all possible samples of the same size from the same population; the distribution of the statistic (x bar). You take repeated samples and plot the x bars to see the pattern
standard error
the standard deviation of a sampling distribution
what do you first identify when working on sampling distribution problems
the type of problem: sample mean or sample proportion and if it is normal
for 95% of all random samples, that formula will produce an interval that contains the _____ and only 5% of samples will ____
the unknown parameter p; give intervals that miss p
for samples larger than 30, what t value do we use
the z value for that confidence level (t is close to z when n is large)
how do you reduce bias?
use random samples. x bar and p hat are both unbiased because their sampling distributions are centered at mu and p
how do we figure out how far off a sample prediction will be?
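use the normal distribution: the z score for the chosen confidence level (90, 95, or 99%) times the standard error gives the margin of error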
will higher confidence levels have wider or slimmer intervals
wider
CIs for pop mean are always centered around ____
x bar
estimator of mu is
x bar
confidence interval equation for a population mean (mu)
x bar +/- t(s/√n)
symbols for sample statistic mean, standard deviation, and proportion
x bar, s, p hat
what sample statistic estimates the population parameters of mean, standard deviation, and proportion
x bar, s, p hat
z for sampling distribution of x bar and p hat
x bar: z = (x bar - mu) / (st dev / √n); p hat: z = (p hat - p) / √(p(1-p)/n)
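A short sketch of both z formulas, plus turning a z score into a probability. The function names and all the numbers are hypothetical, chosen so the arithmetic is easy to check by hand:

```python
from math import sqrt
from statistics import NormalDist

def z_for_mean(x_bar, mu, sigma, n):
    """z score of a sample mean: (x bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / sqrt(n))

def z_for_proportion(p_hat, p, n):
    """z score of a sample proportion: (p hat - p) / sqrt(p(1-p)/n)."""
    return (p_hat - p) / sqrt(p * (1 - p) / n)

# Hypothetical example: population mean 100, st dev 15, sample of 25 with x bar = 103
z = z_for_mean(x_bar=103, mu=100, sigma=15, n=25)    # 3 / (15/5) = 1.0
prob_above = 1 - NormalDist().cdf(z)                 # area to the right of z

print(z, round(prob_above, 4))
```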
general formula for z
z = (observation - mean) / st dev
population proportion equation for n
n = z^2 * p hat * (1 - p hat) / m^2 ANSWER MUST BE AN INTEGER, ROUND UP NO MATTER THE DECIMAL VALUE
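A minimal sketch of this sample size formula, with the round-up rule built in. The function name and the example values (95% confidence, margin of error 0.03, guess of 0.5 for p hat) are assumptions for illustration:

```python
from math import ceil

def n_for_proportion(z, p_hat, m):
    """Sample size for a proportion CI with margin of error m; always rounds up."""
    return ceil(z ** 2 * p_hat * (1 - p_hat) / m ** 2)

# Hypothetical example: 1.96^2 * 0.25 / 0.03^2 is about 1067.1, so n = 1068
print(n_for_proportion(z=1.96, p_hat=0.5, m=0.03))
```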
the exact standard error of the sample proportion equals
square root of p(1-p)/n; can use p hat to estimate p
standard error of the sampling distribution of x bar equation
standard error of x bar = st dev of x / square root of n
what are the values of mean, standard error, and shape of the sampling distribution of x bar when x is a random variable
1. the same as the mean of the original distribution (mu of x bar = mu of x) 2. smaller than the standard dev of the original distribution 3. regardless of the shape of the original distribution (x), the distribution of x bar becomes bell shaped as n inc.
confidence interval for a population mean is valid if
1. the original distribution is normally distributed OR n is large 2. data is from a random sample (SRS) or randomized experiment
population mean equation for n
n = (z*s/m)^2 ANSWER MUST BE AN INTEGER, ROUND UP NO MATTER THE DECIMAL VALUE
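The same idea for a mean, as a sketch. The function name and the example values (z = 1.96, s = 15, margin of error 2) are made up for illustration:

```python
from math import ceil

def n_for_mean(z, s, m):
    """Sample size for a mean CI with margin of error m; always rounds up."""
    return ceil((z * s / m) ** 2)

# Hypothetical example: (1.96 * 15 / 2)^2 = 14.7^2 = 216.09, so n = 217
print(n_for_mean(z=1.96, s=15, m=2))
```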
what value of p hat do we guess when not given one (for CIs)
0.5 if you have no clue, or use a guess for p hat from a previous study if available
Null hypothesis
1. A statement or idea that can be falsified, or proved wrong; states what we want to disprove 2. Ho is the symbol 3. P = Proportion given
for large sample sizes... (5) CI for pop mean
1. CLT guarantees that the sample mean has an approximately normal distribution when n is large 2. s is a good estimator of pop st dev 3. t distribution gets close to normal -- think of the z distribution as a t w df = infinity 4. makes very little difference to use z or t 5. if df are not on the table we can use the z table instead. Minitab can give exact values.
Assumptions of significance tests
1. SRS 2. Categorical data 3. NPo >/= 15 4. N(1-Po) >/= 15 This is used to make sure p hat is approx normal
Alternative hypothesis
1. States what we want to prove 2. Ha is its symbol 3. P is either >, <, or not equal to the proportion given (same value for null)
sample proportion characteristics (5)
1. data is categorical (yes/no) 2. X = number of successes in sample ; original distribution/population 3. p hat = proportion of successes in sample (= x/n); it is a STATISTIC 4. sampling distribution: p hat ~ N(p, √(p(1-p)/n)) 5. np >/= 15 AND n(1-p) >/= 15 for it to be approx normally distributed
sample mean characteristics (5)
1. data is quantitative 2. X = one individual measurement ; original distribution/population 3. x bar = sample mean, which is a STATISTIC 4. sampling distribution: x bar ~ N(mu, st dev of pop/√n) - aka pop mean, standard error 5. n >/= 30 or if stated population is normal (aka if x is normal)
t distribution characteristics (2)
1. family of distributions indexed by their degrees of freedom 2. all symmetric and bell-shaped, all centered at zero
how large does the sample size, n, have to be?
1. it depends on the shape of the original population 2. if the population is normally distributed, the sampling distribution of x bar will be normal for any n 3. if the population is far from normally distributed, n=30 is large enough in most cases for the sampling distribution of x bar to be considered normal 4. in general, the closer to normal (bell shaped) the original distribution is, the smaller n needs to be 5. and for any shape distribution, as n inc, the sampling distribution of x bar will get closer to normal
how do you compute a bootstrap confidence interval
1. re-sample w replacement from the original sample to create a new sample of the same size as the original. compute a sample statistic 2. using a statistical software package, re-sample thousands of times (10000) and compute a new sample statistic each time 3. this will result in thousands of sample stats for the bootstrap samples 4. to find the 95% confidence interval, find the central 95% of the sample stats by using the 2.5th and 97.5th percentiles
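The four steps above can be sketched with the standard library alone. The function name, the tiny data set, and the simple sort-and-index way of picking percentiles are all assumptions; statistical software may compute percentiles slightly differently.

```python
import random
from statistics import mean

def bootstrap_ci(sample, stat=mean, confidence=0.95, reps=10000, seed=0):
    """Percentile bootstrap CI: the central `confidence` share of resampled statistics."""
    rng = random.Random(seed)               # seeded only so the sketch is repeatable
    n = len(sample)
    # Steps 1-3: resample with replacement, same size, one statistic per resample
    stats = sorted(stat(rng.choices(sample, k=n)) for _ in range(reps))
    # Step 4: cut off (1 - confidence) / 2 of the statistics in each tail
    tail = (1 - confidence) / 2
    lower = stats[round(tail * reps)]
    upper = stats[round((1 - tail) * reps) - 1]
    return lower, upper

data = [12, 15, 9, 20, 14, 11, 18, 16, 13, 17]   # hypothetical sample
print(bootstrap_ci(data))                         # 2.5th and 97.5th percentiles
print(bootstrap_ci(data, confidence=0.90))        # 5th and 95th percentiles
```

Passing a different stat, such as statistics.median, gives an interval for a parameter that has no simple CI formula.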
for small sample sizes... (6) CI for pop mean
1. t procedures are very sensitive to skewness or outliers in the population 2. s might be far from pop st dev 3. t distribution still far from normal 4. need to use t table AND need original population to be normal 5. impossible to check population, we only have a small sample 6. plot data and make sure it could have come from a normal distribution. perfect symmetry of the sample is not important, but there should be no major outliers
the length of intervals depend on what three things
1. the confidence level (determined by t) 2. the standard deviation, s 3. the sample size, n
Does p hat have a binomial distribution
No, it can be approx normal under the right conditions
When sample size increases what happens to the graph
Normal graph becomes better, st dev decreases, and it is less skewed
Mean of the sampling distribution of p hat
P
The sampling distribution of p hat is approximately ___ with a mean = ____ and standard error = _____ as long as the expected number of successes (____) and failures (_____) are each ____ or larger
normal, p, square root of p(1-p)/n, np, n(1-p), 15
Normal approximation (and what it has to be in between) for proportions
p hat ~ N(p, √(p(1-p)/n)); np must be greater than or equal to 15 and n(1-p) must be greater than or equal to 15
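A sketch of checking the two conditions before using the normal approximation. The function name and the choice to return None when a condition fails are assumptions for illustration:

```python
from math import sqrt

def p_hat_normal_approx(p, n):
    """Return (mean, standard error) of p hat if np and n(1-p) are both >= 15, else None."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    if expected_successes < 15 or expected_failures < 15:
        return None                       # normal approximation not justified
    return p, sqrt(p * (1 - p) / n)       # p hat ~ N(p, sqrt(p(1-p)/n))

print(p_hat_normal_approx(p=0.5, n=100))  # 50 successes and 50 failures expected
print(p_hat_normal_approx(p=0.1, n=100))  # only 10 expected successes
```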
Statistics are ____ variables, which have ______
Random, distribution
Standard error of the distribution of p hat
Square root of p(1-p)/n
Sample proportion
Symbol: p hat. Data: categorical. p hat is not binomial but can be approx normal under the right conditions. p hat = x/n = number of successes / n. The mean of the sampling distribution of p hat is p
Sample mean
Symbol: x bar Data: quantitative
The sampling distribution of the statistic is
The distribution that specifies all possible values a statistic can take, and the pattern that emerges if we take many samples and compute the statistic from each one
statistical inference
The process of using data obtained from a sample to make estimates or test hypotheses about the characteristics of a population.
central limit theorem
The sampling distribution of the mean will approach the normal distribution as n increases (n>30). for a random and representative sample (SRS) with a large sample size n, the sampling distribution of the sample mean is approximately normal with mean mu (same as original distribution) and standard error sigma / square root of n (the original standard deviation divided by the square root of n)
How can we study the sampling distribution of a statistic
Through simulations
Why do we say standard error
We use this to refer to the standard deviation of sampling distribution, and to distinguish it from the st dev of an ordinary probability distribution
X, x bar, x bar estimate, and how results would look from different samples (sampling distribution of x bar)
X = heights; x bar = avg of the numbers; x bar estimates the population mean; the results would vary around the population mean if we took different samples
why is the normal distribution the most important one in stats
bc normal distribution can be used to approximate the sampling distribution of the statistic
we can only talk about probability _____ we take the sample. _____ we talk about confidence
before, after
bigger confidence mean ____ interval
bigger
bootstrap confidence interval
a way of calculating non-parametric confidence intervals for parameter estimates. In this context, the bootstrap simulates the frequentist concept of obtaining estimates from repeated similar experiments. It can be used for any parameter, including when we can't find a confidence interval formula or a standard error formula mathematically. For example, we have a formula for the CI for the pop mean or proportion, but not for the pop median or st dev
bias has to do with the ______ of the sampling distribution
center
where does an unbiased statistic have its sampling distribution?
centered at the parameter being estimated
When we make inferences about ONE POPULATION PROPORTION, what assumptions do we need to make?
data is categorical, successes and failures both are greater than or equal to 15, simple random sample
if we aren't given the st dev in pop mean CI problems (to find n) but we know the range, what do we do
divide the range by 6
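A sketch of using range / 6 as a stand-in for s when solving for n in a pop mean CI problem. The function name and the example numbers are hypothetical:

```python
from math import ceil

def n_from_range(low, high, z, m):
    """Guess s as range / 6, then apply n = (z * s / m)^2 and round up."""
    s_guess = (high - low) / 6            # rough estimate of the st dev
    return ceil((z * s_guess / m) ** 2)

# Hypothetical example: values run from 40 to 100, so s is guessed as 10;
# (1.96 * 10 / 2)^2 = 96.04, so n = 97
print(n_from_range(low=40, high=100, z=1.96, m=2))
```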
small variability, biased
dots are close together and outside the target
large variability, biased
dots are spread out and outside of the target
small variability, unbiased
dots within the target but close together
large variability, unbiased
dots within the target but spread out
point estimation
estimate an unknown parameter using a single statistic (e.g. x bar, p hat)
confidence interval (CI) for population proportion skeleton
estimator +/- margin of error
confidence intervals are of the form
estimator +/- margin of error; for a proportion: p hat +/- z * √(p hat * (1 - p hat) / n)
as df increases, t
gets closer to z
small sample method for confidence intervals for the pop proportion
if you don't have at least 15 successes and 15 failures, add 2 successes and 2 failures (only once) when calculating the sample proportion: p hat = (x + 2) / (n + 4). It still works even if the added values don't bring the counts up to 15
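A sketch of the small sample (plus four) interval. It assumes the usual version of the method, where n + 4 also replaces n inside the standard error; the function name and the example counts are made up:

```python
from math import sqrt

def plus_four_ci(x, n, z):
    """Small sample CI for a proportion: add 2 successes and 2 failures, once."""
    p_adj = (x + 2) / (n + 4)                         # adjusted sample proportion
    m = z * sqrt(p_adj * (1 - p_adj) / (n + 4))       # assumption: n + 4 in the SE too
    return p_adj - m, p_adj + m

# Hypothetical example: 3 successes in 20 trials, 95% confidence
low, high = plus_four_ci(x=3, n=20, z=1.96)
print(round(low, 3), round(high, 3))
```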
if you want a lot of confidence, but not a huge interval, what do you do
inc the sample size
how do you reduce standard error
increase n; we know this bc n is always in the denominator
MARK ALL THAT ARE TRUE!! We can use the Normal (Z) table to find probabilities about: individuals, if the population is Normal; individuals, if the population is NOT Normal; averages based on small n, if the population is Normal; averages based on small n, if the population is NOT Normal; averages based on large n, if the population is Normal; averages based on large n, if the population is NOT Normal; count of successes out of n independent trials; sample proportion of successes out of n independent trials, when np and n(1-p) are large enough
individuals, if the population is Normal; averages based on small n, if the population is Normal; averages based on large n, if the population is Normal; averages based on large n, if the population is NOT Normal; sample proportion of successes out of n independent trials, when np and n(1-p) are large enough
as n increases, what happens to the graphs
it becomes normal around the mean
symbols for population parameter mean, standard deviation, and proportion
lowercase mu, lowercase sigma, p
margin of error for a CI for pop mean is
m = t * s/√n
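A sketch of the resulting interval for a population mean. Python's standard library has no t distribution, so the t value is read from a t table and passed in; the function name and data are hypothetical, and 2.262 is the table value for 95% confidence with df = 9.

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci(data, t_star):
    """x bar +/- t * s / sqrt(n), with t_star looked up for df = n - 1."""
    n = len(data)
    x_bar = mean(data)
    m = t_star * stdev(data) / sqrt(n)    # margin of error; stdev() is the sample s
    return x_bar - m, x_bar + m

data = [12, 15, 9, 20, 14, 11, 18, 16, 13, 17]   # hypothetical sample, n = 10, df = 9
low, high = mean_ci(data, t_star=2.262)
print(round(low, 2), round(high, 2))
```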
margin of error for a CI for p is
m = z * √(p hat * (1 - p hat) / n)
a smaller sample for CI of mu means
more variability and fatter tails
how do you find df if only given n
n-1 = df
when can you use the small sample alternative
only works for CI not for sampling distribution or sig test
the confidence interval for pop proportion is
p hat +/- z * √(p hat * (1 - p hat) / n)
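A sketch of this interval in code. The function name and the example counts (520 successes out of 1000, 95% confidence) are made up for illustration:

```python
from math import sqrt

def proportion_ci(x, n, z):
    """p hat +/- z * sqrt(p hat * (1 - p hat) / n)."""
    p_hat = x / n
    m = z * sqrt(p_hat * (1 - p_hat) / n)   # margin of error
    return p_hat - m, p_hat + m

low, high = proportion_ci(x=520, n=1000, z=1.96)
print(round(low, 3), round(high, 3))
```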
confidence intervals are statements about the ____ mean, not the ____ mean or about _____
population, sample, individuals
if we were to take a sample and calculate the sample mean, would the sample mean be exactly equal to the population mean
probably not, but close
if we were to take a sample and compute the sample proportion, would the sample proportion be exactly equal to the population proportion
probably not, but close
what do we use to estimate population st dev
s
in CIs, we usually have no control over ___, but we can control the _____ and the ______
s, confidence level, sample size
bigger sample size means ____ interval
smaller
larger n means
smaller standard error, less spread out, less deviation, more normal
standard error has to do with the ____ of the sampling distribution
spread