STA2023 Exam 2

z-scores for 90%, 95%, 99%, 92%, and 97% confidence levels; what do you do if you don't know the z-score?

1.645, 1.96, 2.58, 1.75, 2.17. Subtract the confidence level from 1 and divide by two to get the area in each tail, then find that tail area in the body of the z table and read off the z-score.
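
The table lookup above can be sketched in code. This is a minimal sketch using Python's standard library; the function name `z_critical` is made up for illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return the z-score that leaves (1 - confidence) / 2 in each tail."""
    tail_area = (1 - confidence) / 2
    # inv_cdf returns the z-score with the given area to its LEFT,
    # so we ask for everything except the right tail.
    return NormalDist().inv_cdf(1 - tail_area)

for level in (0.90, 0.95, 0.99, 0.92, 0.97):
    print(f"{level:.0%} confidence -> z = {z_critical(level):.3f}")
```

The printed values should agree with the table values on the card up to rounding.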

what percentiles would u use if u wanted to compute the 90% CI w bootstrap

5th and 95th

The distribution of a statistic is called

A sampling distribution of the statistic

What are the two types of statistical inferences

Confidence intervals and significance tests

X in sampling

Count of successes, binomial

Confidence intervals

Give a region that is likely to contain a parameter. Used when we have no preconceived notion of what our parameter should be; we simply want to estimate it

random variable

a numerical measurement of the outcome of a random phenomenon.

a sampling distribution refers to the distribution of

a sample statistic

Significance tests

statistical tests that show how likely it is that a study's results occurred merely by chance. 1. Check to see which claim about the population is supported by the data. 2. Someone proposes a value of a population parameter. We disagree with that value, so we take a sample to see whether the data support that claim or what we believe is true. 3. They have a very elaborate vocabulary, but the basic idea behind them is simple

difference between t and z distributions

t distributions are more spread out (fatter tails and lower peaks) than the z distribution

how to read the notation for a t distribution

t(right-tail area). Example: t(.01) is the t-score with probability .01 to its right

a smaller spread means?

that we have more values of the estimate closer to the parameter being estimated

what two things does the margin of error depend on

the confidence level we want and the standard error of our estimator

distribution of a random variable (last digit of a phone number) vs. the avg of these random variables

the distribution of a random variable is expected to be uniform and discrete. the distribution of the avg of the random variables should be centered around the mean, more bell shaped, and a lot less discrete
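
A small simulation can make this contrast concrete. This is only a sketch: the last digit is modeled as a uniform draw from 0 to 9, and the sample size of 30 and the 5000 repetitions are arbitrary choices, not values from the course.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

def last_digit():
    # Assumption: last digits of phone numbers are uniform on 0..9
    return rng.randrange(10)

# 5000 individual digits: roughly flat (uniform) and discrete
singles = [last_digit() for _ in range(5000)]

# 5000 averages, each of 30 digits: bell shaped and centered near 4.5
averages = [statistics.mean(last_digit() for _ in range(30))
            for _ in range(5000)]

print("mean of single digits:", statistics.mean(singles))
print("st dev of single digits:", statistics.stdev(singles))
print("mean of averages:", statistics.mean(averages))
print("st dev of averages (standard error):", statistics.stdev(averages))
```

Both lists should be centered near 4.5, but the averages should have a much smaller spread, close to the single-digit st dev divided by √30.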

Sampling distribution

the distribution of values taken by the statistic in all possible samples of the same size from the same population; the distribution of the sample statistic (p hat)

sampling distribution of x bar

the distribution of values taken by the statistic in all possible samples of the same size from the same population; the distribution of the statistic (x bar). You take repeated samples and plot the x bars to see the pattern

standard error

the standard deviation of a sampling distribution

what do you first identify when working on sampling distribution problems

the type of problem: sample mean or sample proportion, and whether its sampling distribution is normal

for 95% of all random samples, that formula will produce an interval that contains the _____ and only 5% of samples will ____

the unknown parameter p; give intervals that miss p

for sample sizes greater than 30, what can we use in place of the t value

the z value for that confidence level

how do you reduce bias?

use random samples. x bar and p hat are both unbiased because their sampling distributions are centered at mu and p

how do we figure out how far off a sample prediction will be?

use the normal distribution with a 90%, 95%, or 99% confidence level

will higher confidence levels have wider or slimmer intervals

wider

CIs for pop mean are always centered around ____

x bar

estimator of mu is

x bar

confidence interval equation for a population mean (mu)

x bar +/- t(s/√n)
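
As a worked sketch of this formula: the helper name `mean_ci` and the data are hypothetical, and the t value is assumed to be read from a t table with df = n - 1 (Python's standard library has no t distribution).

```python
import math
import statistics

def mean_ci(data, t_star):
    """x bar +/- t * s / sqrt(n), with t_star looked up for df = n - 1."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)            # sample st dev (divides by n - 1)
    margin = t_star * s / math.sqrt(n)    # margin of error
    return x_bar - margin, x_bar + margin

# Hypothetical sample of 9 measurements, so df = 8.
data = [4.1, 5.0, 4.6, 5.3, 4.8, 4.4, 5.1, 4.7, 4.9]
# From a t table: 95% confidence with df = 8 gives t = 2.306.
low, high = mean_ci(data, t_star=2.306)
print(f"95% CI for mu: ({low:.3f}, {high:.3f})")
```

The interval is centered at x bar, and a larger t (more confidence) or a smaller n widens it.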

symbols for sample statistic mean, standard deviation, and proportion

x bar, s, p hat

what sample statistic estimates the population parameters of mean, standard deviation, and proportion

x bar, s, p hat

z for sampling distribution of x bar and p hat

x bar: z = (x bar - mu) / (st dev / √n); p hat: z = (p hat - p) / √(p(1-p)/n)
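
A short sketch of both z formulas, with made-up numbers. The function names are illustrative only, and the normal approximation is assumed to apply (large n, or np and n(1-p) both at least 15).

```python
import math
from statistics import NormalDist

def z_for_mean(x_bar, mu, sigma, n):
    """z-score of a sample mean: (x bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

def z_for_proportion(p_hat, p, n):
    """z-score of a sample proportion: (p hat - p) / sqrt(p(1-p)/n)."""
    return (p_hat - p) / math.sqrt(p * (1 - p) / n)

# Hypothetical problem: mu = 100, sigma = 15, n = 36. Find P(x bar > 105).
z = z_for_mean(105, 100, 15, 36)     # (105 - 100) / (15 / 6) = 2.0
prob = 1 - NormalDist().cdf(z)       # area to the right of z
print(f"z = {z:.2f}, P(x bar > 105) = {prob:.4f}")
```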

general formula for z

z = (observation - mean) / st dev

population proportion equation for n

n = z^2 * p hat * (1 - p hat) / m^2. ANSWER MUST BE AN INTEGER; ROUND UP NO MATTER THE DECIMAL VALUE
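
A minimal sketch of this calculation. The function name is made up, and the default of 0.5 for p hat is the conservative guess used when no earlier estimate is available.

```python
import math

def sample_size_for_proportion(z, margin, p_hat=0.5):
    """n = z^2 * p hat * (1 - p hat) / m^2, always rounded UP."""
    return math.ceil(z**2 * p_hat * (1 - p_hat) / margin**2)

# 95% confidence (z = 1.96) with a margin of error of 3 percentage points:
print(sample_size_for_proportion(1.96, 0.03))   # 1067.11... rounds up to 1068
```

`math.ceil` is used rather than `round` because rounding down would give a margin of error slightly larger than requested.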

the exact standard error of the sample proportion equals

√(p(1-p)/n); can use p hat to estimate p

standard error of the sampling distribution of x bar equation

standard error of x bar = st dev of x / square root of n

what are the values of mean, standard error, and shape of the sampling distribution of x bar when x is a random variable

1. the same as the mean of the original distribution (mu of x bar = mu of x) 2. smaller than the standard deviation of the original distribution (st dev / √n) 3. regardless of the shape of the original distribution (x), the distribution of x bar becomes bell shaped as n increases

confidence interval for a population mean is valid if

1. the original distribution is normally distributed OR n is large 2. data is from a random sample (SRS) or randomized experiment

population mean equation for n

n = (z*s/m)^2. ANSWER MUST BE AN INTEGER; ROUND UP NO MATTER THE DECIMAL VALUE
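
The same idea for a mean, as a sketch with made-up numbers (the function name is illustrative):

```python
import math

def sample_size_for_mean(z, s, margin):
    """n = (z * s / m)^2, always rounded UP."""
    return math.ceil((z * s / margin) ** 2)

# 95% confidence (z = 1.96), guessed st dev s = 12, margin of error m = 2:
print(sample_size_for_mean(1.96, 12, 2))   # 138.2976 rounds up to 139
```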

what value of p hat do we guess when not given one (for CIs)

0.5 if you have no clue, or use a guess for p hat from a previous study if available

Null hypothesis

1. A statement or idea that can be falsified, or proved wrong; states what we want to disprove 2. Ho is the symbol 3. P = Proportion given

for large sample sizes... (5) CI for pop mean

1. CLT guarantees that the sample mean has an approximately normal distribution when n is large 2. s is a good estimator of pop st dev 3. t distribution gets close to normal -- think of the z distribution as a t w df = infinity 4. makes very little difference to use z or t 5. if df are not on the table we can use the z table instead. Minitab can give exact values.

Assumptions of significance tests

1. SRS 2. Categorical data 3. n*p0 >/= 15 4. n(1-p0) >/= 15. Conditions 3 and 4 are used to make sure p hat is approx normal

Alternative hypothesis

1. States what we want to prove 2. Ha is its symbol 3. p is either >, <, or not equal to the proportion given (same value as in the null)

sample proportion characteristics (5)

1. data is categorical (yes/no) 2. X = number of successes in sample; original distribution/population 3. p hat = proportion of successes in sample (= x/n); it is a STATISTIC 4. sampling distribution: p hat ~ N(p, √(p(1-p)/n)) 5. np >/= 15 AND n(1-p) >/= 15 for it to be approximately normally distributed

sample mean characteristics (5)

1. data is quantitative 2. X = one individual measurement; original distribution/population 3. x bar = sample mean, which is a STATISTIC 4. sampling distribution: x bar ~ N(mu, st dev of pop / √n), aka pop mean and standard error 5. n >/= 30, or the population is stated to be normal (aka if x is normal)

t distribution characteristics (2)

1. family of distributions indexed by their degrees of freedom 2. all symmetric and bell-shaped, all centered at zero

how large does the sample size, n, have to be?

1. it depends on the shape of the original population 2. if the population is normally distributed, the sampling distribution of x bar will be normal for any n 3. if the population is far from normally distributed, n=30 is large enough in most cases for the sampling distribution of x bar to be considered normal 4. in general, the closer to normal (bell shaped) the original distribution is, the smaller n needs to be 5. and for any shape distribution, as n inc, the sampling distribution of x bar will get closer to normal

how do you compute a bootstrap confidence interval

1. re-sample w replacement from the original sample to create a new sample of the same size as the original. compute a sample statistic 2. using a statistical software package, re-sample thousands of times (e.g. 10,000) and compute a new sample statistic each time 3. this will result in thousands of sample stats for the bootstrap samples 4. to find the 95% confidence interval, find the central 95% of the sample stats by using the 2.5th and 97.5th percentiles
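
The four steps above can be sketched in a few lines of Python. This is an illustration under assumptions, not the course's software: the function name `bootstrap_ci`, the data, and the choice of the median as the statistic are all made up.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.median, confidence=0.95,
                 reps=10_000, seed=0):
    """Percentile bootstrap CI: the central `confidence` share of resampled stats."""
    rng = random.Random(seed)
    n = len(sample)
    # Steps 1-3: resample WITH replacement, same size as the original,
    # and compute the statistic each time.
    stats = sorted(stat(rng.choices(sample, k=n)) for _ in range(reps))
    # Step 4: cut off (1 - confidence) / 2 of the stats in each tail.
    tail = (1 - confidence) / 2
    lower = stats[round(tail * reps)]
    upper = stats[round((1 - tail) * reps) - 1]
    return lower, upper

data = [12, 15, 9, 20, 14, 11, 18, 13, 16, 10]   # hypothetical sample
low, high = bootstrap_ci(data)                   # 2.5th and 97.5th percentiles
print(f"95% bootstrap CI for the median: ({low}, {high})")
```

Passing `confidence=0.90` would use the 5th and 95th percentiles instead.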

for small sample sizes... (6) CI for pop mean

1. t procedures are very sensitive to skewness or outliers in the population 2. s might be far from pop st dev 3. t distribution is still far from normal 4. need to use the t table AND need the original population to be normal 5. impossible to check the population, we only have a small sample 6. plot the data and make sure it could have come from a normal distribution. perfect symmetry of the sample is not important, but there should be no major outliers

the length of intervals depend on what three things

1. the confidence level (determined by t) 2. the standard deviation, s 3. the sample size, n

Does p hat have a binomial distribution

No, it can be approx normal under the right conditions

When sample size increases what happens to the graph

The normal approximation gets better, the st dev decreases, and the graph is less skewed

Mean of the sampling distribution of p hat

p

The sampling distribution of p hat is approximately ___ with a mean = ____ and standard error = _____ as long as the expected number of successes (____) and failures (_____) are each ____ or larger

normal, p, √(p(1-p)/n), np, n(1-p), 15

Normal approximation (and what it has to be in between) for proportions

p hat ~ N(p, √(p(1-p)/n)); np must be greater than or equal to 15 and n(1-p) must be greater than or equal to 15

Statistics are ____ variables, which have ______

Random, distributions

Standard error of the distribution of p hat

Square root of p(1-p)/n

Sample proportion

Symbol: p hat. Data: categorical. Distribution: not binomial, but approximately normal under the right conditions. p hat = x/n = number of successes / n. The mean of its sampling distribution is p

Sample mean

Symbol: x bar Data: quantitative

The sampling distribution of the statistic is

The distribution that specifies all possible values a statistic can take, and the pattern that emerges if we take many samples and compute the statistic from each one

statistical inference

The process of using data obtained from a sample to make estimates or test hypotheses about the characteristics of a population.

central limit theorem

The sampling distribution of the mean will approach the normal distribution as n increases (n > 30). For a random and representative sample (SRS) with a large sample size n, the sampling distribution of the sample mean is approximately normal with mean mu (same as the original distribution) and standard error sigma / √n (the original standard deviation divided by the square root of n)

How can we study the sampling distribution of a statistic

Through simulations

Why do we say standard error

We use this to refer to the standard deviation of sampling distribution, and to distinguish it from the st dev of an ordinary probability distribution

X, x bar, x bar estimate, and how results would look from different samples (sampling distribution of x bar)

X = heights; x bar = avg of the heights in the sample; x bar estimates the population mean; results from different samples would vary around the population mean

why is the normal distribution the most important one in stats

bc normal distribution can be used to approximate the sampling distribution of the statistic

we can only talk about probability _____ we take the sample. _____ we talk about confidence

before, after

bigger confidence means ____ interval

wider

bootstrap confidence interval

a way of calculating non-parametric confidence intervals for parameter estimates. In this context, the bootstrap simulates the frequentist concept of obtaining estimates from repeated similar experiments. It can be used for any parameter, including when we can't find a confidence interval formula or a standard error formula mathematically. For example, we have a formula for the CI for the pop mean or proportion, but not for the pop median or st dev

bias has to do with the ______ of the sampling distribution

center

where does an unbiased statistic have its sampling distribution?

centered at the parameter being estimated

When we make inferences about ONE POPULATION PROPORTION, what assumptions do we need to make?

data is categorical, successes and failures both are greater than or equal to 15, simple random sample

if we aren't given the st dev in pop mean CI problems (to find n) what do we do

estimate it by dividing the range by 6

small variability, biased

dots are close together and outside the target

large variability, biased

dots are spread out and outside of the target

small variability, unbiased

dots within the target but close together

large variability, unbiased

dots within the target but spread out

point estimation

estimate an unknown parameter using a single statistic (e.g. x bar, p hat)

confidence interval (CI) for population proportion skeleton

estimator +/- margin of error

confidence intervals are of the form

estimator +/- margin of error; for a proportion: p hat +/- z√(p hat(1 - p hat)/n)

as df increases, t

gets closer to z

small sample method for confidence intervals for the pop proportion

if you don't have at least 15 successes and 15 failures, add 2 successes and 2 failures (only once) when calculating the sample proportion: p hat = (x+2)/(n+4). this still works even if the adjusted counts don't reach 15
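
A sketch of the adjusted interval. The card only gives the adjusted p hat; using n + 4 in the standard error as well follows the usual "plus four" method and is an assumption here, as is the function name.

```python
import math

def plus_four_ci(successes, n, z=1.96):
    """Small-sample CI for p: add 2 successes and 2 failures once."""
    p_adj = (successes + 2) / (n + 4)
    # Assumption: the standard error also uses the adjusted sample size n + 4.
    margin = z * math.sqrt(p_adj * (1 - p_adj) / (n + 4))
    return p_adj - margin, p_adj + margin

# Hypothetical sample: 3 successes out of 20 (fewer than 15 successes).
low, high = plus_four_ci(3, 20)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```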

if you want a lot of confidence, but not a huge interval, what do you do

inc the sample size

how do you reduce standard error

increase n we know this bc n is always in the denominator

MARK ALL THAT ARE TRUE!! We can use the Normal (Z) table to find probabilities about: individuals, if the population is Normal; individuals, if the population is NOT Normal; averages based on small n, if the population is Normal; averages based on small n, if the population is NOT Normal; averages based on large n, if the population is Normal; averages based on large n, if the population is NOT Normal; count of successes out of n independent trials; sample proportion of successes out of n independent trials, when np and n(1-p) are large enough

individuals, if the population is Normal; averages based on small n, if the population is Normal; averages based on large n, if the population is Normal; averages based on large n, if the population is NOT Normal; sample proportion of successes out of n independent trials, when np and n(1-p) are large enough

as n increases, what happens to the graphs

it becomes normal around the mean

symbols for population parameter mean, standard deviation, and proportion

lowercase mu, lowercase sigma, p

margin of error for a CI for pop mean is

m = t * s/√n

margin of error for a CI for p is

m = z√(p hat(1 - p hat)/n)

a smaller sample for CI of mu means

more variability and fatter tails

how do you find df if only given n

n-1 = df

when can you use the small sample alternative

only works for CI not for sampling distribution or sig test

the confidence interval for pop proportion is

p hat +/- z√(p hat(1 - p hat)/n)
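
A worked sketch of this interval, including the check for at least 15 successes and 15 failures. The function name and the counts are made up for illustration.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """p hat +/- z * sqrt(p hat * (1 - p hat) / n) for a large sample."""
    failures = n - successes
    if successes < 15 or failures < 15:
        raise ValueError("need at least 15 successes and 15 failures")
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)   # margin of error
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 60 successes out of 100, 95% confidence (z = 1.96).
low, high = proportion_ci(60, 100)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

With these numbers p hat is 0.6 and the margin of error is about 0.096.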

confidence intervals are statements about the ____ mean, not the ____ mean or about _____

population, sample, individuals

if we were to take a sample and calculate the sample mean, would the sample mean be exactly equal to the population mean

probably not, but close

if we were to take a sample and compute the sample proportion, would the sample proportion be exactly equal to the population proportion

probably not, but close

what do we use to estimate population st dev

s

in CIs, we usually have no control over ___, but we can control the _____ and the ______

s, confidence level, sample size

bigger sample size means ____ interval

smaller

larger n means

smaller standard deviation, less spread out, less deviation, more normal

standard error has to do with the ____ of the sampling distribution

spread

