Stats Test 2

Natural variation of x̄ is within

± 3σ/√n of the mean (µ ± 3σ/√n)

How large a sample do I need?

1. A small level of significance requires a larger sample. 2. Depending on the effect size, higher power requires a larger sample size. 3. Detecting a small effect size requires a larger sample size. 4. A two-sided test requires a larger sample size than a one-sided test.

Four elements of a test of significance

1. Claim 1 and Claim 2: opposing claims about an unknown parameter. Presumption is for claim 1 unless there is strong evidence against it. 2. Outcome: standardized outcome that measures how far the outcome diverges from claim 1. 3. Assessment of Evidence: How likely is it to get this outcome if claim 1 is true? 4. Conclusion: An outcome that would rarely happen if claim 1 is true is good evidence that claim 1 is not true; hence we believe claim 2 is true

Constructing a control chart for x̄

1. Draw a horizontal centerline at µ. 2. Draw horizontal control limits at µ ± 3σ/√n. 3. Plot the means (x̄) from samples of size n against time.

Format of table for t distributions

1. Each t-distribution is determined by its degrees of freedom: df = n − 1 for this procedure. 2. If the actual df is not in the table, use the df that is closest to the actual df without going over. 3. The t* values are found in the body of the table.

Sampling distribution of x̄ tells us:

1. On average, x̄ will give us the right answer. We call this property unbiasedness. 2. As sample size increases, the accuracy of x̄ increases (smaller standard deviation of the sampling distribution).

Out of control signals

1. One point above the upper control limit or below the lower control limit. 2. Run of 9 points in a row on same side of the centerline (as unlikely as one point outside the control limits)
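The control-chart cards above can be sketched in Python. This is a minimal illustration and not a standard quality-control library. The function names `control_limits` and `out_of_control_signals` are hypothetical. The sketch assumes µ and σ are known, and it assumes that a point lying exactly on the centerline breaks a run.

```python
import math


def control_limits(mu, sigma, n):
    """Return (LCL, centerline, UCL) for an x-bar chart: mu ± 3*sigma/sqrt(n)."""
    half_width = 3 * sigma / math.sqrt(n)
    return mu - half_width, mu, mu + half_width


def out_of_control_signals(xbars, mu, sigma, n, run_length=9):
    """Return the indices of sample means that trigger an out-of-control signal.

    Signal 1: a point above the UCL or below the LCL.
    Signal 2: a point that completes a run of `run_length` (default 9)
    consecutive points on the same side of the centerline.
    """
    lcl, center, ucl = control_limits(mu, sigma, n)
    signals = []
    run_side, run_count = 0, 0  # side: +1 above, -1 below, 0 on the line
    for i, x in enumerate(xbars):
        side = (x > center) - (x < center)
        if side != 0 and side == run_side:
            run_count += 1
        else:
            # Start a new run (a point exactly on the centerline starts none).
            run_side, run_count = side, 1 if side != 0 else 0
        if x > ucl or x < lcl or run_count >= run_length:
            signals.append(i)
    return signals


# Hypothetical process: mu = 100, sigma = 6, samples of size n = 4.
print(control_limits(100, 6, 4))  # (91.0, 100, 109.0)
means = [101, 99, 110] + [102] * 9
print(out_of_control_signals(means, 100, 6, 4))  # [2, 10, 11]
```

In this example, index 2 is flagged because 110 lies above the UCL. Indices 10 and 11 are flagged because they complete and extend a run of nine means above the centerline.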

How to find the p-value for a one-sided t-test

1. Calculate the t-statistic. 2. Find the degrees of freedom, df = n − 1. 3. In the table row for that df, find where the t-statistic falls between the t* values. 4. Then go down to the one-sided p row to get your p-value range. If p-value > α, fail to reject H0.

How to find the p-value for a two-sided t-test (Ha: µ ≠ µ0)

1. Calculate the t-statistic. 2. Find the degrees of freedom, df = n − 1. 3. In the table row for that df, find where the t-statistic falls between the t* values. 4. Then go down to the two-sided p row to get your p-value range. If p-value > α, fail to reject H0.
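The table lookup in the two cards above only brackets the p-value. As a cross-check, here is a minimal Python sketch that computes the p-value directly by numerically integrating the t density with Simpson's rule. The function names are hypothetical. Numerical integration is a technique swapped in for illustration; statistical software uses more refined methods.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T >= t), using symmetry about 0 and Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area_0_to_a = total * h / 3
    upper = 0.5 - area_0_to_a  # P(T >= |t|)
    return upper if t >= 0 else 1 - upper


def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic: (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))


def p_value(t, df, alternative):
    """p-value for Ha: mu > mu0 ('greater'), mu < mu0 ('less'), or mu != mu0 ('two-sided')."""
    if alternative == "greater":
        return t_upper_tail(t, df)
    if alternative == "less":
        return t_upper_tail(-t, df)
    if alternative == "two-sided":
        return 2 * t_upper_tail(abs(t), df)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# Hypothetical summary statistics: xbar = 10.9, mu0 = 10, s = 1.2, n = 16 (df = 15).
t = t_statistic(10.9, 10, 1.2, 16)   # approximately 3.0
print(p_value(t, 15, "greater"))     # one-sided p-value
print(p_value(t, 15, "two-sided"))   # twice the one-sided p-value
```

The two-sided branch doubles the one-sided tail area. That is why a two-sided p-value is twice the one-sided p-value for the same t.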

Statistical inference

1. confidence interval to estimate a parameter 2. test of significance to assess a claim about a parameter

3 ways to determine normality

1. The population is normally distributed. 2. The sample is large (n > 30). 3. The data are not extremely skewed and have no outliers.

Key takeaways of chapter 23

1. smaller α requires a larger sample to achieve desired power 2. higher power requires a larger sample 3. smaller effect size requires a larger sample 4. two-sided test requires a larger sample size than a one-sided test
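To see these four takeaways numerically, here is a minimal Python sketch of power for a z test of H0: µ = µ0. It assumes σ is known, so it uses z rather than t, and it relies on `statistics.NormalDist` from the standard library. The function names and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_z(mu0, mu_a, sigma, n, alpha, two_sided=False):
    """Power of a z test of H0: mu = mu0 when the true mean is mu_a.

    One-sided Ha is mu > mu0 (so mu_a should exceed mu0); two-sided Ha is mu != mu0.
    """
    shift = (mu_a - mu0) / (sigma / math.sqrt(n))  # true mean in standard-error units
    if two_sided:
        z_crit = Z.inv_cdf(1 - alpha / 2)
        return (1 - Z.cdf(z_crit - shift)) + Z.cdf(-z_crit - shift)
    z_crit = Z.inv_cdf(1 - alpha)
    return 1 - Z.cdf(z_crit - shift)


def beta_z(*args, **kwargs):
    """Probability of a Type II error: beta = 1 - power."""
    return 1 - power_z(*args, **kwargs)


# Hypothetical setting: mu0 = 100, sigma = 15, true mean 105.
base = dict(mu0=100, mu_a=105, sigma=15)
print(power_z(**base, n=50, alpha=0.05))                       # baseline
print(power_z(**base, n=50, alpha=0.01))                       # smaller alpha -> less power
print(power_z(**base, n=100, alpha=0.05))                      # larger n -> more power
print(power_z(mu0=100, mu_a=102, sigma=15, n=50, alpha=0.05))  # smaller effect -> less power
print(power_z(**base, n=50, alpha=0.05, two_sided=True))       # two-sided -> less power
```

Each change that lowers power must be offset by a larger n to get back to the desired power. That is the sense in which these takeaways say a larger sample is "required."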

Proper interpretation of a confidence interval should have the following 3 things:

1. Statement of confidence. 2. Parameter in context. 3. Calculated interval. Example: We are approximately 90% confident that the interval (7.8 months, 10.8 months) contains the true mean number of months married students at this church-sponsored university dated before getting engaged.

Important properties of a confidence interval

1. The margin of error (m) controls the width of the interval. 2. As sample size increases, m and the width decrease. 3. As confidence increases, m and the width increase.
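These properties can be checked with a short Python sketch of a z confidence interval, x̄ ± z*σ/√n. The sketch assumes σ is known and computes z* with `statistics.NormalDist.inv_cdf` instead of reading it from a table. The numbers are hypothetical.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """Return (lower, upper) for xbar ± z* sigma/sqrt(n), with sigma assumed known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    m = z_star * sigma / math.sqrt(n)  # margin of error
    return xbar - m, xbar + m


def width(interval):
    lower, upper = interval
    return upper - lower


w_n100 = width(z_interval(50, 10, 100, 0.95))
w_n400 = width(z_interval(50, 10, 400, 0.95))  # 4x the sample size -> half the width
w_90 = width(z_interval(50, 10, 100, 0.90))    # lower confidence -> narrower interval
print(w_n100, w_n400, w_90)
```

Because m has √n in the denominator, you need four times the sample size to cut the width in half.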

Sample

A subgroup of the population from which we obtain information. Example: 170 BYU full-time students

The sampling distribution of a sample mean X¯ is a theoretical probability distribution. It describes the distribution of:

All sample means from all possible random samples of the same size taken from the same population. The variation in sample mean values is tied to the size of each sample, NOT the number of samples.

Sampling distribution allows us to:

Assess the uncertainty of sample results. If we knew the spread of the sampling distribution, we would know how far our x̄ might be from the true mean µ.

Which of the following shows the conditions that must be met by one-sample t procedures?

Normality of the population or large sample size, and randomness in the data collection

Two studies were done on the same set of data, where study A was a two-sided test and study B was a one-sided test. The p-value of the test corresponding to study A was found to be 0.040. What is the p-value for study B?

The p-value must be 0.020 (the one-sided p-value is half the two-sided p-value when the result is in the direction of Ha).

Distribution of x̄ for all possible SRSs of size n from a population with mean µ and standard deviation σ

Center: the mean of the sampling distribution of x̄ is µ. Spread: the standard deviation of the sampling distribution of x̄ is σ/√n. Shape: normal if the population is normal; approximately normal if n > 30 (Central Limit Theorem).
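A quick simulation makes the center, spread, and shape facts concrete. This Python sketch draws many SRSs from a skewed population and compares the simulated x̄ values with µ and σ/√n. The population is assumed, for illustration, to be exponential with mean 2 and standard deviation 2. The seed, sample size, and number of samples are arbitrary choices.

```python
import math
import random
from statistics import fmean, pstdev

rng = random.Random(121)  # fixed seed so the run is reproducible

MU = SIGMA = 2.0   # exponential population: mean and sd are both 2
N = 36             # size of each SRS
REPS = 10_000      # number of samples (more samples sharpen the picture, not the spread)

xbars = [fmean(rng.expovariate(1 / MU) for _ in range(N)) for _ in range(REPS)]

print("mean of x-bars:", fmean(xbars), "vs mu =", MU)
print("sd of x-bars:  ", pstdev(xbars), "vs sigma/sqrt(n) =", SIGMA / math.sqrt(N))

# Shape: if x-bar is roughly normal, about 95% of the means fall within 2 standard errors.
se = SIGMA / math.sqrt(N)
share = sum(abs(x - MU) <= 2 * se for x in xbars) / REPS
print("share within 2 standard errors:", share)
```

Changing `REPS` only changes how well the simulation traces out the sampling distribution. Changing `N` changes its spread.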

Unrealistically simple case

Goal: a conclusion about µ. Gather data using an SRS. σ is known. The sampling distribution is normal.

Which of the following confidence levels and significance levels are appropriate for using a confidence interval approach to hypothesis testing?

Confidence Level = 90% and α = 0.1

Skewness

Named for whichever tail is longer: a longer right tail means skewed right; a longer left tail means skewed left.

Which is harder?

Detecting a difference when the data are noisy (highly variable) is harder than detecting a difference when the data do not vary much.

What influences power?

Effect size • Variability in measurements • Chosen significance level (α) • Sample size

Estimate

Estimate: a specific value of an estimator. For example, • the average value of the n = 47 claims is $1800 • the proportion of infected cutting boards for n = 144 households is 10.4% (by contrast, the sample proportion p̂ is the estimator of the population proportion p)

Type II Error:

Fail to reject H0 when it is false. In trial context: Pronounce not guilty when defendant is guilty

Key takeaways from chapter 14

For a large sample, we expect the sample mean not to stray too far from µ, but small samples have greater variability. The sampling distribution is different from the other distributions we have been talking about: it collects the x̄ values from many samples into an overall (approximately) normal curve. The mean of x̄ equals µ. The standard deviation of the sampling distribution is σ/√n, so as n increases, σ/√n decreases. Central Limit Theorem: for n > 30, the sampling distribution of x̄ is approximately normal.

Null hypothesis

H0 represents claim 1 (example: µ = 300 ppm). It always involves = (no difference).

Alternative hypothesis

Ha represents claim 2 (example: µ > 300 ppm). It always involves an inequality (<, >, or ≠). The outcome is represented by a standardized statistic called a test statistic.

The West Clermont local school district in Ohio claims that the mean ACT score of its students is 21. Students in the school district, believing that the mean was greater than 21, took a random sample of the ACT scores of 40 students and found the mean to be 27.

How likely is it that we will find that the mean ACT score in the West Clermont school district is as high as 27 if the true mean is 21?

Why is sampling distribution so important?

If a sampling distribution has a lot of variability, then another sample would likely give a very different result. About 95% of the time, the sample mean will be within 2 standard deviations (2σ/√n) of the population mean. This tells us how close the sample statistic should be to the population parameter. (Remember: σ is used when the population standard deviation is known; when it is not, we use s. Also, t* is used with s and z* is used with σ.)

Consider the following confidence interval interpretation: "We are 95% confident that we have found a confidence interval that contains the true mean number of Utah high school students involved in a car accident per month". Is this interpretation of a confidence interval correct or incorrect? Why or why not?

Incorrect. It does not state the actual confidence interval

Key takeaways of chapter 19

Margin of error: m = t* s/√n. Number of people you need for a study: n = (z*σ/m)².

A population is known to have a normal distribution with a mean of 180 and a standard deviation of 36. If a sample of size 40 is taken, what is the shape of the sampling distribution of x̄?

Normal, NOT APPROXIMATELY NORMAL! IF THEY TELL YOU THE POPULATION IS NORMAL, THEN THE SAMPLING DISTRIBUTION IS JUST NORMAL.

According to the central limit theorem, what is the shape of the sampling distribution of x̄ when the population is normal?

Normal regardless of sample size

Parameter

Numerical fact about the population. Example: µ = average GPA of all full-time BYU students.

Statistic

Numerical fact about the sample. Example: x̄ = average GPA of the 170 students in our sample.

Key concepts of chapter 21

One-sided and two-sided tests (see above). Reject H0 if the p-value is low and fail to reject H0 if it is not low ("low" means at or below α).

β

Probability(Type II error) • Probability(fail to reject H0 when it is false)

Power

Probability(reject H0 when it is false) • 1 − β

Point estimate

Quantitative data example: Based on a sample of n = 47 policies, we estimate that the average premium at this agency is approximately $1800. Categorical data example: Based on a sample of n = 144 households, we estimate that the proportion of infected bamboo cutting boards is approximately 10.4%. A point estimate is one single number. On its own it gives no sense of how accurate it is, so intervals are better.

interval estimation

Quantitative data example: Based on a sample of n = 47 policies, we estimate that the average premium at this agency is between $1,700 and $1,900. Categorical data example: Based on a sample of n = 144 households, we estimate that the proportion of infected bamboo cutting boards is between 8.4% and 12.4%. An interval estimate gives a range of plausible values for the parameter, which makes it more informative than a point estimate.

What is a natural source of variation in production? What is unnatural?

Natural: raw material, human performance, equipment performance, measurements. Unnatural: bad batch of raw material, broken machine, poorly trained operator.

Type I Error:

Reject H0 when it is true. In trial context: Pronounce guilty when defendant is innocent

Key takeaways from chapter 20: if p-value ≤ α, reject H0; if p-value > α, fail to reject H0

Rejecting H0 means "difference between claimed parameter value and calculated statistic is real" • Fail to reject H0 means "difference could be due to chance"

All college students in the U.S. have a mean age of μ = 21.3 in 2014. Suppose you randomly select two samples of students from this population, and you calculate the sample mean for each. Sample 1 has a size of n = 200, and Sample 2 has a size of n = 30. Which sample is more likely to get a sample mean of 24 or more?

Sample 2 is more likely. Smaller samples have a larger standard deviation of x̄ (σ/√n), so their sample means are more likely to stray far from µ.

Central Limit Theorem

The shape of the sampling distribution of x̄ gets more normal as n increases; n > 30 is considered large. The CLT allows us to use the standard normal table to compute approximate probabilities associated with x̄.

Wouldn't be shocked if this was the written portion of the test

Step 1: STATE the problem.
Step 2: PLAN • Select procedure: one-sample t confidence interval for means • Select confidence level • State parameter of interest in context
Step 3: SOLVE • Collect and plot data • Calculate x̄ and s • Check conditions: randomness of data (SRS); normality of population distribution or large sample size (plot data and check for outliers, or n > 30) • Calculate the confidence interval using the formula x̄ ± t* s/√n
Step 4: CONCLUDE • Interpret the confidence interval in context by including: statement of confidence, parameter of interest, calculated interval
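The SOLVE step's calculation can be sketched in Python. This is a minimal illustration under stated assumptions. The value of t* is looked up from the table and passed in; here it is 2.262, for df = 9 at 95% confidence. The data values are hypothetical, and the randomness and normality conditions still have to be checked by hand.

```python
import math
from statistics import mean, stdev


def one_sample_t_interval(data, t_star):
    """Return (xbar, s, lower, upper) for xbar ± t* s/sqrt(n).

    `t_star` comes from the t table with df = n - 1 at the chosen confidence level.
    """
    n = len(data)
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    m = t_star * s / math.sqrt(n)  # margin of error
    return xbar, s, xbar - m, xbar + m


# Hypothetical SRS of n = 10 measurements; df = 9, so t* = 2.262 for 95% confidence.
data = [8, 10, 9, 12, 11, 7, 9, 10, 11, 13]
xbar, s, lower, upper = one_sample_t_interval(data, t_star=2.262)
print(f"x-bar = {xbar}, s = {s:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The CONCLUDE step would then state the confidence level, the parameter in context, and the printed interval.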

Population

The entire group of individuals that is the target of interest. Example: All BYU full-time students (so if they are talking about BYU students keep it with BYU students not all students)

If all possible samples of size 80 are taken instead of size 20, how would this change the mean and standard deviation of the sampling distribution?

The mean would stay the same and the standard deviation would decrease, because σ/√n divides by a larger number. Here it is cut in half, since √80 = 2√20. The mean of x̄ = µ ALWAYS with sampling distributions.

If all possible samples of size 10 are taken from a population instead of all possible samples of size 50, how does this change the mean and standard deviation of the sampling distribution of x̄?

The mean would stay the same and the standard deviation would increase. Dividing σ by a smaller √n gives a larger standard deviation (√5 ≈ 2.24 times larger here).

A tire manufacturer has a 60,000 mile warranty for tread life. The manufacturer considers the overall tire quality to be acceptable if less than 8% are worn out at 60,000 miles. A study was done and researchers were 98% confident that the proportion of tires that are worn out at 60,000 miles lies between 7.8% and 9.6%. What is the parameter of interest?

The proportion of all tires that are worn out after 60,000 miles NOT THE MEAN!!!!

True or False: The t-distribution and the standard normal distribution have the same center.

True

Estimator

A general statistic that estimates the parameter. For example, • the estimator of the population mean µ is the sample mean x̄

Test statistic

A number that summarizes the data for a test of significance. It compares an estimate of the parameter from sample data with the value of the parameter given in the null hypothesis, and measures how far the sample data diverge from H0. • Large values are not consistent with H0 and give evidence against H0 • Used to find the probability of obtaining the sample data IF H0 were true • Example of a test statistic: t = (x̄ − µ0)/(s/√n)

Tests of significance

Almost always assume (for the sake of argument) a claim that the researchers think is not true. • If there is good evidence against the claim, we conclude that the opposite of the claim is true. Example: Researchers for the Center for Environmental Health thought the toy company's claim was false, so they assumed it was true. If they could show that the observed difference was unlikely assuming the toy company's claim, then there was strong evidence that the claim was false.

P-value

A measure of the strength of agreement between the test statistic and H0, calculated assuming H0 is true

Main takeaways from chapter 18

Confidence interval: x̄ ± t* s/√n. Look at the writing assignment; be really careful with the wording of things.

Increasing alpha

Decreases β and increases power. Power and α go up and down together; α and β move in opposite directions.

Key takeaways from chapter 17

Different kinds of estimation: point estimate (43%); interval estimate (43% to 46%); hypothesis testing (H0: p = 43%, Ha: p > 43%).

Inference:

drawing conclusions about a population (using a parameter) based on data from a sample (using a statistic) with a measure of uncertainty.

Increasing the confidence level will lead to a smaller margin of error.

false

Pre-specified cutoff for the p-value (α)

If p-value ≤ α: reject H0 (conclude it is false) and declare the observed difference statistically significant; p-values between 0 and α support Ha. If p-value > α: do not reject H0 and do not declare the observed difference statistically significant.

Decreasing alpha

Increases β and decreases power.

Increasing n

Increases power and decreases β. Any time power moves, β moves in the opposite direction (β = 1 − power).

Statistical Process Control

is a method of quality control (e.g., inspection) which employs statistical methods to monitor and control a process. This helps ensure the process operates efficiently, producing more specification-conforming products with less waste (rework or scrap).

≠ (not equal) in Ha

Means a two-sided test; < or > means a one-sided test.

Increasing your sample size will decrease the width of your confidence interval

n = (z*σ/m)². ALWAYS ROUND UP; you don't want half a person.
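The sample-size formula and the "always round up" rule fit in a few lines of Python. This is a minimal sketch with a hypothetical function name. It accepts either a table value of z* or a confidence level; in the second case it computes z* with `statistics.NormalDist`. The numbers are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(sigma, m, confidence=None, z_star=None):
    """Smallest n with z* sigma / sqrt(n) <= m, i.e. n = (z* sigma / m)^2 rounded UP.

    Pass either a table value `z_star` or a `confidence` level (z* computed exactly).
    """
    if z_star is None:
        if confidence is None:
            raise ValueError("give either confidence or z_star")
        z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z_star * sigma / m) ** 2)  # ceil: never round down


# Hypothetical: sigma = 15, desired margin of error m = 2, 95% confidence.
print(sample_size(15, 2, z_star=1.96))  # (1.96 * 15 / 2)^2 = 216.09 -> 217
print(sample_size(15, 2, confidence=0.95))
```

`math.ceil` makes sure 216.09 becomes 217 rather than 216. Rounding down would give a margin of error slightly larger than the target m.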

Observed effect

numerator of test statistic

Chapter 16 key takeaways

Pretty much just need to know that a process is out of control if a point goes above or below the control limits (µ ± 3σ/√n), or if there is a run of nine points in a row on one side of the centerline.

Out of Control Process:

process exhibits unnatural variation over time

In Control Process:

process whose output exhibits only natural variation over time

X¯ Control Chart:

statistical tool for monitoring an input or an output of a process that has variation, alerting us when a problem or unnatural variation has occurred

Properties of t distribution

Symmetric • bell shaped • mean = 0 • the smaller the df, the larger the spread (more uncertainty because s is used in place of σ) • the larger the df, the closer the t-distribution is to the standard normal

Significance depends on sample size

t = (x̄ − µ0)/(s/√n). Significance depends on: the size of the observed effect (numerator of the test statistic), which measures how far the sample mean deviates from the hypothesized µ0, and the sample size. THE LARGER THE OBSERVED EFFECT, THE SMALLER THE P-VALUE. THE LARGER THE SAMPLE SIZE, THE SMALLER THE P-VALUE.
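Here is a tiny Python sketch of why sample size matters. It holds the observed effect and s fixed and varies only n. The numbers are hypothetical.

```python
import math


def t_statistic(xbar, mu0, s, n):
    """t = (xbar - mu0) / (s / sqrt(n)); the numerator is the observed effect."""
    return (xbar - mu0) / (s / math.sqrt(n))


# Hypothetical: the same observed effect (xbar - mu0 = 2) and s = 8 at three sample sizes.
for n in (16, 64, 256):
    print(n, t_statistic(52, 50, 8, n))  # t = 1.0, then 2.0, then 4.0
```

Quadrupling n doubles t and pushes it further into the tail, so the p-value shrinks even though the observed effect has not changed.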

Distribution of test statistics

t = (x̄ − µ0)/(s/√n) with df = n − 1. THE t TABLE GIVES THE AREA TO THE RIGHT (upper tail).

Hypothesis testing

Testing our expectations against experience (data) using H0 and Ha

Suppose we take all possible samples of the same size from a population and for each sample, we compute x bar. The mean of these x bar values will be exactly equal to the mean of the population mu from which the samples were taken

true

The name of this quantity, s/√n, is the standard error of x̄.

true

True or False: When comparing the z-distribution and the t-distribution, both have the same center at 0.

true

We fail to reject the null hypothesis when there is not enough evidence in support of the alternative hypothesis.

true

Large samples

unimportant differences can be statistically significant

If σ is unknown

Use x̄ ± t* s/√n, replacing σ with s. Then use the table to find t*. Remember to use df = n − 1. REMEMBER: 1. Must be an SRS. 2. Normal (normal population or large sample).

Key takeaways from chapter 15

We need an SRS and normality to calculate probabilities. (We are finding probabilities through z-scores like we did for the last test.) If these two conditions are met, there are two choices: 1. No sample size is given (one individual observation): z = (x − µ)/σ, then use the normal table; remember the left side gives the first decimal and the top gives the second. 2. Sample size is given (a sample mean): z = (x̄ − µ)/(σ/√n); as n increases, σ/√n decreases. You only really have to do two things for this unit: either z = (x̄ − µ)/(σ/√n) or z = (x − µ)/σ.
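The two z-score cases can be compared side by side in Python. This sketch reuses the numbers from the card about a normal population with mean 180 and standard deviation 36 and samples of size 40. The cutoff of 190 is a hypothetical value chosen for illustration. `statistics.NormalDist` stands in for the normal table.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # stands in for the standard normal table


def prob_individual_above(x, mu, sigma):
    """P(X > x) for one observation: z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    return z, 1 - Z.cdf(z)


def prob_mean_above(xbar, mu, sigma, n):
    """P(X-bar > xbar) for a sample mean: z = (xbar - mu) / (sigma / sqrt(n))."""
    z = (xbar - mu) / (sigma / math.sqrt(n))
    return z, 1 - Z.cdf(z)


# Population: mu = 180, sigma = 36; samples of n = 40. The cutoff 190 is hypothetical.
z1, p1 = prob_individual_above(190, 180, 36)
z2, p2 = prob_mean_above(190, 180, 36, 40)
print(f"one observation: z = {z1:.2f}, P = {p1:.4f}")
print(f"mean of 40:      z = {z2:.2f}, P = {p2:.4f}")
```

The same cutoff is far less likely to be exceeded by a mean of 40 observations. The standard error σ/√n is much smaller than σ, so the z-score is larger.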

Suppose a 95% confidence interval was made to estimate the monthly cost of Internet service instead of a 90% confidence interval. Fill in the blank: The 95% confidence interval will be _______ the 90% confidence interval.

wider than

Confidence interval

x̄ ± t* s/√n. Always check conditions.

General formula for a C% confidence interval

x̄ ± z* σ/√n. Only use z* if you know σ; otherwise use t*. Use the table to find z*.

Margin of error

For x̄ ± t* s/√n, the margin of error is m = t* s/√n (interchangeable with z* and σ). It is the maximum difference between the statistic and the parameter at the stated confidence level. Accounts for uncertainty due to sampling variability only. Does not account for non-response, undercoverage, bad

related measure of uncertainty is

α (probability of falsely rejecting claim 1, i.e., H0)

stuff about alpha beta and power

α is the probability of a Type I error. β is the probability of a Type II error. Power is the probability of rejecting H0 when it's false. Power is good!

p-value info

• A number between 0 and 1: 0 ≤ p-value ≤ 1 • The probability of getting a test statistic as extreme as or more extreme than the one observed if H0 were true (a probability about the statistic) • Computed assuming H0 is true • A measure of the strength of agreement between the observed test statistic and H0 • Measures evidence against H0: "if low, reject H0," meaning a low p-value is evidence against H0 (and evidence for Ha), so you can reject H0.

α

• Level of significance •Probability(Type I error) •Probability(reject H0 when it is true)

THE LARGER THE OBSERVED EFFECT, THE SMALLER THE P-VALUE

• Size of the sample: n. s/√n measures how much random variation we expect

• Low risk of false negative

• a study with this is powerful

• Low risk of false positive

• a study with this is safe

THE LARGER THE SAMPLE SIZE THE SMALLER THE P-VALUE

→ Sample size may be too small to detect significance. → Sample size may be so large that results are always significant. When the sample size is large, check for practical importance: results are declared statistically significant when p-value ≤ α; results are declared practically important when the observed effect (numerator of the test statistic) is large or important enough to matter. Practical importance is not the same as statistical significance. Practical importance is determined by common sense.

p-value for a two-sided test = 2 times p-value for a one-sided test.

→ Two-sided test requires stronger evidence than one-sided test.

Small samples

⇒ important difference may not be statistically significant

