Business Statistics Ch 15 & 16 (Week 7)
level C confidence interval
**A level C confidence interval for a parameter has two parts: ~An interval calculated from the data, which has the form: estimate ± margin of error ~A confidence level C, which gives the probability that the interval will capture the true parameter value in repeated samples. That is, the confidence level is the success rate for the method.
Statistical inference
**After we have selected a sample, we know the responses of the individuals in the sample. However, the reason for taking the sample is to infer from that data some conclusion about the wider population represented by the sample. **Statistical inference provides methods for drawing conclusions about a population from sample data.
Parameters and statistics
**As we begin to use sample data to draw conclusions about a wider population, we must be clear about whether a number describes a sample or a population. **A parameter is a number that describes the population. In practice, the value of a parameter is not known because we can rarely examine the entire population. **A statistic is a number that can be computed from the sample data without making use of any unknown parameters. In practice, we often use a statistic to estimate an unknown parameter. **Remember p and s: parameters come from populations and statistics come from samples. **We write μ (the Greek letter mu) for the mean of the population and σ (the Greek letter sigma) for the standard deviation of the population. We write x̄ ("x-bar") for the mean of the sample and s for the standard deviation of the sample.
The sampling distribution of x̄ (part 2)
**Because the standard deviation of the sampling distribution of x̄ is σ/√n, averages are less variable than individual observations. **Not only is the standard deviation of the distribution of x̄ smaller than the standard deviation of individual observations, it gets smaller as we take larger samples. The results of large samples are less variable than the results of small samples. Note: While the standard deviation of the distribution of x̄ gets smaller, it shrinks in proportion to √n, not n. To cut the sampling distribution's standard deviation in half, for instance, you must take a sample four times as large, not just twice as large.
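The σ/√n rule can be checked with a few lines of Python. This is a minimal sketch; the value σ = 10 and the sample sizes are made-up numbers chosen for illustration, and the function name is hypothetical.

```python
import math

def sd_of_sample_mean(sigma: float, n: int) -> float:
    """Standard deviation of the sampling distribution of x-bar: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 10.0  # hypothetical population standard deviation (assumption)
for n in (25, 50, 100):
    print(n, round(sd_of_sample_mean(sigma, n), 3))
# n = 25 -> 2.0, n = 50 -> 1.414, n = 100 -> 1.0
```

Doubling the sample from 25 to 50 only cuts the standard deviation to about 71% of its value; it takes a sample of 100 (four times as large) to cut it in half.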
The law of large numbers
**If x̄ is rarely exactly right and varies from sample to sample, why is it nonetheless a reasonable estimate of the population mean μ? **Here is one answer: If we keep taking larger and larger samples, the statistic x̄ is guaranteed to get closer and closer to the parameter μ. LAW OF LARGE NUMBERS **Draw observations at random from any population with finite mean μ. As the number of observations drawn increases, the mean x̄ of the observed values tends to get closer and closer to the mean μ of the population.
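A small simulation can illustrate the law of large numbers. The sketch below assumes a skewed (exponential) population with mean μ = 2, which is a made-up choice; the function name and the seed are also hypothetical. It reports how far the running mean x̄ is from μ as more observations are drawn.

```python
import random

def running_mean_errors(mu, sizes, seed=0):
    """Return |x-bar - mu| after each sample size in `sizes`.

    Observations come from an exponential population with mean `mu`
    (an illustrative assumption; any population with finite mean works).
    """
    rng = random.Random(seed)
    draws = []
    errors = {}
    for n in sorted(sizes):
        while len(draws) < n:
            draws.append(rng.expovariate(1.0 / mu))
        errors[n] = abs(sum(draws) / n - mu)
    return errors

errors = running_mean_errors(mu=2.0, sizes=[10, 100, 1000, 20000])
for n, err in errors.items():
    print(n, round(err, 4))
```

The error does not have to shrink at every step, but for large n it is very likely to be small.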
Confidence intervals for a population mean
**In our NHANES example, wanting "95% confidence" dictated going out 2 standard deviations in both directions from the mean. If we change our confidence level C, we will change the number of standard deviations. The text includes a table with the most common multiples. **Once we have these, we may build any level C confidence interval we wish. CONFIDENCE INTERVAL FOR THE MEAN OF A NORMAL POPULATION **Draw an SRS of size n from a Normal population having unknown mean μ and known standard deviation σ. A level C confidence interval for μ is: x̄ ± z*·σ/√n **Some examples of critical values, z*, corresponding to the confidence level C are given in that table.
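The interval x̄ ± z*·σ/√n can be computed directly. The sketch below uses Python's standard-library `statistics.NormalDist` to look up z* instead of a table; the function names `z_star` and `z_interval` are made up for illustration, and the inputs are the NHANES values from these notes (x̄ = 26.8, σ = 7.5, n = 654).

```python
from math import sqrt
from statistics import NormalDist

def z_star(conf_level: float) -> float:
    """Critical value z* leaving area conf_level in the middle of N(0, 1)."""
    return NormalDist().inv_cdf((1 + conf_level) / 2)

def z_interval(xbar: float, sigma: float, n: int, conf_level: float):
    """Level C confidence interval for mu when sigma is known."""
    margin = z_star(conf_level) * sigma / sqrt(n)
    return xbar - margin, xbar + margin

for c in (0.90, 0.95, 0.99):
    print(c, round(z_star(c), 3))  # 1.645, 1.96, 2.576

low, high = z_interval(xbar=26.8, sigma=7.5, n=654, conf_level=0.95)
print(round(low, 1), round(high, 1))  # roughly 26.2 to 27.4
```

With the exact multiple z* ≈ 1.96 the interval is nearly the same as the one obtained from the "2 standard deviations" rule of thumb.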
Confidence interval
**In our previous example, the 95% confidence interval was x̄ ± 0.6. **Most confidence intervals will have a form similar to this: estimate ± margin of error **The margin of error, ± 0.6, shows how accurate we believe our guess is; it is based on the variability of the estimate. A level C confidence interval for a parameter has two parts: ~~An interval calculated from the data, which has the form: estimate ± margin of error ~~A confidence level C, which gives the probability that the interval will capture the true parameter value in repeated samples. That is, the confidence level is the success rate for the method.
The central limit theorem
**Most population distributions are not Normal. What is the shape of the sampling distribution of sample means when the population distribution isn't Normal? **A remarkable fact is that as the sample size increases, the distribution of sample means changes its shape: it looks less like that of the population and more like a Normal distribution! **Draw an SRS of size n from any population with mean μ and finite standard deviation σ. The central limit theorem says that when n is large, the sampling distribution of the sample mean x̄ is approximately Normal: x̄ is approximately N(μ, σ/√n) **The central limit theorem allows us to use Normal probability calculations to answer questions about sample means from many observations, even when the population distribution is not Normal.
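The central limit theorem can be explored by simulation. The sketch below assumes a strongly skewed exponential population with μ = 1 and σ = 1 (an illustrative choice, not from the text); the function name, seed, sample size, and number of repetitions are all hypothetical. It checks that the sample means center at μ with spread close to σ/√n.

```python
import random
import statistics
from math import sqrt

def simulate_sample_means(n, reps, seed=1):
    """Draw `reps` samples of size `n` from an exponential population
    (mean 1, standard deviation 1) and return the list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

n = 50
means = simulate_sample_means(n=n, reps=2000)
center = statistics.fmean(means)   # should be near mu = 1
spread = statistics.stdev(means)   # should be near sigma / sqrt(n)
print(round(center, 3), round(spread, 3), round(1 / sqrt(n), 3))
```

A histogram of `means` would look roughly bell-shaped even though the population itself is far from Normal.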
Confidence level
**The confidence level is the overall capture rate if the method is used many times. The sample mean will vary from sample to sample, but when we use the method "estimate"±"margin of error" to get an interval based on each sample, C% of these intervals capture the unknown population mean µ. INTERPRETING A CONFIDENCE LEVEL **The confidence level is the success rate of the method that produces the interval. We don't know whether the 95% confidence interval from a particular sample is one of the 95% that capture μ or one of the unlucky 5% that miss. **To say that we are 95% confident that the unknown μ lies between 26.2 and 27.4 is shorthand for "We got these numbers using a method that gives correct results 95% of the time."
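The "success rate of the method" reading can be illustrated by simulation. This is a sketch under the simple conditions (Normal population, known σ); the values μ = 26.8, σ = 7.5, n = 25, the seed, and the function name are assumptions made for the illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage_rate(mu, sigma, n, conf_level, reps, seed=2):
    """Fraction of simulated z intervals that capture the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf((1 + conf_level) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps

rate = coverage_rate(mu=26.8, sigma=7.5, n=25, conf_level=0.95, reps=2000)
print(rate)  # expected to land close to 0.95
```

Any single interval either captures μ or misses it; the 95% describes how often the method succeeds over many samples.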
Sampling distributions
**The law of large numbers assures us that if we measure enough subjects, the statistic x̄ will eventually get very close to the unknown parameter μ. **If we took every one of the possible samples of a certain size, calculated the sample mean for each, and graphed all of those values, we'd have a sampling distribution. **If we use software to imitate chance behavior to carry out tasks such as exploring sampling distributions, this is called simulation. **The population distribution of a variable is the distribution of values of the variable among all individuals in the population. **The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population. **Be careful: The population distribution describes the individuals that make up the population. A sampling distribution describes how a statistic varies in many samples from the population.
Statistical estimation
**The process of statistical inference involves using information from a sample to draw conclusions about a wider population. **Different random samples yield different statistics. We need to be able to describe the sampling distribution of possible statistic values in order to perform statistical inference. **We can think of a statistic as a random variable because it takes numerical values that describe the outcomes of the random sampling process. Therefore, we can examine its probability distribution using concepts we learned in earlier chapters.
Sampling distributions and statistical significance (part 2)
**The sampling distribution of a sample statistic is determined by the particular sample statistic we are interested in, the distribution of the population of individual values from which the sample statistic is computed, and the method by which samples are selected from the population. **The sampling distribution allows us to determine the probability of observing any particular value of the sample statistic in another such sample from the population. We said that an observed effect so large that it would rarely occur by chance is called statistically significant. **Consider the second graph on the previous slide: We may decide, based on our observed set of 1000 samples, that because we saw only 2 with variances above 200, a variance that large is a statistically significant event.
Confidence intervals: the four-step process
**The steps in finding a confidence interval mirror the overall four-step process for organizing statistical problems. The Four-Step Process ~State: What is the practical question that requires estimating a parameter? ~Plan: Identify the parameter, choose a level of confidence, and select the type of confidence interval that fits your situation. ~Solve: Carry out the work in two phases: 1. Check the conditions for the interval that you plan to use. 2. Calculate the confidence interval. ~Conclude: Return to the practical question to describe your results in this setting.
How confidence intervals behave
**The z confidence interval for the mean of a Normal population illustrates several important properties that are shared by all confidence intervals in common use: the user chooses the confidence level and the margin of error follows; we would like high confidence and a small margin of error; high confidence suggests our method almost always gives correct answers; and a small margin of error suggests we have pinned down the parameter precisely. How do we get a small margin of error? **The margin of error for the z confidence interval is: z*·σ/√n. **The margin of error gets smaller when: ~~z* gets smaller (the same as a lower confidence level C). ~~σ is smaller: it is easier to pin down μ when σ is smaller. ~~n gets larger: since n is under the square root sign, we must take four times as many observations to cut the margin of error in half.
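The three ways of shrinking the margin of error can be compared numerically. The sketch below starts from a made-up baseline (95% confidence, σ = 10, n = 100) and changes one ingredient at a time; the function name and all input values are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(conf_level: float, sigma: float, n: int) -> float:
    """Margin of error z* * sigma / sqrt(n) for the z confidence interval."""
    z_star = NormalDist().inv_cdf((1 + conf_level) / 2)
    return z_star * sigma / sqrt(n)

base = margin_of_error(0.95, sigma=10.0, n=100)              # about 1.96
lower_confidence = margin_of_error(0.90, sigma=10.0, n=100)  # smaller z*
smaller_sigma = margin_of_error(0.95, sigma=5.0, n=100)      # half of base
four_times_n = margin_of_error(0.95, sigma=10.0, n=400)      # half of base

print(round(base, 3), round(lower_confidence, 3),
      round(smaller_sigma, 3), round(four_times_n, 3))
```

Halving σ and quadrupling n have the same effect on the margin, which is why larger samples give diminishing returns.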
Simple conditions for inference about a mean
**This chapter presents the basic reasoning of statistical inference. We start with a setting that is too simple to be realistic. Simple Conditions for Inference About a Mean 1. We have a simple random sample (SRS) from the population of interest. There is no nonresponse or other practical difficulty. The population is large compared to the size of the sample. 2. The variable we measure has an exactly Normal distribution N(μ, σ) in the population. 3. We don't know the population mean μ, but we do know the population standard deviation σ. Note: The conditions that we have a perfect SRS, that the population is exactly Normal, and that we know the population standard deviation are all unrealistic.
The sampling distribution of x̄ (part 3)
**We have described the center and variability of the sampling distribution of a sample mean x̄, but not its shape. The shape of the sampling distribution depends on the shape of the population distribution. **In one important case there is a simple relationship between the two distributions: if the population distribution is Normal, then so is the sampling distribution of the sample mean. SAMPLING DISTRIBUTION OF A SAMPLE MEAN **If individual observations have the N(μ, σ) distribution, then the sample mean x̄ of an SRS of size n has the N(μ, σ/√n) distribution.
Sampling distributions and statistical significance (part 1)
**We have looked carefully at the sampling distribution of a sample mean. **However, any statistic we can calculate from a sample will have a sampling distribution.
The sampling distribution of x̄ (part 1)
**When we choose many SRSs from a population, the sampling distribution of the sample mean is centered at the population mean μ and is less spread out than the population distribution. MEAN AND STANDARD DEVIATION OF A SAMPLE MEAN **Suppose that x̄ is the mean of an SRS of size n drawn from a large population with mean μ and standard deviation σ. Then the sampling distribution of x̄ has mean μ and standard deviation σ/√n. **Because the mean of the statistic x̄ is always equal to the mean μ of the population (that is, the sampling distribution of x̄ is centered at μ), we say the statistic x̄ is an unbiased estimator of the parameter μ. Note: on any particular sample, x̄ may fall above or below μ.
The reasoning of statistical estimation (part 2)
2. The average BMI x̄ of an SRS of 654 young women has standard deviation σ/√n = 7.5/√654 = 0.3, rounded. 3. The "95" part of the 68-95-99.7 rule for Normal distributions says that x̄ is within 0.6 (2 standard deviations) of its mean, μ, in 95% of all samples. So if we construct the interval [x̄ − 0.6, x̄ + 0.6] and estimate that μ lies in the interval, we will be correct 95% of the time. 4. Adding and subtracting 0.6 from our sample mean of 26.8, we get the interval [26.2, 27.4]. For this we say that we are 95% confident that the mean BMI, μ, of all young women is some value in that interval, no lower than 26.2 and no higher than 27.4.
Parameter
A parameter is a number that describes the population. In practice, the value of a parameter is not known because we can rarely examine the entire population.
Statistic
A statistic is a number that can be computed from the sample data without making use of any unknown parameters. In practice, we often use a statistic to estimate an unknown parameter.
The reasoning of statistical estimation (part 1)
An NHANES report gives data for 654 women aged 20 to 29 years. The mean BMI of these 654 women is x̄ = 26.8. On the basis of this sample, we want to estimate the mean BMI μ in the population of all 20.6 million women in this age group. To match the "simple conditions," we will treat the NHANES sample as an SRS from a Normal population with known standard deviation σ = 7.5. 1. To estimate the unknown population mean BMI μ, use the mean x̄ = 26.8 of the random sample. We don't expect x̄ to be exactly equal to μ, so we want to say how accurate this estimate is.
Confidence Interval
An estimated range of values that seem reasonable based on what we've observed. Its center is still the sample mean, but we've got some room on either side for our uncertainty.
sampling distribution
Different random samples yield different statistics. We need to be able to describe the sampling distribution of possible statistic values in order to perform statistical inference.
sampling distribution.
If we took every one of the possible samples of a certain size, calculated the sample mean for each, and graphed all of those values, we'd have a sampling distribution.
simulation.
If we use software to imitate chance behavior to carry out tasks such as exploring sampling distributions, this is called simulation.
Percentiles
Tell you what percentage of the population has a score or value that's lower than yours.
The Four-Step Process
The Four-Step Process **State: What is the practical question that requires estimating a parameter? **Plan: Identify the parameter, choose a level of confidence, and select the type of confidence interval that fits your situation. **Solve: Carry out the work in two phases: 1. Check the conditions for the interval that you plan to use. 2. Calculate the confidence interval. **Conclude: Return to the practical question to describe your results in this setting.
population distribution
The population distribution of a variable is the distribution of values of the variable among all individuals in the population.
statistical inference
The process of statistical inference involves using information from a sample to draw conclusions about a wider population.
statistically significant
The sampling distribution allows us to determine the probability of observing any particular value of the sample statistic in another such sample from the population. We said that an observed effect so large that it would rarely occur by chance is called statistically significant.
sampling distribution
The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.
random variable
We can think of a statistic as a random variable because it takes numerical values that describe the outcomes of the random sampling process. Therefore, we can examine its probability distribution using concepts we learned in earlier chapters.
Z-Scores
Z-scores in general allow us to compare things that are not on the same scale, as long as they're Normally distributed.
Suppose the population standard deviation is σ = 5, an SRS of n = 100 is obtained, and the confidence level is chosen to be 98%. The margin of error for estimating a mean µ is given by: a) 1.165. b) 0.1165. c) 1.228. d) 0.1228.
a) 1.165.
Researchers doing a study comparing time spent on social media and time spent on studying randomly sampled 200 students at a major university. They found that students in the sample spent an average of 2.3 hours per day on social media and an average of 1.8 hours per day on studying. If all the students at the university in fact spent 2.2 hours per day on studying, with a standard deviation of 2 hours, the shape of the sampling distribution of the sample average time spent studying is: a) Normal, centered at 2.2 b) Normal, centered at 2.2 only if the studying times are Normally distributed c) Normal, centered at 1.8 d) Normal, centered at 1.8 only if all samples have a sample average of 1.8 hours studying
a) Normal, centered at 2.2
A veterinary researcher takes an SRS of 60 horses presenting with colic whose average age is 12 years. The average age of all horses seen at the veterinary clinic was determined to be 10 years. The researcher concludes that horses with colic are older. The value 10 years is: a) a population mean b) a sample mean c) a sample distribution d) a sample variance
a) a population mean
The variability of a statistic is described by: a) the spread of its sampling distribution b) the amount of bias present c) the vagueness in the wording of the question used to collect the sample data d) the stability of the population it describes
a) the spread of its sampling distribution
An SRS of 25 recent birth records at the local hospital was selected. In the sample, the average birth weight was x̄ = 119.6 ounces. Suppose the standard deviation is known to be σ = 6.5 ounces. Assume that in the population of all babies born in this hospital, the birth weights follow a Normal distribution, with mean µ. The standard deviation of the sampling distribution of the mean is: a) 6.52 ounces b) 1.30 ounces c) 0.38 ounces d) 0.02 ounces
b) 1.30 ounces
The time (in days) until maturity of a certain variety of tomato plant is Normally distributed, with mean µ and standard deviation σ = 2.4. I select a simple random sample of four plants of this variety and measure the time until maturity. The sample yields x̄ = 65. A 95% confidence interval for µ (in days) is: a) 65 ± 1.97. b) 65 ± 2.35. c) 65 ± 3.95. d) 65 ± 4.7.
b) 65 ± 2.35.
A medical researcher treats 400 subjects with high cholesterol using a new drug. After two months of taking the drug, the average decrease in cholesterol level is x̄ = 90, and the researcher assumes that the decrease in cholesterol follows a Normal distribution, with unknown mean µ and standard deviation σ = 30. A 95% confidence interval for µ is: a) 90 ± 1.96. b) 90 ± 2.94. c) 90 ± 3.92. d) 90 ± 58.8.
b) 90 ± 2.94.
Which of the following statements is false? a) By the central limit theorem, the sample mean from a population with finite mean and variance will be approximately Normally distributed for sufficiently large samples. b) By the central limit theorem, a sample mean must be based on samples from a population that is Normally distributed. c) By the law of large numbers, the sample mean will be close to the true mean in most samples for large sample sizes. d) If the true population mean is given by µ, then the mean of the sampling distribution of the sample average is also µ in proper random samples from the population.
b) By the central limit theorem, a sample mean must be based on samples from a population that is Normally distributed.
I collect a random sample of size n from a population and compute a 95% confidence interval for the proportion I observe from the population. What could I do to produce a new confidence interval with a smaller width (smaller margin of error) based on these same data? a) I could use a larger confidence level. b) I could use a smaller confidence level. c) I could use the same confidence level but compute the interval n times; approximately 5% of these intervals will be larger. d) Nothing can guarantee absolutely that I will get a smaller interval; I can only say the chance of obtaining a smaller interval is 0.05.
b) I could use a smaller confidence level.
A veterinary researcher takes an SRS of 60 horses presenting with colic whose average age is 12 years. The average age of all horses seen at the veterinary clinic was determined to be 10 years. The researcher concludes that horses with colic are older. The value 12 is: a) a population mean b) a sample mean c) a variance of the sample mean d) none of the answer options
b) a sample mean
A statistic is said to be unbiased if: a) the person computing it doesn't favor any particular outcome b) the mean of its sampling distribution is equal to the true value of the parameter being estimated c) the person who calculated the statistic and the subjects whose responses make up the statistic were truthful d) it is used for only honest purposes
b) the mean of its sampling distribution is equal to the true value of the parameter being estimated
The scores of a certain population on the Wechsler Intelligence Scale for Children (WISC) are thought to be Normally distributed, with mean µ and standard deviation σ = 10. A simple random sample of 25 children from this population is taken and each is given the WISC. The mean of the 25 scores is x̄ = 104.32. Based on these data, a 95% confidence interval for µ is: a) 104.32 ± 0.78. b) 104.32 ± 3.29. c) 104.32 ± 3.92. d) 104.32 ± 19.6.
c) 104.32 ± 3.92.
You measure the lifetime of a random sample of 64 tires of a certain brand. The sample mean is x̄ = 50 months. Suppose that the lifetimes for tires of this brand follow a Normal distribution, with unknown mean µ and standard deviation σ = 5 months. A 99% confidence interval for µ is: a) 49.8 to 50.2. b) 48.78 to 51.22. c) 48.39 to 51.61. d) 40.2 to 59.8.
c) 48.39 to 51.61.
Suppose that the population of the scores of all high school seniors who took the SAT Math test this year follows a Normal distribution, with mean µ and standard deviation σ = 100. You read a report that says, "On the basis of a simple random sample of 100 high school seniors who took the SAT Math test this year, a confidence interval for µ is 512.00 ± 25.76." The confidence level for this interval is: a) 90%. b) 95%. c) 99%. d) > 99.9%.
c) 99%.
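The confidence level in the SAT question can be recovered by solving the margin of error formula for z* and converting z* to a central area under the standard Normal curve. This is a sketch using the standard library; the function name is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def confidence_level_from_margin(margin: float, sigma: float, n: int) -> float:
    """Solve margin = z* * sigma / sqrt(n) for z*, then return the
    central area between -z* and z* under the standard Normal curve."""
    z_star = margin * sqrt(n) / sigma
    return 2 * NormalDist().cdf(z_star) - 1

level = confidence_level_from_margin(margin=25.76, sigma=100, n=100)
print(round(level, 2))  # 0.99
```

Here z* = 25.76 / (100/√100) = 2.576, the critical value for 99% confidence.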
An SRS of 25 recent birth records at the local hospital was selected. In the sample, the average birth weight was x̄ = 119.6 ounces. Suppose the standard deviation is known to be σ = 6.5 ounces. Assume that in the population of all babies born in this hospital, the birth weights follow a Normal distribution, with mean µ. Based on the 25 recent birth records, the sampling distribution of the sample mean x̄ can be represented by: a) N(119.6, 1.30) b) N(119.6, 6.5) c) N(µ, 1.30) d) N(µ, 6.5)
c) N(µ, 1.30)
A sample of n = 25 diners at a local restaurant had a mean lunch bill of $16 with a standard deviation of σ = $4. We obtain a 95% confidence interval as (14.43, 17.57). Which of the following is not required for the confidence interval? a) The population of lunch bills is Normally distributed. b) The standard deviation σ is known. c) The sample size is at least 30. d) The sample is a random sample.
c) The sample size is at least 30.
The distribution of actual weights of 8-ounce wedges of cheddar cheese produced at a dairy is Normal, with mean 8.1 ounces and standard deviation 0.2 ounces. A sample of 10 of these cheese wedges is selected. The distribution of the sample mean of the weights of cheese wedges is: a) approximately Normal, with mean 8.1 and standard deviation 0.020 b) approximately Normal, with mean 8.1 and standard deviation 0.2 c) approximately Normal, with mean 8.1 and standard deviation 0.063 d) impossible to determine, because the sample size is too small
c) approximately Normal, with mean 8.1 and standard deviation 0.063
The law of large numbers states that as the number of observations drawn at random from a population with finite mean µ increases, the mean of the observed values: a) gets larger and larger b) gets smaller and smaller c) tends to get closer and closer to the population mean µ d) fluctuates steadily between 1 standard deviation above and 1 standard deviation below the mean
c) tends to get closer and closer to the population mean µ
All confidence intervals have the form: a) estimate + z* standard error. b) estimate ± standard error. c) estimate ± z* margin of error. d) estimate ± margin of error.
d) estimate ± margin of error.