Week 7
An estimator, T, is said to be an unbiased estimator of a parameter theta if...
E(T) = theta ^i.e., the expected value of the estimator equals the true parameter
Probabilistic Interpretation
In repeated sampling from a normally distributed population with a known SD, 100(1-alpha) percent of all intervals of the form x-bar +/- z(sub 1-alpha/2) x standard error will in the long run include the population mean (mu)
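The long-run claim above can be checked with a small simulation. This is only a sketch: the population values (mu = 50, sigma = 10), the sample size, and the function name `coverage` are made up for illustration and are not from the course material.

```python
import random
from statistics import NormalDist, mean

def coverage(mu=50.0, sigma=10.0, n=25, alpha=0.05, reps=2000, seed=1):
    """Fraction of z-intervals (sigma known) that contain mu."""
    rng = random.Random(seed)                  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - alpha / 2)    # reliability coefficient
    se = sigma / n ** 0.5                      # standard error of x-bar
    hits = 0
    for _ in range(reps):
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - z * se <= mu <= xbar + z * se:
            hits += 1
    return hits / reps

rate = coverage()
print(f"{rate:.3f} of the intervals contained mu")
```

With alpha = 0.05 the printed fraction should land close to 0.95, though any single run will differ slightly from that value.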
How do you know if the population variances are equal?
If unknown, use the rule of thumb: - if the larger sample variance is more than *twice* as large as the smaller sample variance, then treat the population variances as unequal - later we will learn the testing procedure for equal variances
interval estimate
consists of a range of values (with a lower and upper bound) constructed to have a specific probability (or the confidence) of including the population parameter ^gives you some info about the precision of your estimate
overcome the non-normal distribution using ______
the Central Limit Theorem
Z Values
The exact z value corresponding to a confidence level of 0.95 (95%) is 1.96. You can use any confidence level you wish; usually 0.90, 0.95, and 0.99 are used. The corresponding z values (or reliability coefficients) are 1.645, 1.96, and 2.58.
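These reliability coefficients can be recovered from the standard normal distribution rather than looked up in a table. A minimal sketch using Python's standard library; the function name `reliability_coefficient` is just an illustrative choice.

```python
from statistics import NormalDist

def reliability_coefficient(confidence):
    """z value that leaves alpha/2 in each tail of the standard normal."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(reliability_coefficient(level), 3))
```

To three decimals the 99% value is 2.576, which is rounded to 2.58 in the card above.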
In a study of factors thought to be responsible for the adverse effects of smoking on human reproduction, cadmium level determinations were made on placenta tissue in two groups of mothers. Of the 18 non-smoking mothers, the mean was 14.5 with a standard deviation (SD) of 1.2. Of the 14 smoking mothers, the mean was 18.7 with a SD of 1.4. Construct a 95% confidence interval for the difference between the population means. Interpret the interval.
(-5.139, -3.261) We are 95% confident that the difference in population mean cadmium levels, calculated as the mean for non-smokers minus the mean for smokers, is between -5.139 and -3.261. Because the whole interval is below zero, non-smokers had a lower mean cadmium level.
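The stated interval can be reproduced as follows. This sketch assumes the pooled (equal variances) approach, which is consistent with the rule of thumb since 1.4^2 / 1.2^2 is less than 2. The t value 2.0423 (df = 30, 95% confidence) is taken from a t table and hard-coded as an assumption.

```python
from math import sqrt

n1, mean1, sd1 = 18, 14.5, 1.2   # non-smoking mothers
n2, mean2, sd2 = 14, 18.7, 1.4   # smoking mothers
t_crit = 2.0423                  # assumption: table value for df = n1 + n2 - 2 = 30

# Pooled estimate of the common variance (weighted by degrees of freedom)
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)

# Standard error of the difference between the two sample means
std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))

diff = mean1 - mean2             # non-smokers minus smokers
lower = diff - t_crit * std_err
upper = diff + t_crit * std_err
print(f"({lower:.3f}, {upper:.3f})")
```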
Estimating mu
- If we wish to estimate the mean of a normally distributed population, we can draw a random sample and calculate the mean of the sample - This is a good estimate but since a random sample involves chance, we do not expect the sample mean to equal the population mean - It may be more meaningful to communicate information about the probable magnitude of mu
Estimation
- Uses sample data to calculate a statistic - a statistic is an approximation of the parameter of the population from which the sample was drawn - inference from a sample ~point ~interval estimates
If 𝜎 is not known
- can we substitute s (the sample standard deviation) for 𝜎 and use standard normal distribution? - what if the sample size is not large?
Other unbiased estimates of their corresponding parameters:
- difference between two sample means - sample proportion - difference between two sample proportions
when using the t-distribution to determine a CI b/w two means: two situations considered to compute s(sub x-bar1 - x-bar2)
1. Population variances equal 2. Population variances not equal
The health of the bear population in a national park is monitored by periodic measurements taken from anesthetized bears. A sample of 54 bears has a mean weight of 182.9 lbs. Assuming that σ is known to be 121.8 lbs, find a 99% CI for the mean bear weight of the population.
182.9 +/- 42.7632 (140.14, 225.66) We are 99% confident that the population mean bear weight is between 140.14 and 225.66 pounds. What aspect of this example is unrealistic? - it assumes σ is known; in reality, when we do not know the population mean we usually do not know the population SD either
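The arithmetic behind this answer, as a short sketch. It uses the rounded reliability coefficient 2.58 from the Z Values card; the helper name `z_interval` is hypothetical.

```python
from math import sqrt

def z_interval(xbar, sigma, n, z):
    """CI for a mean when the population SD (sigma) is known."""
    margin = z * sigma / sqrt(n)   # reliability coefficient x standard error
    return xbar - margin, xbar + margin

lower, upper = z_interval(xbar=182.9, sigma=121.8, n=54, z=2.58)
print(f"({lower:.2f}, {upper:.2f})")
```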
If we wanted to construct an interval that contains about 95% of all possible values of x-bar we could use the value of ______ of the sampling distribution below and above the mean
2 SD
A researcher wishes to estimate the population mean of some enzyme in a certain population. The population variance of the variable is known to be 45 and the variable is approximately normally distributed. A sample of 10 individuals yielded an average of 22. What is an approximate 95% confidence interval (CI) for 𝜇 based on this sample?
22 +/- 4.24 (17.76, 26.24)
about ___% of the values of a normal distribution lie within +/- 1 SD of the mean
68%
A study was conducted to estimate hospital costs for accident victims who wore seat belts. 30 randomly selected cases have a distribution that appears to be bell-shaped with a mean of $9,004 and a standard deviation of $5,629. Construct the 99% confidence interval for the mean of all such costs. Interpret the interval.
9004 +/- 2832.78 (6171.22, 11836.78) We are 99% confident that the population mean hospital cost is between $6,171.22 and $11,836.78.
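Because σ is unknown and n = 30, this answer uses the t distribution with df = 29. A sketch of the calculation, with the t value 2.7564 (99% confidence, df = 29) hard-coded from a t table as an assumption:

```python
from math import sqrt

n, xbar, s = 30, 9004.0, 5629.0
t_crit = 2.7564                 # assumption: table value for df = n - 1 = 29

std_err = s / sqrt(n)           # standard error based on the sample SD
margin = t_crit * std_err       # precision (margin of error)
lower, upper = xbar - margin, xbar + margin
print(f"{xbar:.0f} +/- {margin:.2f}  ({lower:.2f}, {upper:.2f})")
```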
About ___% of the values lie within +/-2 SD of the mean
95%
Statistical Inference
A procedure by which we use information from a sample, which is drawn from a population, to reach a conclusion about the population
Estimation Examples
- An investigator is interested in the proportion of patients with a certain disease who respond to a new treatment - The Health Department wants to know the mean age of new cases of Hepatitis C
Two general areas of Statistical Inference
1. Estimation 2. Hypothesis Testing
Population variances equal
Given the assumption that the two population variances are equal, the sample variances we compute are just two estimates of the same quantity -- the common variance. Therefore, we can use both estimates to obtain a pooled estimate of the common variance.
CI for the difference between 2 population means
If samples are drawn from 2 *independent populations*, sometimes we may want to estimate the difference between these 2 means, 𝜇1−𝜇2. From Chapter 5, we know that 𝑋-bar1−𝑋-bar2 is an unbiased estimate of 𝜇1−𝜇2 and that 𝑋-bar1−𝑋-bar2 is approximately normally distributed, so we can use the theory of normal distributions to compute the CI for 𝜇1−𝜇2 ^difference in sample means is an unbiased estimate of the difference in the true population means
Unknown Population Variance
It is almost always the case that if you don't know your population mean, 𝜇 (which is why you would use this estimation procedure), then you also don't know your population variance. If we don't know our population variance, can we use the normal distribution?
Sampling Distributions and Estimation
Since we are interested in using sample mean as an estimator, we can use information we learned about the sampling distribution of the sample mean, x-bar
Random vs. Non-random Samples
The data are obtained by random sampling: - subjects are randomly selected from the population - subjects are randomly assigned to treatment groups - the validity of statistical procedures relies on this. In the real world this is not always possible, and you must take this into consideration when you draw conclusions - you can only generalize the results to the pool from which you selected your random sample
Confidence Intervals using t
The general interval estimate still applies: estimator +/- (reliability coefficient)x(standard error) *the reliability coefficient and Std Error change* ^reliability coefficient is now based off of the t distribution ^standard error is now based off of the sample SD
t Distribution
The t distribution, like the z distribution, has been extensively tabulated. When using this distribution you must take into account both the confidence level and the degrees of freedom.
What if the population variances are not equal?
Then the previous formula is not appropriate. We cannot use the t distribution with df = n1 + n2 - 2; instead we must use another reliability coefficient, t-prime.
The expected value of T, E(T), is obtained by taking the average value of T computed for all possible samples of a certain size drawn from the population
Therefore E(T) = mu(sub T)
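For a tiny population, "all possible samples of a certain size" can be listed directly. The sketch below uses a made-up population of five values and samples of size 2 drawn without replacement; it shows that the average of x-bar over all possible samples equals mu.

```python
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8, 10]          # hypothetical population, for illustration
mu = mean(population)                  # true population mean

# x-bar for every possible sample of size 2 (without replacement)
sample_means = [mean(sample) for sample in combinations(population, 2)]

expected_xbar = mean(sample_means)     # E(x-bar) over all possible samples
print(len(sample_means), "samples; E(x-bar) =", expected_xbar, "; mu =", mu)
```

Individual sample means range from 3 to 9, yet their average matches mu, which is what unbiasedness means.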
Pooled Variance Estimate
This pooled estimate is obtained by computing the weighted average of the two sample variances, where each sample variance is weighted by its degrees of freedom
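Written out, the weighted average takes the following form. The symbol s_p^2 for the pooled variance is notation assumed here for illustration; the second expression is the resulting standard error of the difference between the two sample means.

```latex
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}
```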
We don't usually know ______, but we do know ____ ....
mu; x-bar, which is an unbiased estimator for mu. We can construct an interval around x-bar, which is a point estimate of mu.
When to use t
When: 1. Population variance is unknown, and 2. the sampling distribution of the statistic of interest (x-bar) is normally distributed
Why is estimation useful
Workers in the health sciences field are often interested in parameters, such as proportions or means, of different populations. It is usually not feasible (due to cost and/or time limits) to sample the entire population, even if it is finite.
Non-normal Populations
You cannot always assume the population is normally distributed. However, the Central Limit Theorem tells us that for a large sample, the sampling distribution of 𝑋-bar is approximately normally distributed regardless of the distribution of the individuals in the population.
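A quick simulation can illustrate this. The sketch below draws samples from an exponential population (strongly right-skewed, with mean 1 and SD 1); the choice of population, n = 50, and 2000 repetitions are assumptions made only for the example.

```python
import random
from statistics import mean, stdev

N = 50  # sample size used for every sample

def sample_means(n=N, reps=2000, seed=7):
    """Means of repeated samples from a skewed (exponential) population."""
    rng = random.Random(seed)
    return [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

means = sample_means()
center = mean(means)                   # should be near the population mean, 1
spread = stdev(means)                  # should be near 1 / sqrt(N)
std_err = 1 / N ** 0.5
# Share of sample means within 2 standard errors of the population mean
within_2se = sum(abs(m - 1.0) <= 2 * std_err for m in means) / len(means)
print(round(center, 3), round(spread, 3), round(within_2se, 3))
```

If the sampling distribution of x-bar is close to normal, roughly 95% of the sample means should fall within 2 standard errors of the population mean, even though the individual observations are far from normal.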
to solve the unknown population variance problem, we may use the ________
t-distribution
Estimate
a single computed value
point estimate
a single numerical value used to estimate the corresponding population parameter ^derived from sample data ^ex: mean, median, variance, SD, proportion
the expression X-bar = (sum(xi))/n is an example of
an estimator of the population mean mu
Comparisons of the t and z distributions
The mean of the t distribution equals zero (like the z distribution). The t distributions are all symmetrical about the mean (like the z distribution). t values range over (−∞,∞) (like the z distribution). In general, the t distribution has a variance greater than 1, but the variance approaches 1 as the sample size becomes large - this means that the t distributions are more variable than the z distribution - x-bar and s both vary from sample to sample (two sources of variability). The shape of the t distribution changes with sample size. (So this means that there is a t distribution for every possible sample size.) As the sample size increases, the t distribution becomes more and more like a standard normal distribution. In fact, when the sample size is infinite, the two distributions (t and z) are identical. Compared to the normal distribution, the t distribution is less peaked in the center and has higher tails ^you are more likely to find extreme values in the t than in the z distribution
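The statement about the variance can be made concrete. For df greater than 2, the variance of a t distribution is df / (df - 2), a standard result that is not stated in these notes. A small sketch with an illustrative function name:

```python
def t_variance(df):
    """Variance of a t distribution; finite only when df > 2."""
    if df <= 2:
        raise ValueError("variance is finite only for df > 2")
    return df / (df - 2)

for df in (3, 5, 10, 30, 100, 1000):
    print(df, round(t_variance(df), 4))
```

The values are always above 1 and shrink toward 1 as df grows, matching the card above.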
Factors affecting the width of a CI
n -- as the sample size increases, the width of the CI decreases ^as the sample size increases the precision increases. s -- as the standard deviation (which reflects the variability of the distribution of individual observations) increases, the width of the CI increases. α -- as the desired confidence level increases (α decreases), the width of the CI increases ^alpha = combined area of the two tails
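Each of the three effects can be seen by changing one input at a time. For simplicity this sketch uses the z reliability coefficient rather than t, which is an assumption made for the illustration; the numbers passed in are arbitrary.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(n, s, confidence):
    """Full width of the interval: 2 x reliability coefficient x standard error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * s / sqrt(n)

base = ci_width(n=25, s=10, confidence=0.95)
print("baseline:         ", round(base, 2))
print("larger n (100):   ", round(ci_width(100, 10, 0.95), 2))  # narrower
print("larger s (20):    ", round(ci_width(25, 20, 0.95), 2))   # wider
print("99% confidence:   ", round(ci_width(25, 10, 0.99), 2))   # wider
```

Quadrupling n halves the width, because the standard error depends on the square root of n.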
CI for the Difference of two means: the number of degrees of freedom used in determining the value of t to use in constructing the interval:
n1 + n2 - 2
Unbiasedness
one criterion for choosing the best estimator
types of Estimates
1. point estimate 2. interval estimate
another estimate of mu is the ________
sample median
Sampling Distribution of X-bar
the mean of the sampling distribution of x-bar is equal to mu In other words, x-bar is an unbiased estimator of mu
^How is the level of confidence related to the width?
The higher the level of confidence, the wider the interval: as the level of confidence increases, the reliability coefficient gets larger, which results in a wider confidence interval.
Degrees of Freedom
The number of independent pieces of information that go into the estimate of a parameter. The number of *degrees of freedom (df)* for a statistic = (number of observations) - (number of components that must be estimated in its calculation). The different t distributions are characterized by their degrees of freedom; for a CI for a single mean, df = n-1.
Sampled Population
the population from which you draw your sample
Target Population
the population you wish to make an inference about; the population you wish to generalize your results to
precision of the estimate
the quantity obtained by multiplying the reliability coefficient by the standard error of the mean. aka: margin of error
using the t distribution
The requirement for strictly valid use of the t distribution is that the sample be drawn from a normally distributed population. Moderate departures are fine as long as you can assume that the population has at least a mound-shaped distribution.
t statistic
the result of using *s* instead of *𝜎* is a distribution with a standard deviation greater than 1, so this distribution is not a standard normal. The resulting distribution, however, is a common distribution, the t-distribution
estimator
the rule that tells us how to compute the estimate
Since the sampling distribution of x-bar is normally distributed....
we can use information we know about normal distributions
When to use z-distribution
x-bar is (approximately) normally distributed, and the population variance (𝜎^2) is known