Week 7

An estimator, T, is said to be an unbiased estimator of a parameter theta if...

E(T) = theta ^that is, the expected value of the estimator equals the true parameter

Probabilistic Interpretation

In repeated sampling from a normally distributed population with a known SD, 100(1-alpha) percent of all intervals of the form x-bar +/- z(sub 1-alpha/2) x standard error will, in the long run, include the population mean (mu)
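
The long-run claim above can be checked with a small simulation. This is only a sketch: the population values (mu = 50, sigma = 10), the sample size, and the number of repetitions are made-up assumptions, not values from the course.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(mu=50.0, sigma=10.0, n=25, alpha=0.05, trials=10_000, seed=1):
    """Share of simulated z intervals that contain the true mean mu."""
    rng = random.Random(seed)                 # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - alpha / 2)   # reliability coefficient z(1 - alpha/2)
    se = sigma / sqrt(n)                      # standard error with known sigma
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - z * se <= mu <= xbar + z * se:
            hits += 1
    return hits / trials

print(coverage())   # should land close to 0.95
```

Any single interval either contains mu or it does not; the 95% describes the procedure over many repeated samples.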

How do you know if the population variances are equal?

If unknown, use the rule of thumb: - if the larger sample variance is more than *twice* as large as the smaller sample variance, then treat the population variances as unequal - later we will learn the testing procedure for equal variances
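
A minimal sketch of the rule of thumb as a function. The function name is hypothetical; the example values are the two sample SDs (1.2 and 1.4) from the cadmium card in this set.

```python
def variances_look_unequal(s1_sq, s2_sq):
    """Rule of thumb: unequal if the larger sample variance is more than twice the smaller."""
    larger, smaller = max(s1_sq, s2_sq), min(s1_sq, s2_sq)
    return larger > 2 * smaller

# Sample variances are the squared sample SDs: 1.2**2 = 1.44 and 1.4**2 = 1.96
print(variances_look_unequal(1.2**2, 1.4**2))   # False, so the pooled (equal-variance) method is reasonable
```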

interval estimate

consists of a range of values (with a lower and upper bound) constructed to have a specific probability (or the confidence) of including the population parameter ^gives you some info about the precision of your estimate

overcome the non-normal distribution using ______

the Central Limit Theorem

Z Values

The z value corresponding to a confidence level of 0.95 (95%) is 1.96 (more precise than the rough value of 2). You can use any confidence level you wish; 0.90, 0.95, and 0.99 are the usual choices, and the corresponding z values (or reliability coefficients) are 1.645, 1.96, and 2.58
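
These reliability coefficients can be recovered from the standard normal distribution instead of a table. A short sketch using Python's standard library; the helper name `z_value` is hypothetical.

```python
from statistics import NormalDist

def z_value(confidence):
    """Two-sided reliability coefficient z(1 - alpha/2) for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_value(level), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576   (tables usually round this to 2.58)
```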

In a study of factors thought to be responsible for the adverse effects of smoking on human reproduction, cadmium level determinations were made on placenta tissue in two groups of mothers. Of the 18 non-smoking mothers, the mean was 14.5 with a standard deviation (SD) of 1.2. Of the 14 smoking mothers, the mean was 18.7 with a SD of 1.4. Construct a 95% confidence interval for the difference between the population means. Interpret the interval.

(-5.139, -3.261) We are 95% confident that the difference in population mean cadmium levels, calculated as the mean for non-smokers minus the mean for smokers, is between -5.139 and -3.261. Non-smokers had a lower mean cadmium level.
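
The arithmetic behind this interval can be sketched as follows, assuming the equal-variance (pooled) method with df = 18 + 14 - 2 = 30. The critical value 2.0423 is copied from a t table rather than computed, and the function name is hypothetical.

```python
from math import sqrt

def pooled_t_interval(n1, mean1, s1, n2, mean2, s2, t_crit):
    """CI for mu1 - mu2 using the pooled variance estimate."""
    # Pooled variance: weighted average of the two sample variances
    sp_sq = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = sqrt(sp_sq * (1 / n1 + 1 / n2))   # standard error of x-bar1 - x-bar2
    diff = mean1 - mean2                   # point estimate (non-smokers minus smokers)
    margin = t_crit * se
    return diff - margin, diff + margin

T_975_DF30 = 2.0423   # t table value for 95% confidence, df = 30
lower, upper = pooled_t_interval(18, 14.5, 1.2, 14, 18.7, 1.4, T_975_DF30)
print(round(lower, 3), round(upper, 3))   # -5.139 -3.261
```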

Estimating mu

- If we wish to estimate the mean of a normally distributed population, we can draw a random sample and calculate the mean of the sample - This is a good estimate but since a random sample involves chance, we do not expect the sample mean to equal the population mean - It may be more meaningful to communicate information about the probable magnitude of mu

Estimation

- Uses sample data to calculate a statistic - a statistic is an approximation of the parameter of the population from which the sample was drawn - inference from a sample ~point ~interval estimates

If 𝜎 is not known

- can we substitute s (the sample standard deviation) for 𝜎 and use standard normal distribution? - what if the sample size is not large?

Other unbiased estimates of their corresponding parameters:

- difference between two sample means - sample proportion - difference between two sample proportions

when using the t-distribution to determine a CI for the difference between two means: two situations are considered when computing s(sub x-bar1 - x-bar2)

1. Population variances equal 2. Population variances not equal

The health of the bear population in a national park is monitored by periodic measurements taken from anesthetized bears. A sample of 54 bears has a mean weight of 182.9 lbs. Assuming that σ is known to be 121.8 lbs, find a 99% CI for the mean bear weight of the population.

182.9 +/- 42.7632 (140.14, 225.66) We are 99% confident that the population mean bear weight is between 140.14 and 225.66 pounds. What aspect of this example is unrealistic? - in reality we may not always know the population mean or SD
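
A quick check of the bear-weight interval, using the rounded table value z = 2.58 from the Z Values card. The function name is hypothetical.

```python
from math import sqrt

def z_interval(xbar, sigma, n, z):
    """CI for mu when sigma is known: x-bar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin, margin

lower, upper, margin = z_interval(182.9, 121.8, 54, 2.58)
print(round(margin, 4), round(lower, 2), round(upper, 2))   # 42.7632 140.14 225.66
```

Using the unrounded coefficient (about 2.576) would give a slightly narrower interval.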

If we wanted to construct an interval that contains about 95% of all possible values of x-bar we could use the value of ______ of the sampling distribution below and above the mean

2 SD

A researcher wishes to estimate the population mean of some enzyme in a certain population. The population variance of the variable is known to be 45 and the variable is approximately normally distributed. A sample of 10 individuals yielded an average of 22. What is an approximate 95% confidence interval (CI) for 𝜇 based on this sample?

22 +/- 4.24 (17.76, 26.24)
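
The same answer, step by step. Two details are easy to miss: the card gives the population variance (45), not the SD, and "approximate" means the rough coefficient 2 is used instead of 1.96.

```python
from math import sqrt

variance, n, xbar = 45, 10, 22     # values from the card
se = sqrt(variance / n)            # standard error = sqrt(sigma^2 / n)
margin = 2 * se                    # about 2 standard errors for roughly 95%
lower, upper = xbar - margin, xbar + margin
print(round(margin, 2), round(lower, 2), round(upper, 2))   # 4.24 17.76 26.24
```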

about ___% of the values of a normal distribution lie within +/- 1 SD of the mean

68%

A study was conducted to estimate hospital costs for accident victims who wore seat belts. 30 randomly selected cases have a distribution that appears to be bell-shaped with a mean of $9,004 and a standard deviation of $5,629. Construct the 99% confidence interval for the mean of all such costs. Interpret the interval.

9004 +/- 2832.78 (6171.22, 11836.78) We are 99% confident that the population mean hospital cost is between $6,171.22 and $11,836.78.
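
Because sigma is unknown here, the interval uses s and the t distribution with df = 30 - 1 = 29. A sketch of the calculation; the critical value 2.7564 is taken from a t table rather than computed, and the function name is hypothetical.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """CI for mu when sigma is unknown: x-bar +/- t * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)   # standard error is based on the sample SD
    return xbar - margin, xbar + margin, margin

T_995_DF29 = 2.7564   # t table value for 99% confidence, df = 29
lower, upper, margin = t_interval(9004, 5629, 30, T_995_DF29)
print(round(margin, 2), round(lower, 2), round(upper, 2))
```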

About ___% of the values lie within +/-2 SD of the mean

95%

Statistical Inference

A procedure by which we use information from a sample, which is drawn from a population, to reach a conclusion about the population

Estimation Examples

- An investigator is interested in the proportion of patients with a certain disease who respond to a new treatment - The Health Department wants to know the mean age of new cases of Hepatitis C

Two general areas of Statistical Inference

1. Estimation 2. Hypothesis Testing

Population variances equal

Given the assumption that the two population variances are equal, the sample variances we compute are just two estimates of the same quantity -- the common variance. Therefore, we can use both estimates to obtain a pooled estimate of the common variance

CI for the difference between 2 population means

If samples are drawn from 2 *independent populations*, sometimes we may want to estimate the difference between these 2 means, 𝜇1−𝜇2. From Chapter 5, we know that 𝑋-bar1−𝑋-bar2 is an unbiased estimate of 𝜇1−𝜇2 and that 𝑋-bar1−𝑋-bar2 is approximately normally distributed, so we can use the theory of normal distributions to compute the CI for 𝜇1−𝜇2 ^difference in sample means is an unbiased estimate of the difference in the true population means

Unknown Population Variance

It is almost always the case that if you don't know your population mean, 𝜇 (which is why we would use this estimation procedure), then you also don't know your population variance. If we don't know our population variance, can we use the normal distribution?

Sampling Distributions and Estimation

Since we are interested in using sample mean as an estimator, we can use information we learned about the sampling distribution of the sample mean, x-bar

Random vs. Non-random Samples

The data are obtained by random sample - subjects are randomly selected from the population - subjects are randomly assigned to treatment groups - the validity of statistical procedures relies on this. In the real world this is not always possible, and you must take this into consideration when you draw conclusions - you can only generalize the results to the pool from which you selected your random sample

Confidence Intervals using t

The general interval estimate still applies: estimator +/- (reliability coefficient)x(standard error) *the reliability coefficient and Std Error change* ^reliability coefficient is now based off of the t distribution ^standard error is now based off of the sample SD

t Distribution

The t distribution, like the z distribution, has been extensively tabulated. When using this distribution you must take into account both the confidence level and the degrees of freedom

What if the population variances are not equal?

Then the previous formula is not appropriate. We cannot use the t distribution with df = n1 + n2 - 2; instead we must use another reliability coefficient, t-prime
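
The card does not say how t-prime is computed. One approximation often given in introductory biostatistics texts is Cochran's weighted average of two t values; treating it as the version meant here is an assumption, and some courses use the Welch-Satterthwaite degrees of freedom instead.

```latex
% Assumed form (Cochran's approximation), not confirmed by the card
t'_{1-\alpha/2} = \frac{w_1 t_1 + w_2 t_2}{w_1 + w_2},
\qquad w_i = \frac{s_i^2}{n_i},
\qquad t_i = t_{1-\alpha/2} \text{ with } n_i - 1 \text{ df}

(\bar{x}_1 - \bar{x}_2) \pm t'_{1-\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
```

Note that the standard error no longer pools the two sample variances.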

The expected value of T, E(T), is obtained by taking the average value of T computed for all possible samples of a certain size drawn from the population

Therefore E(T) = mu-subT
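
To make "all possible samples" concrete, here is a sketch that enumerates every sample of size 2 from a tiny made-up population (the four values are an illustration, not course data) and averages the sample means. Sampling with replacement and counting ordered samples is an assumption of this sketch.

```python
from itertools import product

population = [2, 4, 6, 8]   # hypothetical population
n = 2                       # sample size

# Every ordered sample of size n drawn with replacement (4**2 = 16 samples)
sample_means = [sum(sample) / n for sample in product(population, repeat=n)]

expected_value = sum(sample_means) / len(sample_means)   # E(x-bar)
mu = sum(population) / len(population)                    # population mean

print(expected_value, mu)   # 5.0 5.0, so x-bar is unbiased for mu
```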

Pooled Variance Estimate

This pooled estimate is obtained by computing the weighted average of the two sample variances
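
Written out, the weights are the degrees of freedom of each sample. The standard pooled formula, and the interval it feeds into, are:

```latex
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}

(\bar{x}_1 - \bar{x}_2) \pm t_{1-\alpha/2} \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}},
\qquad \text{df} = n_1 + n_2 - 2
```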

We don't usually know ______, but we do know ____ ....

mu; x-bar, which is an unbiased estimator of mu. We can construct an interval around the point estimate of mu, which is x-bar

When to use t

When: 1. Population variance is unknown, and 2. the sampling distribution of the statistic of interest (x-bar) is normally distributed

Why is estimation useful

Workers in the health sciences field are often interested in parameters, such as proportions or means, of different populations. It is usually not feasible (due to cost and/or time limits) to sample the entire population even if it is finite

Non-normal Populations

You cannot always assume the population is normally distributed. However, the Central Limit Theorem tells us that for a large sample, the sampling distribution of 𝑋-bar is approximately normally distributed regardless of the distribution of the individuals in the population.
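
A small simulation can illustrate this. The sketch below assumes a strongly skewed population (exponential with mean 1 and SD 1) and a sample size of 40; both choices are illustrative, not from the course. If x-bar is approximately normal, about 95% of sample means should fall within 2 standard errors of mu.

```python
import random
from math import sqrt

def share_within_2se(n=40, trials=10_000, seed=7):
    """Share of sample means within 2 standard errors of mu for a skewed population."""
    rng = random.Random(seed)     # fixed seed so the run is repeatable
    mu = sigma = 1.0              # exponential(rate=1) has mean 1 and SD 1
    se = sigma / sqrt(n)
    inside = 0
    for _ in range(trials):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs(xbar - mu) <= 2 * se:
            inside += 1
    return inside / trials

print(share_within_2se())   # should be close to 0.95 despite the skewed population
```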

to solve the unknown population variance problem, we may use the ________

t-distribution

Estimate

a single computed value

point estimate

a single numerical value used to estimate the corresponding population parameter ^derived from sample population ^ex: mean, median, variance, SD, proportion, averages

the expression X-bar = (sum(xi))/n is an example of

an estimator of the population mean mu

Comparisons of the t and z distributions

- The mean of the t distribution is zero (like the z distribution) - t distributions are symmetrical about the mean (like the z distribution) - t values range over (−∞, ∞) (like the z distribution) - In general, the t distribution has a variance greater than 1, but the variance approaches 1 as the sample size becomes large ^this means that t distributions are more variable than the z distribution ^x-bar and s both vary from sample to sample (two sources of variability) - The shape of the t distribution changes with sample size, so there is a different t distribution for every possible sample size - As the sample size increases, the t distribution becomes more and more like the standard normal distribution; in the limit of infinite sample size the two distributions (t and z) are identical - Compared to the normal distribution, the t distribution is less peaked in the center and has higher tails ^extreme values are more likely under the t distribution than under the z distribution
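
The variance claim can be made concrete: for more than 2 degrees of freedom, the variance of the t distribution is df / (df - 2). This formula is a standard result but is not stated on the card; the helper name is hypothetical.

```python
def t_variance(df):
    """Variance of the t distribution; only defined for df > 2."""
    if df <= 2:
        raise ValueError("variance of the t distribution requires df > 2")
    return df / (df - 2)

for df in (3, 10, 30, 1000):
    print(df, round(t_variance(df), 3))
# 3 3.0
# 10 1.25
# 30 1.071
# 1000 1.002   (approaching 1, the variance of the z distribution)
```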

Factors affecting the width of a CI

- n -- as the sample size increases, the width of the CI decreases ^as the sample size increases, the precision increases - s -- as the standard deviation (which reflects the variability of the distribution of individual observations) increases, the width of the CI increases - α -- as the desired confidence level increases (α decreases), the width of the CI increases ^alpha = tail areas
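
All three effects follow from width = 2 x (reliability coefficient) x s / sqrt(n). The sketch below uses the z coefficient for simplicity (an assumption; with the t distribution the direction of each effect is the same), and the input values are made up.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(s, n, confidence):
    """Full width of a z-based CI: 2 * z(1 - alpha/2) * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * s / sqrt(n)

base = ci_width(s=10, n=25, confidence=0.95)
print(ci_width(10, 100, 0.95) < base)   # True: larger n, narrower interval
print(ci_width(20, 25, 0.95) > base)    # True: larger s, wider interval
print(ci_width(10, 25, 0.99) > base)    # True: higher confidence, wider interval
```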

CI for the Difference of two means: the number of degrees of freedom used in determining the value of t to use in constructing the interval:

n1 + n2 - 2

Unbiasedness

one criterion for choosing the best estimator

types of Estimates

1. point estimate 2. interval estimate

another estimate of mu is the ________

sample median

Sampling Distribution of X-bar

the mean of the sampling distribution of x-bar is equal to mu In other words, x-bar is an unbiased estimator of mu

^how does the level of confidence relate to width?

The higher the level of confidence, the wider the interval: as the level of confidence increases, the reliability coefficient gets larger, which results in a wider confidence interval

Degrees of Freedom

The number of independent pieces of information that go into the estimate of a parameter. The number of *degrees of freedom (df)* for a statistic = (number of observations) - (number of components to be estimated in its calculation). The different t distributions are characterized by their degrees of freedom, n-1

Sampled Population

the population from which you draw your sample

Target Population

the population you wish to make an inference about; the population you wish to generalize your results to

precision of the estimate

the quantity obtained by multiplying the reliability coefficient by the standard error of the mean. aka: margin of error

using the t distribution

The requirement for strictly valid use of the t distribution is that the sample must be drawn from a normally distributed population. Moderate departures are fine as long as you can assume that the population has at least a mound-shaped distribution

t statistic

the result of using *s* instead of *𝜎* is a distribution with a standard deviation greater than 1, so this distribution is not a standard normal. The resulting distribution, however, is a common distribution, the t-distribution
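
For a single sample mean, the statistic described above is the usual one-sample t statistic:

```latex
t = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \qquad \text{df} = n - 1
```

It has the same form as z = (x-bar - mu) / (sigma / sqrt(n)), with s in place of sigma.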

estimator

the rule that tells us how to compute the estimate

Since the sampling distribution of x-bar is normally distributed....

we can use information we know about normal distributions

When to use z-distribution

1. x-bar is (approximately) normally distributed, and 2. the population variance 𝜎² is known

