ISOM 2500 - Topic 3 (Part I)


Use of sample data to estimate a population parameter ---> 2) Interval Estimator

An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval. The width of the interval is affected by the sample size. (This is the estimator we mainly use.)

When the population standard deviation is unknown, we estimate the standard error of the mean by

(s / √n)

Rules of the central limit theorem

- If the population is normal, then x̄ is normally distributed for all values of n.
- If the population is non-normal, then x̄ is approximately normal only for larger values of n (n ≥ 30).
- If the population is EXTREMELY non-normal (e.g. bimodal and highly skewed distributions), the sampling distribution of x̄ will also be non-normal even for moderately large values of n.

Sample size and interval

- Increasing the sample size fourfold halves the width of the interval (the width is proportional to 1/√n).
- A larger sample size provides more potential information.

Drawbacks of the rejection region method

- Produces only a yes or no answer (reject or do not reject H0); it does not show how strong the statistical evidence is.

POWER of a test

- To see how well a test performs, report its power.
- It is the probability of (correctly) rejecting a false null hypothesis.
- Power of a test = 1 - β

Inference about a population proportion

...

Rejection region for one-tail and two-tail

...

What are the factors that identify the t-test and the estimator of μ

1. Problem objective: Describe a population
2. Data type: interval
3. Type of descriptive measurement: central location (mean)

What are the factors that identify the Chi-squared test and the estimator of σ²

1. Problem objective: Describe a population
2. Data type: interval
3. Type of descriptive measurement: variability (variance)

The conclusions drawn from a hypothesis test are?

1. Rejecting the null hypothesis
2. Not rejecting the null hypothesis
Rejecting the null hypothesis = strong conclusion
Not rejecting the null hypothesis = weak conclusion

Procedure for hypothesis testing

1. State the hypotheses: H0 and H1
2. Select the level of significance: α
3. Determine the test statistic
4. Set the decision rule: determine the rejection (critical) region(s)
5. Take a sample and compute the value of the test statistic
6. Draw a conclusion
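As a minimal sketch of steps 3 to 6, assuming the population standard deviation σ is known so that the z statistic applies, a one-sample test of H0: μ = μ0 could look like this. The function name `z_test_mean` and the example numbers (μ0 = 170, σ = 65, n = 400, x̄ = 178) are hypothetical.

```python
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-sample z-test of H0: mu = mu0 with sigma known.

    tail: "two" (H1: mu != mu0), "upper" (H1: mu > mu0) or "lower" (H1: mu < mu0).
    Returns the standardised test statistic and whether H0 is rejected.
    """
    std_normal = NormalDist()  # N(0, 1)
    # Steps 3 and 5: standardise the sample mean
    z = (xbar - mu0) / (sigma / n ** 0.5)
    # Step 4: decision rule (rejection region) for the chosen alpha
    if tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
        reject = z < -critical or z > critical
    elif tail == "upper":
        critical = std_normal.inv_cdf(1 - alpha)  # z_{alpha}
        reject = z > critical
    elif tail == "lower":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z < -critical
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    # Step 6: the conclusion is the boolean 'reject'
    return z, reject


z, reject = z_test_mean(xbar=178, mu0=170, sigma=65, n=400, alpha=0.05, tail="upper")
print(round(z, 4), reject)  # 2.4615 True
```

Keeping the tail as a parameter mirrors the fact that only step 4 (the rejection region) changes between one-tail and two-tail tests.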

Recall: what is a parameter?

A numerical measure of the population

Recall: what is a statistic?

A numerical measure of the sample

Use of sample data to estimate a population parameter ---> 1) Point Estimator

A point estimator draws inferences about a population by estimating the value of an unknown parameter using a single value or point

What is a hypothesis (in statistical sense)?

A statistical hypothesis is a claim or statement about the population parameter

Estimate

An estimate is the value obtained after the observations x1, x2...xn have been substituted into the formula

Estimator

An estimator of a population parameter is a statistic: a rule that uses the sample values X1, X2...Xn to estimate the parameter

When inferring about a population variance: the test statistic used to test hypotheses about σ² is the chi-squared statistic

χ² = [(n - 1)s²] / σ², which is chi-squared distributed with ν = n - 1 degrees of freedom when the population random variable is normally distributed with variance equal to σ²
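A small sketch of computing the chi-squared test statistic and comparing it with a table critical value. The sample values (n = 25, s² = 1.2, H0: σ² = 1) are hypothetical, and the critical value χ²_{0.05, 24} = 36.415 is read from a chi-squared table rather than computed.

```python
def chi_squared_statistic(s2, sigma2_0, n):
    """Test statistic (n - 1) s^2 / sigma0^2 for H0: sigma^2 = sigma0^2.

    It is chi-squared distributed with n - 1 degrees of freedom when the
    population is normal and H0 is true.
    """
    if n < 2 or s2 < 0 or sigma2_0 <= 0:
        raise ValueError("need n >= 2, s2 >= 0 and sigma2_0 > 0")
    return (n - 1) * s2 / sigma2_0


# Hypothetical upper-tail test: H0: sigma^2 = 1 vs H1: sigma^2 > 1, n = 25, s^2 = 1.2
stat = chi_squared_statistic(s2=1.2, sigma2_0=1.0, n=25)
critical = 36.415  # chi-squared table value for alpha = 0.05 and 24 degrees of freedom
print(round(stat, 2), stat > critical)  # 28.8 False -> do not reject H0
```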

The confidence interval estimator of μ when σ is unknown

x̄ ± t_{α/2, n-1} (s / √n)
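A minimal sketch of the t-interval, assuming the critical value t_{α/2, n-1} is taken from a t table (Python's standard library has no t quantile function). The data and the name `t_interval` are hypothetical; t_{0.025, 9} = 2.262 is the table value for 95% confidence with 9 degrees of freedom.

```python
from statistics import mean, stdev


def t_interval(data, t_crit):
    """x̄ ± t_crit · s/√n, where t_crit = t_{α/2, n-1} is read from a t table."""
    n = len(data)
    xbar = mean(data)
    se = stdev(data) / n ** 0.5  # estimated standard error s/√n (stdev divides by n - 1)
    half_width = t_crit * se
    return xbar - half_width, xbar + half_width


# Hypothetical sample of n = 10, 95% confidence: t_{0.025, 9} = 2.262
data = [12, 15, 11, 14, 13, 16, 12, 15, 14, 18]
lcl, ucl = t_interval(data, t_crit=2.262)
print(round(lcl, 2), round(ucl, 2))
```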

Rejection region for a two-tail test

z < -z_{α/2} or z > z_{α/2}. If the standardised test statistic does not fall in either part of the rejection region, we CANNOT reject the null hypothesis.

Population mean

μ = Σ x·P(x)

Difference between 2 sample means: Sampling distribution of x̄1 - x̄2

The mean of x̄1 - x̄2 is μ1 - μ2, and its variance is σ1²/n1 + σ2²/n2. The standard deviation is the square root of this variance: √(σ1²/n1 + σ2²/n2)
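A short sketch of the standard error of x̄1 - x̄2, assuming the two samples are independent (which the variance formula relies on). The σ and n values are hypothetical, chosen so the variance works out to 1 + 1 = 2.

```python
from math import sqrt


def diff_of_means_se(sigma1, n1, sigma2, n2):
    """Standard error of x̄1 - x̄2 for independent samples: √(σ1²/n1 + σ2²/n2)."""
    variance = sigma1 ** 2 / n1 + sigma2 ** 2 / n2
    return sqrt(variance)


# Hypothetical: σ1 = 5, n1 = 25 and σ2 = 6, n2 = 36, so the variance is 1 + 1 = 2
se = diff_of_means_se(5, 25, 6, 36)
print(se)  # √2, about 1.4142
```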

Population variance and standard deviation

σ² = Σ x²·P(x) - μ²   σ = √(Σ x²·P(x) - μ²)
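The two formulas above (μ = Σ x·P(x) and σ² = Σ x²·P(x) - μ²) can be checked on a small discrete distribution. This sketch uses exact fractions; the distribution (x = 0, 1, 2 with probabilities 1/4, 1/2, 1/4) and the function name are hypothetical.

```python
from fractions import Fraction


def population_mean_variance(values, probs):
    """μ = Σ x·P(x) and σ² = Σ x²·P(x) - μ² for a discrete probability distribution."""
    if sum(probs) != 1:
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in zip(values, probs))
    sigma2 = sum(x ** 2 * p for x, p in zip(values, probs)) - mu ** 2
    return mu, sigma2


values = [0, 1, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
mu, sigma2 = population_mean_variance(values, probs)
sigma = float(sigma2) ** 0.5  # σ is the square root of the variance
print(mu, sigma2)  # 1 1/2
```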

If the population is finite, the standard error of the sampling distribution of x̄ is

σx̄ = (σ / √n) × √((N - n) / (N - 1)), where N is the population size and √((N - n) / (N - 1)) is called the finite population correction factor. Any population that is at least 20 times larger than the sample size is treated as large, and the correction factor can be ignored.
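A minimal sketch of the standard error with the finite population correction, applying the "at least 20 times larger" rule from the card to decide when to skip it. The function name and example numbers are hypothetical.

```python
from math import sqrt


def standard_error_of_mean(sigma, n, N=None):
    """σ/√n, multiplied by the finite population correction factor when needed.

    The correction is skipped when N is None (infinite population) or when the
    population is at least 20 times the sample size (treated as large).
    """
    se = sigma / sqrt(n)
    if N is not None and N < 20 * n:
        se *= sqrt((N - n) / (N - 1))  # finite population correction factor
    return se


print(standard_error_of_mean(10, 100))          # 1.0
print(standard_error_of_mean(10, 100, N=1000))  # corrected, about 0.949
print(standard_error_of_mean(10, 100, N=5000))  # 1.0, because N >= 20n
```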

What if the population is non-normal?

Even if the population is non-normal, the results of the t-test and the confidence interval estimator are still valid, provided that the population is not extremely non-normal.

Sample size to estimate a mean

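n = (z_{α/2} σ / B)², where B is the bound on the error of estimation (the interval is x̄ ± B). Always round n up to the next whole number.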
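The usual formula for the sample size needed to estimate μ to within a bound B with confidence 1 - α, assuming σ is known or can be approximated, is n = (z_{α/2} σ / B)², rounded up. A small sketch, where the function name and the values σ = 20, B = 5 are hypothetical:

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, bound, confidence=0.95):
    """Smallest n with z_{α/2}·σ/√n <= bound, i.e. n = (z_{α/2}·σ / B)² rounded up."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{α/2}
    return ceil((z * sigma / bound) ** 2)  # round up so the bound is not exceeded


# Hypothetical: σ = 20, estimate μ to within B = 5
print(sample_size_for_mean(20, 5))                   # 62
print(sample_size_for_mean(20, 5, confidence=0.99))  # 107
```

Raising the confidence level increases z_{α/2}, so a larger sample is needed for the same bound.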

Test statistic for μ when σ is unknown

If the population SD and mean are unknown and the population is normal, the test statistic for hypothesis testing about μ is: t = (x̄ - μ) / (s / √n), which is Student t distributed with ν = n - 1 degrees of freedom
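A minimal sketch of the t statistic for an upper-tail test of H0: μ = 13. The data are hypothetical, and the critical value t_{0.05, 9} = 1.833 is read from a t table.

```python
from statistics import mean, stdev


def t_statistic(data, mu0):
    """t = (x̄ - μ0) / (s / √n); Student t with n - 1 df if the population is normal."""
    n = len(data)
    return (mean(data) - mu0) / (stdev(data) / n ** 0.5)


# Hypothetical sample; H0: μ = 13 vs H1: μ > 13 at α = 0.05
data = [12, 15, 11, 14, 13, 16, 12, 15, 14, 18]
t = t_statistic(data, mu0=13)
t_crit = 1.833  # t_{0.05, 9} from a t table
print(round(t, 3), t > t_crit)  # 1.5 False -> do not reject H0
```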

Relative Efficiency

If there are 2 unbiased estimators of a parameter, the one whose variance is smaller is said to be relatively more efficient

Conclusions of a hypothesis test

If we reject the null hypothesis, we conclude that there is enough statistical evidence to infer that the alternative hypothesis is true. If we do not reject the null hypothesis, we conclude that there is not enough statistical evidence to infer that the alternative hypothesis is true.

What are the probabilities for the Type I and Type II errors?

P(Type I error) = α
P(Type II error) = β
* The probability of a Type I error (α) is set to be small.

Chi-squared statistic and probability statement

P(χ²_{1-α/2} < χ² < χ²_{α/2}) = 1 - α. Substituting χ² = [(n - 1)s²] / σ² from the previous card and solving for σ² gives the confidence interval estimator of σ².

Sample variance and SD

s² = Σ(x - x̄)² / (n - 1)   s = √s²
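A short sketch computing s² directly from the formula (dividing by n - 1) and checking it against Python's `statistics.variance`, which also uses the n - 1 divisor. The data set is hypothetical.

```python
from math import sqrt
from statistics import variance


def sample_variance(data):
    """s² = Σ(x - x̄)² / (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample: x̄ = 5, Σ(x - x̄)² = 32
s2 = sample_variance(data)
s = sqrt(s2)  # sample standard deviation
print(s2, variance(data))  # both equal 32/7
```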

What is the DECISION rule?

Specifies the set of values of the test statistic for which H0 is rejected in favour of H1 and the set of values for which H0 is not rejected

The null and alternative hypothesis take one of the following forms

The following is called a two-tail test:
H0: μ = μ0 (simple hypothesis)   H1: μ ≠ μ0
The following are called one-tail tests:
H0: μ ≥ μ0   H1: μ < μ0 (lower-tail)
H0: μ ≤ μ0   H1: μ > μ0 (upper-tail)
All hypotheses other than the simple one are called composite hypotheses.

The sample mean X-bar is the "chosen" estimator for the population mean μ

The computed value of a point estimator is known as a point estimate. An interval estimate describes a range of values within which a population parameter is likely to lie.

The sampling distribution of the sample mean

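The distribution of x̄ is very different from that of X: it is centred at μ but less spread out (σx̄ = σ / √n). The sampling distribution of x̄ is normal if the population is normal, and approximately normal for large n by the central limit theorem.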

Sample mean compared to sample median

The interval estimate based on the sample mean is NARROWER than the one based on the sample median. This is preferred, because a narrower interval provides more precise information (the sample mean is relatively more efficient).
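A small simulation sketch of why the mean gives the narrower interval, assuming normally distributed data (an illustrative setup, not from the notes): across many samples, the sample median varies more than the sample mean, so it is the less efficient estimator of the centre. Sample size, repetitions and seed are arbitrary choices.

```python
import random
from statistics import fmean, median, pvariance


def mean_and_median_variances(n=25, reps=2000, seed=1):
    """Simulated variances of x̄ and of the sample median for N(0, 1) samples of size n."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(fmean(sample))
        medians.append(median(sample))
    return pvariance(means), pvariance(medians)


var_mean, var_median = mean_and_median_variances()
# The ratio should be noticeably above 1 (for large n it approaches π/2, about 1.57)
print(round(var_median / var_mean, 2))
```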

Note: You cannot make a probability statement about a parameter, as it is not a random variable


Calculating the probability of a Type II error

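Express the rejection region in terms of x̄. For example, for an upper-tail test, reject H0 if x̄ > x̄_L, where x̄_L = μ0 + z_α (σ / √n). Then β = P(x̄ < x̄_L given that μ = μ1), found by standardising x̄_L with the specified alternative value μ1.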

Effects on β of changing α

For a fixed sample size, decreasing α increases β, and increasing α decreases β.
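A minimal sketch of calculating β for an upper-tail z-test: express the rejection region in terms of x̄ (the cutoff x̄_L), then find the probability that x̄ falls below x̄_L when the true mean is an alternative value μ1. It also shows that lowering α raises β, and that a larger n lowers it. The function name and all numbers (μ0 = 170, μ1 = 180, σ = 65, n = 400) are hypothetical.

```python
from statistics import NormalDist


def type_ii_error(mu0, mu1, sigma, n, alpha):
    """β for an upper-tail z-test of H0: μ = μ0 vs H1: μ > μ0 when the true mean is μ1."""
    se = sigma / n ** 0.5
    # Rejection region in terms of x̄: reject H0 if x̄ > xbar_l
    xbar_l = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # β = P(x̄ <= xbar_l) when x̄ ~ N(μ1, se²), i.e. we fail to reject a false H0
    return NormalDist(mu1, se).cdf(xbar_l)


# Hypothetical numbers: μ0 = 170, true μ1 = 180, σ = 65, n = 400
beta_05 = type_ii_error(170, 180, 65, 400, alpha=0.05)
beta_01 = type_ii_error(170, 180, 65, 400, alpha=0.01)
print(round(beta_05, 4), round(1 - beta_05, 4))  # β and power = 1 - β
print(beta_01 > beta_05)  # True: lowering α raises β
```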

Sample mean x-bar

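x̄ = Σx / n, the sum of the sample observations divided by the sample size. It is the point estimator of μ.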

When does a Type I error occur?

When we REJECT a TRUE null hypothesis

Two general procedures for making inferences about populations

1. Estimation 2. Hypothesis testing

List the two types of hypotheses

1. Null Hypothesis (H0) 2. Alternative (research) hypothesis (H1)

Two possible errors made in any hypotheses testing

1. Type I error 2. Type II error

Desirable properties of estimators (sample statistic)

1. Unbiasedness 2. Consistency 3. Relative Efficiency

Consistency

An unbiased estimator is said to be consistent if the difference between the estimator and the parameter grows smaller as the sample size grows larger. The measure used to gauge closeness is the variance (or standard deviation).
x̄ is a consistent estimator of μ, because the variance of x̄ is σ² / n:
- As n grows larger, the variance of x̄ becomes smaller.
- This causes an increasing proportion of the sample means to fall close to μ.
*Draw diagrams to show what happens to the sampling distribution of a consistent estimator when the sample size increases

Unbiasedness (unbiased estimator)

An unbiased estimator of a population parameter is an estimator whose expected value is equal to that parameter. Examples:
1. The sample mean x̄ is an unbiased estimator of the population mean μ: E(x̄) = μ
2. The difference between 2 sample means is an unbiased estimator of the difference between 2 population means, because E(x̄1 - x̄2) = μ1 - μ2

As n becomes larger, the shape of the sampling distribution becomes...

BELL SHAPED. As n gets larger, the sampling distribution of x̄ becomes increasingly bell shaped; this phenomenon is summarised in the Central Limit Theorem.

Test statistic (standardising test statistic)

Distribution of X is normal:
1. σ known: Z = (x̄ - μ) / (σ / √n) ~ N(0, 1)
2. σ unknown: T = (x̄ - μ) / (s / √n) ~ t(n - 1)
Distribution of X is NON-normal (not highly skewed), large n:
1. σ known: Z = (x̄ - μ) / (σ / √n), approximately N(0, 1)
2. σ unknown: T = (x̄ - μ) / (s / √n), approximately t(n - 1)
Distribution of X is NON-normal (highly skewed): use alternative methods (nonparametric techniques).
Algebraically, the rejection region for a one-tail test is z > z_α (upper tail) or z < -z_α (lower tail).

Information and the Width of the Interval

Doubling the population standard deviation doubles the width of the confidence interval estimate. Explanation: a great deal of variation in the random variable makes it more difficult to estimate the population mean accurately, which produces a wider interval.
DECREASING the confidence level (e.g. from 99% to 95% to 90%) narrows the interval, and INCREASING it widens the interval.
A wider interval gives more confidence that it contains μ, but it provides less precise information; for a given confidence level, a narrower interval is preferred.

Larger sample sizes

Increasing the sample size makes the sampling distribution of x̄ narrower (because the standard error σ / √n decreases as n increases). A narrower distribution represents more information, and MORE information means a smaller probability of a Type II error. Therefore, for a fixed α, increasing the sample size decreases the probability of a Type II error (larger sample sizes = better decisions in the long run).

Sampling error

Is the absolute difference between a parameter and the statistic used to estimate it (e.g. |x̄ - μ|)

Sampling distribution

Is the probability distribution of a sample statistic. It can be derived using the rules of probability and the laws of expected value and variance.

P-value

Is the probability of observing a test statistic at least as extreme as the one computed, given that the null hypothesis is true. It represents the smallest level of significance at which H0 can be rejected, and it measures the amount of statistical evidence that supports the alternative hypothesis. Example (upper-tail test with sample mean 178): p-value = P(x̄ > 178), found with the standardising formula and the Z-table.
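A short sketch of the upper-tail p-value P(x̄ > 178) from the card, computed under H0 by standardising. The values μ0 = 170, σ = 65 and n = 400 are hypothetical (the card gives only the sample mean), as is the function name.

```python
from statistics import NormalDist


def upper_tail_p_value(xbar, mu0, sigma, n):
    """P(X̄ > xbar) assuming H0: μ = mu0 is true, via the standardised z value."""
    z = (xbar - mu0) / (sigma / n ** 0.5)
    return 1 - NormalDist().cdf(z)


# Hypothetical: H0: μ = 170, σ = 65, n = 400, observed x̄ = 178
p = upper_tail_p_value(178, 170, 65, 400)
print(round(p, 4))
```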

The critical value (rejection point)

Is the value of the test statistic that separates the rejection region from the non-rejection region

What is a TEST STATISTIC?

It is a random variable whose value is used to determine whether we reject the null hypothesis

Confidence interval estimator of σ²

LCL = [(n - 1)s²] / χ²_{α/2}   UCL = [(n - 1)s²] / χ²_{1-α/2}
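A minimal sketch of the interval for σ², assuming a normal population and chi-squared critical values read from a table: for 24 degrees of freedom and 95% confidence, χ²_{0.025} = 39.364 and χ²_{0.975} = 12.401. The sample values and function name are hypothetical.

```python
def variance_interval(s2, n, chi2_upper, chi2_lower):
    """Confidence interval for σ² (population assumed normal).

    chi2_upper is χ²_{α/2} and chi2_lower is χ²_{1-α/2}, both with n - 1 degrees
    of freedom, read from a chi-squared table.
    """
    numerator = (n - 1) * s2
    lcl = numerator / chi2_upper  # dividing by the larger critical value gives the lower limit
    ucl = numerator / chi2_lower
    return lcl, ucl


# Hypothetical: n = 25, s² = 1.2, 95% confidence, 24 degrees of freedom
lcl, ucl = variance_interval(1.2, 25, chi2_upper=39.364, chi2_lower=12.401)
print(round(lcl, 3), round(ucl, 3))
```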

Mean, variance and standard deviation of the sampling distribution

The mean of the sampling distribution of x̄ is equal to the mean of the population: μx̄ = μ
The variance of the sampling distribution of x̄ is the variance of the population divided by the sample size (n): σ²x̄ = σ² / n
The standard deviation of the sampling distribution of x̄ is called the standard error of the mean: σx̄ = σ / √n
As the sample size n increases, the probability that the sample mean will be close to the population mean also increases.
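The results μx̄ = μ and σ²x̄ = σ² / n can be verified exactly by listing every possible sample from a tiny population. This sketch assumes sampling with replacement (so draws are independent) and uses a hypothetical population {1, 2, 3, 4} with n = 2.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]  # hypothetical population
n = 2
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / N

# All N**n equally likely ordered samples (sampling with replacement)
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
mean_of_xbar = sum(sample_means) / len(sample_means)
var_of_xbar = sum((m - mean_of_xbar) ** 2 for m in sample_means) / len(sample_means)

print(mu, sigma2, mean_of_xbar, var_of_xbar)  # 5/2 5/4 5/2 5/8
```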

Process of assigning hypotheses

The null hypothesis is given the benefit of the doubt (we assume that it is true); it represents the status quo. The alternative hypothesis is an assertion that requires evidence.
List the null and alternative hypotheses for the following:
1. A person in court on trial for a crime
2. Researchers claim their product can increase its market share
3. A meat company claims it sells 5-lb packages; a bureau investigates whether it is short-changing customers

Estimating the population mean when the population standard deviation is known

The probability that μ is between x̄ - z_{α/2} (σ / √n) and x̄ + z_{α/2} (σ / √n) is 1 - α. The probability 1 - α is called the confidence level, and the interval is a random interval (it changes from sample to sample).
The confidence interval estimator is: x̄ ± z_{α/2} (σ / √n)
The minus sign defines the lower confidence limit (LCL) and the plus sign defines the upper confidence limit (UCL).
With 100(1 - α)% confidence, the parameter μ lies between x̄ - z_{α/2} (σ / √n) and x̄ + z_{α/2} (σ / √n).
The confidence level is the probability (before sampling) that the random interval includes the actual value of μ; we generally set 1 - α close to 1 (usually between 0.90 and 0.99).
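A minimal sketch of the z-interval x̄ ± z_{α/2} (σ / √n), using `statistics.NormalDist` for z_{α/2}. The inputs (x̄ = 100, σ = 15, n = 25) and the function name are hypothetical.

```python
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """x̄ ± z_{α/2}·σ/√n with σ known; returns (LCL, UCL)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{α/2}
    half_width = z * sigma / n ** 0.5
    return xbar - half_width, xbar + half_width


lcl, ucl = z_interval(xbar=100, sigma=15, n=25)  # hypothetical inputs
print(round(lcl, 2), round(ucl, 2))
```

Because the half-width is proportional to 1/√n, calling `z_interval(100, 15, 100)` (four times the sample size) gives an interval half as wide.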

Rejection region

The rejection region of a test (critical region) consists of all values of the test statistic for which H0 is rejected

Central Limit Theorem

The sampling distribution of the mean of a random sample drawn from any population is approximately normal for a sufficiently large sample size. The larger the sample size, the more closely the sampling distribution of x̄ will resemble a normal distribution (rule of thumb: n ≥ 30).
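A small simulation sketch of the central limit theorem, assuming an Exponential(1) population (a swapped-in example of a skewed population with μ = σ = 1). Theory says the skewness of x̄ is 2/√n, so the simulated sampling distribution should become more symmetric and bell shaped as n grows. Seed and repetition counts are arbitrary.

```python
import random
from statistics import fmean, pstdev


def skewness(values):
    """Population skewness: average cubed standardised deviation (0 for a symmetric shape)."""
    m = fmean(values)
    s = pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)


def simulate_sample_means(n, reps=2000, seed=7):
    """Means of `reps` random samples of size n from an Exponential(1) population."""
    rng = random.Random(seed)
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


for n in (1, 5, 30):
    means = simulate_sample_means(n)
    # Theory: the mean of x̄ is 1 and its skewness is 2/√n, which shrinks towards 0
    print(n, round(fmean(means), 3), round(skewness(means), 2))
```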

Standard deviation as the standard error

The standard deviation of the sampling distribution of x̄1 - x̄2 is called the standard error of the difference between two means

The value of the sample mean...

The value of the sample mean varies randomly from sample to sample, so we can regard x̄ as a new random variable. x̄ (x-bar) denotes the sample mean; the mean of its sampling distribution is μx̄ = μ.

Note: by using statistical inference we can never PROVE anything.

Instead, we conclude: "There is enough statistical evidence to infer that..."

The error probabilities α and β are _______ related. Fill in the blank.

They are INVERSELY related: decreasing one increases the other, and vice versa. Reducing α increases β. Thus, for a fixed α, to make β smaller we must increase the sample size n.

When do we conduct a one- and two-tail test?

Two-tail: conducted when the alternative hypothesis specifies that the mean is not equal to the value stated in the null hypothesis.
H0: μ = μ0 (simple hypothesis)   H1: μ ≠ μ0
One-tail: conducted when the focus is on the left or right tail of the sampling distribution, to determine whether there is enough evidence to infer that the mean is less than or greater than the value specified by the null hypothesis.
H0: μ ≥ μ0   H1: μ < μ0
H0: μ ≤ μ0   H1: μ > μ0

A test is statistically significant when...

The null hypothesis (H0) is rejected at the chosen significance level α

When does a Type II error occur?

When we DON'T REJECT a FALSE null hypothesis

Z and the sampling distribution of X-bar (standardising the sample mean)

Z = (x̄ - μ) / (σ / √n)

Describing the P-value

p < 0.01: highly significant. Overwhelming evidence to infer that the alternative hypothesis is true.
0.01 < p < 0.05: significant. Strong evidence to infer that the alternative hypothesis is true.
0.05 < p < 0.10: not statistically significant. Weak evidence to indicate that the alternative hypothesis is true.
p > 0.10: little to no evidence to infer that the alternative hypothesis is true.
*Draw a figure summarising these terms (p.365)
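The descriptions above can be encoded as a simple lookup. The card leaves the exact boundary values (0.01, 0.05, 0.10) unassigned; this sketch assumes each boundary belongs to the less significant category. The function name is hypothetical.

```python
def describe_p_value(p):
    """Verbal description of a p-value; boundary values fall in the weaker category (assumption)."""
    if not 0 <= p <= 1:
        raise ValueError("p must be in [0, 1]")
    if p < 0.01:
        return "highly significant: overwhelming evidence for H1"
    if p < 0.05:
        return "significant: strong evidence for H1"
    if p < 0.10:
        return "not statistically significant: weak evidence for H1"
    return "little to no evidence for H1"


print(describe_p_value(0.003))
print(describe_p_value(0.07))
```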

P value for a two-tail test

p-value = P(Z < -|z|) + P(Z > |z|) = 2P(Z > |z|), where z is the actual (standardised) value of the test statistic and |z| is its absolute value
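A minimal sketch of the two-tail p-value 2P(Z > |z|) using `statistics.NormalDist`; the function name is hypothetical. Using the absolute value means the result is the same whether the test statistic falls in the left or the right tail.

```python
from statistics import NormalDist


def two_tail_p_value(z):
    """2·P(Z > |z|) for a standardised test statistic z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


print(round(two_tail_p_value(1.96), 3))  # 0.05
print(two_tail_p_value(-1.96) == two_tail_p_value(1.96))  # True: only |z| matters
```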

Rejection region with Student t-distribution

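t > t_{α,ν} for an upper-tail test (ν = n - 1 degrees of freedom). For a lower-tail test the rejection region is t < -t_{α,ν}, and for a two-tail test it is t < -t_{α/2,ν} or t > t_{α/2,ν}.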

