STA215 Test 2

Suppose a student measuring the boiling temperature of a certain liquid observes the readings (in degrees Celsius) 102.5, 101.7, 103.1, 100.9, 100.5, and 102.2 on 6 different samples of the liquid. He calculates the sample mean to be 101.8167. If he knows that the distribution of boiling points is normal, with standard deviation 1.2 degrees, what is the 95% confidence interval for the population mean?

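(100.8565, 102.7769), from x̄ ± 1.96 sigma/rt(n) = 101.8167 ± 1.96(1.2/rt(6)) = 101.8167 ± 0.9602. Because sigma is known, the interval uses sigma = 1.2, not the sample standard deviation.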
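A minimal sketch of the known-sigma interval x̄ ± z(a/2) sigma/rt(n), applied to the six readings in the question. The function name `z_interval` is only illustrative, and the critical value comes from Python's `statistics.NormalDist` rather than a printed z-table, so the last decimal can differ slightly from a hand calculation with 1.96.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided z confidence interval for mu when sigma is known."""
    # Critical value z(a/2): the point with area a/2 to its right under N(0,1).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

readings = [102.5, 101.7, 103.1, 100.9, 100.5, 102.2]
low, high = z_interval(mean(readings), sigma=1.2, n=len(readings))
print(f"95% CI: ({low:.4f}, {high:.4f})")
```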

How is a sample of x's from a normal distribution written?

X ~ N(mu, sigma). Their z-scores are a sample of z's from the standard normal distribution: Z ~ N(0,1).

When determining the confidence interval for mu, when sigma is known, what criteria must be met?

- Original population is normal
- Small or large sample size

The amount of impurity in a batch of a chemical product is a random variable with mean value 4.0 g and standard deviation 1.5 g. (unknown distribution) If 50 batches are independently prepared, what is the (approximate) probability that the average amount of impurity in these 50 batches is between 3.5 and 3.8 g?

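Approximately 0.164. By the CLT, the sample mean is approximately normal with mean 4.0 and standard deviation 1.5/rt(50) = 0.2121, so P(3.5 < X̄ < 3.8) = P(-2.36 < Z < -0.94) = 0.1736 - 0.0091 = 0.1645.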
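The same CLT approximation can be checked without a z-table. This is a sketch using `statistics.NormalDist` from the Python standard library; because it does not round the z-scores to two decimals, the result can differ from a table-based answer in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 4.0, 1.5, 50

# By the CLT, the mean of 50 batches is approximately N(mu, sigma / sqrt(n)).
xbar_dist = NormalDist(mu, sigma / sqrt(n))

# P(3.5 < X-bar < 3.8) is the area under that curve between 3.5 and 3.8.
prob = xbar_dist.cdf(3.8) - xbar_dist.cdf(3.5)
print(round(prob, 4))
```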

Discuss a 100(1-a)% confidence interval.

A 100(1-a)% CI for mu is x̄ ± z(a/2) sigma/rt(n).
This confidence interval formula is valid if:
• We have a random sample from a normal distribution
• The variance is known
• All observations are independent
Terminology:
• 1-a is called the confidence coefficient
• 100(1-a)% is called the confidence level

What is a random variable?

A random variable is a variable that is the outcome of a chance experiment. Think of a random variable as a description of the characteristic you will observe in your experiment. A random variable is denoted using a capital letter, X, Y, etc. Most experiments involve more than one random variable.
A random variable is not a number, but it becomes a number after it has been observed. It may be:
• Ratio
• Interval
• A code for a category, e.g. Yes = 1 and No = 0
Observations of random variable X are denoted x. Observations of random variable Y are denoted y. Etc.
Examples of random variables:
• Let X be "height of adult males"
• Let Y be "sex of adult", coded 1 = Female and 0 = Male
• Let V be "# heads when 6 different coins are tossed"
Observations:
• x = 168, x = 173
• y = 1, y = 1, y = 0
• v = 2, v = 6, v = 0

The body length of the female great white shark has sigma = 0.56 m. It's believed that the mean body length of the female great white shark is mu = 4 m (the males are substantially shorter).
A. Prof. Whale believes the mean body length of female great white sharks is longer than 4 m. He samples 50 and calculates x̄ = 4.17 m.
B. Prof. Salmon believes the mean body length of female great white sharks is shorter than 4 m. She samples 75 and calculates x̄ = 3.98 m.
C. Prof. Guppy believes the mean body length of female great white sharks is not 4 m. (She's not prepared to argue that it's longer or shorter than 4 m.) She samples 60 and calculates x̄ = 4.08 m.

[The answer was given as three pictures, one each for A, B and C; they are not reproduced here.]
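As a rough stand-in for the three cases, here is a sketch that computes the observed z statistic and p-value for each professor's sample, using the one-sided and two-sided p-value rules for a known-sigma z-test. The function `z_test` and its `alternative` labels are illustrative names, not part of the course material, and the original answer may have shown something different.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal, N(0, 1)

def z_test(xbar, mu0, sigma, n, alternative):
    """Return (z*, p-value) for a z-test of Ho: mu = mu0 with sigma known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    if alternative == "greater":      # Ha: mu > mu0, p = P(Z > z*)
        p = 1 - Z.cdf(z)
    elif alternative == "less":       # Ha: mu < mu0, p = P(Z < z*)
        p = Z.cdf(z)
    else:                             # Ha: mu != mu0, p = 2 P(Z > |z*|)
        p = 2 * (1 - Z.cdf(abs(z)))
    return z, p

whale = z_test(4.17, 4, 0.56, 50, "greater")
salmon = z_test(3.98, 4, 0.56, 75, "less")
guppy = z_test(4.08, 4, 0.56, 60, "two-sided")

for name, (z, p) in [("A", whale), ("B", salmon), ("C", guppy)]:
    print(f"{name}: z* = {z:.4f}, p = {p:.4f}")
```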

Describe the process of calculating a test statistic.

Determine the form of an appropriate test statistic by finding a single random variable that:
1. Is formed using both the sample data and the null hypothesis
2. Has a completely known distribution
Then calculate the observed value of the test statistic. There are different test statistics for different situations.

What is a sample mean?

Different samples, all of the same size, have different sample means. → A sample mean is a random variable.

Grades on a standardized test have population mean mu = 75 and population standard deviation sigma = 15. Ella earns 87 on the test. What is her z-score? Tony earns 54 on the test. What is his z-score? The calculus test was difficult. The mean mark was 56, with standard deviation 15. Ella scored 59. The biology test was easy. The mean mark was 81, with standard deviation 12. Ella scored 83. Which test did Ella do better on?

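Ella's z-score is (87 - 75)/15 = 0.8, so her mark is 0.8 standard deviations above the mean.
Tony's z-score is (54 - 75)/15 = -1.4, so his mark is 1.4 standard deviations below the mean.
Calculus: z = (59 - 56)/15 = 0.20, so 0.20 sd above the mean.
Biology: z = (83 - 81)/12 = 0.1667, so 0.1667 sd above the mean.
Relative to the rest of the class, Ella did better in calculus.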
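The four z-scores above come from the same one-line formula, z = (x - mu)/sigma. A small sketch (the helper name `z_score` is illustrative):

```python
def z_score(x, mu, sigma):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mu) / sigma

ella = z_score(87, 75, 15)       # standardized test
tony = z_score(54, 75, 15)       # standardized test
calculus = z_score(59, 56, 15)   # Ella, calculus
biology = z_score(83, 81, 12)    # Ella, biology

# The larger z-score marks the test on which Ella did better relative to the class.
better = "calculus" if calculus > biology else "biology"
print(ella, tony, round(calculus, 4), round(biology, 4), better)
```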

Define the two hypotheses.

Ha
• Called the alternative or research hypothesis
• This is what you believe/hope is true about your research question (usually)
Ho
• Called the null (empty) hypothesis
• This is the negation of your research hypothesis
• Contains the equals sign

A biologist wants to know if sunflower seedlings that are treated with Vinca minor have a lower height (on average) than the standard sunflower-seedling height of 15.7 cm. The biologist knows: (a) that the heights of the treated plants have a normal distribution, and (b) that the standard deviation of heights for the treated plants is 2.5 cm. The biologist treated a random sample of n = 55 rooted sunflower seeds with Vinca minor, waited for them to develop into seedlings, and then measured the heights. The sample mean height was 13.54 cm. Determine Ha and Ho. Assume Ho is true. Find the test statistic. Determine the p-value. Decide between Ha and Ho. Make a real-world conclusion.

Ho: mu ≥ 15.7, Ha: mu < 15.7. Or, as more commonly written, Ho: mu = 15.7, Ha: mu < 15.7.
Assume mu = 15.7.
→ The heights of the treated plants have a normal distribution with mean 15.7 cm and standard deviation 2.5 cm.
→ The sample mean is a single observation from a normal distribution with mean 15.7 cm and standard deviation 2.5/rt(55) = 0.3371 cm.
Known: the seedling heights have a normal distribution with sigma = 2.5 cm.
Assumed: the seedling heights have mean mu(o) = 15.7 cm.
X̄ ~ N(15.7, 2.5/rt(55))
Z = (X̄ - mu(o))/(sigma/rt(n)) = (X̄ - 15.7)/(2.5/rt(55)) ~ N(0,1)
Observed test statistic: z* = (13.54 - 15.7)/(2.5/rt(55)) = -6.4076. Under Ho, this is one observation from the standard normal distribution.
The null hypothesis was Ho: mu ≥ 15.7, so sample means substantially less than 15.7 do not support Ho. We observed x̄ = 13.54, which does not support Ho. Had we observed x̄ < 13.54, we would have even less support for Ho than our sample gives.
→ The alternative hypothesis is Ha: mu < 15.7, so the p-value is p = P(X̄ ≤ 13.54) = P(Z ≤ -6.4076) ≈ 0.
The p-value is essentially 0, so we reject Ho.
There is strong evidence that the mean height of sunflower seedlings treated with Vinca minor is lower than the standard sunflower-seedling mean height of 15.7 cm.
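The arithmetic in this worked example can be reproduced in a few lines. This is a sketch only; the significance level `alpha = 0.05` is an assumption, since the question does not state one.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar = 15.7, 2.5, 55, 13.54
alpha = 0.05  # assumed significance level, not given in the question

se = sigma / sqrt(n)                  # standard deviation of the sample mean
z_obs = (xbar - mu0) / se             # observed test statistic z*
p_value = NormalDist().cdf(z_obs)     # lower-tail area, because Ha: mu < 15.7
reject = p_value < alpha

print(f"se = {se:.4f}, z* = {z_obs:.4f}, p = {p_value:.2e}, reject Ho: {reject}")
```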

Discuss significance levels.

If we use the decision rule "reject Ho, if p < 0.05", then: • We are conducting a hypothesis test using significance level a = 0.05 • The probability of making a Type I error is 0.05. Other common significance levels are a = 0.01 and a = 0.10.

Discuss sample size in relation to the original population.

Large sample size with normal distribution: t(n-1)
Large sample size with non-normal distribution: z or t(n-1) (by the CLT and some other theorems)
Small sample size with normal distribution: t(n-1)
Small sample size with non-normal distribution: see a statistician

Scores on a standardized test have mean mu = 65 and standard deviation sigma = 12. Suppose a lot of people each take a sample of 9 test scores, and each person calculates their sample mean. The results might look something like this: x1 = 63.4, x2 = 68.7, x3 = 71.1, x4 = 62.6, x5 = 66.3 What is the mean of the sample means? What is the standard deviation / standard error of the mean?

Mean: mu(x̄) = mu = 65
SD: sigma(x̄) = sigma/rt(n) = 12/rt(9) = 4

What is probability notation?

P (a < X < b) In words... • The probability the random variable X is between a and b. The idea... • The probability that when we take an observation of random variable X, the outcome will be somewhere between a and b (exclusive)

Calculate the confidence interval for the mean of a normal population where the variance is known. Let X1, X2, ..., Xn be iid N(mu, sigma), where mu is unknown and sigma is known. We know that Z = (X̄ - mu)/(sigma/rt(n)) ~ N(0,1). We also know that P(-1.96 < Z < 1.96) = 0.95.

P(-1.96 < Z < 1.96) = 0.95
P(-1.96 < (X̄ - mu)/(sigma/rt(n)) < 1.96) = 0.95
P(X̄ - 1.96 sigma/rt(n) < mu < X̄ + 1.96 sigma/rt(n)) = 0.95
X̄ ± 1.96 sigma/rt(n) is a random interval. The interval is random since X̄ is random due to sampling. The population mean mu is a fixed, but unknown, number. The probability the random interval captures mu is 0.95: 95% of all samples give an interval that captures mu, and 5% of all samples give an interval that does not capture mu.
But we only see one sample. The observed interval x̄ ± 1.96 sigma/rt(n) is NOT a random interval. The probability this interval captures mu is either 1 or 0.
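The "95% of all samples" statement can be illustrated by simulation. The sketch below repeatedly draws samples from a normal population and counts how often the interval captures mu. The population values (mu = 10, sigma = 2, n = 25), the number of trials and the seed are arbitrary choices made for the illustration.

```python
import random
from math import sqrt

def coverage(mu=10.0, sigma=2.0, n=25, trials=10_000, seed=1):
    """Fraction of simulated samples whose 95% z-interval captures mu."""
    rng = random.Random(seed)
    margin = 1.96 * sigma / sqrt(n)   # half-width is the same for every sample
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / trials

rate = coverage()
print(rate)  # expected to land close to 0.95
```

Each individual interval either contains mu or it does not; only the long-run fraction is about 0.95.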

What are probabilities in density functions/distribution graphs?

Probabilities are areas under the curve, above the x-axis, between two numbers

What are density functions?

Probability distributions, formally called density functions, are curves that display information about the population that generated the sample. Think of a probability distribution as a histogram for all individuals in the infinite population.

What are possible errors in hypothesis testing?

Rejecting Ho when Ho is true (a Type I error), and failing to reject Ho when Ho is false (a Type II error).

Why do we subtract mu (population mean) when calculating z-score?

Remember graphing f(x) and f(x-3) in high school: the -3 shifts the graph 3 units to the right.
f(x-3): symmetry about x = mu = 3
f(x): symmetry about x = mu = 0

Why do we divide by sigma when calculating z-scores?

Remember graphing f(x) and f(x/3) in high school: the 3 stretches the graph horizontally by a factor of 3.
f(x): spread sigma = 1/3
f(x/3): spread sigma = 3 x 1/3 = 1

Describe the steps in a Hypothesis Test.

Step 1: Define Two Hypotheses (Ho and Ha)
Step 2: Assume Ho is True (to see if data contradicts this)
Step 3: Calculate the observed Test Statistic
Step 4: Determine the p-Value (using Ha)
Step 5: Decide between Ho and Ha (decide if data contradicts Ho)
Step 6: Make a Real-World Conclusion

What is a z-score?

Suppose X is a random variable from a population with population mean, mu, and population standard deviation, sigma. An observation x has z-score z = (x - mu)/sigma.
z-scores have no units. z-scores measure the distance of data values from the mean in standard deviations. A negative z-score tells us that the data value is below the mean; a positive z-score tells us that the data value is above the mean. We can make z-scores for data from any distribution, but the z-scores are only normal if the original distribution was also normal.

What are one and two sided tests?

Suppose X1, X2, ..., Xn ~ iid N(mu, sigma), and suppose that we know sigma.
Null hypothesis: Ho: mu = mu(o).
The observed test statistic is z* = (x̄ - mu(o))/(sigma/rt(n)). Under Ho, z* is one observation from the N(0,1) distribution.
If Ha: mu > mu(o), then p = P(Z > z*)
If Ha: mu < mu(o), then p = P(Z < z*)
If Ha: mu ≠ mu(o), then p = 2P(Z > |z*|)

What is the Central Limit Theorem?

The Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size gets larger, no matter what the shape of the population distribution.
Take a large sample (size n) from a population with mean mu and standard deviation sigma. Note: this population does not need to have a normal distribution. Calculate the sample mean. The sample mean is a single observation from a nearly normal distribution with mean mu and standard deviation sigma/rt(n).
If the original population is normal, then the distribution of sample means is exactly normal. The bigger the sample from the original population, the closer the distribution of sample means will be to a normal distribution. Some people say that a sample of n ≥ 30 is big enough to get a nearly normal distribution for the sample mean (but a sample of less than 30 is not).
https://www.youtube.com/watch?v=Pujol1yC1_A
http://onlinestatbook.com/2/sampling_distributions/clt_demo.html
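A small simulation can make the theorem concrete. The sketch below draws many samples from a strongly right-skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen only for illustration) and checks that the sample means centre on mu with spread close to sigma/rt(n). The sample size, number of repetitions and seed are arbitrary.

```python
import random
from math import sqrt
from statistics import mean, stdev

def sample_means(n, reps=5_000, seed=0):
    """Means of `reps` samples of size n from an exponential population."""
    rng = random.Random(seed)
    # expovariate(1.0) has population mean 1 and standard deviation 1.
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

n = 40
means = sample_means(n)
print("mean of sample means:", round(mean(means), 3))   # close to mu = 1
print("sd of sample means:  ", round(stdev(means), 3))  # close to sigma / sqrt(n)
print("sigma / sqrt(n):     ", round(1 / sqrt(n), 3))
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though the population itself is far from normal.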

How do you interpret density functions?

P(a < X < b) is the area under the density function, above the horizontal axis, between a and b. There is low probability of observing an x in intervals of X where the density function is low. There is high probability of observing an x in intervals of X where the density function is high. The total area under a density function is 1. Example: for x values between two and five, the area under the density function equals P(2 < X < 5).

What is the standard error of the mean?

The mean of the sample means, mu(x̄), is equal to the population mean mu: mu(x̄) = mu.
The standard deviation of the sample means, sigma(x̄), is equal to the population standard deviation, sigma, divided by the square root of the sample size: sigma(x̄) = sigma/rt(n).
Note: sigma(x̄) is often called the standard error of the mean.

Discuss how you decide between Ho and Ha.

The p-value of a hypothesis test is the probability of observing another sample that is at least as non-supportive of Ho as the observed sample, under the assumption that Ho is true.
OR p-value = P(observing our sample, or a more extreme one | Ho is true).
If the p-value is small, then either Case (a) or Case (b) happened:
Case (a): Ho is true, and the observed sample (or a more extreme sample) has small probability.
Case (b): The observed sample (or a more extreme sample) does not have small probability, and Ho is not true.
• Case (a) says that something with a small probability must have happened. This is a "statistical contradiction". We don't think case (a) occurred.
• Case (b) says that we were probably wrong when we assumed Ho is true. We avoid the "statistical contradiction" by believing that case (b) probably occurred.
If the p-value is not small, then the observed sample (or a more extreme sample) does not have small probability of being observed under Ho.
Small p-value → we have evidence that Ho is false. Reject Ho.
Big p-value → we have no evidence that Ho is false. Fail to Reject Ho.

Discuss a real life example of the normal distribution.

Virtually no real-life random variables have an exact normal distribution. IQ scores were designed to have a N(100,15) distribution, but that was in 1904 (Alfred Binet, a French psychologist); we've gotten smarter, on average, and the distribution has become negatively skewed since then. But many random variables are nearly normal: their histograms look bell shaped, unimodal and symmetric.

Discuss the empirical rule.

There is a simple rule that tells us a lot about a Normal distribution... • about 68% of the values fall within one standard deviation of the mean; • about 95% of the values fall within two standard deviations of the mean; and, • about 99.7% (almost all!) of the values fall within three standard deviations of the mean.
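The three percentages in the rule are areas under the standard normal curve, which can be checked directly. A short sketch using `statistics.NormalDist` from the Python standard library:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, N(0, 1)

# Area within k standard deviations of the mean: P(-k < Z < k).
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} sd: {area:.4f}")
```

Because z-scores standardize any normal variable, the same areas apply to N(mu, sigma) for every mu and sigma.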

Describe why we must assume Ho is true.

This step sets up our attempt at a proof by contradiction. We hope the data contradicts this assumption; if so it seems that Ha must be true.

In most research is the population mean known or unknown?

Unknown

How do we determine if our data is normal or nearly normal?

Use plots to test this:
• Histogram. It should be unimodal, roughly bell shaped, and symmetric.
• Boxplot. It should be symmetric about the median, with only about 0.7% outliers.
• Normal Probability, or QQ, Plot. This is a scatterplot of n ordered pairs, (x, y). The x's are expected standard normal observations (in order from smallest to largest). The y's are the observations (in order from smallest to largest). The plot should look like a straight line with positive slope. It should not be S-shaped. It can be a bit wobbly in the extreme tails, but that's it.
If the plots are quite bad, then we can conclude the data is not likely to be nearly normal. The worse the plots are, the less likely it is that the data is nearly normal.
If the plots all look great, then we still can't say the data is from a normal distribution. All we can say is that there is no evidence the data is not from a normal distribution.
We accept more leeway from smaller samples than we do from larger samples.
How close is it to the prototypes?
- Histogram must be unimodal
- Boxplot must show an even distribution
- QQ plot must be a straight line
- Large sample size

What is the use of z-tables?

We can use z-tables to find standard normal probabilities.
Have z-score, want probability: P(Z < 0.63) = 0.7357
Have probability, want z-score: k% of all z-scores are smaller than the kth percentile. P(Z < 0.23) = 0.59 → 0.23 is the 59th percentile.
General normal probabilities
Have x observation, want probability: if X ~ N(4,12), then P(X < 5.5) = P((X - mu)/sigma < (5.5 - 4)/12) = P(Z < 0.125) = 0.55
Converting between x and z: x = z sigma + mu and z = (x - mu)/sigma
https://www.youtube.com/watch?v=zZWd56VlN7w
https://www.youtube.com/watch?v=YhjzwySWF9c
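The two directions of a z-table lookup correspond to a cumulative distribution function and its inverse. A sketch with `statistics.NormalDist` (standard library), where `cdf` plays the role of "have z, want probability" and `inv_cdf` the role of "have probability, want z":

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, N(0, 1)

# Have z-score, want probability: P(Z < 0.63)
p = Z.cdf(0.63)

# Have probability, want z-score: the 59th percentile of Z
z59 = Z.inv_cdf(0.59)

# General normal: X ~ N(mu = 4, sigma = 12), P(X < 5.5)
X = NormalDist(mu=4, sigma=12)
px = X.cdf(5.5)

print(round(p, 4), round(z59, 2), round(px, 2))
```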

How do we estimate the population mean?

We estimate the population mean, mu, using the sample mean, x̄. x̄ is called a point estimate of mu. Things to think about:
• x̄ almost certainly won't equal mu.
• Do we think x̄ is close to mu? Or far from mu?
• An interval estimate, x̄ ± error, would be useful.

It is known that in normal 20°C water, a goldfish's average gill movement rate is 66 times per minute, but if the water is 30°C, gill movements increase to about 78 times per minute. Researchers investigate the number of gill movements in 25°C water. They observe x̄ = 71.46 and s = 2.8 using a sample of n = 15 goldfish. Assume the number of gill movements per minute is normally distributed. Determine a 95% confidence interval. Using the data, test the hypothesis that in 25°C water the mean gill movement rate is greater than 70 times per minute.

We use the t(n-1), or t(15-1) = t(14), distribution. t(14; 0.025) = 2.1448.
A 95% CI for the mean number of gill movements per minute in 25°C water is x̄ ± t(n-1; 0.025) s/rt(n) = 71.46 ± 2.1448 x 2.8/rt(15) = (69.909, 73.011).
[The hypothesis test was given as a picture, which is not reproduced here.]
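The interval above, and the test statistic for the second part of the question, can be sketched as follows. The two-sided critical value 2.1448 is the one quoted in the answer; the one-sided value 1.7613 for t(14; 0.05) is taken from a standard t-table and the 5% significance level is an assumption, since neither appears in the original card.

```python
from math import sqrt

n, xbar, s = 15, 71.46, 2.8
se = s / sqrt(n)                      # estimated standard error of the mean

# 95% confidence interval using t(14; 0.025) from the answer above.
t_two_sided = 2.1448
ci = (xbar - t_two_sided * se, xbar + t_two_sided * se)

# Test of Ho: mu = 70 against Ha: mu > 70.
mu0 = 70
t_obs = (xbar - mu0) / se
t_one_sided = 1.7613                  # t(14; 0.05), assumed 5% level
reject = t_obs > t_one_sided

print(f"CI = ({ci[0]:.3f}, {ci[1]:.3f}), t* = {t_obs:.4f}, reject Ho: {reject}")
```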

What is a confidence interval?

We want interval estimates to: (a) contain the population mean and (b) be narrow so we can make decisions. But an interval estimate must be constructed using sample data, so it is random and its end points will vary from sample to sample. We settle for an interval estimate that has: (a) a high probability of containing the population mean and (b) the ability to generate narrow intervals. Such an interval estimator is called a Confidence Interval.

Discuss the width of a confidence interval.

We want:
• CI to be wide → good chance our data captures mu
• CI to be narrow → can make decisions based on it
Things that affect the width of a CI:
• a: if a is big, then (1-a) is small and the CI is narrow ... but we have a lousy chance our data captures mu
• sigma: if sigma is big, then the CI is wide ... if we're using the right population, we can't control this
• n: if n is big, then the CI is narrow ... use as large an n as you can afford

When is the hypothesis test process valid?

When we have : • A random sample from a normal distribution OR if the sample is large enough for the CLT to apply • The variance is known • All observations are independent

x values and percentiles from probabilities. Have probability, want x value: If X ~ N(4,12), then a. What is the 20th percentile of X? b. What is the value of x such that the probability a randomly chosen X will be greater than x is 0.63?

a. From the z-tables, P(Z < -0.84) = 0.20. Using the relationship between x and z, x = z sigma + mu, we have x = -0.84 x 12 + 4 = -6.08.
b. From the z-tables, P(Z < 0.33) = 0.63. By symmetry, P(Z > -0.33) = 0.63. Using the relationship between x and z, x = z sigma + mu, we have x = -0.33 x 12 + 4 = 0.04.
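Both parts are inverse lookups on X ~ N(4, 12). A sketch with `statistics.NormalDist`; because it does not round z to two decimals the way a printed table does, the results differ slightly from the hand-computed -6.08 and 0.04.

```python
from statistics import NormalDist

X = NormalDist(mu=4, sigma=12)

# a. 20th percentile: the x with P(X < x) = 0.20
a = X.inv_cdf(0.20)

# b. The x with P(X > x) = 0.63, which is the same as P(X < x) = 0.37
b = X.inv_cdf(1 - 0.63)

print(round(a, 2), round(b, 2))
```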

A research neurologist wishes to test the effect of a new drug on response time. She randomly selects 100 rats, injects each rat with one unit of the drug, and then subjects each rat to a neurological stimulus. The response time (time to react to the stimulus) of each rat, in seconds, is recorded. It is known that mean response time for rats not injected with any drug (control group) is 1.2 seconds. The neurologist is interested in determining if the drug is effective. Use a significance level of 5%. The data is attached in the file "DrugRat.csv". Is the drug effective?

https://www.khanacademy.org/math/statistics-probability/significance-tests-one-sample/more-significance-testing-videos/v/hypothesis-testing-and-p-values

Interpret the values from the R outputs in R Assignment 2.

https://www.khanacademy.org/math/statistics-probability/significance-tests-one-sample/more-significance-testing-videos/v/hypothesis-testing-and-p-values

Discuss the p-value scale in terms of Ho.

p ≈ 1: If Ho is true, it's almost certain that the observed data (or more extreme) will appear
p = 0.5: If Ho is true, the observed data (or more extreme) will appear about 50% of the time
p ≈ 0: If Ho is true, it's almost certain that the observed data (or more extreme) won't appear
p = 0.10: Slight evidence against Ho
p = 0.07: Some evidence against Ho
p = 0.05: Evidence against Ho
p = 0.01: Strong evidence against Ho
p = 0.001: Very strong evidence against Ho

It is known that in normal 20°C water, a goldfish's average gill movement rate is 66 times per minute, but if the water is 30°C, gill movements increase to about 78 times per minute. Researchers investigate the number of gill movements in 25°C water. They observe x̄ = 68.03 and s = 4.2 using a sample of n = 78 goldfish. The distribution of the number of gill movements per minute is unknown. Determine a confidence interval and carry out a hypothesis test.

[The answer was given as a picture, which is not reproduced here.]

What is the equivalence of CIs and HTs?

• If mu(o) is in the central 100(1-a)% CI for mu, we fail to reject (FTR) Ho: mu = mu(o) against Ha: mu ≠ mu(o), and the p-value is ≥ a
• If mu(o) is not in the central 100(1-a)% CI for mu, we reject Ho: mu = mu(o) in favour of Ha: mu ≠ mu(o), and the p-value is < a
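The equivalence can be checked numerically for the known-sigma z case. The sketch below builds the central CI and the two-sided p-value from the same data and compares the two decisions. The numbers reuse Prof. Guppy's shark sample (x̄ = 4.08, sigma = 0.56, n = 60); the second null value, 3.9, and the function name `ci_and_test` are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def ci_and_test(xbar, sigma, n, mu0, alpha=0.05):
    """Return (CI, two-sided p-value, whether mu0 lies inside the CI)."""
    Z = NormalDist()
    se = sigma / sqrt(n)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    ci = (xbar - z_crit * se, xbar + z_crit * se)
    z_obs = (xbar - mu0) / se
    p = 2 * (1 - Z.cdf(abs(z_obs)))
    return ci, p, ci[0] <= mu0 <= ci[1]

alpha = 0.05
results = {mu0: ci_and_test(4.08, 0.56, 60, mu0, alpha) for mu0 in (4.0, 3.9)}

for mu0, (ci, p, inside) in results.items():
    decision = "fail to reject Ho" if p >= alpha else "reject Ho"
    print(f"mu0 = {mu0}: CI = ({ci[0]:.3f}, {ci[1]:.3f}), p = {p:.4f}, "
          f"in CI: {inside}, {decision}")
```

In both cases "mu(o) inside the CI" and "p ≥ a" agree, as the two bullets state.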

When determining the hypothesis tests about mu, normal population, sigma known, what criteria must be met?

• Original population is normal
• Small or large sample size

For a standard normal density curve plotted against the variable Z, with its peak at z = 0, tails that meet the axis around z = -3 and z = 3, and the region between -2 and 1 shaded, discuss.

• Probabilities are areas under the curve, above the z axis, between two numbers. • The probability an observation of Z is between -2 and +1, is the shaded area in the graph. We write the probability like this : P(-2 < Z < 1)

Describe general normal density curves plotted against the variable x, for example:
- a green curve with its peak at x = -2 and tails that meet the axis around x = -3.5 and x = -0.5, N(-2, 1/2)
- a red curve with its peak at x = 0 and tails that meet the axis around x = -5 and x = 5, N(0, 2)
- a blue curve with its peak at x = 0 and tails that meet the axis around x = -3 and x = 3, N(0, 1)
- a purple curve with its peak at x = 1 and tails that meet the axis around x = 0 and x = 2, N(1, 1/4)

• Random variable X • The bell curve • Centre at x = mu • Symmetry about x = mu • Inflection points at x = mu +- sigma • Area under curve =1 • Notation X ~ N(mu, sigma)

Describe a normal density curve plotted against the variable Z, with its peak at z = 0 and tails that meet the axis around z = -3 and z = 3.

• Random variable, Z • A bell curve • Centre at z = 0 • Symmetry about z = 0 • Inflection points at z = +- 1 • Area under curve =1 • Notation Z ~ N(0,1) • The curve is high in intervals around z = 0, so there is high probability an observation of Z will be in an interval around 0. • The curve is low in intervals around z = 3 and z = -3, so there is low probability an observation of Z will be in an interval around 3 or -3.

What is the t distribution?

• Similar to the normal distribution, but lower probability around zero and higher probability in the tails. • Different shapes for different sample sizes, as n increases the t -distribution approaches the normal distribution.

