STAT EXAM 2


The final row includes

1.645, 1.96, & 2.576. These are "common confidence levels" & z*.

What is the minimum sample size required for the Central Limit Theorem?

40

The standard deviation of a variable is 20. The sample size is 100. Suppose we are testing a claim that the true mean = 40. Which of the following sample means gives the most evidence against the claim?

46 Correct, 46 produces the largest test statistic and smallest p-value.

What percentage of the volunteers are Current smokers or from the Upper-class?

83%

Women's heights are Normally distributed with mean 64 and standard deviation 3. Which event is less likely?

A random sample of 100 women having an average height above 67"

Example: Standard Error

A random sample of 49 students reported receiving an average of 7.2 hours of sleep nightly with a standard deviation of 1.74. What is the standard error of the mean?
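The card leaves the arithmetic unstated. A minimal sketch of the calculation, assuming the usual formula SE = s / √n (the function name `standard_error` is only illustrative):

```python
import math

def standard_error(s: float, n: int) -> float:
    """Standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

# Values from the card: s = 1.74 hours, n = 49 students.
se = standard_error(1.74, 49)
print(round(se, 4))  # 1.74 / 7, roughly 0.2486
```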

random variable

A random variable is a variable whose value is a numerical outcome of a random phenomenon.

statistic

A statistic is a number that can be computed from the sample data. (e.g., the average number of text messages sent yesterday by a random sample of OSU juniors).

Test Statistic

A test statistic calculated from the sample data measures how far the data depart from what we would expect if the null hypothesis were true. The further this statistic is from 0, the more the data contradict the null hypothesis. Note: A test statistic tells us how many standard deviations our value is away from the hypothesized mean. A positive test statistic is above the mean. A negative one is below the mean. We then use this information to figure out how likely it is to see results like ours if the null hypothesis were true.

Which of the following is not a component of a confidence interval for a population mean with an unknown population standard deviation?

A value for "sigma."

Example: Internet Access

An A.C. Nielsen study found that 81% of households in the United States have computers. Of those 81%, 92% have Internet access. Calculate the probability that a randomly selected U.S. household has a computer and has Internet access. Let C =household has a computer. Let I = household has Internet access. P(C and I) = P(C)×P(I |C) = (0.81)(0.92) = 0.7452

alternative hypothesis

An alternative hypothesis is one-sided if it states that a parameter is larger than or smaller than the hypothesized value.

alternative hypothesis is two-sided if it

An alternative hypothesis is two-sided if it states that the parameter is different from the hypothesized value. (It could be either smaller or larger.)

event

An event is an outcome or a set of outcomes of a random phenomenon. That is, an event is a subset of the sample space. For an event A, the probability that A occurs is denoted P(A).

Significance Level

Question: What counts as evidence being "likely," "unlikely," or "extremely unlikely"? Generally:
"Likely": P-value > .10
"Unlikely": .05 < P-value ≤ .10
"Extremely unlikely": P-value ≤ .05
The quantity α is called the significance level (or the level of significance). If the P-value is as small as or smaller than α, we say that the data are statistically significant at level α. After choosing an appropriate level of significance, we can make a decision about H0.
Decisions about H0 (P-value vs. α):
P-value > α: fail to reject H0
P-value ≤ α: reject H0
Question: Why should a significance level be set before the test has been done? Suppose that your P-value = 0.025. If α = 0.05, you would reject the null hypothesis; if α = 0.01, you would fail to reject the null hypothesis. If you did not set a significance level before the test, you might change your mind based on the results to fit the decision you (might) desire.
The test statistic for hypothesis testing is based upon our work from sampling distributions and confidence intervals.

Example: Hemoglobin levels

Recall from earlier that Hb levels are Normally distributed. Our original example featured a random sample of 11 boys from an underserved country that had an average hemoglobin level of 11.3 g/dl with a standard deviation of 1.5. Is there significant evidence, at the .05 level of significance, that the average Hb level for boys from this country is below 12, which results in anemia?
State: Are boys from this underserved country anemic (i.e., Hb μ < 12 g/dl)?
Plan: a) Identify the parameter. µ = mean Hb level for boys from this underserved country. b) List all given information from the data collected. n = 11, sample mean = 11.3, sd = 1.5. c) State the null (H0) and alternative (Ha) hypotheses. H0: µ = 12, Ha: µ < 12. d) Specify the level of significance. α = .05. e) Determine the type of test (left-tailed, right-tailed, or two-tailed). Left-tailed, because Ha is µ < 12.
Solve: Check the conditions for the test you plan to use. Random sample? Yes. Stated as a random sample. Large enough population : sample ratio? Yes. The number of boys is arbitrarily large; therefore, N > 20*11 = 220. Large enough sample; Normal or t-distribution? n = 11 < 40, but the data are Normal, so we can use the t-distribution.
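The card stops before the test statistic is computed. A minimal sketch of that step, assuming the one-sample t statistic t = (x̄ - μ0) / (s / √n) and a critical value t* = 1.812 read from a t table (df = 10, one-sided α = .05); the function name and the hard-coded table value are illustrative assumptions, not part of the original card:

```python
import math

def one_sample_t(xbar: float, mu0: float, s: float, n: int) -> float:
    """One-sample t statistic: (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))

# Values from the card: xbar = 11.3, mu0 = 12, s = 1.5, n = 11.
t = one_sample_t(11.3, 12.0, 1.5, 11)

# Assumption: t* for df = 10 and one-sided alpha = .05, taken from a t table.
T_CRIT = 1.812

# Left-tailed test: reject H0 only if t falls at or below -t*.
reject_h0 = t <= -T_CRIT
print(round(t, 3), reject_h0)
```

Under these assumptions the statistic sits above the left-tail cutoff, so this sample of 11 alone would not lead to rejecting H0 at α = .05.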

Decreasing the Margin of Error

Smaller margins of error are quite desirable and can be attained mathematically through:
Lower confidence levels: these result in smaller values of z*.
Smaller standard deviations: less error in the data, smaller values of σ.
Increased sample sizes: this results in dividing σ by a larger number. Quadrupling the sample size cuts the margin of error in half.
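A short sketch of these three effects, assuming the z margin of error m = z* σ / √n. The numbers (σ = 15, n = 100) are hypothetical and chosen only for illustration:

```python
import math

def margin_of_error(z_star: float, sigma: float, n: int) -> float:
    """Margin of error for a z confidence interval: z* * sigma / sqrt(n)."""
    return z_star * sigma / math.sqrt(n)

# Hypothetical baseline: 95% confidence, sigma = 15, n = 100.
m_base = margin_of_error(1.96, 15, 100)

# Lower confidence level (90%) gives a smaller z* and a smaller margin.
m_lower_conf = margin_of_error(1.645, 15, 100)

# Smaller standard deviation gives a smaller margin.
m_smaller_sigma = margin_of_error(1.96, 10, 100)

# Quadrupling n halves the margin, because sqrt(4n) = 2 * sqrt(n).
m_quadruple_n = margin_of_error(1.96, 15, 400)

print(m_base, m_lower_conf, m_smaller_sigma, m_quadruple_n)
```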

population distribution

The population distribution of a variable is the distribution of the values of the variable among all the individuals in the population

Suppose a test of significance desires to see if there is a difference in mean SSHA scores between men and women. Which is the appropriate alternative hypothesis?

The population mean for women does not equal the population mean for men.

The principles of statistical inference allow

The principles of statistical inference allow "give or take" room for error: estimate +/- "room for error," that is, estimate +/- "margin of error."

probability distribution

The probability distribution of a random variable X tells us what values X can take and how to assign probabilities to those values.

The probability of any outcome of a random phenomenon is

The probability of any outcome of a random phenomenon is the proportion of times the outcome would occur in a very long series of repetitions.

General Multiplication Rule

The probability that both of two events A and B happen together can be found by P(A and B) = P(A) × P(B|A). Here P(B|A) is the conditional probability that B occurs given the information that A occurs.

Definition: P-value

The probability, computed assuming that the null hypothesis is true, that the test statistic would take a value as extreme or more extreme than that actually observed is called the P-value (probability value) of the test.

Which of the following is not a simple condition for inference about a population mean?

The sample mean equals the margin of error.

The t-distribution

Draw an SRS of size n from a large population that has the Normal distribution with mean μ and standard deviation σ. The one-sample t statistic t = (x̄ - μ) / (s / √n) has the t distribution with n - 1 degrees of freedom.

Which of the following will decrease a P-value?

Increasing the sample size

Two events, P(A|B) and P(B), can be best represented by:

adjoining branches in a Tree Diagram.

Identify a value of n that would produce a larger Test Statistic

n = 100 (or any n > 52)

Identify a value of n that would produce more evidence against H0

n = 100 (or any n > 52)

Identify a value of n that would produce a larger P-value

n = 25 (or any n < 52)

Miguel used his own judgment when he stated, "There is a 30% chance that the next Ohio State University President has Ohio roots." This is an example of

personal probability.

μ

population mean

We will use sample statistics to estimate what

population parameters.

σ

population standard deviation

A North American roulette wheel has 38 slots, of which 18 are red, 18 are black, and 2 are green. If you bet on red, the probability of winning is 18/38 = 0.4737. The probability 0.4737 represents

the proportion of times this event will occur in a very long series of individual bets on red.

As the number of coin flips increases, the proportion of heads should become closer to 50%.

true

Example: Global Warming

"And" question: What is the probability that a randomly selected respondent is a Midwesterner who agrees that Global Warming increased temperatures during December 2011 and January 2012? Let M = Midwesterner, and let A = Agrees that Global Warming Increased Temperatures. This is P(M and A).
"Conditional" question: What is the probability that a Midwesterner agrees that Global Warming increased temperatures during December 2011 and January 2012? Use the Conditional Prob. Rule: P(A | M) = P(A and M) / P(M).

When the exact df is not listed,

"round down" and use the closest df that does not exceed the df that is desired.

Which confidence interval provides evidence (at alpha = 1-c) that the mean of the first population is significantly larger than the mean for the second population?

(+LB, +UB)

What is the probability that both the #3 and #1 Teams win?

.56 It is the region that is outside of the 'overlapping circles' (or a similar phrase).

The U.S. Census Bureau reported that 86.5% of adults in the U.S. age 65 and older are Caucasian. 98.5% of them have health insurance. What percent are Caucasians age 65 and older AND do not have health insurance?

0.013 (about 1.3%), since 0.865 × (1 - 0.985) = 0.865 × 0.015 ≈ 0.013

Suppose that 3% of all athletes are using the endurance-enhancing hormone EPO (you should be able to simply compute the percentage of all athletes that are not using EPO). For our purposes, a "positive" test result is one that indicates presence of EPO in an athlete's bloodstream. The probability of a positive result, given the presence of EPO is .99. The probability of a negative result, when EPO is not present, is .90. What is the probability that a randomly selected athlete tests positive for EPO?

0.1267
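The answer follows from adding the two branches of a tree diagram that end in a positive test. A minimal sketch, with variable names chosen here for illustration:

```python
# Values from the card.
p_epo = 0.03               # P(athlete uses EPO)
p_pos_given_epo = 0.99     # P(positive | EPO)
p_neg_given_clean = 0.90   # P(negative | no EPO)

# Complement rule for the pieces the card leaves implicit.
p_clean = 1 - p_epo                        # P(no EPO) = 0.97
p_pos_given_clean = 1 - p_neg_given_clean  # P(positive | no EPO) = 0.10

# General multiplication rule on each branch, then add the disjoint branches.
p_pos = p_epo * p_pos_given_epo + p_clean * p_pos_given_clean
print(round(p_pos, 4))  # 0.0297 + 0.0970 = 0.1267
```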

What is the probability that a randomly selected registered voter is female and Republican?

0.1384

What is the probability that a randomly selected female is Republican?

0.2537

The probability that the freshman is studying Chinese or Swahili is

0.31

The probability that the freshman is studying Spanish is

0.48

What is the probability that a randomly selected registered Republican is female?

0.5406

A study of the students taking distance learning courses at a university finds that they are mostly older students not living in the university town. Let A be the event that the student is 25 years or older and B be the event that the student is local. The study finds that P(A) = 0.70, P(B) = 0.25, and P(A and B) = 0.05. The probability that a randomly selected distance learning student is 25 years or older or is local is

0.90

probability model

A probability model is a mathematical description of a random phenomenon consisting of two parts: a sample space S and a way of assigning probabilities to events.

Conditions (and cautions) for Inference About a Mean

1. We have an SRS from the population of interest. (The SRS may not be perfect.)
2. The variable we measure has an exactly Normal distribution N(μ, σ) in the population. (An exact Normal distribution is not always attainable.)
3. We don't know the population mean μ, but we do know the population standard deviation σ. (It is slightly odd that we'd know σ but not μ.)
The three simple conditions will often be "assumptions" even though there is room for some doubt.

Gardeners often use either chemicals or fences to protect their plants from pests. 60 gardeners were asked about their plant protection method and its success. Three of the 13 who used chemicals had pests. Seven of the 47 who used fences had pests. Please answer both questions. i) What percent of gardeners, overall, had pests? ii) Which protection method was more effective?

16.67%, fences

What percentage of the volunteers are neither Current smokers nor from the Upper-class?

17%

An automobile executive wishes to estimate the mean number of miles on four-year old Hybrid vehicles. What is the minimum sample size required in order to estimate this within 1000 miles with 90% confidence, assuming that σ is 19700?

n = (z* × σ / m)^2 = (1.645 × 19700 / 1000)^2 ≈ 1050.18, so round up to 1051 vehicles.
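The same calculation as a small sketch, assuming the sample-size formula n = (z* σ / m)^2 rounded up to the next whole number (the function name `min_sample_size` is illustrative):

```python
import math

def min_sample_size(z_star: float, sigma: float, m: float) -> int:
    """Smallest n whose z margin of error is at most m."""
    # Always round up: rounding down would leave the margin slightly too wide.
    return math.ceil((z_star * sigma / m) ** 2)

# Values from the card: 90% confidence (z* = 1.645), sigma = 19700, m = 1000.
n = min_sample_size(1.645, 19700, 1000)
print(n)  # 1051
```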

Robustness of t Procedures

A confidence interval or significance test is called robust if the confidence interval or P-value does not change very much when the conditions for use of the procedure are violated. Except in the case of small samples, the condition that the data are an SRS from the population of interest is more important than the condition that the population distribution is Normal. The t-procedures guard against non-Normality except when there is strong skewness or outliers present. When the data are not from a Normal distribution we also need to consider the sample size: Note that we have changed the "large enough sample" condition to be adaptable to the situations that we encounter. This is because t procedures are robust against violations of Normality.

confidence level C

A confidence level C, which gives the probability that the interval will capture the true parameter in repeated samples. That is, the confidence level is the success rate for the method (not for an individual interval).

Continuous Probability Models

A continuous probability model assigns probabilities as areas under a density curve. The area under the curve and above any range of values is the probability of an outcome in that range.

density curve

A density curve is the overall pattern of a distribution. The area under the curve for a given range of values along the x- axis is the proportion of the population that falls in that range. A density curve has total area 1 underneath it.

Which of the following increases the margin of error?

A larger standard deviation

parameter

A parameter is a number that describes a characteristic of the population. (e.g., the average number of text messages sent yesterday by all Ohio State students). Often the value of a parameter is unknown because we cannot examine the entire population.

personal probability

A personal probability is a number between 0 and 1 that expresses someone's judgment of an event's likelihood. Example: I believe there is a 20% chance of precipitation tomorrow.

Example: Carry-on luggage

Airlines are now monitoring the amount of carry-on luggage passengers bring with them. It is believed that the mean weight of carry-on luggage for passengers on multiple hour flights is 30 lbs. with a standard deviation of 7.5 lbs. A random sample of 500,000 passengers who had recently flown on multiple hour flights had an average carry-on luggage weight of 29.9 lbs. The test statistic is -9.43 with a P-value of 0. There is a statistically significant reason to reject the H0 and believe that the mean weight of carry-on luggage is not 30 lbs. But, practically, the sample mean (29.9) and the population mean (30.0) are quite comparable. The P-Value for a z-score of -9.43 is essentially 0. Statistically significant, but not practically significant.
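A quick check of the test statistic quoted above, assuming the z statistic z = (x̄ - μ0) / (σ / √n) and a two-sided P-value. `NormalDist` is from the Python standard library; the other names are illustrative:

```python
import math
from statistics import NormalDist

def z_statistic(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """z statistic for a mean when sigma is known."""
    return (xbar - mu0) / (sigma / math.sqrt(n))

# Values from the card: xbar = 29.9, mu0 = 30, sigma = 7.5, n = 500,000.
z = z_statistic(29.9, 30.0, 7.5, 500_000)

# Two-sided P-value: twice the area below -|z| under the standard Normal curve.
p_value = 2 * NormalDist().cdf(-abs(z))

print(round(z, 2))  # about -9.43
print(p_value)      # essentially 0
```

The huge sample makes the standard deviation of the sample mean tiny, which is why a difference of only 0.1 lbs produces such an extreme statistic.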

A level C confidence interval for a parameter has two parts:

An interval calculated from the data, usually of the form estimate ± margin of error

A Connection between Confidence Intervals and Significance Tests

Analogous to how we use high levels of confidence for confidence intervals, we need strong evidence (and very small p-values) to reject null hypotheses. Standard levels of confidence are 90%, 95%, and 99%. Standard levels of significance are 10%, 5%, and 1%. Recall from last chapter: more than 10% was a "likely" event. 5% to 10% was an "unlikely" event. less than 5% was an "extremely unlikely" event.

x̄ (x-bar)

As mentioned before, x̄ is an unbiased estimator of μ.

square root of n

Because the square root of n is in the denominator, we also know that the results of large samples are less variable than the results of small samples.

Construct a 95% confidence interval for the average difference in initial and final pulse rates.

Because we rejected the null hypothesis, we will now construct a 95% confidence interval for the difference in pulse rates. We find t*(11, .95)=2.201 by looking across the df = 11 row and down the 95% confidence level column.

Proportions and Probabilities

Beginning with Chapter 12, we will transition away from what proportion of persons have some characteristic to what is the probability that some event occurs.

The Complement Rule and Addition Rule

Calculate the probability that a randomly selected person is neither right-hand dominant nor female. P(neither R nor F) = 1 - P(R or F) = 1 - 0.937 = 0.063. We just used the complement rule: For any event A, P(A does not occur) = P(not A) = 1 - P(A).
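The value 0.937 comes from the addition rule applied to the Venn diagram example in this set (88.7% right-hand dominant, 47.5% female, 42.5% both). A minimal sketch of both steps, with illustrative variable names:

```python
# Values from the Venn diagram example.
p_r = 0.887        # P(right-hand dominant)
p_f = 0.475        # P(female)
p_r_and_f = 0.425  # P(right-hand dominant and female)

# Addition rule: subtract the overlap so it is not counted twice.
p_r_or_f = p_r + p_f - p_r_and_f

# Complement rule: "neither" is the complement of "R or F".
p_neither = 1 - p_r_or_f

print(round(p_r_or_f, 3), round(p_neither, 3))  # 0.937 0.063
```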

Large enough population: sample ratio

Is the population of interest at least 20 times the sample size?

As the sample size increases, the variability of the sample mean _____

Decreases

Which of the following will decrease the margin of error for a confidence interval?

Decreasing the confidence level Increasing the sample size

Which of the following will increase the margin of error for a confidence interval?

Decreasing the sample size

What is the probability that the mean score of your sample is between 22 and 28?

Define x̄ to be the mean score of our sample of 25 students. Similar to (a), but the standard deviation of x̄ is 6.4/√25 = 1.28, so P(22 < x̄ < 28) = P((22 - 25)/1.28 < (x̄ - 25)/1.28 < (28 - 25)/1.28) = P(-2.34 < Z < 2.34) = .9808

Almost all medical schools in the United States require students to take the Medical College Admission Test (MCAT). To estimate the mean score μ of those who took the MCAT on your campus, you will obtain the scores of an SRS of students. The scores follow a Normal distribution, and from published information you know that the standard deviation is 6.4. Suppose that (unknown to you) the mean score of those taking the MCAT on your campus is 25.0. a. If you choose one student at random, what is the probability that the student's score is between 22 and 28?

Define X to be the single student's score. Given that µ = 25 and the standard deviation is σ = 6.4, P(22 < X < 28) = P((22 - 25)/6.4 < (X - 25)/6.4 < (28 - 25)/6.4) = P(-0.47 < Z < 0.47) = 0.3616
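A sketch comparing the single-student probability with the sample-mean probability for n = 25, using `NormalDist` from the Python standard library. The software results differ slightly from the table answers because Table A rounds z to two decimals:

```python
import math
from statistics import NormalDist

mu, sigma, n = 25.0, 6.4, 25  # values from the MCAT cards

# One student: X ~ N(25, 6.4).
single = NormalDist(mu, sigma)
p_single = single.cdf(28) - single.cdf(22)

# Mean of 25 students: x-bar ~ N(25, 6.4 / sqrt(25)) = N(25, 1.28).
mean_dist = NormalDist(mu, sigma / math.sqrt(n))
p_mean = mean_dist.cdf(28) - mean_dist.cdf(22)

print(round(p_single, 3))  # close to the table answer 0.3616
print(round(p_mean, 3))    # close to the table answer 0.9808
```

The sample mean is far more likely to land between 22 and 28 because its standard deviation is one fifth of the individual standard deviation.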

Light vehicles sold in the United States must emit an average of no more than 0.07 grams per mile (g/mi) of nitrogen oxides (NOX). NOX emissions for one car model vary Normally with mean 0.05 g/mi and standard deviation 0.01 g/mi. a. What is the probability that a single car of this model emits more than 0.07 g/mi of NOX?

Define X to be the NOX emission for the single car. Given that μ = 0.05 g/mi and σ = 0.01 g/mi, P(X > 0.07) = P((X - 0.05)/0.01 > (0.07 - 0.05)/0.01) = P(Z > 2) = 1 - 0.9772 = 0.0228

A trial begins under the assumption that Bill did not commit the murder. A "Type I Error" would be

Determining that Bill was guilty of murder; when, in fact, he was not.

Discrete probability models

Discrete probability models have countable outcomes. These outcomes either assume fixed values or are Natural numbers {0, 1, 2, ...}.

Random variables can be discrete or continuous

Discrete random variables have a finite list of possible outcomes. Continuous random variables can take on any value in an interval, with probabilities given as areas under a curve.

Random sample:

Do we have a random sample? If not, is the sample representative of the population? If not, was it a randomized experiment?

Random sample

Do we have a random sample? Was it a randomized experiment? If neither, is the sample representative of the population?

The One-sample t Test of Significance

Draw an SRS of size n from a large population that has the Normal distribution with mean μ and unknown standard deviation σ. The one-sample t statistic has the t distribution with n - 1 degrees of freedom. To test the hypothesis H0: μ = μ0, compute the one-sample t statistic t = (x̄ - μ0) / (s / √n). The P-value for a test of H0 against Ha: μ > μ0 is P(T ≥ t); against Ha: μ < μ0 it is P(T ≤ t); against Ha: μ ≠ μ0 it is 2P(T ≥ |t|), where T has the t(n - 1) distribution. These P-values are exact if the population distribution is Normal and are approximately correct for large n in other cases.

The One-sample t Confidence Interval

Draw an SRS of size n from a large population having unknown mean μ. A level C confidence interval for μ is x̄ ± t* × s/√n, where t* is the critical value for the t(n - 1) density curve with area C between -t* and t*. This interval is exact when the population distribution is Normal and is approximately correct for large n in other cases. The one-sample t confidence interval is used to estimate means. Its form is similar to previous forms of confidence intervals: estimate ± margin of error.
Introduction of confidence intervals (1.96 or another z-score): x̄ ± 1.96 × σ/√n
General form (when σ is known): x̄ ± z* × σ/√n
Now (σ is unknown): x̄ ± t* × s/√n

The Law of Large Numbers

Draw observations at random from any population with finite mean μ. As the number of observations drawn increases, the mean of the observed values gets closer and closer to the mean μ of the population. Check out Example 15.3 on page 347 of the book for a nice example and graphic of the law of large numbers.

If the P-value is small enough

If the P-value is small enough, the data we observed would be unusual (very unlikely to have happened) if the null hypothesis were true.

The Reasoning of Statistical Estimation

Even though we find the simple conditions a bit questionable, they help us understand the reasoning of statistical estimation. Please read Example 16.1 on page 374 of the text, because it goes through this reasoning well. Example: Suppose IQ scores for adults follow a Normal distribution with a standard deviation of 15. A random sample of 56 adults yields an average score of 100. Recall that we do not know the true mean IQ score.
1. To estimate the unknown population mean IQ score μ, we can start with the sample mean x̄ = 100 from the random sample.
2. Based on what we learned in Chapter 15, we know the sampling distribution of the sample mean. The sample mean IQ score for an SRS of 56 adults is Normally distributed with mean μ and standard deviation σ/√n = 15/√56 ≈ 2.00.

Venn Diagrams and Probabilities

Example: In a sample of 1000 people, 88.7% of them were right-hand dominant, 47.5% of them were female, and 42.5% of them were female and right-hand dominant. Draw a Venn diagram for this situation.

True or False: We know with certainty that the average amount of coffee consumed daily is different from our hypothesized value.

FALSE. It may be likely given our sample size, but not with certainty. There is variability in sampling. It is possible that the next random sample of 48 could yield a different average.

Phenomena that are statistically significant are always practically significant.

False

True or False. All independent events are disjoint.

False

Complement Rule

For any event A, P(A does not occur) = 1 - P(A).

The anticipated positive impact a vaccine has had on a dozen subjects that were suffering from an ailment was measured by each patient's strength. Strength was measured before-and-after vaccine administration. The variable: Difference = Strength After - Strength Before was created. The appropriate alternative hypothesis for the population mean difference in patient strength is:

Ha: μ > 0

Example: 90% CI for Hb levels

Hemoglobin (Hb) levels are normally distributed, and should neither be too large, nor too small. A random sample of 11 boys from an underserved country had an average hemoglobin level of 11.3 g/dl with a standard deviation of 1.5. Compute a 90% confidence interval for the average hemoglobin level for boys from this particular country.
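The card does not show the interval itself. A minimal sketch, assuming the one-sample t interval x̄ ± t* × s/√n and the table value t* = 1.812 for df = 10 at 90% confidence (the hard-coded t* and the function name are assumptions made for this illustration):

```python
import math

def t_interval(xbar: float, s: float, n: int, t_star: float) -> tuple[float, float]:
    """One-sample t confidence interval: xbar +/- t* * s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Assumption: t* for df = n - 1 = 10 and 90% confidence, read from a t table.
T_STAR = 1.812

# Values from the card: xbar = 11.3 g/dl, s = 1.5, n = 11.
low, high = t_interval(11.3, 1.5, 11, T_STAR)
print(round(low, 2), round(high, 2))  # roughly 10.48 to 12.12
```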

If the P-value is not small enough

If the P-value is not small enough, the data we observed are not strange at all (could plausibly have happened due to sampling variability) if the null hypothesis were true.

Statistical Estimation

If we want to estimate the population mean μ, we use the sample mean x̄. The sample mean is a random variable. We learned in Chapter 12 that each random variable has a probability model which tells us the values the random variable can take and the probability that it takes on these values. Here's an interesting thought: the probability that the sample mean is exactly equal to the population mean μ is 0. However, we expect the sample mean to be somewhere near the population mean. x̄ is an unbiased estimator of μ: we don't expect it to over- or underestimate the population mean. And, as we sample more and more individuals from the population, we expect the sample mean to get closer to the population mean; this is the law of large numbers.

assess the evidence

If, on the other hand, we want to assess the evidence provided by data about some claim concerning a population parameter, we need to conduct a hypothesis test.

Stating Hypotheses

Important note: Base your alternative hypothesis on your question of interest—do not base it on the data.

Sample Size for Confidence Intervals

In Chapter 16, we noticed that the sample size impacts the margin of error (and thus the width of the confidence interval). The margin of error is expressed as m = z* × σ/√n, where z* is the critical value for the chosen confidence level. Some algebra leads us to a formula for the minimum sample size for a particular margin of error: The z confidence interval for the mean of a Normal population will have a specified margin of error m when the sample size is n = (z* × σ / m)^2.

Sales Effectiveness

In a sales effectiveness seminar a group of sales representatives tried two approaches to selling a customer a new automobile: the aggressive approach and the passive approach. Using the complement rule, P(not Aggressive) = 1- .54 = .46 = P(Passive)

Example: Carry-on Luggage

In the carry-on luggage example from earlier, a random sample of 500,000 passengers yielded a standard deviation for the sample mean that was extremely small, resulting in |z| ≈ 9.43. Assuming now that we use a margin of error of 0.2 lbs. with 95% confidence (z = 1.96), which do you think is true? a) More than 500,000 people must be sampled to stay within our constraints. b) Fewer than 500,000 people must be sampled to stay within our constraints.

Large enough sample:

Is our sample size n at least 40? Are the observations from a population that has a Normal distribution, or one where we can apply principles from a Normal distribution? Be sure to look at the shape of the distribution and see whether any outliers are present.

Population : sample ratio:

Is the population of interest at least 20 times larger than the sample?

Inference when σ is unknown

It is unlikely that the population standard deviation σ will be known and the population mean μ will not be known. Chapter 16 taught us that x̄ is the best point estimate of µ. Similarly, s can estimate σ. In Chapter 15, when σ was known, we used z = (x̄ - μ) / (σ / √n). This statistic follows a Standard Normal Z-distribution. When σ is not known we can use s instead: t = (x̄ - μ) / (s / √n). This statistic is not quite Normal. It follows a t-distribution.

Example: Koalas

Koalas are considered to be "cuddly" creatures that weigh between 15 and 30 pounds. Suppose it is known that koalas have an average weight of 20.75 lbs. with a standard deviation of 3.05 lbs. Note: Each of the following exercises is based upon a random sample of 45 koalas.

Suppose that for a right-tailed test of significance for a population mean with unknown standard deviation the sample mean exceeds the hypothesized mean and the sample standard deviation remains fixed. What happens to the test statistic and p-values as the sample size increases?

Larger test statistic; smaller p-value.

In a survey of workers who took an introductory statistics course in college, 70% of workers reported that they use knowledge from the course in their careers, 50% wish that they had taken more statistics courses in college, and 30% use knowledge from the course in their careers and wish that they had taken more statistics courses in college.

Let U be the event that the worker uses knowledge from the course and M be the event that the worker wishes he had taken more classes. P(U or M) = P(U) + P(M) - P(U and M) = 0.70 + 0.50 - 0.30 = 0.90

Critical Values

Let's look at Table A for areas corresponding to +2.00 and -2.00. P(Z < 2.00) = .9772, P(Z < -2.00) = .0228. The difference is 95.44%. Now let's look at Table A for areas corresponding to +1.96 and -1.96. P(Z < 1.96) = .9750, P(Z < -1.96) = .0250. The difference is 95%. So we'll use 1.96 rather than 2.00. In general, for a level C confidence interval, we need to find the value z* such that we have area C between -z* and z*. These are the critical values of the standard Normal distribution. The following table gives the critical values for the most common confidence levels:
Confidence level C: 90%, 95%, 99%
Critical value z*: 1.645, 1.960, 2.576
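The table lookups above can be reproduced with `NormalDist` from the Python standard library. This is a sketch, with `z_star` as an illustrative helper name: area C in the middle leaves (1 - C)/2 in each tail, so z* is the (1 + C)/2 quantile.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def z_star(confidence: float) -> float:
    """Critical value with area `confidence` between -z* and z*."""
    return std_normal.inv_cdf((1 + confidence) / 2)

# Area between -2.00 and 2.00, and between -1.96 and 1.96.
area_2 = std_normal.cdf(2.00) - std_normal.cdf(-2.00)
area_196 = std_normal.cdf(1.96) - std_normal.cdf(-1.96)
print(round(area_2, 4), round(area_196, 4))

# Critical values for the common confidence levels.
for c in (0.90, 0.95, 0.99):
    print(c, round(z_star(c), 3))
```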

Example: Hemoglobin levels (n= 11, 20, and 33)

Let's now use technology to conduct the test of significance at α = .05 for the three different sample sizes (n =11, 20, and 33). We will particularly focus on the test statistic, p-value, and decision at α = .05.

Example: Coffee Consumption

Let's use one of the values from a) to compose the null and alternative hypotheses. Assuming a value of 20: H0: μ = 20 vs. Ha: μ ≠ 20. Suppose it is known that the standard deviation for daily coffee consumption is 9.2 oz. The average amount of coffee consumed daily for a random sample of 48 people is 26.31 oz. Is this evidence that the average amount of coffee consumed daily is different from our original estimate?
A significance test of the mean being different from our original estimate is conducted. Provide examples of α, β, and power.
α: the probability of rejecting the null when it's true. We say average coffee consumption differs from 20 oz when in fact it doesn't.
β: the probability of failing to reject H0 when the alternative is true. We fail to reject that the average coffee consumption is 20 oz when in fact it's not.
Power: the probability of rejecting H0 for a specific value of μ ≠ 20.

Example: Parental Monitoring Software

Many parents use various software and passwords to monitor the ways children use their computers. In a survey of a random sample of high school students, 16.7% (with 3.45% margin of error) expressed an ability to circumvent their parents' security efforts. Would you trust a confidence interval based upon this data? Explain. The confidence interval would be (.1325, .2015). Yes, it is from a random sample. But it is likely that some teens (out of fear of their parents finding out) are under-reporting their abilities to circumvent established security efforts. As mentioned in Chapter 8, people tend to provide conservative answers to provocative questions.

Significance from a Table (Method 1—Table C)

Method 1 - Table C: 1. Compare z with the critical values z* at the bottom of Table C. 2. If z falls between two values of z*, then the P-value falls between the two corresponding values of P in the "One-sided P" or the "Two-sided P" row of Table C.

Are the events "female" and "Republican" disjoint?

No, because a randomly selected registered voter can be both female and Republican

Margin of Error and Confidence Level

Note that the 95% confidence interval we calculated is centered at the sample mean (our estimate) and goes out about 2 standard deviations (margin of error) on either side of the sample mean. The confidence level is the success rate of the method that produces the interval. We don't know whether the 95% confidence interval from a particular sample is one of the 95% that capture μ or one of the unlucky 5% that miss. Note: Confidence is about the process, not about any one interval.

This hypothesis gives the "benefit of the doubt."

Null Hypothesis

A dentist is interested in the number of adult teeth his patients have and the length of their incisors (mm). Which is true?

Number of adult teeth is discrete, and incisor length is continuous.

Matched Pairs t Procedures

One way to demonstrate that a treatment causes an observed effect is to use a matched pairs experiment. In a matched pairs design subjects are matched in pairs and each treatment is given to one subject in each pair or observations are taken on the same subject before-and-after some treatment. To compare the responses to the two treatments in a matched pairs design, find the difference between the responses within each pair. Then apply the one-sample t procedures to these differences.

Probabilities and Two-Way Tables

Our work with Two-Way Tables is rooted in probability rules. Independence question: Is Belief in Impact due to Global Warming (in terms of increase to December 2011 and January 2012 temperatures) independent of Region?

In the Addition Rule for disjoint events, ____________________.

P(A or B) = P(A) + P(B).

If a randomly selected worker reported that he uses knowledge from the course, what is the probability that he wishes that he had taken more statistics courses in college?

P(M|U) = P(U and M)/P(U) = 0.30/0.70 = 0.4286

Example: IQ Scores

Recall that IQ scores from Chapter 14 followed a Normal distribution with σ = 15. You suspect that persons from affluent communities have IQ scores above 100. A random sample of 35 residents of an affluent community had an average IQ score of 112. Is there significant evidence to support your claim at the α = .05 level? Plan: Identify the parameter. μ = average IQ score for affluent communities. List all given information from the data collected. n = 35, x̄ = 112, σ = 15. State the null (H0) and alternative (HA) hypotheses. H0: μ = 100, HA: μ > 100. Specify the level of significance. α = .05. Determine the type of test. Right-tailed, since HA: μ > 100. Sketch the region(s) of "extremely unlikely" test statistics (rejection region(s)). Solve: Check the conditions for the test you plan to use. Random sample? Yes. Large population : sample ratio? Certainly there are more than 20*35 affluent households. Large enough sample? We were informed that the data came from a Normal distribution. Conclude: Make a decision about the null hypothesis. P-value = .000001 ≤ .05 = α, so we reject H0. Interpret the decision in the context of the original claim. There is enough evidence at the α = .05 level to conclude that persons from affluent communities have a mean IQ score above 100.
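The test statistic and P-value for the IQ example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `one_sample_z` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(x_bar, mu0, sigma, n):
    """One-sample z statistic: how many standard errors x_bar lies from mu0."""
    return (x_bar - mu0) / (sigma / sqrt(n))

z = one_sample_z(x_bar=112, mu0=100, sigma=15, n=35)
p_value = 1 - NormalDist().cdf(z)   # right-tailed test, HA: mu > 100
reject_h0 = p_value <= 0.05         # compare the P-value with alpha = .05

print(f"z = {z:.3f}, P-value = {p_value:.7f}, reject H0: {reject_h0}")
```

The P-value comes out on the order of one in a million, in line with the .000001 quoted in the example.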

Example: Adult IQ Scores

Recall that adult IQ scores follow a Normal distribution with a standard deviation of 15. A random sample of 56 adults had an average IQ of 100. Give a 99% confidence interval for the average adult IQ score. Also provide its margin of error. STATE: We would like to know the average IQ score for all adults. PLAN: We will estimate the population average IQ score for all adults by calculating a 99% confidence interval. SOLVE: 1. ✓ Do we have an SRS? We are told that this is a random sample. ✓ Is the sample from a Normal population? We are told that IQ scores follow a Normal distribution. ✓ Do we know the population standard deviation? We are told that the population standard deviation is 15. 2. Because we are using 99% confidence, we will use z* = 2.576. Our 99% confidence interval is x̄ ± z*(σ/√n) = 100 ± 2.576(15/√56) = 100 ± 5.16 = (94.84, 105.16). The margin of error is 5.16. CONCLUDE: We are 99% confident that the mean IQ score for all adults is between 94.84 and 105.16. Recall that earlier we computed a 95% confidence interval for the mean adult IQ score based upon a population standard deviation of 15 and a sample of 56. Suppose that the population standard deviation was the size of our initial value of σ. Is this margin of error the size of the earlier calculation?
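The interval above can be reproduced with a short sketch. The function name `z_interval` is hypothetical, and the critical value is taken from the standard Normal distribution rather than read from Table C, so it is 2.5758 rather than the rounded 2.576.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for a mean when sigma is known: x_bar +/- z* * sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576 for 99%
    margin = z_star * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin, margin

low, high, margin = z_interval(x_bar=100, sigma=15, n=56, confidence=0.99)
print(f"margin of error = {margin:.2f}, interval = ({low:.2f}, {high:.2f})")
```

Changing `confidence`, `sigma`, or `n` shows directly how each one widens or narrows the margin of error.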

Confidence Intervals for a Population Mean

Remember from the 68-95-99.7 rule that the 95 we used above was approximate. We do not need to go a full 2 standard deviations away from the mean to be 95% confident. In fact, we need only go 1.96 standard deviations for 95% confidence. Using the critical values, we can calculate a confidence interval for any confidence level: Draw an SRS of size n from a Normal population having unknown mean μ and known standard deviation σ. A level C confidence interval for μ is x̄ ± z*(σ/√n), where z* is the critical value for confidence level C.

The broader method of calculating confidence intervals involves the Four-Step Process (page 379): "S.P.S.C."

STATE: What is the practical question that requires estimating a parameter? PLAN: Identify the parameter, choose a level of confidence, and select the type of confidence interval that fits your situation. SOLVE: Carry out the work in two phases: a. Check the conditions for the interval you plan to use. b. Calculate the confidence interval. CONCLUDE: Return to the practical question to describe your results in this setting.

This type of error is reflected in the margin of error.

Sampling

Steps for Success- Conducting Tests of Significance

Set up your Hypotheses. Check your Conditions. Compute the Test Statistic. Compute the P-Value. Make a Decision.

Choosing a Confidence Level

Just as you'd like to get the best grades possible (usually in the 90s), in statistics we aspire to be as confident as possible. Confidence levels of 95%, 90%, and 99% are quite common. A confidence level of 68% is not common.

Example: Smoking

Smoking has been a "hot" topic in recent months. Twenty percent of people (in a city with a population of about 1 million) are smokers. Two people, Megan and Zach, are selected at random. Confirm, mathematically and practically, that Megan's decision to smoke is independent of Zach's. Mathematically, P(SM and SZ) = P(SM)*P(SZ|SM) by the general multiplication rule. Here P(SM and SZ) = .04 = .20*.20 = P(SM)*P(SZ), which is the multiplication rule for independent events. Practically, Zach's decision to smoke is not influenced by a randomly selected person from the same city (Megan, in this case). Therefore, it makes sense that P(SZ|SM) = P(SZ).

Example: Music relaxation therapy

Some researchers claim that music relaxes students and reduces stress while studying. 12 students were selected at random. Their initial resting pulse rate (beats/minute) was obtained, and each person participated in a month-long music-listening, relaxation therapy program. A final resting pulse rate was taken at the end of the experiment. The data are given below. Is there any evidence that music reduced the mean pulse rate, and consequently, reduced stress? Assume the underlying distributions are Normal and use a 0.025 level of significance. Music "reduces" pulse rate implies: Change in Pulse Rate = Difference in Pulse Rates = Final Pulse Rate - Initial Pulse Rate. All three of these expressions are equivalent: Diff < 0 (the pulse rate has "reduced"), Final Pulse Rate - Initial Pulse Rate < 0, and Final Pulse Rate < Initial Pulse Rate. Defining the difference as Final - Initial is preferred since it directly aligns with the hypothesis presented. Compute the sample statistics for the differences (Final - Initial) and check the distribution for our assumptions. Plan: Identify the parameter. μDiff = mean difference between final and initial pulse rates. List all given information from the data collected. n = 12, x̄ = -6.33, s = 8.732. State the null (H0) and alternative (HA) hypotheses. H0: μDiff = 0, HA: μDiff < 0. Specify the level of significance. α = .025. Determine the type of test. Left-tailed, since HA: μDiff < 0. Sketch the region(s) of "extremely unlikely" test statistics. Technology output for the hypothesis test and confidence interval:
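A minimal sketch of the matched pairs t test for this example follows. The twelve differences (Final - Initial) are the ones listed in the technology tips for this data set. The critical value 2.201 is the Table C entry for df = 11 with one-sided tail area .025; it is hard-coded here because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Differences in pulse rate (Final - Initial) for the 12 students
diffs = [-6, 1, 3, -7, -12, -14, 3, -9, -11, -24, -7, 7]

n = len(diffs)
d_bar = mean(diffs)               # sample mean of the differences
s = stdev(diffs)                  # sample standard deviation of the differences
t = d_bar / (s / sqrt(n))         # one-sample t statistic for H0: mu_diff = 0

T_CRIT = 2.201                    # Table C: df = 11, one-sided tail area .025 (looked up, not computed)
reject_h0 = t < -T_CRIT           # left-tailed test: reject when t falls below -t*

print(f"d_bar = {d_bar:.2f}, s = {s:.3f}, t = {t:.3f}, reject H0: {reject_h0}")
```

The printed mean and standard deviation should agree with the n = 12 summary statistics listed in the Plan step.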

Steps for Success-Finding Normal Probabilities

State the Problem. Draw a Picture. Compute Z. Use Table A. Answer the question.

Tests for a Population Mean: 4-Step Process

State: What is the practical question that requires a statistical test? Plan: Identify the parameter. List all given information from the data collected. State the null (H0) and alternative (HA) hypotheses. Specify the level of significance. Determine the type of test (Left-tailed, Right-tailed, Two-Tailed). Sketch the region(s) of "extremely unlikely" test statistics (rejection region(s)). (Note that the items in bold were mentioned in the previous exercise.) Solve: Check the conditions for the test you plan to use. (Random sample? Large population : sample ratio? Large enough sample?) Calculate the test statistic. Determine (or estimate) the P-value. Conclude: Make a decision about the null hypothesis (Reject H0 or Fail to reject H0). Interpret the decision in the context of the original claim (i.e., "There is enough (or not enough) evidence at the α level of significance that ...").

Statistical inference

Statistical inference provides methods for drawing conclusions about a population from sample data. If we want to estimate the population mean, it is logical to use the sample mean as our basis. But, the sample mean might not be exactly equal to the population mean. So, we'll need some "room for error."

The Sampling Distribution of the Sample Mean

Suppose that x̄ is the mean of an SRS of size n drawn from a large population with mean μ and standard deviation σ. Then the sampling distribution of x̄ has mean μ and standard deviation σ/√n. What if the population distribution is Normal? In this case, we can work with the Normal distribution like we did in Chapter 3, provided that we use the appropriate standard deviation. Recall that a general way to think about a z-score is: z = (value − mean)/(standard deviation). This tells us the number of standard deviations our value is away from the mean. If individual observations have the N(μ, σ) distribution, then probability theory tells us that the sample mean x̄ of an SRS of size n has the N(μ, σ/√n) distribution. So, if we have an SRS of size n from a Normal distribution with mean μ and standard deviation σ, we can standardize our value of interest and use Table A to find probabilities. In this case, z = (x̄ − μ)/(σ/√n).

Technology Tips—Computing Normal Probabilities for Sampling Distributions

TI-83/84: 2nd >> Vars >> normalcdf( >> Enter. Enter (Lower Bound, Upper Bound, Population Mean, σ/√n). Note: 1. You must enter commas between each argument. 2. Use -1E99 if there is no Lower Bound and 1E99 if there is no Upper Bound. 3. Remember that we are working with the sampling distribution of x̄, so the lower and upper bounds will be in terms of sample means. JMP: Enter 1 into the top row of Column 1. Right-click on Column 1. Select Formula. Under the Functions (grouped) menu, select Probability >> Normal Distribution. Click the ^ (caret) symbol. Click the ^ (caret) symbol a 2nd time. In the fields provided, enter the value of x, the mean, and σ/√n. Click OK. Note: 1. Upper tail probabilities require that "1 -" is entered after selecting Formula in the first step. 2. Computation of area between two values requires: a. that all of the steps (except the last one) are entered for the larger value. b. that you enter " - ". c. that all of the steps (including the last one) are entered for the smaller value.

Technology Tips—Generating Random Numbers

TI-83/84: Apps >> Prob Sim (press any key) >> Toss Coins. Click Toss. Select +1 (this is the Window key) for your 2nd trial. Repeat selecting +1 8 more times to obtain your first 10 trials. Now select +50 (the Trace key). Click on the Right Arrow key (to reveal the number of Tails). Click on the Right Arrow key again (to reveal the number of Heads). Repeat the last 3 steps to obtain the results for trials 61 - 110. JMP: Enter 1 into the top row of Column 1. Right-click on Column 1. Select Formula. Under the Functions (grouped) menu, select Random >> Random Binomial. In the fields provided, enter the sample size (10 for the first time, then 50) and the probability (.50). Click Apply. This returns the number of heads for the first 10 flips. Repeat the process with Column 2. Use a sample size of 50 instead of 10. Then once more with a sample size of 50.

Constructing a 95% confidence interval for the average difference in initial and final pulse rates.

TI-83/84 (from the data): Enter the differences { -6, 1, 3, -7, -12, -14, 3, -9, -11, -24, -7, 7} in L1. STAT >> TESTS >> TInterval >> Enter. Note: Select Data when x̄ and s are not provided. Then enter the list where the data are stored. Inpt >> DATA >> List: L1 >> Freq: 1 >> C-Level: 95 >> Calculate (ENTER). Result: (-11.88, -0.7855). TI-83/84 (from summary statistics): STAT >> TESTS >> TInterval >> Enter. Inpt >> STATS >> x̄: -6.33 >> s: 8.732 >> n: 12 (these should populate automatically) >> C-Level: 95 >> Calculate (ENTER). Result: (-11.88, -0.7855).
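The same interval can be sketched outside the calculator. The function name `t_interval` is hypothetical, and the critical value t* = 2.201 (df = 11, 95% confidence) is read from Table C rather than computed, so the endpoints may differ from the calculator output in the last digit.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(data, t_star):
    """One-sample t interval: x_bar +/- t* * s / sqrt(n), with t* supplied from a table."""
    n = len(data)
    x_bar = mean(data)
    margin = t_star * stdev(data) / sqrt(n)
    return x_bar - margin, x_bar + margin

diffs = [-6, 1, 3, -7, -12, -14, 3, -9, -11, -24, -7, 7]
low, high = t_interval(diffs, t_star=2.201)   # Table C value for df = 11, 95% confidence

print(f"95% CI for the mean difference: ({low:.2f}, {high:.4f})")
```

Because the whole interval lies below 0, it points the same way as the left-tailed test: the mean final pulse rate appears lower than the mean initial pulse rate.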

Technology Tips - Computing 90% Confidence Intervals (σ unknown)

TI-83/84: STAT >> TESTS >> TInterval >> Enter. Note: Select Data when x̄ and s are not provided. Then enter the list where the data are stored. (For this example) Inpt >> STATS >> x̄: 11.3 >> s: 1.5 >> n: 33 >> C-Level: 90 >> Calculate (ENTER). Result: (10.858, 11.742).

Technology Tips - Conducting Tests of Significance (σ unknown)

TI-83/84: STAT >> TESTS >> TTest >> Enter. Select Stats. Enter μ0, x̄, s, and n. Select Calculate. (Note: Select Data when the summary statistics are not provided. Then enter the list where the data are stored.) With n = 11: Inpt >> STATS >> μ0: 12 >> x̄: 11.3 >> s: 1.5 >> n: 11 >> μ: < μ0 >> Calculate (ENTER). t = -1.548, p = .076. Fail to reject H0, since .076 > .05 (p-value > α). There is not enough evidence (at α = .05) to conclude that boys from this country are typically anemic. With n = 20: Inpt >> STATS >> μ0: 12 >> x̄: 11.3 >> s: 1.5 >> n: 20 >> μ: < μ0 >> Calculate (ENTER). t = -2.087, p = .025. Reject H0, since .025 < .05 (p-value < α). There is enough evidence (at α = .05) to conclude that boys from this country are typically anemic. With n = 33: Inpt >> STATS >> μ0: 12 >> x̄: 11.3 >> s: 1.5 >> n: 33 >> μ: < μ0 >> Calculate (ENTER). t = -2.68, p = .006. Reject H0, since .006 < .05 (p-value < α). There is enough evidence (at α = .05) to conclude that boys from this country are typically anemic.
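The three calculator runs differ only in the sample size, so the effect of n on the test statistic can be seen in a short loop. This sketch computes only the t statistic; the P-values quoted above come from the calculator, since the Python standard library has no t distribution. The helper name `one_sample_t` is made up for illustration.

```python
from math import sqrt

def one_sample_t(x_bar, mu0, s, n):
    """One-sample t statistic: (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / sqrt(n))

# Same sample mean and standard deviation, three different sample sizes
for n in (11, 20, 33):
    t = one_sample_t(x_bar=11.3, mu0=12, s=1.5, n=n)
    print(f"n = {n:2d}: t = {t:.3f}")
```

With x̄ and s held fixed, a larger n shrinks the standard error s/√n, which pushes t further from 0 and makes the same observed difference more significant.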

Technology Tips - Conducting Tests of Significance (σ unknown)

TI-83/84: STAT >> TESTS >> TTest >> Enter. Select Stats. Enter μ0, s, x̄, n, and the confidence level. Select Calculate. (Note: Select Data when the summary statistics are not provided. Then enter the list where the data are stored.) JMP: Enter the data. Analyze >> Distribution. "Click-and-Drag" the appropriate variable into the 'Y, Columns' box. Click on OK. Click on the red upside-down triangle next to the title of the variable from the 'Y, Columns' box. Proceed to 'Confidence Interval' >> Select the appropriate confidence level.

Randomness and Probability

Technically, physically tossing a coin or rolling a die is predictable. If, theoretically, the same force, angles, etc. are applied, the results should remain the same. But people usually apply differing amounts of these variables, resulting in what appears to be a random process. Using technology simulations (via graphing calculators or JMP) restores the randomness to the process.

Is the alternative hypothesis in our situation one-sided or two-sided?

Testing a "difference" implies that we are interested in whether the mean daily coffee consumption amount is either greater than or less than our estimate. Hence, we are working with a two-sided test.

Tests of Significance & the Justice System

Each part of a test of significance has a counterpart in the justice system. Null Hypothesis: The defendant gets the "benefit of the doubt" and begins with the "not guilty" assumption. Alternative Hypothesis: The defendant is "guilty." Test Statistic: Totality of evidence collected. P-value: The probability of observing data as extreme as what was collected under the assumption that the defendant is, indeed, "not guilty." Decision when the evidence collected seems "likely" (based upon the null hypothesis): The jury rules that the defendant is "not guilty." Decision when the evidence collected seems "extremely unlikely" (based upon the null hypothesis): Either we have "bad" data (mistrial, tampering, etc.) or the jury rules that the defendant is "guilty." Note: Our jury system assumes innocent until proven guilty. The actual truth of whether the person did indeed commit the crime may never be known.

Tests for a Population Mean

Tests of significance allow researchers to determine the validity of certain hypotheses based upon P-values. There are various parameters that we can test (proportions, standard deviations, etc.). We will begin with the most common parameter to be tested, the mean, much like how we began our confidence interval discussion by estimating the true mean, µ. Draw an SRS of size n from a large population that has the Normal distribution with mean μ and standard deviation σ. The one-sample z statistic has the standard Normal (z) distribution. To test the hypothesis H0: μ = μ0, compute the one-sample z statistic z = (x̄ − μ0)/(σ/√n).

Example: Textbooks

Textbook editors must estimate the sales of new (first-edition) books. The records of one major publishing company indicate that 10% of all new books sell more than projected, 30% sell close to projected, and 60% sell less than projected. Of those that sell more than projected, 70% are revised for a second edition; of those that sell close to projected, 50% are revised for a 2nd edition; of those that sell less than projected, 20% are revised for a 2nd edition. What percent of books are revised for a second edition? Let M = textbooks that sell More than projected. Let C = textbooks that sell Close to projections. Let L = textbooks that sell Less than projected.
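One way to answer the question is to follow each branch of a tree diagram and add the results: P(revised) = P(M)P(revised|M) + P(C)P(revised|C) + P(L)P(revised|L). The sketch below does that arithmetic; the dictionary names are made up for illustration.

```python
# Probability of each sales outcome (first stage of the tree)
p_sales = {"M": 0.10, "C": 0.30, "L": 0.60}

# Probability a book is revised, given its sales outcome (second stage)
p_revised_given = {"M": 0.70, "C": 0.50, "L": 0.20}

# Multiply along each branch, then add the branches that end in "revised"
p_revised = sum(p_sales[k] * p_revised_given[k] for k in p_sales)

print(f"P(revised) = {p_revised:.2f}")
```

The three branch products are .07, .15, and .12, so 34% of new books are revised for a second edition.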

Conditions for Inference about a Mean

The conditions for inference about a mean are listed on page 456 of the text. Random sample: Do we have a random sample? If not, is the sample representative of the population? If not a representative sample, was it a randomized experiment? Large enough population : sample ratio: Is the population of interest ≥ 20 times 'n'? The population is from a Normal Distribution. If the population is not from a Normal Distribution, then the sample size must be "large enough" with a shape similar to the Normal Distribution; then we apply the Central Limit Theorem.

Accuracy and Precision of Confidence Intervals

The confidence interval for the population mean, μ, with known standard deviation is of the form x̄ ± z*(σ/√n), with margin of error equaling z*(σ/√n). A high confidence level indicates that our methods commonly yield correct answers. Smaller margins of error reflect increased precision in estimating the true value of the parameter.

Significance from a Table

The graphing calculator and JMP provide the most accurate P-value calculations. Tables can also be used to estimate P-values. There are two methods of determining the P-Value for a z-statistic.

Sampling Distributions

The law of large numbers says, "sample enough individuals and the statistic x̄ will approach the unknown parameter μ." Typically we take just one sample and then generalize to the population as a whole. Before we do that, we need to understand how x̄ behaves. This can be done through simulation (see Example 15.4 on page 347). The population distribution is about the individuals in the population, while the sampling distribution is about the values of the statistic calculated from the samples.

Cautions about Confidence Intervals

The margin of error covers only sampling errors. Undercoverage, nonresponse, or other biases are not reflected in margins of error. The source of the data is of utmost importance. Consider the details of a study before completely trusting a confidence interval.

What is the probability that the mean weekly income for the sample of farm workers is between $470 and $540?

The sample mean is approximately Normal with mean $500 (N.Z.) and standard deviation 160/√80. z = (x̄ − µ)/(σ/√n) = (470 − 500)/(160/√80) = −1.68 and z = (540 − 500)/(160/√80) = 2.24. The area to the left of z = -1.68 is 0.0465, and the area to the left of z = 2.24 is 0.9875. The probability that the mean weekly income for the sample of farm workers is between $470 and $540 is 0.9875 - 0.0465 = 0.9410.
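The same probability can be checked without Table A. This sketch uses `statistics.NormalDist` from the Python standard library and works with unrounded z-scores, so the answer differs slightly from the table-based 0.9410.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 500, 160, 80

# Sampling distribution of the sample mean: N(mu, sigma / sqrt(n))
x_bar_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))

# P(470 < x_bar < 540) = area to the left of 540 minus area to the left of 470
prob = x_bar_dist.cdf(540) - x_bar_dist.cdf(470)

print(f"P(470 < x_bar < 540) = {prob:.4f}")
```

The small gap between this result and 0.9410 comes from rounding the z-scores to two decimals before using the table.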

sample space

The sample space S of a random phenomenon is the set of all possible outcomes.

You sample 25 students. What is the sampling distribution of their average score, x̄?

The sampling distribution is Normal with mean µ = 25 and standard deviation σ/√n = 6.4/√25 = 1.28.

sampling distribution

The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

A survey asked "About how many hours per week do you spend sending and answering e-mail?" The time spent sending and answering e-mail for a random sample of 450 American adults had mean 6.03 hours and standard deviation 9.13 hours. Find a 99% confidence interval for the population mean number of hours per week American adults spend sending and answering e-mail. Assume that the simple conditions apply and interpret your interval in context.

The simple conditions apply, but here they are again to be clear. ✓ Random sample: We are told that this is a random sample. ✓ Small enough sample: There are more than 20(450) = 9000 American adults. ✓ Large enough sample: Because n = 450 > 40, we have a large enough sample. df = 450 - 1 = 449. Since df = 449 is not listed in Table C, we will use df = 100. Because we want 99% confidence, t* = 2.626 (df = 100). x̄ ± t*(s/√n) = 6.03 ± 2.626(9.13/√450) = 6.03 ± 1.13 = 4.90 to 7.16. We are 99% confident that the population mean number of hours per week American adults spend sending and answering e-mail is between 4.90 and 7.16 hours.

Why does the probability of the event that we are interested in decrease as the sample size increases?

The standard deviations for the sample mean decrease, and we're further in the tail of the distributions. It is becoming decreasingly likely to find a mean height above 67" as the sample size increases.

Which situation involving two independent samples of data elicits the use of the t-distribution?

The sum of the sample sizes is at least 40 and there is clear evidence of skewness and outliers.

Sample size is at least 40

The t procedures can be used even for clearly skewed distributions.

Sample size between 15 and 40

The t procedures can be used except in the presence of outliers or strong skewness.

Sample size less than 15

The t procedures can be used if the data are close to Normal (roughly symmetric, single peak, no outliers). If there is clear skewness or outliers, do not use t.

Example: P(Z< -1.45) via Table C

The z-statistic for a left-tailed test is z= -1.45. How significant is this result? Compute P( Z < -1.45) from Table A. P(Z < -1.45) = .0735. Notice that this coincides with the answer from Table C. Table C will allow us to work with distributions where the population standard deviation is unknown (this will be covered in Chapter 20).

Jill attends 20% of the philanthropic activities for her sorority. Dinah attends about 15% of them. Jill's behavior is independent of Dinah's (and vice-versa). Both women are in the same sorority. We can conclude

There is a 3% chance that both will attend an upcoming philanthropic activity. The chance that Jill attends given that Dinah is attending is 0.20. P(Dinah|Jill) = 0.15.

Margin of Error

This "room for error/margin of error" is a function of the standard deviation (σ). But, it must also reflect our knowledge of probability theory (let's say a constant c), and the sample size (n). The "margin of error" for the population mean will contain "c", σ, and n.

Which of these is not a type of test?

Three-tailed

Tree Diagrams

Tree diagrams can be helpful when we have several stages of a probability model. The graph begins with line segments (branches) that correspond to probabilities for specific disjoint events. Subsequent sets of branches represent probabilities at each stage conditional on the outcomes of earlier stages. Take a look at Example 13.10 on page 314 of the text to see how to work with tree diagrams.

A confidence level is essentially the success rate of the method that produces the confidence interval.

True

Matched pairs is based upon dependent samples.

True

Suppose a random sample of ticket prices for concerts by the Rolling Stones was obtained. For comparison purposes, another random sample of Coldplay ticket prices was obtained. True or False: The two groups of data are independent samples.

True

When the p-value is less than alpha, we should reject the null hypothesis.

True

Independence

Two events A and B are independent if knowing that one occurs does not change the probability that the other occurs. Thus, if A and B are independent, P(A and B) = P(A)*P(B)

Definition: Independence

Two events A and B that both have positive probability are independent if the fact that A has occurred does not impact B's probability of occurrence.

Addition Rule for disjoint events.

Two events are disjoint if they have no outcomes in common. In these cases P(A or B) = P(A) + P(B).

The General Addition Rule

Two-Way tables are helpful ways to picture two events. Venn diagrams are an alternative means of displaying multiple events. Both can be used to answer many questions involving probabilities. We just used the general addition rule: For any two events A and B, P(A or B) = P(A) + P(B) - P(A and B). Question: Where did we see this concept previously?
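The general addition rule can be checked on a small sample space. The example below is hypothetical (one roll of a fair six-sided die, with events chosen only for illustration) and uses exact fractions so that both sides of the rule can be compared directly.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}   # roll is even
B = {5, 6}      # roll is greater than 4
C = {1, 3}      # disjoint from A: no outcomes in common

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
general_lhs = prob(A | B)
general_rhs = prob(A) + prob(B) - prob(A & B)

# For disjoint events P(A and C) = 0, so the rule reduces to P(A) + P(C)
disjoint_lhs = prob(A | C)
disjoint_rhs = prob(A) + prob(C)

print(general_lhs, general_rhs, disjoint_lhs, disjoint_rhs)
```

Subtracting P(A and B) removes the outcome 6, which would otherwise be counted twice, once in A and once in B.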

When considering the Justice System example associated with significance testing if the jury renders a "Not Guilty" verdict when, in-fact, the defendant was actually not innocent, then this would be a_________.

Type II Error

Which of the following is not a simple condition associated with statistical inference?

Verifying that the population mean, μ, is known.

Sample Size affects Statistical Significance

Very large samples can yield small p-values that lead to rejection of the null hypothesis. Phenomena that are "statistically significant" are not always "practically significant."

Which categories of volunteers are disjoint?

Volunteers that are Former smokers and the Current smokers.

The Reasoning of Tests of Significance

We are now inquiring about the behavior of an event if a phenomenon were repeated numerous times. We will begin by working with simple random samples of data from Normal populations with known standard deviations. Situation: People drink coffee for a variety of professional, and now, social reasons. Coffee used to merely be a beverage option on the menu. Now, it is the main attraction for a growing number of restaurants and shoppes. The standard "cup of coffee" is 8 oz. However, even a Tall at Starbucks is 12 oz. Please answer the following questions: How many ounces of coffee do you think regular coffee drinkers consume daily? __ How many ounces of coffee do you drink daily? __

We call a phenomenon random if

We call a phenomenon random if individual outcomes are uncertain but there is nonetheless a regular distribution of outcomes in a large number of repetitions.

Key Terminology—Randomness and Probability

We call a phenomenon random if individual outcomes are uncertain but there is nonetheless a regular distribution of outcomes in a large number of repetitions. The probability of any outcome of a random phenomenon is the proportion of times the outcome would occur in a very long series of repetitions.

A random sample of 50 records yields a 95% confidence interval of 21.5 to 23.0 years for the mean age at first marriage of women in a certain county. Which of the following is correct?

We can be 95% confident that the population mean age at first marriage of women in the county is between 21.5 and 23.0 years.

The Idea of Probability

We can trust random samples and randomized comparative experiments because of chance behavior—chance behavior is unpredictable in the short run but has a regular and predictable pattern in the long run.

Null and Alternative Hypotheses

We have two possible hypotheses about this situation: 1. The true mean amount of coffee consumed daily is the value listed in a). 2. The true mean amount of coffee consumed daily differs from the value listed in a). These hypotheses have names: The null hypothesis (denoted H0) is the claim tested about the population parameter. The test is designed to assess the strength of the evidence against the null hypothesis. Usually the null hypothesis is a statement of "no effect" or "no difference." It commonly assumes the "benefit of the doubt." The alternative hypothesis (denoted Ha) is the claim about the population parameter that we are trying to find evidence for.

Suppose you decide to bet on red on each of 10 consecutive spins of the roulette wheel. Suppose you lose all of the first 5 wagers. Which of the following is true?

What happened on the first 5 spins tells us nothing about what will happen on the next 5 spins

The Addition Rule for Disjoint Events

What if there is no overlap between the events A and B? Events A and B are disjoint if they have no outcomes in common. Question: What is P(A or B) when A and B are disjoint? Because the events have no outcomes in common, P(A and B) = 0. Starting from the General Addition Rule, P(A or B) = P(A) + P(B) - P(A and B) = P(A) + P(B) - 0. So for disjoint events, P(A or B) = P(A) + P(B).
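The rule can be checked on a small case. The sketch below assumes one roll of a fair six-sided die; the events A = {1, 2} and B = {5, 6} are hypothetical and are not part of the original card.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {1, 2}  # hypothetical event
B = {5, 6}  # hypothetical event with no outcomes in common with A

p_union_general = prob(A) + prob(B) - prob(A & B)  # General Addition Rule
p_union_disjoint = prob(A) + prob(B)               # shortcut for disjoint events

print(prob(A & B), p_union_general, p_union_disjoint)
```

Because A and B share no outcomes, the subtracted term is zero and both expressions give the same probability.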

The Central Limit Theorem

What if we do not know that the population distribution is Normal? We're in luck. To quote the authors, "it is a remarkable fact that as the sample size increases, the distribution of x̄ changes shape: it looks less like that of the population and more like a Normal distribution." So: draw an SRS of size n from any population with mean μ and finite standard deviation σ. The Central Limit Theorem (CLT) says that when n is large, the sampling distribution of the sample mean x̄ is approximately Normal: x̄ is approximately N(μ, σ/√n). The central limit theorem allows us to use Normal probability calculations to answer questions about sample means from many observations even when the population distribution is not Normal.
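A short simulation can make the theorem concrete. This is only a sketch under assumed settings: the right-skewed exponential population (mean 1, standard deviation 1), the sample size n = 40, the number of repetitions, and the seed are all illustrative choices, not values from the card.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

n = 40       # sample size (assumed)
reps = 5000  # number of simulated samples (assumed)

# Exponential population with rate 1: mean 1, standard deviation 1, skewed right.
sample_means = []
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
clt_sd = 1.0 / n ** 0.5  # sigma / sqrt(n) predicted by the CLT

# Share of sample means within one standard deviation of their mean;
# for a Normal distribution this share is about 0.68.
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in sample_means
) / reps

print(round(mean_of_means, 3), round(sd_of_means, 3), round(clt_sd, 3))
print(round(within_one_sd, 3))
```

Even though the population is strongly skewed, the simulated sample means center near μ, spread out by roughly σ/√n, and follow the Normal 68% pattern closely.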

Predicting Chance Behavior

What should happen to the long-run proportion of heads as we take more flips? Probability theory informs us that the sample proportion of heads should 'converge' to the population proportion (50%).
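The convergence can be watched in a simulation. The sketch below assumes a fair coin; the flip counts and the seed are illustrative.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def proportion_heads(flips):
    """Simulate `flips` tosses of a fair coin and return the share of heads."""
    heads = sum(random.random() < 0.5 for _ in range(flips))
    return heads / flips

for flips in (10, 100, 10_000, 100_000):
    print(flips, proportion_heads(flips))
```

Short runs can land far from 0.5, while the longer runs settle close to it.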

Cautions about Significance Tests

When H0 represents an assumption that is widely believed, small p-values are needed. Be careful, though, about conducting multiple analyses for a fixed α. It is preferred to just run a single test and reach a decision. When there are strong consequences of rejecting H0 in favor of Ha, we need strong evidence. Either way, strong evidence for rejecting H0 requires small p-values. Depending on the situation, p-values that are below 10% can lead to rejecting H0. Unless stated otherwise, researchers assume the de facto significance level of 5%. When the data fall in the direction specified by Ha, the p-value for a one-sided test is half of the p-value for the two-sided test of the same null hypothesis and the same data. The two-sided case combines two equal areas. The one-sided case has one of those areas PLUS the inherent supposition by the researcher of the direction of the possible deviation from H0. Be advised that it is better to design a single study and conduct one test of significance (yielding one conclusion) than to design one study and perform multiple analyses until a desired result is achieved.
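The relationship between one-sided and two-sided p-values can be sketched for a z test. The test statistic z = 1.96 is a hypothetical value, and the one-sided alternative is assumed to be "greater than," so the data fall in the direction of Ha.

```python
from statistics import NormalDist

def z_p_values(z):
    """Return (upper-tail p-value, two-sided p-value) for a z test statistic."""
    std_normal = NormalDist()  # standard Normal: mean 0, standard deviation 1
    upper_tail = 1 - std_normal.cdf(z)            # Ha: mu > mu0
    two_sided = 2 * (1 - std_normal.cdf(abs(z)))  # Ha: mu != mu0
    return upper_tail, two_sided

one_sided, two_sided = z_p_values(1.96)  # hypothetical test statistic
print(round(one_sided, 4), round(two_sided, 4))
```

For a positive z the upper-tail p-value is exactly half of the two-sided one. If z were negative while Ha still said "greater than," the one-sided p-value would be above 0.5 and the halving would not apply.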

Conditional Probability

When P(A) > 0, the conditional probability of B given A is P(B | A) = P(A and B) / P(A). Note: Wording for conditional probabilities can often be subtle, so be sure to read carefully. Occasionally you will come across problems that embed conditional probabilities into the question. Consider the following example: Calculate the probability that a randomly selected female is right-hand dominant. P(right-hand dominant | female) = 0.8947. The | means "given." The event behind the | is the conditioning event. The idea of a conditional probability P(B | A) of one event B given that another event A occurs is the proportion of all occurrences of A for which B also occurs.
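The definition can be applied to a two-way table of counts. The counts below are entirely hypothetical; they are not the data behind the 0.8947 in the card and are chosen only to show the mechanics.

```python
from fractions import Fraction

# Hypothetical two-way table of counts (sex by dominant hand).
counts = {
    ("female", "right"): 40,
    ("female", "left"): 10,
    ("male", "right"): 35,
    ("male", "left"): 15,
}
total = sum(counts.values())

# P(A): probability that a randomly selected person is female.
p_female = Fraction(
    sum(v for (sex, _), v in counts.items() if sex == "female"), total
)
# P(A and B): probability of being female and right-hand dominant.
p_female_and_right = Fraction(counts[("female", "right")], total)

# P(B | A) = P(A and B) / P(A)
p_right_given_female = p_female_and_right / p_female
print(p_right_given_female)
```

The result equals the share of right-hand-dominant people among the females only, which is the "proportion of all occurrences of A for which B also occurs."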

Standard Error

When the standard deviation of a statistic is estimated from data, the result is called the standard error of the statistic. The standard error of the sample mean is s/√n. Now the sample standard deviation s will replace σ, allowing us to use the one-sample t statistic for confidence intervals and tests of significance. As mentioned earlier, the t-distribution is "not quite Normal."
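As a quick illustration, the formula can be applied to the sleep example earlier in this set (n = 49 students, s = 1.74 hours). The function name `standard_error` is just a label chosen for this sketch.

```python
import math

def standard_error(s, n):
    """Standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

# Values from the sleep example: 49 students, sample standard deviation 1.74.
se_sleep = standard_error(1.74, 49)
print(round(se_sleep, 4))
```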

A company has 36 cars of this model in its fleet. What is the probability that the average NOX level of these cars is above 0.055 g/mi?

With n = 36, the sampling distribution of the average x̄ is now Normal with mean μ = 0.05 g/mi and standard deviation σ/√n = 0.01/√36 = 0.00167. So, P(x̄ > 0.055) = P((x̄ − 0.05)/0.00167 > (0.055 − 0.05)/0.00167) = P(Z > 2.99) = 1 − 0.9986 = 0.0014.
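The same calculation can be reproduced without a table. This sketch takes μ = 0.05 and σ = 0.01 from the worked answer above; because it does not round the standard deviation to 0.00167, it gets z = 3.00 rather than 2.99, and the probability still rounds to 0.0014.

```python
import math
from statistics import NormalDist

mu, sigma, n = 0.05, 0.01, 36    # values from the worked answer
sd_mean = sigma / math.sqrt(n)   # standard deviation of the sample mean

sampling_dist = NormalDist(mu, sd_mean)
z = (0.055 - mu) / sd_mean            # standardized cutoff
p_above = 1 - sampling_dist.cdf(0.055)  # P(sample mean > 0.055)

print(round(sd_mean, 5), round(z, 2), round(p_above, 4))
```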

Now suppose a 95% confidence interval was desired for μ. Identify a value of n that would produce a narrower interval.

n = 100 (or any n > 52)

sample space S

A probability model consists of a sample space S (the set of all possible outcomes) and a way of assigning probabilities to events.

averages

averages are less variable than individual observations.

As the sample size increases, the confidence interval ___________ .

becomes narrower.

When determining the minimum sample size for a specified margin of error, ___________ the standard deviation will increase the sample size

increasing

s

sample standard deviation

Each student in a class logs the amount of time they spend on Facebook daily for one week. The students compute their sample averages. Their Teaching Assistant (TA) records the weekly averages in one column of JMP. The TA also records all of the daily times for all students in another column. The weekly averages will have a range that is:

smaller than that of the daily times.

If we want to estimate a population parameter, then we should use

statistics to create a confidence interval.

strong evidence

strong evidence of rejecting H0 requires small p-values.

matched pairs design

subjects are matched in pairs and each treatment is given to one subject in each pair or observations are taken on the same subject before-and-after some treatment.

Suppose that, within the family of t-distributions, t2 has 2 degrees of freedom (df) and t9 has 9 degrees of freedom. What would you anticipate happening to tdf as the df increase?

tdf will approach Z

critical values decrease

the degrees of freedom (df) increase.

Effect size

the departure from a null hypothesis that results in practical significance.

Type I Error

rejecting H0 when it is actually true. The maximum allowable probability of this "error" is the significance level, α.

Type II Error

not rejecting H0 when it should have been rejected (that is, when H0 is false).

Power

the probability that the test will reject H0 when the alternative value of the parameter is true. Note: Increasing the sample size increases the power of a significance test.
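The effect of sample size on power can be sketched for a one-sided z test with known σ. Everything in this example is an assumption for illustration: the function name, the hypotheses H0: μ = 40 versus Ha: μ > 40, the alternative value 44, σ = 20, α = 0.05, and the sample sizes.

```python
import math
from statistics import NormalDist

def power_upper_z_test(mu0, mu_alt, sigma, n, alpha=0.05):
    """Power of the one-sided z test of H0: mu = mu0 vs Ha: mu > mu0
    when the true mean is mu_alt and sigma is known."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)            # rejection cutoff for z
    shift = (mu_alt - mu0) * math.sqrt(n) / sigma     # true mean of z under mu_alt
    return 1 - std_normal.cdf(z_crit - shift)         # P(reject H0 | mu = mu_alt)

# Hypothetical setting: mu0 = 40, true mean 44, sigma = 20.
powers = {n: power_upper_z_test(40, 44, 20, n) for n in (25, 100, 400)}
for n, power in powers.items():
    print(n, round(power, 3))
```

With the alternative held fixed, each increase in n raises the probability of rejecting H0.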

Confidence intervals based upon "t" will be slightly

wider than those based upon "z."

For the population of farm workers in New Zealand, suppose that weekly income has a distribution that is skewed to the right with mean $500 (N.Z.) and standard deviation $160 (N.Z.). A random sample of 80 farm workers is selected. What information in the statement of the question allows us to use a Normal distribution to do calculations about the mean weekly income for farm workers in New Zealand?

✓ Random sample? We are told that we have a random sample. ✓ Small enough? The population of farm workers is more than (20)(80) = 1600. ✓ Large enough? n = 80 is large enough (more than 40). The Central Limit Theorem will apply.

The General Social Survey (GSS) is a survey of opinions and lifestyles of U.S. adults, conducted by the National Opinion Research Center at the University of Chicago. The samples that are used in the GSS are not random, but they are representative of the populations they seek to represent. Based on a sample of 123 U.S. adults aged 18-22, the mean amount of time spent on the Internet in an average week was 8.20 hours, with standard deviation 9.84 hours. Find a 90% confidence interval for the population mean number of hours per week spent on the Internet by U.S. adults aged 18-22. Make sure to state and check all conditions and interpret your interval in context.

✓ Random sample? We are told that this is not a random sample, but we are told that the sample is representative of the population. ✓ Small enough sample? There are more than 20(123) = 2460 U.S. adults aged 18-22. ✓ Large enough sample? Since n = 123 > 40, we have a large enough sample and the t procedures should be okay. df = 123 - 1 = 122. Since df = 122 is not listed in Table T, we will use df = 100. Since we want 90% confidence, t* = 1.660 (df = 100). x̄ ± t* · s/√n = 8.20 ± 1.660 · 9.84/√123 = 8.20 ± 1.47 = 6.73 to 9.67. We are 90% confident that the population mean number of hours per week spent on the Internet by U.S. adults aged 18-22 is between 6.73 and 9.67 hours.
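The interval arithmetic can be checked with a few lines of code. The sketch uses the summary statistics from the question and the table value t* = 1.660 (df = 100) chosen in the answer; it does not look up the critical value itself.

```python
import math

x_bar, s, n = 8.20, 9.84, 123  # sample mean, sample SD, sample size from the question
t_star = 1.660                 # Table T value for df = 100 and 90% confidence

margin = t_star * s / math.sqrt(n)           # margin of error
lower, upper = x_bar - margin, x_bar + margin

print(round(margin, 2), round(lower, 2), round(upper, 2))
```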

