All Stats Homework
Every fifth adult entering an airport is checked for extra security screening. What sampling technique is used?
Systematic
What is at the "heart" of hypothesis testing in statistics?
Make an assumption about reality, and collect sample evidence to determine whether it contradicts the assumption.
How is point estimate calculated?
(upper bound + lower bound)/2
What is the formula for the sample size needed to estimate p₁-p₂ if prior estimates of p₁ and p₂ are available?
n = [p-hat₁(1 - p-hat₁) + p-hat₂(1 - p-hat₂)] * (z_α/2 / E)²
A sample consists of every 35th worker from a group of 3000 workers. What sampling technique was used?
Systematic
Compute the critical value z_α/2 that corresponds to an 85% level of confidence.
1 - 0.85 = 0.15; 0.15/2 = 0.075; 0.075 + 0.85 = 0.925; invNorm (0.925, 0, 1) = 1.44
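The same critical value can be checked without a calculator. This is a minimal sketch that assumes Python's standard-library `statistics.NormalDist` as a stand-in for invNorm; the function name `critical_z` is only illustrative.

```python
from statistics import NormalDist

def critical_z(confidence):
    """Return z_(alpha/2) for a two-sided confidence level such as 0.85."""
    alpha = 1 - confidence
    # Area to the left of the critical value is 1 - alpha/2
    return NormalDist().inv_cdf(1 - alpha / 2)

print(round(critical_z(0.85), 2))  # 1.44
```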
What is a completely randomized design?
Each experimental unit is randomly assigned to a treatment.
Explain what each point on the least-squares regression line represents.
Each point on the least-squares regression line represents the predicted y-value at the corresponding value of x.
A binomial experiment is performed a fixed number of times. What is each repetition of the experiment called?
Each repetition of the experiment is called a trial.
An investment counselor calls with a hot stock tip. He believes that if the economy remains strong, the investment will result in a profit of $60,000. If the economy grows at a moderate pace, the investment will result in a profit of $20,000. However, if the economy goes into recession, the investment will result in a loss of $60,000. You contact an economist who believes there is a 30% probability the economy will remain strong, a 60% probability the economy will grow at a moderate pace, and a 10% probability the economy will slip into recession. What is the expected profit from this investment?
(60,000 * 0.30) + (20,000 * 0.60) + (-60,000 * 0.10) = $24,000
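The expected value is a probability-weighted sum of the outcomes. A short sketch of that arithmetic, with the variable names chosen here only for illustration:

```python
# (profit, probability) for strong, moderate, and recession scenarios
outcomes = [(60_000, 0.30), (20_000, 0.60), (-60_000, 0.10)]

expected_profit = sum(profit * prob for profit, prob in outcomes)
print(round(expected_profit))  # 24000
```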
Fill in the blanks to complete the sentences below. (a) As the number of samples increases, the proportion of 95% confidence intervals that include the population proportion approaches ______. (b) If a sample proportion results in a 95% confidence interval that does not include the population proportion, then the sample proportion is more than ______ standard errors from the population proportion.
(a) 0.95 (b) 1.96
Twenty years ago, 56% of parents of children in high school felt it was a serious problem that high school students were not being taught enough math and science. A recent survey found that 242 of 700 parents of children in high school felt it was a serious problem that high school students were not being taught enough math and science. Do parents feel differently today than they did twenty years ago? Use the α=0.05 level of significance. (a) Because np₀(1−p₀)=_______ _____ 10, the sample size is _______ 5% of the population size, and the sample _______, the requirements for testing the hypothesis _______ satisfied. (b) Find the test statistic. (c) Find the P-value. (d) Determine the conclusion for this hypothesis test.
(a) 172.5, >, less than, can be reasonably assumed to be random, are (b) STAT-->TESTS-->1-PropZTest (0.56, 242, 700, ≠p₀, calculate) z₀ = -11.42 (c) STAT-->TESTS-->1-PropZTest (0.56, 242, 700, ≠p₀, calculate) p = 0.000 (d) Since P-value<α, reject the null hypothesis and conclude that there is sufficient evidence that parents feel differently today.
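As a cross-check on the calculator output, the one-proportion z statistic can be computed directly from x = 242 and n = 700 in the problem statement. This is a sketch using the normal model from the standard library; `one_prop_z_test` is a hypothetical helper name, not a calculator or library function.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(x, n, p0):
    """Return (z, two-sided P-value) for H0: p = p0 using the normal model."""
    p_hat = x / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z0, p_value = one_prop_z_test(242, 700, 0.56)
print(round(z0, 2))    # -11.42
print(p_value < 0.05)  # True
```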
Fill in the blanks to complete the following statements. (a) For the shape of the distribution of the sample proportion to be approximately normal, it is required that np(1−p)≥______. (b) Suppose the proportion of a population that has a certain characteristic is 0.3. The mean of the sampling distribution of ^p from this population is μ^p=______.
(a) 10 (b) 0.3
Complete parts (a) through (d) for the sampling distribution of the sample mean shown in the accompanying graph. (a) What is the value of μ_x̄? (b) What is the value of σ_x̄? (c) If the sample size is n=16, what is likely true about the shape of the population? (d) If the sample size is n=16, what is the standard deviation of the population from which the sample was drawn?
(a) 400 (b) 10 (c) The shape of the population is approximately normal. (d) (√16) * 10 = 40
A survey of 2276 adults in a certain large country aged 18 and older conducted by a reputable polling organization found that 401 have donated blood in the past two years. (a) Obtain a point estimate for the population proportion of adults in the country aged 18 and older who have donated blood in the past two years. (b) Verify that the requirements for constructing a confidence interval about p are satisfied. (c) Construct and interpret a 90% confidence interval for the population proportion of adults in the country who have donated blood in the past two years.
(a) 401/2276 = 0.176 (b) n = 2276, x = 401, ^p = 0.176 n^p(1-^p) = 2276*(401/2276)*(1-401/2276) = 330.349 The sample can be assumed to be a simple random sample, the value of n^p(1−^p) is 330.349, which is greater than or equal to 10, and the sample size can be assumed to be less than or equal to 5% of the population size. (c) STAT-->TESTS-->1-PropZInt (401, 2276, 0.90) We are 90% confident the proportion of adults in the country aged 18 and older who have donated blood in the past two years is between 0.163 and 0.189.
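The interval from 1-PropZInt can be reproduced by hand as ^p ± z_α/2 · √(^p(1−^p)/n). A minimal sketch, assuming the standard-library normal distribution; `prop_z_interval` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def prop_z_interval(x, n, confidence):
    """Return (lower, upper) bounds of a z-interval for a proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = prop_z_interval(401, 2276, 0.90)
print(round(low, 3), round(high, 3))  # 0.163 0.189
```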
The horizontal axis in the sampling distribution of ^p represents all possible sample proportions from a simple random sample of size n. (a) What percent of sample proportions results in a 95% confidence interval that includes the population proportion? (b) What percent of sample proportions results in a 95% confidence interval that does not include the population proportion?
(a) 95% (b) 5%
A doctor wants to estimate the mean HDL cholesterol of all 20- to 29-year-old females. (a) How many subjects are needed to estimate the mean HDL cholesterol within 4 points with 99% confidence assuming s=16.1 based on earlier studies? (b) Suppose the doctor would be content with 90% confidence. How does the decrease in confidence affect the sample size required?
(a) 99% critical value = 2.575 n = ((2.575*16.1)/4)² = 107.4, rounded up to 108 (b) 90% critical value = 1.645 n = ((1.645*16.1)/4)² = 43.8, rounded up to 44 Decreasing the confidence level decreases the sample size needed.
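The sample-size rule n = (z·s/E)², always rounded up, can be sketched as follows. The critical values are the table values used in the answer; `sample_size_for_mean` is a hypothetical helper name.

```python
from math import ceil

def sample_size_for_mean(z, s, margin):
    """Smallest n so that z * s / sqrt(n) is at most the margin of error."""
    return ceil((z * s / margin) ** 2)

print(sample_size_for_mean(2.575, 16.1, 4))  # 108
print(sample_size_for_mean(1.645, 16.1, 4))  # 44
```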
A random sample of n1=135 individuals results in x1=40 successes. An independent sample of n2=140 individuals results in x2=60 successes. Does this represent sufficient evidence to conclude that p1<p2 at the α=0.10 level of significance? (a) What type of test should be used? (b) Determine the null and alternative hypotheses. (c) Use technology to calculate the P-value. (d) Draw a conclusion based on the hypothesis test.
(a) A hypothesis test regarding the difference between two population proportions from independent samples. (b) H₀: p₁=p₂ and H₁: p₁<p₂ (c) STAT-->TESTS-->2-PropZTest (x1=40, n1=135, x2=60, n2=140, <p2) p=0.011 (d) There is sufficient evidence to reject the null hypothesis because the P-value<α.
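The two-proportion test pools the samples to estimate the standard error under H₀. A sketch of that calculation for the left-tailed alternative, using only the standard library; the function name is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test_less(x1, n1, x2, n2):
    """Return (z, P-value) for H0: p1 = p2 versus H1: p1 < p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, NormalDist().cdf(z)   # left-tail area

z0, p_value = two_prop_z_test_less(40, 135, 60, 140)
print(round(z0, 2), round(p_value, 3))  # -2.28 0.011
```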
The data table represents the measure of a variable before and after a treatment. Does the sample evidence suggest that the treatment is effective in decreasing the value of the response variable? Use the α=0.10 level of significance. (a) What type of test should be used? (b) Determine the null and alternative hypotheses. Let μd=μx−μy. (c) Use technology to calculate the P-value. (d) Draw a conclusion based on the hypothesis test.
(a) A hypothesis test regarding the difference of two means using a matched-pairs design. (b) H₀: µd=0 and H₁: µd>0 (c) STAT-->EDIT-->Xi in L1, Yi in L2, (L1-L2) in L3. STAT-->TESTS-->T-Test (Inpt=Data, µ₀=0, List=L3, Freq=1, >µ₀) p = 0.190 (d) There is not sufficient evidence to reject the null hypothesis because the P-value>α.
A student wants to determine if there is a difference in the pricing between two stores for health and beauty supplies. She recorded prices from both stores for each of 10 different products. Assuming that the conditions for conducting the test are satisfied, determine if there is a price difference between the two stores. Use the α=0.005 level of significance. (a) What type of test should be used? (b) Determine the null and alternative hypotheses. (c) Use technology to calculate the P-value. (d) Draw a conclusion based on the hypothesis test.
(a) A hypothesis test regarding the difference of two means using a matched-pairs design. (b) H₀: µd=0 and H₁: µd≠0 (c) STAT-->EDIT-->Store 1 in L1, Store 2 in L2, (L1-L2) in L3 STAT-->TESTS-->T-Test (Inpt=Data, µ₀=0, List=L3, Freq=1, ≠µ₀) p=0.331 (d) There is not sufficient evidence to reject the null hypothesis because the P-value>α.
Suppose a simple random sample of size n=75 is obtained from a population whose size is N=25,000 and whose population proportion with a specified characteristic is p=0.4. (a) Describe the sampling distribution of ^p. (b) Determine the mean of the sampling distribution of ^p. (c) Determine the standard deviation of the sampling distribution of ^p. (d) What is the probability of obtaining x=33 or more individuals with the characteristic? That is, what is P(^p≥0.44)? (e) What is the probability of obtaining x=21 or fewer individuals with the characteristic? That is, what is P(^p≤0.28)?
(a) Approximately normal because n≤0.05N and np(1-p)≥10. (b) 0.4 (same as p) (c) σ^p = √(0.4(1-0.4)/(75)) = 0.056569 (d) normalcdf (0.44, 9999999, 0.4, 0.056569) = 0.2398 (e) normalcdf (-9999999, 0.28, 0.4, 0.056569) = 0.0169
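Both tail probabilities come from a normal distribution with mean p and standard deviation √(p(1−p)/n). A minimal sketch with `statistics.NormalDist` standing in for normalcdf:

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.4, 75
sampling_dist = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

p_upper = 1 - sampling_dist.cdf(0.44)  # P(^p >= 0.44)
p_lower = sampling_dist.cdf(0.28)      # P(^p <= 0.28)
print(round(p_upper, 4), round(p_lower, 4))  # 0.2398 0.0169
```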
Describe the sampling distribution of ^p. Assume the size of the population is 25,000. n=700, p=0.6 (a) Choose the phrase that best describes the shape of the sampling distribution of ^p below. (b) Determine the mean of the sampling distribution of ^p. (c) Determine the standard deviation of the sampling distribution of ^p.
(a) Approximately normal because n≤0.05N and np(1-p)≥10. N = 25,000 25,000*0.05 = 1250. n = 700, so it is ≤ 0.05N np(1-p) = (700*0.6)*(1-0.6) = 168, so it is ≥ 10 (b) µ ^p = p = 0.6 (c) σ ^p = √(p(1-p)/n) = √(0.6(1-0.6)/700) = 0.019
(a) In general, the variability among the sample means is called the ______-sample variability and the variability of each sample is called the ______-sample variability. (b) The analysis of variances F-test statistic is given by what?
(a) Between, within (b) F₀ = (between-sample variability) / (within-sample variability)
Do women feel differently from men when it comes to tax rates? One question on a survey of randomly selected adults asked, "What percent of income do you believe individuals should pay in income tax?" (a) Explain why a hypothesis test may be used to test whether the mean tax rates for the two genders differ. (b) Test whether the mean tax rate for females differs from that of males at the α=0.1 level of significance. Determine the null and alternative hypotheses for this test. Let μM represent the mean income tax rate for males and let μF represent the mean income tax rate for females. (c) Find t₀, the test statistic for this hypothesis test, and the P-value. (d) State the appropriate conclusion.
(a) Each sample is obtained independently of the other. Each sample is a simple random sample. Each sample size is small relative to the size of its population. Each sample size is large. (b) H₀: µM=µF and H₁: µM≠µF (c) STAT-->EDIT-->Males into L1, Females into L2, (L1-L2) into L3 STAT-->TESTS-->2-SampTTest (Inpt=Data, List1=L2, List2=L1, Freq1=1, Freq2=1, ≠µ₀, Pooled=No) t₀ = -0.99 p = 0.324 (d) Do not reject H₀. There is not sufficient evidence at the level of significance to conclude that the mean income tax rate for males is different from the mean income tax rate for females.
A sociologist randomly selects 374 females 15 to 19 years old and asks each to disclose her family structure at age 14 and whether she has had sexual intercourse. Use the results in the accompanying table to construct a conditional distribution by family structure and draw a bar graph. (a) Construct a conditional distribution by family structure and draw a bar graph. Fill in the conditional distribution.
(a) For each column, divide the number of yes/no by the total number in that column, e.g. for both biological or adoptive parents: Yes = 30/48 = 0.625 No = 18/48 = 0.375
Test the hypothesis using the P-value approach. Be sure to verify the requirements of the test. H₀: p=0.2 versus H₁: p>0.2 n=100 x=30 α= 0.1 (a) Is np₀(1−p₀)≥10? (b) Use technology to find the P-value. (c) _______ the null hypothesis, because the P-value is _______ than α.
(a) Yes. np₀(1-p₀) = (100 * 0.2)*(1-0.2) = 16 (b) STAT-->TESTS-->1-PropZTest (0.2, 30, 100, >p₀, Calculate) p = 0.006 (c) Reject, less
The following data represent the level of health and the level of education for a random sample of 1668 residents. (a) Does the sample evidence suggest that level of education and health are independent at the α=0.05 level of significance? Conduct a P-value hypothesis test. State the hypotheses. (b) Calculate the test statistic and P-value. (c) Make the proper conclusion. (d) Construct a conditional distribution of health by level of education and draw a bar graph.
(a) H₀: Level of education and health are independent. H₁: Level of education and health are dependent. (b) 2nd-->X-1-->EDIT-->[A]--> 4 x 4 matrix Enter given values in each cell. Calculator will automatically compute expected values for [B]. STAT-->TESTS-->X2-Test (Observed: A, Expected: B, Calculate) x2 = 8.661 p = 0.469 (c) Fail to reject H0. There is not sufficient evidence that level of education and health are associated. (d) Divide each cell by the total of its row to obtain the conditional distribution.
A can of soda is labeled as containing 13 fluid ounces. The quality control manager wants to verify that the filling machine is not over-filling the cans. (a) Determine the null and alternative hypotheses that would be used to determine if the filling machine is calibrated correctly. (b) The quality control manager obtains a sample of 84 cans and measures the contents. The sample evidence leads the manager to reject the null hypothesis. Write a conclusion for this hypothesis test. (c) Suppose, in fact, the machine is not out of calibration. Has a Type I or Type II error been made? (d) Management has informed the quality control department that it does not want to shut down the filling machine unless the evidence is overwhelming that the machine is out of calibration. What level of significance would you recommend the quality control manager to use? Explain.
(a) H₀: µ = 13 H₁: µ > 13 (b) There is sufficient evidence to conclude that the machine is out of calibration. (c) A Type I error has been made since the sample evidence led the quality-control manager to reject the null hypothesis, when the null hypothesis is true. (d) The level of significance should be 0.01 because this makes the probability of Type I error small.
The manufacturer of a certain engine treatment claims that if you add their product to your engine, it will be protected from excessive wear. An infomercial claims that a woman drove 3 hours without oil, thanks to the engine treatment. A magazine tested engines in which they added the treatment to the motor oil, ran the engines, drained the oil, and then determined the time until the engines seized. (a) Determine the null and alternative hypotheses that the magazine will test. (b) Both engines took exactly 19 minutes to seize. What conclusion might the magazine make based on this evidence?
(a) H₀: µ = 3 H₁: µ < 3 (b) The infomercial's claim is not true.
An engineer wants to know if the mean strengths of three different concrete mix designs differ significantly. He randomly selects 9 cylinders that measure 6 inches in diameter and 12 inches in height in which mixture A is poured, 9 cylinders of mixture B, and 9 cylinders of mixture C. After 28 days, he measures the strength (in pounds per square inch) of the cylinders. The results are presented in the accompanying table. (a) State the null and alternative hypotheses. (b) Explain why we cannot use one-way ANOVA to test these hypotheses.
(a) H₀: µA = µB = µC H₁: At least one of the means is different. (b) Because the standard deviation for mixture B is more than two times larger than the standard deviation for mixture A.
Assume that both populations are normally distributed. (a) Test whether μ1≠μ2 at the α=0.01 level of significance for the given sample data. Determine the null and alternative hypotheses. (b) Determine the test statistic. (c) Approximate the P-value. Choose the correct answer below. (d) Should the hypothesis be rejected at the α=0.01 level of significance? (e) Construct a 99% confidence interval about μ1−μ2.
(a) H₀: µ₁=µ₂ and H₁: µ₁≠µ₂ (b) STAT-->TESTS-->2-SampTTest (Inpt=Stats, x1=14.7, Sx1=3.9, n1=12, x2=15.8, Sx2=4.6, n2=12, ≠µ₂, Pooled=no) t₀=-0.63 (c) p=0.534, so P-value≥0.10 (d) Do not reject the null hypothesis because the P-value is greater than or equal to the level of significance. (e) STAT-->TESTS-->2-SampTInt (Inpt=Stats, x1=14.7, Sx1=3.9, n1=12, x2=15.8, Sx2=4.6, n2=12, C-Level=0.99, Pooled=no) The confidence interval is the range from −6.02 to 3.82.
A manufacturer of colored candies states that 13% of the candies in a bag should be brown, 14% yellow, 13% red, 24% blue, 20% orange, and 16% green. A student randomly selected a bag of colored candies. He counted the number of candies of each color and obtained the results shown in the table. Test whether the bag of colored candies follows the distribution stated above at the α=0.05 level of significance. (a) Determine the null and alternative hypotheses. (b) Compute the expected counts for each color: Color ------ Frequency Brown ------ 61 Yellow ------ 64 Red ------ 54 Blue ------ 60 Orange ------ 83 Green ------ 64 (c) What is the test statistic? (d) What is the P-value of the test? (e) Based on the results, do the colors follow the same distribution as stated in the problem?
(a) H₀: The distribution of colors is the same as stated by the manufacturer. H₁: The distribution of colors is not the same as stated by the manufacturer. (b) Total of all colors = 386 Brown: 386 * 0.13 = 50.18 Yellow: 386 * 0.14 = 54.04 Red: 386 * 0.13 = 50.18 Blue: 386 * 0.24 = 92.64 Orange: 386 * 0.20 = 77.20 Green: 386 * 0.16 = 61.76 (c) STAT-->EDIT-->Frequencies in L1, Expected Frequencies in L2 STAT-->TESTS-->X2GOF-Test (L1, L2, df = [number of categories - 1 = 5]) x² = 16.477 (d) STAT-->TESTS-->X2GOF-Test (L1, L2, df = [number of categories - 1 = 5]) p = 0.006 (e) Reject H₀. There is sufficient evidence that the distribution of colors is not the same as stated by the manufacturer.
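The goodness-of-fit statistic is the sum of (observed − expected)²/expected over the categories, with degrees of freedom equal to the number of categories minus 1. A sketch of the statistic for the candy data follows; the P-value is left to the calculator because the standard library has no chi-square distribution.

```python
observed = {"Brown": 61, "Yellow": 64, "Red": 54,
            "Blue": 60, "Orange": 83, "Green": 64}
claimed = {"Brown": 0.13, "Yellow": 0.14, "Red": 0.13,
           "Blue": 0.24, "Orange": 0.20, "Green": 0.16}

total = sum(observed.values())
expected = {color: total * share for color, share in claimed.items()}

chi_square = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
df = len(observed) - 1  # categories minus 1, not the total count minus 1

print(total, df, round(chi_square, 3))  # 386 5 16.477
```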
A book claims that more hockey players are born in January through March than in October through December. The following data show the number of players selected in a draft of new players for a hockey league according to their birth month. Is there evidence to suggest that hockey players' birthdates are not uniformly distributed throughout the year? Use the level of significance α=0.05. Birth Month and Frequency: January-March: 58 April-June: 61 July-September: 37 October-December: 30 (a) What are the null hypothesis and alternative hypotheses? (b) Compute the expected counts for each birth month. The total number of hockey players is 186. (c) What is the test statistic? (d) What is the P-value of the test? (e) Based on the results, do the hockey league's players' birth months follow a uniform distribution? Use the level of significance α=0.05.
(a) H₀: The distribution of hockey players' birth months is uniformly distributed. H₁: The distribution of hockey players' birth months is not uniformly distributed. (b) January-March: 186 * 0.25 = 46.50 April-June: 186 * 0.25 = 46.50 July-September: 186 * 0.25 = 46.50 October-December: 186 * 0.25 = 46.50 (c) STAT-->EDIT-->Frequencies in L1, Expected Frequencies in L2 STAT-->TESTS-->X2GOF-Test (L1, L2, df = [number of categories - 1 = 3]) x² = 15.161 (d) STAT-->TESTS-->X2GOF-Test (L1, L2, df = [number of categories - 1 = 3]) p = 0.002 (e) No, because the calculated P-value is less than the given α level of significance.
Previously, 5% of mothers smoked more than 21 cigarettes during their pregnancy. An obstetrician believes that the percentage of mothers who smoke 21 cigarettes or more is less than 5% today. She randomly selects 145 pregnant mothers and finds that 4 of them smoked 21 or more cigarettes during pregnancy. Test the researcher's statement at the α=0.05 level of significance. (a) What are the null and alternative hypotheses? (b) Find the P-value. (c) Is there sufficient evidence to support the obstetrician's statement?
(a) H₀: p=0.05 versus H₁: p<0.05 Because np₀(1−p₀)=(145*0.05)(1-0.05)=6.9 < 10, the normal model may not be used to approximate the P-value. (b) 2nd-->VARS-->binomcdf (145, 0.05, 4) = 0.145 (c) Compare the P-value with α. If the P-value is less than α, reject the null hypothesis. Otherwise, do not reject the null hypothesis. No, do not reject the null hypothesis because the P-value is greater than α. There is not sufficient evidence to conclude that the percentage of mothers who smoke 21 or more cigarettes during pregnancy is less than 5%.
Previously, 7.2% of workers had a travel time to work of more than 60 minutes. An urban economist believes that the percentage has increased since then. She randomly selects 95 workers and finds that 8 of them have a travel time to work that is more than 60 minutes. Test the economist's belief at the α=0.05 level of significance. (a) What are the null and alternative hypotheses? (b) Find the P-value. (c) Is there sufficient evidence to support the economist's belief?
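(a) H₀: p=0.072 versus H₁: p>0.072 Because np₀(1−p₀)=(95*0.072)(1-0.072)=6.3 < 10, the normal model may not be used to approximate the P-value. (b) 1-binomcdf (95, 0.072, 7) = 0.376, using 8-1 = 7 because P(X≥8) = 1 - P(X≤7) (c) No, do not reject the null hypothesis. There is not sufficient evidence because the P-value is greater than α.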
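When np₀(1−p₀) < 10, as in the smoking and travel-time problems, the P-value comes from the exact binomial distribution. A minimal sketch of binomcdf built on `math.comb`; `binom_cdf` is an illustrative name, not the calculator function itself.

```python
from math import comb

def binom_cdf(n, p, k):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Left-tailed test (H1: p < 0.05): P-value = P(X <= 4)
p_smoking = binom_cdf(145, 0.05, 4)

# Right-tailed test (H1: p > 0.072): P-value = P(X >= 8) = 1 - P(X <= 7)
p_travel = 1 - binom_cdf(95, 0.072, 7)

print(round(p_smoking, 3), round(p_travel, 3))
```

Both values should agree with the calculator results of about 0.145 and 0.376, and both exceed α=0.05.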
Several years ago, 42% of parents who had children in grades K-12 were satisfied with the quality of education the students receive. A recent poll asked 1,005 parents who have children in grades K-12 if they were satisfied with the quality of education the students receive. Of the 1,005 surveyed, 464 indicated that they were satisfied. Construct a 90% confidence interval to assess whether this represents evidence that parents' attitudes toward the quality of education have changed. (a) What are the null and alternative hypotheses? (b) Use technology to find the 90% confidence interval. (c) What is the correct conclusion?
(a) H₀: p=0.42 versus H₁: p≠0.42 (b) STAT-->TESTS-->1-PropZInt (464, 1005, 0.9) The lower bound is 0.44. The upper bound is 0.49. (c) Since the interval does not contain the proportion stated in the null hypothesis, there is sufficient evidence that parents' attitudes toward the quality of education have changed.
In 1945, an organization asked 1477 randomly sampled American citizens, "Do you think we can develop a way to protect ourselves from atomic bombs in case others tried to use them against us?" with 761 responding yes. Did a majority of the citizens feel the country could develop a way to protect itself from atomic bombs in 1945? Use the α=0.01 level of significance. (a) What are the null and alternative hypotheses? (b) Determine the test statistic, z₀, and the P-value. (c) What is the correct conclusion at the α=0.01 level of significance?
(a) H₀: p=0.50 and H₁: p>0.50 (b) STAT-->TESTS-->1-PropZTest (p₀ = 0.50, x = 761, n = 1477, >p₀) z₀ = 1.17 p = 0.121 (c) Since the P-value is greater than the level of significance, do not reject the null hypothesis. There is not sufficient evidence to conclude that the majority of the citizens feel the country could develop a way to protect itself from atomic bombs.
In a recent survey conducted, a random sample of adults 18 years of age or older living in a certain country were asked their reaction to the word socialism. In addition, the individuals were asked to disclose which political party they most associate with. Results of the survey are given in the table. (a) Does the evidence suggest individuals within each political affiliation react differently to the word "socialism"? Use the α=0.05 level of significance. (b) Compute the P-value and make the proper conclusion. (c) Construct a conditional distribution of reaction by political party. (d) Write a summary about the "partisan divide" regarding the reaction to the word "socialism."
(a) H₀: pD=pR=pI H₁: At least one of the proportions is different from the others. (b) 2nd-->X-1-->EDIT-->[A]--> 2 x 3 matrix Enter given values in each cell. Calculator will automatically compute expected values for [B]. STAT-->TESTS-->X2-Test (Observed: A, Expected: B, Calculate) p = 0.001 Conclusion: Yes, there is evidence because the P-value is less than α. (c) To construct a conditional distribution by political party, divide the number of observations in a cell by the column total. (d) Republicans and Independents are far more likely to react negatively to the word "socialism" than Democrats are. All groups had a majority negative reaction.
Suppose the mean wait-time for a telephone reservation agent at a large airline is 40 seconds. A manager with the airline is concerned that business may be lost due to customers having to wait too long for an agent. To address this concern, the manager develops new airline reservation policies that are intended to reduce the amount of time an agent needs to spend with each customer. A random sample of 250 customers results in a sample mean wait-time of 39.4 seconds with a standard deviation of 4.3 seconds. Using α=0.05 level of significance, do you believe the new policies were effective in reducing wait time? Do you think the results have any practical significance? (a) Determine the null and alternative hypotheses. (b) Calculate the test statistic and P-value. (c) State the conclusion for the test. (d) State the conclusion in context of the problem. (e) Do you think the results have any practical significance?
(a) H₀: µ=40 and H₁: µ<40 (b) STAT-->TESTS-->T-Test (µ₀=40, overbar x=39.4, Sx=4.3, n=250, <µ₀) t₀ = -2.21 p = 0.014 (c) Reject H₀ because the P-value is less than the α=0.05 level of significance. (d) There is sufficient evidence at the α=0.05 level of significance to conclude that the new policies were effective. (e) No, because while there is significant evidence that shows the new policies were effective in lowering the mean wait-time of customers, the difference between the previous mean wait-time and the new mean wait-time is not large enough to be considered important.
The average daily volume of a computer stock in 2011 was μ=35.1 million shares, according to a reliable source. A stock analyst believes that the stock volume in 2014 is different from the 2011 level. Based on a random sample of 30 trading days in 2014, he finds the sample mean to be 30.6 million shares, with a standard deviation of s=14.4 million shares. Test the hypotheses by constructing a 95% confidence interval. (a) State the hypotheses for the test. (b) Construct a 95% confidence interval about the sample mean of stocks traded in 2014. (c) Will the researcher reject the null hypothesis?
(a) H₀: μ=35.1 and H₁: μ≠35.1 (b) STAT-->TESTS-->TInterval (Stats, overbar x = 30.6, Sx = 14.4, n = 30, C-level = 0.95) The lower bound is 25.223 million shares. The upper bound is 35.977 million shares. (c) Do not reject the null hypothesis because μ=35.1 million shares falls in the confidence interval.
A math teacher claims that she has developed a review course that increases the scores of students on the math portion of a college entrance exam. Based on data from the administrator of the exam, scores are normally distributed with μ=516. The teacher obtains a random sample of 1800 students, puts them through the review class, and finds that the mean math score of the 1800 students is 521 with a standard deviation of 114. (a) State the null and alternative hypotheses. Let μ be the mean score. (b) Test the hypothesis at the α=0.10 level of significance. Is a mean math score of 521 statistically significantly higher than 516? Conduct a hypothesis test using the P-value approach. (c) Do you think that a mean math score of 521 versus 516 will affect the decision of a school admissions administrator? In other words, does the increase in the score have any practical significance? (d) Test the hypothesis at the α=0.10 level of significance with n=350 students. Assume that the sample mean is still 521 and the sample standard deviation is still 114. Is a sample mean of 521 significantly more than 516? Conduct a hypothesis test using the P-value approach.
(a) H₀: μ=516 and H₁: μ>516 (b) Find the test statistic and P-value: STAT-->TESTS-->T-Test (Stat, μ=516, overbar x = 521, Sx=114, n=1800, >μ₀) t₀ = 1.86 P = 0.031 Is the sample mean statistically significantly higher? Yes, because the P-value is 0.031, which is less than α=0.10. (If the P-value is less than the level of significance, reject the null hypothesis. Otherwise, do not reject the null hypothesis.) (c) No, because the score became only 0.97% greater. (d) Find the test statistic and P-value: t₀ = 0.82 P = 0.206 Is the sample mean statistically significantly higher? No because the P-value is 0.206, which is greater than α=0.10. If the P-value is less than the level of significance, reject the null hypothesis. Otherwise, do not reject the null hypothesis. What do you conclude about the impact of large samples on the P-value? As n increases, the likelihood of rejecting the null hypothesis increases. However, large samples tend to overemphasize practically insignificant differences.
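The effect of sample size shows up directly in the test statistic t₀ = (x̄ − μ₀)/(s/√n): the same 5-point difference yields a larger statistic as n grows. A short sketch of the statistic only (the P-values need a t-distribution, which the standard library does not provide):

```python
from math import sqrt

def t_statistic(x_bar, mu0, s, n):
    """One-sample t statistic for H0: mu = mu0."""
    return (x_bar - mu0) / (s / sqrt(n))

for n in (1800, 350):
    print(n, round(t_statistic(521, 516, 114, n), 2))
# 1800 1.86
# 350 0.82
```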
Two researchers conducted a study in which two groups of students were asked to answer 42 trivia questions from a board game. The students in group 1 were asked to spend 5 minutes thinking about what it would mean to be a professor, while the students in group 2 were asked to think about soccer hooligans. These pretest thoughts are a form of priming. The 200 students in group 1 had a mean score of 24.1 with a standard deviation of 4.8, while the 200 students in group 2 had a mean score of 16.1 with a standard deviation of 2.9. (a) Determine the 90% confidence interval for the difference in scores, μ1−μ2. (b) Interpret the interval. (c) What does this say about priming?
(a) STAT-->TESTS-->2-SampTInt (Inpt=Stats, x1=24.1, Sx1=4.8, n1=200, x2=16.1, Sx2=2.9, n2=200, C-Level=0.90, Pooled=No) The lower bound is 7.346. The upper bound is 8.654. (b) The researchers are 90% confident that the difference of the means is in the interval. (c) Since the 90% confidence interval does not contain zero, the results suggest that priming does have an effect on scores.
A researcher wondered if attainment within six years among students who receive grants as part of their educational funding (Group 1) was lower than attainment within six years among students who did not receive grants as part of their educational funding (Group 2). Attainment is defined as whether the student earned the degree or certificate that he/she set out to earn upon enrollment. (a) Is the response variable qualitative or quantitative? (b) How many groups are being compared? (c) In part (b), we learned that two groups (students who receive grants and students who do not receive grants) are being compared. In addition, the sampling method is independent. State the null and alternative hypotheses for this test.
(a) Qualitative (b) 2 (c) H₀: p₁=p₂ and H₁: p₁<p₂
Assume that the differences are normally distributed. (a) Determine di=Xi−Yi for each pair of data. (b) Compute overbar-d and s sub-d. (c) Test if μd<0 at the α=0.05 level of significance. What are the correct null and alternative hypotheses? (d) What is the P-value? (e) Choose the correct conclusion below. (f) Compute a 95% confidence interval about the population mean difference μd.
(a) STAT-->EDIT-->Xi into L1, Yi into L2, (L1-L2) into L3 to calculate di = Xi - Yi (b) STAT-->TESTS-->T-Test (Inpt=Data, µ₀=0, List=L3, Freq=1, <µ₀) Overbar-d = overbar-x = -1.575 S sub-d = Sx = 1.769 (c) H₀: µd=0 and H₁: µd<0 (d) p = 0.020 (e) Reject the null hypothesis. There is sufficient evidence that μd<0 at the α=0.05 level of significance. (f) STAT-->TESTS-->TInterval (Inpt=Data, List=L3, Freq=1, C-Level=0.95) The lower bound is −3.05. The upper bound is −0.10.
A blind taste test is conducted to determine which of two colas, Brand A or Brand B, individuals prefer. Individuals are randomly asked to drink one of the two types of cola first, followed by the other cola, and then asked to disclose the drink they prefer. Results of the taste test indicate that 52 of 100 individuals prefer Brand A. (a) Conduct a hypothesis test (preferably using technology) H₀: p=p₀ versus H₁: p≠p₀ for p₀=0.41, 0.42, 0.43, ..., 0.61, 0.62, 0.63 at the α=0.05 level of significance. For which values of p₀ do you not reject the null hypothesis? What do each of the values of p₀ represent? (b) Construct a 95% confidence interval for the proportion of individuals who prefer Brand A. (c) Suppose you changed the level of significance in conducting the hypothesis test to α=0.01. What would happen to the range of values for p₀ for which the null hypothesis is not rejected? Why does this make sense? Choose the correct answer below.
(a) STAT-->TESTS-->1-PropZTest (p₀=0.41, x=52, n=100, ≠p₀)-->p=0.025-->Continue trying each p₀ value until you find the lowest and highest values of p₀ for which the P-value is greater than 0.05. Do not reject the null hypothesis for the values of p₀ between 0.43 and 0.61, inclusive. (b) STAT-->TESTS-->1-PropZInt (52, 100, 0.95) The lower bound is 0.422. The upper bound is 0.618. (c) STAT-->TESTS-->1-PropZInt (52, 100, 0.99) = (0.391, 0.649) The range of values would increase because the corresponding confidence interval would increase in size.
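The scan over p₀ in part (a) can be sketched in Python instead of repeating the calculator test by hand. This is a minimal sketch using only the standard library; the helper name `one_prop_z_test` is made up for illustration, and it assumes the same normal-approximation test that 1-PropZTest performs.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(x, n, p0):
    """Two-sided one-proportion z-test (hypothetical helper); returns (z, p_value)."""
    p_hat = x / n
    se = sqrt(p0 * (1 - p0) / n)          # standard error assuming H0: p = p0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Null values 0.41, 0.42, ..., 0.63 from the problem statement.
candidates = [round(0.41 + 0.01 * i, 2) for i in range(23)]

# Keep every p0 whose P-value exceeds alpha = 0.05 (do not reject H0).
not_rejected = [p0 for p0 in candidates
                if one_prop_z_test(52, 100, p0)[1] > 0.05]

print(not_rejected[0], not_rejected[-1])
```

The first and last entries of `not_rejected` are the endpoints quoted in the answer.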
The research group asked the following question of individuals who earned in excess of $100,000 per year and those who earned less than $100,000 per year: "Do you believe that it is morally wrong for unwed women to have children?" Of the 1,205 individuals who earned in excess of $100,000 per year, 715 said yes; of the 1,310 individuals who earned less than $100,000 per year, 700 said yes. Construct a 95% confidence interval to determine if there is a difference in the proportion of individuals who believe it is morally wrong for unwed women to have children. (a) The lower bound is ______ and the upper bound is ______. (b) Because the confidence interval _______ 0, there is _______ evidence at the α=0.05 level of significance to conclude that there is a difference in the proportions. It seems that the proportion of individuals who earn over $100,000 that feel it is morally wrong for unwed women to have children is _______ the proportion of individuals who earn less than $100,000 that feel it is morally wrong for unwed women to have children.
(a) STAT-->TESTS-->2-PropZInt (x1=715, n1=1205, x2=700, n2=1310, C-Level=0.95) Lower: 0.020 Upper: 0.098 (b) Does not include, sufficient, greater than
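As a cross-check on 2-PropZInt, the interval can be sketched in Python. The function name `two_prop_z_interval` is an illustrative assumption; the formula is the usual unpooled standard error for a difference of two proportions.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_interval(x1, n1, x2, n2, conf=0.95):
    """Confidence interval for p1 - p2 (hypothetical helper, unpooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # critical value
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

lower, upper = two_prop_z_interval(715, 1205, 700, 1310)
print(round(lower, 3), round(upper, 3))
```

Because both bounds are positive, the interval excludes 0, which is what part (b) relies on.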
A random sample of 40 adults with no children under the age of 18 years results in a mean daily leisure time of 5.02 hours, with a standard deviation of 2.39 hours. A random sample of 40 adults with children under the age of 18 results in a mean daily leisure time of 4.41 hours, with a standard deviation of 1.86 hours. Construct and interpret a 90% confidence interval for the mean difference in leisure time between adults with no children and adults with children μ1−μ2. Let μ1 represent the mean leisure hours of adults with no children under the age of 18 and μ2 represent the mean leisure hours of adults with children under the age of 18. (a) The 90% confidence interval for μ1−μ2 is the range from ____ hours to ____ hours. (b) What is the interpretation of this confidence interval?
(a) STAT-->TESTS-->2-SampTInt (Inpt=Stats, x1=5.02, Sx1=2.39, n1=40, x2=4.41, Sx2=1.86, n2=40, C-Level=0.90, Pooled=No) The 90% confidence interval for μ1−μ2 is the range from -0.19 hours to 1.41 hours. (b) There is 90% confidence that the difference of the means is in the interval. Conclude that there is insufficient evidence of a significant difference in the number of leisure hours.
Suppose a simple random sample of size n=37 is obtained from a population with μ=67 and σ=17. (a) What must be true regarding the distribution of the population in order to use the normal model to compute probabilities regarding the sample mean? (b) Assuming the normal model can be used, describe the sampling distribution overbar x. (c) Assuming the normal model can be used, determine P(overbar x<71.1). (d) Assuming the normal model can be used, determine P(overbar x≥69.1).
(a) Since the sample size is large enough, the population distribution does not need to be normal. (b) Approximately normal, with µ v overbar x=67 and σ v overbar x=17/(√37) (c) normalcdf (-9999999, 71.1, 67, 17/√37) = 0.9288 (d) normalcdf (69.1, 9999999, 67, 17/√37) = 0.2262
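The two normalcdf calls can be reproduced with Python's standard library. This is a sketch that assumes the normal model for the sample mean, with standard error σ/√n; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 67, 17, 37

# Sampling distribution of the sample mean: mean mu, standard error sigma/sqrt(n).
x_bar_dist = NormalDist(mu, sigma / sqrt(n))

p_below = x_bar_dist.cdf(71.1)           # P(x-bar < 71.1)
p_at_least = 1 - x_bar_dist.cdf(69.1)    # P(x-bar >= 69.1)

print(round(p_below, 4), round(p_at_least, 4))
```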
A simple random sample of size n is drawn from a population that is normally distributed. The sample mean, overbar x, is found to be 115, and the sample standard deviation, s, is found to be 10. (a) Construct a 90% confidence interval about μ if the sample size, n, is 21. (b) Construct a 90% confidence interval about μ if the sample size, n, is 16. How does decreasing the sample size affect the margin of error, E? (c) Construct an 80% confidence interval about μ if the sample size, n, is 21. Compare the results to those obtained in part (a). How does decreasing the level of confidence affect the size of the margin of error, E? (d) Could we have computed the confidence intervals in parts (a)-(c) if the population had not been normally distributed?
(a) TInterval Input: Stats (115, 10, 21, 0.9) = 111.2, 118.8 (b) TInterval Input: Stats (115, 10, 16, 0.9) = 110.6, 119.4 As the sample size decreases, the margin of error increases. (c) TInterval Input: Stats (115, 10, 21, 0.8) = 112.1, 117.9. As the level of confidence decreases, the size of the interval decreases. (d) No, the population needs to be normally distributed.
A college entrance exam company determined that a score of 21 on the mathematics portion of the exam suggests that a student is ready for college-level mathematics. To achieve this goal, the company recommends that students take a core curriculum of math courses in high school. Suppose a random sample of 200 students who completed this core set of courses results in a mean math score of 21.7 on the college entrance exam with a standard deviation of 3.6. Do these results suggest that students who complete the core curriculum are ready for college-level mathematics? That is, are they scoring above 21 on the math portion of the exam? (a) State the appropriate null and alternative hypotheses. (b) Verify that the requirements to perform the test using the t-distribution are satisfied. Select all that apply. (c) Use the P-value approach at the α=0.05 level of significance to test the hypotheses in part (a). Identify the test statistic and approximate the P-value. (d) Write a conclusion based on the results.
(a) The appropriate null and alternative hypotheses are H₀: μ=21 versus H₁: μ>21. (b) The students' test scores were independent of one another. The students were randomly sampled. The sample size is larger than 30. (c) STAT-->TESTS-->T-Test (Stats, 21, 21.7, 3.6, 200, >µ₀) t₀ = 2.75 p = 0.003. The P-value is in the range P-value<0.1 (d) Reject the null hypothesis and claim that there is sufficient evidence to conclude that the population mean is greater than 21.
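The test statistic in part (c) follows from t₀ = (x̄ − μ₀)/(s/√n). A small Python sketch of that arithmetic is below; `t_statistic` is a made-up helper name. The P-value is left to the calculator here, since the standard library has no t-distribution CDF.

```python
from math import sqrt

def t_statistic(x_bar, mu0, s, n):
    """One-sample t statistic (hypothetical helper)."""
    return (x_bar - mu0) / (s / sqrt(n))

# Values from the problem: x-bar = 21.7, mu0 = 21, s = 3.6, n = 200.
t0 = t_statistic(21.7, 21, 3.6, 200)
print(round(t0, 2))
```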
The shape of the distribution of the time required to get an oil change at a 15-minute oil-change facility is unknown. However, records indicate that the mean time is 16.1 minutes, and the standard deviation is 4.8 minutes. (a) To compute probabilities regarding the sample mean using the normal model, what size sample would be required? (b) What is the probability that a random sample of n=45 oil changes results in a sample mean time less than 15 minutes? (c) Suppose the manager agrees to pay each employee a $50 bonus if they meet a certain goal. On a typical Saturday, the oil-change facility will perform 45 oil changes between 10 A.M. and 12 P.M. Treating this as a random sample, at what mean oil-change time would there be a 10% chance of being at or below? This will be the goal established by the manager.
(a) The sample size needs to be greater than 30. (b) normalcdf (-9999999, 15, 16.1, 4.8/√45) = 0.0621 (c) invNorm (0.10, 16.1, 4.8/√45) = 15.2
To test the belief that sons are taller than their fathers, a student randomly selects 13 fathers who have adult male children. She records the height of both the father and son in inches and obtains the following data. Are sons taller than their fathers? Use the α=0.10 level of significance. Note: A normal probability plot and boxplot of the data indicate that the differences are approximately normally distributed with no outliers. (a) Which conditions must be met by the sample for this test? Select all that apply. (b) Write the hypotheses for the test. Use the difference "Fathers−Sons." (c) Calculate the test statistic and P-value. (d) Approximate the P-value for this hypothesis test. (e) Should the null hypothesis be rejected?
(a) The differences are normally distributed or the sample size is large. The sampling method results in a dependent sample. The sample size is no more than 5% of the population size. (b) H₀: µd=0 and H₁: µd<0 (c) STAT-->EDIT-->Xi into L1, Yi into L2, (L1-L2) into L3 STAT-->TESTS-->T-Test (Inpt=Data, µ₀=0, List=L3, Freq=1, <µ₀) t₀ = 0.01 (d) p=0.503 The P-value is in the range 0.25 ≤ P-value < 1. (e) Do not reject H₀ because the P-value is greater than the level of significance. There is not sufficient evidence to conclude that sons are taller than their fathers at the 0.10 level of significance.
In a survey of 2055 adults in a certain country conducted during a period of economic uncertainty, 65% thought that wages paid to workers in industry were too low. The margin of error was 2 percentage points with 95% confidence. For parts (a) through (d) below, which represent a reasonable interpretation of the survey results? For those that are not reasonable, explain the flaw. (a) We are 95% confident 65% of adults in the country during the period of economic uncertainty felt wages paid to workers in industry were too low. (b) We are 93% to 97% confident 65% of adults in the country during the period of economic uncertainty felt wages paid to workers in industry were too low. (c) We are 95% confident that the interval from 0.63 to 0.67 contains the true proportion of adults in the country during the period of economic uncertainty who believed wages paid to workers in industry were too low. (d) In 95% of samples of adults in the country during the period of economic uncertainty, the proportion who believed wages paid to workers in industry were too low is between 0.63 and 0.67.
(a) The interpretation is flawed. The interpretation provides no interval about the population proportion. (b) The interpretation is flawed. The interpretation indicates that the level of confidence is varying. (c) The interpretation is reasonable. (d) The interpretation is flawed. The interpretation suggests that this interval sets the standard for all the other intervals, which is not true.
According to a study, the proportion of people who are satisfied with the way things are going in their lives is 0.76. Suppose that a random sample of 100 people is obtained. (a) Suppose the random sample of 100 people is asked, "Are you satisfied with the way things are going in your life?" Is the response to this question qualitative or quantitative? Explain. (b) Explain why the sample proportion, ^p, is a random variable. What is the source of the variability? (c) Describe the sampling distribution of ^p, the proportion of people who are satisfied with the way things are going in their life. Be sure to verify the model requirements. (d) In the sample obtained in part (a), what is the probability that the proportion who are satisfied with the way things are going in their life exceeds 0.79? (e) Using the distribution from part (c), would it be unusual for a survey of 100 people to reveal that 70 or fewer people in the sample are satisfied with their lives?
(a) The response is qualitative because the responses can be classified based on the characteristic of being satisfied or not. (b) The sample proportion ^p is a random variable because the value of ^p varies from sample to sample. The variability is due to the fact that different people feel differently regarding their satisfaction. (c) np(1-p) = (100*0.76)*(1-0.76) = 18.240 µ^p = p = 0.76 σ^p = √(p*(1-p)/n) = √(0.76*(1-0.76)/100) = 0.0427 Since the sample size is no more than 5% of the population size and np(1−p)=18.240≥10, the distribution of ^p is approximately normal with μ^p=0.760 and σ^p=0.043. (d) normalcdf (0.79, 9999999, 0.76, 0.043) = 0.2427 (e) normalcdf (-9999999, 0.7, 0.76, 0.043) = 0.0815 The probability that 70 or fewer people in the sample are satisfied is 0.0815, which is not unusual because this probability is not less than 5%.
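The requirement check and the two probabilities can be sketched in Python. Note one assumption: this sketch keeps the unrounded standard error (about 0.0427) rather than the rounded 0.043 used in the calculator entries, so the probabilities differ slightly in the third decimal place from the values on the card.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.76, 100

requirement = n * p * (1 - p)            # must be at least 10
se = sqrt(p * (1 - p) / n)               # standard deviation of p-hat
p_hat_dist = NormalDist(p, se)

p_above_079 = 1 - p_hat_dist.cdf(0.79)   # part (d)
p_70_or_fewer = p_hat_dist.cdf(0.70)     # part (e)

print(round(requirement, 2), round(se, 4))
print(p_above_079, p_70_or_fewer)
```

Either way, the probability in part (e) stays above 5%, so the conclusion "not unusual" is unchanged.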
A researcher with the Department of Education followed a cohort of students who graduated from high school in a certain year, monitoring the progress the students made toward completing a bachelor's degree. One aspect of his research was to determine whether students who first attended community college took longer to attain a bachelor's degree than those who immediately attended and remained at a 4-year institution. The data in the table attached below summarize the results of his study. (a) What is the response variable in this study? What is the explanatory variable? (b) Explain why this study can be analyzed using inference of two sample means. Determine what qualifications are met to perform the hypothesis test about the difference between two means. (c) Does the evidence suggest that community college transfer students take longer to attain a bachelor's degree? Use an α=0.01 level of significance. Perform a hypothesis test. Determine the null and alternative hypotheses. (d) Determine the test statistic and P-value. (e) Should the hypothesis be rejected? (f) Construct a 95% confidence interval for μ(community college)−μ(no transfer) to approximate the mean additional time it takes to complete a bachelor's degree if you begin in community college. (g) Do the results of parts e) and f) imply that community college causes you to take extra time to earn a bachelor's degree?
(a) The response variable is the time to graduate. The explanatory variable is the use of community college or not. (b) The samples are independent. The samples can be reasonably assumed to be random. The sample sizes are not more than 5% of the population. The sample sizes are large (both greater than or equal to 30). (c) H₀: μ(community college)=μ(no transfer) and H₁: μ(community college)>μ(no transfer) (d) STAT-->TESTS-->2-SampTTest (Inpt=Stats, x1=5.44, Sx1=1.131, n1=253, x2=4.47, Sx2=1.006, n2=1122, >µ2, Pooled=No) t₀ = 12.57 p < 0.001 (e) Reject the null hypothesis. The evidence does suggest that community college transfer students take longer to attain a bachelor's degree at the α=0.01 level of significance. (f) STAT-->TESTS-->2-SampTInt (Inpt=Stats, x1=5.44, Sx1=1.131, n1=253, x2=4.47, Sx2=1.006, n2=1122, C-Level=0.95, Pooled=No) The confidence interval is the range from 0.818 to 1.122. (g) No.
A(n) _______ is any collection of outcomes from a probability experiment.
Event
The acceptable level for insect filth in a certain food item is 3 insect fragments (larvae, eggs, body parts, and so on) per 10 grams. A simple random sample of 40 ten-gram portions of the food item is obtained and results in a sample mean of overbar x=3.2 insect fragments per ten-gram portion. (a) Why is the sampling distribution of overbar x approximately normal? (b) What is the mean and standard deviation of the sampling distribution of overbar x assuming μ=3 and σ=√3? (c) What is the probability a simple random sample of 40 ten-gram portions of the food item results in a mean of at least 3.2 insect fragments? Is this result unusual? What might we conclude?
(a) The sampling distribution of overbar x is approximately normal because the sample size is large enough. (b) µ v overbar x = 3 σ v overbar x = (√3)/(√40) = 0.274 (c) P(overbar x≥3.2) normalcdf (3.2, 9999999, 3, 0.274) = 0.2327 This result is not unusual because its probability is large. Since this result is not unusual, it is not reasonable to conclude that the population mean is higher than 3.
A sociologist randomly selects 378 females 15 to 19 years old and asks each to disclose her family structure at age 14 and whether she has had sexual intercourse. (a) Compute the expected values of each cell under the assumption of independence. (b) Verify that the requirements for performing a chi-square test of independence are satisfied. Select all requirements for a chi-square test of independence. (c) Compute the chi-square test statistic and P-value. (d) Test whether family structure and sexual activity of 15- to 19-year-old females are independent at the α=0.1 level of significance. State the hypotheses. Choose the correct answer below. (e) Make the proper conclusion. (f) Compare the observed frequencies with the expected frequencies. Which cell contributed most to the test statistic? (g) Was the expected frequency greater than or less than the observed frequency? (h) What does this information tell you?
(a) To find the expected frequencies in a cell when performing a chi-square independence test, multiply the row total of the row containing the cell by the column total of the column containing the cell and divide this result by the table total, e.g. for both biological or adoptive parents: row total (Yes) = 31+43+65+61 = 200, column total = 31+17 = 48, table total = 200+178 = 378, so Yes = (200*48)/378 = 25.40 No = (178*48)/378 = 22.60 (b) No more than 20% of the expected frequencies are less than 5. All expected frequencies are greater than or equal to 1. (c) 2nd-->X-1-->EDIT-->[A]--> 2 x 4 matrix Enter given values in each cell. Calculator will automatically compute expected values for [B]. STAT-->TESTS-->χ²-Test (Observed: A, Expected: B, Calculate) χ² = 9.584 p = 0.022 (d) H₀: Sexual activity and family structure are independent. H₁: Sexual activity and family structure are dependent. (e) The P-value is less than α, so reject H₀. There is sufficient evidence at the α=0.1 level of significance to conclude that sexual activity and family structure are related. (f) The cell for females who did not have sexual intercourse and lived with a parent and a stepparent contributed most to the test statistic. (g) The observed frequency was greater than the expected frequency. (h) This means that family structure seems to have an effect on whether or not a child has had sexual intercourse.
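The expected-count rule in part (a) is short enough to sketch in Python. Only the totals quoted in the answer (row totals 200 and 178, column total 48, table total 378) are used, because the full table is not reproduced on the card; `expected_count` is a made-up helper name.

```python
def expected_count(row_total, col_total, grand_total):
    """Expected cell frequency under independence (hypothetical helper)."""
    return row_total * col_total / grand_total

# First column of the table: both biological or adoptive parents.
expected_yes = expected_count(200, 48, 378)
expected_no = expected_count(178, 48, 378)

print(round(expected_yes, 2), round(expected_no, 2))
```

The two expected counts add back up to the column total of 48, which is a quick sanity check on the arithmetic.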
A researcher studies water clarity at the same location in a lake on the same dates during the course of a year and repeats the measurements on the same dates 5 years later. The researcher immerses a weighted disk painted black and white and measures the depth (in inches) at which it is no longer visible. The collected data is given in the table below. (a) Why is it important to take the measurements on the same date? (b) Does the evidence suggest that the clarity of the lake is improving at the α=0.05 level of significance? Note that the normal probability plot and boxplot of the data indicate that the differences are approximately normally distributed with no outliers. Let di=Xi−Yi. Identify the null and alternative hypotheses. (c) Determine the test statistic and P-value for this hypothesis test. (d) What is your conclusion regarding H₀?
(a) Using the same dates makes the second sample dependent on the first and reduces variability in water clarity attributable to date. (b) H₀: µd=0 and H₁: µd<0 (c) STAT-->EDIT-->Xi into L1, Yi into L2, (L1-L2) into L3. STAT-->TESTS-->T-Test (Inpt=Data, µ₀=0, List=L3, Freq=1, <µ₀) t = -3.50 p = 0.009 (d) Reject H₀. There is sufficient evidence at the α=0.05 level of significance to conclude that the clarity of the lake is improving.
The mean waiting time at the drive-through of a fast-food restaurant from the time an order is placed to the time the order is received is 84.9 seconds. A manager devises a new drive-through system that he believes will decrease wait time. As a test, he initiates the new system at his restaurant and measures the wait time for 10 randomly selected orders. The wait times are provided in the table to the right. (a) Because the sample size is small, the manager must verify that the wait time is normally distributed and the sample does not contain any outliers. The normal probability plot is shown below and the sample correlation coefficient is known to be r=0.978. Are the conditions for testing the hypothesis satisfied? (b) Is the new system effective? Conduct a hypothesis test using the P-value approach and a level of significance of α=0.05. i. First determine the appropriate hypotheses. ii. Find the test statistic. iii. Find the P-value. (c) Use the α=0.05 level of significance. What can be concluded from the hypothesis test?
(a) Yes, the conditions are satisfied. The normal probability plot is linear enough, since the correlation coefficient is greater than the critical value. In addition, a boxplot does not show any outliers. (b) i. H₀: μ=84.9 and H₁: μ<84.9 ii. STAT-->TESTS-->T-Test (Stats, µ₀=84.9, overbar x=79, Sx=15.14, n=10, <µ₀) t₀ = -1.23 iii. STAT-->TESTS-->T-Test (Stats, µ₀=84.9, overbar x=79, Sx=15.14, n=10, <µ₀) p = 0.125 (c) The P-value is greater than the level of significance so there is not sufficient evidence to conclude the new system is effective.
Suppose that the probability that a passenger will miss a flight is 0.0945. Airlines do not like flights with empty seats, but it is also not desirable to have overbooked flights because passengers must be "bumped" from the flight. Suppose that an airplane has a seating capacity of 52 passengers. (a) If 54 tickets are sold, what is the probability that 53 or 54 passengers show up for the flight resulting in an overbooked flight? (b) Suppose that 58 tickets are sold. What is the probability that a passenger will have to be "bumped"? (c) For a plane with seating capacity of 60 passengers, how many tickets may be sold to keep the probability of a passenger being "bumped" below 5%?
(a) binompdf (54, [1-0.0945=0.9055], 53) + binompdf (54, 0.9055, 54) = 0.0312 (b) 1 - binomcdf (58, 0.9055, 52) = 0.5281 (c) binompdf (61, 0.9055, 61) = 0.0023 binompdf (62, 0.9055, 62) + binompdf (62, 0.9055, 61) = 0.0159 → still under 5% binompdf (63, 0.9055, 63) + binompdf (63, 0.9055, 62) + binompdf (63, 0.9055, 61) = 0.0555 → over 5%, so the maximum number is 62.
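The overbooking search in part (c) can be sketched as a loop in Python. This assumes, as the answer does, that passengers show up independently with probability 1 − 0.0945; the helper names `binom_pmf` and `prob_bumped` are illustrative, not calculator functions.

```python
from math import comb

def binom_pmf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p), the analogue of binompdf."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_bumped(tickets, capacity, p_show):
    """P(more passengers show up than there are seats)."""
    return sum(binom_pmf(tickets, p_show, k)
               for k in range(capacity + 1, tickets + 1))

p_show = 1 - 0.0945

# Part (a): 54 tickets sold, 52 seats.
part_a = prob_bumped(54, 52, p_show)

# Part (c): sell more tickets while the bump probability stays below 5%.
max_tickets = 60
while prob_bumped(max_tickets + 1, 60, p_show) < 0.05:
    max_tickets += 1

print(round(part_a, 4), max_tickets)
```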
The number of chocolate chips in a bag of chocolate chip cookies is approximately normally distributed with a mean of 1261 chips and a standard deviation of 117 chips. (a) Determine the 26th percentile for the number of chocolate chips in a bag. (b) Determine the number of chocolate chips in a bag that make up the middle 96% of bags. (c) What is the interquartile range of the number of chocolate chips in a bag of chocolate chip cookies?
(a) invNorm (0.26, 1261, 117) = 1186 (b) invNorm (0.02, 1261, 117) = 1021 invNorm (0.98, 1261, 117) = 1501 (c) invNorm (0.25, 1261, 117) = 1182 invNorm (0.75, 1261, 117) = 1340 1340 - 1182 = 158
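The invNorm lookups map directly onto `NormalDist.inv_cdf` in Python's standard library. A short sketch, with illustrative variable names:

```python
from statistics import NormalDist

chips = NormalDist(mu=1261, sigma=117)

p26 = chips.inv_cdf(0.26)                               # 26th percentile
middle_low, middle_high = chips.inv_cdf(0.02), chips.inv_cdf(0.98)
q1, q3 = chips.inv_cdf(0.25), chips.inv_cdf(0.75)
iqr = q3 - q1

print(round(p26), round(middle_low), round(middle_high), round(iqr))
```

The middle 96% leaves 2% in each tail, which is why the cutoffs are the 2nd and 98th percentiles.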
Determine the t-value in each of the cases. (a) Find the t-value such that the area in the right tail is 0.025 with 15 degrees of freedom. (b) Find the t-value such that the area in the right tail is 0.20 with 11 degrees of freedom. (c) Find the t-value such that the area left of the t-value is 0.02 with 14 degrees of freedom. (d) Find the critical t-value that corresponds to 90% confidence. Assume 11 degrees of freedom.
(a) invT (1-0.025, 15) = 2.131 (b) invT (1-0.20, 11) = 0.876 (c) invT (0.02, 14) = -2.264 (d) 1-0.90 = 0.10 0.10/2 = 0.05 per tail invT (1-0.05, 11) = 1.796
In a trial of 150 patients who received 10-mg doses of a drug daily, 27 reported headache as a side effect. (a) Are the requirements for constructing a confidence interval satisfied? (b) Construct and interpret a 95% confidence interval for the population proportion of patients who receive the drug and report a headache as a side effect.
(a) ^p = 27/150 = 0.18 n^p = 150 * 0.18 = 27 n(1-^p) = 150(1-0.18) = 123 Both 27 and 123 ≥ 10, so yes, the requirements for constructing a confidence interval are satisfied. (b) STAT-->TESTS-->1-PropZInt (27, 150, 0.95) = (0.11852, 0.24148) One can be 95% confident that the proportion of patients who receive the drug and report a headache as a side effect is between 0.119 and 0.241.
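Both the requirement check and the interval can be sketched in Python. The sample proportion here is 27/150 = 0.18; `one_prop_z_interval` is a made-up helper that uses the same normal-approximation formula as 1-PropZInt.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(x, n, conf=0.95):
    """Confidence interval for a proportion (hypothetical helper)."""
    p_hat = x / n
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

x, n = 27, 150
p_hat = x / n

# Requirement: at least 10 successes and at least 10 failures.
requirement_ok = n * p_hat >= 10 and n * (1 - p_hat) >= 10

lower, upper = one_prop_z_interval(x, n)
print(requirement_ok, round(lower, 3), round(upper, 3))
```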
Suppose the lengths of the pregnancies of a certain animal are approximately normally distributed with mean μ=206 days and standard deviation σ=10 days. (a) What is the probability that a randomly selected pregnancy lasts less than 203 days? Interpret this probability. (b) Suppose a random sample of 23 pregnancies is obtained. Describe the sampling distribution of the sample mean length of pregnancies. (c) What is the probability that a random sample of 23 pregnancies has a mean gestation period of 203 days or less? Interpret this probability. (d) What is the probability that a random sample of 50 pregnancies has a mean gestation period of 203 days or less? Interpret this probability. (e) What might you conclude if a random sample of 50 pregnancies resulted in a mean gestation period of 203 days or less? (f) What is the probability a random sample of size 19 will have a mean gestation period within 9 days of the mean?
(a) normalcdf (-9999999, 203, 206, 10) = 0.3821 If 100 pregnant individuals were selected independently from this population, we would expect 38 pregnancies to last less than 203 days. (b) n = 23, σ v overbar x = σ/√n = 10/√23 = 2.0851 The sampling distribution of overbar x is normal with μ v overbar x=206 and σ v overbar x=2.0851 (c) normalcdf (-9999999, 203, 206, 10/√23) = 0.0751 If 100 independent random samples of size n=23 pregnancies were obtained from this population, we would expect 8 sample(s) to have a sample mean of 203 days or less. (d) normalcdf (-9999999, 203, 206, 10/√50) = 0.0169 If 100 independent random samples of size n=50 pregnancies were obtained from this population, we would expect 2 sample(s) to have a sample mean of 203 days or less. (e) This result would be unusual, so the sample likely came from a population whose mean gestation period is less than 206 days. (f) Within 9 days of the mean = 197 to 215 days normalcdf (197, 215, 206, 10/√19) = 0.9999
In randomized, double-blind clinical trials of a new vaccine, monkeys were randomly divided into two groups. Subjects in group 1 received the new vaccine while subjects in group 2 received a control vaccine. After the second dose, 114 of 724 subjects in the experimental group (group 1) experienced fever as a side effect. After the second dose, 71 of 605 of the subjects in the control group (group 2) experienced fever as a side effect. Does the evidence suggest that a higher proportion of subjects in group 1 experienced fever as a side effect than subjects in group 2 at the α=0.05 level of significance? (a) Verify the model requirements. Select all that apply. (b) Determine the null and alternative hypotheses. (c) Find the test statistic and P-value for this hypothesis test. (d) Interpret the P-value. (e) State the conclusion for this hypothesis test.
(a) n₁p-hat₁(1−p-hat₁)≥10 and n₂p-hat₂(1−p-hat₂)≥10 The samples are independent. The sample size is less than 5% of the population size for each sample. (b) H₀: p₁ = p₂ and H₁: p₁ > p₂ (c) STAT-->TESTS-->2-PropZTest (x1=114, n1=724, x2=71, n2=605, >p2) z = 2.10 p=0.018 (d) If the population proportions are equal, one would expect a sample difference proportion greater than the one observed in about 18 out of 1000 repetitions of this experiment. (e) Reject H₀. There is sufficient evidence to conclude that a higher proportion of subjects in group 1 experienced fever as a side effect than subjects in group 2 at the α=0.05 level of significance.
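The 2-PropZTest result in part (c) can be reproduced with a pooled two-proportion z-test. This is a sketch assuming the usual pooled standard error under H₀: p₁ = p₂; `two_prop_z_test` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Right-tailed test of H0: p1 = p2 against H1: p1 > p2 (hypothetical helper)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                       # pooled estimate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 1 - NormalDist().cdf(z)

z, p_value = two_prop_z_test(114, 724, 71, 605)
print(round(z, 2), round(p_value, 3))
```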
Determine the point estimate of the population proportion, the margin of error for the following confidence interval, and the number of individuals in the sample with the specified characteristic, x, for the sample size provided. Lower bound=0.108 Upper bound=0.422 n=1200 (a) The point estimate of the population proportion is _______. (b) The margin of error is _______. (c) The number of individuals in the sample with the specified characteristic is _______.
(a) p = (0.108+0.422)/2 = 0.265 (b) 0.422-0.265 = 0.157 (c) p = x/n 0.265 = x/1200 x = 0.265 * 1200 = 318
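The same three steps can be written as a few lines of Python. A small sketch with illustrative variable names:

```python
lower, upper, n = 0.108, 0.422, 1200

point_estimate = (lower + upper) / 2     # midpoint of the interval
margin_of_error = (upper - lower) / 2    # half the interval width
x = round(point_estimate * n)            # count with the characteristic

print(round(point_estimate, 3), round(margin_of_error, 3), x)
```

Half the interval width gives the same margin of error as subtracting the point estimate from the upper bound.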
A cellular phone company monitors monthly phone usage. The following data represent the monthly phone use in minutes of one particular customer for the past 20 months. (a) Determine the standard deviation and interquartile range of the data. (b) Suppose the month in which the customer used 325 minutes was not actually that customer's phone. That particular month the customer did not use their phone at all, so 0 minutes were used. How does changing the observation from 325 to 0 affect the standard deviation and interquartile range? (c) What property does this illustrate?
(a) s = 59.83, IQR = 84.5 (b) The standard deviation increases and the interquartile range is not affected. (c) Resistance
Suppose X is a normal random variable with mean μ=65 and standard deviation σ=8. (a) Compute the z-value corresponding to X=55. (b) Suppose the area under the standard normal curve to the left of the z-value found in part (a) is 0.1056. What is the area under the normal curve to the left of X=55? (c) What is the area under the normal curve to the right of X=55?
(a) z = (x-µ)/σ = (55-65)/8 = -1.25 (b) The area to the left of X=55 is 0.1056 (c) 1 - 0.1056 = 0.8944
What is the probability of an event that is impossible?
0
According to an almanac, 80% of adult smokers started smoking before turning 18 years old. (a) Compute the mean and standard deviation of the random variable X, the number of smokers who started smoking before 18 based on a random sample of 300 adult smokers. (b) Interpret the mean. (c) Would it be unusual to observe 340 smokers who started smoking before turning 18 years old in a random sample of 400 adult smokers? Why?
(a) µx = (300 * 0.80) = 240 σx = √(300 * 0.80 * (1-0.80)) = 6.928 (b) It is expected that in a random sample of 300 adult smokers, 240 will have started smoking before turning 18. (c) Yes. For n=400, µx = 400 * 0.80 = 320 and σx = √(400 * 0.80 * 0.20) = 8, so 340 is greater than µ + 2σ = 336 (more than two standard deviations above the mean).
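The binomial mean and standard deviation, and the two-standard-deviation rule for "unusual", can be sketched in Python. `binomial_mean_sd` is a made-up helper; part (c) uses n = 400 as stated in the question.

```python
from math import sqrt

def binomial_mean_sd(n, p):
    """Mean and standard deviation of a Binomial(n, p) count (hypothetical helper)."""
    return n * p, sqrt(n * p * (1 - p))

mean_300, sd_300 = binomial_mean_sd(300, 0.80)   # part (a)
mean_400, sd_400 = binomial_mean_sd(400, 0.80)   # part (c)

# Unusual means more than two standard deviations from the mean.
is_unusual = abs(340 - mean_400) > 2 * sd_400

print(round(mean_300), round(sd_300, 3), is_unusual)
```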
A researcher wishes to estimate the proportion of adults who have high-speed Internet access. What size sample should be obtained if she wishes the estimate to be within 0.03 with 99% confidence if: (a) she uses a previous estimate of 0.58? (b) she does not use any prior estimates?
(a) α = 1-0.99 = 0.01 0.01/2 = 0.005 Critical value of 0.005 = 2.575 n = 0.58(1-0.58)*(2.575/0.03)^2 = 1795 (b) Use 0.5 for estimated value. n = 0.5(1-0.5)*(2.575/0.03)^2 = 1842
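The sample-size formula n = p(1 − p)(z/E)², rounded up, can be sketched in Python. One assumption to note: the sketch uses the table critical value 2.575 from the card; a more precise critical value (about 2.576) can change the rounded-up answer by one.

```python
from math import ceil

def sample_size_for_proportion(margin, z_crit, p_prior=0.5):
    """Required n for estimating a proportion (hypothetical helper).

    Without a prior estimate, p = 0.5 gives the largest (most conservative) n.
    """
    return ceil(p_prior * (1 - p_prior) * (z_crit / margin) ** 2)

Z_99 = 2.575  # table value used on the card, not a computed quantile

n_with_prior = sample_size_for_proportion(0.03, Z_99, p_prior=0.58)
n_no_prior = sample_size_for_proportion(0.03, Z_99)

print(n_with_prior, n_no_prior)
```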
To test H₀: μ=107 versus H₁: μ≠107 a simple random sample of size n=35 is obtained. (a) Does the population have to be normally distributed to test this hypothesis? Why? (b) If overbar x=103.9 and s=5.8, compute the test statistic. (c) Draw a t-distribution with the area that represents the P-value shaded. Choose the correct graph below. (d) Approximate the P-value. (e) Interpret the P-value. (f) If the researcher decides to test this hypothesis at the α=0.01 level of significance, will the researcher reject the null hypothesis?
(a) No, because n≥30. (b) STAT-->TESTS-->T-Test (Stats, 107, 103.9, 5.8, 35, ≠µ₀, Calculate) t₀ = -3.16 (c) Graph with two tails shaded. (d) STAT-->TESTS-->T-Test (Stats, 107, 103.9, 5.8, 35, ≠µ₀, Calculate) p = 0.003, so 0.002<P-value<0.005 (e) If 1000 random samples of size n=35 are obtained, about 3 samples are expected to result in a mean as extreme or more extreme than the one observed if μ=107. (f) Yes, because the P-value is less than the level of significance (α=0.01).
How do you calculate a z-score?
(value - mean) / standard deviation
Suppose you toss a coin 100 times and get 83 heads and 17 tails. Based on these results, what is the probability that the next flip results in a head?
0.83
When constructing 95% confidence intervals for the mean when the parent population is right skewed and the sample size is small, the proportion of intervals that include the population mean approaches _____ as the sample size, n, increases.
0.95
In a relative frequency distribution, what should the relative frequencies add up to?
1
What are the steps for finding quartiles?
1. Arrange the data in ascending order. 2. Determine the median (M) or second quartile (Q2). 3. Divide the data into two halves. Q1 is the median of the bottom half. Q3 is the median of the top half.
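The steps above can be sketched as follows. This assumes the common textbook convention that, for an odd number of observations, the median itself is left out of both halves; software packages often use other quartile rules and may give slightly different values. The data are made up.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)              # step 1: ascending order
    half = len(ordered) // 2
    q2 = median(ordered)                # step 2: the median is Q2
    bottom = ordered[:half]             # step 3: split into halves
    # For odd n, skip the middle observation when forming the top half
    top = ordered[half:] if len(ordered) % 2 == 0 else ordered[half + 1:]
    return median(bottom), q2, median(top)

q1, q2, q3 = quartiles([9, 2, 30, 5, 8, 4, 7])
print(q1, q2, q3)
```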
A quality-control manager randomly selects 100 bottles of mustard that were filled on July 25 to assess the calibration of the filling machine. 1. What is the population in the study? 2. What is the sample in the study?
1. All bottles of mustard produced in the plant on July 25. 2. The 100 bottles of mustard selected in the plant on July 25.
The linear correlation coefficient is always between _______ and _______, inclusive.
-1, 1
When constructing a 99% confidence interval, α = ______.
0.01
If the consequences of making a Type I error are severe, would you choose the level of significance, α, to equal 0.01, 0.05, or 0.10?
0.01 Choose a small α to make it difficult to reject H₀.
When constructing a 95% confidence interval, α = ______.
0.05
Suppose you flip a coin five times. What is the probability of obtaining five tails in a row assuming the coin is fair?
0.5⁵ = 0.03125
What are the steps for checking for outliers using quartiles?
1. Determine Q1 and Q3. 2. Compute the IQR. 3. Determine fences (cutoff points). Lower fence = Q1 - (1.5 * IQR). Upper fence = Q3 + (1.5 * IQR). 4. If data value is less than the lower fence or greater than the upper fence, it is considered an outlier.
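A minimal sketch of the fence check above. The function name and the data are hypothetical, and Q1 and Q3 are passed in directly so that any quartile convention can be used.

```python
def find_outliers(data, q1, q3):
    """Return the fences and the values that fall outside them."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

data = [2, 4, 5, 7, 8, 9, 30]  # made-up observations with Q1 = 4 and Q3 = 9
lower_fence, upper_fence, outliers = find_outliers(data, q1=4, q3=9)
print(lower_fence, upper_fence, outliers)
```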
The methods of statistics follow a process. Place the processes in the correct order.
1. Identify the research objective. 2. Collect the data needed to answer the research question(s). 3. Describe the data. 4. Perform inference.
Before interpreting a y-intercept, what two questions must be asked?
1. Is 0 a reasonable value for the explanatory variable (x)? 2. Do any observations near x=0 exist in the data set?
It is extremely important for a researcher to clearly define the variables in a study because this helps to determine the type of analysis that can be performed on the data. For example, if a researcher wanted to describe countries based on ISBN group identifier, what level of measurement would the variable "ISBN group identifier" be? Now suppose the researcher felt that certain countries who were farther north received higher identifying numbers. Does the level of measurement of the variable change? If so, how? 1. What is the level of measurement of the variable "ISBN group identifier" in the original scenario? 2. Does the level of measurement of the variable change in the second scenario?
1. Nominal 2. Yes, it changes to ordinal.
Match the form of bias to the scenario: 1. A survey mailed to residents of a town has a response rate of less than 2%. 2. A survey that pertains to feelings about federal income tax does not include high-income earners. 3. A survey of college students in which they are asked to disclose the number of hours they study. 4. A survey conducted by tax revenue collection agents of taxpayers to identify sources of fraudulent deductions.
1. Nonresponse Bias 2. Sampling Bias 3. Response Bias 4. Response Bias
A polling organization conducts a study to estimate the percentage of households that have their children in private schools. It mails a questionnaire to 1997 randomly selected households across the country and asks the head of each household if he or she has their children in private schools. Of the 1997 households selected, 40 responded. 1. Which of these best describes the bias in the survey? 2. How can the bias be remedied?
1. Nonresponse bias 2. The polling organization should try contacting households that do not respond by phone or face-to-face.
What are the three steps for determining the sampling distribution of the sample mean?
1. Obtain a simple random sample of size n. 2. Compute the sample mean. 3. Assuming that we are sampling from a finite population, repeat steps 1 and 2 until all distinct simple random samples of size n have been obtained.
An abortion rights advocate wants to estimate the percentage of people who favor opening abortion clinics. He conducts a nationwide survey of 1740 randomly selected adults 18 years and older. The interviewer asks the respondents, "Do you favor supporting women's rights by keeping abortion clinics open?" 1. Which of these best describes the bias in the survey? 2. How can the bias be remedied?
1. Response bias 2. The interviewer should reword the question.
The manager of a shopping mall wishes to expand the number of shops available in the food court. He has a market researcher survey the first 90 customers who come into the food court during weekend afternoons to determine what types of food the shoppers would like to see added to the food court. The survey has bias. Determine whether the flaw is due to the sampling method or the survey itself. For biased surveys, identify the cause of the error. 1. What is the cause of the bias? 2. Which of the following is the best way to remedy this problem?
1. Sampling bias 2. Ask customers throughout the day on both weekdays and weekends.
A health teacher wishes to do research on the weight of college students. She obtains the weights for all the students in her 9 A.M. class by looking at their driver's licenses or state IDs. 1. What is the type of bias? 2. What is a possible remedy?
1. Sampling bias 2. Use systematic random sampling
A medical journal published the results of an experiment on insomnia. The experiment investigated the effects of a controversial new therapy for insomnia. Researchers measured the insomnia levels of 74 adult women who suffer moderate conditions of the disorder. After the therapy, the researchers again measured the women's insomnia levels. The differences between the pre- and post-therapy insomnia levels were reported. 1. What are the experimental units? 2. What is the response variable? 3. What is the treatment? 4. How many levels does the treatment have in this experiment?
1. The 74 adult women who suffer from insomnia. 2. The differences between the pre- and post-therapy insomnia levels. 3. The therapy. 4. Two (pre- and post-therapy)
The area under a normal curve corresponding to a certain characteristic of the normal random variable may be interpreted in any of the following ways:
1. The area corresponds to the probability that a randomly selected individual from the population has the characteristic. 2. The area corresponds to the proportion of the population with the characteristic.
What are the requirements for constructing a confidence interval (t-interval) for a population mean µ?
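1. The data must come from a simple random sample or a randomized experiment. 2. The sample size is no more than 5% of the population size (n ≤ 0.05N). 3. The population must be normally distributed (with no outliers in the sample), or the sample size must be large (n ≥ 30).
What are the criteria for a binomial experiment?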
1. The experiment is performed a fixed number of times. Each repetition of the experiment is called a trial. 2. The trials are independent. The outcome of one trial will not affect the outcome of the other trials. 3. For each trial, there are two mutually exclusive (disjoint) outcomes: success or failure. 4. The probability of success is fixed for each trial of the experiment.
To determine if topiramate is an effective treatment for alcohol dependence, researchers conducted a 14-week trial of 371 men and women aged 18 to 65 years diagnosed with alcohol dependence. In this double-blind, randomized, placebo-controlled experiment, subjects were randomly given either 300 milligrams (mg) of topiramate (183 subjects) or a placebo (188 subjects) daily, along with a weekly compliance enhancement intervention. The variable used to determine the effectiveness of the treatment was self-reported percentage of heavy drinking days. Results indicated that topiramate was more effective than placebo at reducing the percentage of heavy drinking days. The researchers concluded that topiramate is a promising treatment for alcohol dependence. 1. What does it mean for the experiment to be placebo-controlled? 2. What does it mean for the experiment to be double-blind? 3. Why do you think it is necessary for the experiment to be double-blind? 4. What does it mean for the experiment to be randomized? 5. What is the population for which this study applies? What is the sample? 6. What are the treatments? 7. What is the response variable?
1. The experiment will have a control group that takes a placebo, which is an innocuous medication, such as a sugar tablet. This control group serves as a baseline treatment that can be used to compare to the group that is actually taking the medication. 2. Neither the subject nor the researcher knows which treatment the subject is receiving. 3. The experiment is double-blind so that the subjects receiving the medication do not behave differently and so the individual monitoring the subjects does not treat those receiving medication differently from those receiving a placebo. 4. It means that the subjects are randomly assigned to take either the topiramate or the placebo. 5. The population is all 18-65 year olds with alcohol dependence. The sample is 371 men and women aged 18 to 65 years diagnosed with alcohol dependence. 6. 300 mg of topiramate or a placebo daily, and a weekly compliance enhancement intervention. 7. Percentage of heavy drinking days.
State the seven properties of a normal curve.
1. The normal curve is symmetric about its mean, µ. 2. Because mean=median=mode, the normal curve has a single peak and the highest point occurs at x = µ. 3. The normal curve has inflection points at µ-σ and µ+σ. 4. The area under the normal curve is 1. 5. The area under the normal curve to the right of µ equals the area under the normal curve to the left of µ, which equals 1/2. 6. As x increases without bound (gets larger and larger), the graph approaches, but never reaches, the horizontal axis. As x decreases without bound (becomes more and more negative), the graph approaches, but never reaches, the horizontal axis. 7. The Empirical Rule: Approximately 68% of the area under the normal curve is between x = µ - σ and x = µ + σ, approximately 95% of the area is between x = µ - 2σ and x = µ + 2σ, and approximately 99.7% of the area is between x = µ - 3σ and x = µ + 3σ.
Parking at a large university has become a very big problem. University administrators are interested in determining the average parking time (e.g. the time it takes a student to find a parking spot) of its students. An administrator inconspicuously followed 130 students and carefully recorded their parking times. 1. Identify the population of interest to the university administration. 2. Identify the sample of interest to the university administration.
1. The parking times of the entire set of students that park at the university. 2. The parking times of the 130 students.
What are the criteria for testing hypotheses regarding the difference between two population means (dependent samples)?
1. The sample is obtained by simple random sampling or the data result from a matched-pairs design experiment. 2. The sample data are matched pairs (dependent). 3. The differences are normally distributed with no outliers or the sample size, n, is large (n ≥ 30). 4. The sampled values are independent of each other. That is, the sample size is no more than 5% of the population size (n ≤ 0.05N).
What three conditions must be satisfied before testing a hypothesis regarding population proportion p?
1. The sample is obtained by simple random sampling or the data result from a randomized experiment. 2. np₀(1 - p₀) ≥ 10, where p₀ is the proportion stated in the null hypothesis. 3. The sampled values are independent of each other. This means that the sample size is no more than 5% of the population size (n ≤ 0.05N).
What are the criteria to test the difference between two population proportions (independent sample)?
1. The samples are independently obtained using simple random sampling or the data result from a completely randomized experiment with two levels of treatment. 2. n₁p-hat₁(1 - p-hat₁) ≥10 and n₂p-hat₂(1 - p-hat₂) ≥10 3. The sampled values are independent of each other. This means that each sample size is no more than 5% of the population size (n₁ ≤ 0.05N₁ and n₂ ≤ 0.05N₂). This ensures the independence necessary for a binomial experiment.
What are the six properties of a t-distribution?
1. The t-distribution is different for different degrees of freedom. 2. The t-distribution is centered at 0 and is symmetric about 0. 3. The area under the curve is 1. The area under the curve to the right of 0 equals the area under the curve to the left of 0, which equals 1/2. 4. As t increases or decreases without bound, the graph approaches, but never equals, 0. 5. The area in the tails of the t-distribution is a little greater than the area in the tails of the standard normal distribution because we are using s as an estimate of σ, thereby introducing further variability into the t-statistic. 6. As the sample size n increases, the density curve of t gets closer to the standard normal density curve. This result occurs because as the sample size increases, the values of s get closer to the value of σ by the Law of Large Numbers. See the figure to the right, which shows the t-distribution for samples of size n=5 and n=15, along with the standard normal density curve.
The data on the right relate to characteristics of high-definition televisions A through E. Identify the individuals, variables, and data corresponding to the variables. Determine whether each variable is qualitative, continuous, or discrete. 1. What are the individuals being studied? 2. What are the variables and their corresponding data being studied? 3. Determine whether each variable is qualitative, continuous, or discrete.
1. The high-definition television setups A through E. 2. Size (57, 48, 55, 54, 60), screen type (Plasma, Projection, Projection, Projection, Plasma), and number of channels available (298, 108, 423, 270, 289) 3. Size is a continuous variable. Screen type is a qualitative variable. Number of channels available is a discrete variable.
What are the five steps for testing a hypothesis about a population mean, µ?
1. Two-tailed is H₀: µ = µ₀ and H₁: µ ≠ µ₀ Left-tailed is H₀: µ = µ₀ and H₁: µ < µ₀ Right-tailed is H₀: µ = µ₀ and H₁: µ > µ₀ 2. Select a level of significance, α, depending on the seriousness of making a Type I Error. 3. Use P-value and test statistic (1 population-mean) on calculator to obtain the P-value: a. STAT-->Edit-->Type data into L₁ (Freq: 1) b. Check for normality and outliers. c. STAT-->TEST-->T-Test Inpt: Stats if you have mean and standard deviation. µ₀ = null hypothesis mean overbar x = sample data mean Sx = sample standard deviation n = sample size µ: ≠, <, > 4. If p-value < α, reject the null hypothesis. If p-value > α, do not reject. 5. State the conclusion.
What are the 5 steps for testing a hypothesis regarding the difference between two population means (dependent samples)?
1. Two-tailed is H₀: µd = 0 and H₁: µd ≠ 0 Left-tailed is H₀: µd = 0 and H₁: µd < 0 Right-tailed is H₀: µd = 0 and H₁: µd > 0 2. Select a level of significance, α, depending on the seriousness of making a Type I Error. 3. Use the test statistic for the mean of the differences (matched pairs) on the calculator to obtain the P-value: a. STAT-->EDIT-->Type data into L1 and L2, and store the differences L1 - L2 in L3. b. STAT-->TESTS-->T-Test (Inpt: Data, µ₀: 0, List: L3, Freq: 1, µ: ≠, <, >) 4. If p-value < α, reject the null hypothesis. If p-value > α, do not reject. 5. State the conclusion.
What are the five steps for testing a hypothesis regarding the difference between two population proportions (independent sample)?
1. Two-tailed is H₀: p₁ = p₂ and H₁: p₁ ≠ p₂ Left-tailed is H₀: p₁ = p₂ and H₁: p₁ < p₂ Right-tailed is H₀: p₁ = p₂ and H₁: p₁ > p₂ 2. Select a level of significance, α, depending on the seriousness of making a Type I Error. 3. Use P-value and test statistic (2 populations-proportion) on calculator to obtain the P-value: a. STAT-->TESTS-->2-PropZTest (x1: # of successes in first sample, n1: size of first sample, x2: # of successes in second sample, n2: size of second sample, p1: ≠, <, >) 4. If p-value < α, reject the null hypothesis. If p-value > α, do not reject. 5. State the conclusion.
Find the z-scores that separate the middle 20% of the distribution from the area in the tails of the standard normal distribution.
100 - 20 = 80 80/2 = 40% in each tail invNorm (0.40, 0, 1) = -0.25 invNorm ([0.40 + 0.20 = 0.60], 0, 1) = 0.25
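The invNorm steps above can be reproduced with `statistics.NormalDist` from the Python standard library, whose `inv_cdf` method plays the role of invNorm here. This is a sketch of the same calculation, not a replacement for the calculator steps.

```python
from statistics import NormalDist

middle_area = 0.20
tail_area = (1 - middle_area) / 2  # 0.40 in each tail

standard_normal = NormalDist(mu=0, sigma=1)
z_low = standard_normal.inv_cdf(tail_area)                 # area 0.40 to the left
z_high = standard_normal.inv_cdf(tail_area + middle_area)  # area 0.60 to the left

print(round(z_low, 2), round(z_high, 2))
```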
What does it mean when the 10th percentile of the weight of males 36 months of age in a certain city is 13.0 kg?
10% of 36-month-old males weigh 13.0 kg or less, and 90% of 36-month-old males weigh more than 13.0 kg.
If a variable has a distribution that is bell-shaped with mean 22 and standard deviation 3, then according to the Empirical Rule, 99.7% of the data will lie between which values?
13 (22-3-3-3) and 31 (22+3+3+3)
The following is a summary of the violent-crime rate (violent crimes per 100,000 population) for all states of a country in a certain year. Q1 = 271.8 Q2 =387.4 Q3 =529.7 What do these numbers mean?
25% of the states have a violent-crime rate that is 271.8 crimes per 100,000 population or less. 50% of the states have a violent-crime rate that is 387.4 crimes per 100,000 population or less. 75% of the states have a violent-crime rate that is 529.7 crimes per 100,000 population or less.
According to the Empirical Rule, if a distribution is bell-shaped, then approximately _______ of the data will lie within 1 standard deviation of the mean; approximately _______ of the data will lie within 2 standard deviations of the mean; approximately _______ of the data will lie within 3 standard deviations of the mean.
68%, 95%, 99.7%
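The three percentages in the Empirical Rule are rounded areas under the normal curve. A quick sketch, using the standard normal distribution from the Python standard library, shows where they come from.

```python
from statistics import NormalDist

standard_normal = NormalDist()

# Area within k standard deviations of the mean, for k = 1, 2, 3
areas = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, area in areas.items():
    print(k, round(area, 4))
```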
When α = 0.003, we are constructing a ______ confidence interval.
99.7%
A researcher wants to show the mean from population 1 is less than the mean from population 2 in matched-pairs data. If the observations from sample 1 are Xi and the observations from sample 2 are Yi, and di=Xi−Yi, then the null hypothesis is H0: μd=0 and the alternative hypothesis is H1: μd ___ 0.
<
What is a closed question and an open question?
A closed question has fixed choices for answers, whereas an open question is a free-response question.
What is a confounding variable?
A confounding variable is an explanatory variable that was considered in a study whose effect cannot be distinguished from a second explanatory variable in the study.
What is a designed experiment?
A designed experiment is a study in which a researcher assigns individuals to a certain group, intentionally changes the value of an explanatory variable, and then records the value of the response variable for each group.
What is a binomial probability distribution?
A discrete probability distribution that describes probabilities for experiments in which there are two mutually exclusive (disjoint) qualitative outcomes. The two outcomes are generally referred to as success and failure. Experiments in which only two outcomes are possible are referred to as binomial experiments, provided that certain criteria are met.
What is a frame?
A frame is a list of the individuals in the population being studied.
What is an ogive?
A graph that represents the cumulative frequency or cumulative relative frequency for the class.
What is a lurking variable?
A lurking variable is an explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. In addition, lurking variables are typically related to explanatory variables in the study.
What is the linear correlation coefficient?
A measure of the strength and direction of the linear relationship between two variables.
Explain the difference between a population and a sample.
A population is the entire group that is being studied while a sample is a subset of the population that is being studied.
What is the sampling distribution of a statistic?
A probability distribution for all possible values of the statistic computed from a sample of size n.
What does it mean when an observational study is prospective?
A prospective study collects the data over time.
What does it mean when an observational study is retrospective?
A retrospective study requires that individuals look back in time or requires the researcher to look at existing records.
Explain the difference between an independent and dependent sample.
A sample is independent when an individual selected for one sample does not dictate which individual is to be in the second sample. A sample is dependent when an individual selected for one sample dictates which individual is to be in the second sample. Dependent samples are often referred to as matched-pairs samples.
Define simple random sampling.
A sample of size n from a population of size N is obtained through simple random sampling if every possible sample of size n has an equally likely chance of occurring. The sample is then called a simple random sample.
What is a factor?
A variable whose effect on the response variable is to be assessed by the experimenter.
Some have argued that throwing darts at the stock pages to decide which companies to invest in could be a successful stock-picking strategy. Suppose a researcher decides to test this theory and randomly chooses 100 companies to invest in. After 1 year, 53 of the companies were considered winners; that is, they outperformed other companies in the same investment class. To assess whether the dart-picking strategy resulted in a majority of winners, the researcher tested H₀: p=0.5 versus H₁: p>0.5 and obtained a P-value of 0.2743. Explain what this P-value means and write a conclusion for the researcher. (Assume α is 0.1 or less.)
About 27 in 100 samples will give a sample proportion as high or higher than the one obtained if the population proportion really is 0.5. Because the P-value is large, do not reject the null hypothesis. There is not sufficient evidence to conclude that the dart-picking strategy resulted in a majority of winners.
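The P-value quoted in the question is consistent with a one-proportion z-test using the normal approximation. The card does not say how the researcher computed it, so the sketch below is an assumption about the method rather than a record of it.

```python
import math
from statistics import NormalDist

n = 100        # companies chosen
winners = 53   # companies that outperformed
p_0 = 0.5      # proportion stated in the null hypothesis

p_hat = winners / n
z_0 = (p_hat - p_0) / math.sqrt(p_0 * (1 - p_0) / n)

# Right-tailed test: area under the standard normal curve to the right of z_0
p_value = 1 - NormalDist().cdf(z_0)

print(round(z_0, 2), round(p_value, 4))
```

Because the P-value is well above any α of 0.1 or less, the null hypothesis is not rejected.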
What can be said about a set of data with a standard deviation of 0?
All of the observations are the same value. If all observations have the same value, then that value will also be the mean of the data. Therefore, the sum of the squared differences from the mean will be 0, and the standard deviation will be 0.
Define the complement of an event E.
All of the outcomes in the sample space that are not outcomes in the event E.
What is the sampling distribution of the sample mean overbar x?
All possible values of the random variable computed from a sample size of n from a population with the mean µ and standard deviation σ.
The cumulative relative frequency for the last class must always be 1. Why?
All the observations are less than or equal to the last class.
What is an advantage that the standard deviation has over the interquartile range?
An advantage of the standard deviation is that it uses all the observations in its computation.
What does it mean for an event to be unusual?
An event is unusual if it has a low probability of occurring.
A confidence interval for an unknown parameter consists of what?
An interval of numbers based on a point estimate.
What is an observational study?
An observational study measures the value of the response variable without attempting to influence the value of either the response or explanatory variables.
What is a treatment?
Any combination of the values of the factors (explanatory variables).
Spread (σ sub-overbar x): As the sample size n increases, what happens to the standard deviation of the distribution of the sample mean?
As n increases, the standard deviation will decrease.
Explain the Law of Large Numbers. How does this law apply to gambling casinos?
As the number of repetitions of a probability experiment increases, the proportion with which a certain outcome is observed gets closer to the probability of the outcome. This applies to casinos because they are able to make a profit in the long run because they have a small statistical advantage in each game.
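A small simulation can illustrate the Law of Large Numbers. The sketch below flips a simulated fair coin and records the proportion of heads at a few checkpoints; the seed and checkpoint values are arbitrary choices made so the run is repeatable.

```python
import random

def running_proportion_of_heads(flips, seed=0):
    """Proportion of heads after 10, 100, ..., flips of a simulated fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    checkpoints = (10, 100, 1_000, 10_000, 100_000)
    heads = 0
    proportions = {}
    for i in range(1, flips + 1):
        if rng.random() < 0.5:  # treat this as "heads"
            heads += 1
        if i in checkpoints:
            proportions[i] = heads / i
    return proportions

proportions = running_proportion_of_heads(100_000)
for count, proportion in proportions.items():
    print(count, round(proportion, 4))
```

Early proportions can wander far from 0.5, while the later ones settle close to it, which is the long-run behavior a casino relies on.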
State the requirements to perform a goodness-of-fit test.
At least 80% of expected frequencies ≥ 5. All expected frequencies ≥ 1.
Explain what a P-value is. What is the criterion for rejecting the null hypothesis using the P-value approach?
A P-value is the probability of observing a sample statistic as extreme or more extreme than the one observed under the assumption that the statement in the null hypothesis is true. If P-value<α, reject the null hypothesis.
What is an experimental unit?
A person, object, or some other well-defined item upon which a treatment is applied.
When constructing 95% confidence intervals for the mean when the parent population is right skewed and the sample size is small, the proportion of intervals that include the population mean is _______ 0.95.
Below
Brad and Allison have three girls. Brad tells Allison that he would like one more child because they are due to have a boy. What do you think of Brad's logic?
Brad is incorrect due to the nonexistent Law of Averages. The fact that Brad and Allison had three girls in a row does not matter. The likelihood the next child will be a boy is about 0.5.
What is a case-control study?
Case-control studies are observational studies that are retrospective, meaning that they require individuals to look back in time or require the researcher to look at existing records.
The _________________ is the difference between consecutive lower class limits.
Class width
_______ are the categories by which data are grouped.
Classes
Why shouldn't classes overlap when summarizing continuous data in a frequency or relative frequency distribution?
Classes shouldn't overlap so there is no confusion as to which class an observation belongs.
What are the advantages and disadvantages of both closed and open questions?
Closed questions are easier to analyze, but limit the responses. Open questions allow respondents to state exactly how they feel, but are harder to analyze due to the variety of answers and possible misinterpretation of answers.
A travel industry researcher interviews all of the passengers on five randomly selected cruises. What sampling technique is used?
Cluster
To determine customer opinion of their check-in service, American Airlines randomly selects 90 flights during a certain week and surveys all passengers on the flights. What type of sampling is used?
Cluster
A(n) _______ is obtained by dividing the population into groups and selecting all individuals from within a random sample of the groups.
Cluster sample
What is meant by confounding?
Confounding in a study occurs when the effects of two or more explanatory variables are not separated. Therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable or variables not accounted for in the study.
An internet site asks its members to call in their opinion regarding their reluctance to provide credit information online. What type of sampling is used?
Convenience
A statistics student interviews everyone in his apartment building to determine who owns a cell phone. What sampling technique is used?
Convenience
In a recent online survey, participants were asked to answer "yes" or "no" to the question "Are you in favor of stricter gun control?" 6571 responded "yes" while 4237 responded "no". There was a fifty-cent charge for the call. What sampling technique was used?
Convenience
What is the value z v α/2 called?
Critical value. It represents the number of standard deviations the sample statistic can be from the parameter and still result in an interval that includes the parameter.
What is a cross-sectional study?
Cross-sectional studies are observational studies that collect information about individuals at a specific point in time or over a very short period of time.
Find the critical values χ²1−α/2 and χ²α/2 for a 99% confidence level and a sample size of n=20.
Degrees of freedom: 20-1 = 19 α = 1-0.99 = 0.01 α/2 = 0.01/2 = 0.005 1-α/2 = 0.995 Check the chi-square table for the critical values (area to the right). χ²1−α/2 = χ² with area 0.995 to the right and 19 degrees of freedom = 6.844 χ²α/2 = χ² with area 0.005 to the right and 19 degrees of freedom = 38.582
_______ statistics consists of organizing and summarizing information collected, while _______ statistics uses methods that generalize results obtained from a sample to the population and measure the reliability of the results.
Descriptive, inferential
Which (observational study or a designed experiment) allows the researcher to claim causation between an explanatory variable and a response variable?
Designed experiment
Discrete or continuous: Number of cars owned.
Discrete because it is countable.
Discrete or continuous: Number of words in a poem.
Discrete because it is countable.
How is margin of error (E) calculated?
E = (upper bound - lower bound)/2
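Both the margin of error and the point estimate can be recovered from the bounds of a confidence interval. The sketch below uses a hypothetical interval from 0.40 to 0.50; the function name is illustrative only.

```python
def summarize_interval(lower_bound, upper_bound):
    """Return (point estimate, margin of error) for a confidence interval."""
    point_estimate = (upper_bound + lower_bound) / 2
    margin_of_error = (upper_bound - lower_bound) / 2
    return point_estimate, margin_of_error

# Hypothetical confidence interval, chosen only for illustration
point_estimate, margin_of_error = summarize_interval(0.40, 0.50)
print(round(point_estimate, 3), round(margin_of_error, 3))
```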
What is the formula for the expected number of successes in a binomial experiment with n trials and probability of success p?
E(x) = np
There are two college entrance exams that are often taken by students, Exam A and Exam B. The composite score on Exam A is approximately normally distributed with mean 20.6 and standard deviation 5.2. The composite score on Exam B is approximately normally distributed with mean 1016 and standard deviation 209. Suppose you scored 23 on Exam A and 1243 on Exam B. Which exam did you score better on? Justify your reasoning using the normal model.
Exam A: normalcdf (-9999999, 27, 20.6, 5.2) = 0.91 Exam B: normalcdf (-9999999, 1243, 1016, 209) = 0.86 The score on Exam A is better, because the percentile for the Exam A score is higher.
In probability, a(n) ________ is any process that can be repeated in which the results are uncertain.
Experiment
In a scatter diagram, the _______ variable is plotted on the horizontal axis and the _______ variable is plotted on the vertical axis.
Explanatory, response
In statistical studies, researchers want to determine how varying one or more _______ variables may impact the value of a(n) _______ variable.
Explanatory, response
What does it mean if a statistic is resistant?
Extreme values (very large or small) relative to the data do not affect its value substantially.
T/F: When a factor is controlled by setting it to three levels, the particular factor is of no interest to the researcher.
False, because a factor that is controlled and set at various levels is a factor of interest to the researcher.
T/F: If r is close to 0, then little or no evidence exists of a relation between the two quantitative variables.
False. A value of r close to zero does not imply no relation, just no linear relation.
T/F: A data set will always have exactly one mode.
False. The mode of a variable is the most frequent observation of the variable that occurs in the data set. To compute the mode, tally the number of observations that occur for each data value. The data value that occurs most often is the mode. A set of data can have no mode, one mode, or more than one mode. If no observation occurs more than once, the data have no mode.
T/F: The standard deviation can be negative.
False. There is no way that the calculation of the population or sample standard deviation can produce a negative number. This makes intuitive sense because the standard deviation measures the spread of the data from the mean.
T/F: When two events are disjoint, they are also independent.
False. Two events are disjoint if they have no outcomes in common. In other words, the events are disjoint if, knowing that one of the events occurs, we know the other event did not occur. Independence means that one event occurring does not affect the probability of the other event occurring. Therefore, knowing two events are disjoint means that the events are not independent.
T/F: A 95% confidence interval may be interpreted by saying there is a 95% probability that the interval includes the unknown parameter.
False. A 95% confidence interval does not mean that there is a 95% probability that the interval contains the parameter. The 95% in a 95% confidence interval represents the proportion of all possible samples that will result in intervals that include the unknown parameter.
T/F: The standard deviation is a resistant measure of spread.
False. Since extreme values will increase the standard deviation greatly, the standard deviation cannot be a resistant measure of spread.
T/F: In statistics, results are always reported with 100% certainty.
False. In statistics, results are not reported with 100% certainty. Because statistical studies draw on samples, and because there is variation within groups, results cannot be reported with 100% certainty.
T/F: Statistical studies are not concerned with understanding the sources of variability in data, only with describing the variability in the data.
False. Statistical studies are concerned with both describing the variability in the data and understanding the sources of variability in data. Understanding the sources allows researchers to control the variability and reach better conclusions.
T/F: Stem-and-leaf plots are particularly useful for large sets of data.
False. Stem-and-leaf plots lose their usefulness when data sets are large or when they consist of a large range of values.
T/F: The chi-square distribution is symmetric.
False. The chi-square distribution is skewed to the right.
T/F: When obtaining a stratified sample, the number of individuals included within each stratum must be equal.
False. Within stratified samples, the number of individuals sampled from each stratum should be proportional to the size of the strata in the population.
What is the difference between y-coordinates for a frequency polygon and a frequency ogive?
Frequency polygon: Class frequencies Frequency ogive: Cumulative frequencies
What is the difference between x-coordinates for a frequency polygon and a frequency ogive?
Frequency polygon: Class midpoints Frequency ogive: Upper class limits
One graph in the figure represents a normal distribution with mean μ=8 and standard deviation σ=2. The other graph represents a normal distribution with mean μ=14 and standard deviation σ=2. Determine which graph is which and explain how you know.
Graph A has a mean of μ=8 and graph B has a mean of μ=14 because a larger mean shifts the graph to the right.
The U.S. Department of Housing and Urban Development (HUD) uses the median to report the average price of a home in the United States. Why do you think HUD uses the median?
HUD uses the median because the data are skewed to the right, and the median is better for skewed data.
According to a report, the standard deviation of monthly cell phone bills was $6.11 three years ago. A researcher suspects that the standard deviation of monthly cell phone bills is less today. Determine the null and alternative hypotheses.
H₀: σ = $6.11 H₁: σ < $6.11
How are upper and lower bounds calculated?
Lower bound: x̄ - t_(α/2) * (s/√n), or point estimate - margin of error. Upper bound: x̄ + t_(α/2) * (s/√n), or point estimate + margin of error.
If we do not reject the null hypothesis when the statement in the alternative hypothesis is true, we have made a Type _______ error.
II. A Type I error occurs if the null hypothesis is rejected when, in fact, the null hypothesis is true. A Type II error occurs if the null hypothesis is not rejected when, in fact, the alternative hypothesis is true.
What does "90% confidence" mean in a 90% confidence interval?
If 100 different confidence intervals are constructed, each based on a different sample of size n from the same population, then we expect 90 of the intervals to include the parameter and 10 to not include the parameter.
What does it mean when a residual is positive?
If it is positive, then the observed value is greater than the predicted value.
Explain the difference between a single-blind and a double-blind experiment.
In a single-blind experiment, the subject does not know which treatment is received. In a double-blind experiment, neither the subject nor the researcher in contact with the subject knows which treatment is received.
The _______ represents the expected proportion of intervals that will contain the parameter if a large number of different samples of size n is obtained. It is denoted _______.
Level of confidence, (1-α) * 100%
For a distribution that is skewed left, the left whisker is _______ the right whisker.
Longer than
The ______ class limit is the smallest value within the class and the ______ class limit is the largest value within the class.
Lower, upper
What is the difference between univariate data and bivariate data?
In univariate data, a single variable is measured on each individual. In bivariate data, two variables are measured on each individual.
A sampling method is _______ when an individual selected for one sample does not dictate which individual is to be in the second sample.
Independent
Two events E and F are ________ if the occurrence of event E in a probability experiment does not affect the probability of event F.
Independent
A(n) _________ is a person or object that is a member of the population being studied.
Individual
A continuous random variable has _______ values.
Infinitely many
Which of the following are resistant measures of dispersion?
Interquartile Range
Level of measurement: Years of elections: 1988, 1990, 1992, 1994, and 1996.
Interval
What is the shape of the distribution of the sample mean as the sample size increases?
It becomes approximately normal as the sample size, n, increases, regardless of the shape of the underlying population.
What is the shape of the distribution of the sample proportion as the sample size increases?
It becomes approximately normal.
Spread: As the sample size n increases, what happens to the standard deviation of the distribution of the sample proportion?
It decreases.
As the sample size n increases, what happens to the margin of error?
It decreases. If the sample size is quadrupled, the margin of error will be cut in half.
Center: As the sample size n increases, what does the mean of the sampling distribution of the sample proportion equal?
It equals the population proportion, p.
What does it mean to say that a continuous random variable is normally distributed?
It is bell-shaped.
Center: As the sample size n increases, what does the mean of the distribution of the sample mean, x̄, equal?
It is equal to the mean of the underlying population. (The center is not affected.)
What happens to a graph as the mean increases or decreases?
It shifts right or shifts left.
What happens to a graph as the standard deviation increases or decreases?
It widens and flattens out or narrows and the peak rises.
Two researchers, Jaime and Mariya, are each constructing confidence intervals for the proportion of a population who is left-handed. They find the point estimate is 0.26. Each independently constructed a confidence interval based on the point estimate, but Jaime's interval has a lower bound of 0.204 and an upper bound of 0.301, while Mariya's interval has a lower bound of 0.258 and an upper bound of 0.262. Which interval is wrong? Why?
Jaime's interval is wrong because it is not centered on the point estimate.
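The centering check behind this answer can be sketched in a few lines of Python. The function name `is_centered` and the tolerance are illustrative assumptions; only the bounds and the point estimate come from the problem.

```python
def is_centered(lower, upper, point_estimate, tol=1e-9):
    """Return True when the midpoint of the interval equals the point estimate."""
    midpoint = (lower + upper) / 2
    return abs(midpoint - point_estimate) <= tol

# Bounds and point estimate taken from the question above.
jaime_ok = is_centered(0.204, 0.301, 0.26)   # midpoint is 0.2525
mariya_ok = is_centered(0.258, 0.262, 0.26)  # midpoint is 0.26
print(jaime_ok, mariya_ok)
```

A confidence interval of the form point estimate ± margin of error must have the point estimate as its midpoint, so a failed check flags the interval as wrong.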
For a distribution that is skewed right, the median is _______ of the box.
Left of the center
The null and alternative hypotheses are given. Determine whether the hypothesis test is left-tailed, right-tailed, or two-tailed. What parameter is being tested? H₀: μ=2 H₁: μ<2
Left-tailed test. The parameter being tested is the population mean (µ).
Katrina wants to estimate the proportion of adults who read at least 10 books last year. To do so, she obtains a simple random sample of 100 adults and constructs a 95% confidence interval. Matthew also wants to estimate the proportion of adults who read at least 10 books last year. He obtains a simple random sample of 400 adults and constructs a 99% confidence interval. Assuming both Katrina and Matthew obtained the same point estimate, whose estimate will have the smaller margin of error? Justify your answer.
Matthew's estimate will have the smaller margin of error because the larger sample size more than compensates for the higher level of confidence.
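A rough numerical check of this comparison, assuming a shared point estimate of 0.5 (the problem does not give one, so this value is hypothetical) and the usual margin of error z_(α/2) * √(p̂(1-p̂)/n):

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence):
    """Margin of error for a one-proportion z-interval."""
    # Critical value z_(alpha/2) for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.5  # assumed common point estimate, not given in the problem
katrina = margin_of_error(p_hat, 100, 0.95)
matthew = margin_of_error(p_hat, 400, 0.99)
print(round(katrina, 3), round(matthew, 3))
```

Because both margins share the factor √(p̂(1-p̂)), the comparison comes down to z/√n, so the conclusion does not depend on the assumed p̂.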
What value is associated with the peak of a normal curve?
Mean
The standard deviation is used in conjunction with the ______ to numerically describe distributions that are bell shaped. The ______ measures the center of the distribution, while the standard deviation measures the ______ of the distribution.
Mean, mean, spread
List the formulas for the mean and standard deviation of the sampling distribution of x̄, and the condition for checking independence.
Mean: μ_x̄ = μ. Standard deviation: σ_x̄ = σ/√n. Independence: n ≤ 0.05N.
What are the formulas for the mean and standard deviation of the sampling distribution of p̂?
Mean: μ_p̂ = p. Standard deviation: σ_p̂ = √(p(1-p)/n).
What are the formulas for the mean (expected value) and standard deviation of a binomial random variable?
Mean: μ_X = np. Standard deviation: σ_X = √(np(1-p)).
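As a small illustration of these two formulas, the sketch below uses hypothetical values n = 100 and p = 0.2, which are not from the original.

```python
from math import sqrt

def binomial_mean_sd(n, p):
    """Mean and standard deviation of a binomial random variable."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return mean, sd

# Hypothetical values chosen only for illustration.
mean, sd = binomial_mean_sd(100, 0.2)
print(mean, sd)
```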
Which of the following are resistant measures of central tendency?
Median
What values make up the five-number summary?
Minimum value, Q1, Median, Q3, Maximum value
Which is the superior observational study between cross-sectional and case-control? Why?
Neither study is always superior to the other. Both have advantages and disadvantages that depend on the situation.
What does it mean if r=0?
No linear relationship exists between the variables.
What are the requirements for the chi-square test for independence?
No more than 20% of the expected counts are less than 5. All expected counts are greater than or equal to 1.
According to a center for disease control, the probability that a randomly selected person has hearing problems is 0.153. The probability that a randomly selected person has vision problems is 0.088. Can we compute the probability of randomly selecting a person who has hearing problems or vision problems by adding these probabilities? Why or why not?
No, because hearing and vision problems are not mutually exclusive. So, some people have both hearing and vision problems. These people would be included twice in the probability.
When the results of a hypothesis test are determined to be statistically significant, then we _______________ the null hypothesis.
Reject
Researchers wanted to know if there is a link between proximity to high-tension wires and the rate of leukemia in children. To conduct the study, researchers compared the rate of leukemia for children who lived within 1/2 mile of high-tension wires to the rate of leukemia for children who did not live within 1/2 mile of high-tension wires. The researchers found that the rate of leukemia for children near high-tension wires was higher than the rate for those not near high-tension wires. Can the researchers conclude that proximity with high-tension wires causes leukemia in children?
No, because this is an observational study.
A simple random sample of size n=31 is obtained from a population that is skewed left with μ=70 and σ=99. Does the population need to be normally distributed for the sampling distribution of x̄ to be approximately normally distributed? Why? What is the sampling distribution of x̄?
No. The central limit theorem states that regardless of the shape of the underlying population, the sampling distribution of x̄ becomes approximately normal as the sample size, n, increases. Here, x̄ is approximately normal with mean μ_x̄ = 70 and standard deviation σ_x̄ = 99/√31 ≈ 17.78.
Surveys tend to suffer from low response rates. Based on past experience, a researcher determines that the typical response rate for an e-mail survey is 40%. She wishes to obtain a sample of 300 respondents, so she e-mails the survey to 1500 randomly selected e-mail addresses. Assuming the response rate for her survey is 40%, will the respondents form an unbiased sample?
No. The survey still suffers from undercoverage (sampling bias), nonresponse bias, and potentially response bias.
Suppose that a probability is approximated to be zero based on empirical results. Does this mean that the event is impossible?
No. When a probability is based on an empirical experiment, a probability of zero does not mean that the event cannot occur. The probability of an event E is approximately the number of times event E is observed divided by the number of repetitions of the experiment. The fact that the event was not observed does not mean that the event is impossible.
Level of measurement: Favorite type of music.
Nominal
The _______ hypothesis, denoted H0, is a statement to be tested, and is a statement of no change, no effect, or no difference.
Null
A frequency distribution lists the _______ of occurrences of each category of data, while a relative frequency distribution lists the _______ of occurrences of each category of data.
Number, proportion
How do you calculate residual?
Observed y minus predicted y.
Which of the following scenarios could be analyzed using a randomization approach for two sample proportions?
Obtaining a random sample of Republicans and an independent random sample of Democrats. Ask each sample, "Do you approve or disapprove of the job the president of the United States is doing?" Obtaining 300 volunteers suffering from a skin rash and randomly dividing them into two groups. Group 1 receives an experimental drug once a week for 10 weeks; group 2 receives a placebo once a week for 10 weeks. After the ten weeks, it is determined whether the skin rash cleared up, or not.
What are some solutions to nonresponse?
Offer rewards and incentives, attempt callbacks.
What does it mean when sampling is done without replacement?
Once an individual is selected, the individual is removed from the possible choices for that sample and cannot be chosen again.
A trade magazine routinely checks the drive-through service times of fast-food restaurants. A 95% confidence interval that results from examining 579 customers in one fast-food chain's drive-through has a lower bound of 160.1 seconds and an upper bound of 163.1 seconds. What does this mean?
One can be 95% confident that the mean drive-through service time of this fast-food chain is between 160.1 seconds and 163.1 seconds.
What value is associated with the inflection points of a normal curve?
One standard deviation above or below the mean.
How do you calculate the probability that between 8 and 10 flights, inclusive, are on time?
P(8 ≤ X ≤ 10) = P(X ≤ 10) - P(X ≤ 7), where 7 = 8 - 1 because the lower endpoint 8 must be included.
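The cumulative-probability shortcut can be checked against a direct sum. The original does not state the number of flights or the on-time probability, so n = 15 and p = 0.8 below are hypothetical, as are the helper names.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for a binomial random variable."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 15, 0.8  # hypothetical values, not given in the original

# Shortcut: subtract everything at or below 7 so that 8 stays included.
via_cdf = binom_cdf(10, n, p) - binom_cdf(7, n, p)

# Direct sum over the outcomes 8, 9, and 10.
direct = sum(binom_pmf(k, n, p) for k in (8, 9, 10))
print(via_cdf, direct)
```

Subtracting P(X ≤ 8) instead would drop the outcome X = 8, which is the usual off-by-one mistake in "inclusive" problems.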
Suppose that events E and F are independent, P(E)=0.6, and P(F)=0.6. What is the P(E and F)?
P(E and F) = P(E) * P(F) = 0.6*0.6 = 0.36
If P(E)=0.50, P(E or F)=0.60, and P(E and F)=0.05, find P(F). If P(E)=0.60, P(E or F)=0.70, and P(E and F)=0.10, find P(F).
P(F) = P(E or F) - P(E) + P(E and F). First case: 0.60 - 0.50 + 0.05 = 0.15. Second case: 0.70 - 0.60 + 0.10 = 0.20.
Find the probability of the indicated event if P(E)=0.20 and P(F)=0.45: Find P(E or F) if P(E and F)=0.05.
P(E or F) = P(E) + P(F) - P(E and F) 0.20 + 0.45 - 0.05 = 0.60 P(E or F) = 0.60
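The General Addition Rule and its rearrangement for P(F) can be written as two short functions. The function names are illustrative; the numbers are the ones used in the questions above.

```python
def prob_union(p_e, p_f, p_both):
    """General Addition Rule: P(E or F) = P(E) + P(F) - P(E and F)."""
    return p_e + p_f - p_both

def solve_p_f(p_e, p_union, p_both):
    """Rearranged rule: P(F) = P(E or F) - P(E) + P(E and F)."""
    return p_union - p_e + p_both

union = prob_union(0.20, 0.45, 0.05)
p_f_first = solve_p_f(0.50, 0.60, 0.05)
p_f_second = solve_p_f(0.60, 0.70, 0.10)
print(round(union, 2), round(p_f_first, 2), round(p_f_second, 2))
```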
If E and F are disjoint events, then P(E or F) = _______.
P(E) + P(F)
What is the Complement Rule?
P(E^c) = 1 - P(E)
A(n) _______ is a numerical summary of a population.
Parameter
Determine whether the underlined value is a parameter or a statistic: Mark retired from competitive athletics last year. In his career as a sprinter he had competed in the 100-meters event a total of 328 times. His average time for these 328 races was 11 seconds.
Parameter
Determine whether the underlined value is a parameter or a statistic: 51.3% of the residents of a certain city are female.
Parameter
A ________ ________ is the value of a statistic that estimates the value of a parameter.
Point estimate
Suppose a polling agency reported that 45.0% of registered voters were in favor of raising income taxes to pay down the national debt. The agency states that results are based on telephone interviews with a random sample of 1017 registered voters. Suppose the agency states the margin of error for 90% confidence is 2.6%. Determine and interpret the confidence interval for the proportion of registered voters who are in favor of raising income taxes to pay down the national debt.
Point estimate ± margin of error: 0.45 - 0.026 = 0.424 0.45 + 0.026 = 0.476 We are 90% confident that the proportion of registered voters in favor of raising income taxes to pay down the national debt is between 0.424 and 0.476.
What is interquartile range (IQR)?
Q3 minus Q1. The range of the middle 50% of the observations.
Qualitative or quantitative: Country of residence.
Qualitative because it is an attribute characteristic.
What type of data is needed to construct a confidence interval for a population proportion p?
Qualitative data with two outcomes (success or failure).
For what type of variable does it make sense to construct a confidence interval about a population proportion?
Qualitative with 2 possible outcomes.
What type of variable is required when drawing a time-series plot?
Quantitative
Qualitative or quantitative: Miles per hour at which a car is traveling.
Quantitative because it is a numerical measure.
What type of data is needed to construct a confidence interval for a population mean µ?
Quantitative data
The word _______ suggests an unpredictable result or outcome.
Random
A _______ represents scenarios where the outcome of any particular trial of an experiment is unknown, but the proportion (or relative frequency) a particular outcome is observed approaches a specific value.
Random process
What are the requirements to perform a one-way ANOVA? Is the test robust?
Requirements: The k samples must be independent of each other; that is, the subjects in one group cannot be related in any way to subjects in a second group. The populations must have the same variance; that is, each treatment group has population variance σ². The populations must be normally distributed. There must be k simple random samples, one from each of k populations or a randomized experiment with k treatments. Is the test robust? Yes, small departures from the normality requirement do not significantly affect the results.
The manufacturer of hardness testing equipment uses steel-ball indenters to penetrate metal that is being tested. However, the manufacturer thinks it would be better to use a diamond indenter so that all types of metal can be tested. Because of differences between the two types of indenters, it is suspected that the two methods will produce different hardness readings. The metal specimens to be tested are large enough so that two indentions can be made. Therefore, the manufacturer uses both indenters on each specimen and compares the hardness readings. Construct a 95% confidence interval to judge whether the two indenters result in different measurements. Note: A normal probability plot and boxplot of the data indicate that the differences are approximately normally distributed with no outliers. (a) Construct a 95% confidence interval to judge whether the two indenters result in different measurements, where the differences are computed as 'diamond minus steel ball'. (b) State the appropriate conclusion.
(a) STAT-->EDIT-->Steel ball in L1, Diamond in L2, (L2-L1) in L3. STAT-->TESTS-->TInterval (Inpt=Data, List=L3, Freq=1, C-Level=0.95) Lower bound: 0.3 Upper bound: 3.1 (b) Because the interval does not contain 0, there is sufficient evidence to conclude that the two indenters produce different hardness readings.
A survey asked, "How many tattoos do you currently have on your body?" Of the 1239 males surveyed, 178 responded that they had at least one tattoo. Of the 1010 females surveyed, 133 responded that they had at least one tattoo. Construct a 99% confidence interval to judge whether the proportion of males that have at least one tattoo differs significantly from the proportion of females that have at least one tattoo. Interpret the interval.
STAT-->TESTS-->2-PropZInt (x1=178, n1=1239, x2=133, n2=1010, C-level=0.99) The lower bound is −0.026. The upper bound is 0.050. There is 99% confidence that the difference of the proportions is in the interval. Because the interval contains 0, conclude that there is insufficient evidence of a significant difference in the proportion of males and females that have at least one tattoo.
Construct a confidence interval for p₁−p₂ at the given level of confidence. x₁=29 n₁=254 x₂=39 n₂=288 90% confidence
STAT-->TESTS-->2-PropZInt (x1=29, n1=254, x2=39, n2=288, C-level=0.9) The researchers are 90% confident the difference between the two population proportions, p₁−p₂, is between −0.068 and 0.025.
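For readers without a TI calculator, the two-proportion z-interval can be approximated in Python. This is a minimal sketch of the standard formula (p̂₁ - p̂₂) ± z_(α/2) * √(p̂₁(1-p̂₁)/n₁ + p̂₂(1-p̂₂)/n₂); the function name is an assumption, and it is not claimed to match the calculator's internals.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_interval(x1, n1, x2, n2, confidence):
    """Confidence interval for p1 - p2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    # Critical value z_(alpha/2) for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Tattoo survey (99% confidence) and the x1=29, n1=254 problem (90% confidence).
tattoo = two_prop_z_interval(178, 1239, 133, 1010, 0.99)
second = two_prop_z_interval(29, 254, 39, 288, 0.90)
print([round(b, 3) for b in tattoo], [round(b, 3) for b in second])
```

Both intervals contain 0, which is why neither problem shows a significant difference between the two proportions.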
The _____ _____, denoted p̂, is given by the formula p̂=_____, where x is the number of individuals with a specified characteristic in a sample of n individuals.
Sample, Proportion, x/n
Suppose that a radio station predicted that Candidate A would defeat Candidate B in a certain election. They conducted a poll of home owners with a response rate of 25%. On the basis of the results, the radio station predicted that Candidate A would win with 57% of the popular vote. However, Candidate B won the election with about 62% of the popular vote. At the time of this poll, most home owners belonged to the party of Candidate A. Name two biases that led to this incorrect prediction.
Sampling bias: Using an incorrect frame led to undercoverage. Nonresponse bias: The low response rate caused bias.
How do you fill in an ANOVA table?
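Rows: Treatment, Error, Total. Degrees of freedom: k - 1 (treatment), n - k (error), n - 1 (total). Sum of squares: SS(total) = SS(treatment) + SS(error). Mean squares: MS(treatment) = SS(treatment)/(k - 1) and MS(error) = SS(error)/(n - k). F-test statistic: F = MS(treatment)/MS(error).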
Based on 12,500 responses from 37,000 questionnaires sent to all its members, a major medical association estimated that the annual salary of its members was $77,000 per year. What sampling technique was used?
Simple random
Sony wants to administer a satisfaction survey to its current customers. Using their customer database, the company randomly selects 80 customers and asks them about their level of satisfaction with the company. What type of sampling is used?
Simple random
_______ is a technique used to recreate a random event.
Simulation
An experiment in which the experimental unit (or subject) does not know which treatment he or she is receiving is called a ________________.
Single-blind experiment
For annual household incomes in a country, state whether you would expect a histogram of the data to be bell-shaped, uniform, skewed left, or skewed right.
Skewed right
Determine the area under the standard normal curve that lies to the left of: (a) Z = 0.81 (b) Z = 1.39 (c) Z = 0.01 (d) Z = 0.95
Standard normal curve always has µ = 0 and σ = 1 For area to the left: (a) normalcdf (-9999999, 0.81, 0, 1) = 0.7910 (b) normalcdf (-9999999, 1.39, 0, 1) = 0.9177 (c) normalcdf (-9999999, 0.01, 0, 1) = 0.5040 (d) normalcdf (-9999999, 0.95, 0, 1) = 0.8289
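The same left-tail areas can be obtained without a calculator. The sketch below uses Python's `statistics.NormalDist`, whose `cdf` method plays the role of normalcdf with a lower limit of negative infinity.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

z_values = [0.81, 1.39, 0.01, 0.95]
# cdf(z) is the area under the curve to the left of z.
areas = [round(standard_normal.cdf(z), 4) for z in z_values]
print(areas)
```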
A(n) _______ is a numerical summary of a sample.
Statistic
Determine whether the underlined value is a parameter or a statistic: Telephone interviews of 389 employees of a large electronics company found that 75% were dissatisfied with their working conditions.
Statistic
Determine whether the underlined numerical value is a parameter or a statistic: In a poll of a sample of 12,000 adults in a certain city, 12% said they left for work before 6 a.m.
Statistic, because the 12,000 adults polled are a sample of the adults in the city.
Explain what "statistical significance" means.
Statistical significance means that the result observed in a sample is unusual when the null hypothesis is assumed to be true.
Explain the difference between statistical significance and practical significance.
Statistical significance means that the sample statistic is not likely to come from the population whose parameter is stated in the null hypothesis. Practical significance refers to whether the difference between the sample statistic and the parameter stated in the null hypothesis is large enough to be considered important in an application.
A writer for an art magazine randomly selects and interviews fifty male and fifty female artists. What sampling technique is used?
Stratified
Thirty-five math majors, 48 music majors and 51 history majors are randomly selected from 222 math majors, 410 music majors and 252 history majors at the state university. What sampling technique is used?
Stratified
To determine her breathing rate, Samantha divides up her day into three parts: morning, afternoon, and evening. She then measures her breathing rate at 4 randomly selected times during each part of the day. What type of sampling is used?
Stratified
The closer r is to +1, the _______ the evidence is of _______ association between the two variables.
Stronger, positive
A t-distribution is _______.
Symmetric about 0.
To estimate the percentage of defects in a recent manufacturing batch, a quality control manager at Toyota selects every 15th car that comes off the assembly line starting with the ninth until she obtains a sample of 90 cars. What type of sampling is used?
Systematic
Why can the Empirical Rule be used to identify results in a binomial experiment?
The Empirical Rule can be used to identify results in binomial experiments when np(1−p)≥10, because the binomial distribution is then approximately bell-shaped.
Why should the cutoff for identifying unusual events not always be 0.05?
The choice of a cutoff should consider the context of the problem.
Consider the following question from a recent poll: Thinking about how the social security issue might affect your vote for major offices, would you vote only for a candidate who shares your views on social security or consider a candidate's position on social security as just one of many important factors? [rotated] Why is it important to rotate the two choices presented in the question?
The choices need to be rotated to minimize response biases.
Suppose you are reading an article online when the following text appeared in a popup window: "Would you be interested in participating in a short health-related survey? If you qualify and complete this survey, you will receive $1.00." What tactic is being used to increase the response rate for this survey?
The company is using a reward in the form of the $1.00 payment.
For the following, indicate whether a confidence interval for a proportion or mean should be constructed to estimate the variable of interest. A researcher with a golf association obtained a random sample of 25 rounds of golf on a Saturday morning and recorded the time it took to complete the round. The goal of the research was to estimate the amount of time it typically takes to complete a round of golf on Saturday morning.
The confidence interval for a population mean should be constructed because the variable of interest is time to complete the round, which is a quantitative variable.
Researchers within an organization asked a random sample of 1016 adults aged 21 years or older, "Right now, do you think the state of moral values in the country as a whole is getting better, or getting worse?"
The confidence interval for a proportion should be constructed because the variable of interest is an individual's opinion, which is a qualitative variable.
Assuming all model requirements for conducting the appropriate procedure have been satisfied, what proportion of registered voters is in favor of a tax increase to reduce the federal debt? Explain which statistical procedure would most likely be used for the research objective given.
The correct procedure is a confidence interval for a single proportion. The goal is to determine the proportion of the population that favors a tax increase. There is no comparison being made and there is only one population, so rather than hypothesis testing, it is appropriate to use a confidence interval.
Assuming all model requirements for conducting the appropriate procedure have been satisfied, is the mean IQ of the students in the professor's statistics class higher than that of the general population, 100? Explain what statistical procedure should be used for this research objective.
The correct procedure is a hypothesis test for a single mean. The comparison is between the mean IQ of the class and the national average IQ. The class is a sample of the population, so it is not a comparison between two population means. The objective is to find whether the sample mean is higher than the population mean, so it is a hypothesis test and not a confidence interval.
What is a residual?
The difference between an observed value of the response variable y and the predicted value of y.
Explain the differences between the chi-square test for independence and the chi-square test for homogeneity. What are the similarities?
The difference is that the chi-square test for independence compares two characteristics from one population and the chi-square test for homogeneity compares one characteristic from more than one population. Similarities: The assumptions are the same. The procedures are the same.
Suppose a simple random sample of size n is obtained from a population whose distribution is skewed right. As the sample size n increases, what happens to the shape of the distribution of the sample mean?
The distribution becomes approximately normal. According to the Central Limit Theorem, as the sample size increases, the distribution of the sample mean becomes approximately normal, even if the underlying population is not normally distributed. Typically, sample sizes of 30 or greater are recommended.
What is confounding?
The effects of two factors (explanatory variables) on the response variable cannot be distinguished.
Describe the difference between classical and empirical probability.
The empirical method obtains an approximate empirical probability of an event by conducting a probability experiment. The classical method of computing probabilities does not require that a probability experiment actually be performed. Rather, it relies on counting techniques, and requires equally likely outcomes.
What is sampling error?
The error that results because a sample is being used to estimate information about a population.
What is nonsampling error?
The error that results from undercoverage, nonresponse bias, response bias, or data-entry errors.
Suppose there are n independent trials of an experiment with k>3 mutually exclusive outcomes, where pᵢ represents the probability of observing the i-th outcome. What would be the formula of an expected count in this situation?
The expected counts for each possible outcome are given by Eᵢ = npᵢ.
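A short illustration of the expected-count formula, using a hypothetical experiment with n = 200 trials and four outcomes (the probabilities below are made up for the example):

```python
def expected_counts(n, probabilities):
    """Expected count for each outcome: E_i = n * p_i."""
    return [n * p for p in probabilities]

# Hypothetical experiment: 200 trials, k = 4 mutually exclusive outcomes.
probs = [0.1, 0.2, 0.3, 0.4]
counts = expected_counts(200, probs)
print(counts)
```

Since the probabilities sum to 1, the expected counts sum to n.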
What is a randomized block design?
The experimental units are divided into homogeneous groups called blocks. Within each block, the experimental units are randomly assigned to treatments.
What is a matched-pairs design?
The experimental units are paired up. The pairs are selected so that they are related in some way (that is, the same person before and after a treatment, twins, husband and wife, same geographical location, and so on).
A study was conducted that resulted in the following relative frequency histogram. Determine whether or not the histogram indicates that a normal distribution could be used as a model for the variable.
The histogram is not bell-shaped, so a normal distribution could not be used as a model for the variable.
Explain the circumstances for which the interquartile range is the preferred measure of dispersion.
The interquartile range is preferred when the data are skewed or have outliers.
Which of the following is true about the least squares regression line, y = b₁x + b₀?
The least-squares regression line always contains the point (x̄, ȳ). The predicted value of y, ŷ, is an estimate of the mean value of the response variable for that particular value of the explanatory variable. The least-squares regression line minimizes the sum of squared residuals. The sign of the linear correlation coefficient, r, and the sign of the slope of the least-squares regression line, b₁, are the same.
What is the least-squares regression line?
The line of best fit; it minimizes the sum of the squared residuals.
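To make the definition concrete, the sketch below fits a least-squares line to a small hypothetical data set (the x and y values are invented for illustration) and computes the residuals as observed y minus predicted y.

```python
def least_squares(xs, ys):
    """Return (slope b1, intercept b0) of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # forces the line through (x_bar, y_bar)
    return b1, b0

# Hypothetical data, chosen only to illustrate the calculation.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b1, b0 = least_squares(xs, ys)

# Residual = observed y - predicted y.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(b1, b0, residuals)
```

A positive residual marks a point lying above the line, and the residuals from a least-squares fit with an intercept sum to zero.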
A group conducted a poll of 2049 likely voters just prior to an election. The results of the survey indicated that candidate A would receive 45% of the popular vote and candidate B would receive 44% of the popular vote. The margin of error was reported to be 2%. The group reported that the race was too close to call. Use the concept of a confidence interval to explain what this means.
The margin of error suggests candidate A may receive between 43% and 47% of the popular vote and candidate B may receive between 42% and 46% of the popular vote. Because the poll estimates overlap when accounting for margin of error, the poll cannot predict the winner.
Why is the median resistant, but the mean is not?
The mean is not resistant because when data are skewed, there are extreme values in the tail, which tend to pull the mean in the direction of the tail. The median is resistant because the median of a variable is the value that lies in the middle of the data when arranged in ascending order and does not depend on the extreme values of the data.
A histogram of a set of data indicates that the distribution of the data is skewed right. Which measure of central tendency will likely be larger, the mean or the median? Why?
The mean will likely be larger because the extreme values in the right tail tend to pull the mean in the direction of the tail.
The mean incubation time of fertilized eggs is 22 days. Suppose the incubation times are approximately normally distributed with a standard deviation of 1 day. Determine the incubation times that make up the middle 95%.
The middle 95% is within two standard deviations above or below the mean. 22-1-1 = 20 and 22+1+1 = 24.
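The two-standard-deviation interval can be checked with Python's standard library. This is only a sketch: the Empirical Rule's 95% is an approximation, and the exact normal coverage of mean ± 2 SD is slightly higher.

```python
from statistics import NormalDist

mean, sd = 22, 1  # incubation time in days, from the question

# Empirical Rule: about 95% of values lie within 2 standard deviations.
lower = mean - 2 * sd
upper = mean + 2 * sd

# Exact area under the normal curve between the two bounds.
incubation = NormalDist(mean, sd)
coverage = incubation.cdf(upper) - incubation.cdf(lower)
print(lower, upper, round(coverage, 4))
```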
The level of confidence represents the expected proportion of intervals that will contain what?
The parameter if a large number of different samples is obtained.
What is the point estimate for the population mean?
The point estimate for the population mean, µ, is the sample mean, x̄.
What are inflection points?
The points where the curvature of the graph changes.
In a certain card game, the probability that a player is dealt a particular hand is 0.47. Explain what this probability means. If you play this card game 100 times, will you be dealt this hand exactly 47 times? Why or why not?
The probability 0.47 means that approximately 47 out of every 100 dealt hands will be that particular hand. No, you will not be dealt this hand exactly 47 times since the probability refers to what is expected in the long-term, not short-term.
What does the area under the graph of a probability density function over an interval represent?
The probability of observing a value of the random variable in that interval.
What does a cumulative relative frequency distribution display?
The proportion (or percentage) of observations less than or equal to the class.
What is a response variable?
The quantitative or qualitative variable for which the experimenter wishes to determine how its value is affected by the explanatory variable.
For a distribution that is symmetric, the left whisker is _______ the right whisker.
The same length as
Determine whether the following sampling is dependent or independent. Indicate whether the response variable is qualitative or quantitative. A researcher wishes to compare academic aptitudes of pharmacists and non-pharmacists. She obtains a random sample of 288 professionals of each category who take an academic aptitude test and determines each individual's academic aptitude.
The sampling is independent because an individual selected for one sample does not dictate which individual is to be in the second sample. The variable is quantitative because academic aptitude, as measured by a test score, is a numerical measure.
Why should correlations always be reported with scatter diagrams?
The scatter diagram is needed to see if the correlation coefficient is being affected by the presence of outliers.
As the sample size n increases, what happens to the standard error of the mean?
The standard error of the mean decreases.
A poll is conducted by a school's English department in which tenth grade students are asked if they prefer to be in their English class or their science class. Choose the correct description of the study.
The study is an observational study because the study examines individuals in a sample, but does not try to influence the response variable.
A study is conducted to determine if there is a relationship between lung capacity and proximity to coal mines. A sample of 100 people living within 1 mile of a coal mine is collected and their lung capacity measured. Does the description correspond to an observational study or an experiment?
The study is an observational study because the study examines individuals in a sample, but does not try to influence the response variable.
Determine whether the underlined value is a parameter or a statistic: The average age of men who had walked on the moon was 39 years, 11 months, 15 days.
The value is a parameter because the men who had walked on the moon are a population.
Determine whether the underlined value is a parameter or a statistic: In a championship football game, a quarterback completed 59% of his passes for a total of 265 yards and 2 touchdowns.
The value is a parameter because the quarterback's passes are a population.
Whether a confidence interval contains the population parameter depends solely on what?
The value of the sample statistic. Any sample statistic that is in the tails of the sampling distribution will result in a confidence interval that does not include the population parameter.
What does it mean to say that two variables are negatively associated?
There is a linear relationship between the variables, and whenever the value of one variable increases, the value of the other variable decreases.
What does it mean to say that two variables are positively associated?
There is a linear relationship between the variables, and whenever the value of one variable increases, the value of the other variable increases.
Suppose the null hypothesis is not rejected. State the conclusion based on the results of the test. Six years ago, 11.9% of registered births were to teenage mothers. A sociologist believes that the percentage has increased since then.
There is not sufficient evidence to conclude that the percentage of teenage mothers has increased.
State the conclusion based on the results of the test. According to the report, the standard deviation of monthly cell phone bills was $48.58 three years ago. A researcher suspects that the standard deviation of monthly cell phone bills is higher today. The null hypothesis is not rejected.
There is not sufficient evidence to conclude that the standard deviation of monthly cell phone bills is higher than its level three years ago of $48.58.
Explain why the t-distribution has less spread as the number of degrees of freedom increases.
The t-distribution has less spread as the degrees of freedom increase because, as n increases, s becomes closer to σ by the law of large numbers. As the sample size n increases, the density curve of t gets closer to the standard normal density curve. The variability introduced into the t-statistic becomes less.
Determine if the following probability experiment represents a binomial experiment. If not, explain why. If the probability experiment is a binomial experiment, state the number of trials, n. Four cards are selected from a standard 52-card deck without replacement. The number of nines selected is recorded.
This is not a binomial experiment because the trials of the experiment are not independent since the probability of success differs from trial to trial.
Determine if the following probability experiment represents a binomial experiment. If not, explain why. If the probability experiment is a binomial experiment, state the number of trials, n. A random sample of 25 professional athletes is obtained, and the individuals selected are asked to state their hair length.
This is not a binomial experiment because there are more than two mutually exclusive outcomes for each trial.
Why is the following not a probability model?
This is not a probability model because at least one probability is less than 0.
Why do we draw time-series plots?
Time-series plots are used to identify trends in the data over time.
T/F: Randomization is used so that those factors not controlled in the experiment "average out" their effect on the response variable.
True
T/F: Generally, the goal of an experiment is to determine the effect that the treatment will have on the response variable.
True. An experiment is defined as a controlled study conducted to determine the effect varying one or more factors has on a response variable, and any combination of the values of the factors is called a treatment.
T/F: The expected frequencies in a chi-square test for independence are found using the formula below. Expected frequency=(row total * column total) / (table total)
True. It is a simplification of multiplying the proportion of a row variable by the proportion of the column variable to find the proportion for a cell, then multiplying by the table total.
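A short sketch of the expected-frequency formula applied to a whole contingency table. The 2×2 observed counts are made up for illustration.

```python
def expected_frequencies(observed):
    """Expected count per cell: (row total * column total) / table total."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in column_totals] for r in row_totals]


# Hypothetical observed counts (rows = one variable, columns = the other).
observed = [[10, 20],
            [30, 40]]
expected = expected_frequencies(observed)
print(expected)
```

The expected table keeps the same row and column totals as the observed table.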
T/F: Suppose three different individuals conduct the same statistical study, such as estimating the average commute time of students at a college. It is possible that all three studies end up with different results?
True. Statistical studies typically look at samples rather than entire populations. Since each study is likely to draw different samples, it is quite possible that each study ends up with different results, due to variability in the data.
T/F: The mean of the sampling distribution of p̂ is p.
True. The mean of the sampling distribution of the sample proportion equals the population proportion, p.
_________ are the characteristics of the individuals of the population being studied.
Variables
What does it mean to say that we should not use the regression model to make predictions outside the scope of the model?
We should not use the regression model to make predictions for values of the explanatory variable (x) that are much larger or smaller than those observed.
When can the Empirical Rule be used to identify unusual results in a binomial experiment?
When the binomial distribution is approximately bell shaped, about 95% of the outcomes will be in the interval from μ−2σ to μ+2σ.
What is a chi-square test for independence used to determine?
Whether or not there is an association between a row variable and a column variable in a contingency table constructed from sample data. H₀: Variables are independent. H₁: Variables are dependent.
With a probability of 0.026, would you consider it unusual to find a college student who never wears a seat belt when riding in a car driven by someone else?
Yes, because P(never)<0.05.
Determine whether the distribution is a discrete probability distribution.
Yes, because the sum of the probabilities is equal to 1 and each probability is between 0 and 1, inclusive.
The head of institutional research at a university believed that the mean age of full-time students was declining. In 1995, the mean age of a full-time student was known to be 27.4 years. After looking at the enrollment records of all 4934 full-time students in the current semester, he found that the mean age was 27.1 years, with a standard deviation of 7.3 years. He conducted a hypothesis of H₀: μ=27.4 years versus H₁: μ<27.4 years and obtained a P-value of 0.0020. He concluded that the mean age of full-time students did decline. Is there anything wrong with his research?
Yes. The head of institutional research has access to the entire population, so inference is unnecessary. He can say with 100% confidence that the mean age has decreased. There is no need for inferential claims when the population parameters have been determined. The average age of students in the year the study is being conducted is less than the average age of students in 1995.
The _______ represents the number of standard deviations an observation is from the mean.
Z-score
The sum of the deviations about the mean always equals _______.
Zero
In a trial of 163 patients who received 10-mg doses of a drug daily, 39 reported headache as a side effect. Obtain a point estimate for the population proportion of patients who received 10-mg doses of a drug daily and reported headache as a side effect.
p̂ = 39/163 = 0.24
In a survey of 1019 adults, a polling agency asked, "When you retire, do you think you will have enough money to live comfortably or not?" Of the 1019 surveyed, 534 stated that they were worried about having enough money to live comfortably in retirement. Construct a 99% confidence interval for the proportion of adults who are worried about having enough money to live comfortably in retirement.
p̂ = 534/1019 = 0.524. α = 1-0.99 = 0.01, so α/2 = 0.005. Critical value for 99% = 2.576. Lower Bound: 0.524-(2.576*√(0.524(1-0.524)/1019)) = 0.524-0.04030152 = 0.484. Upper Bound: 0.524+(2.576*√(0.524(1-0.524)/1019)) = 0.524+0.04030152 = 0.564. There is 99% confidence that the true proportion of worried adults is between 0.484 and 0.564.
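The interval above can be reproduced with a few lines of Python. This is a sketch using the standard library's NormalDist in place of a calculator's invNorm; the function name is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Confidence interval for a population proportion (normal approximation)."""
    p_hat = successes / n
    # Critical value z_(alpha/2): the area to its left is 1 - alpha/2.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


lower, upper = proportion_ci(534, 1019, 0.99)
print(round(lower, 3), round(upper, 3))
```

The critical value here is computed rather than read from a table, so it comes out as about 2.576. To three decimals the bounds match the hand calculation.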
Suppose a random sample of n=320 teenagers 13 to 17 years of age was asked if they use social media. Of those surveyed, 243 stated that they do use social media. Find the sample proportion of teenagers 13 to 17 years of age who use social media.
p̂ = x/n = 243/320 = 0.759
About 18% of the population of a large country is nervous around strangers. a) If two people are randomly selected, what is the probability both are nervous around strangers? b) What is the probability at least one is nervous around strangers?
a) 0.18*0.18 = 0.0324 b) P(neither is nervous) = (1-0.18)² = 0.82*0.82 = 0.6724, so P(at least one) = 1-0.6724 = 0.3276
Suppose Ari wins 44% of all bingo games. (a) What is the probability that Ari wins two bingo games in a row? (b) What is the probability that Ari wins six bingo games in a row? (c) When events are independent, their complements are independent as well. Use this result to determine the probability that Ari wins six bingo games in a row, but does not win seven in a row.
a) 0.44² = 0.1936 b) 0.44⁶ = 0.0073 c) 0.44⁶ * (1-0.44) = 0.0041
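The multiplication rule for independent events used in these answers can be sketched as follows; the variable names are chosen for illustration.

```python
p_win = 0.44  # probability Ari wins any single game

# Independent events: multiply the individual probabilities.
two_in_a_row = p_win ** 2
six_in_a_row = p_win ** 6

# Six wins followed by a loss: the complement (a loss) is also independent.
six_but_not_seven = p_win ** 6 * (1 - p_win)

print(round(two_in_a_row, 4), round(six_in_a_row, 4), round(six_but_not_seven, 4))
```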
Under what condition is the shape of the sampling distribution of p̂ approximately normal?
n*p(1-p) ≥ 10
A student entering a doctoral program in educational psychology is required to select two courses from the list of courses provided as part of his or her program. a) List all possible two-course selections. b) Comment on the likelihood that EPR 668 and EPR 695 will be selected.
a) 668 and 640; 683 and 640; 666 and 683; 668 and 695; 666 and 695; 666 and 668; 683 and 695; 683 and 668; 640 and 695; 666 and 640 b) There is a 1 in 10 chance that these courses will be selected.
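The list of pairs can be generated rather than written by hand. In this sketch the five course numbers are inferred from the answer above (the original course table is not reproduced here), and the "EPR" prefix on every course is an assumption.

```python
from itertools import combinations

# Course list inferred from the answer; treat it as an assumption.
courses = ["EPR 640", "EPR 666", "EPR 668", "EPR 683", "EPR 695"]

# Every unordered selection of two courses.
pairs = list(combinations(courses, 2))

# If each pair is equally likely, any specific pair has probability 1/len(pairs).
probability = 1 / len(pairs)
print(len(pairs), probability)
```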
Determine whether the events E and F are independent or dependent. Justify your answer. a) E: A person having an at-fault accident. F: The same person being prone to road rage. b) E: A randomly selected person accidentally killing a spider. F: A different randomly selected person accidentally swallowing a spider. c) E: The war in a major oil-exporting country. F: The price of gasoline.
a) E and F are dependent because being prone to road rage can affect the probability of a person having an at-fault accident. b) E cannot affect F and vice versa because the people were randomly selected, so the events are independent. c) The war in a major oil-exporting country could affect the price of gasoline, so E and F are dependent.
Is there a relation between the age difference between husband/wives and the percent of a country that is literate? Researchers found the least-squares regression between age difference (husband age minus wife age), y, and literacy rate (percent of the population that is literate), x, is y = −0.0498x + 6.8. The model applied for 17≤x≤100. a) Interpret the slope. b) Does it make sense to interpret the y-intercept? Explain. c) Predict the age difference between husband/wife in a country where the literacy rate is 30 percent. d) Would it make sense to use this model to predict the age difference between husband/wife in a country where the literacy rate is 12%? e) The literacy rate in a country is 97% and the age difference between husbands and wives is 1.5 years. Is this age difference above or below the average age difference among all countries whose literacy rate is 97%?
a) For every one-percentage-point increase in literacy rate (x), the age difference (y) falls by 0.0498 years, on average. b) No, it does not make sense to interpret the y-intercept because an x-value of 0 is outside the scope of the model (17≤x≤100). c) y = (-0.0498*30) + 6.8 = 5.3 years d) No, it does not make sense because an x-value of 12 is outside the scope of the model. e) Below. The average age difference among all countries whose literacy rate is 97% is (-0.0498*97) + 6.8 = 2.0 years, and 1.5 is less than 2.0.
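One way to make the "scope of the model" idea concrete is to refuse predictions outside the observed x-range. This is a sketch; the function name and the choice to raise an error are illustrative, not part of the exercise.

```python
def predict_age_difference(literacy_rate):
    """Predicted age difference (years) from y = -0.0498x + 6.8.

    The model was fitted for 17 <= x <= 100, so other inputs are rejected.
    """
    if not 17 <= literacy_rate <= 100:
        raise ValueError("literacy rate is outside the scope of the model")
    return -0.0498 * literacy_rate + 6.8


print(round(predict_age_difference(30), 1))  # part (c)
print(round(predict_age_difference(97), 1))  # part (e)
```

Calling the function with 12 or 0 raises an error, which mirrors the answers to parts (b) and (d).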
Researchers wanted to determine if having a computer in the bedroom is associated with obesity. The researchers administered a questionnaire to 355 twelve-year-old adolescents. After analyzing the results, the researchers determined that the body mass index of the adolescents who had a computer in their bedroom was significantly higher than that of the adolescents who did not have a computer in their bedroom. a) Why is this an observational study? What type of observational study is this? b) What is the response variable in the study? Is the response variable qualitative or quantitative? What is the explanatory variable? c) Can you think of any lurking variables that may affect the results of the study? d) In the report, the researchers stated, "These results remain significant after adjustment for socioeconomic status." What does this mean? e) Does a computer in the bedroom cause a higher body mass index? Explain.
a) It is an observational study because the researchers administered a questionnaire to obtain their data without trying to influence an explanatory variable of the study. The type of study is a cross-sectional study. b) The response variable is the body mass index of the adolescents. The response variable is quantitative. The explanatory variable is whether the adolescent has a computer in the bedroom or not. c) Yes. For example, possible lurking variables might be eating habits and the amount of exercise per week. d) The researchers made an effort to avoid confounding by accounting for potential lurking variables. e) No. It can only be said that a computer in the bedroom and obesity are associated because the body mass index of the adolescents who had a computer in their bedroom was significantly higher than that of the adolescents who did not have a computer in their bedroom.
A club wants to sponsor a panel discussion on the upcoming national election. The club wants four of its members to lead the panel discussion. Write a short description of the processes that can be used to generate your sample. Obtain a simple random sample of size 4 from the table. a) Which of the following would produce a simple random sample? b) The list of digits below is from a random number generator using technology. Use the list of numbers to obtain a simple random sample of size 4 from this list. If you start with the first number in the list, and take the first four numbers between 1 and 25, what four members would be selected from the numbered list? 12, 5, 7, 7, 23, 10, 21, 8, 19, 5, 23, 11, 3, 24, 10
a) List each name on a separate piece of paper, place them all in a hat, and pick four. Number the names from 1 to 25 and use a random number generator to produce 4 different numbers corresponding to the names. b) Lukens, Cooper, Engler, Williams
As part of a college literature course, students must read three classic works of literature from the provided list. Write a short description of the processes that can be used to generate a simple random sample of three books. Obtain a simple random sample of size 3 from this list. a) Which of the following would produce a simple random sample? b) The list of digits below is from a random number generator using technology. Use the list of numbers to obtain a simple random sample of size 3 from this list. If you start on the left and take the first three numbers between 1 and 9, what three books would be selected from the numbered list? 3, 1, 1, 1, 3, 8, 8, 8, 9, 5, 5, 8, 5, 3, 6
a) Number the books from 1 to 9 and use a random number generator to produce 3 different numbers from 1 to 9 that correspond to the books selected. List each book on a separate piece of paper, place them all in a hat, and pick three. b) The Scarlet Letter (3), Crime and Punishment (1), Huckleberry Finn (8)
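The selection rule used in the two cards above (read the random numbers in order, skip repeats and out-of-range values, stop at the sample size) can be sketched in Python. The function name is an assumption for illustration.

```python
def first_distinct(numbers, size, low, high):
    """Take the first `size` distinct values in [low, high], in list order."""
    chosen = []
    for number in numbers:
        if low <= number <= high and number not in chosen:
            chosen.append(number)
        if len(chosen) == size:
            break
    return chosen


club_digits = [12, 5, 7, 7, 23, 10, 21, 8, 19, 5, 23, 11, 3, 24, 10]
book_digits = [3, 1, 1, 1, 3, 8, 8, 8, 9, 5, 5, 8, 5, 3, 6]

members = first_distinct(club_digits, 4, 1, 25)
books = first_distinct(book_digits, 3, 1, 9)
print(members, books)
```

The repeated 7 in the first list and the repeated 1 and 3 in the second are skipped because a simple random sample selects without replacement.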
Suppose a life insurance company sells a $270,000 one-year term life insurance policy to a 24-year-old female for $220. The probability that the female survives the year is 0.999468. a) Compute and interpret the expected value of this policy to the insurance company. b) Which of the following interpretation of the expected value is correct?
a) Probability that she doesn't survive: 1-0.999468 = 0.000532 If she survives: Profit of $220. If she dies: Loss of 220-270000 = -$269,780. Expected value: (Profit * Probability she survives) - ([Potential payout - Premium paid] * Probability she dies) (220*0.999468)-(269780*0.000532) = $76.36 b) The insurance company expects to make an average profit of $76.36 on every 24-year-old female it insures for 1 year.
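The expected-value calculation can be written as a probability-weighted sum of the company's two possible outcomes. This is a sketch with variable names chosen for illustration.

```python
premium = 220          # what the policyholder pays
payout = 270_000       # what the company pays if she dies
p_survive = 0.999468
p_die = 1 - p_survive

# Company's outcome in each case, weighted by its probability.
gain_if_survives = premium
gain_if_dies = premium - payout  # a large negative number
expected_value = gain_if_survives * p_survive + gain_if_dies * p_die

print(round(expected_value, 2))
```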
A web page design firm has two designs for an online hardware store. To determine which is the more effective design, the firm uses one page in the Dallas area and a second page in the Denver area. For each visit, the firm records the amount of time visiting the site and the amount spent by the visitor. a) What is the explanatory variable in this study? Is it qualitative or quantitative? b) What are the two response variables? For each response variable, state whether it is qualitative or quantitative. c) Explain how confounding might be an issue with this study.
a) The explanatory variable is the web page design. The explanatory variable is qualitative. b) One response variable is the amount spent by the visitor. This response variable is quantitative. Another is the amount of time visiting the site. This response variable is quantitative. c) Since the designs are being tested with two different locations, preferences depending on the location may affect the response variables for those groups.
Determine whether the random variable is discrete or continuous. In each case, state the possible values of the random variable. a) The number of light bulbs that burn out in the next week in a room with 16 bulbs. b) The time it takes for a light bulb to burn out.
a) The random variable is discrete. The possible values are x = 0, 1, 2, ..., 16. b) The random variable is continuous. The possible values are t > 0.
The data below represent commute times (in minutes) and scores on a well-being survey. a) Find the least-squares regression line treating the commute time, x, as the explanatory variable and the index score, y, as the response variable. b) Interpret the slope. c) Interpret the y-intercept d) Predict the well-being index of a person whose commute time is 30 minutes. e) Suppose Barbara has a 15-minute commute and scores 66.7 on the survey. Is Barbara more "well-off" than the typical individual who has a 15-minute commute?
a) y = a+bx → y = -0.096x + 69.025 b) For every unit increase in commute time, the index score falls by 0.096, on average. c) For a commute time of zero minutes, the index score is predicted to be 69.025. d) y = (-0.096*30) + 69.025 = 66.1 e) No, Barbara is less well-off because the typical individual who has a 15-minute commute scores 67.6.
Researchers wanted to determine if there was an association between the level of stress of an individual and their risk of lung cancer. The researchers studied 1508 people over the course of 10 years. During this 10-year period, they interviewed the individuals and asked questions about their daily lives and the hassles they face. In addition, hypothetical scenarios were presented to determine how each individual would handle the situation. These interviews were videotaped and studied to assess the emotions of the individuals. The researchers also determined which individuals in the study experienced any type of lung cancer over the 10-year period. After their analysis, the researchers concluded that the stress-free individuals were less likely to experience lung cancer. a) What type of observational study was this? b) What is the response variable? What is the explanatory variable? c) In the report, the researchers stated that "the research team also hasn't ruled out that a common factor like genetics could be causing both the emotions and the lung cancer." Explain what this sentence means.
a) This was a cohort study, because information was collected about a group of individuals by observing them over a long period of time. b) The response variable is whether or not lung cancer was contracted, because it is the variable of interest. The explanatory variable is level of stress, because it affects the other variable. c) The researchers may be concerned with confounding that occurs when the effects of two or more explanatory variables are not separated or when there are some explanatory variables that were not considered in a study, but that affect the value of the response variable.
A news service conducted a survey of 1001 adults ages 18 years or older in a certain country, August 31−September 2, 2015. The respondents were asked, "Of every tax dollar that goes to the federal government, how many cents of each dollar would you say are wasted?" The four possible responses are that the federal government wastes less than 10 cents, between 11 cents and 25 cents, between 26 cents and 50 cents, or 51 cents or more. Of the 1001 individuals surveyed, 35% indicated that 51 cents or more is wasted. The news service reported that 35% of all adults in the country 18 years or older believe the federal government wastes at least 51 cents of each dollar spent, with a margin of error of 3% and a 95% level of confidence. a) What is the research objective? b) What is the population? c) What is the sample? d) List the descriptive statistics. e) What can be inferred from this survey?
a) To determine the percent of adults in the country who believe the federal government wastes 51 cents or more of every dollar. b) Adults in the country aged 18 years or older. c) The 1001 adults in the country that were surveyed. d) 35% of the individuals surveyed indicated that 51 cents or more is wasted. e) The news service is 95% confident that the percentage of all adults in the country who believe the federal government wastes 51 cents or more of every dollar received is between 32% and 38%.
The human resource department at a certain company wants to conduct a survey regarding worker benefits. The department has an alphabetical list of all 4087 employees at the company and wants to conduct a systematic sample of size 40. a) What is k? b) Determine the individuals who will be administered the survey. Randomly select a number from 1 to k. Suppose that we randomly select 6. Starting with the first individual selected, the individuals in the survey will be:
a) k = N (population) / n (sample) = 4087/40 = 102.175, rounded down to 102 b) p, p+k, p+2k, ..., p+(n-1)k = 6, 108, 210, ..., 3984
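A minimal sketch of the systematic-sampling steps: round N/n down to get k, then take every k-th individual from the starting point. The function name is an assumption for illustration.

```python
def systematic_sample(population_size, sample_size, start):
    """Return k and the positions start, start+k, ..., start+(n-1)k."""
    k = population_size // sample_size  # N/n rounded down
    positions = [start + i * k for i in range(sample_size)]
    return k, positions


k, selected = systematic_sample(4087, 40, 6)
print(k, selected[:3], selected[-1])
```

In practice the starting point would be drawn at random from 1 to k; here it is fixed at 6 to match the question.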
A student at a junior college conducted a survey of 20 randomly selected full-time students to determine the relation between the number of hours of video game playing each week, x, and grade-point average, y. She found that a linear relation exists between the two variables. The least-squares regression line that describes this relation is y = -0.0572x + 2.9205 a) Predict the grade-point average of a student who plays video games 8 hours per week. b) Interpret the slope. c) If appropriate, interpret the y-intercept.
a) y = (-0.0572*8) + 2.9205 = 2.46 b) For each additional hour that a student spends playing video games in a week, the grade-point average will decrease by 0.0572 points, on average. c) The grade-point average of a student who does not play video games is 2.9205. (x=0)
In the probability distribution to the right, the random variable X represents the number of marriages an individual aged 15 years or older has been involved in. a) Compute and interpret the mean of the random variable X. b) Which of the following interpretations of the mean is correct?
a) µ_X = 0.928 marriages b) If many individuals aged 15 years or older were surveyed, the sample mean number of marriages should be close to the mean of the random variable.
Put the following in order for the most area in the tails of the distribution. (a) Standard Normal Distribution (b) Student's t-Distribution with 25 degrees of freedom. (c) Student's t-Distribution with 40 degrees of freedom.
b, c, a
What is the Central Limit Theorem?
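Regardless of the shape of the parent population, the sampling distribution of the sample mean (x̄) becomes approximately normal as the sample size n increases. As a rule of thumb, n ≥ 30 is required if the parent population is not known to be normal.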
Find the value of z(0.44)
The area to the left is 1-0.44 = 0.56, so invNorm(0.56, 0, 1) = 0.15
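Outside a graphing calculator, the same lookup can be done with Python's standard library, where NormalDist().inv_cdf plays the role of invNorm. The helper name z_alpha is an assumption for illustration.

```python
from statistics import NormalDist


def z_alpha(area_to_right):
    """z-score with the given area to its right under the standard normal curve."""
    return NormalDist().inv_cdf(1 - area_to_right)


print(round(z_alpha(0.44), 2))   # z for an upper-tail area of 0.44
print(round(z_alpha(0.075), 2))  # critical value for 85% confidence
```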
Suppose the size of a population is N=4000 and the sample size desired is n=9. What value of k should be used to obtain a systematic sample from this population?
k = N/n = 4000/9 = 444.4, rounded down to 444
A professor wants to randomly select 4 students to go to the board. She decides to randomly select the seventh student who enters the classroom and every eighth student after that. Determine the students who will be going to the board. Write down the student numbers.
n, (n+8), (n+16), (n+24) = 7, 15, 23, 31
What is the formula for estimating p₁-p₂ if prior estimates of p₁ and p₂ are not available?
n = 0.5* [(z sub-α/2)/E]²
In a certain state, 45% of adults indicated that sausage is their favorite pizza. Suppose a simple random sample of adults in the state of size 23 is obtained and the number of adults who indicated that sausage is their favorite pizza was 13. What are values of the parameters n, p, and x in the binomial probability experiment?
n = 23 p = 0.45 x = 13
What do n, p, and 1-p represent in a binomial probability distribution?
n = number of independent trials p = probability of success 1-p = probability of failure
Aside from the fact that the sample must be obtained by simple random sampling or through a randomized experiment, list the two conditions that must be met when constructing a confidence interval for a population proportion p.
np(1-p) ≥ 10 and n ≤ 0.05N
Aside from the fact that the sample must be obtained by simple random sampling or through a randomized experiment and the sample size must be small relative to the size of the population, what other condition must be satisfied?
Either n ≥ 30 (so the Central Limit Theorem can be invoked), or, if n < 30, the data must come from a population that is normally distributed, with no outliers.
What is the slope-intercept form?
y = mx + b m is the slope and b is the y-intercept.
A physical therapist wants to determine the difference in the proportion of men and women who participate in regular sustained physical activity. What sample size should be obtained if he wishes the estimate to be within 5 percentage points with 99% confidence, assuming that: (a) He uses the estimates of 22.3% male and 19.3% female from a previous year? (b) He does not use any prior estimates?
z sub-α/2 of 99% = 2.575 E = 0.05 (a) n = [p-hat₁(1 - p-hat₁)+p-hat₂(1 - p-hat₂)] * [(z sub-α/2)/E]² = [(0.223)(1-0.223) + (0.193)(1-0.193)] * (2.575/0.05)² = 873 (b) n = 0.5* [(z sub-α/2)/E]² = 0.5 * (2.575/0.05)² = 1327
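Both sample-size formulas can be sketched in Python. The function names are assumptions; the critical value 2.575 is the table value used in the answer above, and results are rounded up because a sample size must be a whole number.

```python
from math import ceil


def n_with_prior(p1, p2, z, margin):
    """Sample size per group when prior estimates p1 and p2 are available."""
    return ceil((p1 * (1 - p1) + p2 * (1 - p2)) * (z / margin) ** 2)


def n_without_prior(z, margin):
    """Sample size per group with no prior estimates (uses the 0.5 factor)."""
    return ceil(0.5 * (z / margin) ** 2)


print(n_with_prior(0.223, 0.193, 2.575, 0.05))
print(n_without_prior(2.575, 0.05))
```

Without prior estimates the formula assumes the worst case p₁ = p₂ = 0.5, which is why the required sample is larger.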
Determine μ_x̄ and σ_x̄ from the given parameters of the population and sample size. μ=85, σ=22, n=29
μ_x̄ = μ = 85 and σ_x̄ = σ/√n = 22/√29 = 4.085
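A small sketch of the standard-error calculation; the function name is an assumption for illustration.

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the sample mean."""
    return sigma / sqrt(n)


se_29 = standard_error(22, 29)
se_100 = standard_error(22, 100)
print(round(se_29, 3), round(se_100, 3))
```

The second call uses a hypothetical larger sample (n = 100) to show that the standard error shrinks as n grows.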
If events E and F are disjoint and the events F and G are disjoint, must the events E and G necessarily be disjoint? Give an example to illustrate your opinion.
No, events E and G are not necessarily disjoint. For example, E={0,1,2}, F={3,4,5}, and G={2,6,7} show that E and F are disjoint events, F and G are disjoint events, and E and G are events that are not disjoint.
T/F: When comparing two populations, the larger the standard deviation, the more dispersion the distribution has, provided that the variable of interest from the two populations has the same unit of measure.
True, because the standard deviation describes how far, on average, each observation is from the typical value. A larger standard deviation means that observations are more distant from the typical value, and therefore, more dispersed.
What are the two requirements for a discrete probability distribution?
∑P(x) = 1 and 0 ≤ P(x) ≤ 1 for every x
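The two requirements can be checked mechanically. In this sketch the function name and the example probability lists are invented for illustration, and a small tolerance is used because floating-point sums are rarely exactly 1.

```python
from math import isclose


def is_discrete_distribution(probabilities):
    """True if every P(x) is in [0, 1] and the probabilities sum to 1."""
    in_range = all(0 <= p <= 1 for p in probabilities)
    sums_to_one = isclose(sum(probabilities), 1.0, abs_tol=1e-9)
    return in_range and sums_to_one


print(is_discrete_distribution([0.2, 0.3, 0.5]))   # valid
print(is_discrete_distribution([0.5, 0.6, -0.1]))  # a probability below 0
print(is_discrete_distribution([0.2, 0.2]))        # does not sum to 1
```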