math 201 final
Reconsider the psychic from Question 5, where he was successful in 50 of 200 random trials. Suppose we now perform a hypothesis test, with the null hypothesis being that the psychic has no ability, so p = 0.20. What is the P-value for the two-sided test? 0.01 < P-value < 0.05 0.05 < P-value < 0.10 0.10 < P-value 0.001 < P-value < 0.01 P-value < 0.001
0.05 < P-value < 0.10
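A quick check of this answer, as a minimal Python sketch. It assumes the large-sample z-test for a proportion, with the standard error computed under the null value p = 0.20; the function name `two_sided_p_value` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(successes, n, p0):
    """Two-sided P-value for H0: p = p0 using the large-sample z-test."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)          # standard error under H0
    z = (p_hat - p0) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

z, p_value = two_sided_p_value(50, 200, 0.20)
print(round(z, 2), round(p_value, 3))
```

The resulting P-value falls between 0.05 and 0.10, in line with the answer above.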
A particular poker game has the following probabilities of receiving aces in your two-card hand: 0 aces: 85%, 1 ace: 14%, 2 aces: 1%. What is the chance of having at least one ace in your hand? 0.85 0.14 + 0.01 0.01 0.85 + 0.14 0.14
0.14 + 0.01
A survey organization is hired to do a survey of people who have purchased a new vehicle in the past year in Wisconsin. They buy a list of all persons who have purchased a new vehicle in the last 12 months from the Wisconsin Department of Transportation. There are 100,000 names on the list. The survey organization takes a simple random sample of 400 names and interviews all 400 people selected. The organization finds that the average reported purchase price of the new vehicles is $25,000 with a standard deviation of $4,000. An approximate 95% confidence interval for the average purchase price of new vehicles in Wisconsin is: $24,400 to $25,600 $24,600 to $25,400 $21,600 to $28,400 $21,000 to $29,000
$24,600 to $25,400
The Packers start each game with a coin flip, to determine who gets the choice of receiving first or kicking first. What is the chance that they win this coin flip in each of their first 5 games of the season, assuming the chance of heads is exactly 50% with this coin? (1/2)^5 (1/2) We need more information to calculate this probability. (1/2) * 5 1 - (1/2) * 5
(1/2)^5
It is estimated that 780,000 surgical site infections (SSIs) occur each year. The national SSI rate is 1.9%. A Georgetown medical office was interested in determining if their SSI rate were smaller than the national average. Out of a sample of 277 patients in their study, only one infection occurred. What test statistic is used to test the hypothesis that the Georgetown medical office SSI rate is less than the national average SSI rate? 0.03 1.876 0.06 -1.876
-1.876
Suppose we wish to test the hypotheses H0: μ = 10 versus Ha: μ < 10, where μ represents the mean age of children not in high school who are members of a large gymnastics club in a metropolitan area. Assume age follows a Normal distribution with σ = 2. A random sample of n = 16 children is drawn from the population, and we find the average age of these observations to be x̄ = 8.76. What is the value of the P-value for this hypothesis test? 0.2676 0.9500 0.1075 0.0066 0.0000
0.0066
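The arithmetic behind this answer, as a short Python sketch. It assumes the one-sample z-test with known σ, so the P-value is the lower-tail area below the observed z.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar = 10, 2, 16, 8.76

z = (x_bar - mu0) / (sigma / sqrt(n))   # about -2.48
p_value = NormalDist().cdf(z)           # lower tail, because Ha is mu < 10

print(round(z, 2), round(p_value, 4))
```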
SAT scores of an entering freshman at University Y have an approximate normal distribution with a mean of 1215 and a standard deviation of 110. What is the probability that the sample mean of 100 randomly chosen freshmen will be less than 1190? 0.012 0.590 0.988 0.410 0.000
0.012
An agricultural researcher takes a simple random sample of 400 English carrot farms. (Assume there are thousands of carrot farms in England.) For this sample, she finds that 120 of the farms require irrigation, and two of the farms have over 100 acres. Which of the following is an approximate 90% confidence interval for the percentage of farms that have over 100 acres? 0.2% to 1.8% 0.1% to 1.1% A 90% confidence interval is not possible; only 95% and 99% confidence intervals are possible. 26% to 34%
0.2% to 1.8%
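The keyed interval appears to match the plus-four method rather than the plain large-sample formula, which is a reasonable choice when only 2 of 400 farms are "successes". A minimal Python sketch comparing the two, assuming the usual plus-four rule (add two successes and two failures); the helper name `proportion_ci` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, level, plus_four=False):
    """Return (low, high) for a large-sample or plus-four interval for p."""
    if plus_four:                          # add 2 successes and 2 failures
        successes, n = successes + 2, n + 4
    p = successes / n
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z_star * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

large = proportion_ci(2, 400, 0.90)
plus4 = proportion_ci(2, 400, 0.90, plus_four=True)
print([round(100 * x, 1) for x in large])  # percent; lower end dips below 0
print([round(100 * x, 1) for x in plus4])  # percent
```

The plain interval has a negative lower endpoint, which is one sign that the large-sample formula is a poor fit here.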
Ten people are being assigned to two treatments in an experiment by using a fair coin. What is the chance that exactly 5 of the people are assigned to the first treatment? 0.623 5.000 0.500 0.246 0.55
0.246
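This is a binomial probability with n = 10 and p = 0.5. A small Python sketch of the calculation (the function name `binomial_pmf` is illustrative):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial count with n trials and success chance p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

chance = binomial_pmf(5, 10, 0.5)   # C(10, 5) / 2**10 = 252 / 1024
print(round(chance, 3))
```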
A noted psychic was tested for extrasensory perception. The psychic was presented with 200 cards face down and asked to determine if each card were one of five symbols: a star, a cross, a circle, a square, or three wavy lines. The psychic was correct in 50 cases. Let p represent the probability that the psychic correctly identifies the symbol on the card in a random trial. Assume the 200 trials can be treated as a simple random sample from the population of all guesses the psychic would make in his lifetime. Based on the results, what is a 95% confidence interval for p? (Note: Use the large-sample confidence interval.) 0.25 ± 0.004 0.25 ± 0.055 0.25 ± 0.060 0.25 ± 0.05
0.25 ± 0.060
Suppose 34% of male drivers and 12% of female drivers own a Dodge vehicle. If 48% of the driving population are female (and therefore 52% are male), what is the chance that a randomly chosen Dodge owner is a male driver? 0.34(0.52) / [0.12(0.48)]; 0.34(0.52) / [0.34(0.52) + 0.12(0.48)]; 0.12(0.52); 0.34(0.52) + 0.12(0.48); 0.34(0.12)(0.48)
0.34(0.52) / [0.34(0.52) + 0.12(0.48)]
Bob has recently been hired by a shop downtown to help customers with various computer-related problems. Lately, two different viruses have been bugging many customers: Dummy and Smarty. It is estimated that about 65% of the customers with virus problems are bothered by Dummy and the remaining 35% by Smarty. If the computer is infected by Dummy, Bob has a 90% chance of fixing the problem. However, if the computer is infected by Smarty, this chance is only 70%. If a virus-infected computer is randomly selected from the shop, and we know it was fixed by Bob, what is the probability that it was infected with Dummy? 0.705 0.245 0.585 0.83
0.705
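Both this question and the Dodge question before it are Bayes' Rule problems: multiply each prior by its conditional probability, then divide the branch of interest by the total. A minimal Python sketch, with a made-up helper name `posterior`:

```python
def posterior(priors, likelihoods, target):
    """P(target | evidence) from priors and P(evidence | each hypothesis)."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    return joint[target] / sum(joint.values())

# Virus question: P(Dummy | fixed)
p_dummy = posterior({"Dummy": 0.65, "Smarty": 0.35},
                    {"Dummy": 0.90, "Smarty": 0.70}, "Dummy")

# Dodge question: P(male | owns a Dodge)
p_male = posterior({"male": 0.52, "female": 0.48},
                   {"male": 0.34, "female": 0.12}, "male")

print(round(p_dummy, 3), round(p_male, 3))
```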
The breaking strength of yarn used in the manufacture of woven carpet material is Normally distributed with σ = 2.4. A random sample of n = 16 specimens of yarn from a production run was measured for breaking strength, and based on the mean of the sample (x̄ = 130 in this case) a confidence interval was found to be (128.7, 131.3). What is the confidence level of this interval? 0.90 0.95 The confidence level cannot be determined with the information provided. 0.99 0.97
0.97
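One way to see this: work backwards from the margin of error to the critical value z*, then convert z* to a central area. A short Python sketch, assuming the known-σ z-interval:

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 2.4, 16
low, high = 128.7, 131.3

margin = (high - low) / 2                 # 1.3
z_star = margin / (sigma / sqrt(n))       # margin = z* x sigma / sqrt(n)
level = 2 * NormalDist().cdf(z_star) - 1  # central area between -z* and +z*

print(round(z_star, 2), round(level, 2))
```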
Sheila buys a ticket in the "Pick 5" lottery every day, always betting on 90210, her favorite number. Each day, Sheila has a chance of 0.006 of winning, independently of other days, as a new "Pick 5" drawing is held each day. What is the chance that Sheila wins for the first time on the 20th day she plays? 0.006^19×0.994 0.994^19×0.006 0.006^20 1−0.994^20 0.994^20
0.994^19×0.006
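The first win on day 20 means 19 losses followed by one win. A tiny Python sketch of that product (the function name is illustrative):

```python
def first_success_on(day, p):
    """Chance the first success happens exactly on the given day."""
    return (1 - p) ** (day - 1) * p

chance = first_success_on(20, 0.006)
print(chance)
```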
The game of craps starts with a "come-out" roll, in which the shooter rolls a pair of dice. If the total of the "spots" on the up-faces is 2, 3, or 12, the shooter loses immediately. What is the proportion of come-out rolls that ends up in an immediate loss for the shooter, assuming fair six-sided dice? 1/9 1/12 1/4 8/9
1/9
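The count can be verified by listing all 36 equally likely rolls of two fair dice. A brief Python sketch:

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))          # 36 ordered outcomes
losing = [r for r in rolls if sum(r) in (2, 3, 12)]   # (1,1), (1,2), (2,1), (6,6)

p_loss = Fraction(len(losing), len(rolls))
print(p_loss)
```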
Skinfold thickness is used to measure body fat. A histogram for skinfold thickness is shown below; the units on the horizontal axis are millimeters (mm). The first quartile (Q1) of skinfold thickness is closest to which value? 55 20 0 90 35
20
An agricultural researcher takes a simple random sample of 400 English carrot farms. (Assume there are thousands of carrot farms in England.) For this sample, she finds that 120 of the farms require irrigation, and two of the farms have over 100 acres. Which of the following is an approximate 99% confidence interval for the percentage of farms that require irrigation? 26% to 34% 24% to 36% A 99% confidence interval is not possible; only 90% and 95% confidence intervals are possible. 25% to 35%
24% to 36%
Koker's Pizza advertises that their pizza has the most cheese in town. However, Edwards' Pizza also advertises that their pizza has the most cheese in town. The National Pizza Consumers Commission (NPCC) is consulted to settle the dispute. NPCC is an unbiased consumer advocate group, and their only interest is in determining if there is any difference in the amount of cheese used by the two pizza rivals. After the hypotheses above were formulated, NPCC checked the amount of cheese on samples of 10 pizzas from each restaurant. The results: Koker's average amount of cheese is 0.55 pounds with a standard deviation of 0.30 pounds, Edwards' average amount of cheese is 0.46 pounds with a standard deviation of 0.20 pounds. The P-value of the two-sided significance test is: 16% Less than 5% 44% 22% 78%
44%
An agricultural researcher takes a simple random sample of 400 English carrot farms. (Assume there are thousands of carrot farms in England.) For this sample, she calculates the average yield of the carrots on the sample farms is 50 bushels with a standard deviation of 5 bushels. Which of the following is an approximate 95% confidence interval for the average bushels produced by English carrot farms? 40.2 bushels to 59.8 bushels 45 bushels to 55 bushels 49.5 bushels to 50.5 bushels A 95% confidence interval is not possible in this situation; yields of carrot plants cannot be normally distributed.
49.5 bushels to 50.5 bushels
Suppose you win a certain game 25% of the time. You want to play just one game with your buddy. Your buddy, however, wants to play a "best two out of three" series with you instead. (In this sort of series, the series ends whenever either you or your buddy have won 2 games, so there may not actually be a third game played.) If you list the entire sample space for this "best two out of three" series, how many elements will it have? 2. You either win or you lose. 5. 2 outcomes plus 3 games. 8. A three game series has 2^3 = 8 elements. 6. Usually, a 3 game series would have 8 elements, but two elements here cannot happen as the series has already ended after either 2 wins or 2 losses. 6. 2 outcomes times 3 games.
6. Usually, a 3 game series would have 8 elements, but two elements here cannot happen as the series has already ended after either 2 wins or 2 losses.
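The six outcomes can be listed by taking every three-game sequence and cutting it off as soon as one side has two wins. A minimal Python sketch (W = you win a game, L = you lose; the function name is made up):

```python
from itertools import product

def series_outcomes(wins_needed=2, max_games=3):
    """List the distinct ways a best-of-three series can actually play out."""
    outcomes = set()
    for games in product("WL", repeat=max_games):
        played = ""
        for g in games:
            played += g
            if played.count("W") == wins_needed or played.count("L") == wins_needed:
                break                      # series is over, later games unplayed
        outcomes.add(played)
    return sorted(outcomes)

outcomes = series_outcomes()
print(len(outcomes), outcomes)
```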
Find the 37th percentile of heights, assuming heights have a normal distribution with a mean of 64 inches and a standard deviation of 3 inches. 62 65 66 61 63
63
Find the area under the standard normal curve (mean 0, standard deviation 1) from -1 to +3. 99.7% 48% 34% 84% 16%
84%
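The last two normal-curve answers (the 37th percentile of heights and the area from -1 to +3) can be checked with Python's standard library, as a small sketch:

```python
from statistics import NormalDist

# 37th percentile of heights with mean 64 and SD 3
heights = NormalDist(mu=64, sigma=3)
p37 = heights.inv_cdf(0.37)

# Area under the standard normal curve from -1 to +3
std = NormalDist()
area = std.cdf(3) - std.cdf(-1)

print(round(p37), round(area, 2))
```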
What is the meaning of the 68-95-99.7 numbers as related to the normal curve? -These numbers represent the percentiles of the normal distribution. -These numbers are the cumulative areas in the normal curve, below, between, and above the values -1 and 1, -2 and 2, and -3 and 3. -These values represent the proportion of a list that are within 1, 2, and 3 standard deviations of the mean, respectively. -These numbers represent the proportion of a list that are beyond 1, 2, and 3 standard deviations from the mean, respectively.
?? (not the second option)
Which of the following displays gives the most similar view as that given by a histogram? A stem and leaf plot. An empirical distribution plot. A box plot. A pie graph.
?? (not an empirical distribution plot)
Suppose two lists of data have the exact same means and the exact same standard deviations. What can we say about their box plots? Both box plots will have the same center bar, but one box will be noticeably wider. We cannot say anything certain about the relative shapes of their box plots. The two box plots must look identical as well. Both box plots will have the same box width, but one box plot will be shifted to the right.
?? not: The two box plots must look identical as well.
What is the exact chance that a student will pass a 12-question, multiple choice exam purely by random guessing? Assume you must get more than 70% correct to pass (9 or more answers correct) and that there are four choices for each problem. binomcdf( 12, 0.25, 8 ) 1 - binomcdf( 12, 0.25, 8 ) 1 - binomcdf( 12, 0.25, 9 ) normalcdf( 70, 100, 12*0.25, √( 12 * 0.25 * 0.75) ) binomcdf( 12, 0.25, 9 )
?? not: 1 - binomcdf( 12, 0.25, 9 )
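Passing means 9 or more correct, and the complement of "9 or more" is "8 or fewer", which points to 1 - binomcdf( 12, 0.25, 8 ). A Python sketch that checks this reading against a direct sum; the helper `binom_cdf` is a hypothetical stand-in for the calculator's binomcdf(n, p, k):

```python
from math import comb

def binom_cdf(n, p, k):
    """P(X <= k) for a binomial count, like binomcdf(n, p, k) on a TI-84."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

p_pass = 1 - binom_cdf(12, 0.25, 8)   # P(X >= 9)

# Direct sum over 9, 10, 11, 12 correct answers for comparison
direct = sum(comb(12, i) * 0.25**i * 0.75**(12 - i) for i in range(9, 13))
print(p_pass, direct)
```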
John Q., an English carrot farmer, does not believe the researcher's results, so he interviews the 100 farmers nearest his farm. His confidence intervals differ markedly from those given by the agricultural researcher. What is the best explanation for this discrepancy? The researcher must have used an invalid sampling procedure. Samples of convenience are more accurate than simple random samples. The researcher's results should be trusted; she used a probability sample. John's results should be trusted; he used a sample of convenience.
?? not: John's results should be trusted; he used a sample of convenience.
In a major corporation, health analysts wish to study the relationship between salary and absence from work due to illness last year. Some minimum wage employees and some upper management employees are randomly selected and for each selected employee the number of days missed last year due to illness is recorded from their personnel file. This is an example of: Just a controlled experiment. An observational study. Just a randomized experiment. A randomized, controlled experiment.
?? not: Just a randomized experiment.
A student organization has 55 members. A TI-84 calculator is used ( randInt( 1, 55, 5 ) ) to select 5 of them. What form of sampling is this? Convenience Sampling. Cluster Sampling. Systematic Sampling. Simple Random Sampling Stratified Random Sampling.
?? not: Stratified Random Sampling.
At a small investment firm an analyst divides the company's clients into three groups: small accounts (812 clients with under $50,000 income), medium accounts (150 clients with between $50,000 and $200,000 income), and large accounts (15 clients with over $200,000 income). A survey of the firm's clients is conducted by selecting 10 small accounts at random, 25 medium accounts at random, and all of the large accounts, for a total sample of 50 accounts. What sort of sampling scheme is this? Stratified Random Sampling. Systematic Random Sampling. Cluster Sampling. Simple Random Sampling. Convenience Sampling.
?? not: Systematic Random Sampling.
What is wrong with the following statement of the null hypothesis in a statistical test? H0: = 10 There is actually nothing wrong with this formulation of the null hypothesis. There is no mention of the sampling scheme used to collect the data. There is no mention of the significance level. There is no mention of the P-value. There is no mention of a parameter.
?? not: there is no mention of the significance level.
Exactly one of the four statements below is incorrect. Choose the false statement. The null hypothesis in the matched pairs t-procedures is often H0: μ_D = 0. The proper way to analyze matched pairs data is to use the two sample t-procedures. If data show outliers or strong skewness, the t-procedures should not be used, unless the sample size is quite large. The null hypothesis in one sample t-procedures is almost always H0: μ1 = μ2.
?? not: The null hypothesis in one sample t-procedures is almost always H0: μ1 = μ2.
A critic of a study complains that because the statisticians analyzing the data knew which subjects were in which group, there is a bias present. Which of the following statements addresses this objection? Only the person collecting the responses of the subjects should know which group each subject is in. This objection is only relevant if there is an unmeasured lurking variable in the study. Only the subjects themselves should know which group they are in. As long as the statisticians were unable to influence the responses given by the subjects, there is no reason to worry. The complaint is legitimate; the statisticians should not know group memberships when calculating the results.
?? not: Only the person collecting the responses of the subjects should know which group each subject is in.
A population of people is each assigned a random number, and these are then sorted from low to high. A sample of 20 people is selected by taking the first 20 names, i.e. those with the 20 smallest random numbers. What is the primary problem with this sampling method? We have not secured permission to use the people in a study. We have not used any probability mechanism. We have not taken a large enough sample for this to be considered a random sample. We do not know the true mean and standard deviation for the population. There is nothing wrong with this method; each person had the same chance to be selected.
?? not: We do not know the true mean and standard deviation for the population.
Suppose someone wants to know the chance of less than 16 successes from a binomial distribution with a sample size of 50 and a probability of success on each trial of p = 0.4. What is the corresponding calculation using the continuity correction and the normal approximation? normalcdf( 15.5, 999, 20, 3.46) normalcdf( 16.5, 999, 20, 3.46) normalcdf( -999, 16.5, 20, 3.46) normalcdf( -999, 16, 20, 3.46) normalcdf( 16, 999, 20, 3.46) normalcdf( -999, 15.5, 20, 3.46)
?? not: normalcdf( 16.5, 999, 20, 3.46)
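"Less than 16 successes" means X ≤ 15, so with the continuity correction the normal curve would be cut at 15.5, which points to normalcdf( -999, 15.5, 20, 3.46). A Python sketch comparing that approximation with the exact binomial sum:

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 50, 0.4
mean, sd = n * p, sqrt(n * p * (1 - p))    # 20 and about 3.46

# Exact chance of 0..15 successes
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(16))

# Normal approximation with continuity correction: area below 15.5
approx = NormalDist(mean, sd).cdf(15.5)

print(round(sd, 2), exact, approx)
```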
Choose the single false statement from among the following. -The Inter Quartile Range (IQR) of a list of numbers conveys information about the size of a typical value in the list, i.e. the center of a list. -An interval in a histogram with no values (an empty box) is an example of an outlier. -If the standard deviation of a list of numbers is zero, then all of the numbers in the list are the same number. -If all of the numbers in a list are greater than zero, then the standard deviation of the numbers in that list is less than the mean of the numbers in that list.
?? not: If all of the numbers in a list are greater than zero, then the standard deviation of the numbers in that list is less than the mean of the numbers in that list.
Why should we experiment with changing the width of the boxes on the histograms that our software produces initially? The software makes an arbitrary estimate, but we always need the box widths to be a whole number, not a decimal. Actually we do not change box widths; that would produce an error using software. Histograms can disguise where in an interval data lie; we need to see several histograms with different interval widths to be sure. We only change the interval widths if we have also produced an empirical distribution plot; we need to make sure they use the same scale.
??? (not the first one)
For the major corporation in the preceding problem, what would be a possible lurking variable explaining a relationship between salary and absences? Age. Older employees may earn more and may be sick more. Genetics. Absences due to illness may have been inherited. Prior work history. Employees with many prior years of absences are more likely to be absent last year. Education. Employees with more education often earn more.
??? Not: Genetics. Absences due to illness may have been inherited.
Continuing Question 10, where you win each individual game 25% of the time. What is the chance of you winning the "best two out of three" series? 0.438 0.25 0.062 0.156 0.75
??? not: 0.438
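You win the series with WW, WLW, or LWW. Adding those three branches, as a short Python sketch that assumes the games are independent (the function name is illustrative):

```python
def win_best_of_three(p):
    """Chance of winning a best-of-three when each game is won with chance p."""
    q = 1 - p
    return p * p + p * q * p + q * p * p   # WW + WLW + LWW

chance = win_best_of_three(0.25)
print(round(chance, 3))
```

This gives about 0.156, which is one of the listed options and agrees with the note that 0.438 is wrong.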
What is the primary consideration regarding sample sizes and the Central Limit Theorem (CLT)? To use the CLT, the sample size must be larger than the population size. The sample size is not a consideration when considering using the Central Limit Theorem. To use the CLT, the sample size must be large when the data is skewed or contains an outlier. To use the CLT, the sample size must be smaller than the population size. To use the CLT, the sample size must be large when the data itself has a normal distribution.
??? not: To use the CLT, the sample size must be larger than the population size.
Suppose we want to know the chance that a carton of a dozen Jumbo eggs weighs between 800 and 850 g. Assume the weights of these eggs have a nearly normal distribution with mean 70 grams and standard deviation 4 grams. Which normal curve should we use to calculate the solution? The normal curve with a mean of 70 × 12 and a standard deviation of 4 ÷ √12. The normal curve with a mean of 70 and a standard deviation of 4 x √12. We should not use any normal curve, as 12 is not considered a large sample size in this situation. The normal curve with a mean of 70 × 12 and a standard deviation of 4 × √12. The normal curve with a mean of 70 and a standard deviation of 4 ÷ √12. The normal curve with a mean of 70 and a standard deviation of 4.
?? (I think it is the ./.) Not: The normal curve with a mean of 70 and a standard deviation of 4 x √12.
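The question is about the total weight of 12 eggs, so the relevant curve is the one for a sum: the mean is multiplied by 12 and the SD by √12. A Python sketch under the added assumption that the 12 egg weights are independent:

```python
from math import sqrt
from statistics import NormalDist

n, mu, sigma = 12, 70, 4

# Sum of n independent weights: mean n*mu, SD sigma*sqrt(n)
carton = NormalDist(mu=n * mu, sigma=sigma * sqrt(n))
chance = carton.cdf(850) - carton.cdf(800)

print(carton.mean, round(carton.stdev, 2), round(chance, 2))
```

That matches the option with a mean of 70 × 12 and a standard deviation of 4 × √12.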
The registrar's records show that the average credit load of all 13,000 UWO students is 14 credits with a standard deviation of 2.5. Which of the following statements is true? A confidence interval is not necessary in this situation because we know the actual population average credit load already. There is a 68% chance that the population average credit load is between 13.978 and 14.022 (14 ± 0.022). A 68% confidence interval for the population average credit load is 14 ± 0.022. There is a 68% chance that the population average credit load is between 11.5 and 16.5 (14 ± 2.5).
A confidence interval is not necessary in this situation because we know the actual population average credit load already.
To calculate both the mean and the median in a data set, you must do which of the following activities: -Only add up all of the numbers. -Add up all of the numbers and also sort them from low to high. -Square the deviations from average, and then add them up. -Only sort the numbers.
Add up all of the numbers and also sort them from low to high.
Which of the following statements best describes the "Law of Large Numbers"? If the sample size is greater than 1,000, the sample mean will equal the population mean. As the sample size increases, the sample average gets closer and closer to the population average. Even for small samples (less than 30 observations), the sample average is almost always within a point or two of the population average. For a large number of disjoint sets, the probability of any one of them occurring is found by adding all the individual probabilities together. If a sample space has a large number of outcomes, then we can calculate the probability of each of the outcomes by using formulas.
As the sample size increases, the sample average gets closer and closer to the population average.
A study with 7,500 subjects reported a result that was statistically significant at the 5% level. Which of the following explanations is most reasonable? This result must therefore be very important (practically significant). Because of the large sample size, this result may be real (statistically significant) but not particularly important (practically significant). Because we found a significant result, the null hypothesis cannot be true. Because we do not know the sampling scheme (SRS vs cluster sample for example) we cannot comment on the results.
Because of the large sample size, this result may be real (statistically significant) but not particularly important (practically significant).
ACT scores are not exactly normally distributed (because they must be integers only) but the normal distribution is a very close match to the histogram of ACT scores. How does this fact impact our ability to use the normal curve to calculate probabilities about the average of 30 students' ACT scores? Because the sample size is relatively large, we can only calculate probabilities for sums, not averages. We need samples that are much larger than 30 before we can use the normal curve for calculating probabilities when we deal with data that can be integers only. Because ACT scores cannot exceed 36, we cannot use probabilities from the normal curve, which extends out to infinity. Because the sample size is relatively large and the data is quite close to normal, there are no concerns with using the normal curve to calculate probabilities.
Because the sample size is relatively large and the data is quite close to normal, there are no concerns with using the normal curve to calculate probabilities.
Exactly one of the four statements below is incorrect. Choose the false statement. Bias and variability are two names for the same thing. A parameter describes a population. We reduce sampling variability by increasing the sample size. A sampling distribution describes how a statistic varies in repeated samples.
Bias and variability are two names for the same thing.
From 1936 to 1948, Gallup employed quota sampling, which turns out to have been biased toward Republican voters. Which of the following statements best describes quota sampling? Census data is used to establish sample sizes within certain sub-categories, but interviewers are still given discretion choosing subjects. The interviewer samples people until the desired sample size is achieved. At a predetermined number of locations in the city, determined by the census data, the interviewer samples subjects using a probability mechanism like a random number generator. Male interviewers select only males to interview, and female interviewers choose only females.
Census data is used to establish sample sizes within certain sub-categories, but interviewers are still given discretion choosing subjects.
There are seven sections of an introductory statistics course at a certain university. A statistician draws a random sample of three of the seven sections and then draws a random sample of eight students from each of these three chosen sections. What form of sampling is this? Convenience Sampling. Systematic Random Sampling. Cluster Sampling. Simple Random Sampling. Stratified Random Sampling.
Cluster Sampling.
In which of the following relationships would it be most reasonable to view one of the two variables as the explanatory variable and the other as a response variable, as opposed to simply exploring the relationship between the two variables? The daily protein intake for a patient vs the daily fat intake. College GPA and High School GPA. High school English grades and high school math grades. Whether or not a person likes to sing and whether or not a person likes to dance.
College GPA and High School GPA.
A student trying to decide whether to use a t-test or a z-test on a statistical inference problem asks you to help. What is the best advice to give? Determine if you know the population standard deviation or not. Determine if you have a large sample or a small sample. Determine if you know the population size or not. Determine if you know the sample standard deviation or not. Determine if you know the sample size or not.
Determine if you know the population standard deviation or not.
The scores of individual students on a newly created exam have an exact Normal distribution with a mean of 18.6 and a standard deviation of 6.0. At Northside High School, 36 seniors take the test. Assume the scores at this school have the same distribution as the nation. What is the sampling distribution of the sample mean score for a random sample of 36 students? Approximately Normal, but the approximation is poor. Approximately Normal, and the approximation is good. Neither Normal nor non-Normal—it depends on the particular 36 students selected. Exactly Normal because the population at Northside High School has a normal distribution.
Exactly Normal because the population at Northside High School has a normal distribution.
Why does the size of the population have little influence on the behavior of statistics from simple random samples (as long as the population is at least 20 times the size of the sample)? If the population is properly "mixed" (as when we take a simple random sample), the size of the population doesn't matter; only the size of the sample determines the variability. Actually, the population size is much more important than the sample size concerning the behavior of statistics. The population size has little influence on the behavior of statistics because parameters are facts about populations and statistics are facts about samples. The population size doesn't have much influence on the behavior of statistics because we use the normal curve when using the Central Limit Theorem.
If the population is properly "mixed" (as when we take a simple random sample), the size of the population doesn't matter; only the size of the sample determines the variability.
Our text notes that given a 95% confidence interval, we can perform the associated level 5% hypothesis test. How is that accomplished? If the value of the hypothesized mean falls inside the confidence interval, we would conclude we had an insufficient sample size to conduct the hypothesis test. If the value of the hypothesized mean falls outside the confidence interval, we would reject the null hypothesis; otherwise we would fail to reject the null hypothesis. Because we do not know the sampling scheme (SRS vs cluster sample for example) we actually cannot perform the associated level 5% hypothesis test. If the value of the hypothesized mean falls outside the confidence interval, we would fail to reject the null hypothesis; otherwise we would reject the null hypothesis.
If the value of the hypothesized mean falls outside the confidence interval, we would reject the null hypothesis; otherwise we would fail to reject the null hypothesis.
Exactly one of the four statements below is incorrect. Choose the false statement. If two events are independent, then they cannot be disjoint events. Bayes' Rule involves conditional probabilities. If two events are disjoint, the chance that either event occurs is the sum of the probabilities of the two events. In a probability tree, the probabilities used to label the branches are unconditional probabilities, except for the initial branching which uses conditional probabilities.
In a probability tree, the probabilities used to label the branches are unconditional probabilities, except for the initial branching which uses conditional probabilities.
Here is a probability distribution for a random variable X: Value of X: 1, 2, 3, 4. Probability: 0.4, 0.3, 0.2, 0.1. Find the mean and standard deviation of X. Mean = 2.5 and SD = 1.0 We cannot answer this question until we know the sample size n. Mean = 2.5 and SD = 1.29 Mean = 2.0 and SD = 1.00 Mean = 2.0 and SD = 1.29
Mean = 2.0 and SD = 1.00
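The mean is the probability-weighted sum of the values, and the variance is the probability-weighted sum of squared deviations from that mean. A short Python sketch using X = 1, 2, 3, 4 with probabilities 0.4, 0.3, 0.2, 0.1:

```python
from math import sqrt

values = [1, 2, 3, 4]
probs = [0.4, 0.3, 0.2, 0.1]

mean = sum(x * p for x, p in zip(values, probs))
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
sd = sqrt(variance)

print(round(mean, 2), round(sd, 2))
```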
In the Literary Digest poll of 1936, the magazine mailed questionnaires to 10 million people (receiving over 2 million in return) mostly from lists of club memberships, phone directories, lists of magazine subscribers, etc. Which two major biases were present in this sampling scheme? (Recall that response bias occurs when respondents do not give truthful answers while non-response bias occurs when respondents cannot be contacted or refuse to participate.) Non-response and under-coverage bias. Response and under-coverage bias. Interviewer and non-response bias. Response and non-response bias. Interviewer and under-coverage bias.
Non-response and under-coverage bias.
An insurance company wishes to study the relationship between the productivity of its clerks in processing claims and the amount of training provided to the clerks. Over a period of one year, the personnel manager assigns each new clerk to receive either two or four weeks of training. It turned out that most of the people assigned two weeks of training had previously taken at least one college business course and most of the people assigned four weeks of training had never taken a college business course. Following the assigned training and a three-month work period, the productivity of each clerk was evaluated by a supervisor. A potential confounding factor is: Amount of training. Number of college business courses. The personnel manager. The clerk. Productivity.
Number of college business courses.
We have manufactured a product and are testing whether it is better than a competitor's product. We find that 13 of 20 people prefer our product to the competitor's product. Do we have enough evidence to declare that we have a superior product, or is there still doubt? Our one-sided P-value is above 10%, so there is still doubt. Our one-sided P-value is between 5% and 10%, so there is still doubt. Our one-sided P-value is below 5% so we feel confident that our product is better. We cannot answer this question with so few subjects. We would need at least 1,000 subjects before we could be confident of anything.
Our one-sided P-value is between 5% and 10%, so there is still doubt.
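The arithmetic behind this answer can be sketched with a one-sample z test for a proportion. This is only a sketch under assumptions: the null value p0 = 0.5 (no preference between products) and the use of the normal approximation are not stated in the question, and the helper name normal_upper_tail is made up for illustration. An exact binomial calculation gives a somewhat larger P-value, so the answer above appears to rely on the normal approximation.

```python
import math

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z (hypothetical helper)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n = 20            # people surveyed
successes = 13    # people who prefer our product
p0 = 0.5          # ASSUMED null value: no preference either way

p_hat = successes / n
se = math.sqrt(p0 * (1 - p0) / n)   # standard error under the null
z = (p_hat - p0) / se
p_value = normal_upper_tail(z)      # one-sided: Ha is p > 0.5

print(f"z = {z:.2f}, one-sided P-value = {p_value:.3f}")
```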
What is the fundamental assumption we make when pooling variances for two sample t-procedures? The data is sampled from normal distributions. The population means are equal. The sample size is large enough to use the Central Limit Theorem. The population variances are equal.
The population variances are equal.
An agricultural researcher plants 25 plots with a new variety of yellow corn. Assume that the yield per acre for the new variety of yellow corn follows a Normal distribution with an unknown mean of μ and a standard deviation of σ = 10 bushels per acre. Which of the following would produce a confidence interval with a smaller margin of error than the 90% confidence interval? Plant one hundred plots rather than 25 because a larger sample size will result in a smaller margin of error. Plant only five plots rather than 25 because five are easier to manage and control, and are less expensive in terms of total costs. Compute a 99% confidence interval rather than a 90% confidence interval because a higher confidence level will result in a smaller margin of error. Plant ten plots rather than 25 because a smaller sample size will result in a smaller margin of error, and 5 plots is too few to check normality.
Plant one hundred plots rather than 25 because a larger sample size will result in a smaller margin of error.
Consider the following scatter plot of two variables, a "before" measurement on some people and an "after" measurement. You want to predict response for a "before" measurement of 2000 and one of 4000. Which statement best describes your strategy? Predicting a response at 2000 will be reasonable, but predicting at 4000 is ill-advised. Predicting a response at 2000 will be ill-advised, but predicting at 4000 is reasonable. Predicting a response at both 2000 and 4000 will be ill-advised. Predicting a response at both 2000 and 4000 will be reasonable.
Predicting a response at 2000 will be reasonable, but predicting at 4000 is ill-advised.
Which of the following pairs of variables cannot even be studied using the correlation coefficient? Employee height and vacation days taken this year. School size in number of students and number of members on the school board. Year of car manufacture and fuel efficiency. Profession and salary.
Profession and salary.
A simple random sample of 100 flights of Airline 1 showed that 64 were on time. A simple random sample of 100 flights of Airline 2 showed that 80 were on time. Let p1 and p2 be the proportions of all flights that were on time for these two airlines. Is there evidence of a difference in the on-time rate for the two airlines? To determine this, test the hypotheses H0: p1 = p2 versus Ha: p1 ≠ p2 at a 10% significance level. What would our decision be, and what type of mistake might we have made? Fail to Reject H0; Type II error Fail to Reject H0; Type I error Reject H0; Type I error Reject H0; Type II error
Reject H0; Type I error
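The decision can be checked with a pooled two-proportion z test. The sketch below assumes that the pooled-standard-error form of the test is the intended one; the variable names are illustrative. Because the decision is to reject H0, the only mistake we could have made is a Type I error (rejecting a true null).

```python
import math

x1, n1 = 64, 100   # Airline 1: on-time flights, sample size
x2, n2 = 80, 100   # Airline 2: on-time flights, sample size
alpha = 0.10       # significance level from the question

p1_hat = x1 / n1
p2_hat = x2 / n2
pooled = (x1 + x2) / (n1 + n2)   # pooled proportion under H0: p1 = p2

se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se

# Two-sided P-value from the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))
reject = p_value < alpha

print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}, reject H0: {reject}")
```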
The instructor of an introductory statistics class notices that the students who sit in the first three rows of seats have higher scores on the first exam than the other students in the class. How could she change this observational study into a properly designed experiment? She should give every other student version A of the exam, and the rest version B. She should require everyone to sit in the first three rows (assume that there is enough seating). It is not possible for her to turn this study into an experiment. She needs consent. She should calculate an average score for each row, not just the first two rows. She should randomly assign students to a seating chart.
She should randomly assign students to a seating chart.
What is the proper interpretation of a confidence interval for the difference in two proportions that has a negative number and a positive number, such as (-25% to +35%)? Such an interval suggests that the two proportions are quite similar, and that we cannot tell which population's proportion is higher. We do not worry about the negative sign, and such an interval would have the same interpretation as (25% to 35%). The analyst has made an error, because such an interval cannot occur when calculated properly. Because 35% > 25%, we conclude that the first proportion is higher than the second proportion.
Such an interval suggests that the two proportions are quite similar, and that we cannot tell which population's proportion is higher.
John Q. wants to study the heights of people in Wisconsin and Michigan. John takes a simple random sample of 1% of the population in each state and plans to form confidence intervals for the true average height. (There are more people in Michigan than in Wisconsin and let's assume that the population standard deviations are equal in the two states.) All other things being equal, we expect: The Wisconsin confidence interval will be narrower than the Michigan confidence interval. The Wisconsin confidence interval will be wider than the Michigan confidence interval. The Wisconsin confidence interval will be the same width as the Michigan confidence interval. We do not have enough information to answer.
The Wisconsin confidence interval will be wider than the Michigan confidence interval.
A local dairy claims that exactly one gallon of milk goes into their containers. A critic believes that the measuring machine allows slightly less than one gallon to go into the containers. The critic decides to randomly select 100 of the thousands of milk containers that go through the machine. Each selected container is measured in gallons. The results show an average of 0.998 gallons. The standard deviation in the sample is 0.01 gallons. What null hypothesis would the critic test? The average milk volume in the population equals one gallon. The average milk volume in the population equals 0.998 gallons. The average milk volume in the population is greater than one gallon. The average milk volume in the population is less than one gallon.
The average milk volume in the population equals one gallon.
Suppose you are going to roll a fair, six-sided die a number of times and record the proportion of times that an even number (2, 4, or 6) is showing. Your first experiment used 60 rolls, but in your second experiment you used 240 rolls. For the second experiment, how is the center and spread of the sampling distribution different from the first experiment? The center will remain the same, but the spread will decrease. Both the center and the spread will remain the same. The center will remain the same, but the spread will increase. Both the center and the spread will decrease.
The center will remain the same, but the spread will decrease.
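A quick numerical check of this answer, using the standard formula for the standard deviation of a sample proportion, sqrt(p(1 - p)/n). The function name proportion_sd is made up for illustration. The center is p = 0.5 in both experiments (three of the six faces are even), while quadrupling the number of rolls halves the spread.

```python
import math

def proportion_sd(p, n):
    """Standard deviation of the sample proportion for n independent trials."""
    return math.sqrt(p * (1 - p) / n)

p_even = 3 / 6   # chance of an even number on a fair die

sd_60 = proportion_sd(p_even, 60)     # first experiment
sd_240 = proportion_sd(p_even, 240)   # second experiment

print(f"center = {p_even}, spread with 60 rolls = {sd_60:.4f}, "
      f"spread with 240 rolls = {sd_240:.4f}")
```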
For the dairy in Questions 5 and 6, suppose another sample yielded a test statistic value of 0.50 with a P-value of 0.30. What conclusion should the critic draw? The dairy seems to be correct. One gallon goes in their containers. We are unable to draw a conclusion with a P-value this large. The negative value proves that the dairy has been cheating its customers. The result is statistically significant.
The dairy seems to be correct. One gallon goes in their containers.
What can we say about the relationship between two variables if a graph of our residuals from a linear regression shows a curved pattern? The deviations from an irregular horizontal pattern point out how the regression line fails to describe the overall pattern. Nothing. The graph of the residuals is supposed to show a curved pattern. The curved pattern proves that the linear regression was the proper model for the data. Nothing. We need to make a histogram of the residuals before we can make any conclusions.
The deviations from an irregular horizontal pattern point out how the regression line fails to describe the overall pattern.
In drawing a histogram, which of the following suggestions should be followed? The scale of the vertical axis should be that of the variable whose distribution you are displaying. The heights of the rectangles should be proportional to the number of observations in the class interval. Generally, the rectangles should be square so that both the height and width equal the number of observations in the class interval. Leave large gaps between the rectangles to allow room for comments.
The heights of the rectangles should be proportional to the number of observations in the class interval.
A population of graduate students at a large university was being studied through the use of a sample survey. The researcher was trying to estimate the mean debt carried by the students through student loans and the proportion of all students that carry a student loan. A simple random sample of 350 students was selected from the student body of the university. The sample results showed that 30% held a student loan and the mean debt was $18,450. Which of the following statements is true? The proportion 30%, is a statistic and the mean debt, $18,450, is the value of the parameter. The mean debt of all students and the proportion of all students holding student loans are both parameters, but the values 30% and $18,450 are statistics. The mean debt of all students is a parameter, and its value is $18,450. The proportion holding a student loan is also a parameter whose value is 30%. The mean debt of all students, the proportion holding a loan, and the values 30% and $18,450 are all examples of parameters. The mean debt held by all students is a parameter, and the 30% who hold student loans is also a parameter.
The mean debt of all students and the proportion of all students holding student loans are both parameters, but the values 30% and $18,450 are statistics.
The following scatter plot represents household natural gas usage (in therms) for various average monthly temperatures (in °F), over the course of several years. Which statement best describes the relationship? The pattern is a weak, negative, non-linear association. The pattern is a weak, positive, non-linear association. The pattern is a strong, positive, linear association. The pattern is a strong, negative, linear association.
The pattern is a strong, negative, linear association.
A medical researcher is working on a new treatment for a certain type of cancer. The average survival time after diagnosis on the standard treatment is 2 years. In an early trial, she tries the new treatment on three subjects who have an average survival time after diagnosis of 4 years. Although the survival time has doubled, the results are not statistically significant, even at the 0.10 significance level. What is the best explanation? A placebo effect is present, which limits statistical significance. The sample size is too small to determine if the observed increase cannot be reasonably attributed to chance. The calculation was in error; an increase of 2 years is very significant. Although the survival time has doubled, the actual increase is only 2 years.
The sample size is too small to determine if the observed increase cannot be reasonably attributed to chance.
Choose the false statement from the following. The slope and intercept of a least squares regression line remain the same if you reverse the roles of the explanatory and response variables. An observation in a scatterplot is influential if removing it would have a marked effect on the statistical results. The correlation coefficient remains the same if you reverse the roles of the explanatory and response variables. Extrapolation is the use of a regression line for predictions outside the range of the data.
The slope and intercept of a least squares regression line remain the same if you reverse the roles of the explanatory and response variables.
For the gas usage data from question 2, the linear regression results are shown below. What is the best interpretation of the slope? The slope is 144.75 and suggests that when the temperature is 0 degrees, gas usage will be 144.75 therms. The slope is -2.21 and suggests that as the temperature increases by 1 degree, gas usage decreases by 2.21 therms. The slope is 0.86 and suggests that as the temperature increases by 1 degree, gas usage increases by 0.86 therms. The slope is -2.21 and suggests that when the temperature is 0 degrees, gas usage will decrease by 2.21 therms. The slope is 144.75 and suggests that as the temperature increases by 1 degree, gas usage increases by 144.75 therms.
The slope is -2.21 and suggests that as the temperature increases by 1 degree, gas usage decreases by 2.21 therms.
Which of the following best describes the meaning of "failing to reject the null hypothesis?" We have convincing proof that the null hypothesis is true. The null hypothesis was improperly formed. The alternative hypothesis has been clearly proven to be false. There was not enough evidence to indicate that the null hypothesis was false.
There was not enough evidence to indicate that the null hypothesis was false.
If the agricultural researcher from Question 8 had taken a sample four times as large, what would happen to her confidence intervals, assuming all other things are equal? They would be the same width. They would be four times as wide. They would be twice as wide. They would be half as wide. They would be one-fourth as wide.
They would be half as wide.
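The halving follows from the margin of error formula z* · σ / sqrt(n). The sketch below assumes the researcher is the one from the corn-plot question (σ = 10 bushels per acre, n = 25) and a 90% confidence level; the function name margin_of_error is illustrative.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence):
    """z-interval margin of error for a mean with known sigma."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sigma / math.sqrt(n)

sigma = 10          # bushels per acre (ASSUMED from the corn-plot question)
confidence = 0.90   # ASSUMED confidence level

moe_25 = margin_of_error(sigma, 25, confidence)
moe_100 = margin_of_error(sigma, 100, confidence)   # sample four times as large

print(f"n = 25: +/- {moe_25:.2f}, n = 100: +/- {moe_100:.2f}")
```

Because n appears under a square root, a sample four times as large divides the width by sqrt(4) = 2, not by 4.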
What is the purpose of producing a normal quantile plot? To determine if the data is exactly normally distributed. To determine the mean and the standard deviation of the normal curve. To determine if the data is approximately normally distributed. To determine if you should use a histogram to display normal data.
To determine if the data is approximately normally distributed.
A survey of students records what make of car the student drives and the age of their car, for those who drive a car regularly. What types of graphs would you use to study this data? We cannot use graphs to study this kind of data. Use a bar graph for the type of car and a pie graph for the car's age. Use a bar graph for the type of car and an empirical distribution plot for the car's age. Use a histogram for both variables. Use a stem and leaf plot for the type of car and a histogram for the car's age.
Use a bar graph for the type of car and an empirical distribution plot for the car's age.
A geography test was given to a simple random sample of 250 high school students in a certain large school district. One question involved an outline map of Europe, with the countries identified by number only. The students were asked to pick out Great Britain and France. As it turned out, 65.8% could find France and 70.2% could find Great Britain. Is this difference statistically significant? No, the two-sample P-value is large. Yes, the two-sample P-value is small. We can't answer because this was a census; no chance mechanism is present. We can't answer with the information given because this is a matched pairs situation, and we don't know how many students knew both answers. The question does not make sense. We should ask about practical importance.
We can't answer with the information given because this is a matched pairs situation, and we don't know how many students knew both answers.
What is the best interpretation of "95% confidence"? We have a 95% chance that our interval contains the value of the parameter. 95% of sample cases will be in the confidence interval. 95% of all samples will produce a sample average inside our interval. We have a 95% chance that our interval is based on a random sample.
We have a 95% chance that our interval contains the value of the parameter.
When we test for a proportion equalling a particular value, like H0: p = 0.8, which of the following questions is of concern? Was the data normally distributed? Was the sample size over 1,000? Was the probability of success above 50%? Were the sample responses collected using a random sample?
Were the sample responses collected using a random sample?
Exactly one of the four statements below is incorrect. Choose the false statement. Outliers can have a large effect on the formation of confidence intervals. When forming confidence intervals, it is very important that you only take large samples from Normal populations. The margin of error for a confidence interval gets larger as the confidence level increases. Very small effects can be highly significant when a test is based on a large sample.
When forming confidence intervals, it is very important that you only take large samples from Normal populations.
Exactly one of the four statements below is incorrect. Choose the false statement. When our goal is to understand cause and effect, observational studies give better evidence than experiments. In an experiment we deliberately impose some treatment on individuals and we observe their responses. The placebo effect occurs when individuals report improvement due to receiving a treatment, regardless of whether the treatment was in any way effective. If the design of a study systematically favors particular outcomes, we say it is biased.
When our goal is to understand cause and effect, observational studies give better evidence than experiments.
In the graph below, which point would be the most influential? A E D B C
A
Fill in the blanks: The Plus 4 Method for confidence intervals for proportions is recommended because the true confidence level of the __________-sample interval can be substantially __________ than the planned level. small, more; large, more; small, less; large, less
large, less
The margin of error in a confidence interval for a proportion includes the effects of which of the following sources of error in a study? non-response bias design errors undercoverage bias sampling variability
sampling variability
For the dairy in Question 5, what is the observed test statistic for this data? Confidence = 95% P = 0.05 z = -2.00 t = -2.00
t = -2.00
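The value -2.00 can be reproduced from the sample figures given for the dairy earlier (sample mean 0.998 gallons, sample standard deviation 0.01 gallons, n = 100, null mean 1 gallon). This is a minimal sketch of the one-sample t statistic; the variable names are illustrative. The statistic is labelled t rather than z because the standard deviation comes from the sample.

```python
import math

mu_0 = 1.0       # null hypothesis: mean volume is one gallon
x_bar = 0.998    # sample mean (gallons)
s = 0.01         # sample standard deviation (gallons)
n = 100          # containers measured

standard_error = s / math.sqrt(n)
t_stat = (x_bar - mu_0) / standard_error

print(f"t = {t_stat:.2f} with {n - 1} degrees of freedom")
```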
The square footage of the several thousand apartments in a new development is advertised to be 1250 square feet, on average. A tenant group thinks that the apartments are smaller than advertised. They hire an engineer to measure a sample of apartments to test their suspicions. Let μ represent the true average area (in square feet) of these apartments. What are the appropriate null and alternative hypotheses? H0: μ = 1250 vs Ha: μ ≠ 1250; H0: μ = 1250 vs Ha: μ > 1250; H0: μ = 1250 vs Ha: μ < 1250; Ha: μ < 1250 vs Ha: μ > 1250
H0: μ = 1250 vs Ha: μ < 1250
A commuter must pass through five traffic lights on her way to work, and she will have to stop at each one that is red. Let X be the number of red lights she stops at on her way to work. She estimates the distribution for X to be as shown below. Value of X: 1, 2, 3, 4, 5. Probability: 0.40, 0.25, 0.15, 0.15, 0.05. The standard deviation of the number of lights the commuter hits on her way to work equals 1.25. What is the appropriate symbol and what are the units for this standard deviation? σ = 1.25 lights²; σ = 1.25 lights; σ² = 1.25 lights²; s = 1.25 lights
σ = 1.25 lights
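The stated value of 1.25 can be verified from the probability table using the usual definitions for a discrete random variable. The sketch below is only a check on the arithmetic; the variable names are illustrative. The variance is in squared units (lights²), and taking the square root returns the standard deviation to the original units (lights). The symbol is σ rather than s because the table describes a whole probability distribution, not a sample.

```python
import math

values = [1, 2, 3, 4, 5]
probs = [0.40, 0.25, 0.15, 0.15, 0.05]

# Mean: probability-weighted average of the values
mean = sum(x * p for x, p in zip(values, probs))

# Variance: probability-weighted average squared deviation (units: lights^2)
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))

# Standard deviation: square root of the variance (units: lights)
sd = math.sqrt(variance)

print(f"mean = {mean:.2f}, variance = {variance:.2f}, sd = {sd:.2f}")
```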