FINAL EXAM math statistics
A study of an association between which ear is used for cell phone calls and whether the subject is left-handed or right-handed began with a survey e-mailed to 5000 people belonging to an otology online group, and 717 surveys were returned. (Otology relates to the ear and hearing.) What percentage of the 5000 surveys were returned? Does that response rate appear to be low? In general, what is a problem with a very low response rate?
Of the 5000 surveys, 14% were returned. This response rate appears to be low. It creates a serious potential for getting a biased sample that consists of those with a special interest in the topic.
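The percentage is a one-line calculation. A minimal Python sketch using the counts from the question (the variable names are illustrative only):

```python
# Counts taken from the question above.
surveys_sent = 5000
surveys_returned = 717

# Response rate as a percentage of the surveys sent.
response_rate = surveys_returned / surveys_sent * 100
print(f"Response rate: {response_rate:.1f}%")  # rounds to about 14%
```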
Which of the following is NOT true of using the binomial probability distribution to test claims about a proportion?
One requirement of this method is that np≥5 and nq≥5.
Which of the following is NOT a requirement of the Permutations Rule, nPr = n!/(n-r)!, for items that are all different?
Order is not taken into account (rearrangements of the same items are considered to be the same).
In a study of all 4935 students at a college, it is found that 35% own a computer.
Parameter because the value is a numerical measurement describing a characteristic of a population.
Which of the following is NOT a level of measurement?
Quantitative
Which of the following is NOT a voluntary response sample?
Quiz scores from a college level statistics course are analyzed to determine student progress.
A man experienced a tax audit. The tax department claimed that the man was audited because he was randomly selected from all the taxpayers.
Random sampling
In a poll conducted by a certain research center, 1149 adults were called after their telephone numbers were randomly generated by a computer, and 67% were able to correctly identify the president.
Random sampling
Which measure of variation is most sensitive to extreme values?
Range
For a data set of chest sizes (distance around chest in inches) and weights (pounds) of four anesthetized bears that were measured, the linear correlation coefficient is r = 0.958. Use the table available below to find the critical values of r. Based on a comparison of the linear correlation coefficient r and the critical values, what do you conclude about a linear correlation?
Since the correlation coefficient r is in the right tail above the positive critical value, there is sufficient evidence to support the claim of a linear correlation.
Which of the following is NOT true for conducting a hypothesis test for independence between the row variable and column variable in a contingency table?
Small values of the χ2 test statistic reflect significant differences between observed and expected frequencies.
In a test of homogeneity, which of the following is NOT true?
Small values of the χ2 test statistic would lead to a decision to reject the null hypothesis.
Which of the following is NOT needed to determine the minimum sample size required to estimate a population proportion?
Standard Deviation
Determine whether the underlined number is a statistic or a parameter. A sample of students is selected and it is found that 25% own a television
Statistic because the value is a numerical measurement describing a characteristic of a sample.
To determine her stress level, Britney divides up her day into three parts: morning, afternoon, and evening. She then measures her stress level at 44 randomly selected times during each part of the day.
Stratified
Which sampling method subdivides the population into categories sharing similar characteristics and then selects a sample from each subdivision?
Stratified
A researcher selects every 989th social security number and surveys the corresponding person.
Systematic sampling
Which of the following is NOT a requirement of the Combinations Rule, nCr = n!/[(n-r)! r!], for items that are all different?
That order is taken into account (consider rearrangements of the same items to be different sequences).
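To contrast the two counting rules, here is a small Python sketch. The values n = 5 and r = 3 are hypothetical, chosen only for illustration. Permutations count ordered arrangements, n!/(n-r)!, while combinations ignore order, n!/[(n-r)! r!].

```python
import math

n, r = 5, 3  # hypothetical: choose 3 items from 5 different items

# Permutations: order matters, n!/(n-r)!
permutations = math.factorial(n) // math.factorial(n - r)

# Combinations: order does not matter, n!/((n-r)! r!)
combinations = math.factorial(n) // (math.factorial(n - r) * math.factorial(r))

# math.perm and math.comb (Python 3.8+) give the same counts.
assert permutations == math.perm(n, r)
assert combinations == math.comb(n, r)
print(permutations, combinations)
```

Each combination of 3 items can be rearranged in 3! = 6 ways, which is why the permutation count is 6 times the combination count.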
Consider a difference of 20% between two values of a standard deviation to be significant. How does this computed value compare with the given standard deviation, 9.0?
The computed value is not significantly different from the given value
The exact distances (in centimeters) between the chairs in a college classroom
The data are continuous because the data can take on any value in an interval.
The total numbers of flights by different airlines between two specific cities in the past month
The data are discrete because the data can only take on specific values
The numbers of telephone lines in different regions
The data are discrete because the data can only take on specific values
Determine whether the data described below are qualitative or quantitative and explain why. The blood groups of A, B, AB, and O
The data are qualitative because they don't measure or count anything.
Does the graph suggest that the distribution is skewed? If so, how?
The distribution appears to be skewed to the left (or negatively skewed).
Which of the following is NOT a property of the sampling distribution of the variance?
The distribution of sample variances tends to be a normal distribution.
Which of the following is NOT a conclusion of the Central Limit Theorem?
The distribution of the sample data will approach a normal distribution as the sample size increases.
Which of the following is NOT a reason why the procedures to estimate differences of two proportions or testing a claim about two proportions work?
The form of the confidence interval utilizes the same variance as when testing claims using hypothesis tests.
Use the magnitudes, rounded to two decimal places, of the 100 earthquakes included in the accompanying data set to construct a frequency distribution. Use a class width of 0.50 and begin with a lower class limit of 0.00. Does the frequency distribution appear to be a normal distribution?
The frequency distribution could reasonably be a normal distribution because the frequencies start low, increase, and then decrease, and are roughly symmetric.
In a study of 412 children with a particular disease, the subjects were asked to complete a survey about their diet upon arrival to a hospital
The given description corresponds to an observational study.
A homeowner measured the voltage supplied to his home on 31 different days, and the average (mean) value is 147.1 volts.
The given value is a statistic for the year because the data collected represent a sample.
Which of the following does NOT describe the standard normal distribution?
The graph is uniform.
In a study designed to test the effectiveness of a medication as a treatment for lower back pain, 1643 patients were randomly assigned to one of three groups: (1) the 547 subjects in the placebo group were given pills containing no medication; (2) 550 subjects were in a group given pills with the medication taken at regular intervals; (3) 546 subjects were in a group given pills with the medication to be taken when needed for pain relief. In what specific way was replication applied in the study?
The group sample sizes are all large so the researchers could see the effects of the treatment.
The table available below shows the drive through service times (seconds) for lunches at a fast food restaurant. Use the data to construct a histogram. Begin with a lower class limit of 70 seconds and use a class width of 40 seconds. Does the histogram appear to be skewed? If so, identify the type of skewness.
The histogram has a longer right tail, so the distribution of the data is skewed to the right.
What does it mean for the findings of a statistical analysis of data to be statistically significant?
The likelihood of getting these results by chance is very small.
Which of the following is NOT a property of the chi-square distribution?
The mean of the chi-square distribution is 0.
Identify which of these designs is most appropriate for the given experiment: completely randomized design, randomized block design, or matched pairs design. Currently, there is no approved vaccine for the prevention of infection by a certain virus. A clinical trial of a possible vaccine is being planned to include subjects treated with the vaccine while other subjects are given a placebo.
The most appropriate is completely randomized design.
Assume that 100 births are randomly selected and 33 of the births are girls. Use subjective judgment to describe the number of girls as significantly high, significantly low, or neither significantly low nor significantly high.
The number of girls is significantly low.
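The question asks for subjective judgment, but the answer can be sanity-checked numerically. The sketch below assumes 100 births with 33 girls, assumes boys and girls are equally likely (p = 0.5), and applies the range rule of thumb (values more than 2 standard deviations from the mean are significant). Neither assumption is stated in the question itself.

```python
import math

n = 100        # births, from the question
girls = 33     # observed number of girls, from the question
p = 0.5        # assumption: boys and girls are equally likely

mean = n * p
sd = math.sqrt(n * p * (1 - p))

# Range rule of thumb (an assumption here): values at or beyond
# mean ± 2 standard deviations are treated as significant.
low_limit = mean - 2 * sd
high_limit = mean + 2 * sd

if girls <= low_limit:
    verdict = "significantly low"
elif girls >= high_limit:
    verdict = "significantly high"
else:
    verdict = "neither"
print(verdict)
```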
Which of the following is NOT a requirement of constructing a confidence interval estimate for a population variance?
The population must be skewed to the right.
The complement of "at least one" is _______.
"none"
"At least one" is equivalent to _______.
"one or more."
For the binomial distribution, which formula finds the standard deviation?
√(npq)
Which of the following is NOT an equivalent expression for the confidence interval given by 161.7 < μ < 189.5?
161.7 ± 27.8
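The "point estimate ± margin of error" form comes from the midpoint and half-width of the interval, which shows why 161.7 ± 27.8 is not equivalent: the equivalent form is 175.6 ± 13.9. A minimal Python sketch (variable names are illustrative):

```python
lower, upper = 161.7, 189.5  # limits from the question

point_estimate = (upper + lower) / 2   # center of the interval
margin_of_error = (upper - lower) / 2  # half the interval width

print(f"{point_estimate:.1f} ± {margin_of_error:.1f}")
```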
A magazine provided results from a poll of 2000 adults who were asked to identify their favorite pie. Among the 2000 respondents, 13% chose chocolate pie, and the margin of error was given as ±5 percentage points. Given specific sample data, which confidence interval is wider: the 99% confidence interval or the 80% confidence interval? Why is it wider?
A 99% confidence interval must be wider than an 80% confidence interval in order to be more confident that it captures the true value of the population proportion.
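The width comparison can be sketched in Python. This assumes the usual normal-approximation interval for a proportion, which the question does not spell out, and the helper name `margin_of_error` is hypothetical.

```python
import math
from statistics import NormalDist

n = 2000      # respondents, from the question
p_hat = 0.13  # sample proportion choosing chocolate pie

def margin_of_error(confidence):
    """Margin of error for a proportion using the normal approximation."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

e99 = margin_of_error(0.99)
e80 = margin_of_error(0.80)
print(f"99% interval width: {2 * e99:.4f}")
print(f"80% interval width: {2 * e80:.4f}")
```

Only the critical value changes between the two calls, so the higher confidence level always produces the wider interval for the same sample.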
In an election poll, Evan received 16,994,174 votes
A discrete data set because there are a finite number of possible values.
Which of the following would be information in a question asking you to find the area of a region under the standard normal curve as a solution?
A distance on the horizontal axis is given
Which of the following is NOT true of the χ2 test statistic?
A small χ2 test statistic leads us to conclude that there is not a good fit with the assumed distribution.
Which of the following is NOT a true statement about error in hypothesis testing?
A type I error is making the mistake of rejecting the null hypothesis when it is actually false.
a. For this sample of paired data, what does r represent, and what does ρ represent? b. Without doing any research or calculations, estimate the value of r. c. Does r change if body temperatures are converted to Fahrenheit degrees?
a. r is a statistic that represents the value of the linear correlation coefficient computed from the paired sample data, and ρ is a parameter that represents the value of the linear correlation coefficient that would be computed by using all of the paired data in the population of all statistics students. b. The value of r is estimated to be 0, because it is likely that there is no correlation between body temperature and head circumference. c. The value of r does not change, because r is not affected by converting all values of a variable to a different scale.
A researcher collects a simple random sample of grade-point averages of statistics students, and she calculates the mean of this sample. Under what conditions can that sample mean be treated as a value from a population having a normal distribution? Select all that apply. A. The sample has more than 30 grade-point averages. B. If the population of statistics students has a normal distribution. C. The researcher collects more than 30 samples. D. If the population of grade-point averages has a normal distribution.
A. The sample has more than 30 grade-point averages. D. If the population of grade-point averages has a normal distribution.
Which of the following is NOT a principle of probability?
All events are equally likely in any probability procedure.
Twelve different video games showing alcohol use were observed. The duration times of alcohol use were recorded, with the times (seconds) listed below. What requirements must be satisfied to test the claim that the sample is from a population with a mean greater than 80 sec? Are the requirements all satisfied? Select all that apply. A. The conditions for a binomial distribution must be satisfied. B. The sample observations must be a simple random sample. C. At least one observation must be above or below 80 sec. D. Either the population is normally distributed, or n > 30, or both.
B. The sample observations must be a simple random sample. D. Either the population is normally distributed, or n > 30, or both. No. The sample size is not greater than 30, the sample does not appear to be from a normally distributed population, and there is not enough information given to determine whether the sample is a simple random sample.
Which of the following is not true? A. If values are converted to standard z-scores, then procedures for working with all normal distributions are the same as those for the standard normal distribution. B. A z-score is a conversion that standardizes any value from a normal distribution to a standard normal distribution. C. A z-score is an area under the normal curve. D. The area in any normal distribution bounded by some score x is the same as the area bounded by the equivalent z-score in the standard normal distribution.
C. A z-score is an area under the normal curve.
To determine customer opinion of their pricing, Greyhound Lines randomly selects 140 busses during a certain week and surveys all passengers on the busses
Cluster
Which sampling method divides the population up into sections, randomly selects some of those sections, then chooses all the members from the selected sections to study?
Cluster
A woman is selected by a marketing company to participate in a paid focus group. The company says that the woman was selected because everyone in six randomly selected towns was being selected.
Cluster sampling
A newspaper asks its readers to call in their opinion regarding the number of books they have read this month
Convenience
Which of the following is NOT one of the three common errors involving correlation?
Correlation does not imply causality
Which of the following statistics are unbiased estimators of population parameters? Choose the correct answer below. Select all that apply. A. Sample median used to estimate a population median. B. Sample standard deviation used to estimate a population standard deviation. C. Sample range used to estimate a population range. D. Sample proportion used to estimate a population proportion. E. Sample mean used to estimate a population mean. F. Sample variance used to estimate a population variance.
D. Sample proportion used to estimate a population proportion. E. Sample mean used to estimate a population mean. F. Sample variance used to estimate a population variance.
Which of the following is associated with a parameter?
Data that were obtained from an entire population.
If A denotes some event, what does Ā denote? If P(A) = 0.992, what is the value of P(Ā)?
Event Ā denotes the complement of event A, meaning that Ā consists of all outcomes in which event A does not occur. P(Ā) = 1 − 0.992 = 0.008.
Which of the following is typically the least important factor to consider when conducting a statistical analysis of data?
Formula calculation
Which of the following is NOT true of the goodness-of-fit test?
Goodness-of-fit hypothesis tests may be left-tailed, right-tailed, or two-tailed.
The table below lists leading digits of 317 inter-arrival Internet traffic times for a computer, along with the frequencies of leading digits expected with Benford's law. When using these data to test for goodness-of-fit with the distribution described by Benford's law, identify the null and alternative hypotheses.
H0: p1 = 0.301 and p2 = 0.176 and p3 = 0.125 and ... and p9 = 0.046. H1: At least one of the proportions is not equal to the given claimed value.
Which of the following would be classified as categorical data?
Hair color
Which of the following is not a commonly used practice?
If the distribution of the sample means is normally distributed, and n > 30, then the population distribution is normally distributed.
Which of the following is NOT a requirement for testing a claim about a mean with σ known?
If the sample results (or more extreme results) cannot easily occur when the null hypothesis is true, we explain the discrepancy between the assumption and the sample results by concluding that the assumption is true, so we do not reject the assumption.
Which of the following is NOT a guideline for finding the best multiple regression equation?
If two predictor values have a very high linear correlation coefficient, both should be included in finding the multiple regression equation.
Which of the following is NOT true for a hypothesis test for correlation?
If |r| >critical value, we should fail to reject the null hypothesis and conclude that there is not sufficient evidence to support the claim of a linear correlation.
Which of the following is NOT a value in the 5-number summary?
Mean
Which of the following is a biased estimator? That is, which of the following does not target the population parameter?
Median
Do the data appear to have a distribution that is approximately normal?
No, it is not symmetric.
The statistics of n = 22 and s = 14.3 result in this 95% confidence interval estimate of σ: 11.0 < σ < 20.4. That confidence interval can also be expressed as (11.0, 20.4). Given that 15.7 ± 4.7 results in values of 11.0 and 20.4, can the confidence interval be expressed as 15.7 ± 4.7 as well?
No. The format implies that s=15.7, but s is given as 14.3. In general, a confidence interval for σ does not have s at the center.
Which level of measurement consists of categories only where data cannot be arranged in an ordering scheme?
Nominal
Which of the following consists of discrete data?
Number of suitcases on a plane
In the largest clinical trial ever conducted, 401,974 children were randomly assigned to two groups. The treatment group consisted of 201,229 children given the Salk vaccine for polio, and the other 200,745 children were given a placebo. Among those in the treatment group, 33 developed polio, and among those in the placebo group, 115 developed polio. If we want to use the methods for testing a claim about two population proportions to test the claim that the rate of polio is less for children given the Salk vaccine, are the requirements for a hypothesis test satisfied? Explain.
The requirements are satisfied; the samples are simple random samples that are independent, and for each of the two groups, the number of successes is at least 5 and the number of failures is at least 5.
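The count requirement can be checked directly from the numbers in the question. The sketch below also computes a pooled two-proportion z statistic as an illustration of the test that would follow. The pooled formula is the standard one, but it is an assumption here, since the question only asks about the requirements.

```python
import math

# Counts from the question.
n1, x1 = 201229, 33    # Salk vaccine group: size, polio cases
n2, x2 = 200745, 115   # placebo group: size, polio cases

# Requirement: at least 5 successes and at least 5 failures in each group.
requirements_met = all(count >= 5 for count in (x1, n1 - x1, x2, n2 - x2))

# Pooled two-proportion z statistic for the claim p1 < p2.
p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)
standard_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / standard_error

print(requirements_met, round(z, 2))
```

The code cannot verify that the samples are independent simple random samples; that part of the requirement comes from the random assignment described in the question.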
Using the lengths (in.), chest sizes (in.), and weights (lb) of bears from a data set, a researcher gets the regression equation below. Weight = −274 + 0.426 Length + 12.1 Chest Size. Identify the response and predictor variables in this regression equation.
The response variable is weight and the predictor variables are length and chest size.
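To see the roles of the variables, the regression equation (Weight = −274 + 0.426 Length + 12.1 Chest Size) can be written as a function whose inputs are the predictors and whose output is the response. The function name and the example bear are hypothetical and not taken from the data set.

```python
def predicted_weight(length_in, chest_in):
    """Predicted bear weight (lb) from the regression equation in the question."""
    return -274 + 0.426 * length_in + 12.1 * chest_in

# Hypothetical bear, not from the data set: 60 in. long with a 40 in. chest.
weight = predicted_weight(60, 40)
print(f"{weight:.1f} lb")
```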
A particular country has 60 total states. If the areas of all 60 states are added and the sum is divided by 60, the result is 182,073 square kilometers. Determine whether this result is a statistic or a parameter.
The result is a parameter because it describes some characteristic of a population
When analyzing polls, which of the following is NOT a consideration?
The sample should be a voluntary response or convenience sample.
A data set includes the age at marriage for 120 randomly selected married couples
The samples are dependent because there is a natural pairing between the two samples.
Which of the following is NOT required to determine minimum sample size to estimate a population mean?
The size of the population, N
What does the symbol ŷ represent?
The symbol ŷ represents the predicted value of price.
A survey found that 49% of all respondents have jobs.
The value is a statistic because it is a numerical measurement describing some characteristic of a sample.
Let the predictor variable x represent heights of males and let response variable y represent weights of males. A sample of 157 heights and weights results in s_e = 16.71474 cm. In your own words, describe what that value of s_e represents.
The value of s_e is the standard error of the estimate, which is a measure of the differences between the observed weights and the weights predicted from the regression equation.
The accompanying data represent women's median earnings as a percentage of men's median earnings for recent years beginning with 1989. Is there a trend? How does it appear to affect women? Construct a time-series graph. Comment on any trends. Choose the correct comment below.
There is a general upward trend though there have been some down years. An upward trend would be helpful to women so that their earnings become equal to those of men.
For a data set of weights (pounds) and highway fuel consumption amounts (mpg) of eight types of automobile, the linear correlation coefficient is found and the P-value is 0.014. Write a statement that interprets the P-value and includes a conclusion about linear correlation.
The P-value indicates that the probability of a linear correlation coefficient that is at least as extreme is 1.4%, which is low, so there is sufficient evidence to conclude that there is a linear correlation between weight and highway fuel consumption in automobiles.
Which of the following is NOT true when testing a claim about a standard deviation or variance?
The P-value method and the classical method are not equivalent to the confidence interval method in that they may yield different results.
A study involved 22,071 male physicians. Based on random selections, 11,037 of them were treated with aspirin and the other 11,034 were given placebos. The study was stopped early because it became clear that aspirin reduced the risk of myocardial infarctions by a substantial amount.
This is an experiment because the researchers apply a treatment to the individuals. The results apply only to male physicians.
In a survey, 1465 Internet users chose to respond to this question posted on a newspaper's electronic edition: "Is news online as satisfying as print and TV news?" 52% of the respondents said "yes."
This is an observational study because the researchers do not attempt to modify the individuals. This is a convenience sample with voluntary response, which has a high chance of leading to bias.
Which of the following is an important business application related to counting?
Traveling Salesman Problem
Which of the following is not a strategy for finding P-values with the Student t distribution?
Use the table in the book to find the P-value rounded to at least 4 decimal places.
Which of the following is NOT an advantage of pooling sample variances?
We often know that σ1 = σ2.
Which of the following is NOT a property of the standard deviation?
When comparing variation in samples with very different means, it is good practice to compare the two sample standard deviations.
Is there a relationship between cigarette tar and CO?
Yes, as the amount of tar increases the amount of carbon monoxide also increases.
Is there a linear relationship between weight and highway mileage?
Yes, as the weight increases the highway mileage decreases.
Consider the correlation between heights of fathers and mothers and the heights of their sons. Refer to the accompanying technology output. Should the multiple regression equation be used for predicting the height of a son based on the height of his father and mother? Why or why not?
Yes, the multiple regression equation should be used for predicting the height of a son based on the height of his father and mother because the P-value in the ANOVA table is very low.
If you are asked to find the 85th percentile, you are being asked to find _____.
a data value associated with an area of 0.85 to its left
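For a concrete case, assume the values follow a standard normal distribution (an assumption, since the card does not name a distribution). The 85th percentile is then the z-score with a cumulative area of 0.85 to its left:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# z-score with an area of 0.85 to its left (the 85th percentile).
z_85 = standard_normal.inv_cdf(0.85)

# Going the other way: the cumulative area to the left of that z-score.
area_left = standard_normal.cdf(z_85)
print(round(z_85, 2), round(area_left, 2))
```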
a. What is a residual? b. In what sense is the regression line the straight line that "best" fits the points in a scatterplot?
a. A residual is a value of y−ŷ, which is the difference between an observed value of y and a predicted value of y. b. The regression line has the property that the sum of the squares of the residuals is the lowest possible sum.
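The least-squares property can be illustrated with a small Python sketch. The data are hypothetical, and the competing line is arbitrary. The point is only that the fitted line gives a smaller sum of squared residuals than another candidate.

```python
# Hypothetical paired data, not from any data set in this document.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

def sum_squared_residuals(slope, intercept):
    """Sum of squared residuals (y - y_hat) for the line y_hat = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Least-squares slope and intercept from the usual formulas.
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

best = sum_squared_residuals(slope, intercept)
other = sum_squared_residuals(2.0, 0.0)  # an arbitrary competing line
print(best < other)
```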
In a certain survey, 503 people chose to respond to this question: "Should passwords be replaced with biometric security (fingerprints, etc)?" Among the respondents, 53% said "yes." We want to test the claim that more than half of the population believes that passwords should be replaced with biometric security. Complete parts (a) through (d) below.
a. The sample observations are not a random sample, so a test about a population proportion using the normal approximation method cannot be used. b. This statement means that if the P-value is very low, the null hypothesis should be rejected. c. This statement seems to suggest that with a high P-value, the null hypothesis has been proven or is supported, but this conclusion cannot be made. d. Choosing such a specific significance level could give the impression that the significance level was chosen specifically to reach a desired conclusion.
a. What impression does the graph create? b. Does the graph depict the data fairly?
a. The graph creates the impression that men have salaries that are more than twice the salaries of women. b. No, because the vertical scale does not start at zero.
a. A statistics class with 36 students is arranged so that there are 6 rows with 6 students in each row, and the rows are numbered from 1 through 6. A die is rolled and a sample consists of all students in the row corresponding to the outcome of the die. b. For the same class described in part (a), the 36 student names are written on 36 individual index cards. The cards are shuffled and six names are drawn from the top. c. For the same class described in part (a), the six youngest students are selected.
a. This sample is not a simple random sample. It is a random sample. b. This sample is a simple random sample. It is a random sample. c. This sample is not a simple random sample. It is not a random sample.
The _______ event A occurring are the ratio P(A)/P(Ā)
actual odds in favor of
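A short Python sketch of the ratio, using a hypothetical P(A) = 1/4 and exact fractions so the odds reduce cleanly. The odds against A are included for contrast and are simply the reciprocal ratio.

```python
from fractions import Fraction

p_a = Fraction(1, 4)   # hypothetical P(A)
p_not_a = 1 - p_a      # P(Ā), the complement

odds_in_favor = p_a / p_not_a   # P(A)/P(Ā)
odds_against = p_not_a / p_a    # P(Ā)/P(A)
print(f"in favor {odds_in_favor.numerator}:{odds_in_favor.denominator}")
```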
When using the _______ always be careful to avoid double-counting outcomes.
addition rule
Which of the following groups of terms can be used interchangeably when working with normal distributions?
areas, probability, and relative frequencies
A bottle contains a label stating that it contains pills with 500 mg of vitamin C, and another bottle contains a label stating that it contains pills with 325 mg of aspirin. When testing claims about the mean contents of the pills, which would have more serious implications: rejection of the vitamin C claim or rejection of the aspirin claim? Considering only a type I error and using the same sample size, is it wise to use the same significance level for hypothesis tests about the mean amount of vitamin C and the mean amount of aspirin?
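Rejection of the aspirin claim has more serious implications, because a wrong aspirin dosage could cause more serious adverse reactions than a wrong vitamin C dosage. It would be wise to use a smaller significance level for testing the claim about the aspirin.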
A _______ random variable has infinitely many values associated with measurements.
continuous
The _______ deviation is the vertical distance ŷ − ȳ, which is the distance between the predicted y-value and the horizontal line passing through the sample mean ȳ.
explained
Selections made with replacement are considered to be _______.
independent
Two events A and B are _______ if the occurrence of one does not affect the probability of the occurrence of the other.
independent
Paired sample data may include one or more ___________, which are points that strongly affect the graph of the regression line.
influential points
Identify the expression for calculating the mean of a binomial distribution.
np
In the binomial probability formula, the variable x represents the _______.
number of successes.
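The binomial quantities on these cards (x successes in n trials, mean np, standard deviation √(npq)) fit together as in the sketch below. The values n = 5 and p = 0.5 are hypothetical, and the function name is illustrative.

```python
import math

def binomial_probability(n, x, p):
    """P(x successes in n trials) = nCx * p^x * q^(n-x), where q = 1 - p."""
    q = 1 - p
    return math.comb(n, x) * p**x * q ** (n - x)

n, p = 5, 0.5  # hypothetical: 5 trials, success probability 0.5
prob_two = binomial_probability(n, 2, p)

mean = n * p                          # np
std_dev = math.sqrt(n * p * (1 - p))  # square root of npq
print(prob_two, mean, round(std_dev, 3))
```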
In modified boxplots, a data value is a(n) _______ if it is above Q3 + (1.5)(IQR) or below Q1 − (1.5)(IQR).
outlier
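The fence rule translates directly into code. The quartile values below are hypothetical, and the quartiles are passed in rather than computed because quartile conventions differ between textbooks and software.

```python
def is_outlier(value, q1, q3):
    """Modified-boxplot rule: below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return value < q1 - 1.5 * iqr or value > q3 + 1.5 * iqr

# Hypothetical quartiles: Q1 = 10 and Q3 = 20, so the fences are -5 and 35.
print(is_outlier(36, 10, 20), is_outlier(30, 10, 20))
```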
Identify the type I error and the type II error that correspond to the given hypothesis. The percentage of households with Internet access is equal to 60%.
part 1: Reject the null hypothesis that the percentage of households with Internet access is equal to 60% when that percentage is actually equal to 60%. part 2: Fail to reject the null hypothesis that the percentage of households with Internet access is equal to 60% when that percentage is actually different from 60%.
A _______ is an interval estimate of a predicted value of y.
prediction interval
If, under a given assumption, the probability of a particular observed event is extremely small, we conclude that the assumption is probably not correct. This represents the _______.
rare event rule
A _______ histogram has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.
relative frequency
A ______________ is a scatterplot of the (x,y) values after each of the y-coordinate values has been replaced by the residual value y-ŷ
residual plot
The Range Rule of Thumb roughly estimates the standard deviation of a data set as _______.
s ≈ range/4
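A quick Python sketch of the rule on a hypothetical sample, next to the sample standard deviation from the `statistics` module. The rule is only a rough estimate, so the two values are not expected to match closely.

```python
import statistics

data = [2, 4, 4, 6, 10]  # hypothetical sample values

estimated_s = (max(data) - min(data)) / 4  # range rule of thumb
actual_s = statistics.stdev(data)          # sample standard deviation

print(estimated_s, round(actual_s, 2))
```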
If we are collecting sample data for a study, the _______ that we choose can greatly influence the validity of our conclusions. For example, we can use sound statistical methods to analyze data in voluntary response samples, but the results are not necessarily valid.
sampling method
Class width is found by _______.
subtracting a lower class limit from the next consecutive lower class limit
Whenever a data value is less than the mean, _______.
the corresponding z-score is negative
For data sets having a distribution that is approximately bell-shaped, _______ states that about 68% of all data values fall within one standard deviation from the mean.
the empirical rule
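The 68% figure can be checked against the standard normal distribution, assuming an exactly normal (bell-shaped) model:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area within one standard deviation of the mean.
within_one_sd = z.cdf(1) - z.cdf(-1)
print(round(within_one_sd, 4))
```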
Where would a value separating the top 15% from the other values on the graph of a normal distribution be found?
the right side of the horizontal scale of the graph
Surveying 100 college students and asking if they like pirates or ninjas better, recording Pirate or Ninja.
yes, because all 4 requirements are satisfied
Twenty different statistics students are randomly selected. For each of them, their body temperature (°C) is measured and their head circumference (cm) is measured. If it is found that r = 0, does that indicate that there is no association between these two variables?
No, because while there is no linear correlation, there may be a relationship that is not linear.
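A small Python sketch makes the point concrete. The data are hypothetical (not body temperatures or head circumferences) and follow y = x² exactly, yet the linear correlation coefficient is 0. The helper `pearson_r` is an illustrative name.

```python
import math

# Hypothetical data with a perfect nonlinear relationship: y = x squared.
xs = [-2, -1, 0, 1, 2]
ys = [x**2 for x in xs]

def pearson_r(a, b):
    """Linear correlation coefficient r for paired lists a and b."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    sxy = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    sxx = sum((x - a_bar) ** 2 for x in a)
    syy = sum((y - b_bar) ** 2 for y in b)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(xs, ys)
print(r)
```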