Statistics Midterm
significantly low or significantly high
A data value is considered _______ if its z-score is less than −2 or greater than 2.
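A minimal sketch of the rule on this card, assuming the cutoffs −2 and 2 are exclusive as stated. The function name `classify_z` is made up for illustration.

```python
def classify_z(z: float) -> str:
    """Label a z-score using the cutoffs -2 and 2 from the card."""
    if z < -2:
        return "significantly low"
    if z > 2:
        return "significantly high"
    return "neither"


print(classify_z(4.81))   # significantly high
print(classify_z(-2.0))   # neither (exactly -2 is not less than -2)
```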
The term average is not used in statistics. The term mean should be used for the result obtained by adding all of the sample values and dividing by the total number of sample values.
A defunct website listed the "average" annual income for Florida as $35,031. What is the role of the term average in statistics? Should another term be used in place of average?
No, the value of 27.3 cents is not the mean because the 50 amounts are all weighted equally in the calculation, but some states consume more gas than others, so the mean amount of state sales tax should be calculated using a weighted mean.
A magazine published a list consisting of the state tax on each gallon of gas. If we add the 50 state tax amounts and then divide by 50, we get 27.3 cents. Is the value of 27.3 cents the mean amount of state sales tax paid by all U.S. drivers? Why or why not?
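To see why the weighting matters, here is a small sketch comparing a plain mean with a weighted mean. The three tax amounts and the gallon figures are hypothetical numbers chosen for easy arithmetic, not data from the magazine.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of (value * weight) divided by the sum of the weights."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical data: tax in cents per gallon and gas consumed per state.
taxes = [20.0, 30.0, 40.0]
gallons = [5.0, 3.0, 2.0]

plain = sum(taxes) / len(taxes)          # every state counted equally
weighted = weighted_mean(taxes, gallons)  # states weighted by consumption

print(plain)     # 30.0
print(weighted)  # 27.0
```

The state with the lowest tax consumes the most gas in this made-up example, so the weighted mean falls below the plain mean.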
If the sample has more than 30 grade-point averages, or if the population of grade-point averages has a normal distribution.
A researcher collects a simple random sample of grade-point averages of statistics students, and she calculates the mean of this sample. Under what conditions can that sample mean be treated as a value from a population having a normal distribution?
a. is not; is b. is; is c. is not; is not
A simple random sample of n subjects is selected in such a way that every possible sample of the same size n has the same chance of being chosen. (A simple random sample is often called a random sample, but strictly speaking, a random sample has the weaker requirement that all members of the population have the same chance of being selected.) Determine whether each of the following is a simple random sample and a random sample. Complete parts a through c. a. In Major League Baseball, there are 30 teams, each with an active roster of 25 players. The names of the teams are printed on 30 separate index cards, the cards are shuffled, and one card is drawn. The sample consists of the 25 players on the active roster of the selected team. This sample ______ a simple random sample. It ______ a random sample. b. For the same Major League Baseball population described in part (a), the 750 names of the players are printed on 750 separate index cards, and the cards are shuffled. Twenty-five different cards are selected from the top. The sample consists of the 25 selected players. This sample ______ a simple random sample. It ______ a random sample. c. For the same Major League Baseball population described in part (a), a sample is constructed by selecting the 25 youngest players. This sample ______ a simple random sample. It ______ a random sample.
4.81
A successful basketball player has a height of 6 feet 10 inches, or 208 cm. Based on statistics from a data set, his height converts to the z score of 4.81. How many standard deviations is his height above the mean? The player's height is ________ standard deviation(s) above the mean.
The minimum radiation would be a particularly helpful statistic, but none of these statistics is helpful for selecting a cell phone for purchase.
Are any of the resulting statistics helpful in selecting a cell phone for purchase?
The number of girls is significantly low.
Assume that 900 births are randomly selected and 4 of the births are girls. Use subjective judgment to describe the number of girls as significantly high, significantly low, or neither significantly low nor significantly high.
Although actresses include the oldest age, the boxplot representing actresses shows that they have ages that are generally lower than those of actors.
Compare the two boxplots. Choose the correct answer below.
The data are quantitative because they consist of counts or measurements.
Determine whether the data described below are qualitative or quantitative and explain why. The lengths (in minutes) of movies.
Observational study
Determine whether the description corresponds to an observational study or an experiment. Research is conducted to determine if there is a relation between colon cancer and fat consumption. Does the description correspond to an observational study or an experiment?
Statistic because the value is a numerical measurement describing a characteristic of a sample.
Determine whether the underlined number is a statistic or a parameter. A sample of professors is selected and it is found that 50% own a vehicle.
Parameter because the value is a numerical measurement describing a characteristic of a population.
Determine whether the underlined number is a statistic or a parameter. In a study of all 1541 seniors at a college, it is found that 55% own a computer.
Yes, because the frequencies start low, proceed to one or two high frequencies, then decrease to a low frequency, and the distribution is approximately symmetric.
Does the frequency distribution appear to have a normal distribution? Explain. Chart (with Temperature degrees F and Frequency) (#25)
No, there does not appear to be a correlation because there is no general pattern to the data.
Does there appear to be a correlation between the president's height and his opponent's height? (#37)
All of the weights end in 00, so they all appear to be rounded to the nearest 100 grams. This suggests that the mean and median should also be rounded.
Examine the list of birth weights to make an observation about those numbers. How does that observation affect the way that the results should be rounded? (#41)
P(56 or more girls); is not; greater than
For 100 births, P(exactly 56 girls)=0.0390 and P(56 or more girls)=0.136. Is 56 girls in 100 births a significantly high number of girls? Which probability is relevant to answering that question? Consider a number of girls to be significantly high if the appropriate probability is 0.05 or less. The relevant probability is ______________________, so 56 girls in 100 births ____________ a significantly high number of girls because the relevant probability is ___________ 0.05.
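The two probabilities on this card can be checked with a short binomial calculation. This sketch assumes each birth is a girl with probability 0.5, which the card does not state outright; the function names are illustrative.

```python
from math import comb


def binom_pmf(k, n, p=0.5):
    """P(exactly k successes in n trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_tail(k, n, p=0.5):
    """P(k or more successes in n trials)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


p_exact = binom_pmf(56, 100)   # about 0.0390
p_tail = binom_tail(56, 100)   # about 0.136

# "Significantly high" uses the tail probability, not the exact one.
is_significantly_high = p_tail <= 0.05
print(round(p_exact, 4), round(p_tail, 3), is_significantly_high)
```

The exact probability is below 0.05, but it is the wrong one to use: the tail probability P(56 or more) is greater than 0.05, so 56 girls is not significantly high.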
(Look at the number of males to find the critical values, and don't forget the positive and negative signs!) ±0.950; in the right tail above the positive critical value; is
For a data set of brain volumes (cm³) and IQ scores of four males, the linear correlation coefficient is r=0.975. Use the table available below to find the critical values of r. Based on a comparison of the linear correlation coefficient r and the critical values, what do you conclude about a linear correlation? The critical values are ______. Since the correlation coefficient r is __________________, there ___________ sufficient evidence to support the claim of a linear correlation. (#38)
(Look at the number of males to find the critical values, and don't forget the positive and negative signs!) ±0.576; between the critical values; is not
For a data set of brain volumes (cm³) and IQ scores of twelve males, the linear correlation coefficient is r=0.132. Use the table available below to find the critical values of r. Based on a comparison of the linear correlation coefficient r and the critical values, what do you conclude about a linear correlation? Since the correlation coefficient r is ______________, there ________ sufficient evidence to support the claim of a linear correlation.
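The comparison in the two brain-volume cards boils down to checking whether r falls outside the interval between the negative and positive critical values. A small sketch, using the critical values quoted in the answers (0.950 for four males, 0.576 for twelve); the function name is made up here.

```python
def supports_linear_correlation(r, critical):
    """True when r lies in either tail, beyond -critical or +critical."""
    return abs(r) > critical


print(supports_linear_correlation(0.975, 0.950))  # True  (four males)
print(supports_linear_correlation(0.132, 0.576))  # False (twelve males)
```

Using `abs(r)` covers both signs, so a strongly negative r would also count as evidence of a linear correlation.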
There appears to be an upward trend, unlike drive-in movie theaters, which have a downward trend.
Given below are the numbers of indoor movie theaters, listed in order by row for each year. Use the given data to construct a time-series graph. What is the trend? How does this trend compare to the trend for drive-in movie theaters? (#35)
Chebyshev's Theorem
Go over #47.
Bell-shaped
Heights of adult males are normally distributed. If a large sample of heights of adult males is randomly selected and the heights are illustrated in a histogram, what is the shape of that histogram?
ordinal; Such data should not be used for calculations such as an average (mean).
Identify the level of measurement of the data, and explain what is wrong with the given calculation. In a set of data, car rankings are represented as 10 for first, 20 for second, and 30 for third. The average (mean) of the 692 car rankings is 25.4. The data are the _________ level of measurement. What is wrong with the given calculation?
nominal; Such data are not counts or measures of anything, so it makes no sense to compute their average (mean).
Identify the level of measurement of the data, and explain what is wrong with the given calculation. In a survey, the favorite foods of respondents are identified as 100 for Italian food, 200 for Mexican food, 300 for Chinese food, and 400 for anything else. The average (mean) is calculated for 785 respondents and the result is 256.1. The data are at the ______________ level of measurement. What is wrong with the given calculation?
nominal; Such data are not counts or measures of anything, so it makes no sense to compute their average (mean).
Identify the level of measurement of the data, and explain what is wrong with the given calculation. In a survey, the hair colors of respondents are identified as 0 for brown hair, 1 for blond hair, 2 for black hair, and 3 for anything else. The average (mean) is calculated for 598 respondents and the result is 1.1. The data are at the __________ level of measurement. What is wrong with the given calculation?
nominal; Such data are not counts or measures of anything, so it makes no sense to compute their average (mean).
Identify the level of measurement of the data, and explain what is wrong with the given calculation. In a survey, the responses of respondents are identified as 0 for a "yes", 1 for a "no", 2 for a "maybe", and 3 for anything else. The average (mean) is calculated for 693 respondents and the result is 1.1. The data are at the _________ level of measurement. What is wrong with the given calculation?
cross-sectional
Identify the type of observational study. A researcher plans to obtain data by interviewing offspring of victims who perished in a bombing to see how they're coping now.
Systematic sampling
Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. A researcher selects every 240th social security number and surveys the corresponding person. Which type of sampling did the researcher use?
Stratified
Identify which of these types of sampling is used: random, systematic, convenience, stratified, or cluster. To determine her blood pressure, Miranda divides up her day into three parts: morning, afternoon, and evening. She then measures her blood pressure at 4 randomly selected times during each part of the day. What type of sampling is used?
The z score of 2.00 is most preferable because it is 2.00 standard deviations above the mean and would correspond to the highest of the five different possible test scores.
If your score on your next statistics test is converted to a z score, which of these z scores would you prefer: −2.00, −1.00, 0, 1.00, 2.00? Why?
Yes, it is reasonably close.
In a genetics experiment on peas, one sample of offspring contained 429 green peas and 162 yellow peas. Based on those results, estimate the probability of getting an offspring pea that is green. Is the result reasonably close to the value of 3/4 that was expected? (#71)
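The estimate itself is a relative frequency: green peas divided by all peas. A quick sketch with the counts from the card:

```python
green, yellow = 429, 162

# Relative-frequency estimate of P(green).
p_green = green / (green + yellow)
difference = abs(p_green - 3 / 4)

print(round(p_green, 3))     # 0.726
print(round(difference, 3))  # 0.024
```

Whether a gap of about 0.024 counts as "reasonably close" is a judgment call; the card's answer treats it as close.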
The group sample sizes are all large so the researchers could see the effects of the treatment.
In a study designed to test the effectiveness of a medication as a treatment for lower back pain, 1643 patients were randomly assigned to one of three groups: (1) the 547 subjects in the placebo group were given pills containing no medication; (2) 550 subjects were in a group given pills with the medication taken at regular intervals; (3) 546 subjects were in a group given pills with the medication to be taken when needed for pain relief. In what specific way was replication applied in the study?
The subjects in the study did not know whether they were taking a placebo or the new medication, and those who administered the pills also did not know.
In a double-blind experiment designed to test the effectiveness of a new medication as a treatment for lower back pain, 1643 patients were randomly assigned to one of three groups: (1) the 547 subjects in the placebo group were given pills containing no medication; (2) 550 subjects were in a group given pills with the new medication taken at regular intervals; (3) 546 subjects were in a group given pills with the new medication to be taken when needed for pain relief. What does it mean to say that the experiment was "double-blind"?
outlier
In modified boxplots, a data value is a(n) _______ if it is above Q3+(1.5)(IQR) or below Q1−(1.5)(IQR).
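A sketch of the outlier rule from this card. The quartiles are passed in directly because different textbooks and tools compute Q1 and Q3 slightly differently; the sample values below are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return (lower, upper) fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(x, q1, q3):
    """True when x is below the lower fence or above the upper fence."""
    lower, upper = outlier_fences(q1, q3)
    return x < lower or x > upper


# Hypothetical quartiles: Q1 = 10, Q3 = 20, so IQR = 10.
print(outlier_fences(10, 20))  # (-5.0, 35.0)
print(is_outlier(36, 10, 20))  # True
print(is_outlier(35, 10, 20))  # False (on the fence, not above it)
```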
The term linear refers to a straight line, and r measures how well a scatterplot fits a straight-line pattern.
In this section we use r to denote the value of the linear correlation coefficient. Why do we refer to this correlation coefficient as being linear?
discrete; there are a finite number of values
Is the random variable given in the accompanying table discrete or continuous? Explain. The random variable given in the accompanying table is ___________ because _____________________________________.
The probability that the polygraph indicates lying given that the subject is actually telling the truth.
Let event A=subject is telling the truth and event B=polygraph test indicates that the subject is lying. Use your own words to translate the notation P(B|A) into a verbal statement.
The data set is too small for a dotplot to reveal important characteristics of the data. A time-series graph would be most effective, since the data are listed in order over a period of several years.
Listed below are the numbers of unprovoked shark attacks worldwide for the last several years. Why is it that a dotplot of these data would not be very effective in helping us understand the data? Which of the following graphs would be most effective for these data: dotplot, stemplot, time-series graph, Pareto chart, pie chart, frequency polygon? 70, 54, 68, 82, 79, 83, 76, 73, 98, 81
descriptive
Methods used that summarize or describe characteristics of data are called _______ statistics.
It appears that weights of U.S. Army males increased from 1983 to 2020. (it always increases no matter what years/ if not sure look at graph #51)
Refer to the accompanying boxplots that are drawn on the same scale. The top boxplot represents weights (kg) of a sample of male U.S. Army personnel in 1983, and the bottom boxplot represents weights (kg) of a sample of male U.S. Army personnel in 2020. What story is told by these boxplots?
No. The data values in each class could take on any value between the class limits, inclusive.
Refer to the table summarizing service times (seconds) of dinners at a fast food restaurant. How many individuals are included in the summary? Is it possible to identify the exact values of all of the original service times? Chart (add the frequencies to get the number of individuals included in the summary)
The study is an experiment because subjects were given treatments.
Researchers conducted a study to determine whether magnets are effective in treating back pain. Pain was measured using the visual analog scale, and the results given below are among the results obtained in the study. Higher scores correspond to greater pain levels. Is this study an experiment or an observational study? Explain. Reduction in Pain Level After Magnet Treatment: n=20, x̄=0.485, s=0.963. Reduction in Pain Level After Sham Treatment: n=20, x̄=0.435, s=1.41.
is not; greater than (Find the estimate s = # cm first, then take the absolute value of the difference between that estimate and the standard deviation provided to answer the second blank.)
The approximation _____ accurate because the error of the range rule of thumb's approximation is _______________ 1.9 cm.
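A sketch of the range rule of thumb check, which estimates s as range/4 and compares the estimate with the actual standard deviation. The data for this problem are not on the card, so the minimum, maximum, and actual s below are hypothetical; only the 1.9 cm threshold comes from the card.

```python
def range_rule_estimate(minimum, maximum):
    """Range rule of thumb: s is roughly (max - min) / 4."""
    return (maximum - minimum) / 4


# Hypothetical values in cm, not from the original problem.
estimate = range_rule_estimate(150.0, 190.0)  # 10.0
actual_s = 7.5

# Error is the absolute difference between the estimate and the actual s.
error = abs(estimate - actual_s)  # 2.5
is_accurate = error < 1.9         # threshold taken from the card

print(estimate, error, is_accurate)  # 10.0 2.5 False
```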
The waiting line represented by the bottom boxplot is better because the times have much less variation, so fewer customers have to wait a significantly longer time. (real answers: bottom; much less variation; fewer customers have to wait a significantly longer time) (the smallest boxplot is the answer & it comes with the same two other answers)
The boxplots shown below represent customer waiting times (minutes) for two different waiting lines. Which line would you prefer, or does it not make a difference? Explain. (#50)
The z scores are numbers without units of measurement.
The original pulse rates are measured with units of "beats per minute". What are the units of the corresponding z scores?
With a data set that is so small, the true nature of the distribution cannot be seen with a histogram.
The population of ages at inauguration of all U.S. Presidents who had professions in the military is 62, 46, 68, 64, 57. Why does it not make sense to construct a histogram for this data set?
0, 1, 2, 3, ...; is not; discrete
The random variable x represents the number of phone calls an author receives in a day, and it has a Poisson distribution with a mean of 6.2 calls. What are the possible values of x? Is a value of x=2.6 possible? Is x a discrete random variable or a continuous random variable? A value of x=2.6 _________ possible because x is a ___________ random variable.
The selections are dependent, because the selection is done without replacement. Yes, because the sample size is less than 5% of the population.
There are 15,958,866 adults in a region. If a polling organization randomly selects 1235 adults without replacement, are the selections independent or dependent? If the selections are dependent, can they be treated as independent for the purposes of calculations?
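The 5% guideline in the answer is a one-line check. A sketch with the numbers from the card, assuming the guideline is "sample size no more than 5% of the population":

```python
population = 15_958_866
sample = 1235

# Fraction of the population that is sampled.
fraction = sample / population
can_treat_as_independent = fraction <= 0.05

print(f"{fraction:.6%}")         # far below 5%
print(can_treat_as_independent)  # True
```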
sample space
The _______ for a procedure consists of all possible simple events or all outcomes that cannot be broken down any further.
The histogram appears to depict a normal distribution. The frequencies generally increase to a maximum and then decrease, and the histogram is roughly symmetric.
Use the frequency distribution to construct a histogram. Does the histogram appear to depict data that have a normal distribution? Why or why not? (#31)
No; the original population is normally distributed, so the sample means will be normally distributed for any sample size.
Weights of golden retriever dogs are normally distributed. Samples of weights of golden retriever dogs, each of size n=15, are randomly collected and the sample means are found. Is it correct to conclude that the sample means cannot be treated as being from a normal distribution because the sample size is too small? Explain.
Since the probability of each digit being selected is equal, lottery digits have a uniform distribution, not a normal distribution.
What's wrong with the following statement? "Because the digits 0, 1, 2, . . . , 9 are the normal results from lottery drawings, such randomly selected numbers have a normal distribution."
z-score
When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a _______.
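The conversion described on this card is z = (x − mean) / standard deviation. A minimal sketch with hypothetical numbers (a value of 70 from data with mean 60 and standard deviation 5):

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


print(z_score(70, 60, 5))  # 2.0  (two standard deviations above the mean)
print(z_score(55, 60, 5))  # -1.0 (below the mean, so the z-score is negative)
```

The units cancel in the division, which is why z-scores carry no units of measurement.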
probability of selecting an adult with blue eyes.; probability of selecting an adult who does not have blue eyes.
When randomly selecting an adult, A denotes the event of selecting someone with blue eyes. What do P(A) and P(Ā) represent? P(A) represents the: P(Ā) represents the:
The probability of getting a male, given that someone with blue eyes has been selected. No, because P(B|M) represents the probability of getting someone with blue eyes, given that a male has been selected.
When randomly selecting adults, let M denote the event of randomly selecting a male and let B denote the event of randomly selecting someone with blue eyes. What does P(M|B) represent? Is P(M|B) the same as P(B|M)?
the corresponding z-score is negative.
Whenever a data value is less than the mean, _______.
Variation
Which characteristic of data is a measure of the amount that the data values vary?
Range
Which measure of variation is most sensitive to extreme values?
Number of suitcases on a plane.
Which of the following consists of discrete data?
Quantitative
Which of the following is NOT a level of measurement?
Mean
Which of the following is NOT a value in the 5-number summary?
Data that were obtained from an entire population.
Which of the following is associated with a parameter?
√2, 5/3, −0.58, 1.23
Which of the following values cannot be probabilities? 1, √2, 0, 0.06, 1.23, 5/3, −0.58, 3/5
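A probability must lie between 0 and 1 inclusive, so the answer can be checked mechanically. A short sketch over the eight values on the card (the dictionary labels are just for display):

```python
from math import sqrt

candidates = {
    "1": 1,
    "sqrt(2)": sqrt(2),
    "0": 0,
    "0.06": 0.06,
    "1.23": 1.23,
    "5/3": 5 / 3,
    "-0.58": -0.58,
    "3/5": 3 / 5,
}

# Keep the labels whose values fall outside the interval [0, 1].
invalid = [name for name, value in candidates.items() if not 0 <= value <= 1]

print(invalid)  # ['sqrt(2)', '1.23', '5/3', '-0.58']
```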