Statistics Final Ch. 1-3
A _______ helps us understand the nature of the distribution of a data set.
Frequency Distribution
A(n) _______ uses line segments to connect points located directly above class midpoint values.
Frequency Polygon
Which of the following is NOT a value in the 5-number summary?
mean
When drawings of objects are used to depict data, false impressions can be made. These drawings are called _______.
pictographs
In modified boxplots, a data value is a(n) _______ if it is above Q3 + (1.5)(IQR) or below Q1 − (1.5)(IQR).
outlier
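The fence rule in this card can be checked with a short Python sketch. The data list is made up for illustration, and the quartiles come from the default method of `statistics.quantiles`, which may differ slightly from the quartile convention used in the textbook or on a calculator.

```python
import statistics

def find_outliers(values):
    """Return the values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical data set, not taken from the course materials.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 10, 30]
outliers = find_outliers(sample)
print(outliers)
```

In this made-up sample only the value 30 lies beyond the upper fence.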
A _______ is a plot of paired data (x,y) and is helpful in determining whether there is a relationship between the two variables.
scatterplot
The bars in a histogram _______.
touch
When calculating standard deviation
use the calculator and the handouts for ch. 3
The square of the standard deviation is called the _______.
variance (variance = (standard deviation)^2)
Is it possible to identify the exact values of all of the original service times?
No, the data values in each class could take on any value between the class limits, inclusive.
The population of ages at inauguration of all U.S. Presidents who had professions in the military is 62, 46, 68, 64, 57. Why does it not make sense to construct a histogram for this data set?
With a data set that is so small, the true nature of the distribution cannot be seen with a histogram.
Methods used that summarize or describe characteristics of data are called _______ statistics.
descriptive
The heights of the bars of a histogram correspond to _____________ values.
frequency
What is a scatterplot and how does it help us?
A scatterplot is a graph of paired (x, y) quantitative data. It provides a visual image of the data plotted as points, which helps show any patterns in the data.
A histogram aids in analyzing the _______ of the data.
The shape of the distribution
Whenever a data value is less than the mean, _______.
the corresponding z-score is negative
The measure of center that is the value that occurs with the greatest frequency is the _______.
mode
A frequency table of grades has five classes (A, B, C, D, F) with frequencies of 3, 10, 14, 8, and 2 respectively. What are the relative frequencies of the five classes?
.08 .27 .38 .22 .05
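As a quick check of the arithmetic, each relative frequency is the class frequency divided by the total of all frequencies (37 here). A minimal Python sketch, with variable names chosen only for illustration:

```python
# Class labels and frequencies from the grade table in the question.
grades = ["A", "B", "C", "D", "F"]
frequencies = [3, 10, 14, 8, 2]

total = sum(frequencies)  # 37 grades in all

# Relative frequency = class frequency / total, rounded to two decimals.
relative = [round(f / total, 2) for f in frequencies]

for grade, rel in zip(grades, relative):
    print(grade, rel)
```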
Which of the following is always true?
A. In a symmetric and bell-shaped distribution, the mean, median, and mode are the same. <-- Correct B. The mean and median should be used to identify the shape of the distribution. C. For skewed data, the mode is farther out in the longer tail than the median. D. Data skewed to the right have a longer left tail than right tail.
Which sampling method divides the population up into sections, randomly selects some of those sections, then chooses all the members from the selected sections to study?
Cluster
A magazine published a list consisting of the state tax on each gallon of gas. If we add the 50 state tax amounts and then divide by 50, we get 27.3 cents. Is the value of 27.3 cents the mean amount of state sales tax paid by all U.S. drivers? Why or why not?
No, the value of 27.3 cents is not the mean because the 50 amounts are all weighted equally in the calculation, but some states consume more gas than others, so the mean amount of state sales tax should be calculated using a weighted mean.
Which measure of variation is very sensitive to extreme values?
Range
For data sets having a distribution that is approximately bell-shaped, _______ states that about 68% of all data values fall within one standard deviation from the mean.
The empirical rule
In this section we use r to denote the value of the linear correlation coefficient. Why do we refer to this correlation coefficient as being linear?
The term linear refers to a straight line, and r measures how well a scatterplot fits a straight-line pattern.
When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a _______.
Z-score
A _______ is a graph of each data value plotted as a point.
dot plot
We utilize statistical _______ to look for features that reveal some useful or interesting characteristics of the data set.
graphs
Which of the following is NOT a characteristic of the mean?
A. The mean is sensitive to outliers. B. The mean is relatively reliable. C. The mean takes every data value into account. D. The mean is called the average by statisticians. <-- Correct answer
Identify the symbols used for each of the following: (a) sample standard deviation; (b) population standard deviation; (c) sample variance; (d) population variance.
a. The symbol for sample standard deviation is s. b. The symbol for population standard deviation is σ. c. The symbol for sample variance is s^2. d. The symbol for population variance is σ^2.
Which of the following is NOT true about statistical graphs?
A. They utilize areas or volumes for data that are one-dimensional in nature.<-- Correct answer B. They can be used to identify extreme data values. C. Similar graphs can be constructed in order to compare data sets. D. They can be used to consider the overall shape of the distribution.
Identify which type of sampling is used: random, systematic, convenience, stratified, or cluster. To determine customer opinion of their check-in service, American Airlines randomly selects 60 flights during a certain week and surveys all passengers on the flight.
Cluster
A study of an association between which ear is used for cell phone calls and whether the subject is left-handed or right-handed began with a survey e-mailed to 5000 people belonging to an otology online group, and 717 surveys were returned. (Otology relates to the ear and hearing.) What percentage of the 5000 surveys were returned? Does that response rate appear to be low? In general, what is a problem with a very low response rate?
About 14% of the surveys were returned (717/5000 ≈ 14.3%). That response rate appears to be low. A very low response rate creates a serious potential for getting a biased sample that consists of those with a special interest in the topic.
Explain the difference between a single-blind and a double-blind experiment.
In a single-blind experiment, the subject does not know which treatment is received. In a double-blind experiment, neither the subject nor the researcher in contact with the subject knows which treatment is received.
Refer to the accompanying data set and use the 30 screw lengths to construct a frequency distribution. Begin with a lower class limit of 0.720 in., and use a class width of 0.010 in. The screws were labeled as having a length of 3/4 in.
Length (in.): Frequency. 0.720-0.729: 2; 0.730-0.739: 3; 0.740-0.749: 11; 0.750-0.759: 11; 0.760-0.769: 3
Identify which of these designs is most appropriate for the given experiment: completely randomized design, randomized block design, or matched pairs design. A drug is designed to treat insomnia. In a clinical trial of the drug, amounts of sleep each night are measured before and after subjects have been treated with the drug.
Matched pairs design
Refer to the table summarizing service times (seconds) of dinners at a fast food restaurant. How many individuals are included in the summary? Is it possible to identify the exact values of all of the original service times?
No. The data values in each class could take on any value between the class limits, inclusive.
If we find that there is a linear correlation between the concentration of carbon dioxide in our atmosphere and the global temperature, does that indicate that changes in the concentration of carbon dioxide cause changes in the global temperature?
No. The presence of a linear correlation between two variables does not imply that one of the variables is the cause of the other variable.
Suppose that you need to create a list of n values that have a specific known mean. Some of the n values can be freely selected. How many of the n values can be freely assigned before the remaining values are determined? (The result is referred to as the number of degrees of freedom.)
Of the n values, n−1 can be freely selected because the remaining value(s) can be expressed in terms of the assigned values and the known mean.
_______ are sample values that lie very far away from the majority of the other sample values.
Outliers
Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. A man experienced a tax audit. The tax department claimed that the man was audited because he was randomly selected from all the taxpayers.
Random
Identify which of these types of sampling is used: random, systematic, convenience, stratified, or cluster. A large company wants to administer a satisfaction survey to its current customers. Using their customer database, the company randomly selects 60 customers and asks them about their level of satisfaction with the company.
Random
Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. A woman is selected by a marketing company to participate in a paid focus group. The company says that the woman was selected because she was randomly chosen from all adults.
Random Sampling
______ is used when subjects are assigned to different groups through a process of random selection.
Randomization
Below are the jersey numbers of 11 players randomly selected from a football team. Find the range, variance, and standard deviation for the given sample data. What do the results tell us? 26, 49, 12, 77, 55, 59, 40, 92, 70, 99, 27
Range = 87. Sample standard deviation = 27.9. Sample variance = 776.5 (squaring the rounded standard deviation 27.9 gives 778.4). Jersey numbers are nominal data that are just replacements for names, so the resulting statistics are meaningless.
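The three statistics can be reproduced with Python's standard `statistics` module. This is only a sketch of the arithmetic: `variance` and `stdev` use the sample formulas (division by n − 1), and the variable names are chosen for illustration.

```python
import statistics

# Jersey numbers listed in the question (n = 11).
jerseys = [26, 49, 12, 77, 55, 59, 40, 92, 70, 99, 27]

data_range = max(jerseys) - min(jerseys)         # highest value minus lowest value
sample_variance = statistics.variance(jerseys)   # sample variance, divides by n - 1
sample_sd = statistics.stdev(jerseys)            # square root of the sample variance

print(data_range, round(sample_variance, 1), round(sample_sd, 1))
```

The unrounded variance rounds to 776.5. A figure of 778.4 comes from squaring the already rounded standard deviation, 27.9. Either way the numbers carry no meaning, because jersey numbers are nominal data.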
In a _______ distribution, the frequency of a class is replaced with a proportion or percent.
relative frequency
Which of the following corresponds to the case when every sample of size n has the same chance of being chosen?
Simple Random Sample
Identify which of these types of sampling is used: random, systematic, convenience, stratified, or cluster. To determine her breathing rate, Carrie divides up her day into three parts: morning, afternoon, and evening. She then measures her breathing rate at 4 randomly selected times during each part of the day.
Stratified
Which sampling method subdivides the population into categories sharing similar characteristics and then selects a sample from each subdivision?
Stratified
Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. A researcher selects every 221st social security number and surveys the corresponding person.
Systematic
In a study designed to test the effectiveness of a medication as a treatment for lower back pain, 1643 patients were randomly assigned to one of three groups: (1) the 547 subjects in the placebo group were given pills containing no medication; (2) 550 subjects were in a group given pills with the medication taken at regular intervals; (3) 546 subjects were in a group given pills with the medication to be taken when needed for pain relief. In what specific way was replication applied in the study?
The group sample sizes are all large so the researchers could see the effects of the treatment.
Heights of statistics students were obtained by a teacher as part of an experiment conducted for the class. The last digits of those heights are listed below. Construct a frequency distribution with 10 classes. Based on the distribution, do the heights appear to be reported or actually measured? What can be said about the accuracy of the results?
The heights appear to be reported because there are disproportionately more 0s and 5s. They are likely not very accurate because they appear to be reported.
The table shows the magnitudes of the earthquakes that have occurred in the past 10 years. Use the frequency distribution to construct a histogram. Does the histogram appear to be skewed? If so, identify the type of skewness.
The histogram has a longer right tail, so the distribution of the data is skewed to the right.
Listed below are the jersey numbers of 11 players randomly selected from the roster of a championship sports team. What do the results tell us?
The jersey numbers are nominal data and they do not measure or count anything, so the resulting statistics are meaningless.
One common system for computing a grade point average (GPA) assigns 4 points to an A, 3 points to a B, 2 points to a C, 1 point to a D, and 0 points to an F. What is the GPA of a student who gets an A in a 3-credit course, a B in each of two 2-credit courses, a C in a 3-credit course, and a D in a 2-credit course?
The grade point average is 2.7 (a weighted mean: 32 grade points / 12 credits ≈ 2.67).
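The GPA is a weighted mean in which each grade's points are weighted by the course credits. A minimal Python sketch, assuming the A was earned in a 3-credit course (the reading consistent with the stated answer of 2.7):

```python
# (grade points, credits) for each course; the 3-credit A is an assumption
# based on the stated answer of 2.7.
courses = [
    (4, 3),  # A in a 3-credit course
    (3, 2),  # B in a 2-credit course
    (3, 2),  # B in a second 2-credit course
    (2, 3),  # C in a 3-credit course
    (1, 2),  # D in a 2-credit course
]

total_points = sum(points * credits for points, credits in courses)
total_credits = sum(credits for _points, credits in courses)

# Weighted mean = sum(w * x) / sum(w)
gpa = total_points / total_credits
print(round(gpa, 1))
```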
If we collect a large sample of blood platelet counts and if our sample includes a single outlier, how will that outlier appear in a histogram?
The outlier will appear as a bar far from all of the other bars with a height that corresponds to a frequency of 1.
If your score on your next statistics test is converted to a z score, which of these z scores would you prefer: −2.00, −1.00, 0, 1.00, 2.00? Why?
The z score of 2.00 is most preferable because it is 2.00 standard deviations above the mean and would correspond to the highest of the five different possible test scores.
For a data set of brain volumes (cm^3) and IQ scores of four males, the linear correlation coefficient is found and the P-value is 0.336. Write a statement that interprets the P-value and includes a conclusion about linear correlation.
The P-value indicates that the probability of a linear correlation coefficient that is at least as extreme is 33.6%, which is high, so there is not sufficient evidence to conclude that there is a linear correlation between brain volume and IQ score in males.
Which of the following is a common distortion that occurs in graphs?
Using a two-dimensional object to represent data that are one-dimensional in nature
Which characteristic of data is a measure of the amount that the data values vary?
Variation
Which of the following is NOT a property of the standard deviation?
When comparing variation in samples with very different means, it is good practice to compare the two sample standard deviations.
In a graph, if one or both axes begin at some value other than zero, the differences are exaggerated. This bad graphing method is known as _______.
a non-zero axis
a. A statistics class with 36 students is arranged so that there are 6 rows with 6 students in each row, and the rows are numbered from 1 through 6. A die is rolled and a sample consists of all students in the row corresponding to the outcome of the die. b. For the same class described in part (a), the 36 student names are written on 36 individual index cards. The cards are shuffled and six names are drawn from the top. c. For the same class described in part (a), the six youngest students are selected.
a. This sample is not a simple random sample. It is a random sample. b. This sample is a simple random sample. It is a random sample. c. This sample is not a simple random sample. It is not a random sample.
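The difference between parts (a) and (b) shows up when the possible samples are counted. In the sketch below, students are represented as hypothetical (row, seat) pairs. Rolling a die gives every student a 1-in-6 chance of selection, so it is a random sample, but only 6 different samples of size 6 can ever occur. Drawing 6 shuffled cards makes every group of 6 students equally likely, which is what a simple random sample requires.

```python
import math
import random

# Hypothetical representation of the class: 6 rows with 6 seats each.
students = [(row, seat) for row in range(1, 7) for seat in range(1, 7)]

# Part (a): roll a die and take the whole row.
die = random.randint(1, 6)
row_sample = [s for s in students if s[0] == die]

# Part (b): draw 6 names from 36 shuffled cards.
srs = random.sample(students, 6)

# Number of distinct samples of size 6 each method can produce.
possible_row_samples = 6
possible_srs_samples = math.comb(36, 6)

print(possible_row_samples, possible_srs_samples)
```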
Which of the following is NOT a measure of center?
census
A _______ histogram has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.
relative frequency
The Range Rule of Thumb roughly estimates the standard deviation of a data set as _______.
s ≈ range/4
A data value is considered _______ if its z-score is less than −2 or greater than 2.
significantly low or significantly high
Class width is found by _______.
subtracting a lower class limit from the next consecutive lower class limit
A study is conducted to measure children's growth rates without any treatment applied to the children. What best classifies this study?
Observational
Find the mean of the data summarized in the given frequency distribution. Compare the computed mean to the actual mean of 51.1 miles per hour.
The computed mean is not close to the actual mean because the difference between the means is more than 5%.
Are the data reported or measured?
The data appears to be measured. The heights occur with roughly the same frequency or The data appears to be reported. Certain heights occur a disproportionate number of times.
For a data set of weights (pounds) and highway fuel consumption amounts (mpg) of six types of automobile, the linear correlation coefficient is found and the P-value is 0.025. Write a statement that interprets the P-value and includes a conclusion about linear correlation.
The P-value indicates that the probability of a linear correlation coefficient that is at least as extreme is 2.5%, which is low, so there is sufficient evidence to conclude that there is a linear correlation between weight and highway fuel consumption in automobiles.
A z score (or standard score or standardized value) is the number of standard deviations, s or σ, that a given value x is above or below the mean x̄ or μ. The z score is calculated by using one of the equations shown below.
Sample: z = (x − x̄)/s. Population: z = (x − μ)/σ.
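A z score can be computed directly from its definition: subtract the mean from the value and divide by the standard deviation. The Python sketch below also applies the cutoffs of −2 and 2 used elsewhere in these notes. The function names and the example numbers are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def classify(z):
    """Label a z score using the -2 and 2 cutoffs."""
    if z < -2:
        return "significantly low"
    if z > 2:
        return "significantly high"
    return "not significant"

# Hypothetical test score of 68 when the mean is 80 and the standard deviation is 5.
z = z_score(68, 80, 5)
print(z, classify(z))
```

Because 68 is below the mean, the z score is negative. It is also below −2, so the value counts as significantly low.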
A value at the center or middle of a data set is a(n) _______.
measure of center
P-values
Only a small P-value, such as 0.05 or less (a 5% chance or less), suggests that the sample results are not likely to occur by chance when there is no linear correlation, so a small P-value supports a conclusion that there is a linear correlation between the two variables.