Chapter 2: Descriptive Statistics
*symmetric* frequency distribution
When a vertical line can be drawn through the middle of a graph of the distribution and the resulting halves are approximately mirror images. (p. 73)
*uniform* (or *rectangular*) frequency distribution
When all entries, or classes, in the distribution have equal or approximately equal frequencies. A uniform distribution is also symmetric. (p. 73)
*skewed* frequency distribution
When the "tail" of the graph elongates more to one side than to the other. A distribution is *skewed left (negatively skewed)* when its tail extends to the left. A distribution is *skewed right (positively skewed)* when its tail extends to the right. (p. 73)
*(Y.T.I.) Exercise 6 (p. 88):* Use the Empirical Rule. The mean speed of a sample of vehicles along a stretch of highway is 65 miles per hour, with a standard deviation of 5 miles per hour. Estimate the percent of vehicles whose speeds are between 55 miles per hour and 75 miles per hour. (Assume the data set has a bell-shaped distribution.) Approximately *__%* of vehicles travel between 55 miles per hour and 75 miles per hour.
Correct Answer: *95%*
range
The difference between the maximum and minimum data entries. (p. 40)
class width
The distance between lower (or upper) limits of consecutive classes. (p. 40)
*Construct an Ogive (Cumulative Frequency Graph)*
1.) Construct a frequency distribution that includes cumulative frequencies as one of the columns. 2.) Specify the horizontal and vertical scales. The horizontal scale consists of upper class boundaries, and the vertical scale measures cumulative frequencies. 3.) Plot points that represent the upper class boundaries and their corresponding cumulative frequencies. 4.) Connect the points in order from left to right with line segments. 5.) The graph should start at the lower boundary of the first class (cumulative frequency is 0) and should end at the upper boundary of the last class (cumulative frequency is equal to the sample size).
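The steps above can be sketched in Python. This is a minimal illustration, not the textbook's method: it uses the class limits and frequencies from the high-temperature exercise (Exercise 2, p. 43), it assumes integer class limits so that each boundary sits 0.5 beyond the limit, and the name `ogive_points` is made up for this sketch.

```python
from itertools import accumulate

def ogive_points(lower_limits, upper_limits, frequencies):
    """Return (boundary, cumulative frequency) points for an ogive.

    Assumes integer class limits, so each class boundary sits 0.5
    beyond the corresponding class limit.
    """
    cumulative = list(accumulate(frequencies))
    # Starting point: lower boundary of the first class, cumulative frequency 0.
    points = [(lower_limits[0] - 0.5, 0)]
    # One point per class at its upper class boundary.
    points += [(u + 0.5, c) for u, c in zip(upper_limits, cumulative)]
    return points

lowers = [20, 31, 42, 53, 64, 75, 86]
uppers = [30, 41, 52, 63, 74, 85, 96]
freqs = [18, 44, 66, 67, 79, 66, 25]

points = ogive_points(lowers, uppers, freqs)
for boundary, cum_freq in points:
    print(boundary, cum_freq)
```

The cumulative frequency of the last point should equal the sample size, which is a quick check on step 5. Connecting the points from left to right with line segments gives the ogive.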
*Finding the Sample Variance and Standard Deviation* (p. 85)
1.) Find the mean of the sample data set. *x_* = (∑*x*)/*n* 2.) Find the deviation of each entry. *x* - *x_* 3.) Square each deviation. (*x* - *x_*)² 4.) Add to get the sum of squares. *SS_x* = ∑(*x* - *x_*)² 5.) Divide by *n* - 1 to get the sample variance. *s²* = (∑(*x* - *x_*)²)/(*n* - 1) 6.) Find the square root of the variance to get the sample standard deviation. *s* = √((∑(*x* - *x_*)²)/(*n* - 1))
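As a rough sketch, the six steps translate directly into Python. The data are the artifact depths from Exercise 1 (p. 82); the function name `sample_variance_and_sd` is only illustrative, and the standard library's `statistics.variance` and `statistics.stdev` can serve as a cross-check.

```python
import math
import statistics

def sample_variance_and_sd(data):
    """Follow the six steps: mean, deviations, squares, sum, divide by n - 1, root."""
    n = len(data)
    mean = sum(data) / n                      # step 1
    deviations = [x - mean for x in data]     # step 2
    squares = [d ** 2 for d in deviations]    # step 3
    sum_of_squares = sum(squares)             # step 4
    variance = sum_of_squares / (n - 1)       # step 5
    return variance, math.sqrt(variance)      # step 6

depths = [22.9, 36.6, 34.6, 31.3, 46.6, 32.7, 28.3, 24.6, 36.4, 28.2]
s2, s = sample_variance_and_sd(depths)
print(round(s2, 1), round(s, 1))

# Cross-check against the standard library (both use the n - 1 divisor).
print(round(statistics.variance(depths), 1), round(statistics.stdev(depths), 1))
```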
five number summary
1.) The minimum entry 2.) The first quartile (*Q₁*) 3.) The median (*Q₂*) 4.) The third quartile (*Q₃*) 5.) The maximum entry (p. 105)
interval
A continuum of numeric values with equal intervals that lacks an absolute zero. (p. 40)
outlier
A data entry that is far removed from the other entries in the data set. (p. 70)
cumulative frequency graph
A line graph that displays the cumulative frequency of each class at its upper class boundary. The upper boundaries are marked on the horizontal axis, and the cumulative frequencies are marked on the vertical axis. Also called *ogive* (pronounced ō'jīve). (p. 46)
frequency distribution
A table that shows *classes* or *intervals* of data entries with a count of the number of entries in each class. The *frequency, ƒ* of a class is the number of data entries in the class. (p. 40)
measures of central tendency
A measure of central tendency is a value that represents a typical, or central, entry of a data set. (p. 67)
box-and-whisker plot (or boxplot)
An exploratory data analysis tool that highlights the important features of a data set. (p. 104)
*(Y.T.I.) Exercise 8 (p. 72):* Approximate the mean of the frequency distribution for the ages of the residents of a town (round answer to *one* decimal place). *Age* = *Frequency* *0-9* = *26* *10-19* = *40* *20-29* = *19* *30-39* = *29* *40-49* = *28* *50-59* = *46* *60-69* = *30* *70-79* = *17* *80-89* = *3* The approximate mean age is *___* years.
Correct Answer: *39.4*
coefficient of variation (*CV*)
Describes the standard deviation as a percent of the mean. Population: *CV* = (*σ*/*µ*)×100% Sample: *CV* = (*s*/*x_*)×100% (p. 92)
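A small Python sketch of the formula follows. The summary statistics are hypothetical, chosen only to show that the CV lets two data sets with different units or scales be compared; the function name is likewise an assumption.

```python
def coefficient_of_variation(sd, mean):
    """Return the standard deviation as a percent of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return 100 * sd / mean

# Hypothetical summary statistics for two samples measured on different scales.
cv_first = coefficient_of_variation(sd=5, mean=50)
cv_second = coefficient_of_variation(sd=30, mean=150)
print(cv_first, cv_second)  # 10.0 20.0
```

Here the second sample is relatively more variable even though the comparison of raw standard deviations alone would not account for the difference in means.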
Section 3
Measures of Central Tendency
Section 5
Measures of Position
mode
The data entry that occurs with the greatest frequency. A data set can have one mode, more than one mode, or no mode. When no entry is repeated, the data set has no mode. When two entries occur with the same greatest frequency, each entry is a mode and the data set is called *bimodal*. (p. 69)
upper class limit
The greatest number that can belong to the class. (p. 40)
lower class limit
The least number that can belong to the class. (p. 40)
population variance
The population variance of a population data set of *N* entries is Population variance = *σ²* = (∑(*x* - *µ*)²)/*N*. The symbol *σ* is the lowercase Greek letter sigma. (p. 83)
range
The range of a data set is the difference between the maximum and minimum data entries in the set. To find the range, the data must be quantitative. Range = (Maximum data entry) - (Minimum data entry) (p. 82)
The Three *Quartiles*
The three *quartiles*, *Q₁*, *Q₂*, and *Q₃*, divide an ordered data set into four equal parts. About one-quarter of the data fall on or below the *first quartile* (*Q₁*). About one-half of the data fall on or below the *second quartile* (*Q₂*) (the second quartile is the same as the median of the data set). About three-quarters of the data fall on or below the *third quartile* (*Q₃*). (p. 102)
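A minimal Python sketch of one common convention for finding the quartiles: take the median of the whole ordered set as *Q₂*, then take the medians of the lower and upper halves (leaving the median itself out of both halves when the number of entries is odd). This convention is an assumption here; calculators and software often use other interpolation rules and may give slightly different values. The data are the credit counts from Exercise 1 (p. 67).

```python
import statistics

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # Skip the middle entry when n is odd so it belongs to neither half.
    upper_half = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (statistics.median(lower_half),
            statistics.median(ordered),
            statistics.median(upper_half))

credits = [7, 9, 10, 10, 7, 6, 6, 6, 8, 6, 6, 6, 7]
q1, q2, q3 = quartiles(credits)
print(q1, q2, q3)  # 6.0 7 8.5
```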
bimodal
When two entries occur with the same greatest frequency. (p. 69)
relative frequency
The portion, or percentage, of the data that falls in a class. To find the *relative frequency* of a class, divide the frequency, *ƒ*, by the sample size, *n*. (p. 42) ~ formula: relative frequency = *class frequency/sample size* = *ƒ/n* ~ Note that *n* = *∑ƒ*.
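As a quick illustration, the formula can be applied to the class frequencies from the high-temperature exercise (Exercise 2, p. 43). This is only a sketch; rounding to two decimal places follows that exercise's instructions.

```python
freqs = [18, 44, 66, 67, 79, 66, 25]

n = sum(freqs)                                  # n is the sum of the frequencies
relative = [round(f / n, 2) for f in freqs]     # f / n for each class

print(n)
print(relative)
```

Before rounding, the relative frequencies always sum to 1, which is a useful check on the arithmetic.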
mean of a frequency distribution
For a sample, it is approximated by *x_* = (∑(*x*)(*ƒ*))/*n*, where *x* and *ƒ* are the midpoint and frequency of each class, respectively. Note that *n* = ∑*ƒ*. (p. 72)
Empirical Rule (or *68-95-99.7* Rule)
For data sets with distributions that are approximately symmetric and bell-shaped (see figure above), the standard deviation has these characteristics. 1.) About 68% of the data lie within one standard deviation of the mean. 2.) About 95% of the data lie within two standard deviations of the mean. 3.) About 99.7% of the data lie within three standard deviations of the mean. (p. 88)
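A short Python sketch of the rule, using the mean (65 mph) and standard deviation (5 mph) from the vehicle-speed exercise (Exercise 6, p. 88). The function name is illustrative, and the percentages are only approximations that hold for roughly bell-shaped data.

```python
def empirical_rule_intervals(mean, sd):
    """Map each approximate percent to its (low, high) interval around the mean."""
    percents = {1: 68.0, 2: 95.0, 3: 99.7}   # k standard deviations -> approx. percent
    return {pct: (mean - k * sd, mean + k * sd) for k, pct in percents.items()}

intervals = empirical_rule_intervals(mean=65, sd=5)
for pct, (low, high) in intervals.items():
    print(f"about {pct}% of speeds lie between {low} and {high} mph")
```

The interval from 55 to 75 mph is two standard deviations on either side of the mean, so it corresponds to about 95% of the data.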
Section 1
Frequency Distributions & Their Graphs
deviation
In a population data set, the difference between the entry *x* and the mean *µ* of the data set. Deviation of *x* = *x* - *µ* (p. 83)
classes
Intervals for a frequency distribution of quantitative data. (p. 40)
Section 4
Measures of Variation
Fractiles
Numbers that partition, or divide, an ordered data set into equal parts (each part has the same number of data entries). (p. 102)
Interquartile Range (IQR)
The *interquartile range (IQR)* of a data set is a measure of variation that gives the range of the middle portion (about half) of the data. The IQR is the difference between the third and first quartiles. IQR= *Q₃* - *Q₁* (p. 104)
*Constructing a Frequency Distribution from a Data Set* (p. 40)
1.) Decide on the number of classes to include in the frequency distribution. The number of classes should be between 5 and 20; otherwise, it may be difficult to detect any patterns. 2.) Find the class width as follows. Determine the range of the data, divide the range by the number of classes, and *round up to the next convenient number*. 3.) Find the class limits. You can use the minimum data entry as the lower limit of the first class. To find the remaining lower limits, add the class width to the lower limit of the preceding class. Then find the upper limit of the first class. Remember that classes cannot overlap. Find the remaining upper class limits. 4.) Make a tally mark for each data entry in the row of the appropriate class. 5.) Count the tally marks to find the total frequency *ƒ* for each class.
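The procedure can be sketched in Python for integer data. Two assumptions are made here: "round up to the next convenient number" is read as "round up to the next whole number" (moving up by one when the quotient is already whole, so the maximum entry always fits), and upper limits sit one unit below the next lower limit. The data are the reaction times from Exercise 3 (p. 44), and the function name is illustrative.

```python
import math

def frequency_distribution(data, num_classes):
    """Return a list of (lower limit, upper limit, frequency) for integer data."""
    lowest, highest = min(data), max(data)
    # Step 2: class width = range / number of classes, rounded up.
    width = math.floor((highest - lowest) / num_classes) + 1
    table = []
    for i in range(num_classes):
        lower = lowest + i * width          # step 3: lower limits
        upper = lower + width - 1           # classes must not overlap
        frequency = sum(lower <= x <= upper for x in data)   # steps 4 and 5
        table.append((lower, upper, frequency))
    return table

times = [430, 291, 380, 336, 514, 422, 388, 426, 375, 313,
         443, 390, 353, 472, 388, 413, 441, 425, 304, 455,
         305, 307, 327, 414, 450, 388, 322, 361, 507, 415]

table = frequency_distribution(times, num_classes=8)
for lower, upper, frequency in table:
    print(f"{lower}-{upper}: {frequency}")
```

The frequencies should add up to the number of data entries (30 here), which confirms that every entry landed in exactly one class.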
*Using the Interquartile Range to Identify Outliers*
1.) Find the first (*Q₁*) and third (*Q₃*) quartiles of the data set. 2.) Find the interquartile range: IQR = *Q₃* - *Q₁* 3.) Multiply IQR by 1.5: 1.5×(IQR). 4.) Subtract 1.5×(IQR) from *Q₁*. Any data entry less than *Q₁* - 1.5×(IQR) is an outlier. 5.) Add 1.5×(IQR) to *Q₃*. Any data entry greater than *Q₃* + 1.5×(IQR) is an outlier.
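A minimal Python sketch of steps 2 to 5. The data set is hypothetical, and the quartiles are passed in rather than computed, because different quartile conventions give slightly different values; here *Q₁* = 10.5 and *Q₃* = 15.5 come from taking the medians of the lower and upper halves of the ordered data.

```python
def iqr_outliers(data, q1, q3):
    """Return the entries that fall outside the 1.5 x IQR fences."""
    iqr = q3 - q1                       # step 2
    margin = 1.5 * iqr                  # step 3
    lower_fence = q1 - margin           # step 4
    upper_fence = q3 + margin           # step 5
    return [x for x in data if x < lower_fence or x > upper_fence]

# Hypothetical ordered data set with one low and one high extreme entry.
data = [1, 10, 11, 12, 13, 14, 15, 16, 30]
outliers = iqr_outliers(data, q1=10.5, q3=15.5)
print(outliers)  # [1, 30]
```

With IQR = 5, the fences are 3 and 23, so only the entries 1 and 30 are flagged.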
*Drawing a Box-and-Whisker Plot*
1.) Find the five-number summary of the data set. 2.) Construct a horizontal scale that spans the range of the data. 3.) Plot the five numbers above the horizontal scale. 4.) Draw a box above the horizontal scale from *Q₁* to *Q₃* and draw a vertical line in the box at *Q₂*. 5.) Draw whiskers from the box to the minimum and maximum entries.
*Finding the *Population* Variance and Standard Deviation*
1.) Find the mean of the population data set. *µ* = (∑*x*)/*N* 2.) Find the deviation of each entry. *x* - *µ* 3.) Square each deviation. (*x* - *µ*)² 4.) Add to get the sum of squares. *SS_x* = ∑(*x* - *µ*)² 5.) Divide by *N* to get the population variance. *σ²* = (∑(*x* - *µ*)²)/*N* 6.) Find the square root of the variance to get the population standard deviation. *σ* = √((∑(*x* - *µ*)²)/*N*)
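The same steps, sketched in Python for a small hypothetical population chosen so the arithmetic comes out evenly. The only difference from the sample version is the divisor: *N* instead of *n* - 1. The function name is an assumption for this sketch.

```python
import math

def population_variance_and_sd(data):
    """Follow the six steps, dividing the sum of squares by N."""
    N = len(data)
    mu = sum(data) / N                                  # step 1
    sum_of_squares = sum((x - mu) ** 2 for x in data)   # steps 2 to 4
    variance = sum_of_squares / N                       # step 5
    return variance, math.sqrt(variance)                # step 6

# Hypothetical population of eight entries (mean 5, sum of squares 32).
population = [2, 4, 4, 4, 5, 5, 7, 9]
sigma2, sigma = population_variance_and_sd(population)
print(sigma2, sigma)  # 4.0 2.0
```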
*Finding the Mean of a Frequency Distribution* (p. 72)
1.) Find the midpoint of each class. *x* = (Lower limit + Upper limit)/2 2.) Find the sum of the products of the midpoints and the frequencies. ∑(*x*)(*ƒ*) 3.) Find the sum of the frequencies. *n* = ∑(*ƒ*) 4.) Find the mean of the frequency distribution. *x_* = (∑(*x*)(*ƒ*))/*n*
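These four steps can be sketched in Python using the age distribution from Exercise 8 (p. 72). The function name `grouped_mean` is illustrative; the result is an approximation because every entry in a class is treated as if it sat at the class midpoint.

```python
def grouped_mean(classes):
    """classes: list of (lower limit, upper limit, frequency) tuples."""
    n = sum(f for _, _, f in classes)                         # step 3
    # Steps 1 and 2: midpoint times frequency, summed over all classes.
    total = sum(((lower + upper) / 2) * f for lower, upper, f in classes)
    return total / n                                          # step 4

ages = [(0, 9, 26), (10, 19, 40), (20, 29, 19), (30, 39, 29), (40, 49, 28),
        (50, 59, 46), (60, 69, 30), (70, 79, 17), (80, 89, 3)]

mean_age = grouped_mean(ages)
print(round(mean_age, 1))  # 39.4
```

Here ∑(*x*)(*ƒ*) = 9371 and *n* = 238, so the approximate mean is 9371/238 ≈ 39.4 years.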
frequency polygon
A line graph that emphasizes the continuous change in frequencies. (p. 45)
standard score (or z-score)
The *standard score*, or *z-score*, represents the number of standard deviations a value *x* lies from the mean *µ*. To find the *z*-score for a value, use the formula *z* = (Value - Mean)/(Standard Deviation) = (*x* - *µ*)/*σ*. (p. 107)
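A brief Python sketch of the formula. For illustration it reuses the mean (65 mph) and standard deviation (5 mph) from the vehicle-speed exercise (Exercise 6, p. 88), treating them as *µ* and *σ*, which is an assumption since that exercise describes a sample.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

print(z_score(75, 65, 5))   # 2.0
print(z_score(55, 65, 5))   # -2.0
print(z_score(65, 65, 5))   # 0.0
```

A positive *z*-score means the value is above the mean, a negative one means it is below, and 0 means it equals the mean.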
*(Y.T.I.) Exercise 2 (p. 43):* Use the frequency distribution shown below to construct an expanded frequency distribution. *High Temperatures (°F)* *Class* *(1):* 20-30 *(2):* 31-41 *(3):* 42-52 *(4):* 53-63 *(5):* 64-74 *(6):* 75-85 *(7):* 86-96 *Frequency, ƒ* *(1):* 18 *(2):* 44 *(3):* 66 *(4):* 67 *(5):* 79 *(6):* 66 *(7):* 25 Complete the table below. (Round answers to the *nearest hundredth* (*two* decimal places).) *High Temperatures (°F)* Class (I) Frequency, ƒ (II) *Midpoint (III)* *Relative frequency (IV)* *Cumulative frequency (V)* *(1):* *I:* 20-30 *II:* 18 *III:* *__* *IV:* *___* *V:* *__* *(2):* *I:* 31-41 *II:* 44 *III:* *__* *IV:* *___* *V:* *__* *(3):* *I:* 42-52 *II:* 66 *III:* *__* *IV:* *___* *V:* *___* *(4):* *I:* 53-63 *II:* 67 *III:* *__* *IV:* *___* *V:* *___* *(5):* *I:* 64-74 *II:* 79 *III:* *__* *IV:* *___* *V:* *___* *(6):* *I:* 75-85 *II:* 66 *III:* *__* *IV:* *___* *V:* *___* *(7):* *I:* 86-96 *II:* 25 *III:* *__* *IV:* *___* *V:* *___*
Correct Answers (*Midpoint*; *"Relative" Frequency*; *"Cumulative" Frequency*): *(1):* *25*; *0.05*; *18* *(2):* *36*; *0.12*; *62* *(3):* *47*; *0.18*; *128* *(4):* *58*; *0.18*; *195* *(5):* *69*; *0.22*; *274* *(6):* *80*; *0.18*; *340* *(7):* *91*; *0.07*; *365*
*(Y.T.I.) Exercise 1 (p. 82):* The depths (in inches) at which 10 artifacts are found are listed. Complete parts 1 (a) and 2 (b) below (round both answers to the *nearest tenth* (*one* decimal place).) *22.9 36.6 34.6 31.3 46.6 32.7 28.3 24.6 36.4 28.2* *Part 1 (a):* Find the range of the data set. Range = *___* *Part 2 (b):* Change 46.6 to 50.7 and find the range of the new data set. Range = *___*
Correct Answers: *Part 1 (a):* *23.7* *Part 2 (b):* *27.8*
*(Y.T.I.) Exercise 5 (p. 87):* Compare the three data sets below. (Since I don't have Quizlet+, I can't insert the images of the actual dot plots; ergo, I pasted their descriptions.) *(i):* A dot plot has a horizontal axis labeled from *4 to 10* in *increments of "1"*. The graph consists of a series of plotted points from left to right. The coordinates of the plotted points are as follows, where the *"label"* is listed *first*, and the *number of dots* is listed *second*: *(5, 2); (6, 3); (7, 5); (8, 3); (9, 2)*. *(ii):* A dot plot has a horizontal axis labeled from *4 to 10* in *increments of "1"*. The graph consists of a series of plotted points from left to right. The coordinates of the plotted points are as follows, where the *"label"* is listed *first*, and the *number of dots* is listed *second*: *(6, 4); (7, 7); (8, 4)*. *(iii):* A dot plot has a horizontal axis labeled from *4 to 10* in *increments of "1"*. The graph consists of a series of plotted points from left to right. The coordinates of the plotted points are as follows, where the *"label"* is listed *first*, and the *number of dots* is listed *second*: *(5, 1); (6, 4); (7, 5); (8, 4); (9, 1)*. *Part 1 (a):* Which data set has the greatest sample standard deviation? A.) Data set (i), because it has more entries that are farther away from the mean. B.) Data set (ii), because it has more entries that are close to the mean. C.) Data set (iii), because it has two entries that are far away from the mean. *Part 2 (a):* Which data set has the least sample standard deviation? A.) Data set (i), because it has more entries that are farther away from the mean. B.) Data set (ii), because it has more entries that are close to the mean. C.) Data set (iii), because it has less entries that are farther away from the mean. *Part 3 (b):* How are the data sets the same? How do they differ? A.) The three data sets have the same standard deviations but have different means. B.) 
The three data sets have the same mean, median, and mode but have different standard deviations. C.) The three data sets have the same mean and mode but have different medians and standard deviations. D.) The three data sets have the same mode but have different standard deviations and means.
Correct Answers: *Part 1 (a):* A.) Data set (i), because it has more entries that are farther away from the mean. *Part 2 (a):* B.) Data set (ii), because it has more entries that are close to the mean. *Part 3 (b):* B.) The three data sets have the same mean, median, and mode but have different standard deviations.
*(Y.T.I.) Exercise 3 (p. 68):* The tuition and fees (in thousands of dollars) for the top 14 universities in a recent year are listed below. Find the mean (part A (1 & 2)), median (part B (3 & 4)), and mode of the data (part C (5 & 6)), if possible. If any of these measures cannot be found or a measure does not represent the center of the data, explain why. *35 47 44 44 41 44 39 44 45 45 41 45 44 46* *Part 1 (a):* Find the mean cost. Select the correct choice below and, if necessary, fill in the answer box to complete your choice. (Round answer to *one* decimal place.) A.) The mean cost is *___*. B.) There is no mean cost. *Part 2 (a):* Does the mean represent the center of the data? A.) The mean represents the center. B.) The mean does not represent the center because it is the largest data value. C.) The mean does not represent the center because it is not a data value. D.) The mean does not represent the center because it is the smallest data value. E.) There is no mean cost. *Part 3 (b):* Find the median cost. Select the correct choice below and, if necessary, fill in the answer box to complete your choice. (Round answer to *one* decimal place.) A.) The median cost is *__*. B.) There is no median cost. *Part 4 (b):* Does the median represent the center of the data? A.) The median represents the center. B.) The median does not represent the center because it is the smallest data value. C.) The median does not represent the center because it is the largest data value. D.) The median does not represent the center because it is not a data value. E.) There is no median cost. *Part 5 (c):* Find the mode of the costs. Select the correct choice below and, if necessary, fill in the answer box to complete your choice. (Round answer(s) to *one* decimal place, and use a *comma* to separate answers (if needed).) A.) The mode(s) of the costs is (are) *__*. B.) There is no mode. *Part 6 (c):* Does (Do) the mode(s) represent the center of the data? A.) 
The mode(s) represent(s) the center. B.) The mode(s) does (do) not represent the center because it (one) is the smallest data value. C.) The mode(s) does (do) not represent the center because it (one) is the largest data value. D.) There is no mode. E.) The mode(s) does (do) not represent the center because it (they) is (are) not a data value.
Correct Answers: *Part 1 (a):* A.) The mean cost is *43.1*. *Part 2 (a):* A.) The mean represents the center. *Part 3 (b):* A.) The median cost is *44*. *Part 4 (b):* A.) The median represents the center. *Part 5 (c):* A.) The mode(s) of the costs is (are) *44*. *Part 6 (c):* A.) The mode(s) represent(s) the center.
*(Y.T.I.) Exercise 1 (p. 67):* The number of credits being taken by a sample of 13 full-time college students are listed below. Find the mean (part A (1 & 2)), median (part B (3 & 4)), and mode of the data (part C (5 & 6)), if possible. If any measure cannot be found or does not represent the center of the data, explain why. *7 9 10 10 7 6 6 6 8 6 6 6 7* *Part 1 (a):* Find the mean. Select the correct choice below and, if necessary, fill in the answer box to complete your choice. (Type answer as either an *integer* or a *decimal* rounded to *one* decimal place.) A.) The mean is *__*. B.) The data set does not have a mean. *Part 2 (a):* Does the mean represent the center of the data? A.) The mean represents the center. B.) The mean does not represent the center because it is the largest data value. C.) The mean does not represent the center because it is the smallest data value. D.) The mean does not represent the center because it is not a data value. E.) The data set does not have a mean. *Part 3 (b):* Find the median. Select the correct choice below and, if necessary, fill in the answer box to complete your choice. (Type answer as either an *integer* or a *decimal* rounded to *one* decimal place.) A.) The median is *_*. B.) The data set does not have a median. *Part 4 (b):* Does the median represent the center of the data? A.) The median represents the center. B.) The median does not represent the center because it is the smallest data value. C.) The median does not represent the center because it is the largest data value. D.) The median does not represent the center because it is not a data value. E.) The data set does not have a median. *Part 5 (c):* Find the mode. Select the correct choice below and, if necessary, fill in the answer box to complete your choice. (Type answer(s) as either an *integer* or a *decimal* rounded to *one* decimal place, and use a *comma* to separate answers (if needed).) A.) The mode(s) is/are *_*. B.) 
The data set does not have a mode. *Part 6 (c):* Does (Do) the mode(s) represent the center of the data? A.) The mode(s) represent(s) the center. B.) The mode(s) does (do) not represent the center because it (one) is the smallest data value. C.) The mode(s) does (do) not represent the center because it (they) is (are) not a data value. D.) The data set does not have a mode. E.) The mode(s) does (do) not represent the center because it (one) is the largest data value.
Correct Answers: *Part 1 (a):* A.) The mean is *7.2*. *Part 2 (a):* A.) The mean represents the center. *Part 3 (b):* A.) The median is *7*. *Part 4 (b):* A.) The median represents the center. *Part 5 (c):* A.) The mode(s) is/are *6*. *Part 6 (c):* B.) The mode(s) does (do) not represent the center because it (one) is the smallest data value.
*(Y.T.I.) Exercise 4 (p. 69):* The responses of a sample of 5,257 shoppers who were asked how their purchases are made are shown in the table. Find the mean (part A (1 & 2)), the median (part B (3 & 4)), and the mode of the data (part C (5 & 6)), if possible. If any measure cannot be found or does not represent the center of the data, explain why. *How Purchases Are Made (i)*: *Frequency, f (ii)* *Research online and in store, buy in store*: *1,034* *Search and buy online*: *2,220* *Search and buy in store*: *1,179* *Research online and in store, buy online*: *824* *Part 1 (a):* Find the mean. Select the correct choice below and, if necessary, fill in the answer box to complete your choice. (Round answer to *one* decimal place.) A.) The mean is *__*. B.) The mean cannot be calculated because there is an even number of data entries. C.) The mean cannot be calculated because the sample size is too small. D.) The mean cannot be calculated because the data are at the nominal level of measurement. *Part 2 (a):* Does the mean represent the center of the data? Choose the correct answer below. A.) The mean represents the center of the data set. B.) The mean does not represent the center because it is the greatest data entry. C.) The mean does not represent the center because it is not a data entry. D.) The mean does not represent the center because it is the least data entry. E.) The data set does not have a mean. *Part 3 (b):* Find the median. Select the correct choice below and, if necessary, fill in the answer box to complete your choice. (Round answer to *one* decimal place.) A.) The median is *__*. B.) The median cannot be calculated because the sample size is too small. C.) The median cannot be calculated because the data are at the nominal level of measurement. D.) The median cannot be calculated because there is an even number of data entries. *Part 4 (b):* Does the median represent the center of the data? Choose the correct answer below. A.) 
The median represents the center of the data set. B.) The median does not represent the center because it is the least data entry. C.) The median does not represent the center because it is the greatest data entry. D.) The median does not represent the center because it is not a data entry. E.) The data set does not have a median. *Part 5 (c):* Find the mode. Choose the correct answer below. A.) Search and buy online B.) Research online and in store, buy online C.) Search and buy in store D.) Research online and in store, buy in store E.) The data set does not have a mode. *Part 6 (c):* Does the mode represent a typical entry of the data? Choose the correct answer below. A.) The mode represents a typical entry of the data set. B.) The mode does not represent the center because it is the last data entry in the table. C.) The mode does not represent the center because it is not a data entry. D.) The mode does not represent the center because it is the first data entry in the table. E.) The data set does not have a mode.
Correct Answers: *Part 1 (a):* D.) The mean cannot be calculated because the data are at the nominal level of measurement. *Part 2 (a):* E.) The data set does not have a mean. *Part 3 (b):* C.) The median cannot be calculated because the data are at the nominal level of measurement. *Part 4 (b):* E.) The data set does not have a median. *Part 5 (c):* A.) Search and buy online *Part 6 (c):* A.) The mode represents a typical entry of the data set.
*(Y.T.I.) Exercise 3 (p. 44):* Construct a frequency distribution and a frequency histogram for the data set using the indicated number of classes. Describe any patterns. Number of classes: 8 Data set: Reaction times (in milliseconds) of 30 adult females to an auditory stimulus *430 291 380 336 514 422 388 426 375 313 443 390 353 472 388 413 441 425 304 455 305 307 327 414 450 388 322 361 507 415* *Part 1:* Construct a frequency distribution of the data. Use the minimum data entry as the lower limit of the first class. *Class (I)* *Frequency (II)* 1.) *I:* *___* - *___* *II:* *_* 2.) *I:* *___* - *___* *II:* *_* 3.) *I:* *___* - *___* *II:* *_* 4.) *I:* *___* - *___* *II:* *_* 5.) *I:* *___* - *___* *II:* *_* 6.) *I:* *___* - *___* *II:* *_* 7.) *I:* *___* - *___* *II:* *_* 8.) *I:* *___* - *___* *II:* *_* *Part 2:* Construct a frequency histogram of the data (since I don't have Quizlet+, I can't insert the images of the actual histograms; ergo, I pasted their descriptions). A.) A frequency histogram has a *horizontal axis* labeled *"Class"* from *1 to 8* in *increments of "1"*, and a *vertical axis* labeled *"Frequency"* from *0 to 10* in *increments of "1"*. There are vertical bars, each of which is over a horizontal axis label. The heights of the bars are as follows, where the *"horizontal axis" label* is listed *first*, and the *height* is listed *second*: *(1, 1); (2, 5); (3, 2); (4, 7); (5, 4); (6, 3); (7, 2); (8, 6)*. B.) A frequency histogram has a *horizontal axis* labeled *"Class"* from *1 to 8* in *increments of "1"*, and a *vertical axis* labeled *"Frequency"* from *0 to 10* in *increments of "1"*. There are vertical bars, each of which is over a horizontal axis label. The heights of the bars are as follows, where the *"horizontal axis"* label is listed *first*, and the *height* is listed *second*: *(1, 2); (2, 1); (3, 4); (4, 7); (5, 6); (6, 2); (7, 3); (8, 5)*. C.) 
A frequency histogram has a *horizontal axis* labeled *"Class"* from *1 to 8* in *increments of "1"*, and a *vertical axis* labeled *"Frequency"* from *0 to 10* in *increments of "1"*. There are vertical bars, each of which is over a horizontal axis label. The heights of the bars are as follows, where the *"horizontal axis" label* is listed *first*, and the *height* is listed *second*: *(1, 5); (2, 3); (3, 2); (4, 6); (5, 7); (6, 4); (7, 1); (8, 2)*. D.) A frequency histogram has a *horizontal axis* labeled *"Class"* from *1 to 8* in *increments of "1"*, and a *vertical axis* labeled *"Frequency"* from *0 to 10* in *increments of "1"*. There are vertical bars, each of which is over a horizontal axis label. The heights of the bars are as follows, where the *"horizontal axis" label* is listed *first*, and the *height* is listed *second*: *(1, 3); (2, 5); (3, 6); (4, 2); (5, 4); (6, 7); (7, 2); (8, 1)*. *Part 3:* Describe any patterns. Choose the correct answer below. A.) The class with the greatest frequency is class 8. The class with the least frequency is class 2. B.) The class with the greatest frequency is class 1. The class with the least frequency is class 8. C.) The class with the greatest frequency is class 7. The class with the least frequency is class 5. D.) The class with the greatest frequency is class 5. The class with the least frequency is class 7.
Correct Answers: *Part 1* (*I*; *II*): 1.) *291* - *318*; *5* 2.) *319* - *346*; *3* 3.) *347* - *374*; *2* 4.) *375* - *402*; *6* 5.) *403* - *430*; *7* 6.) *431* - *458*; *4* 7.) *459* - *486*; *1* 8.) *487* - *514*; *2* *Part 2:* C.) A frequency histogram has a *horizontal axis* labeled *"Class"* from *1 to 8* in *increments of "1"*, and a *vertical axis* labeled *"Frequency"* from *0 to 10* in *increments of "1"*. There are vertical bars, each of which is over a horizontal axis label. The heights of the bars are as follows, where the *"horizontal axis" label* is listed *first*, and the *height* is listed *second*: *(1, 5); (2, 3); (3, 2); (4, 6); (5, 7); (6, 4); (7, 1); (8, 2)*. *Part 3:* D.) The class with the greatest frequency is class 5. The class with the least frequency is class 7.
*(Y.T.I.) Exercise 5 (p. 46):* Construct a frequency distribution and a relative frequency histogram for the data set using five classes. Which class has the greatest relative frequency and which has the least relative frequency? *Ratings from 1 (lowest) to 10 (highest) from 36 taste testers* *1 5 10 3 9 9 6 10 5 8 8 6 5 9 1 4 10 4 4 6 3 6 6 2 3 9 2 7 3 3 6 5 1 9 4 2* *Part 1:* Construct a frequency distribution for the data using five classes. *Class (I)* *Frequency (II)* A: *I:* *_* - *_* *II:* *_* B: *I:* *_* - *_* *II:* *_* C: *I:* *_* - *_* *II:* *__* D: *I:* *_* - *_* *II:* *_* E: *I:* *_* - *__* *II:* *_* *Part 2:* Which relative frequency histogram below represents the data? (Since I don't have Quizlet+, I can't insert the images of the actual histograms; ergo, I pasted their descriptions.) A.) A histogram titled *"Taste Test Ratings"* has a horizontal axis labeled *"Ratings"* from *1.5 to 9.5* in *increments of "2"*, and a vertical axis labeled *"Relative frequency"* from *0 to 0.4* in *increments of "0.05"*. The histogram contains vertical bars of width 2, where one vertical bar is centered over each of the horizontal axis tick marks. The approximate heights of the vertical bars are listed as follows, where the *"label"* is listed *first*, and the *approximate height* is listed *second*: *(1.5, 0.17); (3.5, 0.25); (5.5, 0.28); (7.5, 0.08); (9.5, 0.22)*. B.) A histogram titled *"Taste Test Ratings"* has a horizontal axis labeled *"Ratings"* from *1.5 to 9.5* in *increments of "2"*, and a vertical axis labeled *"Frequency"* from *0 to 12* in *increments of "2"*. The histogram contains vertical bars of width 2, where one vertical bar is centered over each of the horizontal axis tick marks. The approximate heights of the vertical bars are listed as follows, where the *"label"* is listed *first*, and the *approximate height* is listed *second*: *(1.5, 6); (3.5, 9); (5.5, 10); (7.5, 3); (9.5, 8)*. C.) 
A histogram titled *"Taste Test Ratings"* has a horizontal axis labeled *"Ratings"* from *1 to 9* in *increments of "2"*, and a vertical axis labeled *"Relative frequency"* from *0 to 0.4* in *increments of "0.05"*. The histogram contains vertical bars of width 2, where one vertical bar is centered over each of the horizontal axis tick marks. The approximate heights of the vertical bars are listed as follows, where the *"label"* is listed *first*, and the *approximate height* is listed *second*: *(1, 0.23); (3, 0.24); (5, 0.13); (7, 0.25); (9, 0.15)*. D.) A histogram titled *"Taste Test Ratings"* has a horizontal axis labeled *"Ratings"* from *1 to 9* in *increments of "2"*, and a vertical axis labeled *"Frequency"* from *0 to 12* in *increments of "2"*. The histogram contains vertical bars of width 2, where one vertical bar is centered over each of the horizontal axis tick marks. The approximate heights of the vertical bars are listed as follows, where the *"label"* is listed *first*, and the *approximate height* is listed *second*: *(1, 6); (3, 9); (5, 10); (7, 3); (9, 8)*. *Part 3:* Class *_(1)_* - *_(2)_* has the greatest relative frequency. (Type answers as *whole numbers* in *ascending* order.) *Part 4:* Class *_(1)_* - *_(2)_* has the least relative frequency. (Type answers as *whole numbers* in *ascending* order.)
Correct Answers: *Part 1* (*I*; *II*): A: *1* - *2*; *6* B: *3* - *4*; *9* C: *5* - *6*; *10* D: *7* - *8*; *3* E: *9* - *10*; *8* *Part 2:* A.) A histogram titled *"Taste Test Ratings"* has a horizontal axis labeled *"Ratings"* from *1.5 to 9.5* in *increments of "2"*, and a vertical axis labeled *"Relative frequency"* from *0 to 0.4* in *increments of "0.05"*. The histogram contains vertical bars of width 2, where one vertical bar is centered over each of the horizontal axis tick marks. The approximate heights of the vertical bars are listed as follows, where the *"label"* is listed *first*, and the *approximate height* is listed *second*: *(1.5, 0.17); (3.5, 0.25); (5.5, 0.28); (7.5, 0.08); (9.5, 0.22)*. *Part 3:* *(1):* *5* *(2):* *6* *Part 4:* *(1):* *7* *(2):* *8*
*(Y.T.I.) Exercise 4 (p. 45):* The accompanying data set lists the numbers of children of world leaders. Use the data to construct a frequency distribution using six classes and to create a frequency polygon. Describe any patterns. *Numbers of Children:* *0 2 2 4 6 17 1 0 6 5 4 0 6 6 5 4 1 0 0 6 2 7 4 4 0 1 5 3 4 3 2 10 5 4 4 17 0 13 9 14 16 16 0* *Part 1:* Complete the frequency distribution table below. Use the minimum data entry as the lower limit of the first class. *Class (I)* *Frequency (II)* *Midpoint (III)* 1.) *I:* *_* - *_* *II:* *__* *III:* *_* 2.) *I:* *_* - *_* *II:* *__* *III:* *_* 3.) *I:* *_* - *_* *II:* *_* *III:* *_* 4.) *I:* *_* - *__* *II:* *_* *III:* *__* 5.) *I:* *__* - *__* *II:* *_* *III:* *__* 6.) *I:* *__* - *__* *II:* *_* *III:* *__* *Part 2:* Create a frequency polygon. Choose the correct graph below (since I don't have Quizlet+, I can't insert the images of the actual frequency polygons; ergo, I have the descriptions). A.) A frequency polygon has a *horizontal axis* labeled *"Number of Children"* from *-3 to 20* in *increments of "2"*, and a *vertical axis* labeled *"Frequency"* from *0 to 18* in *increments of "2"*. Plotted points are connected by line segments from left to right. The heights of the plotted points are as follows, where the *"number of children"* is listed *first*, and the *frequency* is listed *second*: *(-2, 0); (1, 16.5); (4, 14); (7, 3.6); (10, 1.6); (13, 2.2); (16, 4); (19, 0)*. B.) A frequency polygon has a *horizontal axis* labeled *"Number of Children"* from *-3 to 20* in *increments of "2"*, and a *vertical axis* labeled *"Frequency"* from *0 to 18* in *increments of "2"*. Plotted points are connected by line segments from left to right. The heights of the plotted points are as follows, where the *"number of children"* is listed *first*, and the *frequency* is listed *second*: *(-2, 0); (1, 15); (4, 14); (7, 6); (10, 2); (13, 2); (16, 4); (19, 0)*. C.) A frequency polygon has a *horizontal axis* labeled *"Number of Children"* from *-4 to 21* in *increments of "2"*, and a *vertical axis* labeled *"Frequency"* from *0 to 18* in *increments of "2"*. Plotted points are connected by line segments from left to right. The heights of the plotted points are as follows, where the *"number of children"* is listed *first*, and the *frequency* is listed *second*: *(-2, 0); (1, 15); (4, 16.5); (7, 1.2); (10, 4.8); (13, 4); (16, 2); (19, 0)*. D.) A frequency polygon has a *horizontal axis* labeled *"Number of Children"* from *-3 to 20* in *increments of "2"*, and a *vertical axis* labeled *"Frequency"* from *0 to 18* in *increments of "2"*. Plotted points are connected by line segments from left to right. The heights of the plotted points are as follows, where the *"number of children"* is listed *first*, and the *frequency* is listed *second*: *(-2, 0); (1, 14); (4, 15); (7, 4); (10, 2); (13, 6); (16, 2); (19, 0)*. *Part 3:* Describe any patterns. Choose the correct answer below. A.) The data show that most of the 43 world leaders had more than 17 children. B.) The data show that most of the 43 world leaders had fewer than 2 children. C.) The data show that most of the 43 world leaders had fewer than 6 children. D.) The data show that most of the 43 world leaders had more than 11 children.
Correct Answers: *Part 1* (*I*; *II*; *III*): 1.) *0* - *2*; *15*; *1* 2.) *3* - *5*; *14*; *4* 3.) *6* - *8*; *6*; *7* 4.) *9* - *11*; *2*; *10* 5.) *12* - *14*; *2*; *13* 6.) *15* - *17*; *4*; *16* *Part 2:* B.) A frequency polygon has a *horizontal axis* labeled *"Number of Children"* from *-3 to 20* in *increments of "2"*, and a *vertical axis* labeled *"Frequency"* from *0 to 18* in *increments of "2"*. Plotted points are connected by line segments from left to right. The heights of the plotted points are as follows, where the *"number of children"* is listed *first*, and the *frequency* is listed *second*: *(-2, 0); (1, 15); (4, 14); (7, 6); (10, 2); (13, 2); (16, 4); (19, 0)*. *Part 3:* C.) The data show that most of the 43 world leaders had fewer than 6 children.
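The class frequencies, midpoints, and polygon points for this exercise can be checked with a short Python sketch. It assumes integer data and takes the class width as the range divided by the number of classes, moved up to the next whole number. The function name `frequency_polygon_points` is made up for illustration and is not from the textbook.

```python
def frequency_polygon_points(data, num_classes):
    """Return (midpoint, frequency) pairs, with a zero-frequency point added on each side."""
    low, high = min(data), max(data)
    # Assumed convention: range / number of classes, moved up to the next whole number.
    width = (high - low) // num_classes + 1
    points = []
    for i in range(num_classes):
        lower = low + i * width
        upper = lower + width - 1          # upper class limit for integer data
        frequency = sum(lower <= x <= upper for x in data)
        points.append(((lower + upper) / 2, frequency))
    # Extend one class width to the left and right so the polygon starts and ends on the axis.
    first_mid, last_mid = points[0][0], points[-1][0]
    return [(first_mid - width, 0)] + points + [(last_mid + width, 0)]


children = [0, 2, 2, 4, 6, 17, 1, 0, 6, 5, 4, 0, 6, 6, 5, 4, 1, 0, 0, 6, 2, 7,
            4, 4, 0, 1, 5, 3, 4, 3, 2, 10, 5, 4, 4, 17, 0, 13, 9, 14, 16, 16, 0]
points = frequency_polygon_points(children, 6)
print(points)
```

The interior points give the midpoints and frequencies of the six classes, and the full list agrees with the points described in choice B.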
*(Y.T.I.) Exercise 1 (p. 41):* The data represent the time, in minutes, spent reading a political blog in a day. Construct a frequency distribution using 5 classes. In the table, include the midpoints, relative frequencies, and cumulative frequencies. Which class has the greatest frequency and which has the least frequency? *3 27 0 6 15 29 10 17 12 20 14 19 8 9 20 1 10 16 0 15* *Part 1:* Complete the table, starting with the lowest class limit. (*Simplify* the answers). *Class (I)* *Frequency (II)* *Midpoint (III)* *Relative Frequency (IV)* *Cumulative Frequency (V)* A: *I*: *_* - *_* *II*: *_* *III*: *__* *IV*: *__* *V*: *_* B: *I*: *_* - *__* *II*: *_* *III*: *__* *IV*: *___* *V*: *_* C: *I*: *__* - *__* *II*: *_* *III*: *___* *IV*: *__* *V*: *__* D: *I*: *__* - *__* *II*: *_* *III*: *___* *IV*: *___* *V*: *__* E: *I*: *__* - *__* *II*: *_* *III*: *___* *IV*: *__* *V*: *__* *Part 2:* Which class has the greatest frequency? The class with the greatest frequency is from *__(1)__* to *__(2)__*. *Part 3:* Which class has the least frequency? The class with the least frequency is from *__(1)__* to *__(2)__*.
Correct Answers: *Part 1* (*I*; *II*; *III*; *IV*; *V*): A: *0* - *5*; *4*; *2.5*; *0.2*; *4* B: *6* - *11*; *5*; *8.5*; *0.25*; *9* C: *12* - *17*; *6*; *14.5*; *0.3*; *15* D: *18* - *23*; *3*; *20.5*; *0.15*; *18* E: *24* - *29*; *2*; *26.5*; *0.1*; *20* *Part 2:* *(1):* *12* *(2):* *17* *Part 3:* *(1):* *24* *(2):* *29*
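As a rough check of the midpoint, relative frequency, and cumulative frequency columns, the sketch below starts from the class limits and frequencies in the answer and derives the rest. The variable names are illustrative only.

```python
from itertools import accumulate

# Class limits and frequencies taken from the Part 1 answers.
classes = [(0, 5), (6, 11), (12, 17), (18, 23), (24, 29)]
frequencies = [4, 5, 6, 3, 2]

n = sum(frequencies)                                 # sample size
midpoints = [(lo + hi) / 2 for lo, hi in classes]    # (lower limit + upper limit) / 2
relative = [f / n for f in frequencies]              # class frequency / sample size
cumulative = list(accumulate(frequencies))           # running total of frequencies

# Classes with the greatest and least frequency (Parts 2 and 3).
greatest = classes[frequencies.index(max(frequencies))]
least = classes[frequencies.index(min(frequencies))]

for row in zip(classes, frequencies, midpoints, relative, cumulative):
    print(row)
print("greatest:", greatest, "least:", least)
```

The last cumulative frequency equals the sample size, which is a quick way to catch a counting mistake.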
weighted mean
The mean of a data set whose entries have varying weights. The weighted mean is given by *x̄* = (∑(*x* · *w*))/(∑*w*), where *w* is the weight of each entry *x*. The numerator is the sum of the products of the entries and the weights; the denominator is the sum of the weights. (p. 71)
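A minimal Python sketch of the weighted mean formula follows. The scores and weights are hypothetical values chosen for illustration, not data from the textbook.

```python
def weighted_mean(entries, weights):
    """Sum of (entry * weight) divided by the sum of the weights."""
    if len(entries) != len(weights):
        raise ValueError("each entry needs exactly one weight")
    return sum(x * w for x, w in zip(entries, weights)) / sum(weights)


# Hypothetical course scores with weights 5 : 3 : 2 (for example tests, homework, final).
scores = [86, 96, 82]
weights = [5, 3, 2]
result = weighted_mean(scores, weights)   # (430 + 288 + 164) / 10
print(result)
```

When every weight is the same, the result reduces to the ordinary mean.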
class boundaries
The numbers that separate classes *without* forming gaps between them. (p. 44)
population standard deviation
The population standard deviation of a population data set of *N* entries is the square root of the population variance. Population standard deviation = *σ* = √(*σ²*) = √((∑(*x* − *µ*)²)/*N*) (p. 83)
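The formula can be followed step by step in Python: find the mean, sum the squared deviations, divide by *N*, and take the square root. The data set below is hypothetical, and the result is compared with `statistics.pstdev` from the standard library as a sanity check.

```python
import math
import statistics


def population_std_dev(entries):
    """Square root of the mean squared deviation from the population mean."""
    N = len(entries)
    mu = sum(entries) / N                                   # population mean
    sum_of_squares = sum((x - mu) ** 2 for x in entries)    # SS_x
    return math.sqrt(sum_of_squares / N)                    # divide by N, not N - 1


population = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data: mean 5, SS_x = 32
sigma = population_std_dev(population)
print(sigma, statistics.pstdev(population))
```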
*Chebychev's Theorem*
The portion of any data set lying within *k* standard deviations (*k* > 1) of the mean is at least 1 - (1/(*k²*)). • *k* = 2: In any data set, at least 1 - (1/(2²)) = 3/4, or 75%, of the data lie within 2 standard deviations of the mean. • *k* = 3: In any data set, at least 1 - (1/(3²)) = 8/9, or about 88.9%, of the data lie within 3 standard deviations of the mean. (p. 89)
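The bound in the theorem is easy to compute and to test against real data. The sketch below evaluates 1 - 1/*k²* and compares it with the observed proportion for the blog reading times from Exercise 1 (p. 41). The function names are illustrative, and using the population standard deviation here is an assumption.

```python
import statistics


def chebychev_bound(k):
    """Minimum portion of any data set within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def proportion_within(data, k):
    """Observed portion of entries within k population standard deviations of the mean."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    return sum(abs(x - mu) <= k * sigma for x in data) / len(data)


minutes = [3, 27, 0, 6, 15, 29, 10, 17, 12, 20, 14, 19, 8, 9, 20, 1, 10, 16, 0, 15]
for k in (2, 3):
    print(k, chebychev_bound(k), proportion_within(minutes, k))
```

The observed proportion can be much larger than the bound; the theorem only guarantees a minimum.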
sample *variance* and sample *standard deviation* (p. 85)
The sample *variance* and sample *standard deviation* of a sample data set of *n* entries are listed below. Sample variance = *s²* = (∑(*x* − *x̄*)²)/(*n* − 1) Sample standard deviation = *s* = √(*s²*) = √((∑(*x* − *x̄*)²)/(*n* − 1))
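A short Python sketch of the sample formulas is given here, with the divisor *n* − 1 written out explicitly. The sample values are hypothetical, and `statistics.variance` and `statistics.stdev` are used only as a cross-check.

```python
import math
import statistics


def sample_variance(entries):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(entries)
    x_bar = sum(entries) / n
    return sum((x - x_bar) ** 2 for x in entries) / (n - 1)


def sample_std_dev(entries):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(entries))


sample = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical data: mean 5, sum of squares 32
s_squared = sample_variance(sample)      # 32 / 7
s = sample_std_dev(sample)
print(s_squared, statistics.variance(sample))
print(s, statistics.stdev(sample))
```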
mean
The sum of the data entries divided by the number of entries. To find the mean of a data set, use one of these formulas. *Population* Mean: *µ* = (∑*x*)/*N* *Sample* Mean: *x̄* = (∑*x*)/*n* The lowercase Greek letter *µ* (pronounced mu) represents the population mean and *x̄* (read as "*x* bar") represents the sample mean. Note that *N* represents the number of entries in a *population*, and *n* represents the number of entries in a *sample*. Recall that the uppercase Greek letter sigma (∑) indicates a summation of values. (p. 67)
cumulative frequency
The sum of the frequencies for that class and all previous classes. The *cumulative frequency* of the last class is equal to the sample size, *n*. (p. 42)
midpoint
The sum of the lower and upper limits of the class divided by two. Sometimes called the *class mark.* (p. 42) ~ formula: midpoint = *((lower class limit) + (upper class limit))/2*
sum of squares
The sum of the squares of the deviations (denoted by *SS_x*). (p. 83)
median
The value that lies in the middle of the data when the data set is ordered. The median measures the center of an ordered data set by dividing it into two equal parts. When the data set has an odd number of entries, the median is the middle data entry. When the data set has an even number of entries, the median is the mean of the two middle data entries. (p. 68)
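The odd and even cases can be sketched in a few lines of Python. The function below is an illustration rather than a library routine; the second example reuses the blog reading times from Exercise 1 (p. 41).

```python
def median(entries):
    """Middle entry of the ordered data, or the mean of the two middle entries."""
    ordered = sorted(entries)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: single middle entry
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: mean of the middle pair


minutes = [3, 27, 0, 6, 15, 29, 10, 17, 12, 20, 14, 19, 8, 9, 20, 1, 10, 16, 0, 15]
print(median([7, 1, 4]))    # odd number of entries
print(median(minutes))      # even number of entries
```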
percentile that corresponds to a specific data entry *x*
To find the *percentile that corresponds to a specific data entry "x"*, use the formula Percentile of *x* = ((number of data entries less than *x*)/(total number of data entries))×100 and then round to the nearest whole number. (p. 106)
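The percentile formula translates directly into Python. In this sketch the name `percentile_of` is made up, and ties at exactly one half are rounded up, which is an assumption because the definition only says to round to the nearest whole number. The data are the blog reading times from Exercise 1 (p. 41).

```python
import math


def percentile_of(x, data):
    """Percent of entries strictly less than x, rounded to the nearest whole number."""
    below = sum(entry < x for entry in data)
    percent = 100 * below / len(data)
    return math.floor(percent + 0.5)   # round half up (assumed tie rule)


minutes = [3, 27, 0, 6, 15, 29, 10, 17, 12, 20, 14, 19, 8, 9, 20, 1, 10, 16, 0, 15]
print(percentile_of(15, minutes))   # 11 of the 20 entries are less than 15
```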
frequency histogram
Uses bars to represent the frequency distribution of a data set. A *histogram* has the following properties: 1.) The horizontal scale is quantitative and measures the data entries. 2.) The vertical scale measures the frequencies of the classes. 3.) Consecutive bars must touch. (p. 44)