Basic Practice of Statistics Test 2
Range
The simplest measure of data variation: the difference between the maximum and the minimum (max - min).
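A minimal Python sketch of the range calculation. The helper name and the small data set are hypothetical, chosen only for illustration:

```python
# Illustrative sketch: the function name and data are hypothetical.


def data_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)


scores = [12, 7, 3, 18, 9, 15]  # hypothetical sample data
print(data_range(scores))  # 18 - 3 = 15
```

Because it uses only the two most extreme values, the range is very sensitive to outliers.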
Fence values
These are found by computing the lower fence, Q1-1.5(IQR), and the upper fence, Q3+1.5(IQR). Any measurement less than the lower fence or greater than the upper fence is considered an outlier.
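A sketch of the fence rule in Python. Textbooks and software compute quartiles in slightly different ways; this version assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data (leaving out the overall median when n is odd), so other tools may report slightly different fences. The function names and data are hypothetical.

```python
# Illustrative sketch: function names and data are hypothetical.
# Assumed quartile convention: Q1/Q3 = medians of the lower/upper halves.


def median(sorted_vals):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    lower = s[: n // 2]        # values before the median position
    upper = s[(n + 1) // 2 :]  # values after the median position
    return median(lower), median(upper)


def fences(values):
    """Return (lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(values):
    """Values strictly below the lower fence or strictly above the upper fence."""
    low, high = fences(values)
    return [x for x in values if x < low or x > high]


data = [2, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data: Q1 = 4.5, Q3 = 8.5
print(fences(data))    # (-1.5, 14.5)
print(outliers(data))  # [30]
```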
Unusual features
Features that make a distribution not symmetric (or, more specifically, not normal). They can include a heavy concentration of values, a range of values not represented (a gap), or extreme values at the ends of the distribution (outliers).
Back to back Stem and Leaf plot
This can be used to display two sets of data that are measured in the same units, with a common set of stems.
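A rough Python sketch of a back-to-back display. It assumes non-negative whole-number data in which the stem is the tens digit and the leaf is the units digit. Leaves for the first data set are written in descending order so that they read outward from the shared stems. The function name and the two data sets are hypothetical.

```python
from collections import defaultdict

# Illustrative sketch: the function name and data are hypothetical.
# Assumes non-negative whole numbers; stem = tens digit, leaf = units digit.


def back_to_back(left, right):
    """Return rows 'left leaves | stem | right leaves' over a shared set of stems."""
    left_leaves, right_leaves = defaultdict(list), defaultdict(list)
    for x in left:
        left_leaves[x // 10].append(x % 10)
    for x in right:
        right_leaves[x // 10].append(x % 10)

    all_values = list(left) + list(right)
    stems = range(min(all_values) // 10, max(all_values) // 10 + 1)
    # Left-side leaves grow toward the stem, so they are written in descending order.
    left_text = {s: "".join(str(d) for d in sorted(left_leaves[s], reverse=True)) for s in stems}
    width = max(len(t) for t in left_text.values())
    rows = []
    for s in stems:
        right_text = "".join(str(d) for d in sorted(right_leaves[s]))
        rows.append(f"{left_text[s].rjust(width)} | {s} | {right_text}".rstrip())
    return rows


class_a = [52, 58, 61, 64, 67, 73, 75, 88]  # hypothetical scores, left side
class_b = [55, 63, 66, 70, 72, 79, 84, 91]  # hypothetical scores, right side
for row in back_to_back(class_a, class_b):
    print(row)
```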
10-15 stems
This is the ideal number of stems for a moderate-sized data set.
Extended Stem and Leaf plot
Used when a standard stem and leaf plot would be too compact. Each stem is repeated (for example, once for leaves 0-4 and once for leaves 5-9), which spreads the data out so its shape shows more clearly.
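One common way to build an extended (split-stem) plot is to list each stem twice, putting leaves 0-4 on the first line and leaves 5-9 on the second. The Python sketch below assumes that two-way split and non-negative whole-number data with the tens digit as the stem; the function name and data are hypothetical.

```python
from collections import defaultdict

# Illustrative sketch: the function name and data are hypothetical.
# Assumes each stem is split in two: leaves 0-4, then leaves 5-9.


def split_stem_plot(values):
    """Rows 'stem | leaves', with every stem listed twice."""
    lines = defaultdict(list)
    for x in sorted(values):
        stem, leaf = divmod(x, 10)
        half = 0 if leaf < 5 else 1  # which copy of the stem gets this leaf
        lines[(stem, half)].append(str(leaf))
    first, last = min(values) // 10, max(values) // 10
    rows = []
    for stem in range(first, last + 1):
        for half in (0, 1):
            rows.append(f"{stem} | {''.join(lines[(stem, half)])}".rstrip())
    return rows


ages = [21, 23, 24, 26, 28, 29, 31, 32, 35, 37, 41]  # hypothetical data
for row in split_stem_plot(ages):
    print(row)
```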
Symmetric distribution
A distribution is described as __________ if the left and right sides of the distribution are mirror images of each other.
Uniform distribution
A distribution whose frequencies are constant across the possible values. Its plot is rectangular.
Multimodal
A distribution with multiple peaks is said to be ______.
Trimodal
A distribution with three significant peaks is said to be ________.
Bimodal
A distribution with two significant peaks is said to be _______.
Stem and leaf plot
A graphical procedure that can be used to display quantitative data, either discrete or continuous. It retains the actual data values, an advantage over procedures such as the histogram.
Standard deviation
A measure of variability about the mean.
Normal distribution
A special symmetric distribution with a bell-shaped curve and a single peak in the middle of the distribution.
Frequency table
A table with three columns that displays the class intervals, the class frequencies, and the relative frequencies of a set of data.
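A minimal Python sketch that builds the three columns of a frequency table (class interval, frequency, relative frequency as a percentage). It assumes equal-width class intervals that include their lower endpoint and exclude their upper endpoint; the function name, data, class width, and starting point are all hypothetical choices.

```python
# Illustrative sketch: the function name and data are hypothetical.
# Assumes equal-width classes of the form [low, high).


def frequency_table(values, start, width, num_classes):
    """Return (interval, frequency, relative frequency %) rows."""
    n = len(values)
    rows = []
    for i in range(num_classes):
        low = start + i * width
        high = low + width
        freq = sum(1 for x in values if low <= x < high)
        rel = 100 * freq / n  # relative frequency as a percentage
        rows.append((f"[{low}, {high})", freq, rel))
    return rows


data = [3, 7, 8, 12, 14, 15, 15, 19, 22, 27]  # hypothetical data, n = 10
for interval, freq, rel in frequency_table(data, start=0, width=10, num_classes=3):
    print(f"{interval:>9}  {freq:>3}  {rel:5.1f}%")
```

The relative frequencies always add up to 100% (apart from rounding) as long as every observation falls in some class.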
Histogram
Another graphical method of displaying quantitative data. It breaks the range of values of a variable into class intervals, but instead of displaying the actual data, it displays either the number (frequency) or the percentage (relative frequency) of observations in each interval.
Sample median
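Denoted by M. After the data are sorted, M is located at position (n+1)/2; when n is even, this position falls between the two middle values, and M is their average.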
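A short Python sketch of the location rule: sort the data, find position (n + 1)/2 (counting from 1), and average the two middle values when that position is not a whole number. The helper name and data are hypothetical; the standard library's `statistics.median` uses the same definition and is printed alongside for comparison.

```python
import statistics

# Illustrative sketch: the function name and data are hypothetical.


def sample_median(values):
    """Median M found from its location (n + 1) / 2 in the sorted data."""
    s = sorted(values)
    n = len(s)
    location = (n + 1) / 2           # 1-based position of the median
    if location.is_integer():
        return s[int(location) - 1]  # odd n: a single middle value
    lower = s[int(location) - 1]     # even n: average the two middle values
    upper = s[int(location)]
    return (lower + upper) / 2


odd = [7, 1, 5, 3, 9]  # location (5 + 1)/2 = 3, so M is the 3rd sorted value
even = [8, 2, 6, 4]    # location 2.5, so M averages the 2nd and 3rd values
print(sample_median(odd), statistics.median(odd))    # 5 5
print(sample_median(even), statistics.median(even))  # 5.0 5.0
```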
Sample standard deviation
Denoted by S; measures the spread around the sample mean, x bar. It is the square root of the sample variance.
Sample variance
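Denoted by S². Computed as S² = Σ(x - x bar)²/(n - 1): the sum of the squared deviations from the sample mean, divided by n - 1.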
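A Python sketch of S² and S computed directly from the definitions. The helper names and data are hypothetical; the results are compared with the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
import math
import statistics

# Illustrative sketch: the function names and data are hypothetical.


def sample_variance(values):
    """S^2: sum of squared deviations from x bar, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """S: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, x bar = 5
print(sample_variance(data), statistics.variance(data))  # both 32/7, about 4.571
print(sample_std(data), statistics.stdev(data))          # both about 2.138
```

For comparison, `statistics.pvariance(data)` divides by n instead and gives 4 for this data; that is the formula used for a population variance σ² when the data are the entire population.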
Population Median
Denoted by the Greek letter η (read "eta"). The central value, which is larger than half of the data and smaller than the other half.
Population mean
Denoted by the Greek letter μ. The sum of all the measurements in the population divided by the total number of measurements. (This is a parameter and usually cannot be computed, because the entire population is rarely measured.)
Sample mean
Denoted by x bar. The sum of all the measurements in the sample divided by the total number of measurements in the sample. This can be used to make inferences about μ.
Population standard deviation
Denoted by the Greek letter σ (read "sigma"). Like μ, it is a parameter and usually cannot be computed.
Outliers
Extreme values in a set of data that can skew the data in a certain direction.
Boxplot
A graphical method of displaying data, based on the five number summary, that gives information about the shape, center, and spread of the data and the concentration of data values.
Interquartile range
A measure of spread around the median that is resistant to outliers. Computed as Q3-Q1, it measures the range of the middle 50% of the data.
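A Python sketch of what "resistant" means here: replacing the largest value with an extreme outlier leaves the IQR unchanged, while the range jumps. It assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data (a common textbook convention that some software does not follow); the function names and data are hypothetical.

```python
# Illustrative sketch: function names and data are hypothetical.
# Assumed quartile convention: Q1/Q3 = medians of the lower/upper halves.


def median(sorted_vals):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def iqr(values):
    """IQR = Q3 - Q1, with quartiles taken as medians of the two halves."""
    s = sorted(values)
    n = len(s)
    q1 = median(s[: n // 2])
    q3 = median(s[(n + 1) // 2 :])
    return q3 - q1


data = [10, 12, 13, 15, 16, 18, 20, 21]  # hypothetical data
with_outlier = data[:-1] + [210]          # largest value replaced by an extreme one

for label, values in (("original", data), ("with outlier", with_outlier)):
    print(label, "IQR =", iqr(values), "range =", max(values) - min(values))
```

The IQR stays at 6.5 in both cases, while the range grows from 11 to 200.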
Skewed distribution
One tail of the distribution is longer than the other. A distribution is skewed to the left if the left side of its graph extends much farther out than the right side, and skewed to the right if the right side extends much farther out than the left.
Frequency (class frequency)
Number of observations falling in each interval.
Standard Stem and Leaf plot
Involves dividing each number in a data list into two parts, a "stem" and a "leaf", then listing the stems in consecutive order and writing each leaf next to its stem.
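A minimal Python sketch of a standard stem and leaf plot, assuming non-negative whole numbers with the tens digit as the stem and the units digit as the leaf. Stems with no leaves are still listed, so gaps and outliers stay visible. The function name and data are hypothetical.

```python
from collections import defaultdict

# Illustrative sketch: the function name and data are hypothetical.
# Assumes non-negative whole numbers; stem = tens digit, leaf = units digit.


def stem_and_leaf(values):
    """Rows 'stem | leaves', one row per stem in consecutive order."""
    leaves = defaultdict(list)
    for x in sorted(values):
        stem, leaf = divmod(x, 10)  # e.g. 47 -> stem 4, leaf 7
        leaves[stem].append(str(leaf))
    first, last = min(values) // 10, max(values) // 10
    return [f"{s} | {''.join(leaves[s])}".rstrip() for s in range(first, last + 1)]


scores = [47, 52, 55, 58, 61, 63, 63, 69, 72, 95]  # hypothetical data
for row in stem_and_leaf(scores):
    print(row)
```

In this example the empty stem 8 shows a gap, and the lone leaf on stem 9 stands out as a possible outlier.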
Dispersion parameter
Measures the amount of spread or variability around the center.
Central location parameter
Measures where the center of the distribution lies.
Five number summary
Min, Q1, Median, Q3, Max.
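A Python sketch that computes the five number summary. Quartile rules vary between textbooks and software; this sketch assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the overall median excluded from both halves when n is odd. The function names and data are hypothetical.

```python
# Illustrative sketch: function names and data are hypothetical.
# Assumed quartile convention: Q1/Q3 = medians of the lower/upper halves.


def median(sorted_vals):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for at least two values."""
    s = sorted(values)
    n = len(s)
    q1 = median(s[: n // 2])        # lower half
    q3 = median(s[(n + 1) // 2 :])  # upper half
    return s[0], q1, median(s), q3, s[-1]


data = [1, 3, 4, 6, 7, 8, 9, 11, 14]  # hypothetical data, n = 9
print(five_number_summary(data))  # (1, 3.5, 7, 10.0, 14)
```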
Relative frequency
The percentage of observations in each interval. It is found by dividing each class frequency by the total number of observations and multiplying by 100.
