Ch. 3 Vocab — Data Description
percentile formula
((number of values below X) + 0.5) / (total number of values) * 100
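As a quick check of the formula above, here is a minimal Python sketch. The function name percentile_rank and the scores are made up for illustration and are not part of the original card.

```python
def percentile_rank(values, x):
    """Percentile of x: ((count of values below x) + 0.5) / n * 100."""
    below = sum(1 for v in values if v < x)  # values strictly below x
    return 100 * (below + 0.5) / len(values)


scores = [2, 3, 5, 6, 8, 10, 12, 15, 18, 20]  # made-up data
# 5 values lie below 10, so (5 + 0.5) / 10 * 100 = 55.0
print(percentile_rank(scores, 10))
```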
percentile median
P-sub-50 (50th percentile)
quartile median
Q-sub-2 (50th percentile)
Mean
Sum of all the values divided by the number of values. This can either be a population ____ (denoted by mu) or a sample ____ (denoted by x bar)
Percentile
A value that divides the data set into 100 equal groups; it gives the percent of the population that lies below that value. The data must be ranked to find this.
Sample Variance
Unbiased estimator of a population variance. Instead of dividing by the population size, the sum of the squares of the deviations from the sample mean is divided by one less than the sample size. The units on the variance are the units of the population squared.
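A small sketch of the difference between dividing by one less than the sample size and dividing by the full size. The data are made up; the hand calculation is compared with Python's standard statistics module.

```python
import statistics

data = [4, 8, 6, 2]  # made-up values
n = len(data)
mean = sum(data) / n  # 5.0
sum_sq_dev = sum((x - mean) ** 2 for x in data)  # 1 + 9 + 1 + 9 = 20.0

sample_var = sum_sq_dev / (n - 1)  # divide by one less than the sample size
population_var = sum_sq_dev / n    # divide by the full size

# The library functions should agree with the hand calculations.
print(sample_var, statistics.variance(data))
print(population_var, statistics.pvariance(data))
```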
Mild Outliers
Values which lie between 1.5 and 3.0 times the InterQuartile Range below the 1st Quartile or above the 3rd Quartile.
Extreme Outliers
Values which lie more than 3.0 times the InterQuartile Range below the 1st Quartile or above the 3rd Quartile.
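The mild and extreme outlier rules can be sketched as a small classifier. The function name and the quartile values are illustrative assumptions. A value exactly 1.5 IQRs out is treated here as not an outlier and one exactly 3.0 IQRs out as mild; textbooks differ on those boundaries.

```python
def classify_outlier(x, q1, q3):
    """Classify x as 'extreme', 'mild', or 'not an outlier' from Q1 and Q3."""
    iqr = q3 - q1
    # Distance of x outside the box; zero or negative when x is inside it.
    distance = max(q1 - x, x - q3)
    if distance > 3.0 * iqr:
        return "extreme"
    if distance > 1.5 * iqr:
        return "mild"
    return "not an outlier"


# Assumed quartiles: Q1 = 10, Q3 = 20, so IQR = 10.
print(classify_outlier(15, 10, 20))  # inside the box
print(classify_outlier(40, 10, 20))  # 20 above Q3, between 15 and 30
print(classify_outlier(55, 10, 20))  # 35 above Q3, more than 30
```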
percentile, percentage
A __________ is a relative measure of position, whereas a __________ is an absolute measure of the part to the total.
Box and Whiskers Plot / Box Plot
A graphical representation of the minimum value, lower hinge, median, upper hinge, and maximum. Some textbooks define the five values as the minimum, first Quartile, median, third Quartile, and maximum.
Outlier
An extremely high or low value when compared to the rest of the values.
Parameter
Characteristic or measure obtained from a population
Statistic
Characteristic or measure obtained from a sample
data array
a data set that has been ordered
Quartile
a relative measure of position obtained by dividing the data set into quarters; either the 25th, 50th, or 75th percentiles, and the 50th percentile is called the median; divided into four groups
Decile
a relative measure of position obtained by dividing the data set into tenths; either the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, or 90th percentiles; divided into ten groups
Coefficient of Variation
allows one to compare standard deviations when the units are different; standard deviation divided by the mean expressed as a percentage
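A brief sketch of comparing variation across different units. The function name and both data sets are invented for illustration, and the sample standard deviation is assumed.

```python
import statistics


def coefficient_of_variation(sample):
    """Sample standard deviation divided by the mean, as a percentage."""
    return statistics.stdev(sample) / statistics.mean(sample) * 100


heights_cm = [160, 170, 180]  # made-up data, standard deviation 10 cm
weights_kg = [60, 70, 80]     # made-up data, standard deviation 10 kg

# Same standard deviation, but the weights vary more relative to their mean.
print(round(coefficient_of_variation(heights_cm), 2))  # about 5.88
print(round(coefficient_of_variation(weights_kg), 2))  # about 14.29
```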
percentage
an absolute measure of the part to the total
range rule of thumb
an approximation of the standard deviation; the range is divided by 4
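A quick sketch of the rule of thumb next to the computed sample standard deviation. The data are made up, and how close the two numbers land depends on the data set.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 10]  # made-up sample

estimate = (max(data) - min(data)) / 4  # range / 4 = 8 / 4 = 2.0
actual = statistics.stdev(data)         # sample standard deviation

print(estimate, round(actual, 2))
```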
Five Number Summary
Minimum value, lower hinge, median, upper hinge, and maximum.
Empirical Rule / 68-95-99.7
Only valid when a distribution is bell-shaped (normal). Approximately 68% lies within 1 standard deviation of the mean; 95% within 2 standard deviations; and 99.7% within 3 standard deviations of the mean.
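A minimal sketch that turns the rule into intervals. The function name and the mean of 100 with standard deviation 15 are assumed values for illustration.

```python
def empirical_rule_intervals(mean, sd):
    """Intervals holding roughly 68%, 95%, and 99.7% of bell-shaped data."""
    return {
        "68%": (mean - sd, mean + sd),
        "95%": (mean - 2 * sd, mean + 2 * sd),
        "99.7%": (mean - 3 * sd, mean + 3 * sd),
    }


intervals = empirical_rule_intervals(100, 15)
for share, (low, high) in intervals.items():
    print(share, "of values lie between", low, "and", high)
```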
Greek letters
denote parameters (population)
Roman letters / Arabic letters
denote statistics (samples)
measures of variation
describes the spread of the data by showing how different the values are from one another; includes range, standard deviation, variance, and IQR
Standard Deviation
The square root of the variance. The population ________ _________ is the square root of the population variance, and the sample ________ _________ is the square root of the sample variance. The sample ________ _________ is not an unbiased estimator of the population standard deviation. The units of the ________ _________ are the same as the units of the population/sample.
population, sample
The __________ standard deviation is a parameter, whereas the ______ standard deviation is a statistic.
measures of central tendency
lets us know what is normal or "average" for a set of data by condensing the data set down to one representative value; includes mean, median, mode, and midrange
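The four measures named above can be computed side by side. This is a sketch on made-up data using Python's standard statistics module (multimode needs Python 3.8 or later).

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # made-up data

mean = statistics.mean(data)            # 30 / 6 = 5
median = statistics.median(data)        # middle two values are 3 and 5, so 4.0
modes = statistics.multimode(data)      # [3], the only repeated value
midrange = (max(data) + min(data)) / 2  # (10 + 2) / 2 = 6.0

print(mean, median, modes, midrange)
```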
negatively skewed distribution / left-skewed distribution
the majority of the data values fall to the right of the mean and cluster at the upper end of the distribution; the "tail" is to the left
multimodal
more than two modes
no mode
no data value is repeated
unimodal
one mode
symmetric distribution
the data values are evenly distributed on both sides of the mean; in a _________ ___________, the mean equals the median.
positively skewed distribution / right-skewed distribution
the majority of data values fall to the left of the mean and cluster at the lower end of the distribution; the "tail" is to the right
percentile rank / percentile
the percentage of data values that fall below a specific value
bimodal
two modes
measures of position
used to locate the relative position of a data value in the data set; includes z-scores, percentiles, deciles, and quartiles
sample mean
x̅
population mean
μ
z-score / standard score
Tells how many standard deviations a data value is above or below the mean for a specific distribution of values. Obtained by subtracting the mean and dividing by the standard deviation. When all values are transformed to their ________ _____s, the new mean (for Z) will be zero and the standard deviation will be one. Lets you compare apples to oranges.
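A short sketch of the transformation, checking that the transformed values have mean zero and standard deviation one. The data are made up and are treated as a whole population (an assumption), so the population standard deviation is used.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data, treated as a population

mu = statistics.mean(data)       # 5
sigma = statistics.pstdev(data)  # 2.0

# Subtract the mean, then divide by the standard deviation.
z_scores = [(x - mu) / sigma for x in data]
print(z_scores)

# After the transformation the mean is 0 and the standard deviation is 1.
print(statistics.mean(z_scores), statistics.pstdev(z_scores))
```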
Chebyshev's Theorem
At least 1-(1/k^2), when k>1; states the minimum proportion of data values that fall within k standard deviations of the mean. It can be applied to any distribution regardless of its shape.
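A minimal sketch of the bound for a few values of k. The function name is an illustrative assumption.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


print(chebyshev_min_proportion(2))            # 1 - 1/4 = 0.75
print(round(chebyshev_min_proportion(3), 3))  # 1 - 1/9, about 0.889
```

For bell-shaped data the actual proportions are much higher (about 95% within 2 standard deviations); the theorem only gives a floor that holds for any shape.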
decile median
D-sub-5 (50th percentile)
Population Variance
The average of the squares of the distances from the population mean. It is the sum of the squares of the deviations from the mean divided by the population size. The units on the variance are the units of the population squared.
Interquartile Range / IQR
The difference between the 3rd and 1st Quartiles.
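A sketch of the IQR using statistics.quantiles from the Python standard library (Python 3.8 or later). The data are made up, and the library's default "exclusive" method is assumed; other quartile conventions can give slightly different values.

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13]  # made-up data

# n=4 returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, q3, iqr)  # 3.0 11.0 8.0 with the default method
```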
Range
The difference between the highest and lowest values. Max - Min
Midrange
The mean of the highest and lowest values. (Max + Min) / 2
Weighted Mean
The mean when each value is multiplied by its weight and summed. This sum is divided by the total of the weights.
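A brief sketch with made-up course grades weighted by credit hours. The function and variable names are illustrative.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the total of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


grades = [90, 80, 70]  # made-up grades
credits = [3, 2, 1]    # made-up weights

# (90*3 + 80*2 + 70*1) / (3 + 2 + 1) = 500 / 6, about 83.33
print(round(weighted_mean(grades, credits), 2))
```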
Lower Hinge / Lower Quartile
The median of the lower half of the numbers (up to and including the median). The _____ _____ is the first Quartile unless the remainder when dividing the sample size by four is 3.
Upper Hinge / Upper Quartile
The median of the upper half of the numbers (from the median upward, including the median). The _____ _____ is the 3rd Quartile unless the remainder when dividing the sample size by four is 3.
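A sketch of finding both hinges and assembling them into a five-number summary. The function name and data are assumptions for illustration. When the number of values is even, the median falls between two values, so each half here simply takes half of the data.

```python
import statistics


def hinges(data):
    """Return (lower hinge, upper hinge) as medians of the two halves."""
    s = sorted(data)
    n = len(s)
    lower_half = s[: (n + 1) // 2]  # includes the median when n is odd
    upper_half = s[n // 2 :]        # includes the median when n is odd
    return statistics.median(lower_half), statistics.median(upper_half)


data = [1, 3, 5, 7, 9, 11, 13]  # made-up data, median is 7
lower, upper = hinges(data)

five_number_summary = (min(data), lower, statistics.median(data), upper, max(data))
print(five_number_summary)  # (1, 4.0, 7, 10.0, 13)
```

Here the sample size is 7, which leaves a remainder of 3 when divided by four, the case in which the hinges need not match the quartiles.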
Median
The midpoint of the data after being ranked (sorted in ascending order). There are as many numbers below the ______ as above the ______.
Mode
The most frequent number