Chapter 4
Fractile
Measures that divide a data set into two or more equal parts. The median is one example because it splits the data into two equal parts. Other fractiles include quartiles, deciles, and percentiles, which split the data into 4, 10, and 100 parts, respectively.
Standard Deviation
A measure of how spread out the numbers are. Its symbol is σ (the Greek letter sigma). The formula is simple: it is the square root of the variance.
Population Variance
A measure of variability for the average squared distance that scores in a population deviate from the mean. It is computed only when all scores in a given population are recorded. Formula: σ^2 = SS / N
Sample Variance
A measure of variability for the average squared distance that scores in a sample deviate from the mean. It is computed when only a portion or sample of data is measured in a population. Formula: s^2 = SS / (n - 1)
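The two variance formulas differ only in the divisor. A minimal Python sketch of that difference follows; the function names and the scores are illustrative assumptions, not part of the source material.

```python
import math

def sum_of_squares(scores):
    """SS: the sum of squared deviations of the scores from their mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

def population_variance(scores):
    """sigma^2 = SS / N, used when every score in the population is recorded."""
    return sum_of_squares(scores) / len(scores)

def sample_variance(scores):
    """s^2 = SS / (n - 1), used when the scores are only a sample."""
    return sum_of_squares(scores) / (len(scores) - 1)

scores = [1, 2, 3, 4, 5]  # illustrative data, not from the source

print(sum_of_squares(scores))       # 10.0
print(population_variance(scores))  # 2.0
print(sample_variance(scores))      # 2.5

# The standard deviation is the square root of the matching variance.
print(math.sqrt(population_variance(scores)))
print(math.sqrt(sample_variance(scores)))
```

For the same scores, dividing by n - 1 always gives a slightly larger value than dividing by N.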
Variability
A way to measure the dispersion or spread of scores around the mean. By definition, the variability of scores can never be negative; it ranges from 0 to positive infinity. Researchers measure variability to determine how dispersed scores are in a set of data. Measures of variability include the range, variance, and standard deviation. For example: If four students receive the same scores of 8, 8, 8, and 8 on the same assessment, then their variability is 0 because the scores are all the same value. But if the scores are 8, 8, 8, and 9, then they do vary because at least one of the scores differs from the others. Thus, scores can either not vary (variability is 0) or vary (variability is greater than 0).
Unbiased Estimator
Any sample statistic, such as the sample variance when we divide SS by n - 1, obtained from a randomly selected sample that equals the value of its respective population parameter, such as the population variance, on average. The sample variance is unbiased if: s^2 = SS / (n - 1), then s^2 = σ^2 on average.
Biased Estimator
Any sample statistic, such as the sample variance when we divide SS by n, obtained from a randomly selected sample that does not equal the value of its respective population parameter, such as the population variance, on average. The sample variance is biased if: s^2 = SS / n, then s^2 < σ^2 on average.
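The phrase "on average" can be checked exactly on a tiny case. The sketch below assumes a hypothetical population of three scores and lists every possible sample of size 2 drawn with replacement (each treated as equally likely), then averages the two versions of the sample variance. The data, names, and sampling scheme are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical population, not from the source
n = 2                   # sample size

def ss(scores):
    """Sum of squared deviations from the mean, kept as an exact fraction."""
    mean = Fraction(sum(scores), len(scores))
    return sum((x - mean) ** 2 for x in scores)

# Population variance: SS / N
sigma_squared = ss(population) / len(population)

# Every ordered sample of size n drawn with replacement (9 samples here).
samples = list(product(population, repeat=n))

# Average of s^2 over all samples, once with each divisor.
mean_unbiased = sum(ss(s) / (n - 1) for s in samples) / len(samples)
mean_biased = sum(ss(s) / n for s in samples) / len(samples)

print(sigma_squared)  # 2/3
print(mean_unbiased)  # 2/3, equals the population variance on average
print(mean_biased)    # 1/3, underestimates the population variance on average
```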
Sum of Squares (SS)
The sum of the squared deviations of scores from their mean. The SS is the numerator in the variance formula.
Quartile
Splits the data into four equal parts. In terms of percentiles, the four quartiles are the 25th percentile (Q1), the 50th percentile (Q2), the 75th percentile (Q3), and the 100th percentile (Q4) of a distribution. Thus the quartiles split the data into four equal parts, each containing 25% of the data. They essentially mark percentile boundaries in a distribution. For example, if you know your college GPA is above the median quartile, then you know that your GPA is higher than that of more than 50% of your peers. To locate each quartile, follow three steps: 1. Locate the median for all data. This is the median quartile (Q2). 2. Locate the median for scores below Q2. This is the lower quartile (Q1). 3. Locate the median for scores above Q2. This is the upper quartile (Q3).
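The three steps can be sketched in Python as follows. This is one reading of the steps, with an assumption worth stating: "scores below Q2" and "scores above Q2" are taken by position in the sorted list, so the middle score is left out of both halves when the number of scores is odd. Other conventions exist and can give slightly different quartiles. The function name and data are illustrative.

```python
from statistics import median

def quartiles(scores):
    """Return (Q1, Q2, Q3) using the median-of-halves steps."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to split into halves")
    ordered = sorted(scores)
    half = len(ordered) // 2
    q2 = median(ordered)                         # Step 1: median of all data
    q1 = median(ordered[:half])                  # Step 2: median of scores below Q2
    q3 = median(ordered[len(ordered) - half:])   # Step 3: median of scores above Q2
    return q1, q2, q3

print(quartiles([2, 4, 6, 8, 10, 12, 14]))   # (4, 8, 12)
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))   # (2.5, 4.5, 6.5)
```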
Empirical Rule
States that for data that are normally distributed, approximately 99.7% of data lie within three standard deviations of the mean, approximately 95% of data lie within two standard deviations of the mean, and approximately 68% of data lie within one standard deviation of the mean.
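The 68, 95, and 99.7 figures are rounded values for an exactly normal distribution. A short sketch of where they come from, using the standard relationship between the normal distribution and the error function (`math.erf` in Python's standard library); the function name `proportion_within` is an illustrative assumption.

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution lying within k standard
    deviations of the mean: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints roughly 68.3, 95.4, and 99.7 percent for k = 1, 2, 3.
    print(k, round(proportion_within(k) * 100, 1))
```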
Variance
The average of the squared differences from the mean. To calculate the variance, follow these steps: 1. Work out the mean (the simple average of the numbers). 2. For each number, subtract the mean and square the result (the squared difference). 3. Work out the average of those squared differences.
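As a worked illustration of the three steps, take the hypothetical scores 2, 4, 6, and 8 (chosen here for easy arithmetic, not taken from the source). Averaging over all four squared differences treats the scores as a whole population; for a sample, the last step would divide by n - 1 instead.

```latex
\begin{align*}
\text{Step 1:}\quad \mu &= \frac{2 + 4 + 6 + 8}{4} = 5 \\
\text{Step 2:}\quad (x - \mu)^2 &:\; (2-5)^2 = 9,\; (4-5)^2 = 1,\; (6-5)^2 = 1,\; (8-5)^2 = 9 \\
\text{Step 3:}\quad \sigma^2 &= \frac{9 + 1 + 1 + 9}{4} = \frac{20}{4} = 5
\end{align*}
```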
Deviation
The difference of each score from its mean: x - μ
Lower Quartile
The median value of the lower half of a data set at the 25th percentile of a distribution.
Median Quartile
The median value of a data set, at the 50th percentile of a distribution.
Upper Quartile
The median value of the upper half of a data set, at the 75th percentile of a distribution.
Interquartile Range (IQR)
The range of scores in a distribution between Q1 and Q3. Hence, the IQR is the range of scores in a distribution minus the top and bottom 25% of scores. Because the IQR excludes the top and bottom 25% of scores in a distribution, outliers have little influence over this value. To compute an IQR, we subtract the lower quartile (Q1) from the upper quartile (Q3): IQR = Q3 - Q1
Range
The simplest way to describe how dispersed scores are in a distribution. It is the difference between the largest value (L) and the smallest value (S) in a data set: Range = L - S. It is most informative for data sets without outliers. For example: Suppose you have a set of scores: 1, 2, 3, 4, 5. The range of the data is 5 - 1 = 4. Now suppose you have a set of scores: 2, 4, 8, 8, 100. The range is 100 - 2 = 98. This range is misleading because only one value is greater than 8.
Semi-interquartile Range
Also called the quartile deviation, it is a measure of half the distance between the upper and lower quartiles of a distribution: SIQR = (Q3 - Q1) / 2. You can think of the SIQR as half the IQR, with smaller SIQR values indicating less spread or variability of scores in a data set. While the SIQR is a good estimate of variability, it is also limited in that its estimate excludes half the scores in a distribution.
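A short Python sketch comparing the range, the IQR, and the SIQR on data with one outlier. It assumes the median-of-halves method for locating quartiles, splitting the sorted scores by position; the helper names and the scores are hypothetical.

```python
from statistics import median

def quartiles(scores):
    """Return (Q1, Q2, Q3) by taking medians of the lower and upper halves."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to split into halves")
    ordered = sorted(scores)
    half = len(ordered) // 2
    return (median(ordered[:half]),
            median(ordered),
            median(ordered[len(ordered) - half:]))

def iqr(scores):
    """Interquartile range: Q3 - Q1."""
    q1, _, q3 = quartiles(scores)
    return q3 - q1

def siqr(scores):
    """Semi-interquartile range: half the IQR."""
    return iqr(scores) / 2

scores = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # hypothetical data with one outlier

print(max(scores) - min(scores))  # range: 99, dominated by the outlier
print(iqr(scores))                # 5.0
print(siqr(scores))               # 2.5
```

Replacing the outlier 100 with 9 changes the range from 99 to 8 but leaves the IQR and SIQR unchanged, which is the sense in which outliers have little influence on them.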