Statistics Ch 3
z-Score
Indicates how many standard deviations a particular data value is from the mean. If the z-score is positive, the data value is above the mean, and if the z-score is negative, the data value is below the mean.
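A minimal Python sketch of the z-score, z = (x - mean) / s. It assumes the data are a sample, so it divides by the sample standard deviation (n - 1 denominator); population data would use statistics.pstdev instead. The helper name z_score is illustrative.

```python
import statistics

def z_score(x: float, data: list[float]) -> float:
    """How many standard deviations x lies above (+) or below (-) the mean of data."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # assumption: data are a sample (n - 1 denominator)
    return (x - mean) / s

data = [4, 4, 6, 8, 8]        # mean 6, sample standard deviation 2
print(z_score(8, data))       # 1.0: one standard deviation above the mean
print(z_score(4, data))       # -1.0: one standard deviation below the mean
```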
Mode
The value that occurs most often in a set of data.
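A short Python sketch that finds the mode by counting how often each value occurs. It returns a list because a data set can have more than one mode (bimodal, multimodal). The helper name modes is illustrative; the standard library's statistics.multimode does the same job.

```python
from collections import Counter

def modes(data: list) -> list:
    """All values tied for the highest count, in order of first appearance."""
    counts = Counter(data)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(modes([1, 2, 2, 3, 3, 3]))  # [3]
print(modes([1, 1, 2, 2, 3]))     # [1, 2]  (bimodal)
```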
Skewness Effect - Left-Skewed Distribution
Typically, Mean < Median < Mode
Skewness Effect - Symmetric Unimodal Distribution
Mean = Median = Mode. Unimodal means the distribution has ONE MODE. The NORMAL DISTRIBUTION is the best-known example of a symmetric unimodal distribution.
Skewness Effect - Right-Skewed Distribution
Typically, Mean > Median > Mode
Chebyshev's Rule
The proportion (or fraction) of any set of data lying within k standard deviations of the mean is always at least 1 - 1/k^2, where k is any number greater than 1.
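A sketch that evaluates Chebyshev's bound and compares it with the fraction actually observed in a data set. For k = 2 the bound is 1 - 1/4 = 0.75, so at least 75% of any data set lies within two standard deviations of the mean; for k = 3 it is about 89%. The helper names are illustrative, and measuring spread with the sample standard deviation is an assumption (the bound still holds, because the sample standard deviation is never smaller than the population one).

```python
import statistics

def chebyshev_bound(k: float) -> float:
    """Chebyshev's minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is stated for k > 1")
    return 1 - 1 / k**2

def fraction_within(data: list[float], k: float) -> float:
    """Observed fraction of data within k sample standard deviations of the mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # assumption: sample standard deviation
    return sum(abs(x - mean) <= k * s for x in data) / len(data)

data = [2, 4, 4, 4, 5, 5, 7, 9, 30]  # made-up values with one extreme point
for k in (2, 3):
    print(k, chebyshev_bound(k), fraction_within(data, k))
```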
Mean
This "measure of center" is the AVERAGE of the values in a data set. (Mean is sensitive to extreme values.)
The Empirical Rule
This says that, for data with a bell-shaped (normal) distribution, approximately 68% of the data fall within one standard deviation of the mean, 95% within two, and 99.7% within three.
Percentile Calculation Formula
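i = (P/100)n, where P is the desired percentile, n is the number of observations, and i is the position of the Pth percentile in the data sorted in ascending order. If i is not an integer, round up to the next position; if i is an integer, the Pth percentile is the average of the values in positions i and i + 1.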
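A minimal Python sketch of the i = (P/100)n rule. It assumes the common textbook convention: sort the data; if i is not a whole number, round it up and take the value in that position; if it is, average the values in positions i and i + 1. Positions in the formula are 1-based, so the code subtracts 1 when indexing. Software packages (for example statistics.quantiles) use other interpolation rules and can give slightly different answers. The helper name percentile is illustrative.

```python
import math

def percentile(data: list[float], p: float) -> float:
    """Pth percentile via the position index i = (P/100)n.

    Assumed convention (1-based positions in the sorted data):
    - i not a whole number: round up and take the value at that position;
    - i a whole number: average the values at positions i and i + 1.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)
    n = len(values)
    i = p * n / 100
    if i.is_integer():
        i = int(i)
        return (values[i - 1] + values[i]) / 2  # positions i and i + 1
    return values[math.ceil(i) - 1]             # next whole position

data = list(range(1, 11))   # the values 1 to 10
print(percentile(data, 25))  # i = 2.5, round up to position 3 -> 3
print(percentile(data, 50))  # i = 5, average positions 5 and 6 -> 5.5
```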
Conditional Probability
...
Independent Events
...
Intersection
...
Mutually Exclusive
...
Union
...
The Empirical Rule in terms of z-Scores
For bell-shaped data, approximately 68% of the data will have z-scores between -1 and 1, 95% between -2 and 2, and 99.7% between -3 and 3.
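A small Python check of where the 68%, 95% and 99.7% figures come from: for a standard normal distribution it computes the share of z-scores between -k and k with the standard library's statistics.NormalDist. The helper name share_within is illustrative.

```python
from statistics import NormalDist

def share_within(k: float) -> float:
    """Share of a normal distribution whose z-scores lie between -k and k."""
    z = NormalDist(mu=0, sigma=1)  # standard normal distribution
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"z between -{k} and {k}: {share_within(k):.4f}")
```

Running it prints 0.6827, 0.9545 and 0.9973, which the rule rounds to 68%, 95% and 99.7%.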
Standard Deviation
A common measure of the variability, or spread, of a data set. Roughly, it is the typical distance of data values from the mean; it equals the square root of the variance.
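A sketch of the sample standard deviation, s = sqrt(sum of (x - mean)^2 / (n - 1)). Dividing by n - 1 treats the data as a sample; dividing by n instead gives the population standard deviation. Which one is intended is not stated here, so the sample version is an assumption; the helper name sample_std is illustrative.

```python
import math

def sample_std(data: list[float]) -> float:
    """Sample standard deviation s = sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return math.sqrt(squared_deviations / (n - 1))  # divide by n for a population

print(sample_std([4, 4, 6, 8, 8]))  # 2.0
```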
Detecting Outliers - IQR Method
A data value is an outlier if a. it is located 1.5(IQR) or more below Q1, or b. it is located 1.5(IQR) or more above Q3.
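A sketch of the IQR method in Python. It gets Q1 and Q3 from the standard library's statistics.quantiles (default 'exclusive' method), which interpolates and can differ slightly from hand-computed quartiles. The helper name iqr_outliers is illustrative.

```python
import statistics

def iqr_outliers(data: list[float]) -> list[float]:
    """Values 1.5 * IQR or more below Q1, or 1.5 * IQR or more above Q3."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default method='exclusive'
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x <= lower or x >= upper]

print(iqr_outliers([10, 12, 12, 13, 14, 15, 16, 40]))  # [40]
```

For these made-up values Q1 = 12, Q3 = 15.75 and IQR = 3.75, so the cutoffs are 6.375 and 21.375 and only 40 is flagged.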
Boxplot
A graphic display that represents the distribution of data by focusing on five key measures: Min, Q1, Q2, Q3, Max.
Median
A measure of center in a set of numerical data. The median of a list of values is the value appearing at the center of a sorted version of the list - or the mean of the two central values if the list contains an even number of values. (Median is NOT sensitive to extreme values)
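A Python sketch of the median rule (sort, then take the middle value or the mean of the two middle values), placed next to the mean to show how a single extreme value affects each. The data values are made up for illustration, and the helper names are illustrative.

```python
def mean(data: list[float]) -> float:
    return sum(data) / len(data)

def median(data: list[float]) -> float:
    values = sorted(data)
    mid = len(values) // 2
    if len(values) % 2 == 1:
        return values[mid]                       # odd count: the middle value
    return (values[mid - 1] + values[mid]) / 2   # even count: mean of the two middle values

data = [42, 45, 47, 50, 52]      # made-up values
with_extreme = data + [400]      # add one extreme value
print(mean(data), median(data))                  # 47.2 47
print(mean(with_extreme), median(with_extreme))  # 106.0 48.5
```

One extreme value moves the mean from 47.2 to 106 but the median only from 47 to 48.5.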
Interquartile Range (IQR)
A robust measure of variability, calculated as IQR = Q3 - Q1. It is interpreted as the spread of the middle 50% of the data, and it is NOT affected by outliers, since it ignores the highest 25% and the lowest 25% of the data set.
Five-Number Summary
An exploratory data analysis technique that uses five numbers to summarize the data: 1. smallest value, 2. first quartile, 3. median (second quartile), 4. third quartile, and 5. largest value.
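A sketch that builds the five-number summary with statistics.quantiles. Because that function interpolates (default 'exclusive' method), its quartiles can differ from hand calculation: for the values 1 to 10 it returns Q1 = 2.75 and Q3 = 8.25, whereas the round-up textbook convention for i = (P/100)n gives Q1 = 3 and Q3 = 8. The helper name five_number_summary is illustrative.

```python
import statistics

def five_number_summary(data) -> tuple:
    """(smallest, Q1, median, Q3, largest); quartiles from statistics.quantiles."""
    q1, q2, q3 = statistics.quantiles(data, n=4)  # default method='exclusive'
    return min(data), q1, q2, q3, max(data)

print(five_number_summary(range(1, 11)))  # (1, 2.75, 5.5, 8.25, 10)
```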
Outlier
An extremely large or extremely small data value relative to the rest of the data set.
Detecting Outliers - Z-score Method
Identify an outlier by determining if it is farther than 3 standard deviations from the mean, i.e., its z-score is less than -3 or greater than 3.
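A sketch of the z-score method: compute each value's z-score from the mean and standard deviation and flag those with |z| > 3. The helper name z_outliers and the use of the sample standard deviation are assumptions.

```python
import statistics

def z_outliers(data: list[float], threshold: float = 3.0) -> list[float]:
    """Values whose z-score is below -threshold or above +threshold."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # assumption: sample standard deviation
    return [x for x in data if abs(x - mean) / s > threshold]

data = [10] * 19 + [100]  # made-up values: 19 typical values and one extreme one
print(z_outliers(data))    # [100]
```

One caution: with the sample standard deviation, no value in a data set of 10 or fewer observations can have |z| > 3, so this rule only makes sense for larger data sets.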
Percentile Rank
Percentage of scores falling at or below a specific score. A percentile rank of 95 means that 95% of all of the scores fall at or below this point. In other words, the score is as good as or better than 95% of the scores.
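A sketch of percentile rank as defined here: the percentage of values at or below a given score. The helper name percentile_rank is illustrative.

```python
def percentile_rank(data: list[float], score: float) -> float:
    """Percentage of values in data that are at or below score."""
    at_or_below = sum(1 for x in data if x <= score)
    return 100 * at_or_below / len(data)

scores = list(range(1, 21))         # 20 made-up scores
print(percentile_rank(scores, 19))  # 95.0
```

A score of 19 has percentile rank 95 because 19 of the 20 scores are at or below it.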
Quartiles
The 25th, 50th, and 75th percentiles, referred to as the first quartile, the second quartile (median), and third quartile, respectively. The quartiles can be used to divide a data set into four parts, with each part containing approximately 25% of the data.
Deviation
The difference between a data value and the mean of the data set. (Deviation = x - mean, a signed distance from the mean.) If data value x > mean, the deviation is positive. If x < mean, the deviation is negative. If x = mean, the deviation is zero.
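A short sketch computing each deviation x - mean. A useful check: deviations from the mean always sum to zero (up to floating-point rounding), which is one reason the standard deviation squares them before averaging. The helper name deviations is illustrative.

```python
def deviations(data: list[float]) -> list[float]:
    """Signed deviation of each value from the mean: x - mean."""
    mean = sum(data) / len(data)
    return [x - mean for x in data]

devs = deviations([4, 4, 6, 8, 8])  # mean is 6
print(devs)       # [-2.0, -2.0, 0.0, 2.0, 2.0]
print(sum(devs))  # 0.0
```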
Range
The difference between the largest value and smallest value of a data set. (Range = Largest Value - Smallest Value) (A larger range is an indication of greater VARIABILITY, or greater spread, in the data set)
Percentile
The location of a data value relative to other values in the data set; e.g., a score at the 90th percentile means that 90% of all scores are at or below that score and 10% are above it.
Boxplot Upper and Lower Fences
Lower Fence = Q1 - 1.5(IQR); Upper Fence = Q3 + 1.5(IQR)
What does the length of the box in the boxplot represent?
The interquartile range (IQR = Q3 - Q1).