STAT Chapter 3: Averages and Variation
Five-number summary
Minimum, Q1, Median, Q3, Maximum
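As a rough illustration, the five-number summary can be computed in a few lines of Python. This sketch assumes the median-of-halves convention for quartiles: Q1 is the median of the lower half and Q3 is the median of the upper half, with the overall median left out of both halves when n is odd. Other textbooks and software use different quartile conventions and may give slightly different values. The function name five_number_summary is made up for this example.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    lower = xs[:half]              # values below the median position
    upper = xs[half + (n % 2):]    # values above the median position
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])

# Made-up example data
summary = five_number_summary([2, 4, 5, 7, 8, 10, 12, 15, 18])
```

For the nine made-up values above, the lower half is 2, 4, 5, 7 and the upper half is 10, 12, 15, 18, so the summary is 2, 4.5, 8, 13.5, 18.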
Population size
"N" The number of individuals in a population.
Sample size
"n" The number of individuals in the sample.
Population variance
(Average of the squared distances from each data point in the population to the population mean) sigma^2 = Σ(x - mu)^2 / N
Sample variance
(Sum of the squared differences from the sample mean, divided by n - 1) s^2 = Σ(x - x-bar)^2 / (n - 1)
Mean
(Sum of all the entries)/(Number of entries)
Population standard deviation
(The square root of the population variance) sigma = sqrt(Σ(x - mu)^2 / N)
Sample standard deviation
(The square root of the sample variance) s = sqrt(Σ(x - x-bar)^2 / (n - 1))
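A small worked sketch may help keep the population and sample versions apart. Both start from the sum of squared deviations from the mean; the population variance divides by N, while the sample variance divides by n - 1. The data values and variable names here are made up for illustration.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up example values
n = len(data)
mean = sum(data) / n              # 40 / 8 = 5

# Sum of squared deviations from the mean
ss = sum((x - mean) ** 2 for x in data)

pop_var = ss / n                  # population variance: divide by N
samp_var = ss / (n - 1)           # sample variance: divide by n - 1
pop_sd = math.sqrt(pop_var)       # population standard deviation
samp_sd = math.sqrt(samp_var)     # sample standard deviation
```

Here the sum of squares is 32, so the population variance is 32/8 = 4 and the population standard deviation is 2, while the sample variance is 32/7 (about 4.57). Python's standard statistics module offers pvariance, variance, pstdev, and stdev for the same four quantities.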
Outlier
A data value that is much greater or much less than the others in a data set.
Quartile
A division of the total data into four intervals, each one representing one-fourth of the data. Q1 is the 25th percentile; Q2 is the median (50th percentile); Q3 is the 75th percentile.
Resistant measure
A summary number that is not strongly affected by outliers. Ex) The median and the IQR
Weighted Average
An average in which each data value is multiplied by a weight that reflects its importance. x is the data value; w is the weight assigned to that data value. Weighted average = Σ(xw) / Σw
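As a quick sketch of the weighted average, each value is multiplied by its weight, the products are summed, and the total is divided by the sum of the weights. The function name and the scores and weights below are made up for illustration.

```python
def weighted_average(values, weights):
    """Return sum(x * w) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight

# Made-up example: three scores weighted 2 : 3 : 5
scores = [80, 90, 70]
weights = [2, 3, 5]
grade = weighted_average(scores, weights)   # (160 + 270 + 350) / 10
```

With these numbers the weighted average is 78, lower than the ordinary mean of 80 because the lowest score carries the largest weight.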
Percentile
One of the 99 values that divide an ordered data set into 100 equal groups. The Pth percentile is a value such that P% of the data fall at or below it and (100 - P)% fall at or above it. Ex) A data point at the 70th percentile is at or above 70% of the data, and at or below 30% of the data.
Results of Chebyshev's theorem
For any data set: 1) at least 75% of the data falls within 2 SD of the mean; 2) at least 88.9% of the data falls within 3 SD of the mean; 3) at least 93.8% of the data falls within 4 SD of the mean.
Chebyshev's theorem
For any set of observations (sample or population), the proportion of the values that lie within k standard deviations of the mean is at least 1-(1/k^2), where k is any constant greater than 1.
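The bound 1-(1/k^2) is easy to evaluate directly. This minimal sketch (the function name is made up) computes the guaranteed minimum proportion for a few values of k.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

# Guaranteed minimum proportions for k = 2, 3, 4
bounds = {k: chebyshev_lower_bound(k) for k in (2, 3, 4)}
```

For k = 2 the bound is 0.75, for k = 3 it is 8/9 (about 88.9%), and for k = 4 it is 0.9375 (about 93.8%). These are minimums that hold for any distribution; a particular data set may have far more of its values inside the interval.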
Position of Middle value
For an ordered data set of size n, the middle value is found at position (n+1)/2. When n is even, this position falls halfway between the two middle values.
Population mean
For a population of size N: mu = Σx / N
Sample mean
For a sample of size n: x-bar = Σx / n
Whisker
Lines that extend from one end of the "box" to the smallest data value and from the other end of the "box" to the largest data value.
Sum of Squares
Measures how far individual measurements are from the mean. Defining formula: Σ(x - x-bar)^2. Equivalent computation formula: Σx^2 - (Σx)^2 / n
Average
One number that is used to describe the entire sample or population. Three major ones are mean, median, and mode.
Median
The central value of an ordered distribution. Odd # of data values: the middle value of the data set. Even # of data values: (Sum of middle two numbers)/2
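The odd and even cases can be sketched in Python as follows. The function name median_by_position is made up; it uses the 1-based middle position (n+1)/2 for an odd number of values and averages the two middle values otherwise.

```python
def median_by_position(data):
    """Median of a data set, found from the middle position of the sorted values."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")
    if n % 2 == 1:
        position = (n + 1) // 2        # 1-based position of the middle value
        return xs[position - 1]        # convert to 0-based index
    lower_middle = xs[n // 2 - 1]      # the two values around position (n+1)/2
    upper_middle = xs[n // 2]
    return (lower_middle + upper_middle) / 2

odd_case = median_by_position([7, 1, 5])       # sorted: 1, 5, 7
even_case = median_by_position([4, 1, 3, 2])   # sorted: 1, 2, 3, 4
```

The odd case returns 5 (position 2 of 3), and the even case returns 2.5, the average of 2 and 3.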
Range
The difference between the largest and smallest values of a data distribution.
Interquartile range
The difference between the upper and lower quartiles. IQR = Q3-Q1
Trimmed mean
The mean of the data values after trimming off a certain percentage of the smallest and largest data values from the original set.
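A minimal sketch of a trimmed mean is shown here. It assumes the number of values dropped from each end is the trim percentage of n with any fraction discarded; some textbooks round to the nearest whole number instead, so treat that choice as an assumption. The function name and data are made up.

```python
def trimmed_mean(data, percent):
    """Mean after dropping `percent`% of the values from each end of the sorted data."""
    xs = sorted(data)
    k = int(len(xs) * percent / 100)   # values to drop from each end (fraction discarded)
    trimmed = xs[k:len(xs) - k]
    if not trimmed:
        raise ValueError("trimming removed every data value")
    return sum(trimmed) / len(trimmed)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # made-up values with one large outlier
plain_mean = sum(data) / len(data)        # pulled upward by the outlier
ten_percent = trimmed_mean(data, 10)      # drops 1 and 100
```

Here the ordinary mean is 14.5, while the 10% trimmed mean is 5.5, which shows how trimming reduces the influence of the outlier.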
Coefficient of Variation
The standard deviation expressed as a percentage of the sample or population mean. Sample CV = (s/x-bar) * 100; Population CV = (sigma/mu) * 100
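As an illustration, the sample CV can be computed with Python's standard statistics module, where stdev is the sample standard deviation. The function name sample_cv and the data are made up for this sketch.

```python
from statistics import mean, stdev

def sample_cv(data):
    """Sample coefficient of variation: (s / x-bar) * 100."""
    return stdev(data) / mean(data) * 100

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up example values
cv = sample_cv(data)              # s = sqrt(32/7), x-bar = 5
```

For these values the CV is roughly 42.8%. Because the CV has no units, it can be used to compare the spread of data sets measured on different scales.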
Mode
The value in a data set that occurs most frequently.
Box-and-whisker plot
An exploratory data analysis (EDA) graph that displays the five-number summary.
Standard deviation of grouped data
When data is grouped (ex. in a frequency table or histogram), we can estimate the SD. s = sqrt(Σ(x - x-bar)^2 f / (n - 1)), where x is the class midpoint, f is the class frequency, and n = Σf
Mean of grouped data
When data is grouped (ex. in a frequency table or histogram), we can estimate the mean. x-bar = Σ(xf) / n, where x is the class midpoint, f is the class frequency, and n = Σf
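The grouped-data estimates can be sketched as follows, assuming each class is represented by its midpoint x with frequency f, and treating the data as a sample (dividing by n - 1 for the SD). The midpoints and frequencies are made up for illustration.

```python
import math

midpoints = [5, 15, 25, 35]    # class midpoints x (made-up)
frequencies = [2, 5, 8, 5]     # class frequencies f (made-up)

n = sum(frequencies)           # total number of data values

# Estimated mean: sum of (midpoint * frequency) divided by n
grouped_mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n

# Estimated sample SD: frequency-weighted squared deviations, divided by n - 1
ss = sum((x - grouped_mean) ** 2 * f for x, f in zip(midpoints, frequencies))
grouped_sd = math.sqrt(ss / (n - 1))
```

With these numbers n = 20, the estimated mean is 460/20 = 23, and the estimated SD is sqrt(1720/19), about 9.5. These are estimates only, since the midpoints stand in for the individual values inside each class.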