Statistics Chapter 3: Numerical Descriptions of Data
Properties of the Mode
1. A data set may not have a mode
2. A data set may have one mode or more than one mode
3. If a mode exists for a data set, the mode is a value in the data set
4. Not affected by outliers in the data set
5. The only measure of center appropriate for qualitative data
Creating a Box Plot
1. Begin with a horizontal (or vertical) number line that contains the five-number summary
2. Draw a small line segment above (or next to) the number line to represent each of the numbers in the five-number summary
3. Connect the line segment that represents the 1st quartile to the line segment that represents the 3rd quartile, forming a box with the median's line segment inside it
4. Connect the "box" to the line segments representing the minimum and maximum values to form the "whiskers"
Properties of Range
1. Easiest measure of dispersion to calculate
2. Affected only by the largest and smallest values in the data set, so it can be misleading
Properties of Standard Deviation
1. Easily computed using a calculator or computer
2. Affected by every value in the data set
3. The population and sample standard deviation formulas yield different results (the population formula divides by N, the sample formula by n - 1)
4. Interpreted loosely as the average distance of a data value from the mean, so it cannot take on negative values
5. Has the same units as the data
6. A larger standard deviation indicates the data values are more spread out; a smaller standard deviation indicates the data values lie closer together
7. If it equals 0, then all of the data values are equal to the mean
8. Equal to the square root of the variance
Properties of the Median
1. Easy to compute by hand
2. Determined only by the middle value(s) of a data set, so it is not affected by outliers
3. May not be a value in the data set if the set contains an even number of values
4. Useful measure of center for skewed distributions
Determining the Most appropriate measure of Center
1. For qualitative data, the mode should be used
2. For quantitative data, the mean should be used unless the data set contains outliers or is skewed
3. For quantitative data sets that are skewed or contain outliers, the median should be used
Finding the Median of a Data Set
1. List the data in ascending order, making an ordered array
2. If the data set contains an ODD number of values, the median is the middle value in the ordered array
3. If the data set contains an EVEN number of values, the median is the arithmetic mean of the two middle values in the ordered array
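The three steps above translate directly into a short Python sketch (the function name `median` is illustrative, not from the text):

```python
def median(values):
    """Median via the steps above: sort the data, then take the middle value
    (odd count) or the mean of the two middle values (even count)."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)                      # step 1: ordered array
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                # step 2: ODD number of values
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # step 3: EVEN number of values

print(median([7, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```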
Properties of the Mean
1. Most familiar and widely used measure of center
2. Its value is affected by EVERY value in the data set
3. May not be a value in the data set
4. Appropriate measure of center for quantitative data with no outliers
Graphs & Measures of Center
1. The mode is the data value at which a distribution has its highest peak
2. The median is the number that divides the area of the distribution in half
3. The mean of a distribution will be pulled toward any outliers
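One way to see the third point is to compare the mean and median of a small data set before and after adding one extreme value. The incomes below are made up for illustration; `median` comes from Python's standard `statistics` module, and `mean` is written out as sum divided by count.

```python
from statistics import median

def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

# Hypothetical incomes (in thousands); the second list adds one large outlier.
incomes = [30, 32, 35, 38, 40]
with_outlier = incomes + [400]

print(f"mean={mean(incomes):.2f}, median={median(incomes)}")            # mean=35.00, median=35
print(f"mean={mean(with_outlier):.2f}, median={median(with_outlier)}")  # mean=95.83, median=36.5
```

The single outlier drags the mean from 35 to almost 96, while the median only moves from 35 to 36.5.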
Properties of Variance
1. Easily computed using a calculator or computer
2. Affected by every value in the data set
3. The population and sample variance formulas yield different results
4. Difficult to interpret because of its unusual squared units
5. Equal to the square of the standard deviation
6. Preferred over the standard deviation in many statistical tests because of its simpler formula
Five-Number Summary
A numerical description of a data set that lists, in order from smallest to largest:
• Minimum value
• 1st quartile, Q1
• 2nd quartile, Q2 (the median)
• 3rd quartile, Q3
• Maximum value
Chebyshev's Theorem
The proportion of data that lie within K standard deviations of the mean is at least $1 - \frac{1}{K^2}$ for $K > 1$. For K = 2 and K = 3:
• K = 2: At least $1 - \frac{1}{2^2} = \frac{3}{4} = 75\%$ of the data values lie within 2 standard deviations of the mean
• K = 3: At least $1 - \frac{1}{3^2} = \frac{8}{9} \approx 88.9\%$ of the data values lie within 3 standard deviations of the mean
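A minimal Python sketch of the bound, computed as an exact fraction. The data check assumes the population standard deviation (`statistics.pstdev`); the helper names and the skewed data set are hypothetical.

```python
from fractions import Fraction
from statistics import mean, pstdev

def chebyshev_min_proportion(k):
    """Chebyshev lower bound 1 - 1/k^2 on the share of data within
    k standard deviations of the mean; valid only for k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k > 1")
    return 1 - 1 / Fraction(k) ** 2

def proportion_within(values, k):
    """Observed share of values within k (population) standard deviations of the mean."""
    m, s = mean(values), pstdev(values)
    return sum(abs(x - m) <= k * s for x in values) / len(values)

for k in (2, 3):
    bound = chebyshev_min_proportion(k)
    print(f"k={k}: at least {bound} = {float(bound):.1%}")
# k=2: at least 3/4 = 75.0%
# k=3: at least 8/9 = 88.9%

# Hypothetical skewed data: the observed share does not fall below the bound.
skewed = [1, 1, 2, 2, 2, 3, 3, 4, 5, 30]
print(proportion_within(skewed, 2) >= chebyshev_min_proportion(2))  # True
```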
Percentiles
Values that divide the data into 100 equal parts; each percentile indicates approximately what percentage of the data lie at or below a given value
Quartiles
Values that divide the data into four equal parts, equivalent to the 25th, 50th, and 75th percentiles
• Q1 = First Quartile: 25% of the data are less than or equal to this value
• Q2 = Second Quartile: 50% of the data are less than or equal to this value
• Q3 = Third Quartile: 75% of the data are less than or equal to this value
Box Plot
a graphical representation of a five-number summary, sometimes referred to as a "box-and-whisker plot"
Standard Deviation
a measure of how much we might expect a typical member of the data set to differ from the mean
Hinge
an approximation of the first or third quartile, found by using the median to divide the data set into a lower half and an upper half (without including the median in either half), and then finding the median of each half
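The hinge procedure can be sketched in Python as follows, returning a five-number summary with the hinges standing in for Q1 and Q3. This is one convention among several; other quartile methods (such as the percentile-location rule) may give slightly different values. The function names are illustrative.

```python
def _median_sorted(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """(minimum, lower hinge, median, upper hinge, maximum).

    The hinges approximate Q1 and Q3: split the ordered data at the median,
    leaving the median itself out of both halves when n is odd, then take
    the median of each half.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    lower = ordered[:half]             # values before the median position
    upper = ordered[half + n % 2:]     # skip the middle value when n is odd
    return (ordered[0], _median_sorted(lower), _median_sorted(ordered),
            _median_sorted(upper), ordered[-1])

print(five_number_summary([13, 1, 9, 3, 11, 5, 7]))  # (1, 3, 7, 11, 13)
print(five_number_summary([2, 4, 6, 8, 10, 12]))     # (2, 4, 7.0, 10, 12)
```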
No mode
describes a data set in which all of the data values occur only once or each value occurs an equal number of times
Bimodal
describes a data set in which exactly two data values occur most often, each with the same frequency
Multimodal
describes a data set in which more than two data values occur most often, each with the same frequency
Unimodal
describes a data set in which only one data value occurs most often
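A sketch that classifies a data set using the mode definitions above. It follows those definitions literally, so a data set in which every value occurs equally often is reported as having no mode. The function `classify_modes` and its labels are hypothetical names chosen for illustration; it also works for qualitative data such as colors.

```python
from collections import Counter

def classify_modes(values):
    """Return (label, modes) following the definitions above.

    'no mode' when every distinct value occurs the same number of times;
    otherwise the value(s) with the highest frequency.
    """
    if not values:
        raise ValueError("data set is empty")
    counts = Counter(values)
    top = max(counts.values())
    if all(c == top for c in counts.values()):
        return "no mode", []
    winners = sorted(v for v, c in counts.items() if c == top)
    label = {1: "unimodal", 2: "bimodal"}.get(len(winners), "multimodal")
    return label, winners

print(classify_modes([1, 2, 3, 4]))                              # ('no mode', [])
print(classify_modes([1, 2, 2, 3]))                              # ('unimodal', [2])
print(classify_modes(["red", "blue", "red", "blue", "green"]))   # ('bimodal', ['blue', 'red'])
print(classify_modes([1, 1, 2, 2, 3, 3, 4]))                     # ('multimodal', [1, 2, 3])
```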
Chebyshev's Theorem
gives a lower bound on the percentage of data within a few standard deviations of the mean, valid for any distribution
Standard score (or z-score)
indicates how many standard deviations from the mean a particular data value lies
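The z-score is commonly computed as the data value minus the mean, divided by the standard deviation. A minimal sketch; the function name and the exam numbers are hypothetical:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean
    (positive above the mean, negative below it)."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Hypothetical exam with mean 70 and standard deviation 8.
print(z_score(86, mean=70, sd=8))  # 2.0
print(z_score(62, mean=70, sd=8))  # -1.0
```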
Pth Percentile of a Data Value
the percentile P of a particular value in a data set is the percentage of data values that lie at or below that value, where P is rounded to the nearest whole number
Sample Mean
the arithmetic mean of a set of sample data
Population Mean
the arithmetic mean of all the values in a population
Range
the difference between the largest and smallest values in the data set: Range = Maximum data value - Minimum data value
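A one-line sketch of the range, named `data_range` here (a hypothetical name) to avoid shadowing Python's built-in `range`. The second call shows how a single extreme value changes it:

```python
def data_range(values):
    """Range = maximum data value - minimum data value."""
    return max(values) - min(values)

print(data_range([12, 5, 19, 7, 3]))       # 16
print(data_range([12, 5, 19, 7, 3, 100]))  # 97: one extreme value changes the range
```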
Location of Data Value for the Pth percentile
the location, l, of the Pth percentile in an ordered array of n data values is given by $l = n \cdot \frac{P}{100}$
1. If the formula results in a decimal value for l, the location is the next larger whole number
2. If the formula results in a whole number, the percentile's value is the arithmetic mean of the data value in location l and the data value in location l + 1
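The location rule can be sketched in Python as below. It assumes the location formula is l = n * P / 100 with 1-based locations, and it restricts P to whole numbers so the decimal test can use exact integer arithmetic. Quartiles fall out as the 25th, 50th, and 75th percentiles. The function name and data set are hypothetical.

```python
def percentile(values, p):
    """Pth percentile via the location rule, l = n * P / 100 (assumed formula).

    - l has a decimal part -> use the next larger whole-number location
    - l is a whole number  -> average the values at locations l and l + 1
    """
    if not (isinstance(p, int) and 0 < p < 100):
        raise ValueError("P must be a whole number from 1 to 99")
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(n * p, 100)  # l = whole + remainder / 100
    if remainder:
        # Decimal l: 1-based location whole + 1 is 0-based index whole.
        return ordered[whole]
    # Whole-number l: mean of locations l and l + 1 (indices l - 1 and l).
    return (ordered[whole - 1] + ordered[whole]) / 2

data = [15, 3, 20, 8, 11, 5, 18, 9, 13, 7]          # n = 10
print([percentile(data, p) for p in (25, 50, 75)])  # [7, 10.0, 15]
print(percentile(data, 90))                         # 19.0
```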
Weighted Mean
the mean of a data set in which the data values do not all hold the same relative importance
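In the usual formula, each value is multiplied by its weight, and the sum of those products is divided by the sum of the weights. A sketch with a hypothetical grade breakdown:

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical course grade: homework 90 (20%), midterm 80 (30%), final 70 (50%).
print(weighted_mean([90, 80, 70], [20, 30, 50]))  # 77.0
```

An unweighted mean of the same three scores would be 80, so the heavier weight on the final pulls the result down.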
Median
the middle value in an ordered array of data
Interquartile Range (IQR)
the range of the middle 50% of the data, given by $IQR = Q_3 - Q_1$
Coefficient of variation, CV
the ratio of the standard deviation to the mean as a percentage, allows comparison of the spreads of data from different sources, regardless of differences in units of measurement
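A sketch comparing two hypothetical data sets measured in different units. It assumes the sample standard deviation (`statistics.stdev`); substitute `statistics.pstdev` for population data.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV = (standard deviation / mean) * 100, as a percentage.
    Uses the sample standard deviation (an assumption)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return stdev(values) / m * 100

# Hypothetical measurements in different units.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]
print(f"{coefficient_of_variation(heights_cm):.1f}%")  # 9.3%
print(f"{coefficient_of_variation(weights_kg):.1f}%")  # 22.6%
```

Both sets have the same numerical standard deviation (about 15.8), yet the CV shows the weights vary much more relative to their mean.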
Variance
the square of the standard deviation
Population Standard Deviation
the standard deviation of a population data set
Sample Standard Deviation
the standard deviation of a set of sample data
Arithmetic mean
the sum of all of the data values divided by the number of data values, often simply called the mean
Mode
the value in a data set that occurs most frequently
Population Variance
the variance of a population data set
Sample Variance
the variance of a set of sample data
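The population and sample formulas differ only in the divisor: N for a population, n - 1 for a sample. A sketch of both, with the standard deviation taken as the square root of the variance; the function names and data are illustrative.

```python
import math

def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 for a
    sample or by N for a population."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough data values")
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)

def std_dev(values, sample=True):
    """Standard deviation = square root of the variance."""
    return math.sqrt(variance(values, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data, sample=False), std_dev(data, sample=False))  # 4.0 2.0
print(round(variance(data), 4), round(std_dev(data), 4))          # 4.5714 2.1381
print(std_dev([3, 3, 3]))  # 0.0: all values equal the mean
```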
Empirical Rule
used with bell-shaped distributions of data to estimate the percentage of values within a few standard deviations of the mean
Empirical Rule for Bell-Shaped Distributions
• Approx. 68% of the data values lie within 1 standard deviation of the mean
• Approx. 95% of the data values lie within 2 standard deviations of the mean
• Approx. 99.7% of the data values lie within 3 standard deviations of the mean
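The percentages come from the normal (bell-shaped) curve. The sketch below assumes a normal distribution and uses Python's `statistics.NormalDist` to show where the rounded 68/95/99.7 figures come from, plus a hypothetical helper, `empirical_rule_intervals`, that lists the intervals for a given mean and standard deviation.

```python
from statistics import NormalDist

def empirical_rule_intervals(mean, sd):
    """Map k -> (low, high, approx. percent) for mean ± k*sd, k = 1, 2, 3."""
    approx_percent = {1: 68, 2: 95, 3: 99.7}
    return {k: (mean - k * sd, mean + k * sd, pct) for k, pct in approx_percent.items()}

# Exact shares within k standard deviations for a standard normal curve.
normal = NormalDist(mu=0, sigma=1)
for k in (1, 2, 3):
    print(k, round(normal.cdf(k) - normal.cdf(-k), 4))  # about 0.6827, 0.9545, 0.9973

# Hypothetical bell-shaped scores with mean 100 and standard deviation 15.
for k, (low, high, pct) in empirical_rule_intervals(100, 15).items():
    print(f"about {pct}% between {low} and {high}")
```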