ISDS Chapter 3
Mean
($3,000,000/5) = $600,000
The Sample Variance
- Can never be negative because values are squared - Will equal zero only if all observations have the same value (no variation) - The sum of the squared differences around the mean divided by sample size minus 1
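Written as a formula, the third point above can be sketched as follows. The symbols S^2 (sample variance), X-bar (sample mean), and n (sample size) are assumed notation; the cards themselves only name Xi.

```latex
S^2 = \frac{\sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2}{n - 1}
```

Each term in the numerator is a square, which is why the result can never be negative and is zero only when every X_i equals the mean.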
A Box Plot
- Graphically display the distribution of a data set. - Compare two or more distributions. - Identify outliers in a data set.
Shape of Boxplots
- If data are symmetric around the median then the box and central line are centered between the endpoints - Can be shown in either a vertical or horizontal orientation
Data Analysis
- Is objective - Should report the summary measures that best describe and communicate the important aspects of the data set
Data Interpretation
- Is subjective - Should be done in fair, neutral and clear manner
The Standard Deviation σ
- Measures variation in the population - Calculation is similar to sample standard deviation - Like sample statistics, population standard deviation is the square root of the population variance
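As a sketch, the population formulas use the symbols that appear elsewhere in this set (µ for the population mean, N for the population size, Xi for the ith value). The only difference from the sample versions is dividing by N rather than by sample size minus 1.

```latex
\sigma^2 = \frac{\sum_{i=1}^{N} \left( X_i - \mu \right)^2}{N},
\qquad
\sigma = \sqrt{\sigma^2}
```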
The Standard Deviation σ
- Most commonly used measure of variation - Shows variation about the mean - Is the square root of the population variance - Has the same units as the original data
The Sample Standard Deviation
- Most commonly used measure of variation - Shows variation about the mean - Is the square root of the variance - Has the same units as the original data
Median
- Not affected by extreme values - When the data set contains an odd number of values * The middle value - When the data set contains an even number of values * The average of the 2 middle values
The Arithmetic Mean
- Often just called the "mean" - The most common measure of central tendency
Rules when Calculating the Ranked Position
- Rule 1: If the result is a whole number then it is the ranked position to use - Rule 2: If the result is a fractional half (e.g. 2.5, 7.5, 8.5, etc.) then average the two corresponding data values. - Rule 3: If the result is neither (not a whole number or a fractional half) then round the result to the nearest integer to find the ranked position.
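The three rules above can be sketched in Python. This is an illustration only: the function names and both data sets are hypothetical, and the quartile positions (n + 1)/4 for Q1 and 3(n + 1)/4 for Q3 are an assumption, since the cards state only the median position.

```python
def value_at_ranked_position(ranked, position):
    """Return the value at a 1-based ranked position using the three rules."""
    whole = int(position)
    if position == whole:
        # Rule 1: a whole number is the ranked position to use.
        return ranked[whole - 1]
    if position - whole == 0.5:
        # Rule 2: a fractional half averages the two neighbouring values.
        return (ranked[whole - 1] + ranked[whole]) / 2
    # Rule 3: anything else is rounded to the nearest integer position.
    return ranked[round(position) - 1]


def quartiles(values):
    """Q1 and Q3, assuming positions (n + 1)/4 and 3(n + 1)/4."""
    ranked = sorted(values)
    n = len(ranked)
    q1 = value_at_ranked_position(ranked, (n + 1) / 4)
    q3 = value_at_ranked_position(ranked, 3 * (n + 1) / 4)
    return q1, q3


nine = [11, 12, 13, 16, 16, 17, 18, 21, 22]        # positions 2.5 and 7.5 (Rule 2)
ten = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]    # positions 2.75 and 8.25 (Rule 3)

q_nine = quartiles(nine)
q_ten = quartiles(ten)
print(q_nine, q_ten)
```

With nine values both positions fall on a half, so two neighbours are averaged; with ten values the positions are rounded to 3 and 8.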
Numerical Descriptive Measures with Ethical Considerations
- Should document both good and bad results - Should be presented in a fair, objective and neutral manner - Should not use inappropriate summary measures to distort facts
Mean
- Sum of values divided by the number of values - Affected by extreme values (outliers)
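A small sketch of how one extreme value moves the mean but not the median. The salary figures are hypothetical, chosen only so that the second set totals $3,000,000 across 5 values.

```python
from statistics import mean, median

# Hypothetical salaries; the second list swaps the largest value for an outlier.
salaries = [40_000, 45_000, 50_000, 55_000, 60_000]
with_outlier = [40_000, 45_000, 50_000, 55_000, 2_810_000]

mean_plain = mean(salaries)          # sum / count
median_plain = median(salaries)      # middle of the ranked values
mean_outlier = mean(with_outlier)    # pulled upward by the extreme value
median_outlier = median(with_outlier)

print(mean_plain, median_plain)
print(mean_outlier, median_outlier)
```

The median stays at the middle salary in both cases, while the mean jumps to a figure that none of the five values is close to.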
The Sample Standard Deviation
- The majority of observations will lie within +1 to -1 standard deviations of the mean. - Shows variation about the mean - Is the square root of the variance
Measures of Variation
- The more the data are spread out, the greater the range, variance, and standard deviation. - The more the data are concentrated, the smaller the range, variance, and standard deviation. - If the values are all the same (no variation), all these measures will be zero. - None of these measures are ever negative.
Mode
- Value that occurs most often - Not affected by extreme values - Used for either numerical or categorical (nominal) data - There may be several, there may be none
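A minimal sketch of finding the mode for numerical or categorical data. The helper name `modes` is hypothetical, and treating "no value repeats" as "no mode" is an assumed convention rather than something the card spells out.

```python
from collections import Counter


def modes(values):
    """Return every most-frequent value, or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # Assumed convention: if every value occurs once, there is no mode.
        return []
    return [value for value, count in counts.items() if count == top]


one_mode = modes([0, 1, 2, 3, 3, 5])
several_modes = modes(["red", "blue", "red", "green", "blue"])
no_mode = modes([1, 2, 3])
print(one_mode, several_modes, no_mode)
```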
Z-Score
- The ________ is useful in identifying outliers - The larger the absolute value of the ________, the greater the distance from the value to the mean
The five numbers that help describe the center, spread and shape of data
1) Xsmallest 2) First Quartile (Q1) 3) Median (Q2) 4) Third Quartile (Q3) 5) Xlargest
Steps to Compute Sample Standard Deviation
1. Compute the difference between each value and the mean. 2. Square each difference. 3. Add the squared differences. 4. Divide this total by n-1 to get the sample variance. 5. Take the square root of the sample variance to get the sample standard deviation.
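The five steps above can be followed line by line in Python. The data values are hypothetical, and `statistics.stdev` from the standard library is used only as a cross-check.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
n = len(data)
sample_mean = sum(data) / n

# Step 1: difference between each value and the mean.
differences = [x - sample_mean for x in data]
# Step 2: square each difference.
squared = [d ** 2 for d in differences]
# Step 3: add the squared differences (the sum of squares).
sum_of_squares = sum(squared)
# Step 4: divide by n - 1 to get the sample variance.
sample_variance = sum_of_squares / (n - 1)
# Step 5: take the square root to get the sample standard deviation.
sample_sd = math.sqrt(sample_variance)

print(sum_of_squares, sample_variance, sample_sd)
print(statistics.stdev(data))  # library result for comparison
```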
Range, Variance, and Standard Deviation
3 Measures of variation
IQR
= Q3-Q1
Range
= Xlargest - Xsmallest
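As a sketch, the five-number summary, the IQR, and the range can be computed together. The sample is hypothetical. Note the assumption: `statistics.quantiles` with `method="exclusive"` interpolates linearly between neighbouring values, so it agrees with the ranked-position rules for whole and half positions but can differ slightly where those rules would round.

```python
import statistics

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]   # hypothetical sample
ranked = sorted(data)

# Quartiles at positions p * (n + 1), with linear interpolation.
q1, q2, q3 = statistics.quantiles(ranked, n=4, method="exclusive")

five_number_summary = (ranked[0], q1, q2, q3, ranked[-1])
data_range = ranked[-1] - ranked[0]   # Xlargest - Xsmallest
iqr = q3 - q1                         # Q3 - Q1

print(five_number_summary, data_range, iqr)
```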
The Boxplot
A Graphical display of the data based on the five-number summary
Variation and Shape
A data set can be characterized by its ______
Extreme Outlier
A data value is considered an _________ if its Z-score is less than -3.0 or greater than +3.0.
The IQR
A measure of variability that is not influenced by outliers or extreme values
Mean
Acts like the "balance point" for the data set
Quartiles, Five-Number Summary, Boxplot
Another way to describe numerical data
The Empirical Rule
Approximately 68% of the data in a bell-shaped distribution is within 1 standard deviation of the mean, or µ ± 1σ
The Empirical Rule
Approximates the variation of data in a bell-shaped distribution
The Sample Variance
Average (approximately) of squared deviations of values from the mean
Shape of a Distribution
Describes how data are distributed
Sample
Descriptive statistics discussed previously described a _______, not the population.
Mean
Generally used, unless extreme values (outliers) exist.
Measures of Variation
Give information on the spread or variability or dispersion of the data values.
Most Data Sets
Have a pattern that looks approximately like a bell with a peak of values somewhere in the middle (bell shaped curve).
Population Mean, Population Variance, and Population Standard Deviation
Important population parameters are the _________, __________, and ________
Median
In an ordered array, the _______ is the "middle" number (50% above, 50% below)
Middle Fifty
Interval between Q1 and Q3 sometimes called the _______
Shape
Is described by 2 measures: skewness and kurtosis
Shape
Is either symmetrical or skewed
Q2
Is the median, 50% of values are higher and 50% are lower
The Sample Variance
Takes into account how all the data values are distributed
Left Skewed
Long tail to left caused by extremely low values, pulls down the mean so it is less than the median
Right Skewed
Long tail to the right caused by extremely high values which pull the mean upward so mean is greater than median
Symmetric
Mean = Median
Left Skewed
Means the Mean is to the LEFT of the MEDIAN
Right Skewed
Means the Mean is to the RIGHT of the MEDIAN
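The mean-versus-median comparison on the cards above can be sketched as a quick check. The function name and data sets are hypothetical, and this is only the rule of thumb from the cards: equal mean and median does not by itself prove that a distribution is symmetric.

```python
from statistics import mean, median


def describe_shape(data):
    """Classify shape by comparing the mean with the median."""
    m, med = mean(data), median(data)
    if m < med:
        return "left-skewed"     # low extremes pull the mean below the median
    if m > med:
        return "right-skewed"    # high extremes pull the mean above the median
    return "symmetric"           # mean = median


print(describe_shape([1, 2, 3, 4, 5]))
print(describe_shape([1, 2, 3, 4, 50]))
print(describe_shape([-40, 6, 7, 8, 9]))
```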
Resistant Measures
Measures like the median, Q1, Q3, and IQR that are not influenced by outliers are called __________
Mean, Median, and Mode
Measures of central tendency
IQR
Measures the spread in the middle 50% of the data; also called the midspread
Skewness
Measures the amount of asymmetry in a distribution
Kurtosis
Measures the relative concentration of values in the center of a distribution as compared with the tails
Variation
Measures the spread or dispersion of values
Symmetrical Data Sets
Median and mean are the same - A bell-shaped distribution is one example
Mode
Most frequent value
Median
Often used, since the median is not sensitive to extreme values. For example, median home prices may be reported for a region; it is less sensitive to outliers.
Third Quartile
Only 25% of the observations are greater than the _________
Mean
Only measure in which all values play an equal role (why outliers affect it)
µ
Population mean
N
Population size
First Quartile
Q1, is the value for which 25% of the observations are smaller and 75% are larger
Second Quartile
Q2 is the same as the median (50% of the observations are smaller and 50% are larger)
The IQR
Q3 - Q1 and measures the spread in the middle 50% of the data
Quartile Measures
Quartiles split the ranked data into 4 segments with an equal number of values per segment
Symmetric
Right and left tails are equal, so mean = median
The Range
- Simplest measure of variation - Difference between the largest and the smallest values
Comparing Standard Deviations
Simply gives you an idea of how the data is dispersed around the mean and the number of standard deviations from the mean
SS
Sum of squares: the numerator of the variance formula, which is the sum of all squared differences between the X values and the mean
Parameter
Summary measures describing a population, called _________, are denoted with Greek letters.
Variance and Standard Deviation
The 2 common measures of variation
Midspread
The IQR is also called the ________ because it covers the middle 50% of the data
Larger
The _______ the absolute value of the Z-score, the farther the data value is from the mean.
The Variation
The amount of dispersion or scattering of values
The Central Tendency
The extent to which all the data values group around a typical or central value
The Five Number Summary
The five numbers that help describe the center, spread and shape of data
Median Position
(n + 1) / 2, where n is the number of data points - Note that this gives the position in the ranked data, NOT the value
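A short sketch separating the median's position from its value. The function name and the two small data sets are hypothetical; the function expects a non-empty list.

```python
import math


def median_by_position(values):
    """Return (1-based position, median value) for a non-empty data set."""
    ranked = sorted(values)
    position = (len(ranked) + 1) / 2
    # For a whole-number position floor and ceil coincide; for a half
    # position they select the two middle values, which are averaged.
    lower = ranked[math.floor(position) - 1]
    upper = ranked[math.ceil(position) - 1]
    return position, (lower + upper) / 2


odd_case = median_by_position([7, 3, 9, 1, 5])   # ranked: 1, 3, 5, 7, 9
even_case = median_by_position([7, 3, 9, 1])     # ranked: 1, 3, 7, 9
print(odd_case, even_case)
```

In the odd case the position is 3 and the value at that position is 5; in the even case the position is 2.5, so the 2nd and 3rd ranked values are averaged.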
Z-Score
The number of standard deviations a data value is from the mean.
The Shape
The pattern of the distribution of values from the lowest value to the highest value
Central Tendency, Variation, and Shape
The ways to measure
Z-Score
To compute _____, subtract the mean and divide by the standard deviation.
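A minimal sketch of the computation, combined with the ±3.0 cutoff from the Extreme Outlier card. The data are hypothetical, and using the sample standard deviation (`statistics.stdev`) is an assumption, since the card does not say which standard deviation to use.

```python
import statistics

# Hypothetical sample: fifteen values near 50 and one far away.
data = [48, 50, 51, 49, 52, 50, 47, 53, 50, 49, 51, 50, 52, 48, 50, 120]

sample_mean = statistics.mean(data)
sample_sd = statistics.stdev(data)

# Z-score: subtract the mean, then divide by the standard deviation.
z_scores = [(x - sample_mean) / sample_sd for x in data]

# Flag values whose Z-score is below -3.0 or above +3.0.
extreme_outliers = [x for x, z in zip(data, z_scores) if abs(z) > 3.0]
print(extreme_outliers)
```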
The Empirical Rule
Used to examine the variability in distributions: in symmetric data, values cluster around the median; in right-skewed data, values cluster to the left of the mean; in left-skewed data, values cluster to the right of the mean
Ignores the way in which the data is distributed
Why the range can be misleading
95%
Approximately ___% of the data in a bell-shaped distribution lies within two standard deviations of the mean, or µ ± 2σ, which implies that about 1 of 20 values will be beyond two standard deviations from the mean in either direction
99.7%
____% of the data in a bell-shaped distribution lies within three standard deviations of the mean, or µ ± 3σ
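The 68%, 95%, and 99.7% figures can be checked roughly by simulation. This is a sketch under stated assumptions: the mean of 100, the standard deviation of 15, the seed, and the sample size are all arbitrary choices, and the simulated shares only approximate the rule.

```python
import random

random.seed(0)                 # fixed seed so the run is repeatable
mu, sigma = 100.0, 15.0        # hypothetical population mean and standard deviation
sample = [random.gauss(mu, sigma) for _ in range(100_000)]


def share_within(k):
    """Fraction of simulated values within k standard deviations of the mean."""
    inside = sum(1 for x in sample if abs(x - mu) <= k * sigma)
    return inside / len(sample)


shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.3f}")
```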
Q1
Divides the smallest 25% of values from the other 75%
Q3
Divides the smallest 75% of values from the largest 25%
Population Mean
is the sum of the values in the population divided by the population size, N
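In symbols, using µ for the population mean, N for the population size, and Xi for the ith value, this definition can be written as:

```latex
\mu = \frac{\sum_{i=1}^{N} X_i}{N}
```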
Xi
ith value of the variable X
Median
middle value of ranked data