Ch1 and 2 - stats
Variance (s²) of a set of observations is ...
....is an average of the squares of the deviations of the observations from their mean, dividing by n − 1 rather than n. s² = [(x₁ − mean)² + (x₂ − mean)² + … + (xₙ − mean)²] / (n − 1)
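A minimal sketch of the variance formula, written out step by step. The function name `sample_variance` and the six data values are hypothetical and chosen only for illustration. The data are picked so the arithmetic is easy to check by hand: mean 4, squared deviations 9 + 4 + 1 + 0 + 1 + 25 = 40, and 40 / (6 − 1) = 8.

```python
import math
import statistics


def sample_variance(data):
    """Sample variance s²: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    # Square each deviation (observation - mean), then add them up
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)


data = [1, 2, 3, 4, 5, 9]  # hypothetical observations x1 ... x6
print(sample_variance(data))  # 8.0
# Cross-check against the standard library's sample variance
print(math.isclose(sample_variance(data), statistics.variance(data)))  # True
```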
Standard deviation measures....
....the spread about the mean and should be used only when the mean is chosen as the measure of center.
The standard deviation (s) is ...
....the square root of the variance s²: s = √s²
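A short sketch showing that s is just the square root of s². The function name `sample_sd` and the data are hypothetical. Here the mean is 10, the squared deviations add to 4 + 4 + 0 + 4 + 4 = 16, the variance is 16 / 4 = 4, and so s = 2.

```python
import math
import statistics


def sample_sd(data):
    """Sample standard deviation s: the square root of the sample variance s²."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    return math.sqrt(variance)


data = [8, 8, 10, 12, 12]  # hypothetical observations, mean 10
s = sample_sd(data)
print(s)  # 2.0
# Cross-check against the standard library's sample standard deviation
print(math.isclose(s, statistics.stdev(data)))  # True
```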
STDEV has the same units of measurement...
...as the original observations. If the observations are measured in calories, STDEV is also in calories.
___________% of the data is less than the first quartile.
25%
A boxplot =
a graph of the 5 number summary (min, Q₁, M, Q₃, max)
Q₃ =
the value with 75% of the data below it (the median of the upper half of the data)
Inter-quartile range (IQR) =
Q₃ − Q₁; the length of the box in a boxplot
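A sketch of computing Q₁, M, Q₃ and the IQR by hand. It assumes the "median of each half" convention common in introductory courses: sort the data, split it at the median (leaving the median out of both halves when n is odd), and take the median of each half. Other conventions, including the default method of Python's `statistics.quantiles`, can give slightly different quartiles. The function names and data are hypothetical.

```python
def median(values):
    """Median of a list of numbers (sorts a copy first)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def quartiles(values):
    """Return (Q1, M, Q3) using the 'median of each half' convention.

    For odd n, the median itself is left out of both halves.
    """
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]        # observations below the median position
    upper = s[(n + 1) // 2:]   # observations above the median position
    return median(lower), median(s), median(upper)


data = [1, 3, 4, 6, 7, 9, 12, 15, 20]  # hypothetical, n = 9
q1, m, q3 = quartiles(data)
print(q1, m, q3)         # 3.5 7 13.5
print("IQR =", q3 - q1)  # IQR = 10.0
```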
How to identify an outlier?
An observation is a suspected outlier (the 1.5 × IQR rule) if it falls below Q₁ − (1.5 × IQR) or above Q₃ + (1.5 × IQR).
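A minimal sketch of the 1.5 × IQR rule. It assumes the same "median of each half" quartile convention as above; the function name `iqr_outliers` and the data are hypothetical. For this data Q₁ = 4 and Q₃ = 15, so IQR = 11 and the fences are 4 − 16.5 = −12.5 and 15 + 16.5 = 31.5, which flags only the value 48.

```python
def median(s):
    """Median of an already-sorted list."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def iqr_outliers(values):
    """Return (outliers, (low_fence, high_fence)) using the 1.5 x IQR rule."""
    s = sorted(values)
    n = len(s)
    q1 = median(s[: n // 2])          # median of the lower half
    q3 = median(s[(n + 1) // 2:])     # median of the upper half
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in s if x < low_fence or x > high_fence]
    return outliers, (low_fence, high_fence)


data = [1, 3, 4, 6, 7, 9, 12, 15, 20, 48]  # hypothetical observations
outliers, fences = iqr_outliers(data)
print(outliers)  # [48]
print(fences)    # (-12.5, 31.5)
```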
True or false: An outlier resulting from a typing error can be corrected.
TRUE
True or false: The best way to describe spread of a skewed distribution is to report the full five-number summary.
TRUE. Because the two sides of a skewed distribution have different spreads, no single number such as the standard deviation or the interquartile range describes the spread well. That's why the five-number summary is best.
True or false: Mean and standard deviation should only be used to describe a distribution if it is not skewed and has no outliers.
TRUE. Since outliers and/or strong skewness affect the mean and standard deviation, these measures should not be used to describe a skewed distribution or a distribution with outliers.
If your data set is strongly skewed, is it better to present the mean or the median?
The median, because the mean is pulled toward the long tail of a skewed distribution and would give a misleading picture of the center.
When STDEV = 0
all the observations have the same value
If the unit of measurement is calories, the variance would be...
in squared calories
IQR is/isn't sensitive to outliers
isn't
As observation values are more spread out the STDEV gets
larger
Median =
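The midpoint of the ordered values. First sort the observations, then find position (n+1)/2. If n is odd, (n+1)/2 is a whole number and the observation at that position is the median. If n is even, (n+1)/2 ends in .5 (it falls between two positions), so the median is the average of the two middle observations.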
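A sketch of the (n+1)/2 position rule for the median. The function name and data are hypothetical. Positions in the rule are counted from 1, while Python lists are indexed from 0, hence the `- 1` adjustments.

```python
def median(values):
    """Median: sort, then use position (n + 1) / 2 (1-based)."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        # Odd n: (n + 1) / 2 is a whole number, e.g. n = 5 -> position 3
        return s[(n + 1) // 2 - 1]
    # Even n: (n + 1) / 2 ends in .5, so average the observations
    # at positions n/2 and n/2 + 1
    return (s[n // 2 - 1] + s[n // 2]) / 2


print(median([5, 1, 9, 3, 7]))      # 5   (n = 5, position 3)
print(median([5, 1, 9, 3, 7, 11]))  # 6.0 (n = 6, average of positions 3 and 4)
```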
STDEV =
square root of the variance (s²)
mean =
the sum (∑) of the observations divided by the number of observations (n); very sensitive to outliers
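A small illustration of the mean formula and of its sensitivity to outliers. The data and the extra value 100 are hypothetical. Adding that single extreme observation moves the mean from 12 to about 26.67, even though five of the six values are between 10 and 14.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by how many there are."""
    return sum(values) / len(values)


data = [10, 12, 11, 13, 14]   # hypothetical observations
with_outlier = data + [100]   # same data plus one extreme value

print(mean(data))                     # 12.0
print(round(mean(with_outlier), 2))   # 26.67
```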
standard deviation in words is....
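...roughly the average distance of the observations from their mean: the square root of the sum of squared deviations divided by the degrees of freedom (n − 1)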
the central box of the boxplot measures
the middle 50% of the data (from Q₁ to Q₃)
Which measures should be used to describe a very right skewed distribution?
Five-number summary → The five-number summary is better than mean and standard deviation for describing a very right skewed distribution.
For the dataset, volumes of milk dispensed into one gallon milk cartons, should you use the mean or the median to describe the center?
Mean → The mean is a better choice in this situation because the distribution of volume of milk is likely symmetric with no outliers.
What is/are the most common numerical description of a distribution?
Mean (measures center) and standard deviation (measures spread)
Q₂ =
Median (M), the value with 50% of the data below it
Variance is a close relative of....
Standard Deviation
Variance =
[(x₁ − mean)² + (x₂ − mean)² + … + (xₙ − mean)²] / (n − 1)
T/F this spread of data has a large STDEV xxxx x xxx
True
If the distribution is exactly symmetric, the mean and median are .......
exactly the same
>
greater than
STDEV is or isn't resistant to outliers
Is not resistant. Outliers strongly influence the STDEV.
<
less than
Range =
max − min; the distance from the lowest to the highest point shown on a boxplot
5 number summary =
min, Q₁, Q₂, Q₃, max
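A sketch that computes the five-number summary and the range together. It assumes the "median of each half" quartile convention; other software conventions can give slightly different Q₁ and Q₃. The function names and data are hypothetical.

```python
def median(s):
    """Median of an already-sorted list."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, M, Q3, max)."""
    s = sorted(values)
    n = len(s)
    q1 = median(s[: n // 2])        # median of the lower half
    q3 = median(s[(n + 1) // 2:])   # median of the upper half
    return s[0], q1, median(s), q3, s[-1]


data = [14, 2, 21, 8, 10, 5, 13, 8]  # hypothetical observations, n = 8
mn, q1, m, q3, mx = five_number_summary(data)
print(mn, q1, m, q3, mx)   # 2 6.5 9.0 13.5 21
print("range =", mx - mn)  # range = 19
```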
degrees of freedom of the variance =
n − 1, the divisor used when computing the variance (and therefore the standard deviation)
Deviation of an observation =
observation − mean
boxplots are best used for
Side-by-side comparison of more than one distribution. They show less detail than histograms or dotplots.
Resistant measure: what is not a resistant measure? What is a resistant measure?
The mean is not a resistant measure because it is strongly affected by outliers. The median is resistant to outliers because it depends only on the position (count) of the observations, not on how extreme the values are.
STDEV measures
variability, or variation from the mean