STAT 210 Ch 4
mean
the most often used measure of central location
Since all subjects of the population are rarely known, the population standard deviation is usually _________ and must be estimated by the sample standard deviation, denoted ___.
unknown; s
xₙ =
value of the nth (last) observation
IQR
The lower and upper quartiles enclose the middle 50% of the data and hence the IQR measures the range of this middle half of the data
The population mean is estimated by the sample mean, denoted by
x̄ (read "x-bar")
Standard deviation
a measure of variability around the mean
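A minimal Python sketch of the sample standard deviation. It assumes the usual textbook formula: sum the squared deviations from x̄, divide by n - 1, and take the square root. The data values are hypothetical. The result is checked against Python's statistics.stdev, and statistics.pstdev (divisor n) is shown for contrast with the population version.

```python
import statistics
from math import sqrt

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample

n = len(data)
x_bar = sum(data) / n
# Sample standard deviation: squared deviations divided by n - 1, then square root
s = sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

print(round(s, 4))                        # 2.1381
print(round(statistics.stdev(data), 4))   # 2.1381 (same n - 1 formula)
print(statistics.pstdev(data))            # 2.0 (population formula, divisor n)
```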
The mean is highly influenced by
outliers
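To illustrate this card and the later one on the median being resistant, here is a small Python sketch with hypothetical data. It replaces one value with an outlier and compares the mean and the median before and after.

```python
from statistics import mean, median

data = [3, 4, 5, 6, 7]            # hypothetical sample
with_outlier = [3, 4, 5, 6, 70]   # same sample, last value replaced by an outlier

print(mean(data), median(data))                  # 5 5
print(mean(with_outlier), median(with_outlier))  # 17.6 5
```

The mean jumps from 5 to 17.6. The median stays at 5.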
Variance
A measure of spread around the mean; it is the square of the standard deviation (equivalently, the standard deviation is the square root of the variance)
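A short Python check of the variance/standard deviation relationship, using hypothetical data. It assumes the sample versions (divisor n - 1), which are what Python's statistics.variance and statistics.stdev compute.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical sample
s2 = statistics.variance(data)      # sample variance s², divisor n - 1
s = statistics.stdev(data)          # sample standard deviation s

print(round(s2, 4))                 # 4.5714
print(abs(s ** 2 - s2) < 1e-12)     # True: the variance is the SD squared
```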
Interquartile range (IQR)
A measure of spread around the median. Because it is based only on the middle 50% of the data, the IQR is resistant to outliers and may be a better measure of spread than the standard deviation if the distribution is skewed. (The "middle quartile", with 50% of the data less than it and 50% greater than it, is the median, not the IQR.)
Long-tailed distribution
A symmetric boxplot with long whiskers or several outliers at each end indicates that the data may come from a distribution with long tails
Statistical Inference
All subjects of the population are rarely known. Hence the population parameter of interest can rarely be determined and must be estimated using a sample statistic. Statistics are denoted using regular (Roman) letters, such as x̄, s, and p
Boxplot
Displays quantitative data by giving information on the shape of a distribution, on the center and spread of a distribution, and on the concentrations of data values in the tails of a distribution (outliers)
Lower fence=
Q1 - 1.5 IQR
Upper fence=
Q3 + 1.5 IQR
IQR=
Q3 - Q1
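A minimal Python sketch of the IQR and fence formulas above, using hypothetical data. One assumption to note: Q1 and Q3 come from statistics.quantiles with its default "exclusive" method. A hand rule taught in class (for example, the median of each half) agrees on this data set but can give slightly different quartiles on others.

```python
from statistics import quantiles

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]   # hypothetical sample

q1, _, q3 = quantiles(data, n=4)      # Python's default quartile rule (assumption)
iqr = q3 - q1                         # IQR = Q3 - Q1
lower_fence = q1 - 1.5 * iqr          # Q1 - 1.5 IQR
upper_fence = q3 + 1.5 * iqr          # Q3 + 1.5 IQR

print(q1, q3, iqr)                    # 4.0 8.5 4.5
print(lower_fence, upper_fence)       # -2.75 15.25
```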
Population mean is denoted by the Greek letter μ (read "mu")...
and is the sum of all observations divided by the number of individuals in the population.
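A tiny Python illustration of this definition. It assumes a hypothetical population small enough that every individual is known, which is exactly the situation the other cards say is rare.

```python
population = [12, 15, 9, 20, 14]   # hypothetical: every individual in a small population

N = len(population)
mu = sum(population) / N           # sum of all observations divided by N
print(mu)                          # 14.0
```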
Outliers
any observation less than the lower fence value or greater than the upper fence value
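A minimal Python sketch of the outlier rule, with hypothetical data. The helper name find_outliers is made up for illustration. As in any quartile computation here, it assumes Python's default statistics.quantiles rule, which may differ slightly from a hand method for some data sets.

```python
from statistics import quantiles

def find_outliers(data):
    """Hypothetical helper: observations outside the 1.5 IQR fences."""
    q1, _, q3 = quantiles(data, n=4)   # Python's default quartile rule (assumption)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

print(find_outliers([2, 4, 4, 5, 6, 7, 8, 9, 30]))   # [30]
```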
The population median is usually denoted by the Greek letter ___ and is estimated by the sample median, denoted by ___.
η (read "eta"); M
Deviation
the amount by which an observation differs from the mean: x - x̄
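A short Python sketch computing each deviation x - x̄ for hypothetical data. It also shows a standard property: the deviations from the mean always sum to zero.

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
x_bar = sum(data) / len(data)     # 5.0

deviations = [x - x_bar for x in data]
print(deviations)        # [-3.0, -1.0, -1.0, -1.0, 0.0, 0.0, 2.0, 4.0]
print(sum(deviations))   # 0.0
```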
Upper adjacent value
largest observation that remains in the data set once outliers are set aside (the largest value that is not above the upper fence)
Range= ______________. It is heavily __________ by outliers.
maximum value - minimum value; influenced
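A quick Python illustration, with hypothetical data, of how one outlier changes the range.

```python
data = [2, 4, 4, 5, 6, 7, 8, 9]   # hypothetical sample
with_outlier = data + [30]        # same sample plus one outlier

print(max(data) - min(data))                   # 7
print(max(with_outlier) - min(with_outlier))   # 28
```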
Dispersion parameter
measures the spread or variability around the center
If, however, all the values are _____ ___ ____, then the characteristic is called a ______, and the goal is to measure the amount of spread (or dispersion or variability) around a central value
not the same; variable
n=
number of observations in the sample
Lower Quartile
observation with 25% of the data less than it and 75% of the data greater than it. Denoted as Q1
Upper Quartile
observation with 75% of the data less than it and 25% of the data greater than it. Denoted as Q3
The median is more _________ to outliers than the mean
resistant
The population standard deviation is denoted by ___________.
σ (read "sigma")
For symmetric distributions the mean and median will be nearly the _______.
same
If all the values of a characteristic are the _______, then the characteristic is a _______, and both the mean and median equal the constant value. There is no spread in the data.
same; constant
Population mean is estimated by the
sample mean
Since the entire population is usually unknown, the population variance is estimated using the ___________.
sample variance, s²
Lower adjacent value (end of the lower whisker)
smallest observation that remains in the data set once outliers are set aside (the smallest value that is not below the lower fence)
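A minimal Python sketch of both adjacent values (the whisker ends), using hypothetical data. The helper name adjacent_values is made up for illustration. It assumes Python's default statistics.quantiles rule for Q1 and Q3, which may differ slightly from a hand method for some data sets.

```python
from statistics import quantiles

def adjacent_values(data):
    """Hypothetical helper: smallest and largest observations inside the fences."""
    q1, _, q3 = quantiles(data, n=4)   # Python's default quartile rule (assumption)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    kept = [x for x in data if lower_fence <= x <= upper_fence]
    return min(kept), max(kept)

print(adjacent_values([2, 4, 4, 5, 6, 7, 8, 9, 30]))   # (2, 9)
```

Here 30 lies above the upper fence, so the upper whisker stops at 9.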