Chapter 3 math
Empirical Rule
Theorem (Math Fact): If a histogram of the data is bell shaped
Probability theory is complex and deep enough
to warrant (at least) its own course (taught in most mathematics or statistics departments)
An outlier is an
unusually small or unusually large value in a data set.
Chebyshev's Theorem: At least 75% of the data values must be within
z = 2 standard deviations of the mean.
Chebyshev's Theorem: At least 89% of the data values must be within
z = 3 standard deviations of the mean.
Chebyshev's Theorem: At least 94% of the data values must be within
z = 4 standard deviations of the mean.
For symmetric graphs, skewness is
zero and the mean is equal to the median
Probability values are always assigned on a scale from
0 to 1
The closer the correlation is to zero,
the weaker the relationship
Range is very sensitive
to the smallest and largest data values
The correlation coefficient can take on values between
-1 and +1
The correlation coefficient ranges from which two values?
-1 and +1
Five-number summary
1. Smallest value 2. First quartile (Q1) 3. Median (Q2) 4. Third quartile (Q3) 5. Largest value
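The five values can be computed directly. The following is a minimal sketch, assuming Python's standard `statistics` module and its "inclusive" quartile convention; textbooks and software differ on how quartiles are located, so Q1 and Q3 may not match a given course's method exactly. The function name `five_number_summary` is illustrative only.

```python
import statistics


def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest) for a list of numbers."""
    values = sorted(data)
    # Assumption: the "inclusive" quartile convention; other conventions exist.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return values[0], q1, q2, q3, values[-1]


print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```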
Theorem (Math Fact): If a histogram of the data is bell shaped, as shown in the figure to the right, then all of the following are true:
About 68% of all observations will fall within one standard deviation of the mean (that is, within x̄ ± 1s). About 95% of all observations will fall within two standard deviations of the mean (that is, within x̄ ± 2s). About 99.7% of all observations (almost the entire data set) will fall within three standard deviations of the mean (that is, within x̄ ± 3s).
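The 68%, 95%, and 99.7% figures can be checked numerically. This sketch assumes that "bell shaped" is modeled by an exact normal distribution, which real data only approximate; `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```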
Chebyshev's Theorem
At least 75% of the data values must be within z = 2 standard deviations of the mean. At least 89% of the data values must be within z = 3 standard deviations of the mean. At least 94% of the data values must be within z = 4 standard deviations of the mean.
_____ can be used to make statements about the proportion of data values that must be within a specified number of standard deviations of the mean, regardless of the shape of the distribution.
Chebyshev's Theorem
Chebyshev's Theorem:
For a distribution of any shape (symmetric, skewed, bimodal, etc.), the proportion of observations that lie within k standard deviations of the mean is at least 1 − 1/k², for k > 1.
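The 75%, 89%, and 94% figures quoted on the other Chebyshev cards follow from evaluating 1 − 1/k² at k = 2, 3, and 4 and rounding. A small sketch (the function name is illustrative):

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2


for k in (2, 3, 4):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%}")
```

Unlike the empirical rule, this bound makes no assumption about the shape of the distribution, which is why it is a weaker guarantee.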
interquartile range
IQR = Q3 - Q1
_____ can be used to determine the percentage of data values that must be within one, two, and three standard deviations of the mean for data having a bell-shaped distribution.
The empirical rule
For Chebyshev's theorem to hold, what should be the value of z?
z should be greater than 1
A box plot is a graphical representation of data that is based on _____.
a five-number summary
The covariance is
a measure of the linear association between two variables.
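As an illustration, the sample covariance can be computed by hand. This sketch assumes the usual sample formula, s_xy = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n − 1), which the cards do not state explicitly; the function name and the data are made up for the example.

```python
from statistics import fmean


def sample_covariance(x, y):
    """Sample covariance of two equal-length lists, using an n - 1 divisor."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    x_bar, y_bar = fmean(x), fmean(y)
    total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return total / (len(x) - 1)


# y rises with x, so the covariance is positive.
print(sample_covariance([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
# y falls as x rises, so the covariance is negative.
print(sample_covariance([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))
```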
Positive values of covariance indicate _____.
a positive relation between the x and y variables
A probability near one indicates
an event is almost certain to occur.
A probability near zero indicates
an event is quite unlikely to occur
Two descriptive measures of the relationship between two variables are
covariance and correlation coefficient
A box plot is a
graphical summary of data that is based on a five-number summary.
When the data is positively skewed, what is the relationship between the mean and the median? If the data is negatively skewed, is skewness positive, zero or negative?
The mean is greater than the median. The skewness is negative.
Box plots provide another way to
identify outliers
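One common way a box plot flags outliers is with fences placed 1.5 × IQR below Q1 and above Q3. The cards do not state this rule, so the 1.5 multiplier and the "inclusive" quartile convention below are assumptions, and `iqr_outliers` is a hypothetical helper name.

```python
import statistics


def iqr_outliers(data, multiplier=1.5):
    """Return values lying outside Q1 - m*IQR and Q3 + m*IQR."""
    # Assumption: "inclusive" quartiles; other conventions shift the fences slightly.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - multiplier * iqr
    upper_fence = q3 + multiplier * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 50]))
```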
Correlation
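is a measure of linear association and not necessarily causation. r = s_xy / (s_x · s_y), where s_xy is the sample covariance and s_x and s_y are the sample standard deviations of x and y.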
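The formula for r can be sketched as code: divide the sample covariance by the product of the two sample standard deviations. The function name and the data are illustrative only.

```python
from statistics import fmean, stdev


def pearson_r(x, y):
    """Sample correlation coefficient r = s_xy / (s_x * s_y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    x_bar, y_bar = fmean(x), fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    return s_xy / (stdev(x) * stdev(y))


# A perfectly decreasing line gives r close to -1.
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))
# A noisier increasing pattern gives a positive r below +1.
print(pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

Because the covariance is divided by both standard deviations, r has no units and always falls between -1 and +1.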
Probability
is a numerical measure of the likelihood that an event will occur.
The interquartile range of a data set
is the difference between the third quartile and the first quartile.
Probability theory
is the mathematical framework by which we can make scientifically sound claims about a process (or population) using a sample
Correlation Coefficient: Just because two variables are highly correlated,
it does not mean that one variable is the cause of the other.
The coefficient of variation indicates how large the standard deviation is relative to the _____.
mean
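A short sketch of the coefficient of variation, assuming the common definition (s / x̄) × 100%, which the card implies but does not write out. The data values are made up for illustration.

```python
from statistics import fmean, stdev


def coefficient_of_variation(data):
    """Sample standard deviation expressed as a percentage of the mean."""
    mean = fmean(data)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined when the mean is 0")
    return stdev(data) / mean * 100


cv = coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9])
print(f"CV = {cv:.1f}% of the mean")
```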
For data skewed to the left, the skewness is _____
negative
Negative values indicate a
negative relationship
The range
of a data set is the difference between the largest and smallest data values. Range = Largest value - Smallest value
Covariance: Positive values indicate a
positive relationship
The range is the
simplest measure of variability.
An important measure of the shape of a distribution is called
skewness
If a graph is moderately skewed to the left,
skewness is negative and mean will usually be less than the median
If a graph is moderately skewed to the right,
skewness is positive and the mean will usually be greater than the median
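The sign of the skewness and the mean-versus-median pattern can be checked on a small data set. The cards give no skewness formula, so this sketch assumes one common sample formula, n / ((n − 1)(n − 2)) · Σ((xᵢ − x̄) / s)³; the data are invented for illustration.

```python
from statistics import fmean, median, stdev


def sample_skewness(data):
    """Adjusted sample skewness (an assumed formula; others exist)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 values")
    x_bar, s = fmean(data), stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in data)


right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 10]    # long tail to the right
left_skewed = [-x for x in right_skewed]       # mirror image, tail to the left

for name, data in (("right", right_skewed), ("left", left_skewed)):
    print(name, round(sample_skewness(data), 2), fmean(data), median(data))
```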
The variance is equal to the _____
squared value of the standard deviation
Variance is in terms of squared units, which is hard to interpret. To get back to the original units we take the (positive) square root of s². This gives the
standard deviation
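The relationship between s² and s can be seen with Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n − 1) divisor. The data values are made up for illustration.

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

s_squared = statistics.variance(data)  # sample variance, in squared units
s = statistics.stdev(data)             # sample standard deviation, in original units

print(f"variance s^2 = {s_squared:.4f}")
print(f"standard deviation s = {s:.4f}")
print("s equals sqrt(s^2):", math.isclose(s, math.sqrt(s_squared)))
```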
Correlation Coefficient: Values near -1 indicate a
strong negative linear relationship
Correlation Coefficient: Values near +1 indicate a
strong positive linear relationship
sample variance
s²
A data value with a z-score less
than -3 or greater than +3 might be considered an outlier.
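A minimal sketch of this z-score rule, assuming the usual definition z = (x − x̄) / s with the sample standard deviation. The function names and the data set are hypothetical.

```python
from statistics import fmean, stdev


def z_scores(data):
    """Standardized value (x - mean) / s for each observation."""
    x_bar, s = fmean(data), stdev(data)  # stdev is 0 if all values are equal
    return [(x - x_bar) / s for x in data]


def z_score_outliers(data, threshold=3.0):
    """Values whose z-score is below -threshold or above +threshold."""
    return [x for x, z in zip(data, z_scores(data)) if abs(z) > threshold]


data = [10.0] * 19 + [100.0]   # nineteen typical values and one extreme value
print(z_score_outliers(data))
```

In very small samples no value can reach a z-score beyond ±3, so the rule is only useful with a reasonable number of observations.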
A key to the development of a box plot is
the computation of the median and the quartiles Q1 and Q3.
The interquartile range is the range for
the middle 50% of the data.
The interquartile range overcomes
the sensitivity to extreme data values
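The contrast between the range and the interquartile range can be illustrated by replacing one value with an extreme one. This sketch assumes the "inclusive" quartile convention from Python's `statistics` module; the data and function names are invented for the example.

```python
import statistics


def value_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)


def interquartile_range(data):
    """Q3 - Q1, using an assumed 'inclusive' quartile convention."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1


typical = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_extreme = [1, 2, 3, 4, 5, 6, 7, 8, 90]   # largest value replaced by 90

print(value_range(typical), value_range(with_extreme))
print(interquartile_range(typical), interquartile_range(with_extreme))
```

The range changes sharply when the largest value changes, while the IQR of the middle 50% stays the same.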
The z-score is often called
the standardized value.