stats ch. 1-ch. 5 sec. 2
explain how to calculate the first quartile q1 and the third quartile q3
you find the median of the entire data set, then q1 is the median of the lower half and q3 is the median of the upper half
the box in a box plot represents what percentage of the data
50%
How can we use IQR to determine outliers?
An observations is an outlier it if is more than 1.5*IQR above the third quartile of below the first quartile.
what is the difference between a categorical variable and a quantitative variable
categorical: gender, eye color qualitative (categories) quantitative: age, height (numerical)
what two types of charts/graphs are usually most appropriate for categorical data
bar graph, pie chart
what type of graph gives the picture of the five number summary
box plot
explain how to calculate the mean
add the data values then divide by the sample size
what is meant by a variable
a characteristic of people or things
explain how to find the median
put the data values in order from smallest to largest from there find the median if there is an odd number of values if there is an even find the average of the two
Is standard deviation resistant or nonresistant to extreme observations? Explain.
s, like the mean, is not resistant. Strong skewness or a few outliers can make s very large.
when describing the overall pattern of a distribution of a quantitative variable what 3 features should you mention
shape center spread
what is meant by exploratory data analysis
Exploratory data analysis uses graphs and numerical summaries to describe the variables in a data set and the relations among them.
what is meant by the distribution of the variable
The distribution of a variable tells us what values the variable takes and how often it takes these values.
Explain why the median is resistant to extreme observations, but the mean is nonresistant.
The median is resistant because it is only based on the middle one or two observations of the ordered list. The mean is sensitive to the influence of a few extreme observations. Even if there are no outliers a skewed distribution will pull the mean toward the long tail.
When does standard deviation equal zero?
The standard deviation = 0 only when there is no spread. This happens only when all observations have the same value. Otherwise s > 0. As the observations become more spread out about their mean, s gets larger.
what does standard deviation measure? How do we calculate it?
The standard deviation is a measure of spread. It measures spread around the mean and should only be used when the mean is chosen as the measure of center.
What is the relationship between variance and standard deviation?
The standard deviation s is the square root of the variance s2.
what is the IQR based "rule of thumb"
a potential outliers is a data value that is a distance of more than 1.5 interquartile ranges below the first quartile or above the third quartile IQR=Q3-Q1. QI-1.5(1QR) AND Q3+1.5(IQR)
informally define an outlier
data value that is either much smaller or much larger than the rest of the data
list four graphs that are used for quantitative data
dot plot, histogram, stem and leaf plot
a data set always has a mode
false
what is a simple way to describe the center of a distribution of a quantitative variable
in a normal distribution the mean is the center. in a skewed it is the median
what information is lost when you choose a histogram over a dot plot or stem plot
individual data points
in a skewed distribution which will be farther towards the long tail the mean or the median
mean
which measure is most appropriate for a highly skewed distribution the mean or the median?
median
explain why the median is resistant to extreme observations, but the mean is non resistant
median will always be the middle value. outliers affect the mean
what is the five number summary
minimum, 1st quartile, median, 3rd quartile, maximum
in statistics what are the most common measures of the center
mode and mean
can the standard deviation ever be negative
no
can the value of the mean be identified from a box plot
no
is the standard deviation resistant or nonresistant to extreme observations
nonresistant, outliers do affect it
the mean and the median are close together if the distribution is what?
normal distribution
how do you describe the spread of a distribution of a quantitative variable
standard deviation
explain why it might be better to use the IQR instead of the range to describe the spread of the distribution
the IQR is not subject to peculiarities of the data set and it is not sensitive to outliers
what is the definition of the range
the distance spanned by the entire data set
what does standard deviation measure
the measure of the variance or spread
the middle line of a box plot represents
the median
what is the interquartile range
the range of the middle 50% of the data
the box of box plot contains about half the data
true
when a distribution is strongly skewed to the right the 5 number summer is a better measure of the center and spread than the mean and standard deviation
true
when a distribution is strongly skewed to the right, the median is less than the mean
true
when is it better to use the five number summary versus the mean and standard deviation
when the distribution is skewed left or skewed right