STA 215 Section 3.1-3.6 (Part 1) Descriptive Statistics: The five-number summary and percentiles
Of the following options, which ones include 50% of the data?
Median to maximum; first quartile to third quartile; minimum to median
Numerical Summaries of Center
NSC
Skewed Right (a.k.a. positively skewed)
if most of the data occurs on the left (lower) side (i.e., most of the data values are small), with a long tail on the right (upper) side.
Bell-Shaped
if the data set is unimodal and roughly symmetric, with its peak at the center, so that the graph looks similar to a bell.
Steps for calculating standard deviation
1.) Calculate the sample mean, x-bar. 2.) For each observation, calculate the difference between the data value and the mean. 3.) Square each difference calculated in Step 2. 4.) Sum the squared differences calculated in Step 3, and then divide the sum by n - 1. The answer for this step is called the variance. 5.) Take the square root of the variance calculated in Step 4.
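The five steps above can be sketched in Python as follows. This is a minimal illustration, not course material: the function name `sample_standard_deviation` and the data values are made up for the example.

```python
import math


def sample_standard_deviation(values):
    """Compute the sample standard deviation by following the five steps."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to divide by n - 1")
    mean = sum(values) / n                         # Step 1: sample mean
    deviations = [x - mean for x in values]        # Step 2: value minus mean
    squared = [d ** 2 for d in deviations]         # Step 3: square each difference
    variance = sum(squared) / (n - 1)              # Step 4: variance
    return math.sqrt(variance)                     # Step 5: square root


# Hypothetical data set, chosen only so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]
s = sample_standard_deviation(data)
print(round(s, 3))
```

For this data the mean is 5, the squared differences sum to 32, and the variance is 32/7, so the standard deviation is the square root of 32/7.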
What percent of data is located below the median?
50%
If I told you that you scored in the 90th percentile on the ACT math portion, what percent of students who took the ACT did you score higher than?
90%
In a symmetric distribution, the mean will be
About equal to the median
Histograms are useful for identifying shape, but they are also capable of identifying the other three descriptions of a distribution: _________, _________, and ________.
Center, spread, and outliers
True or false: The maximum is the largest value, unless there are outliers, in which case it is the largest number in the data set that is not an outlier.
False
True or false: The minimum is the smallest value, unless there are outliers, in which case it is the smallest number in the data set that is not an outlier.
False
Variability
How much variation (spread) is there in the data? Measured by the range, the IQR, and the standard deviation (SD).
The mean tends to be _______________________ by outliers and skewness. However, it should be noted that if there are outliers at both ends of a distribution, they will tend to "cancel out" their effect on the mean.
Impacted
In a left skewed distribution, the mean will be
Less than the median
Descriptive Statistics
Quantitative data
Deviation
The signed distance of a data value from the mean. The deviations always sum to zero. Example with x-bar = 2.71: 0 - 2.71 = -2.71, 1 - 2.71 = -1.71, 2 - 2.71 = -0.71.
Median (NSC)
The median locates the center of the data and splits it in half.
Histograms
These are essentially bar graphs for quantitative data. Good for showing shape.
Outliers
Values that lie far away from the rest of the data. If present, give their count and location (identify which observations they are, not just how many there are).
Measures of Variability
Variance and standard deviation, IQR, and range
Shape
What does the data look like?
Boxplot
a graphical display of the five-number summary. Good for identifying outliers.
Variance
a measure of spread. It is the square of the standard deviation.
standard deviation
a measure of the spread of values. Another way to think of it is as roughly the average distance that values fall from the mean. The SD is the square root of the variance.
Median
a value such that at least half of the data values are less than or equal to the median and at least half of the data values are greater than or equal to the median. We can think of the median as splitting the ordered data set in half.
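As a rough sketch, the "split the ordered data in half" idea can be written in Python like this. The function name `median` and the sample values are made up for illustration, and averaging the two middle values when n is even is a common convention assumed here, not something the definition above requires.

```python
def median(values):
    """Return the middle of the ordered data."""
    ordered = sorted(values)          # the data must be ordered first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd n: the single middle value
    # Even n: average the two middle values (assumed convention).
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = median([7, 1, 3, 9, 5])   # ordered: 1 3 5 7 9
even_median = median([7, 1, 3, 9])     # ordered: 1 3 7 9
print(odd_median, even_median)
```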
The median tends to be a ________________ measure of center in the case of skewness or outliers.
better
A simple ________ does not show outliers.
boxplot
We calculate the deviation of a data value as: ___________________________________, where xi is the data value and x-bar is the mean.
deviation for xi = xi - x-bar
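A short Python sketch of the deviation formula, using made-up data. It also checks a standard fact: deviations from the mean sum to zero (up to floating-point rounding), which is why they are squared before being averaged into a variance.

```python
# Hypothetical data set with mean 3, used only for illustration.
data = [1, 2, 3, 6]

x_bar = sum(data) / len(data)              # sample mean
deviations = [x - x_bar for x in data]     # deviation for xi = xi - x-bar
total = sum(deviations)                    # sums to zero, up to rounding

print(deviations, total)
```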
Interquartile Range (IQR)
the distance from Q1 to Q3: IQR = Q3 - Q1. It spans the middle 50% of the data.
Range
the distance from the minimum to the maximum: range = max - min.
The mean will be approximately ________________ to the median in a symmetric distribution.
equal
The mean will be _______________ than the median in a right skewed distribution.
greater
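A small check of this in Python, using the standard library `statistics` module. The data set is made up: most values are small, with one large value forming a long right tail.

```python
import statistics

# Hypothetical right-skewed data: small values plus a long right tail.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]

mean_value = statistics.mean(right_skewed)      # pulled toward the tail
median_value = statistics.median(right_skewed)  # stays near the bulk of the data

print(mean_value, median_value)
```

Here the single large value pulls the mean up to 4.75 while the median stays at 3.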
Skewed Left (a.k.a. negatively skewed)
if most of the data occurs on the right (upper) side (i.e. most of the data values are large) with a long tail on the left, lower side
Unimodal Distribution
if the data set has a single peak
Bimodal Distribution
if the data set has two distinct peaks separated by a valley.
Symmetric Distribution
if when you draw a line at the center of the distribution the two halves are mirror images. Real data is almost never perfectly symmetric but is often roughly symmetric.
First Quartile
is a value such that at least 25% of the data values are less than or equal to Q1 and at least 75% of the data values are greater than or equal to Q1. We can think of Q1 as splitting the lower 50% of the ordered data set in half.
Third Quartile
is a value such that at least 75% of the data values are less than or equal to Q3 and at least 25% of the data values are greater than or equal to Q3. We can think of Q3 as splitting the upper 50% of the ordered data set in half.
Five-Number Summary
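is the minimum (abbreviated min), the first quartile (denoted Q1), the median (abbreviated med), the third quartile (denoted Q3), and the maximum (abbreviated max). Each of the four intervals between consecutive values (min to Q1, Q1 to median, median to Q3, Q3 to max) contains about 25% of the data.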
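One way to compute a five-number summary in Python is sketched below. It follows the "split each half in half" idea from the quartile definitions: Q1 is the median of the lower half and Q3 is the median of the upper half. Leaving the median out of both halves when n is odd is an assumed convention; textbooks and software differ on this, so results may not match a calculator exactly. The function names and data are made up for illustration.

```python
def _median_of_sorted(ordered):
    """Median of an already ordered, non-empty list."""
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]               # values below the median position
    upper = ordered[half + (n % 2):]     # skip the median itself when n is odd
    return (
        ordered[0],
        _median_of_sorted(lower),
        _median_of_sorted(ordered),
        _median_of_sorted(upper),
        ordered[-1],
    )


# Hypothetical data set.
summary = five_number_summary([1, 3, 5, 7, 9, 11, 13, 15])
minimum, q1, med, q3, maximum = summary
iqr = q3 - q1                    # interquartile range, Q3 - Q1
data_range = maximum - minimum   # range, max - min
print(summary, iqr, data_range)
```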
The larger the area the ________ the variation of the data
larger
The mean will be _______________ than the median in a left skewed distribution
less
Describing a distribution (graph)
looking at the movement
The ________ of a sample is denoted by x-bar and is calculated as: x-bar = (x1 + x2 + ... + xn) / n, where n is the sample size, x1 is the first value, x2 is the second value, and so on.
mean
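The formula translates directly into Python: add up the values, then divide by the sample size. The function name `sample_mean` and the data are made up for illustration.

```python
def sample_mean(values):
    """x-bar = (x1 + x2 + ... + xn) / n"""
    n = len(values)
    if n == 0:
        raise ValueError("mean of an empty sample is undefined")
    return sum(values) / n


# Hypothetical sample of four values.
x_bar = sample_mean([2, 4, 6, 8])
print(x_bar)
```

Note that the whole sum is divided by n, not just the last value.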
The mean is _________ robust in the presence of outliers or skewness.
not
The mean does _________ require the data to be ordered, but the median does require the data to be ______________.
not; ordered
Modified boxplot shows ________
outliers
The median is more ________________ in the presence of outliers or skewness
robust (not affected by outliers and skewness)
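A quick illustration in Python with made-up numbers: replacing one value with an extreme outlier changes the mean a great deal, while the median does not move at all in this example.

```python
import statistics

# Hypothetical data; the second list replaces 16 with an extreme value.
original = [10, 12, 13, 15, 16]
with_outlier = [10, 12, 13, 15, 160]

mean_shift = statistics.mean(with_outlier) - statistics.mean(original)
median_shift = statistics.median(with_outlier) - statistics.median(original)

print(mean_shift, median_shift)
```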
Percentiles
The pth percentile of a data set is a value such that at least p% of the data values are less than or equal to it.
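A minimal Python sketch of this definition picks the smallest data value that has at least p% of the data at or below it (often called the nearest-rank method). This is only one convention and is an assumption here; many calculators and software packages interpolate between values instead, so their answers can differ slightly. The function name `percentile` and the scores are made up.

```python
import math


def percentile(values, p):
    """Smallest data value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("percentile of an empty data set is undefined")
    if not 0 < p <= 100:
        raise ValueError("p must be greater than 0 and at most 100")
    ordered = sorted(values)
    n = len(ordered)
    rank = math.ceil(p * n / 100)    # how many values must be at or below
    return ordered[rank - 1]         # convert the 1-based rank to an index


# Hypothetical test scores.
scores = [15, 20, 35, 40, 50, 55, 60, 70, 85, 90]
p90 = percentile(scores, 90)   # 9 of the 10 scores are at or below this value
p25 = percentile(scores, 25)
print(p90, p25)
```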
The smaller the boxplot, the __________ the variability of the data.
smaller
s
standard deviation of a sample (statistic)
Mean (NSC)
the arithmetic average of the data values
Maximum
the largest value in a data set
Minimum
the smallest value in a data set
s^2
variance of a sample (statistic)
Center
What is the typical value? Measured by the mean or the median.
Is it possible for a variable to have a distribution that is both unimodal and left skewed?
yes