Stats Test 2
Percentile of data value =
(number of values less than this data value ÷ total number of values in the data set) × 100%
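The percentile rule above can be sketched in Python as follows. The function name `percentile_of` and the test scores are hypothetical. This is only one convention: some textbooks and software count half of any tied values, or round to the nearest whole percentile, so results can differ slightly.

```python
def percentile_of(value, data):
    """Percentile of `value` in `data`:
    (number of values less than `value` / total number of values) * 100."""
    below = sum(1 for x in data if x < value)
    return below / len(data) * 100


scores = [55, 62, 70, 70, 78, 81, 85, 90, 94, 99]  # hypothetical test scores
print(percentile_of(85, scores))  # 6 of the 10 values are below 85 -> 60.0
```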
s² = variance
(standard deviation)²
conditions for a normal distribution
- Most data values are clustered near the mean, giving the distribution a well-defined single peak.
- Data values are spread evenly around the mean, making the distribution symmetric.
- Larger deviations from the mean become increasingly rare, producing the tapering tails of the distribution.
- Individual data values result from a combination of many different factors, such as genetic and environmental factors.
68-95-99.7 Rule
For a normal distribution:
- About 68% (68.3% more precisely), or just over 2/3, of the data points fall within 1 standard deviation of the mean.
- About 95% (95.4% more precisely) of the data points fall within 2 standard deviations of the mean.
- About 99.7% of the data points fall within 3 standard deviations of the mean.
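As a check on the rule, the fraction of a normal distribution that lies within k standard deviations of the mean equals erf(k/√2), which Python's `math.erf` can evaluate. A minimal sketch (the helper name `fraction_within` is hypothetical):

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean.
    For a normal distribution this equals erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k) * 100:.1f}%")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```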
Normal Distribution
- The distribution is single-peaked.
- The distribution is symmetric around the single peak.
- The distribution is spread out in a way that makes it resemble the shape of a bell.
- The peak corresponds to the mean, median, and mode of the distribution.
- The variation can be characterized by the standard deviation of the distribution.
When does the Central Limit Theorem apply?
1. It applies for suitably large sample sizes; a common threshold is n > 30.
2. It applies to variables with any distribution (not necessarily a normal distribution).
Central Limit Theorem
1. The distribution of means will be approximately a normal distribution for large sample sizes.
2. The mean of the distribution of means approaches the population mean, µ, for large sample sizes.
3. The standard deviation of the distribution of means approaches σ/√n for large sample sizes, where σ is the standard deviation of the population.
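A small simulation can make the theorem concrete. This sketch assumes a population that follows an exponential distribution with rate 1 (strongly right-skewed, with µ = 1 and σ = 1), draws 5000 random samples of size n = 36, and records each sample mean. The sample size, number of samples, and seed are arbitrary illustrative choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Population: exponential distribution with rate 1 (right-skewed, not normal).
# Its population mean is mu = 1 and its population standard deviation is sigma = 1.
MU, SIGMA = 1.0, 1.0
n = 36              # sample size (above the common n > 30 threshold)
num_samples = 5000  # how many samples we draw

# Mean of each random sample of size n.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

print(statistics.fmean(sample_means))  # close to mu = 1
print(statistics.stdev(sample_means))  # close to sigma / sqrt(n)
print(SIGMA / n ** 0.5)                # sigma / sqrt(n) = 1/6, about 0.167
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is far from normal.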
unimodal
A distribution with one mode
trimodal
A distribution with three modes
bimodal
A distribution with two modes
Normal distribution
A specific category of distributions that are symmetric and bell shaped with a single peak. The peak corresponds to the mean, median, and mode of such a distribution.
Boxplot
Draw a number line that spans all the values in the data set. Enclose the values from the lower quartile to the upper quartile in a box. Draw a line through the box at the median. Add "whiskers" extending to the low and high values.
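The five values a boxplot is built from (low, Q₁, median, Q₃, high) can be computed with a short sketch. It assumes the quartile convention described in these notes (medians of the lower and upper halves, excluding the middle value when the number of data points is odd); spreadsheets and statistics packages often use other quartile rules, so their Q₁ and Q₃ may differ. The function names are hypothetical.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if the count is even)."""
    s = sorted(values)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def five_number_summary(values):
    """(low, Q1, median, Q3, high), with Q1 and Q3 computed as medians of the
    lower and upper halves, excluding the middle value when the count is odd."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]               # values below the middle
    upper = s[half + len(s) % 2:]  # skip the middle value if the count is odd
    return s[0], median(lower), median(s), median(upper), s[-1]


print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
# (1, 2.5, 5, 7.5, 9)
```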
In a recent year, the 952 players in a certain sports league had salaries with the characteristics below. The mean was $4,596,061. The median was $1,525,000. The salaries ranged from a low of $508,000 to a high of $30,000,000. a. Describe the shape of the distribution of salaries. Is the distribution symmetric? Is it left-skewed? Is it right-skewed?
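It is right-skewed, not symmetric. The mean ($4,596,061) is much greater than the median ($1,525,000) because a relatively small number of very high salaries, up to $30,000,000, pull the mean to the right, while the lowest salary ($508,000) is not far below the median.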
What does the area under the normal distribution curve represent? What is the total area under the normal distribution curve?
The area that lies under the normal distribution curve corresponding to a range of values on the horizontal axis is the total relative frequency of those values. Because the total relative frequency for all values must be 1 (100%), the total area under the normal distribution curve must equal 1 (100%)
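To see numerically that the total area is 1, this sketch approximates the area under the standard normal curve with the trapezoid rule. Integrating from -10 to 10 stands in for the whole horizontal axis (the area outside that interval is negligible); the function names and step count are illustrative assumptions.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Height of the normal distribution curve at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def area(a, b, steps=10_000):
    """Approximate the area under the standard normal curve from a to b (trapezoid rule)."""
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a) + normal_pdf(b))
    total += sum(normal_pdf(a + i * h) for i in range(1, steps))
    return total * h


print(area(-10, 10))  # about 1.0: the total area (100%)
print(area(-1, 1))    # about 0.683: relative frequency of values within 1 SD
```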
Suppose that many random samples of size n for a variable are taken and the distribution of means of each sample is recorded. What is true regarding the Central Limit Theorem?
- The standard deviation of the distribution of means approaches σ/√n, where σ is the standard deviation of the population.
- The distribution of means will be approximately a normal distribution.
- The mean of the distribution of means approaches the population mean, µ.
Does this make sense? The distribution of grades was left-skewed, but the mean, median, and mode were all the same.
This does not make sense. In a left-skewed distribution, the mean and median lie to the left of the mode (typically mean < median < mode), so they cannot all be the same.
My professor graded the final exam on a curve, and she gave a grade of A to anyone who had a standard score of 2 or more. Does this make sense? Why?
This makes sense. A standard score of 2 corresponds to roughly the 97.7th percentile, so (assuming the scores are approximately normally distributed) only about the top 2.3% of the class would receive an A. That is a strict curve, but a possible one.
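Python's `statistics.NormalDist` gives the percentile for a standard score directly through its `cdf` method. This sketch assumes the exam scores are approximately normally distributed, which is what makes the z-to-percentile conversion valid.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

z = 2
percentile = std_normal.cdf(z) * 100  # percent of values below a standard score of 2
share_with_a = 100 - percentile       # percent at or above z = 2

print(f"{percentile:.1f}th percentile")           # 97.7th percentile
print(f"{share_with_a:.1f}% of scores get an A")  # 2.3% of scores get an A
```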
weighted mean
Accounts for variations in the relative importance of data values. Each data value is assigned a weight, and the weighted mean is: weighted mean = sum of (each data value × its weight) / sum of all weights
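A minimal sketch of the weighted mean formula. The function name and the data are hypothetical: a GPA where each course grade is weighted by its credit hours.

```python
def weighted_mean(values, weights):
    """Sum of (each data value x its weight) divided by the sum of all weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical GPA: grade points weighted by course credit hours.
grades = [4.0, 3.0, 2.0]  # A, B, C
credits = [3, 4, 3]
print(weighted_mean(grades, credits))  # (12 + 12 + 6) / 10 = 3.0
```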
upper quartile (third quartile) or Q₃
Divides the lowest 3/4 of a data set from the upper 1/4. It is the median of the data values in the upper half of a data set (exclude the middle value in the data set if the number of data points is odd).
All data values in a uniform distribution have the same frequency, whereas a distribution with one or more modes
has one or more values that occur most frequently.
outlier
in a data set is a value that is much higher or lower than almost all others
mean
mean = sum of values/total # of values
range
of a set of data values is the difference between its highest and lowest data values: range = highest value (max) - lowest value (min)
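The range formula in code, applied to the lowest, median, and highest salaries from the sports-league problem above (the range depends only on the max and min). The function name `data_range` is hypothetical.

```python
def data_range(values):
    """Range = highest value (max) - lowest value (min)."""
    return max(values) - min(values)


salaries = [508_000, 1_525_000, 30_000_000]  # low, median, and high salaries
print(data_range(salaries))  # 29492000
```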
An ______ in a data set is a value that is much higher or lower than almost all other values. An _______ can change the mean of a data set but has little or no effect on the median or mode.
outlier, outlier
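A quick comparison shows why the mean, not the median or mode, is the measure an outlier moves. The two small data sets are hypothetical; the second replaces the largest value with an outlier.

```python
import statistics

data = [2, 3, 3, 4, 5]
with_outlier = [2, 3, 3, 4, 50]  # hypothetical: the 5 replaced by an outlier

for name, values in (("original", data), ("with outlier", with_outlier)):
    # mean, median, mode of each data set
    print(name, statistics.fmean(values), statistics.median(values), statistics.mode(values))
# original 3.4 3 3
# with outlier 12.4 3 3
```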
The standard deviation is approximately related to the range of a distribution by the
range rule of thumb
range rule of thumb
standard deviation (s) ≈ range/4
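The range rule of thumb gives only a rough estimate. This sketch compares it with the sample standard deviation from Python's `statistics.stdev` on a small hypothetical data set; the two need not be close, especially for small samples. The function name `range_rule_estimate` is hypothetical.

```python
import statistics


def range_rule_estimate(values):
    """Range rule of thumb: s is approximately range / 4."""
    return (max(values) - min(values)) / 4


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set
estimate = range_rule_estimate(data)
actual = statistics.stdev(data)  # sample standard deviation s
print(f"estimate {estimate:.2f}, actual {actual:.2f}")  # estimate 1.75, actual 2.14
```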
Standard deviation (s) =
√[ sum of (deviations from the mean)² / (total number of data values - 1) ]: divide the sum of the squared deviations by one less than the number of data values, then take the square root.
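The standard deviation recipe above, written out step by step in Python and checked against `statistics.stdev`, which also divides by n - 1. The data set and the function name `sample_std_dev` are hypothetical. The intermediate `variance` is s², matching the variance card.

```python
import math
import statistics


def sample_std_dev(values):
    """s = sqrt( sum of (value - mean)^2 / (n - 1) )."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    variance = sum(squared_deviations) / (n - 1)  # s^2
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set
print(sample_std_dev(data))      # about 2.138
print(statistics.stdev(data))    # same value from the standard library
```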
relative frequency
is represented by the area that lies under the normal distribution curve corresponding to a range of values on the horizontal axis. Because the total relative frequency must be 1, the total area under the normal distribution curve must equal 1, or 100%.
lower quartile (first quartile) or Q₁
The median of the data values in the lower half of a data set (exclude the middle value in the data set if the number of data points is odd). It divides the lowest 1/4 of a data set from the upper 3/4.
median
the middle value in the sorted data set (or halfway between the two middle values if the # of values is even)
mode
the most common value (or group of values) in a data set.
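Python's `statistics` module can find the median and mode. `statistics.multimode` returns every value tied for most common, which is handy when a data set has more than one mode. The data are hypothetical.

```python
import statistics

data = [3, 7, 7, 2, 9, 4, 4, 7, 4]      # hypothetical data set
print(sorted(data))                     # [2, 3, 4, 4, 4, 7, 7, 7, 9]
print(statistics.median(data))          # 4 (the 5th of 9 sorted values)
print(statistics.median([1, 2, 3, 4]))  # 2.5 (halfway between 2 and 3)
print(statistics.multimode(data))       # [7, 4] (each occurs three times)
```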
standard score
the number of standard deviations a data value lies above or below the mean
middle quartile (second quartile) or Q₂
the overall median
All data values in a ______ ______ have the same frequency, whereas a distribution with one or more modes has one or more values that occur most frequently.
uniform distribution
unusual values
values that are more than 2 standard deviations from the mean.
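A sketch that flags unusual values by computing each value's standard score and keeping those with |z| > 2. It uses the sample standard deviation, and the data and the function name `unusual_values` are hypothetical. An extreme value also inflates the standard deviation, which can hide unusual values in very small data sets.

```python
import statistics


def unusual_values(values):
    """Values more than 2 standard deviations from the mean (|z| > 2)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [x for x in values if abs((x - mean) / sd) > 2]


data = [10, 12, 11, 13, 12, 11, 10, 12, 30]  # hypothetical; 30 looks unusual
print(unusual_values(data))  # [30]
```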
Simpson's paradox
when a set of data gives different results for each of several group comparisons than it does when the groups are taken together, this phenomenon is known as _____ _______.
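A small hypothetical example of Simpson's paradox: treatment A has the higher success rate within each group, yet treatment B has the higher rate when the groups are combined, because A was mostly tried in the harder group 2. The counts and the function name `overall_rate` are invented for illustration.

```python
# Hypothetical (successes, attempts) for two treatments, split into two groups.
results = {
    "A": {"group 1": (9, 10), "group 2": (30, 100)},
    "B": {"group 1": (80, 100), "group 2": (2, 10)},
}


def overall_rate(groups):
    """Success rate after combining (pooling) all groups."""
    successes = sum(s for s, _ in groups.values())
    attempts = sum(n for _, n in groups.values())
    return successes / attempts


for treatment, groups in results.items():
    by_group = ", ".join(f"{g}: {s / n:.0%}" for g, (s, n) in groups.items())
    print(f"{treatment}: {by_group}, overall: {overall_rate(groups):.0%}")
# A: group 1: 90%, group 2: 30%, overall: 35%
# B: group 1: 80%, group 2: 20%, overall: 75%
```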
z=standard score
z = (data value - mean) / standard deviation
Computing Standard Scores
z = standard score = (data value - mean) / standard deviation
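The standard score formula as a one-line Python function. The function name `standard_score` and the exam mean of 75 and standard deviation of 5 are hypothetical.

```python
def standard_score(value, mean, sd):
    """z = (data value - mean) / standard deviation."""
    return (value - mean) / sd


print(standard_score(85, 75, 5))  # 2.0: 2 standard deviations above the mean
print(standard_score(65, 75, 5))  # -2.0: 2 standard deviations below the mean
```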