3112 lab 5 THE NORMAL DISTRIBUTION AND Z SCORES
Normal distribution, Chebyshev's theorem, and z scores: determining the percentage of data that falls in a given area
Once we've got our heads around the normal distribution, Chebyshev's theorem and z scores, we can use them to determine the percentage of our data that falls in a given area of our distribution. In order to do that, we need the cumulative z table, which I have posted on Canvas. The cumulative z table tells us what percentage of the distribution falls to the left of a given z score. I know that the table looks pretty intimidating, so we'll spend a significant amount of time going over this in class.
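The idea behind the cumulative z table can be sketched in a few lines of Python. This is only an illustration, not the table posted on Canvas: the function name cumulative_z is made up for this example, and it relies on the standard identity that links the normal cumulative distribution to the error function math.erf.

```python
import math

def cumulative_z(z: float) -> float:
    """Proportion of the standard normal distribution to the left of z.

    Hypothetical helper for illustration; it plays the role of one
    lookup in a printed cumulative z table.
    """
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Print the proportion to the left of a few z scores, rounded to four
# decimals the way a printed table usually is.
for z in (-1.0, 0.0, 1.0, 1.96):
    print(f"z = {z:5.2f}  ->  {cumulative_z(z):.4f}")
```

Reading the output works like reading the table: the value for z = 0 is one half, because half of a symmetrical distribution lies to the left of its mean.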
The Normal Distribution
The normal distribution is a bell-shaped, symmetrical distribution in which the mean, median and mode are all equal. If the mean, median and mode are unequal, the distribution will be either positively or negatively skewed.
MAIN POINTS
The normal distribution is a symmetrical, bell-shaped distribution in which the mean, median and mode are all equal. It is a central component of inferential statistics. The standard normal distribution is a normal distribution represented in z scores. It always has a mean of zero and a standard deviation of one. We can use the standard normal table to calculate the area under the curve between any two points.
LEARNING OBJECTIVES
Understand the properties of the normal distribution and its importance to inferential statistics
Convert a raw score to a z score and vice versa
Familiarize yourself with the standard normal table
Convert a z score into a proportion (or percentage) and vice versa
z scores back to raw scores
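To convert a z score back to a raw score, we simply multiply the z score by the standard deviation and add the result to the mean: raw score = mean + z(standard deviation). So if we plug the numbers from our example (a mean of 58, a standard deviation of 5 and a z score of 1) into the formula we get: Raw score = 58 + 1(5) = 63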
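As a minimal sketch, the same conversion can be written as a small Python function. The name raw_from_z is hypothetical and chosen only for this example; the numbers are the mean of 58 and standard deviation of 5 used in these notes.

```python
def raw_from_z(z: float, mean: float, sd: float) -> float:
    """Convert a z score back to a raw score: mean + z * sd."""
    return mean + z * sd

# One standard deviation above the mean: 58 + 1(5)
print(raw_from_z(1, mean=58, sd=5))

# One standard deviation below the mean: 58 + (-1)(5)
print(raw_from_z(-1, mean=58, sd=5))
```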
OVERVIEW
What is a distribution? A distribution is an arrangement of values of a variable showing their observed or theoretical frequency of occurrence. A bell curve showing how the class did on our last exam would be an example of a distribution. All distributions can be characterized by the following two dimensions: 1. Central Tendency—what are the mean, median and mode(s) of the distribution? 2. Variability—all distributions have a variance and standard deviation (they also have a range and IQR, but those are less important in inferential statistics).
The Normal Distribution and the Standard Deviation
When talking about the normal distribution, it's useful to think of the standard deviation as being steps away from the mean. One step to the right or one step to the left is considered one standard deviation away from the mean. Two steps to the left or two steps to the right are considered two standard deviations away from the mean. Likewise, three steps to the left or three steps to the right are considered three standard deviations from the mean. The standard deviation of a dataset is simply the number (or distance) that constitutes a complete step away from the mean.
Normal distribution and Chebyshev's theorem: the percentage of the distribution that falls in a given area equals the probability that a single observation will fall in that area
Fun fact: the percentage of our distribution that falls in a given area is exactly the same as the probability that any single observation will fall in that area. In other words, we know that approximately 34 percent of our data will fall between the mean and one standard deviation above the mean. We can also say that a given observation has a 34 percent chance of falling between the mean and one standard deviation above the mean. Or, to put it another way, if you were to choose an observation at random from our distribution, there is a 34 percent chance that it would come from the area between the mean and one standard deviation above the mean.
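One way to see that "percentage of the distribution" and "probability for a single observation" are the same thing is a quick simulation. The sketch below is an illustration only: it assumes the example distribution from these notes (mean 58, standard deviation 5), draws many random observations, and counts how many land between the mean and one standard deviation above it. The seed and sample size are arbitrary choices.

```python
import random

random.seed(0)            # fixed seed so the run is repeatable
MEAN, SD = 58, 5          # example distribution used in these notes
N = 100_000               # number of simulated observations (arbitrary)

# Draw N observations from a normal distribution.
draws = [random.gauss(MEAN, SD) for _ in range(N)]

# Count the observations between the mean and one SD above the mean.
in_band = sum(MEAN <= x <= MEAN + SD for x in draws)
share = in_band / N

print(f"share between the mean and +1 SD: {share:.3f}")
```

The printed share should land close to .34, and it is both the proportion of simulated data in that area and an estimate of the chance that any one draw falls there.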
KEY TERMS
Normal distribution: a bell-shaped, symmetrical distribution in which the mean, median and mode are all equal.
Z scores (also known as standard scores): the number of standard deviations that a given raw score falls above or below the mean.
Standard normal distribution: a normal distribution represented in z scores. The standard normal distribution always has a mean of zero and a standard deviation of one.
Z Scores
Z scores, which are sometimes called standard scores, represent the number of standard deviations a given raw score is above or below the mean. Sometimes it's helpful to think of z scores as just another unit of measurement. If, for example, we were measuring time, we could express time in terms of seconds, minutes, hours or days. Similarly we could measure distance in terms of inches, feet, yards or miles. We might have to do a little math to convert our data from one unit of measurement to another, but the thing we are measuring remains unchanged. When we work with z scores, we're basically converting our existing data into a new unit of measurement: standard deviation units. All interval/ratio data can be expressed as z scores. We can convert any raw score into a z score by using the following formula: $z = \frac{y - \bar{y}}{s_y}$, where $y$ is the raw score, $\bar{y}$ is the mean and $s_y$ is the standard deviation.
Normal distributions are important due to Chebyshev's Theorem
For a normal distribution, a given number of standard deviations above and/or below the mean will always account for the same amount of area under the curve. Regardless of what a normal distribution looks like or how big or small the standard deviation is, approximately 68 percent of the observations (or 68 percent of the area under the curve) will always fall within one standard deviation of the mean, a span that is two standard deviations wide (one above and one below). Can you guess what proportion falls between the mean and just one standard deviation above it? If you guessed 34 percent, you must be familiar with division (.68/2 = .34). The same logic applies to the area within two standard deviations above and below the mean: for any normal distribution, approximately 95 percent of the observations will fall within this area.
EXAMPLE For our distribution with a mean of 58 and a standard deviation of 5, 68% of the data would be located between 53 and 63. Within this range are all of the data values located within one standard deviation (above or below) of the mean. Furthermore, 95% of the data would fall within two standard deviations of the mean, or in this case between 48 and 68. Finally, 99.7% of the data values would fall between 43 and 73, or within three standard deviations of the mean. The percentages mentioned here make up what some statisticians refer to as the 68%-95%-99.7% rule. These percentages remain the same for all normally distributed data. Strictly speaking, this pattern is usually called the empirical rule; Chebyshev's theorem is the more general result that applies to distributions of any shape and guarantees only minimums (for example, at least 75 percent of observations within two standard deviations of the mean).
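The ranges and percentages in the example can be checked with Python's standard library. This is a sketch under the assumption that the data are exactly normal with a mean of 58 and a standard deviation of 5; the helper name share_within is made up for illustration, while NormalDist and its cdf method come from the statistics module (Python 3.8 or later).

```python
from statistics import NormalDist

# The example distribution from these notes.
dist = NormalDist(mu=58, sigma=5)

def share_within(k):
    """Return (low, high, share) for the area within k SDs of the mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    # Area between two points = area left of high minus area left of low.
    return low, high, dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    low, high, share = share_within(k)
    print(f"within {k} SD: {low:.0f} to {high:.0f} -> {share:.1%}")
```

Changing mu and sigma moves the ranges but leaves the percentages unchanged, which is the point of the rule.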
z score formula
z = (x - mean) / standard deviation. In other words, we just need to subtract the mean from the raw score and divide by the standard deviation. Let's go back to our distribution with a mean of 58 and a standard deviation of 5. We can convert 63 (a raw score) into standard deviation units (z scores) fairly easily: (63 - 58)/5 = 5/5 = 1. Just as one hour is equal to 60 minutes, a raw score of 63 in this distribution is equal to one standard deviation above the mean. The same holds true for observations below the mean: (53 - 58)/5 = -5/5 = -1. In this case, because our answer is negative, we know that 53 falls exactly one standard deviation below the mean. Now suppose we wanted to convert our mean (58) into a z score: (58 - 58)/5 = 0/5 = 0. When we convert our data into z scores, the mean will always end up being zero (it is, after all, zero steps away from itself) and the standard deviation will always be one. Normally distributed data expressed in terms of z scores follow the standard normal distribution.
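The worked conversions above, and the claim that standardized data always have a mean of zero and a standard deviation of one, can be checked with a short Python sketch. The function name z_score and the small list of scores are hypothetical, made up only for this illustration; the sample standard deviation (statistics.stdev) is an assumption about which standard deviation the course uses.

```python
from statistics import mean, stdev

def z_score(x, mu, sd):
    """Subtract the mean from the raw score, then divide by the SD."""
    return (x - mu) / sd

# The three conversions from the notes (mean 58, SD 5).
print(z_score(63, 58, 5))  # 1.0
print(z_score(53, 58, 5))  # -1.0
print(z_score(58, 58, 5))  # 0.0

# Hypothetical raw scores, invented for this example.
scores = [50, 55, 58, 60, 67]
mu, sd = mean(scores), stdev(scores)

# Convert every raw score to standard deviation units.
z_scores = [z_score(x, mu, sd) for x in scores]

# Up to floating-point rounding, the mean is 0 and the SD is 1.
print(mean(z_scores), stdev(z_scores))
```

Note the parentheses in (x - mu) / sd: the subtraction has to happen before the division, which is easy to lose when the formula is typed on one line.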
