Ch 6: Continuous Probability Distributions
Standard Normal Distribution
A normal distribution with mean 0 and standard deviation 1 is called the standard normal distribution. It is customary to use the term z curve for the standard normal curve. The standard normal distribution is important because it is also used in probability calculations for other normal distributions. When we are interested in a probability based on some other normal curve, we first translate the problem into an "equivalent" problem that involves finding an area under the standard normal curve. A table for the standard normal distribution (next two slides) is then used to find the desired area.
Probability Distribution for a Continuous Random Variable
A probability distribution for a continuous random variable X is specified by a mathematical function denoted by f(x), which is called the density function. The graph of a density function is a smooth curve (the density curve). The following requirements must be met: 1. f(x) ≥ 0 for every x. 2. The total area under the density curve is equal to 1. The probability that X falls in any particular interval is the area under the density curve that lies above the interval. This area specifies the proportion of the population values that fall in the corresponding interval, and it can be interpreted as the long-run proportion of time that a value in the interval would occur if individual after individual were randomly selected from the population. Note: because the area above a single point is 0, including or excluding an endpoint does not change the probability: P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤ X ≤ b).
Variable
A variable associates a value with each individual or object in a population. A variable can be either categorical or numerical, depending on its possible values.
Example
Two hundred packages shipped using the Priority Mail rate for packages under 2 lb were weighed. Total area under the density curve = total area of the triangle = 0.5(base)(height) = 0.5(2)(1) = 1. The proportion of packages over 1.5 lb is the area of the shaded trapezoid in the figure: P(x > 1.5) = 1 − P(x ≤ 1.5) = 1 − 0.5(1.5)(0.75) = 1 − 0.5625 = 0.4375.
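The arithmetic in this example can be checked with a short Python sketch. It assumes the density is f(x) = x/2 on 0 ≤ x ≤ 2, which is the triangle implied by a base of 2, a height of 1, and a height of 0.75 at x = 1.5; the function names are illustrative and not part of the original example.

```python
def density(x):
    """Assumed triangular density: f(x) = x/2 on [0, 2], and 0 elsewhere."""
    return x / 2 if 0 <= x <= 2 else 0.0


def triangle_area(base, height):
    """Area of a triangle: 0.5 * base * height."""
    return 0.5 * base * height


# Total area under the density curve: 0.5 * 2 * 1
total_area = triangle_area(2, density(2))

# P(X <= 1.5) is the small triangle to the left of 1.5: 0.5 * 1.5 * 0.75
p_at_most_1_5 = triangle_area(1.5, density(1.5))

# P(X > 1.5) is the remaining (trapezoid) area
p_over_1_5 = 1 - p_at_most_1_5

print(total_area, p_at_most_1_5, p_over_1_5)
```

Computing the trapezoid as "1 minus the small triangle" mirrors the calculation in the example and avoids a separate trapezoid formula.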
General Normal Distribution Calculations Continued
If a variable X has a normal distribution with mean μ and standard deviation σ, then the standardized variable Z = (X − μ)/σ has the normal distribution with mean 0 and standard deviation 1. This is called the standard normal distribution. The formula z = (x − μ)/σ gives the number of standard deviations that x is from the mean, where μ is the true population mean and σ is the true population standard deviation.
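As a small illustration of standardizing, the sketch below computes how many standard deviations a value lies from the mean, using z = (x − μ)/σ. The function name and the numbers (μ = 100, σ = 15) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical population: mean 100, standard deviation 15.
above = z_score(130, 100, 15)   # two standard deviations above the mean
below = z_score(85, 100, 15)    # one standard deviation below the mean

print(above, below)
```

A positive result means the value is above the mean and a negative result means it is below.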
Summary
Linked the basic ideas of probability with the techniques of statistical inference. Used probability to describe the long-run frequency of occurrence of various types of outcomes. Introduced probability models that can be used to describe the distribution of characteristics of individuals in a population.
Symmetry Property
P(Z > z*) = P(Z < - z*)
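The symmetry property can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) as a stand-in for the z table; the value z* = 1.25 is an arbitrary choice for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

z_star = 1.25  # arbitrary illustrative value

upper_tail = 1 - Z.cdf(z_star)   # P(Z > z*)
lower_tail = Z.cdf(-z_star)      # P(Z < -z*)

print(round(upper_tail, 4), round(lower_tail, 4))
```

The two areas agree up to floating-point rounding because the z curve is symmetric about 0.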
Two distributions with the same standard deviation but different means
Pink: larger mean
Two distributions with the same mean and different standard deviations
Pink: smaller standard deviation
Z-Score
The table entry is the probability of a value at or above z.
Population Distribution
The distribution of all the values of a numerical variable or all the categories of a categorical variable is called a population distribution.
Mean & Standard Deviation
The mean value of a random variable x, denoted by μ_x, describes where the probability distribution of x is centered. The standard deviation of a random variable x, denoted by σ_x, describes variability in the probability distribution: 1. When σ_x is small, observed values of x will tend to be close to the mean value. 2. When σ_x is large, there will be more variability in observed values.
Normal Distribution
The normal distribution is an example of a population distribution for a continuous random variable. Normal distributions are widely used for two reasons: 1. They provide a reasonable approximation to the distribution of many different variables. 2. They play a central role in many of the inferential procedures that will be discussed in later chapters. Normal distributions are bell shaped and symmetric. They are also referred to as normal curves.
Describing the Population Distribution: Limits
The probability that a continuous random variable X lies between a lower limit a and an upper limit b is P(a < X < b) = (cumulative area to the left of b) − (cumulative area to the left of a) = P(X < b) − P(X < a).
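The "difference of cumulative areas" rule can be sketched in Python. The example assumes a standard normal variable and the limits a = −1 and b = 1 purely for illustration; `prob_between` is a hypothetical helper name, and `NormalDist.cdf` supplies the cumulative area to the left of a value.

```python
from statistics import NormalDist


def prob_between(dist, a, b):
    """P(a < X < b) = (area to the left of b) - (area to the left of a)."""
    return dist.cdf(b) - dist.cdf(a)


Z = NormalDist()              # standard normal, used here only as an example
p = prob_between(Z, -1, 1)    # area within one standard deviation of the mean

print(round(p, 4))
```

The same helper works for any distribution object that exposes a `cdf` method, since only cumulative areas are needed.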
Normal Distributions Continued
There are many different normal distributions, and they are distinguished from one another by their mean μ and standard deviation σ. The mean μ describes where the curve is centered. The standard deviation σ describes how much the curve spreads out around that center. The total area under any normal curve is equal to 1. Every normal curve is unimodal, and it approaches the horizontal axis in both directions but never touches it.
General Normal Distribution Calculations
To calculate probabilities for any normal distribution, we standardize the relevant values and then use the table of z curve areas. More specifically, if X is a variable whose behavior is described by a normal distribution with mean μ and standard deviation σ, then P(X < b) = P(Z < b*), where Z is a variable whose distribution is standard normal and b* = (b − μ)/σ.
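The standardize-then-look-up procedure can be sketched as follows, taking b* to be the standardized value (b − μ)/σ. The values μ = 50, σ = 5, and b = 60 are hypothetical, and `NormalDist().cdf` plays the role of the z table here; this is one way to check the idea, not a replacement for the table method.

```python
from statistics import NormalDist


def prob_below(b, mu, sigma):
    """P(X < b) for a normal X with mean mu and sd sigma, via standardizing."""
    b_star = (b - mu) / sigma          # standardized value of b
    return NormalDist().cdf(b_star)    # area under the z curve left of b*


# Hypothetical values chosen only for illustration.
mu, sigma, b = 50, 5, 60

via_z = prob_below(b, mu, sigma)          # P(Z < 2)
direct = NormalDist(mu, sigma).cdf(b)     # same probability, no standardizing

print(round(via_z, 4), round(direct, 4))
```

Both routes give the same area, which is the point of the "equivalent problem" on the z curve.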
Table
Using the table of standard normal curve areas: For any number z* between 0 and 3.49 and rounded to two decimal places, Appendix Table A.3 gives (area under z curve to the right of z*) = P(Z > z*) = P(Z ≥ z*), where the letter Z is used to represent a random variable whose distribution is the standard normal distribution. To find the probability that Z is at or above z*, locate the following: 1. The row labeled with the digits on either side of the decimal point (e.g. 1.7 if z* = 1.76). 2. The column labeled with the second digit to the right of the decimal point in z* (e.g. 0.06 if z* = 1.76). The number at the intersection of this row and column is the desired probability, P(Z > z*) (e.g. 0.039 if z* = 1.76).
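The lookup steps can be mimicked in Python. Appendix Table A.3 is not reproduced here, so this sketch computes the upper-tail area with `statistics.NormalDist` instead of reading it from the table; the helper names `table_row_and_column` and `upper_tail_area` are hypothetical.

```python
from statistics import NormalDist


def table_row_and_column(z_star):
    """Split z* (rounded to two decimals) into its table row and column labels."""
    text = f"{z_star:.2f}"       # e.g. "1.76"
    row = text[:-1]              # digits on either side of the point: "1.7"
    column = "0.0" + text[-1]    # second digit after the point: "0.06"
    return row, column


def upper_tail_area(z_star):
    """P(Z > z*): the area under the z curve to the right of z*."""
    return 1 - NormalDist().cdf(z_star)


row, column = table_row_and_column(1.76)
area = upper_tail_area(1.76)

print(row, column, round(area, 3))
```

For z* = 1.76 the computed area agrees with the table value quoted above to three decimal places.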
Describing the Population Distribution
When a density histogram based on a small number of intervals is used to summarize a population distribution for a continuous variable, the histogram can be quite jagged. However, when the number of intervals is increased, the resulting histograms become much smoother in appearance. The area of the rectangle above each interval is equal to the relative frequency (probability) of values that fall in the interval, and the total area of the rectangles in a density histogram is equal to 1.
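A density histogram can be sketched in a few lines to see why the rectangle areas always total 1. The data here are simulated (1,000 draws from a normal distribution with a fixed seed), which is an assumption made only for illustration; the function name `density_histogram` is likewise hypothetical.

```python
import random

random.seed(0)
# Simulated values standing in for a population; not real data.
values = [random.gauss(0, 1) for _ in range(1000)]


def density_histogram(data, num_intervals):
    """Return (interval width, list of rectangle heights) for equal-width intervals."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_intervals
    counts = [0] * num_intervals
    for v in data:
        # The maximum value would index one past the end, so clamp it.
        i = min(int((v - lo) / width), num_intervals - 1)
        counts[i] += 1
    # height = relative frequency / width, so area = relative frequency
    heights = [c / (len(data) * width) for c in counts]
    return width, heights


for k in (5, 50):
    width, heights = density_histogram(values, k)
    total_area = sum(h * width for h in heights)
    print(k, round(total_area, 6))
```

Dividing each relative frequency by the interval width is what makes the area, rather than the height, represent the proportion of values in the interval, so the total stays at 1 however many intervals are used.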