Soc 106 - Ch 4 - Probability D.
PROBABILITY DISTRIBUTIONS FOR DISCRETE VARIABLES
assigns a probability to each possible value of the variable. Each probability is a value between 0 and 1, and the sum of the probabilities of all possible values equals 1. Let P(y) denote the probability of a possible outcome for a variable y. Then 0 ≤ P(y) ≤ 1 and ΣP(y) = 1, where the sum is over all the possible values of the variable.
probability distributions for discrete and continuous variables
A variable can take at least two different values. For a random sample or randomized experiment, each possible outcome has a probability that it occurs. The variable itself is sometimes then referred to as a random variable. This terminology emphasizes that the outcome varies from observation to observation according to random variation that can be summarized by probabilities. A probability distribution lists the possible outcomes and their probabilities.
the normal probability distribution - Ex.
For example, heights of adult females in North America have approximately a normal distribution with μ = 65.0 inches and σ = 3.5 inches. The probability is nearly 1.0 that a randomly selected female has height between μ − 3σ = 65.0 − 3(3.5) = 54.5 inches and μ + 3σ = 65.0 + 3(3.5) = 75.5 inches. Adult male height has a normal distribution with μ = 70.0 and σ = 4.0 inches. So, the probability is nearly 1.0 that a randomly selected male has height between μ − 3σ = 70.0 − 3(4.0) = 58 inches and μ + 3σ = 70.0 + 3(4.0) = 82 inches. See Figure 4.4.
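As a check, a minimal Python sketch (using scipy's norm with the μ and σ from the female-height example) computes this within-3σ probability directly:

```python
from scipy.stats import norm

# Heights of adult females: mu = 65.0 inches, sigma = 3.5 inches.
mu, sigma = 65.0, 3.5

# P(54.5 < height < 75.5), i.e., within 3 standard deviations of the mean.
p = norm.cdf(mu + 3 * sigma, loc=mu, scale=sigma) - norm.cdf(mu - 3 * sigma, loc=mu, scale=sigma)
print(round(p, 4))  # 0.9973 -- "nearly 1.0"
```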
MEAN AND STANDARD ERROR OF SAMPLING DISTRIBUTION OF ȳ
The sample mean ȳ is a variable, because its value varies from sample to sample. For random samples, it fluctuates around the population mean μ, sometimes being smaller and sometimes being larger. In fact, the mean of the sampling distribution of ȳ equals μ. If we repeatedly took samples, then in the long run, the mean of the sample means would equal the population mean μ.

The spread of the sampling distribution of ȳ is described by its standard deviation, which is called the standard error of ȳ (y-bar).

Standard Error: The standard deviation of the sampling distribution of ȳ is called the standard error of ȳ and is denoted by σ_ȳ.

The standard error describes how much ȳ varies from sample to sample. Suppose we repeatedly selected samples of size n from the population, finding ȳ for each set of n observations. Then, in the long run, the standard deviation of the ȳ-values would equal the standard error. The symbol σ_ȳ (instead of σ) and the terminology standard error (instead of standard deviation) distinguish this measure from the standard deviation σ of the population distribution.

In practice, we do not need to take samples repeatedly to find the standard error of ȳ, because a formula is available. For a random sample of size n, the standard error of ȳ depends on n and the population standard deviation σ by σ_ȳ = σ/√n.

Figure 4.12 displays a population distribution having σ = 10 and shows the sampling distribution of ȳ for n = 100. When n = 100, the standard error is σ_ȳ = σ/√n = 10/√100 = 1.0. The sampling distribution has only a tenth of the spread of the population distribution. This means that individual observations tend to vary much more than sample means vary from sample to sample.

In summary, the following result describes the center and spread of the sampling distribution of ȳ:

Mean and Standard Error of ȳ: For sampling a population, the sampling distribution of ȳ states the probabilities for the possible values of ȳ. For a random sample of size n from a population having mean μ and standard deviation σ, the sampling distribution of ȳ has mean μ and standard error σ_ȳ = σ/√n.
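A quick simulation sketch can verify the formula: the standard deviation of many simulated sample means comes out near σ/√n. The population mean of 50 is an assumption here (Figure 4.12's mean isn't given in these notes; only σ = 10 and n = 100 are):

```python
import numpy as np

rng = np.random.default_rng(1)

# Population with sigma = 10 (assumed normal with mean 50 for illustration),
# sample size n = 100, matching the Figure 4.12 setup described above.
sigma, n = 10, 100

# Draw 10,000 samples of size n and compute each sample mean.
sample_means = rng.normal(loc=50, scale=sigma, size=(10_000, n)).mean(axis=1)

print(sample_means.std())   # close to 1.0, the formula value
print(sigma / np.sqrt(n))   # 1.0 = standard error sigma/sqrt(n)
```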
sampling distribution
fundamentally important type of probability distribution that we need to conduct statistical inference. It enables us to predict how close a sample mean falls to the population mean. The main reason for the importance of the normal distribution is the remarkable result that sampling distributions are usually bell shaped.
probability distributions
provide probabilities for all the possible outcomes of a variable
probability
statistical science is young. Methods of statistical inference were developed within the past century. By contrast, probability, the subject of this chapter, has a long history. For instance, mathematicians used probability in France in the seventeenth century to evaluate various gambling strategies. Probability is a highly developed subject, but this chapter limits attention to the basics that we'll need for statistical inference.
Basic Probability Rules
Four rules for finding probabilities:

1. P(not A) = 1 − P(A). If you know the probability a particular outcome occurs, then the probability it does not occur is 1 minus that probability. Suppose A represents the outcome that a randomly selected person favors legalization of same-sex marriage. If P(A) = 0.66, then 1 − 0.66 = 0.34 is the probability that a randomly selected person does not favor legalization of same-sex marriage.

2. If A and B are distinct possible outcomes (with no overlap), then P(A or B) = P(A) + P(B). (or = +) In a survey to estimate the population proportion of people who favor legalization of marijuana, let A represent the sample proportion estimate being much too low, say more than 0.10 below the population proportion. Let B represent the sample proportion estimate being much too high, at least 0.10 above the population proportion. These are two distinct possible outcomes. From methods in this chapter, perhaps P(A) = P(B) = 0.03. Then, the overall probability the sample proportion is in error by more than 0.10 (without specifying the direction of error) is P(A or B) = P(A) + P(B) = 0.03 + 0.03 = 0.06.

3. If A and B are possible outcomes, then P(A and B) = P(A) × P(B given A). (and = multiply) From U.S. Census data, the probability that a randomly selected American adult is married equals 0.56. Of those who are married, General Social Surveys estimate that the probability a person reports being very happy when asked to choose among (very happy, pretty happy, not too happy) is 0.40; that is, given you are married, the probability of being very happy is 0.40. So, P(married and very happy) = P(married) × P(very happy given married) = 0.56 × 0.40 = 0.22. About 22% of the adult population is both married and very happy. The probability P(B given A) is called a conditional probability and is often denoted by P(B | A).

4. In some cases, A and B are "independent," in the sense that whether one occurs does not depend on whether the other does. That is, P(B given A) = P(B), so the previous rule simplifies: If A and B are independent, then P(A and B) = P(A) × P(B). For example, suppose that 60% of a population supports a carbon tax to diminish impacts of carbon dioxide levels on global warming. In random sampling from that population, let A denote the event that the first person sampled supports the carbon tax and let B denote the event that the second person sampled supports it. Then P(A) = 0.60 and P(B) = 0.60. With random sampling, successive observations are independent, so the probability that both people support a carbon tax is P(A and B) = P(A) × P(B) = 0.60 × 0.60 = 0.36. This extends to multiple independent events: for 10 randomly sampled people, the probability that all 10 support a carbon tax is 0.60 × 0.60 × ··· × 0.60 = (0.60)^10 ≈ 0.006.
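The four rules as plain arithmetic, using the numbers from the examples above (a sketch, not textbook code):

```python
# The four probability rules applied to the numbers quoted above.
p_A = 0.66
print(1 - p_A)                  # rule 1: P(not A) = 0.34

p_low, p_high = 0.03, 0.03
print(p_low + p_high)           # rule 2: P(A or B) = 0.06 for disjoint events

p_married = 0.56
p_happy_given_married = 0.40
print(p_married * p_happy_given_married)  # rule 3: 0.224, about 0.22

p_support = 0.60
print(p_support ** 10)          # rule 4 extended: about 0.006 for 10 independent people
```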
Review
3 types of distributions:

Population distribution: This is the distribution from which we select the sample. It is usually unknown. We make inferences about its characteristics, such as the parameters μ and σ that describe its center and spread.

Sample data distribution: This is the distribution of data that we actually observe, that is, the sample observations y1, y2, ..., yn. We describe it by statistics such as the sample mean ȳ and sample standard deviation s. The larger the sample size n, the closer the sample data distribution resembles the population distribution, and the closer the sample statistics such as ȳ fall to the population parameters such as μ.

Sampling distribution of a statistic: This is the probability distribution for the possible values of a sample statistic, such as ȳ. A sampling distribution describes the variability that occurs in the statistic's value among samples of a certain size. This distribution determines the probability that the statistic falls within a certain distance of the population parameter it estimates. (ex's - p. 92)

The Central Limit Theorem states that for large random samples on a variable, the sampling distribution of the sample mean is approximately a normal distribution. This holds no matter what the shape of the population distribution, both for continuous variables and for discrete variables. The result applies also to proportions, since the sample proportion is a special case of the sample mean for observations coded as 0 and 1 (such as for two candidates in an election).
example for probability distribution for discrete variables
4.1 Ideal Number of Children for a Family. Let y denote the response to the question "What do you think is the ideal number of children for a family to have?" This is a discrete variable, taking the possible values 0, 1, 2, 3, and so forth. According to recent General Social Surveys, for a randomly chosen person in the United States the probability distribution of y is approximately as Table 4.1 shows. The table displays the recorded y-values and their probabilities. For instance, P(4), the probability that y = 4 children is regarded as ideal, equals 0.12. Each probability in Table 4.1 is between 0 and 1, and the sum of the probabilities equals 1.

A histogram can portray the probability distribution. The rectangular bar over a possible value of the variable has height equal to the probability of that value. Figure 4.1 is a histogram for the probability distribution of the ideal number of children, from Table 4.1. The bar over the value 4 has height 0.12, the probability of the outcome 4.
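A sketch of this distribution as code. The full Table 4.1 isn't copied in these notes, so the probabilities below are reconstructed from the values quoted here (P(4) = 0.12, E(y) = 2.45, and the expected counts given later in the parameters section); they are an inference, not a verbatim copy:

```python
# Reconstructed Table 4.1 (an inference from the quoted values).
ideal_children = {0: 0.01, 1: 0.03, 2: 0.60, 3: 0.23, 4: 0.12, 5: 0.01}

# Check the two defining properties of a discrete probability distribution.
assert all(0 <= p <= 1 for p in ideal_children.values())
assert abs(sum(ideal_children.values()) - 1.0) < 1e-9

print(ideal_children[4])   # 0.12, the height of the bar over y = 4
```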
Example 4.3
4.3 Finding the 99th Percentile of IQ Scores. Stanford-Binet IQ scores have approximately a normal distribution with mean = 100 and standard deviation = 16. What is the 99th percentile of IQ scores? In other words, what is the IQ score that falls above 99% of the scores?

To answer this, we need to find the value of z such that μ + zσ falls above 99% of a normal distribution. Now, for μ + zσ to represent the 99th percentile, the probability below μ + zσ must equal 0.99, by the definition of a percentile. So, 1% of the distribution is above the 99th percentile. The right-tail probability equals 0.01, as Figure 4.7 shows.

With Table A, software, or the Internet, you can find that the z-value for a cumulative probability of 0.99 or right-tail probability of 0.01 is z = 2.33. Thus, the 99th percentile is 2.33 standard deviations above the mean. In summary, 99% of any normal distribution is located below μ + 2.33σ. For IQ scores with mean = 100 and standard deviation = 16, the 99th percentile equals μ + 2.33σ = 100 + 2.33(16) = 137. That is, about 99% of IQ scores fall below 137.
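The same percentile lookup with software (scipy's ppf is the inverse of the cumulative probability):

```python
from scipy.stats import norm

# z-value with cumulative probability 0.99 (right-tail probability 0.01).
z = norm.ppf(0.99)
print(round(z, 2))                        # 2.33

# 99th percentile of IQ scores, either by mu + z*sigma or directly.
print(100 + z * 16)                       # about 137
print(norm.ppf(0.99, loc=100, scale=16))  # same answer in one step
```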
Example 4.9
4.9 Is Sample Mean Income of Migrant Workers Close to Population Mean? For the population of migrant workers doing agricultural labor in Florida, suppose that weekly income has a distribution that is skewed to the right with a mean of μ = $380 and a standard deviation of σ = $80. A researcher, unaware of these values, plans to randomly sample 100 migrant workers and use the sample mean income ȳ to estimate μ. What is the sampling distribution of the sample mean? Where is ȳ likely to fall, relative to μ? What is the probability that ȳ overestimates μ by more than $20, falling above $400?

By the Central Limit Theorem, the sampling distribution of the sample mean ȳ is approximately normal, even though the population distribution is skewed. The sampling distribution has mean μ = $380 and standard error σ_ȳ = σ/√n = 80/√100 = $8, so $400 is z = (400 − 380)/8 = 2.5 standard errors above the mean, and the probability that ȳ falls above $400 is the right-tail probability, about 0.006. (p. 90 - 91)
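A sketch of the computation with scipy:

```python
import math
from scipy.stats import norm

mu, sigma, n = 380, 80, 100
se = sigma / math.sqrt(n)   # standard error = 8.0

# P(ybar > 400): right-tail probability 2.5 standard errors above the mean.
z = (400 - mu) / se         # 2.5
print(norm.sf(z))           # about 0.006
```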
SIMULATING THE SAMPLING PROCESS
A simulation can show us how close an exit poll result tends to be to the population proportion voting for a candidate. One way to simulate the vote of a voter randomly chosen from the population is to select a random number using software. Suppose exactly 50% of the population voted for Brown and 50% voted for the Republican candidate, Neel Kashkari. Identify all 50 two-digit numbers between 00 and 49 as Democratic votes and all 50 two-digit numbers between 50 and 99 as Republican votes. Then, each candidate has a 50% chance of selection on each choice of two-digit random number. For instance, the first two digits of the first column of the random numbers table on page 15 provide the random numbers 10, 53, 24, and 42. So, of the first four voters selected, three voted Democratic (i.e., have numbers between 00 and 49) and one voted Republican. Selecting 1824 two-digit random numbers simulates the process of observing the votes of a random sample of 1824 voters of the much larger population (which is actually treated as infinite in size).

When we performed this simulation, we got 901 heads (Democratic votes) and 923 tails (Republican votes). The sample proportion of Democratic votes was 901/1824 = 0.494, quite close to the population proportion of 0.50. This particular estimate was good. Were we merely lucky? We repeated the process and simulated 1824 more flips. (In this app, click again on Simulate.) This time the sample proportion of Democratic votes was 0.498, also quite good.

Using software, we next performed this process of picking 1824 people 10,000 times so that we could search for a pattern in the results. Figure 4.10 shows a histogram of the 10,000 values of the sample proportion. Nearly all the simulated proportions fell between 0.46 and 0.54, that is, within 0.04 of the population proportion of 0.50. Apparently a sample of size 1824 provides quite a good estimate of a population proportion.

You can perform this simulation using any population proportion value, corresponding to flipping a coin in which head and tail have different probabilities. For instance, you could simulate sampling when the population proportion voting for the Democrat is 0.45 by changing the probability of a head in the applet to 45%. Likewise, we could change the size of each random sample in the simulation to study the impact of the sample size. From results of the next section, for a random sample of size 1824 the sample proportion has probability close to 1 of falling within 0.04 of the population proportion, regardless of its value.
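Instead of a random numbers table or the applet, the same simulation can be done in a few lines of Python (a sketch; the seed is arbitrary, so the particular proportions will differ from the 0.494 and 0.498 quoted above):

```python
import numpy as np

rng = np.random.default_rng(0)

# One exit poll: 1824 voters, each Democratic with probability 0.50.
one_poll = rng.binomial(n=1824, p=0.50) / 1824
print(one_poll)               # a sample proportion near 0.50

# Repeat the poll 10,000 times, as in Figure 4.10.
proportions = rng.binomial(n=1824, p=0.50, size=10_000) / 1824
inside = np.mean(np.abs(proportions - 0.50) <= 0.04)
print(inside)                 # nearly all fall within 0.04 of 0.50
```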
sampling distributions of sample means
Because the sample mean ȳ is used so much, with the sample proportion also being a sample mean, its sampling distribution merits special attention. In practice, when we analyze data and find ȳ, we do not know how close it falls to the population mean μ, because we do not know the value of μ. Using information about the spread of the sampling distribution, though, we can predict how close it falls. For example, the sampling distribution might tell us that with high probability, ȳ falls within 10 units of μ. This section presents two main results about the sampling distribution of the sample mean. One provides formulas for the center and spread of the sampling distribution. The other describes its shape.
PROBABILITY DISTRIBUTIONS FOR CONTINUOUS VARIABLES
Continuous variables have an infinite continuum of possible values. Probability distributions of continuous variables assign probabilities to intervals of numbers. The probability that a variable falls in any particular interval is between 0 and 1, and the probability of the interval containing all the possible values equals 1.

A graph of the probability distribution of a continuous variable is a smooth, continuous curve. The area under the curve for an interval of values represents the probability that the variable takes a value in that interval.

ex. A recent U.S. Census Bureau study about commuting time for workers in the United States who commute to work measured y = travel time, in minutes. The probability distribution of y provides probabilities such as P(y < 15), the probability that travel time is less than 15 minutes, or P(30 < y < 60), the probability that travel time is between 30 and 60 minutes. Figure 4.2 portrays the probability distribution of y. The shaded area in the figure refers to the region of values higher than 45. This area equals 15% of the total area under the curve, representing the probability of 0.15 that commuting time is more than 45 minutes. Those regions in which the curve has relatively high height have the values most likely to be observed.
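The actual density behind Figure 4.2 isn't given in these notes. As a sketch only, assume a right-skewed gamma distribution for travel time (shape and scale chosen purely for illustration) and compute interval probabilities as areas under the curve:

```python
from scipy.stats import gamma

# Hypothetical right-skewed density for commute time (parameters chosen
# only for illustration; the real Figure 4.2 distribution is not specified).
y = gamma(a=2.0, scale=13.0)   # mean = 26 minutes

print(y.cdf(15))               # P(y < 15)
print(y.cdf(60) - y.cdf(30))   # P(30 < y < 60), area between 30 and 60
print(y.sf(45))                # P(y > 45), the shaded right-tail area (about 0.14 here)
```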
PROBABILITY AS A LONG-RUN RELATIVE FREQUENCY
For a particular possible outcome for a random phenomenon, the probability of that outcome is the proportion of times that the outcome would occur in a very long sequence of observations.

Probability: With a random sample or randomized experiment, the probability that an observation has a particular outcome is the proportion of times that outcome would occur in a very long sequence of like observations.

Why does probability refer to the long run? Because when you do not already know or assume some value for a probability, you need a large number of observations to accurately assess it. If you sample only 10 people and they are all right-handed, you can't conclude that the probability of being right-handed equals 1.0.

This book defines a probability as a proportion, so it is a number between 0 and 1. In practice, probabilities are often expressed also as percentages, then falling between 0 and 100. For example, if a weather forecaster says that the probability of rain today is 70%, this means that in a long series of days with atmospheric conditions like those today, rain occurs on 70% of the days.

This long-run approach is the standard way to define probability. This definition is not always applicable, however. It is not meaningful, for instance, for the probability that human beings have a life after death, or the probability that intelligent life exists elsewhere in the universe. If you start a new business, you will not have a long run of trials with which to estimate the probability that the business is successful. You must then rely on subjective information rather than solely on objective data. In the subjective approach, the probability of an outcome is defined to be your degree of belief that the outcome will occur, based on the available information, such as data that may be available from experiences of others. A branch of statistical science uses subjective probability as its foundation. It is called Bayesian statistics, in honor of an eighteenth-century British clergyman (Thomas Bayes) who discovered a probability rule on which it is based. We introduce this alternative approach in Section 16.8.
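A sketch of the long-run idea: simulate many observations and watch the running proportion settle near the true probability (0.70 here, echoing the rain example; seed and sequence length are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)

# A long sequence of 0/1 outcomes with true probability 0.70.
flips = rng.random(100_000) < 0.70

# Running proportion after each observation.
running = np.cumsum(flips) / np.arange(1, len(flips) + 1)
print(running[9])      # after 10 observations: can be far from 0.70
print(running[-1])     # after 100,000: very close to 0.70
```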
z-Scores and the Standard Normal Distribution
If a variable has a normal distribution, and if its values are converted to z-scores by subtracting the mean and dividing by the standard deviation, then the z-scores have the standard normal distribution.

Suppose we convert each SAT score y to a z-score by using z = (y − 500)/100. For instance, y = 650 converts to z = 1.50, and y = 350 converts to z = −1.50. Then, the entire set of z-scores has a normal distribution with a mean of 0 and a standard deviation of 1. This is the standard normal distribution. Many inferential methods convert values of statistics to z-scores and then to normal curve probabilities. We use z-scores and normal probabilities often throughout the rest of the book.
FINDING NORMAL PROBABILITIES: TABLES, SOFTWARE, AND APPLETS
For the normal distribution, for each fixed number z, the probability that y is within z standard deviations of the mean depends only on the value of z. This is the area under the normal curve between μ − zσ and μ + zσ. For every normal distribution, this probability is 0.68 for z = 1, 0.95 for z = 2, and nearly 1.0 for z = 3.

For a normal distribution, the probability concentrated within zσ of μ is the same for all normal curves even if z is not a whole number, for instance, z = 1.43 instead of 1, 2, or 3. Table A, also shown next to the inside back cover, determines probabilities for any region of values. It tabulates the probability for the values falling in the right tail, at least z standard deviations above the mean. The left margin column of the table lists the values for z to one decimal point, with the second decimal place listed above the columns. Table 4.2 displays a small excerpt from Table A. The probability for z = 1.43 falls in the row labeled 1.4 and in the column labeled .03. It equals 0.0764. This means that for every normal distribution, the right-tail probability above μ + 1.43σ (i.e., more than 1.43 standard deviations above the mean) equals 0.0764.

Since the entries in Table A are probabilities for the right half of the normal distribution above μ + zσ, they fall between 0 and 0.50. By the symmetry of the normal curve, these right-tail probabilities also apply to the left tail below μ − zσ. For example, the probability below μ − 1.43σ also equals 0.0764.

The left-tail probabilities are called cumulative probabilities. For example, software gives 0.97725 as the cumulative probability below μ + 2.0σ. We subtract the cumulative probability from 1 to find the right-tail probability above μ + 2.0σ. That is, the probability 1 − 0.97725 = 0.02275 falls more than two standard deviations above the mean. By the symmetry of the normal distribution, this is also the probability falling more than two standard deviations below the mean. The probability falling within two standard deviations of the mean is 1 − 2(0.02275) = 0.954. (Here, we've used rule (1) of the probability rules at the end of Section 4.1, that P(not A) = 1 − P(A).) You can also find normal probabilities with SPSS and SAS software. (p. 74)
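The same lookups without Table A (scipy's sf gives the right-tail probability, cdf the cumulative probability):

```python
from scipy.stats import norm

# Right-tail probability for z = 1.43 (the Table A lookup done above).
print(norm.sf(1.43))         # 0.0764

# Cumulative (left-tail) probability below mu + 2.0*sigma, and its complement.
print(norm.cdf(2.0))         # 0.97725
print(1 - norm.cdf(2.0))     # 0.02275 in the right tail
print(1 - 2 * norm.sf(2.0))  # 0.954 within two standard deviations
```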
SAMPLING DISTRIBUTION OF SAMPLE MEAN IS APPROXIMATELY NORMAL
For the population distribution for the vote in an election, shown in Figure 4.13, the outcome has only two possible values. It is highly discrete. Nevertheless, the two sampling distributions shown in Figure 4.14 have bell shapes. This is a consequence of the second main result of this section, which describes the shape of the sampling distribution of ȳ. This result can be proven mathematically, and it is often called the Central Limit Theorem.

Central Limit Theorem: For random sampling with a large sample size n, the sampling distribution of the sample mean ȳ is approximately a normal distribution.

Here are some implications and interpretations of this result:

The bell shape of the sampling distribution applies no matter what the shape of the population distribution. This is remarkable. For large random samples, the sampling distribution of ȳ has a normal bell shape even if the population distribution is very skewed or highly discrete such as the binary distribution in Figure 4.13. We'll learn how this enables us to make inferences even when the population distribution is highly irregular. This is helpful, because many social science variables are very skewed or highly discrete. Figure 4.15 displays sampling distributions of ȳ for four different shapes for the population distribution, shown at the top of the figure. Below them are portrayed the sampling distributions for random samples of sizes n = 2, 5, and 30. As n increases, the sampling distribution has more of a bell shape.

How large n must be before the sampling distribution is bell shaped largely depends on the skewness of the population distribution. If the population distribution is bell shaped, then the sampling distribution is bell shaped for all sample sizes. The rightmost panel of Figure 4.15 illustrates this. More skewed distributions require larger sample sizes. For most cases, n of about 30 is sufficient (although it may not be large enough for precise inference). So, in practice, with random sampling the sampling distribution of ȳ is nearly always approximately bell shaped.

Knowing that the sampling distribution of ȳ can be approximated by a normal distribution helps us to find probabilities for possible values of ȳ. For instance, ȳ almost certainly falls within 3σ_ȳ = 3σ/√n of μ. Reasoning of this nature is vital to inferential statistical methods.
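A simulation sketch of the Central Limit Theorem: draw samples from a very skewed (exponential) population and watch the skewness of the sample means shrink toward 0 (bell shape) as n grows. The population shape and sizes echo Figure 4.15 but are assumptions, not the textbook's exact figures:

```python
import numpy as np

rng = np.random.default_rng(3)

for n in (2, 5, 30):
    # 10,000 sample means for samples of size n from a skewed population.
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    # Sample skewness of the sampling distribution of the mean.
    skew = ((means - means.mean()) ** 3).mean() / means.std() ** 3
    print(n, round(skew, 2))   # decreases toward 0 as n increases
```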
USING Z-SCORES TO FIND PROBABILITIES OR y-VALUES
Here's a summary of how we use z-scores:

If we have a value y and need to find a probability, convert y to a z-score using z = (y − μ)/σ, and then convert z to the probability of interest using a table of normal probabilities, software, or the Internet.

If we have a probability and need to find a value of y, convert the probability to a tail probability (or cumulative probability) and find the z-score (using a normal table, software, or the Internet), and then evaluate y = μ + zσ.

For example, we used the equation z = (y − μ)/σ to determine how many standard deviations a SAT test score of 650 fell from the mean of 500, when σ = 100 (namely, 1.50). Example 4.3 used the equation y = μ + zσ to find a percentile score for a normal distribution of IQ scores.
Constructing a Sampling Distribution Ex.
It is sometimes possible to construct the sampling distribution without resorting to simulation or complex mathematical derivations. To illustrate, we construct the sampling distribution of the sample proportion for an exit poll of n = 4 voters from a population in which half voted for each candidate. (Such a small n would not be used in practice, but it enables us to more easily explain this process.)

We use a symbol with four entries to represent the votes for a potential sample of size 4. For instance, (R, D, D, R) represents a sample in which the first and fourth subjects voted for the Republican and the second and third subjects voted for the Democrat. The 16 possible samples are:

(D,D,D,D), (D,D,D,R), (D,D,R,D), (D,R,D,D), (R,D,D,D), (D,D,R,R), (D,R,D,R), (D,R,R,D), (R,D,D,R), (R,D,R,D), (R,R,D,D), (D,R,R,R), (R,D,R,R), (R,R,D,R), (R,R,R,D), (R,R,R,R)

When half the population voted for each candidate, the 16 samples are equally likely. Let's construct the sampling distribution of the sample proportion that voted for the Republican candidate. For a sample of size 4, that proportion can be 0, 0.25, 0.50, 0.75, or 1.0. The proportion 0 occurs with only one of the 16 possible samples, (D, D, D, D), so its probability equals 1/16 = 0.0625. The proportion 0.25 occurs for four samples, (R, D, D, D), (D, R, D, D), (D, D, R, D), and (D, D, D, R), so its probability equals 4/16 = 0.25. Based on this reasoning, Table 4.3 shows the probability for each possible sample proportion value. (refer to pg 84)
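The enumeration can be automated; this sketch reproduces the probabilities in Table 4.3:

```python
from itertools import product
from collections import Counter

# Enumerate all 16 equally likely samples of 4 votes.
samples = list(product("DR", repeat=4))

# Sampling distribution of the proportion voting Republican.
counts = Counter(s.count("R") / 4 for s in samples)
for prop in sorted(counts):
    print(prop, counts[prop] / 16)
# 0.0 0.0625, 0.25 0.25, 0.5 0.375, 0.75 0.25, 1.0 0.0625
```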
FINDING z-VALUES FOR CERTAIN TAIL PROBABILITIES
Many inferential methods use z-values corresponding to certain normal curve probabilities. This entails the reverse use of Table A or software or applets. Starting with a tail probability, we find the z-value that provides the number of standard deviations that such a value falls from the mean.

To illustrate, let's first use Table A to find the z-value having a right-tail probability of 0.025. We look up 0.025 in the body of Table A, which contains tail probabilities. It corresponds to z = 1.96 (i.e., we find .025 in the row of Table A labeled 1.9 and in the column labeled .06). This means that a probability of 0.025 falls above μ + 1.96σ. Similarly, a probability of 0.025 falls below μ − 1.96σ. So, a total probability of 0.025 + 0.025 = 0.050 falls more than 1.96σ from μ.

We saw in the previous subsection that 95% of a normal distribution falls within two standard deviations of the mean. More precisely, 0.954 falls within 2.00 standard deviations, and here we've seen that 0.950 falls within 1.96 standard deviations.
THE STANDARD NORMAL DISTRIBUTION
Many inferential statistical methods use a particular normal distribution, called the standard normal distribution.

The standard normal distribution is the normal distribution with mean μ = 0 and standard deviation σ = 1.

For the standard normal distribution, the number falling z standard deviations above the mean is μ + zσ = 0 + z(1) = z. It is simply the z-score itself. For instance, the value of 2 is two standard deviations above the mean, and the value of −1.3 is 1.3 standard deviations below the mean. The original values are the same as the z-scores. See Figure 4.8. When the values for an arbitrary normal distribution are converted to z-scores, those z-scores are centered around 0 and have a standard deviation of 1. The z-scores have the standard normal distribution.
NORMAL PROBABILITIES AND THE EMPIRICAL RULE
Probabilities for the normal distribution apply approximately to other bell-shaped distributions. They yield the probabilities for the Empirical Rule. Recall (page 44) that the rule states that for bell-shaped histograms, about 68% of the data fall within one standard deviation of the mean, 95% within two standard deviations, and all or nearly all within three standard deviations. For example, we've just used software to find that for normal distributions the probability falling within two standard deviations of the mean is 0.954. For one and for three standard deviations, we find central probabilities of 0.683 and 0.997, respectively.

The approximate percentages in the Empirical Rule are the actual percentages for the normal distribution, rounded to two decimal places. The Empirical Rule stated the percentages as being approximate rather than exact. Why? Because that rule referred to all approximately bell-shaped distributions, not only the normal distribution. Not all bell-shaped distributions are normal, only those described by the formula shown in the footnote on page 73. We won't need that formula, but we will use probabilities for it throughout the text.
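Computing those central probabilities directly:

```python
from scipy.stats import norm

# Central probabilities behind the Empirical Rule's 68-95-99.7 percentages.
for z in (1, 2, 3):
    central = norm.cdf(z) - norm.cdf(-z)
    print(z, round(central, 3))   # 0.683, 0.954, 0.997
```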
REPEATED SAMPLING INTERPRETATION OF SAMPLING DISTRIBUTIONS
Sampling distributions portray the sampling variability that occurs in collecting data and using sample statistics to estimate parameters. If different polling organizations each take their own exit poll and estimate the population proportion voting for the Republican candidate, they will get different estimates, because the samples have different people. Likewise, Figure 4.10 describes the variability in sample proportion values that occurs in selecting a huge number of samples of size n = 1824 and constructing a histogram of the sample proportions. By contrast, Figure 4.11 describes the variability for a huge number of samples of size n = 4.

A sampling distribution of a statistic for n observations is the relative frequency distribution for that statistic resulting from repeatedly taking samples of size n, each time calculating the statistic value. It's possible to form such a distribution empirically, as in Figure 4.10, by repeated sampling or through simulation. In practice, this is not necessary. The form of sampling distributions is often known theoretically, as shown in the previous example and in the next section. We can then find probabilities about the value of the sample statistic for one random sample of the given size n.
BIVARIATE PROBABILITY DISTRIBUTIONS: COVARIANCE AND CORRELATION∗
Section 3.5 introduced bivariate descriptive statistics that apply to a pair of variables. An example is the sample correlation. Likewise, bivariate probability distributions determine joint probabilities for pairs of random variables. For example, the bivariate normal distribution generalizes the bell curve over the real line for a single variable y to a bell-shaped surface in three dimensions over the plane for possible values of two variables (x, y).

Each variable in a bivariate distribution has a mean and a standard deviation. Denote them by (μx, σx) for x and by (μy, σy) for y. The way that x and y vary together is described by their covariance, which is defined to be

covariance(x, y) = E[(x − μx)(y − μy)],

which represents the average of the cross products about the population means (weighted by their probabilities). If y tends to fall above its mean when x falls above its mean, the covariance is positive. If y tends to fall below its mean when x falls above its mean, the covariance is negative. The covariance can be any real number. For interpretation, it is simpler to use the correlation

ρ = E(zx zy) = covariance(x, y)/(σx σy),

where zx = (x − μx)/σx denotes the z-score for the variable x and zy = (y − μy)/σy denotes the z-score for the variable y. That is, the population correlation equals the average cross product of the z-score for x times the z-score for y. It falls between −1 and +1. It is positive when positive z-scores for x tend to occur with positive z-scores for y and when negative z-scores for x tend to occur with negative z-scores for y. We shall not need to calculate these expectations. We can use software to find sample values, as we showed in Table 3.10 for the correlation.
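A sketch with simulated data (the relationship and sample size are made up for illustration), using numpy's sample versions of these quantities:

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated (x, y) pairs with a positive relationship (illustrative only).
x = rng.normal(size=1000)
y = 0.6 * x + rng.normal(scale=0.8, size=1000)

# Sample analogues of the covariance and correlation defined above.
print(np.cov(x, y)[0, 1])       # sample covariance, positive here
print(np.corrcoef(x, y)[0, 1])  # sample correlation, between -1 and +1
```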
the normal probability distribution
Some probability distributions are important because they approximate well sample data in the real world. Some are important because of their uses in statistical inference. This section introduces the normal probability distribution, which is important for both reasons.

The normal distribution is symmetric, bell shaped, and characterized by its mean μ and standard deviation σ. The probability within any particular number of standard deviations of μ is the same for all normal distributions. This probability (rounded off) equals 0.68 within 1 standard deviation, 0.95 within 2 standard deviations, and 0.997 within 3 standard deviations. <- empirical rule

Each normal distribution is specified by its mean μ and standard deviation σ. For any real number for μ and any positive number for σ, there is a normal distribution having that mean and standard deviation. Figure 4.3 illustrates this. Essentially the entire distribution falls between μ − 3σ and μ + 3σ.
PARAMETERS DESCRIBE PROBABILITY DISTRIBUTIONS
Some probability distributions have formulas for calculating probabilities. For others, tables or software provide the probabilities. Section 4.3 shows how to find probabilities for the most important probability distribution.

Section 3.1 introduced the population distribution of a variable. This is, equivalently, the probability distribution of the variable for a subject selected randomly from the population. For example, if 0.12 is the population proportion of adults who believe the ideal number of children is 4, then the probability that an adult selected randomly from that population believes this is also 0.12.

Like a population distribution, a probability distribution has parameters describing center and variability. The mean describes center and the standard deviation describes variability. The parameter values are the values these measures would assume, in the long run, if the randomized experiment or random sample repeatedly took observations on the variable y having that probability distribution.

For example, suppose we take observations from the distribution in Table 4.1. Over the long run, we expect y = 0 to occur 1% of the time, y = 1 to occur 3% of the time, and so forth. In 100 observations, for instance, we expect about one 0, three 1s, sixty 2s, twenty-three 3s, twelve 4s, and one 5. In that case, since the mean equals the total of the observations divided by the sample size, the mean equals

[1(0) + 3(1) + 60(2) + 23(3) + 12(4) + 1(5)]/100 = 245/100 = 2.45.

Equivalently, μ = Σ y P(y), the sum of each possible value times its probability. This is also the expected value of y, E(y) = 2.45. The terminology reflects that E(y) represents what we expect for the average value of y in a long series of observations.

The standard deviation of a probability distribution, denoted by σ, measures its variability. The more spread out the distribution, the larger the value of σ. The Empirical Rule (Section 3.3) helps us to interpret σ. If a probability distribution is bell shaped, about 68% of the probability falls between μ − σ and μ + σ, about 95% falls between μ − 2σ and μ + 2σ, and all or nearly all falls between μ − 3σ and μ + 3σ.

The standard deviation is the square root of the variance of the probability distribution. The variance measures the average squared deviation of an observation from the mean. That is, it is the expected value of (y − μ)². In the discrete case, the formula is σ² = Σ (y − μ)² P(y).
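The mean and standard deviation computed from the same reconstructed Table 4.1 distribution used earlier (again an inference from the quoted values, not the verbatim textbook table):

```python
import math

# Reconstructed Table 4.1 distribution (an inference from the quoted values).
P = {0: 0.01, 1: 0.03, 2: 0.60, 3: 0.23, 4: 0.12, 5: 0.01}

# Mean (expected value): mu = sum of y * P(y).
mu = sum(y * p for y, p in P.items())
print(round(mu, 2))   # 2.45

# Variance and standard deviation: sigma^2 = sum of (y - mu)^2 * P(y).
var = sum((y - mu) ** 2 * p for y, p in P.items())
print(round(math.sqrt(var), 2))   # sigma, about 0.82
```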
z-scores - example 4.4
Suppose that when you applied to college, you took a SAT exam, scoring 550. Your friend took the ACT exam, scoring 30. If the SAT has μ = 500 and σ = 100 and the ACT has μ = 18 and σ = 6, then which score is relatively better?

We cannot compare the test scores of 550 and 30 directly, because they have different scales. We convert them to z-scores, analyzing how many standard deviations each falls from the mean. The SAT score of y = 550 converts to a z-score of (550 − 500)/100 = 0.5. The ACT score of y = 30 converts to a z-score of (30 − 18)/6 = 2.0.

The ACT score of 30 is relatively higher than the SAT score of 550, because 30 is 2.0 standard deviations above its mean whereas 550 is only 0.5 standard deviations above its mean. The SAT and ACT scores both have approximate normal distributions. From Table A, z = 2.0 has a right-tail probability of 0.0228 and z = 0.5 has a right-tail probability of 0.3085. Of all students taking the ACT, only about 2% scored higher than 30, whereas of all students taking the SAT, about 31% scored higher than 550. In this relative sense, the ACT score is higher.
EFFECT OF SAMPLE SIZE ON SAMPLING DISTRIBUTION AND PRECISION OF ESTIMATES
The standard error gets smaller as the sample size n gets larger. The reason for this is that the denominator (√n) of the standard error formula σ_ȳ = σ/√n increases as n increases.

Figure 4.14 shows the sampling distributions of the sample proportion when n = 100 and when n = 1824. As n increases, the standard error decreases and the sampling distribution gets narrower. This means that the sample proportion tends to fall closer to the population proportion. It's more likely that the sample proportion closely approximates a population proportion when n = 1824 than when n = 100. This agrees with our intuition that larger samples provide more precise estimates of population characteristics.

In summary, error occurs when we estimate μ by ȳ, because we sampled only part of the population. This error, which is the sampling error, tends to decrease as the sample size n increases. The standard error is fundamental to inferential procedures that predict the sampling error in using ȳ to estimate μ.
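A sketch of why n matters, for the two sample sizes in Figure 4.14 (assuming population proportion 0.50, as in the exit poll example; a proportion is a mean of 0/1 observations, so its σ = √(π(1 − π))):

```python
import math

# Standard error of a sample proportion when the population proportion is 0.50.
pi = 0.50
sigma = math.sqrt(pi * (1 - pi))   # 0.5

for n in (100, 1824):
    print(n, sigma / math.sqrt(n))  # 0.05 for n = 100, 0.0117 for n = 1824
```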
z-SCORE REPRESENTS THE NUMBER OF STANDARD DEVIATIONS FROM THE MEAN
The z symbol in a normal table refers to the distance between a possible value y of a variable and the mean μ of its probability distribution, in terms of the number of standard deviations that y falls from μ.

For example, scores on each portion of the Scholastic Aptitude Test (SAT) have traditionally been approximately normal with mean μ = 500 and standard deviation σ = 100. The test score of y = 650 has a z-score of z = 1.50, because 650 is 1.50 standard deviations above the mean. In other words, y = 650 = μ + zσ = 500 + z(100), where z = 1.50.

For sample data, Section 3.4 introduced the z-score as a measure of position. Let's review how to find it. The distance between y and the mean μ equals y − μ. The z-score expresses this difference in units of standard deviations.

z-score: The z-score for a value y of a variable is the number of standard deviations that y falls from the mean. For a probability distribution with mean μ and standard deviation σ, it equals z = (y − μ)/σ.

Positive z-scores occur when the value for y falls above the mean μ. Negative z-scores occur when the value for y falls below the mean. The next example shows that z-scores provide a useful way to compare positions for different normal distributions.
REPRESENTING SAMPLING VARIABILITY BY A SAMPLING DISTRIBUTION
Voter preference is a variable, varying among voters. Likewise, the sample proportion voting for some candidate is also a variable: Before the sample is obtained, its value is unknown, and that value varies from sample to sample. If we could select several random samples of size n = 1824 each, a certain predictable amount of variation would occur in the sample proportion values. A probability distribution with appearance similar to Figure 4.10 describes the variation that occurs from repeatedly selecting samples of a certain size n and forming a particular statistic. This distribution is called a sampling distribution. It also provides probabilities of the possible values of the statistic for a single sample of size n.

Sampling distribution: A sampling distribution of a statistic (such as a sample proportion or a sample mean) is the probability distribution that specifies probabilities for the possible values the statistic can take.

Each sample statistic has a sampling distribution. There is a sampling distribution of a sample mean, a sampling distribution of a sample proportion, a sampling distribution of a sample median, and so forth. A sampling distribution is merely a type of probability distribution. Unlike the probability distributions studied so far, a sampling distribution specifies probabilities not for individual observations but for possible values of a statistic computed from the observations. A sampling distribution allows us to calculate, for example, probabilities about the sample proportions of individuals in an exit poll who voted for the different candidates. Before the voters are selected for the exit poll, this is a variable. It has a sampling distribution that describes the probabilities of the possible values.

The sampling distribution is important in inferential statistics because it helps us predict how close a statistic falls to the parameter it estimates. From Figure 4.10, for instance, with a sample of size 1824 the probability is apparently close to 1 that a sample proportion falls within 0.04 of the population proportion.
sampling distributions describe how statistics vary
We've seen that probability distributions summarize probabilities of possible outcomes for a variable. Let's now look at an example that illustrates the connection between statistical inference and probability calculations.

4.5 Predicting an Election from an Exit Poll. Television networks sample voters on election day to help them predict the winners early. For the fall 2014 election for Governor of California, CBS News reported results of an exit poll of 1824 voters. They stated that 60.5% of their sample reported voting for the Democratic party candidate, Jerry Brown. In this example, the probability distribution for a person's vote would state the probability that a randomly selected voter voted for Brown. This equals the proportion of the population of voters who voted for him. When the exit poll was taken, this was an unknown population parameter.

To judge whether this is sufficient information to predict the outcome of the election, the network can ask, "Suppose only half the population voted for Brown. Would it then be surprising that 60.5% of the sampled individuals voted for him?" If this would be very unlikely, the network infers that Brown received more than half the population votes and won the election. The inference about the election outcome is based on finding the probability of the sample result under the supposition that the population parameter, the percentage of voters preferring Brown, equals 50%.

About 7.3 million people voted in this race. The exit poll sampled only 1824 voters, yet TV networks used it to predict that Brown would win. How could there possibly have been enough information from this poll to make a prediction? We next see justification for making a prediction.
normal distribution
bell-shaped curve, is the most important probability distribution for statistical inference
Figure 4.12 Population vs. Sampling Distribution
ex. 4.7 - p. 86. For a random sample of 1824 voters from a population in which 50% voted for each candidate, the sample proportion has standard error √(0.50 × 0.50/1824) = 0.0117. A result from later in this section says that this sampling distribution is bell shaped. Thus, with probability close to 1.0 the sample proportion falls within three standard errors of μ, that is, within 3(0.0117) = 0.035 of 0.50, or between about 0.46 and 0.54. For a random sample of size 1824 from a population in which 50% voted for each candidate, it would be surprising if fewer than 46% or more than 54% voted for one of them. We've now seen how to get this result either using simulation, as shown in Figure 4.10, or using the information about the mean and standard error of the sampling distribution.