CH9


"Greater than or Equal to" (≥) in Probability

"At least" - The outcome in parentheses is included in the probability Ex: - P(X ≥ 1) = P(X = 1) + P(X = 2) - The outcome X = 1 is included - In P(X > 1), it would be excluded

Random Events can be Described in a Variety of Ways

1. Probability 2. Risk 3. Odds

Probability Rules

1. The probability P(A) of any event A satisfies 0 ≤ P(A) ≤ 1 2. If S is the sample space in a probability model, then P(S) = 1 3. Two events A & B are DISJOINT (mutually exclusive) if their joint probability P(A & B) = 0 - Meaning that they have no outcomes in common & so can never occur together - If A & B are disjoint, then P(A or B) = P(A) + P(B) - This is the ADDITION RULE FOR DISJOINT EVENTS 4. For any event A, P(A does not occur) = 1 - P(A) - Where the event "A does not occur" is called the complement of event A

Check Your Skills 9.16: You read that in native Hawaiians, the probability of having blood type AB is 1/100 (1 in 100). What does this mean? a. If you pick 100 Hawaiians randomly, the fraction of them having blood type AB will be very close to 1/100 b. If you pick 100 Hawaiians randomly, exactly 1 of them will have blood type AB c. If you pick 10,000 Hawaiians randomly, exactly 100 of them will have blood type AB

A

UNIFORM Density Curve

A curve that has HEIGHT ONE OVER the INTERVAL FROM ZERO TO ONE - The AREA UNDER the curve is ONE - The PROBABILITY OF ANY EVENT is the AREA UNDER the CURVE & ABOVE the EVENT IN QUESTION

Density Curve

A curve that: - Is always ON/ABOVE the HORIZONTAL AXIS - Has AREA EXACTLY ONE UNDERNEATH it • Corresponding to total probability 1. It describes the OVERALL PATTERN OF a DISTRIBUTION - The AREA UNDER the CURVE & ABOVE ANY RANGE OF VALUES ON the HORIZONTAL AXIS is the PROPORTION OF ALL OBSERVATIONS THAT FALL WITHIN THAT RANGE - It does not describe outliers (i.e., deviations from the overall pattern). Of course, no set of real data is exactly described by it - It acts as a model for a CONTINUOUS Distribution - It is an idealized description that is easy to use & accurate enough for practical use. Conceptually, it is similar to a regression line: - We use a least-squares regression line to model an observed linear trend & to make predictions about similar individuals in the population

Random Variable (X)

A variable whose value is a NUMERICAL OUTCOME OF A RANDOM PHENOMENON - Its value changes from one random choice to another. Random variables are mathematical models for all possible outcomes in a Sample Space (S). 2 types: 1. Discrete 2. Continuous

CONTINUOUS Sample Space

An INTERVAL of outcomes, & PROBABILITIES OVER the INTERVAL are ASSIGNED WITH a MATHEMATICAL FUNCTION - Ex: S = {all numbers between 0 & 1}

DISCRETE Sample Space

An ITEMIZED LIST of outcomes, where EACH OUTCOME HAS an ASSOCIATED PROBABILITY - Ex: S = {O+, O-, A+, A-, B+, B-, AB+, AB-}

Event

An OUTCOME or a SET OF OUTCOMES of a Random Phenomenon - That is, it is a SUBSET OF the SAMPLE SPACE

Ex 9.11: The random number generator will spread its output uniformly (evenly) across the entire interval from 0 to 1 as we allow it to generate a long sequence of numbers - The results of many trials are represented by the UNIFORM DENSITY CURVE shown in Figure 9.4

As Figure 9.4(a) illustrates, the probability that the random number generator produces a number between 0.3 & 0.7 is: - P(0.3 ≤ Y ≤ 0.7) = 0.7 − 0.3 = 0.4 - Because the area under the density curve & above the interval from 0.3 to 0.7 is 0.4. The height of the curve is 1 & the area of a rectangle is the product of height & length, so the probability of any interval of outcomes is just the length of the interval. Similarly, - P(Y ≤ 0.5) = 0.5 - P(Y > 0.8) = 0.2 - P(Y ≤ 0.5 or Y > 0.8) = 0.5 + 0.2 = 0.7. The last event consists of 2 nonoverlapping intervals, so the total area above the event is found by adding two areas, as illustrated by Figure 9.4(b) - This assignment of probabilities obeys all our rules for probability
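A minimal Python sketch of the area calculation above, assuming the uniform density on [0, 1] from Ex 9.11 (height 1 on the interval, height 0 elsewhere). The helper `uniform_prob` is a hypothetical name used only for illustration: it clips the interval to [0, 1] and returns its length, which is the area of the rectangle under the curve.

```python
# Sketch of the uniform density curve on [0, 1] (Ex 9.11).
# `uniform_prob` is a hypothetical helper, not from the text.

def uniform_prob(low: float, high: float) -> float:
    """Area under the height-1 density curve between low and high.

    The interval is clipped to [0, 1] because the curve has height 0
    outside that range; a rectangle of height 1 has area equal to its length.
    """
    low, high = max(low, 0.0), min(high, 1.0)
    return max(high - low, 0.0)

p_middle = uniform_prob(0.3, 0.7)   # P(0.3 <= Y <= 0.7)
p_low = uniform_prob(0.0, 0.5)      # P(Y <= 0.5)
p_high = uniform_prob(0.8, 1.0)     # P(Y > 0.8)
p_either = p_low + p_high           # disjoint intervals: add the two areas
p_point = uniform_prob(0.8, 0.8)    # a single point has length 0

print(round(p_middle, 3), round(p_low, 3), round(p_high, 3), round(p_either, 3), p_point)
# prints: 0.4 0.5 0.2 0.7 0.0
```

The same length-of-interval rule also shows why any single value, such as Y = 0.8, gets probability 0.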

Check Your Skills 9.17: A cat is about to have 6 kittens. The sample space for counting the number of female kittens she has is... a. S = any number between 0 and 1 b. S = whole numbers 0 to 6 c. S = all sequences of 6 males or females by order of birth, such as FMMFFF

B

Check Your Skills 9.18: Here is the probability model for the blood type of a randomly chosen person of Hispanic ethnicity in the United States, according to the Red Cross:
Blood type:   O     A     B     AB
Probability:  0.57  0.31  0.10  ?
The probability that a randomly chosen American of Hispanic ethnicity has type AB blood... a. Can be any number between 0 and 1 b. Is 0.02 c. Is 0.2

B

Check Your Skills 9.20: A study of freely forming groups in bars throughout Europe examined the number of individuals found in groups who were laughing together - Let X be the number of individuals in laughing groups - Here is the probability model for X:
Number of individuals X:  2     3     4     5     6
Probability:              0.51  0.34  0.10  0.04  0.01
This probability model is... a. Continuous b. Discrete c. Discretely continuous

B

Check Your Skills 9.22: A study of freely forming groups in bars throughout Europe examined the number of individuals found in groups who were laughing together - Let X be the number of individuals in laughing groups - Here is the probability model for X:
Number of individuals X:  2     3     4     5     6
Probability:              0.51  0.34  0.10  0.04  0.01
What is the mean μ of X? a. 0.2 b. 2.7 c. 4.0

B

Check Your Skills 9.23: A study of freely forming groups in bars throughout Europe examined the number of individuals found in groups who were laughing together - Let X be the number of individuals in laughing groups - Here is the probability model for X:
Number of individuals X:  2     3     4     5     6
Probability:              0.51  0.34  0.10  0.04  0.01
What is the standard deviation σ of X? a. 0.77 b. 0.88 c. 1.41

B
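The answers to Check Your Skills 9.22 and 9.23 can be checked with a short Python sketch that applies the mean and variance formulas for a discrete random variable to the laughing-groups model. Note that choice a in 9.23 (0.77) is the variance σ², not the standard deviation σ.

```python
import math

# Probability model for X = number of individuals in a laughing group
# (Check Your Skills 9.20-9.23).
values = [2, 3, 4, 5, 6]
probs = [0.51, 0.34, 0.10, 0.04, 0.01]

mu = sum(x * p for x, p in zip(values, probs))               # mean
var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))  # variance
sigma = math.sqrt(var)                                        # standard deviation

print(f"mean = {mu:.2f}, variance = {var:.2f}, sd = {sigma:.2f}")
# prints: mean = 2.70, variance = 0.77, sd = 0.88
```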

Check Your Skills 9.24: The Mental Health Surveillance Study uses trained mental health clinicians to obtain a detailed mental health assessment of a very large random sample of adults - The study found that 17.9% of adults had received a diagnosis of mental illness within the 12 months preceding the interview. For adults in the United States, what is the risk of receiving a diagnosis of mental illness in a given year? a. 0.147 b. 0.179 c. 0.218

B

Ex 9.2: The Gallup-Healthways sample survey collects health information about the American population - In February 2016, Gallup took a national random sample of 14,169 adults and found that 1587 (11.2%) of the people in the sample said they were sick with a cold the previous day

Because all adults had the same chance to be among the chosen 14,169, it seems reasonable to use this 11.2% as an estimate of the unknown proportion in the population - It's a fact that 11.2% of the sample were sick with a cold; we know because Gallup asked them - We don't know what percent of all adults in the United States were sick with a cold in February 2016, but we estimate that approximately 11.2% were. If we use the sample proportion of 11.2% as an approximate probability that a randomly selected American adult had a cold in February 2016, how do we know that it is a reasonable probability model? - What if Gallup had taken a second random sample of 14,169 adults and asked them the same question? - The new sample would include different people - It is almost certain that there would not be exactly 1587 "sick with a cold" responses in this sample - That is, Gallup's estimate of the proportion of adults who were sick with a cold in February 2016 will vary from sample to sample. Could it happen that one random sample finds that 11.2% of adults were sick with a cold and a second random sample finds that 15.3% were? - Random samples eliminate bias from the act of choosing a sample, but they can still be off because of the variability that results when we choose at random - If the variation when we take repeat samples from the same population is too great, we can't trust the results of any one sample - Let's approach this issue by considering what happens for a similar but easier (and cheaper) situation: coin tossing

Ex 9.10: In Ex 9.9 we used a density curve to estimate that the proportion of women in their 40s who are 62 inches or shorter is 31.6% - This is the area under the density curve for heights of 62 inches or less, as shown in Figure 9.3(b). Let's now ask a different question: - What is the probability that a randomly chosen woman in her 40s has a height of 62 inches or less?

Because the selection is random, this probability depends on the relative frequency of women in the population who are 62 inches tall or less, & that relative frequency is 31.6% - Therefore, the probability that a randomly selected woman in her 40s would measure 62 inches or less is 0.316 - This is the area under the density curve for heights 62 inches or less

Check Your Skills 9.19: In a table of random digits such as Table A, each digit is equally likely to be any of 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9. What is the probability that a digit in the table is 7 or greater? a. 7/10 b. 4/10 c. 3/10

C

Check Your Skills 9.21: A study of freely forming groups in bars throughout Europe examined the number of individuals found in groups who were laughing together - Let X be the number of individuals in laughing groups - Here is the probability model for X:
Number of individuals X:  2     3     4     5     6
Probability:              0.51  0.34  0.10  0.04  0.01
What is the probability P(X ≤ 3)? a. 0.34 b. 0.51 c. 0.85

C

Check Your Skills 9.25: The Mental Health Surveillance Study uses trained mental health clinicians to obtain a detailed mental health assessment of a very large random sample of adults - The study found that 17.9% of adults had received a diagnosis of mental illness within the 12 months preceding the interview. For adults in the United States, what are the odds of receiving a diagnosis of mental illness in a given year? a. 0.147 b. 0.179 c. 0.218

C

Basis for the Idea of Probability

CHANCE BEHAVIOR is UNPREDICTABLE in the SHORT run but has a REGULAR & PREDICTABLE PATTERN in the LONG run. Ex: - Somewhere in the U.S., the next baby is about to be born - Will this baby be a boy or a girl? - We cannot know for sure because the outcome is not predetermined - But there is still a regular pattern that emerges clearly when many births are examined

FREQUENTIST Approach to Defining Probabilities

COMPUTING PROBABILITIES based on WHAT HAPPENS IN the LONG RUN - It relies on the RELATIVE FREQUENCY (PROPORTION) OF ONE PARTICULAR OUTCOME AMONG VERY MANY OBSERVATIONS of the Random Phenomenon - This idea ties Probability to actual outcomes. This definition of Probability is based on data from many repetitions of the same Random Phenomenon

Ex 9.8: Pure dog breeds are often highly inbred, leading to high numbers of congenital defects - A study examined hearing impairment in 5333 Dalmatians - Call the number of ears impaired (deaf) in a randomly chosen Dalmatian X for short - The researchers found the following probability model for X:
X:            0     1     2
Probability:  0.70  0.22  0.08

Check that the probabilities of the outcomes sum to exactly 1, as they should in a legitimate discrete probability model. The probability that a randomly chosen Dalmatian has some hearing impairment is the probability that X is equal to or greater than 1: - P(X ≥ 1) = P(X = 1) + P(X = 2) = 0.22 + 0.08 = 0.30. Almost one-third of Dalmatians are deaf in 1 or both ears - This very high proportion may be explained in part by the fact that breeders cannot detect partial deafness from the dog's behavior - The study suggested giving the dogs a hearing test before considering them for breeding. Note that the probability that X is greater than or equal to 1 ("at least 1") is not the same as the probability that X is strictly greater than 1 - The latter probability here is P(X > 1) = P(X = 2) = 0.08 - The outcome X = 1 is included in "greater than or equal to" & is not included in "strictly greater than"
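A short Python sketch of the Dalmatian calculation. Storing the model as a dictionary from outcome to probability is a representation chosen here for illustration; the sketch checks that the probabilities sum to 1 and contrasts "at least 1" (≥) with "strictly greater than 1" (>).

```python
# Dalmatian hearing-impairment model from Ex 9.8: X = number of deaf ears.
model = {0: 0.70, 1: 0.22, 2: 0.08}

total = sum(model.values())                                 # should be 1
p_at_least_1 = sum(p for x, p in model.items() if x >= 1)   # includes X = 1
p_more_than_1 = sum(p for x, p in model.items() if x > 1)   # excludes X = 1

print(round(total, 2), round(p_at_least_1, 2), round(p_more_than_1, 2))
# prints: 1.0 0.3 0.08
```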

Ex 9.12: The hearing impairment X in Ex 9.8 is a random variable whose possible values are the whole numbers {0, 1, 2} - The distribution of X assigns a probability to each of these outcomes

Compare this with the value Y obtained with the random number generator in Ex 9.11 - The values of Y fill the entire interval of numbers between 0 & 1 - The probability distribution of Y is given by its density curve, shown in Figure 9.4

Ex 9.4: Joe reads an article discussing the Search for Extraterrestrial Intelligence (SETI) project - We ask Joe, "What's the chance that we will find evidence of extraterrestrial intelligence in this century?" - Joe responds, "Oh, about 1%"

Does Joe assign probability 0.01 to humans finding extraterrestrial intelligence this century? - The outcome of the search is certainly unpredictable, but we can't reasonably ask what would happen in many repetitions - This century will happen only once and will differ from all other centuries in many ways, especially in terms of technology. If probability measures "what would happen if we did this many times," Joe's 0.01 is NOT A PROBABILITY - The Frequentist Definition of Probability is based on data from many repetitions of the same random phenomenon. Joe is giving us something else: his PERSONAL JUDGEMENT - Although Joe's 0.01 isn't a probability in the frequentist sense, it gives useful information about Joe's opinion that can be expressed in the language of probability - Such values are called PERSONAL PROBABILITIES

Disjoint (Mutually Exclusive) Events

Events that CAN NOT OCCUR SIMULTANEOUSLY - Their joint probability P(A & B) = 0 - They have NO OUTCOMES IN COMMON & so can never occur together

The Proportion in a Small/Moderate Number of Tosses can be...

FAR FROM the ACTUAL PROBABILITY

The Probability Model for a CONTINUOUS Random Variable assigns probabilities to INTERVALS OF OUTCOMES rather than to individual outcomes

In fact, all continuous probability models assign probability 0 to any individual outcome - Only intervals of values have positive probability. To see that this is true, consider a specific outcome such as Y = 0.8 in Ex 9.11 - The PROBABILITY OF ANY INTERVAL is the same as ITS LENGTH - The point 0.8 has no length, so P(Y = 0.8) = 0

Ex 9.6: Suppose that we want to choose a number at random between 0 & 1 - Applet & software random number generators readily perform this task - We might, for example, get the number 0.289351, or the number 0.5816462947

In fact, the sample space is an entire interval of numbers: - S = {all numbers between 0 & 1}. Let's call the outcome of the random number generator Y for short. How can we assign probabilities to such events as {Y = 1/3} or {0.3 ≤ Y ≤ 0.7}? - We cannot assign probabilities to each individual value of Y & then add them, because there are infinitely many possible values - Instead, we will need mathematical functions to assign probabilities over the entire Sample Space

Ex 9.1: A baby's sex depends on a pair of chromosomes, XX for girls & XY for boys, with each chromosome being inherited from one parent - Mendel's theory of genetic inheritance predicts that the XX & XY pairs should be roughly equally represented in the entire population of newborns (ova carry an X chromosome but spermatozoids carry either an X or a Y chromosome) - Based on this scientific theory, we assign an equal chance that the next newborn will be a girl (probability 0.5) or a boy (probability 0.5)

In practice, other factors may affect the success rate of gametes & the relative survival of male & female embryos until birth. Here are some birth data for the United States in the past decades:
Year:              1990       1996       2002       2008       2014
# births:          4,158,212  3,891,494  4,021,726  4,247,694  3,988,076
Proportion males:  0.5121     0.5115     0.5117     0.5117     0.5117
These values are reported by the U.S. National Center for Health Statistics "based on 100% of the birth certificates registered in all states and DC" - From these reports, we can assign a slightly different chance that the next newborn will be a girl (probability 0.488) or a boy (probability 0.512)

Sample Space (S)

In terms of a Random Phenomenon, it is the SET OF ALL POSSIBLE OUTCOMES - It can be a simple list or a more comprehensive description. Ex: - When 1 baby is born, there are only 2 outcomes: male & female - S = {M, F}

Probability Distribution

In terms of a Random Variable (X), it tells us WHAT VALUES X CAN TAKE & HOW TO ASSIGN PROBABILITIES TO THOSE VALUES

Probability

In terms of any outcome of a Random Phenomenon, the PROPORTION OF TIMES the OUTCOME WOULD OCCUR in a VERY LONG SERIES OF REPETITIONS

PERSONAL Probability

Involves a NUMBER BETWEEN ZERO & ONE that EXPRESSES an INDIVIDUAL'S JUDGEMENT OF HOW LIKELY the OUTCOME IS - Used when the Probability is subjective & represents your personal degree of belief. Personal probabilities vary from person to person - However, they are not necessarily arbitrary and flimsy. Ex: - Joe may know nothing about the search for extraterrestrial life beyond what he learned during a quick perusal of the article, or he might be an expert whose opinion is based on conclusions from numerous rigorous scientific analyses - When scientists write, "It is very likely that the 21st-century mean rate of global mean sea level rise (...) will exceed that of 1971-2010," their stated opinion is grounded on extensive data collection and analysis, comprehensive computer simulations, and the consideration of various future global economic scenarios

Risk

It corresponds to the PROBABILITY OF an UNDESIRABLE EVENT such as death, disease, or side effects - The risk of a given adverse event, like its probability, is defined by the frequency of that adverse event in a population or sample of interest

Computing the Mean (μ) & the Standard Deviation (σ) of a Discrete Random Variable (X)

Let X be a Discrete Random Variable with k possible values, x1, x2, ..., xk, in its Sample Space (S). The MEAN of X is: - μ = ∑(xi)[P(X = xi)]. The VARIANCE of X is: - σ² = ∑(xi − μ)²[P(X = xi)]. The STANDARD DEVIATION of X is the square root of its Variance: - σ = √σ² - A measure of dispersion away from the Mean that takes into account both how far each value of X is from the Mean (μ) & how likely each value is - Think of it as roughly the expected deviation from the Mean in the long run
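The three formulas above can be sketched in Python as follows. `mean_and_sd` is a hypothetical helper name; it assumes the values and their probabilities are passed as two parallel lists, and it rejects probabilities that do not sum to 1. Applied to the Dalmatian model from Ex 9.13, it should reproduce μ = 0.38 and σ ≈ 0.63.

```python
import math

def mean_and_sd(values, probs):
    """Mean, variance and standard deviation of a discrete random variable.

    values: the possible outcomes x1, ..., xk
    probs:  P(X = xi) for each outcome (must sum to 1)
    """
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in zip(values, probs))                # sum of xi * P(X = xi)
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))   # sum of (xi - mu)^2 * P(X = xi)
    return mu, var, math.sqrt(var)

# Dalmatian deaf-ear model (Ex 9.13)
mu, var, sigma = mean_and_sd([0, 1, 2], [0.70, 0.22, 0.08])
print(f"mu = {mu:.2f}, variance = {var:.4f}, sigma = {sigma:.2f}")
# prints: mu = 0.38, variance = 0.3956, sigma = 0.63
```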

Probability Model

MATHEMATICAL DESCRIPTION OF a RANDOM PHENOMENON consisting of 2 parts: 1. SAMPLE SPACE (S) 2. A way of ASSIGNING PROBABILITIES TO EVENTS

Probability Models can be used to...

Make PREDICTIONS ABOUT EVENTS WITH UNCERTAIN OUTCOMES. Whether we have a scientifically founded theory or empirical evidence for the entire population, we can use the long-term pattern to create a Probability Model to describe & quantify the uncertainty in Random Phenomena (e.g., a newborn's sex) - Some models may be more precise than others, but they are still models: that is, they are mathematical representations of underlying patterns. We may not always have a scientific theory or a census of an entire population to generate a sound Probability Model - However, Random Samples & Randomized Experiments can be used to infer the properties of the Populations from which they were drawn

Ex 9.9: Figure 9.2 is a histogram of the heights of women 40 to 49 years of age in a large sample survey. Overall, the distribution of heights is quite regular - The histogram is symmetric, & both tails fall off smoothly from a single center peak - There are no large gaps or obvious outliers - The smooth curve drawn over the histogram is a good description of the overall pattern of the data. Our eyes respond to the areas of the bars in a histogram; these bar areas represent the percents of the observations - Figure 9.3(a) is a copy of Figure 9.2 with the leftmost bars shaded - The area of the shaded bars represents the women 62 inches tall or less - They make up 31.9% of all women in the sample; this is the cumulative percent, the sum of all bars for heights of 62 inches or less

Now look at the curve drawn through the bars. In Figure 9.3(b), the area under the curve to the left of 62 inches is shaded - The smooth curve we use to model the histogram distribution is chosen with the specific constraint that the total area under the curve is exactly 1 - The total area represents 100%; that is, all the observations - We can then interpret areas under the curve as proportions of the observations. The curve is now a density curve - The shaded area under the density curve in Figure 9.3(b) represents the proportion of women in their 40s who are 62 inches or shorter - This area is 31.6%, less than half a percentage point away from the actual 31.9% - Areas under the density curve give quite good approximations to the actual distribution of the sampled women

CONTINUOUS Probability Model

One in which the Sample Space contains an INFINITE NUMBER OF POSSIBLE OUTCOMES. We cannot assign probabilities to each individual value in the sample space - But we can use a mathematical function to assign probabilities for any range of values within the sample space - These mathematical functions are called DENSITY CURVES. The model assigns probabilities as AREAS UNDER a DENSITY CURVE - The AREA UNDER the CURVE & ABOVE ANY RANGE OF VALUES ON the HORIZONTAL AXIS is the PROBABILITY OF an OUTCOME OCCURRING IN THAT RANGE

DISCRETE Probability Model

One with a SAMPLE SPACE made up of a LIST OF INDIVIDUAL OUTCOMES. To assign probabilities: - List the probabilities of all the individual outcomes - These probabilities must be numbers between 0 & 1; they must have sum 1 - The probability of any event is the sum of the probabilities of the outcomes making up the event
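The recipe above can be sketched in Python using the laughing-groups model from Check Your Skills 9.20 and 9.21. The `event_prob` helper is hypothetical; an event is represented as a function that returns True for the outcomes it contains, and its probability is the sum over those outcomes.

```python
import math

# Laughing-groups model (Check Your Skills 9.20-9.21): outcome -> probability.
model = {2: 0.51, 3: 0.34, 4: 0.10, 5: 0.04, 6: 0.01}

# Each probability must lie in [0, 1] and together they must sum to 1.
is_valid = all(0 <= p <= 1 for p in model.values()) and math.isclose(sum(model.values()), 1.0)

def event_prob(model, event):
    """P(event): add the probabilities of the outcomes that make up the event."""
    return sum(p for x, p in model.items() if event(x))

p_at_most_3 = event_prob(model, lambda x: x <= 3)   # P(X <= 3) = 0.51 + 0.34
print(is_valid, round(p_at_most_3, 2))
# prints: True 0.85
```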

Addition Rule for Disjoint Events

P(A or B) = P(A) + P(B). It extends to more than 2 events that are disjoint in the sense that no 2 have any outcomes in common. If events A, B, & C are disjoint, the probability that 1 of these events occurs is: - P(A or B or C) = P(A) + P(B) + P(C)

Chance =

Random Selection - Relied on to select Samples without built-in biases so they can be used to infer the properties of a target Population

CONTINUOUS Random Variables

Random variables that can take on ANY VALUE IN AN INTERVAL, with PROBABILITIES given as AREAS UNDER a DENSITY CURVE

DISCRETE Random Variables

Random variables that have a COUNTABLE (typically FINITE) list of possible outcomes

Random

Refers to a Phenomenon for which INDIVIDUAL OUTCOMES are UNCERTAIN but there is nonetheless a REGULAR DISTRIBUTION OF OUTCOMES in a LARGE NUMBER OF REPETITIONS

Ex 9.15: Sickle-cell anemia is a serious, inherited blood disease affecting the shape of red blood cells - Individuals with two copies of the gene causing the defect suffer pain from blocked arteries and can have their life shortened from organ damage - Individuals carrying only one copy of the defective gene ("sickle-cell trait") are generally healthy but may pass the gene to their offspring - An estimated two million Americans carry the sickle-cell trait. If a couple learns from blood tests that they both carry the sickle-cell trait, the genetic laws of inheritance tell us that there is a 25% chance that any child they conceive will suffer from sickle-cell anemia

That is, the risk of conceiving a child who will have sickle-cell anemia is 0.25, or 25%. The odds of this outcome are: - odds = 0.25/(1 − 0.25) = 0.25/0.75 = 0.333, or 1:3. In this example, the risk and the odds of conceiving a child who has sickle-cell anemia have quite different numerical values: - 0.25 and 0.33, respectively. Always make sure that you understand how risk and odds relate to probability when you read reports about these concepts
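The risk-to-odds conversion above is a single formula, odds = risk/(1 − risk). A minimal Python sketch (the function name `odds_from_risk` is hypothetical) also uses the standard-library `fractions` module to show the "ratio of two integers" form of the same odds.

```python
from fractions import Fraction

def odds_from_risk(risk):
    """Odds = P(event occurs) / P(event does not occur) = risk / (1 - risk)."""
    return risk / (1 - risk)

# Sickle-cell example (Ex 9.15): risk = 0.25
odds = odds_from_risk(0.25)
print(round(odds, 3))                              # prints: 0.333

# With exact fractions, 1/4 -> 1/3, i.e. odds of 1:3
exact = odds_from_risk(Fraction(1, 4))
print(f"{exact.numerator}:{exact.denominator}")    # prints: 1:3
```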

Ex 9.13: Ex 9.8 described the distribution of the random variable X representing the number of impaired ears (deaf) in a randomly chosen Dalmatian - Researchers found the following probability model for X:
X:            0     1     2
Probability:  0.70  0.22  0.08

The Mean of X is:
- μ = ∑(xi)[P(X = xi)]
  = (0 × 0.70) + (1 × 0.22) + (2 × 0.08)
  = 0 + 0.22 + 0.16
  = 0.38
The Variance of X is:
- σ² = ∑(xi − μ)²[P(X = xi)]
  = [(0 − 0.38)² × 0.70] + [(1 − 0.38)² × 0.22] + [(2 − 0.38)² × 0.08]
  = [(−0.38)² × 0.70] + [0.62² × 0.22] + [1.62² × 0.08]
  = [0.1444 × 0.70] + [0.3844 × 0.22] + [2.6244 × 0.08]
  = 0.10108 + 0.08457 + 0.20995
  = 0.3956
The Standard Deviation of X is:
- σ = √σ² = √0.3956 ≈ 0.62897
In the long run, in very large sets of randomly selected Dalmatian dogs, we expect to find an average of 0.38 impaired ears per dog - The expected deviation from this value is, roughly speaking, 0.63 impaired ears

Complement of Event A

The EVENT that A DOES NOT OCCUR - Its probability is P(A does not occur) = 1 − P(A)

Expected Value

The VALUE WE EXPECT TO GET, on average, OVER MANY REPETITIONS OF the RANDOM EVENT - The Mean (μ) of a Random Variable X - E(X) = μ

Ex 9.7: We already used the Addition Rule, without calling it by that name, to find the probabilities in Ex 9.5

The event "Rh+" contains 4 disjoint outcomes displayed in the sample space for blood type, so the addition rule (Rule 3) says that its probability is: - P(Rh+) = P(O+) + P(A+) + P(B+) + P(AB+) = 0.39 + 0.27 + 0.25 + 0.07 = 0.98. Check that the two probabilities in Ex 9.5, found using the addition rule, are both between 0 & 1 & add to exactly 1 - That is, the probability model for the Rhesus factor obeys Rules 1 & 2. What is the probability that an Asian American's blood type would not have a positive Rhesus factor? - By Rule 4, P(Rhesus is not positive) = 1 − P(Rhesus is positive) = 1 − 0.98 = 0.02
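The addition rule and the complement rule used in Ex 9.7 can be checked with a small Python sketch over the Asian American blood-type model from Ex 9.5. Writing the types with an ASCII "-" and selecting the Rh+ types by their trailing "+" are conveniences assumed here, not part of the source.

```python
import math

# Asian American blood-type model from Ex 9.5 (American Red Cross figures)
blood = {"O+": 0.39, "O-": 0.01, "A+": 0.27, "A-": 0.005,
         "B+": 0.25, "B-": 0.004, "AB+": 0.07, "AB-": 0.001}

# Rule 2: the probabilities of all outcomes must sum to 1
assert math.isclose(sum(blood.values()), 1.0)

# Addition rule: Rh+ is made of 4 disjoint outcomes, so add their probabilities
p_rh_pos = sum(p for t, p in blood.items() if t.endswith("+"))
# Complement rule (Rule 4)
p_rh_neg = 1 - p_rh_pos

print(round(p_rh_pos, 2), round(p_rh_neg, 2))   # prints: 0.98 0.02
```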

Probability with LARGE Samples

The expected deviation between the sample proportion and the true proportion in the population is smaller when we rely on larger samples. For now, we simply focus on the fact that the outcome of a very large random sample or randomized experiment can be used to compute approximate probabilities. Computing probabilities based on what happens in the long run is called a frequentist approach to defining probabilities, because we rely on the relative frequency (proportion) of one particular outcome among very many observations of the random phenomenon. This idea ties probability to actual outcomes. Yet we often encounter another, quite different, idea of probability

Ex 9.14: Patients immobilized for a substantial amount of time can develop deep vein thrombosis (DVT), a blood clot in a leg or pelvis vein - DVT can have serious adverse health effects and can be difficult to diagnose - On its website, the drug manufacturer Pfizer reports the outcome of a study looking at the effectiveness of the drug Fragmin (dalteparin) in preventing DVT in immobilized patients - Of the 1518 randomly chosen immobilized patients given Fragmin, 42 experienced a complication from DVT (the remaining 1476 patients did not)

The proportion of patients experiencing DVT complications is 42/1518 = 0.0277, or 2.77%. We can use this information to compute the risk & odds of experiencing DVT complications for immobilized patients treated with Fragmin: - risk = 0.0277, or 2.77% - odds = 0.0277/(1 − 0.0277) = 42/1476 = 0.0285. The odds of experiencing DVT complications among immobilized patients given Fragmin are 42:1476, or about 1:35 - That is, for every such patient experiencing a DVT complication, approximately 35 patients do not experience one
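When raw counts are available, risk and odds can be computed directly from them, as Ex 9.14 does with 42/1518 and 42/1476. A minimal Python sketch; `risk_and_odds` is a hypothetical helper name that takes the number of patients with and without the adverse event.

```python
def risk_and_odds(events, non_events):
    """Risk = events / total; odds = events / non-events."""
    total = events + non_events
    return events / total, events / non_events

# Fragmin study (Ex 9.14): 42 of 1518 patients had a DVT complication
risk, odds = risk_and_odds(42, 1518 - 42)
print(f"risk = {risk:.4f}, odds = {odds:.4f}, about 1:{round(1 / odds)}")
# prints: risk = 0.0277, odds = 0.0285, about 1:35
```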

Ex 9.3: When you toss a coin, there are only 2 possible outcomes, heads or tails. Figure 9.1 shows the results of simulating a coin toss 5000 times in a row on 2 separate occasions - After each toss from 1 to 5000, the proportion of all tosses so far that gave a head is recomputed and plotted on the vertical axis - (Notice that the horizontal axis is shown on a logarithmic scale to help us better see the initial tosses) - Trial A (solid line) begins tail, head, tail, tail - You can see that the proportion of heads for Trial A starts at 0 on the first toss, rises to 0.5 when the second toss gives a head, then falls to 0.33 and 0.25 as we get two more tails - Trial B (dotted line) starts with five straight heads, so the proportion of heads is 1 until the sixth toss

The proportion of tosses that produce heads is quite variable for a relatively small number of tosses - Trial A starts low and Trial B starts high - As we make more tosses, however, the proportion of heads for both trials gets close to 0.5 and stays there. If we conducted a third trial in which we tossed the coin a great many times, the proportion of heads would again settle down to 0.5 in the long run - This is the intuitive idea of probability. Probability 0.5 means "occurs half the time in a very large number of trials" - The probability 0.5 appears as a horizontal line on the graph
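The long-run behavior in Figure 9.1 can be imitated with a small Python simulation. This is a sketch, not the code behind the figure: it assumes a fair coin (probability 0.5 of heads), `running_proportion` is a hypothetical helper, and the seeds are arbitrary, so the early proportions will differ from Trials A and B while the final proportions should sit close to 0.5.

```python
import random

def running_proportion(n_tosses, seed):
    """Simulate fair coin tosses; return the proportion of heads after each toss."""
    rng = random.Random(seed)        # seeded so a run can be repeated
    heads = 0
    proportions = []
    for i in range(1, n_tosses + 1):
        heads += rng.random() < 0.5  # True counts as one head
        proportions.append(heads / i)
    return proportions

# Two independent trials of 5000 tosses, in the spirit of Trials A and B
for seed in (1, 2):
    props = running_proportion(5000, seed)
    print(f"seed {seed}: after 10 tosses {props[9]:.2f}, after 5000 tosses {props[-1]:.3f}")
```

The value after 10 tosses typically wanders far from 0.5, while the value after 5000 tosses stays close to it.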

Odds

The ratio of the probability that an outcome occurs over the probability that it does not occur - A RATIO OF TWO PROBABILITIES, where the NUMERATOR represents the PROBABILITY OF AN EVENT OCCURRING & the DENOMINATOR represents the COMPLEMENTARY PROBABILITY OF THAT EVENT NOT OCCURRING. They can take any nonnegative value, including values greater than 1 - The odds of an event can be expressed as the numerical value of the ratio or as a ratio of two integers with no common factor (e.g., 1:3)

Using Density Curves to Assign Probabilities over Continuous Intervals

There is a direct relationship between the representation (or proportion) of a given type of individual in a population & the probability that one individual randomly selected from the population will be of that given type. Just as with discrete probabilities, we define probabilities over continuous intervals by the relative frequency of relevant individuals in the population - In this case, all individuals from the population that belong to the desired interval

Important Facts that Must be True for Any Assignment of Probabilities:

They follow from the idea of probability as "the long-run proportion of repetitions on which an event occurs":
1. Any probability is a NUMBER BETWEEN ZERO & ONE, inclusively - Any proportion is a number between 0 & 1, so any probability is also a number between 0 & 1 - An event with probability 0 never occurs, & an event with probability 1 occurs on every trial - An event with a probability of 0.5 occurs in half the trials in the long run
2. ALL POSSIBLE OUTCOMES of a sample space TOGETHER must have a PROBABILITY OF ONE - Because some outcome must occur on every trial, the sum of the probabilities for all possible outcomes of a random phenomenon must be exactly 1
3. When TWO EVENTS have NO OUTCOMES IN COMMON, they CAN NEVER HAPPEN TOGETHER, which means that THEIR JOINT PROBABILITY IS ZERO - Then, the probability that one or the other occurs is the sum of their individual probabilities - If one event occurs in 40% of all trials, a different event occurs in 25% of all trials, & the two can never occur together, then 1 or the other occurs in 65% of all trials because 40% + 25% = 65%
4. The PROBABILITY that an EVENT DOES NOT OCCUR is ONE MINUS the PROBABILITY that the EVENT DOES OCCUR - If an event occurs in (say) 70% of all trials, it fails to occur in the other 30% - The probability that an event occurs & the probability that it does not occur always add to 100%, or 1

Ex 9.5: Your blood type greatly impacts the kind of blood transfusion or organ transplant you can safely get. Eight different blood types occur as a result of the presence or absence of certain molecules on the surface of red blood cells - A person's blood type is given as a combination of a group (O, A, B, or AB) & a Rhesus factor (+ or -)

They make up the Sample Space: - S = {O+, O-, A+, A-, B+, B-, AB+, AB-}. How can we assign probabilities to this Sample Space? First, the frequencies of these eight blood types differ in different ethnic groups - Within a given ethnic group, we can use the blood types' frequencies in that group to assign their respective probabilities - The American Red Cross reports that, among Asian Americans, 39% have blood type O+, 1% O−, 27% A+, 0.5% A−, 25% B+, 0.4% B−, 7% AB+, & 0.1% AB− - Because 39% of all Asian Americans have blood type O+, the probability that a randomly chosen Asian American has blood type O+ is 39%, or 0.39. Thus we can use these frequencies to construct the complete Probability Model for blood types among Asian Americans:
Blood type:   O+    O−    A+    A−     B+    B−     AB+   AB−
Probability:  0.39  0.01  0.27  0.005  0.25  0.004  0.07  0.001
What if we were interested only in the person's Rhesus factor? - For any randomly selected Asian American, the Rhesus factor can be only positive or negative. Therefore, the sample space for this new question is: - S = {Rh+, Rh-}. Based on the known proportions of Rh+ & Rh− in the Asian American population, the probability model for Rhesus factor is:
Rh factor:    Rh+   Rh−
Probability:  0.98  0.02

Apply Your Knowledge 9.1: Hemophilia refers to a group of rare hereditary disorders of blood coagulation - Because the disorder is caused by defective genes on the X chromosome, hemophilia primarily affects men According to the Centers for Disease Control and Prevention (CDC), the prevalence of hemophilia (the number in the population with hemophilia at any given time) among American males is 13 in 100,000 Explain carefully what this means - In particular, explain why it does not mean that if you obtain the medical records of 100,000 males, exactly 13 will be diagnosed with hemophilia

This means that the long-run proportion of American males with hemophilia is 13 in 100,000: if you repeatedly sampled 100,000 males at random, you would find 13 diagnosed with hemophilia on average - In any one random sample of 100,000 males, chance variation means you could find more or fewer than 13 individuals with hemophilia
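To see why "13 in 100,000" describes an average rather than a guaranteed count, the sketch below simulates several random samples of 100,000 males. It assumes, purely for illustration, that each male independently has hemophilia with probability 13/100,000; the function name `hemophilia_counts` and the seed are hypothetical.

```python
import random

def hemophilia_counts(n_samples: int, sample_size: int = 100_000,
                      prevalence: float = 13 / 100_000, seed: int = 2) -> list[int]:
    """Simulate the number of males with hemophilia in repeated random samples.

    Idealized assumption: each male in a sample independently has
    hemophilia with probability `prevalence`.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_samples):
        counts.append(sum(rng.random() < prevalence for _ in range(sample_size)))
    return counts

counts = hemophilia_counts(10)
print(counts)                      # counts vary from one sample to the next
print(sum(counts) / len(counts))   # the average count across samples
```

Each sample gives a somewhat different count, while the average over many samples stays close to 13.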

"Or" in P(A or B)

This term indicates that AT LEAST ONE EVENT MUST BE SATISFIED/TRUE - Either A ALONE, or B ALONE, or BOTH A & B TOGETHER

"And" in P(A & B)

This term indicates that BOTH EVENTS MUST BE SATISFIED/TRUE AT THE SAME TIME

When EVENTS are NOT DISJOINT & therefore CAN HAPPEN TOGETHER AT THE SAME TIME...

You MUST take care NOT to COUNT the PROBABILITY OF OUTCOMES COMMON TO BOTH EVENTS MORE THAN ONCE - P(A or B) = P(A) + P(B) - P(A & B)

Exercise 9.41: A 2015 Pew Research Center study examined the use of 5 social media sites (Facebook, Twitter, Instagram, Pinterest, LinkedIn) among American adult internet users - Let X represent the number of these social media sites used by any randomly selected American adult internet user - Based on the study findings, we may use the following probability distribution for X:
X:     0     1     2     3     4     5
P(X):  0.20  0.28  0.24  0.16  0.08  0.04
a. Obtain the probability that an American online adult uses at most one of these sites - Obtain the probability P(2 ≤ X < 4) b. Obtain the mean μ of X and interpret this value in context c. Obtain the standard deviation σ of X - Give an approximate interpretation for this value

a. At most one site: P(X ≤ 1) = 0.20 + 0.28 = 0.48 - P(2 ≤ X < 4) = 0.24 + 0.16 = 0.40 b. μ = 1.76 - This is the mean number of these five sites used, averaged over very many randomly selected American adult internet users c. σ = 1.36 - Roughly speaking, this is the typical deviation of X from the mean in this population
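The answers above follow from μ = Σ x·P(x) and σ² = Σ (x - μ)²·P(x), plus the addition rule for the event probabilities. A small sketch of that arithmetic for the Exercise 9.41 distribution (the helper name `mean_and_sd` is an illustrative choice):

```python
from math import sqrt

def mean_and_sd(values, probs):
    """Mean and standard deviation of a discrete random variable.

    mu = sum(x * p);  sigma^2 = sum((x - mu)^2 * p)
    """
    mu = sum(x * p for x, p in zip(values, probs))
    variance = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return mu, sqrt(variance)

# X = number of the five social media sites used (Exercise 9.41).
x_values = [0, 1, 2, 3, 4, 5]
p_values = [0.20, 0.28, 0.24, 0.16, 0.08, 0.04]

at_most_one = sum(p for x, p in zip(x_values, p_values) if x <= 1)
two_to_under_four = sum(p for x, p in zip(x_values, p_values) if 2 <= x < 4)
mu, sigma = mean_and_sd(x_values, p_values)
print(round(at_most_one, 2), round(two_to_under_four, 2), round(mu, 2), round(sigma, 2))
# 0.48 0.4 1.76 1.36
```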

Exercise 9.43: Many random number generators allow users to specify the range of the random numbers to be produced - Suppose that you specify that the random number Y can take any value between 0 and 2 - Then the density curve of the outcomes has constant height between 0 and 2, and height 0 elsewhere a. Is the random variable Y discrete or continuous? - Why? b. What is the height of the density curve between 0 and 2? - Draw a graph of the density curve c. Use your graph from part b and the fact that probability is area under the curve to find P(Y ≤ 1)

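a. Y is continuous, because it can take any value in the interval from 0 to 2 b. Height = 0.5, because the total area under the curve (height × width of 2) must equal 1 c. P(Y ≤ 1) = 0.5 (the area of a rectangle of width 1 & height 0.5)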
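For a uniform density curve, every probability is the area of a rectangle, so the answers to Exercise 9.43 reduce to height × width. A minimal sketch under that assumption; the function names `uniform_height` and `uniform_prob` are hypothetical:

```python
def uniform_height(a: float, b: float) -> float:
    """Height of a uniform density on [a, b]: height * (b - a) must equal 1."""
    return 1 / (b - a)

def uniform_prob(lo: float, hi: float, a: float, b: float) -> float:
    """P(lo <= Y <= hi) for Y uniform on [a, b], as the area of a rectangle.

    The interval is clipped to [a, b] because the density is 0 outside it.
    Endpoints do not matter for a continuous variable (a point has area 0).
    """
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0.0) * uniform_height(a, b)

print(uniform_height(0, 2))                  # 0.5
print(uniform_prob(float("-inf"), 1, 0, 2))  # P(Y <= 1) = 0.5
```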

Apply Your Knowledge 9.3: Probability is a measure of how likely an event is to occur - Match the probabilities that follow with each statement of likelihood given - (The probability value is usually a more exact measure of likelihood than is the verbal statement) Probabilities: 0, 0.01, 0.6, 0.99, 1 a. This event is impossible - It can never occur b. This event is certain - It will occur on every trial c. This event is very unlikely, but it will occur once in a while in a long sequence of trials d. This event will occur more often than not

a. 0 b. 1 c. 0.01 d. 0.6

Exercise 9.31: The CDC provides the breakdown of the sources of infection leading to hepatitis C in Americans - Here are the probabilities of each infection source for a randomly chosen individual with hepatitis C:
Source of infection               Probability
Intravenous drug use              0.60
Unprotected sex                   0.15
Transfusion (before screening)    0.10
Unknown or other                  0.11
Occupational                      ?
a. What is the probability that a person with hepatitis C was infected in the course of his or her professional occupation? b. What is the probability that a person with hepatitis C was infected through a known risky behavior (intravenous drug use or unprotected sex)?

a. 1 - (0.60 + 0.15 + 0.10 + 0.11) = 0.04 b. 0.60 + 0.15 = 0.75
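When one entry of a probability model is missing, rule 2 (total probability 1) determines it. A sketch for the hepatitis C table, with `None` marking the unknown entry; the helper `fill_missing` is a hypothetical name, and treating the listed sources as disjoint follows from each case having a single recorded source.

```python
from typing import Optional

def fill_missing(model: dict[str, Optional[float]]) -> dict[str, Optional[float]]:
    """Replace the single unknown (None) probability so the total is 1."""
    missing = [k for k, p in model.items() if p is None]
    if len(missing) != 1:
        raise ValueError("expected exactly one missing probability")
    known_total = sum(p for p in model.values() if p is not None)
    filled = dict(model)
    filled[missing[0]] = 1 - known_total
    return filled

hepatitis_c = {
    "Intravenous drug use": 0.60,
    "Unprotected sex": 0.15,
    "Transfusion (before screening)": 0.10,
    "Unknown or other": 0.11,
    "Occupational": None,
}
model = fill_missing(hepatitis_c)
occupational = model["Occupational"]
risky = model["Intravenous drug use"] + model["Unprotected sex"]  # disjoint sources
print(round(occupational, 2), round(risky, 2))  # 0.04 0.75
```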

Exercise 9.35: A large governmental survey of dental care among children 2 to 17 years old found the following distribution of times since the last visit to the dentist:
Time since last visit                          Probability
6 months or less                               0.57
More than 6 months but no more than 1 year     0.18
More than 1 year but no more than 2 years      0.08
More than 2 years but no more than 5 years     0.03
More than 5 years                              ?
a. What is the probability that a child has not seen a dentist in more than 5 years? b. What is the probability that a randomly chosen child has not seen a dentist within the last 6 months? c. What is the probability that the child has seen a dentist within the last year?

a. 1 - (0.57 + 0.18 + 0.08 + 0.03) = 0.14 b. 1 - 0.57 = 0.43 c. 0.57 + 0.18 = 0.75
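Each part of Exercise 9.35 uses a different probability rule: the total-probability rule for the missing entry, the complement rule for "not within 6 months", and the addition rule for disjoint intervals for "within the last year". A short sketch of that arithmetic (the shortened category labels are an editorial choice):

```python
# Distribution of time since last dental visit (Exercise 9.35); last entry unknown.
known = {
    "6 months or less": 0.57,
    "More than 6 months, up to 1 year": 0.18,
    "More than 1 year, up to 2 years": 0.08,
    "More than 2 years, up to 5 years": 0.03,
}

more_than_5_years = 1 - sum(known.values())           # all probabilities sum to 1
not_within_6_months = 1 - known["6 months or less"]   # complement rule
within_last_year = (known["6 months or less"]
                    + known["More than 6 months, up to 1 year"])  # disjoint intervals

print(round(more_than_5_years, 2), round(not_within_6_months, 2), round(within_last_year, 2))
# 0.14 0.43 0.75
```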

Apply Your Knowledge 9.7: Rabies is a viral disease of mammals transmitted through the bite of a rabid animal - The virus infects the central nervous system, causing encephalopathy & ultimately death - The Florida Department of Health reports the distribution of documented cases of rabies for all of 2016:
Species:      Raccoon  Bat   Fox   Other
Probability:  0.53     0.22  0.10  ?
a. What probability should replace "?" in the distribution? b. What is the probability that a reported case of rabies is not a raccoon? c. What is the probability that a reported case of rabies is either a bat or a fox?

a. 1 - (0.53 + 0.22 + 0.10) = 0.15 b. 1 - 0.53 = 0.47 c. 0.22 + 0.10 = 0.32

Exercise 9.27: Manatees are an endangered species of herbivorous, aquatic mammals found primarily in the rivers and estuaries of Florida - As part of its conservation efforts, the Florida Fish and Wildlife Commission records the cause of death for every recovered manatee carcass in its waters - Of the 392 recorded manatee deaths in 2012, 68 were perinatal and 81 were caused by collision with a watercraft a. What is the probability that the death of a randomly selected manatee was due to collision with a watercraft? b. What is the probability that the death was not due to collision with a watercraft? c. What is the probability that the cause of death was due to perinatal problems or collision with a watercraft? - What is the probability that it was due to some other cause?

a. 81/392 = 0.2066 b. 1 - 0.2066 = 0.7934 c. (68 + 81)/392 = 0.3801 & 1 - 0.3801 = 0.6199, respectively
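Exercise 9.27 builds probabilities from counts, so exact fractions keep the arithmetic transparent before rounding. A sketch using Python's `fractions.Fraction`, assuming (as the recording scheme implies) that each carcass has a single recorded cause, which makes the causes disjoint:

```python
from fractions import Fraction

total_deaths = 392
perinatal = 68
watercraft = 81

p_watercraft = Fraction(watercraft, total_deaths)
p_not_watercraft = 1 - p_watercraft                       # complement rule
# Assumption: one recorded cause per death, so the two causes are disjoint.
p_perinatal_or_watercraft = Fraction(perinatal + watercraft, total_deaths)
p_other = 1 - p_perinatal_or_watercraft

for label, p in [("watercraft", p_watercraft),
                 ("not watercraft", p_not_watercraft),
                 ("perinatal or watercraft", p_perinatal_or_watercraft),
                 ("other", p_other)]:
    print(f"{label}: {p} = {float(p):.4f}")
```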

Apply Your Knowledge 9.11: Let X be a random number between 0 & 1 produced by the random number generator described in Ex 9.11 & Figure 9.4 - Find the following probabilities: a. P(X ≤ 0.4) b. P(X < 0.4) c. P(0.3 ≤ X ≤ 0.5) d. P(X < 0.3 or X > 0.5)

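a. 0.4 b. 0.4 (a single point has probability 0 under a density curve, so including or excluding X = 0.4 makes no difference) c. P(0.3 ≤ X ≤ 0.5) = P(X ≤ 0.5) - P(X < 0.3) = 0.5 - 0.3 = 0.2 d. P(X < 0.3 or X > 0.5) = P(X < 0.3) + P(X > 0.5) = 0.3 + 0.5 = 0.8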
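The exact answers come from areas under the uniform density, but a quick Monte Carlo check makes the same point from the long-run-proportion side. This sketch uses Python's `random.random()`, which returns values in [0, 1); that stands in for the generator of Ex 9.11 as an assumption, and the seed and sample size are arbitrary.

```python
import random

rng = random.Random(3)
draws = [rng.random() for _ in range(200_000)]  # simulated X, uniform on [0, 1)

def estimate(event) -> float:
    """Proportion of simulated draws for which `event` is true."""
    return sum(1 for x in draws if event(x)) / len(draws)

estimates = {
    "P(X <= 0.4)": estimate(lambda x: x <= 0.4),
    "P(X < 0.4)": estimate(lambda x: x < 0.4),
    "P(0.3 <= X <= 0.5)": estimate(lambda x: 0.3 <= x <= 0.5),
    "P(X < 0.3 or X > 0.5)": estimate(lambda x: x < 0.3 or x > 0.5),
}
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

The estimates land close to 0.4, 0.4, 0.2, and 0.8, and the first two agree because hitting exactly 0.4 essentially never happens.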

Apply Your Knowledge 9.9: A survey by Gallup asked a random sample of American adults about their soda consumption - Let's call X the number of glasses of soda consumed on a typical day - Gallup found the following probability model for X:
X:            0     1     2     3     4+
Probability:  0.52  0.28  0.09  0.04  0.07
Consider the events: - A = {number of glasses of soda is 1 or greater} - B = {number of glasses of soda is 2 or less} a. What outcomes make up the event A? - What is P(A)? b. What outcomes make up the event B? - What is P(B)? c. What outcomes make up the event "A or B"? - What is P(A or B)? - Why is this probability not equal to P(A) + P(B)?

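a. A = {1, 2, 3, 4+} - P(A) = P(X ≥ 1) = P(X = 1) + P(X = 2) + P(X = 3) + P(X ≥ 4) = 0.28 + 0.09 + 0.04 + 0.07 = 0.48 b. B = {0, 1, 2} - P(B) = P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.52 + 0.28 + 0.09 = 0.89 c. "A or B" = {0, 1, 2, 3, 4+}, every possible outcome - P(A or B) = 1.00 - This is not equal to P(A) + P(B) = 1.37 because A & B are not disjoint: the outcomes 1 & 2 belong to both events, so adding P(A) + P(B) counts P(A & B) = 0.28 + 0.09 = 0.37 twice (0.48 + 0.89 - 0.37 = 1.00)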
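Representing events as sets of outcomes makes the double counting visible: the union counts each outcome once, while P(A) + P(B) counts the shared outcomes 1 and 2 twice. A sketch of the soda example, with string labels for outcomes as an editorial choice:

```python
# Probability model for X = glasses of soda on a typical day (Apply Your Knowledge 9.9).
soda = {"0": 0.52, "1": 0.28, "2": 0.09, "3": 0.04, "4+": 0.07}

A = {"1", "2", "3", "4+"}   # 1 or more glasses
B = {"0", "1", "2"}         # 2 or fewer glasses

def prob(event: set) -> float:
    """Add the probabilities of the (disjoint) outcomes that make up an event."""
    return sum(soda[o] for o in event)

p_a, p_b = prob(A), prob(B)
p_a_and_b = prob(A & B)     # outcomes 1 and 2, which belong to both events
p_a_or_b = prob(A | B)      # every outcome, each counted once
print(round(p_a, 2), round(p_b, 2), round(p_a + p_b, 2), round(p_a_or_b, 2))
# 0.48 0.89 1.37 1.0

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
assert abs(p_a_or_b - (p_a + p_b - p_a_and_b)) < 1e-9
```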

Apply Your Knowledge 9.15: Human papillomavirus (HPV) infection is the most common sexually transmitted infection - Certain types of HPV can cause genital warts in both men and women and cervical cancer in women - The U.S. National Health and Nutrition Examination Survey (NHANES) contacted a representative sample of 1921 women between the ages of 14 and 59 years and asked them to provide a self-collected vaginal swab specimen - Of these specimens, 515 tested positive for HPV, indicating a current HPV infection a. Give the probability, risk, and odds that a randomly selected American woman between the ages of 14 and 59 years has a current HPV infection b. The survey broke down the data by age group:
Age group (years):     14-19  20-24  25-29  30-39  40-49  50-59
Percent HPV positive:  24.5   44.8   27.4   27.5   25.2   19.6
Give the risk and the odds of being HPV positive for women in each age group - Which age group has the highest risk, and has the highest odds, of testing positive for HPV?

a. Probability = risk = 515/1921 = 0.268 - odds = 515/(1921 - 515) = 515/1406 = 0.366 b. By age group, in order of increasing age: - risk: 0.245, 0.448, 0.274, 0.275, 0.252, & 0.196, respectively - odds: 0.325, 0.812, 0.377, 0.379, 0.337, & 0.244 - Risk & odds are greatest among 20- to 24-year-olds
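Risk is the probability p itself and odds are p/(1 - p), so the HPV answers follow from one small conversion. A sketch of that computation (the function name `risk_and_odds` is illustrative):

```python
def risk_and_odds(p: float) -> tuple[float, float]:
    """Risk is the probability p itself; odds compare p with its complement, p / (1 - p)."""
    return p, p / (1 - p)

# Overall sample: 515 of 1921 specimens tested positive.
risk, odds = risk_and_odds(515 / 1921)
print(f"overall: risk = {risk:.3f}, odds = {odds:.3f}")

# Percent HPV positive by age group, converted to proportions.
by_age = {"14-19": 24.5, "20-24": 44.8, "25-29": 27.4,
          "30-39": 27.5, "40-49": 25.2, "50-59": 19.6}
results = {age: risk_and_odds(pct / 100) for age, pct in by_age.items()}
for age, (r, o) in results.items():
    print(f"{age}: risk = {r:.3f}, odds = {o:.3f}")

# Odds increase with risk, so the group with the highest risk also has the highest odds.
highest = max(results, key=lambda age: results[age][1])
print("highest odds:", highest)
```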

Exercise 9.47: The National Cancer Institute (NCI) compiles U.S. epidemiology data for a number of different cancers (through the Surveillance, Epidemiology, and End Results Program [SEER]) - Among women, breast cancer is one of the most common and deadliest types of cancer a. Based on SEER data, the NCI estimates that a woman in her 30s has a 0.43% chance of being diagnosed with breast cancer - Compute the risk and the odds of being diagnosed with breast cancer for a randomly chosen woman in her 30s b. The risk of breast cancer increases with age. The NCI estimates that a woman in her 60s has a 3.65% chance of being diagnosed with breast cancer - Compute the risk and the odds of being diagnosed with breast cancer for a randomly chosen woman in her 60s c. The NCI also reports a lifetime risk of breast cancer—that is, the probability that a woman born in the United States today will develop breast cancer at some time in her life, based on current rates of breast cancer - The most recent estimate of lifetime risk is 12.7% - Give the probability, risk, and odds of developing breast cancer over a lifetime for a randomly chosen woman born in the United States today d. Compare the lifetime values you got in part c with the values by decade you found in parts a and b - Which way of communicating risk would you say is most informative (lifetime or by decade)?

a. Risk = 0.0043 - odds = 0.00432 b. Risk = 0.0365 - odds = 0.0379 c. Probability = risk = 0.127 - odds = 0.145 d. By decade, because risk varies over a lifetime

Apply Your Knowledge 9.5: A couple wants to have 3 children - Assume that the probabilities of a newborn being male or being female are the same & that the gender of 1 child does not influence the gender of another child a. There are 8 possible arrangements of girls & boys - What is the sample space for having 3 children (gender of the first, second, & third child)? - All eight arrangements are (approximately) equally likely b. The future parents are wondering how many boys they might get if they have 3 children - Give a probability model (sample space & probabilities of outcomes) for the number of boys - Follow the method of Ex 9.5

a. S = {BBB, BBG, BGB, BGG, GBB, GBG, GGB, GGG} b. S = {0, 1, 2, 3} - Associated probabilities = 1/8, 3/8, 3/8, 1/8
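Following the method of Ex 9.5, the number-of-boys model comes from listing the equally likely arrangements and adding the probabilities of those with the same count. A sketch with `itertools.product`, under the stated assumption that all 8 arrangements are equally likely:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: every ordered sequence of genders for three children.
sample_space = ["".join(seq) for seq in product("BG", repeat=3)]
# ['BBB', 'BBG', 'BGB', 'BGG', 'GBB', 'GBG', 'GGB', 'GGG']

# Assume all 8 arrangements are equally likely; count the boys in each.
boy_counts = Counter(outcome.count("B") for outcome in sample_space)
model = {k: Fraction(boy_counts[k], len(sample_space)) for k in sorted(boy_counts)}
print(model)
```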

Exercise 9.29: Government data assign a single cause for each death that occurs in the United States - The data show that, among persons aged 15 to 24 years, the probability is 0.41 that a randomly chosen death was an accident, 0.16 that it was a homicide, and 0.15 that it was a suicide. a. What is the sample space for the probability model of major causes of death in this age group based on the information you are given? b. What is the probability that a death was either an accident, a homicide, or a suicide? - What is the probability that the death was due to some other cause?

a. S = {accident, homicide, suicide, other} b. 0.72 & 0.28, respectively

Exercise 9.37: To track epidemics, the CDC requires physicians to report all cases of important transmissible diseases - In 2014, for instance, a total of 350,062 cases of the sexually transmitted disease gonorrhea were officially reported - Here is how they break down by the age group of the diagnosed patient:
Age group (years)    Percent of gonorrhea cases
0−14                 1
15−19                19
20−24                33
25−29                20
30−39                17
40 or older          10
a. Explain why this is a proper probability model - Is the sample space discrete or continuous? b. What is the probability that a randomly selected gonorrhea case that year was a patient in his or her 20s (age 20 to 29)?

a. The age groups cover all possible ages without overlapping, each percent is between 0% & 100%, and the percents sum to 100% - Discrete (a finite list of age groups) b. 33% + 20% = 53%

Apply Your Knowledge 9.13: The 2011 National Youth Risk Behavior Survey provides insight on the physical activity of high school students in the United States - More than 15,000 high schoolers were asked, "During the past 7 days, on how many days were you physically active for a total of at least 60 minutes per day?" - Physical activity was defined as any activity that increased heart rate - Let X represent the response - The survey results give the following probability model for X:
Days:         0     1     2     3     4     5     6     7
Probability:  0.15  0.08  0.10  0.11  0.10  0.12  0.07  0.27
a. Verify that this is a legitimate discrete probability model b. Describe the event X < 7 in words - What is P(X < 7)? c. Express the event "physically active on at least one day" in terms of X - What is the probability of this event? d. Obtain the mean μ & standard deviation σ of X

a. Every probability is between 0 & 1, and the probabilities sum to 1 b. The student was physically active on fewer than 7 days in the past week - P(X < 7) = 1 - P(X = 7) = 0.73 c. P(X > 0) = P(X ≥ 1) = 1 - P(X = 0) = 0.85 d. μ = 3.92 - σ = 2.54
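Checking that a discrete model is legitimate and finding event probabilities are both mechanical once the model is stored as a table. A sketch for the physical-activity model; the helper names `is_legitimate` and `prob` are hypothetical, and the float tolerance allows for rounding when summing decimals.

```python
# Probability model for X = days physically active (Apply Your Knowledge 9.13).
activity = {0: 0.15, 1: 0.08, 2: 0.10, 3: 0.11, 4: 0.10, 5: 0.12, 6: 0.07, 7: 0.27}

def is_legitimate(model: dict, tol: float = 1e-9) -> bool:
    """A discrete model is legitimate if every probability is in [0, 1] and they sum to 1."""
    return (all(0 <= p <= 1 for p in model.values())
            and abs(sum(model.values()) - 1) < tol)

def prob(model: dict, event) -> float:
    """Sum the probabilities of the outcomes x for which event(x) is true."""
    return sum(p for x, p in model.items() if event(x))

print(is_legitimate(activity))                     # True
print(round(prob(activity, lambda x: x < 7), 2))   # 0.73
print(round(prob(activity, lambda x: x >= 1), 2))  # 0.85
```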

Exercise 9.33: Insulin injections and oral medication are two key options in diabetes treatment - The 2014 National Diabetes Statistics Report provides the breakdown of diabetes treatments among American adults with diagnosed diabetes, as shown in the following table:
Diabetes treatment                     Percent
Insulin only                           14.0
Both insulin and oral medication       14.7
Oral medication only                   56.9
Neither insulin nor oral medication    14.4
a. Explain why this is a proper probability model - Is the sample space discrete or continuous? b. What is the probability that a randomly selected American adult diagnosed with diabetes is treated with insulin, either alone or combined with oral medication? c. What is the probability that a randomly selected American adult diagnosed with diabetes is not treated with insulin at all?

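a. The four treatment categories do not overlap & cover every possibility, each percent is between 0% & 100%, and the percents sum to 100% - Discrete b. 0.140 + 0.147 = 0.287 c. 1 - 0.287 = 0.713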

Exercise 9.39: The study in Exercise 9.38 also gave the distribution of eye color X among male hawks:
Eye color:    Yellow  Light orange  Orange  Dark orange  Red
X:            1       2             3       4            5
Probability:  0.00    0.11          0.40    0.32         0.17
a. Write the event "eye color less than 2 or more than 4" in terms of X - What is the probability of this event? b. Describe the event 2 < X ≤ 4 in words - What is its probability?

a. X < 2 or X > 4 - P(X < 2 or X > 4) = P(X = 1) + P(X = 5) = 0.00 + 0.17 = 0.17 b. The hawk's eye color is orange or dark orange (X = 3 or X = 4) - P(2 < X ≤ 4) = 0.40 + 0.32 = 0.72

When Only the Odds are Provided, Compute the Risk (p) Using this Formula:

p = odds/(1 + odds) In the sickle-cell anemia example, the odds were 1/3 - We can find the risk from this information as: p = odds/(1 + odds) = (1/3)/[1 + (1/3)] = (1/3)/(4/3) = (1/3) × (3/4) = 1/4 = 0.25
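The conversion p = odds/(1 + odds) is the inverse of odds = p/(1 - p), so going from odds to risk and back should return the starting odds. A sketch using exact fractions so that the sickle-cell numbers come out as 1/4 and 1/3 without rounding (function names are illustrative):

```python
from fractions import Fraction

def risk_from_odds(odds: Fraction) -> Fraction:
    """Convert odds to risk (probability): p = odds / (1 + odds)."""
    return odds / (1 + odds)

def odds_from_risk(p: Fraction) -> Fraction:
    """Convert risk to odds: odds = p / (1 - p)."""
    return p / (1 - p)

p = risk_from_odds(Fraction(1, 3))
print(p, float(p))          # 1/4 0.25
print(odds_from_risk(p))    # 1/3, back to the original odds
```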

If an outcome A has probability p of occurring, then...

risk(A) = p
odds(A) = p/(1 - p)
In general, when the undesirable event is not very frequent, risk & odds give similar numerical values (because the odds denominator, 1 - p, is close to 1) - In other situations, risk & odds can be very different

