Statistics Chapter 4: Probability
CLASS PROBLEM: Toss a coin, if it lands heads, roll a die once. If it lands tails, flip the coin one more time. What is the sample space, and what is the size of the sample space?
S = {(H,1), (H,2), ..., (H,6), (T,H), (T,T)}, |S| = 8
Example: The American Veterinary Association claims that the annual cost of medical care for dogs averages $100 with a standard deviation of $30, and the annual cost of medical care for cats averages $120 with a standard deviation of $35. a) What is the expected difference in cost between cats and dogs? b) What is the standard deviation of the difference between cats and dogs? c) If the difference in costs is normally distributed, what is the probability that the medical expenses for a woman's dog are greater than those for her cat?
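a) E(C-D) = E(C) - E(D) = 120 - 100 = $20
b) Assuming the two costs are independent: V(C-D) = V(C) + V(D) = 35^2 + 30^2 = 1225 + 900 = 2125, so σ(C-D) = sqrt(2125) = $46.10
c) We are told the difference is normal, and we already found its center and spread: C-D ~ N(20, 46.1). The dog costs more when the difference is negative: P(C-D < 0) = P(Z < (0-20)/46.1) = P(Z < -0.4338) = 0.3322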
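The three steps above can be checked numerically. This is a minimal sketch, assuming the cat and dog costs are independent and using the $120 cat mean that the solution works with; the variable names are only illustrative.

```python
from math import sqrt
from statistics import NormalDist

mean_cat, sd_cat = 120, 35
mean_dog, sd_dog = 100, 30

# a) expected difference (cat minus dog)
mean_diff = mean_cat - mean_dog

# b) variances add for independent variables, even for a difference
sd_diff = sqrt(sd_cat**2 + sd_dog**2)

# c) the dog costs more when the difference is below zero
p_dog_more = NormalDist(mean_diff, sd_diff).cdf(0)

print(mean_diff, round(sd_diff, 1), round(p_dog_more, 3))
```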
Random
a phenomenon is random if any individual outcome is unpredictable, but the distribution of outcomes over many repetitions is known
-example: toss a coin. No single flip is predictable, but many flips will result in approximately half heads and half tails
-remember that random does not mean that each outcome is equally likely; it only means that a particular outcome cannot be predicted with certainty
Caution
a random variable does not share the same properties as an algebraic variable
-for an algebraic variable X: X+X+X = 3X
-for a random variable, each X may turn out differently, so X+X+X does not equal 3X
-this distinction matters when calculating variance
-X+X+X should really be written X1+X2+X3
at a hospital, the probability of a patient having surgery is 12%, and obstetric treatment 16% and the probability of both is 2%. What is the probability that a patient will have neither treatment?
P(neither) = 1 - P(surgery or obstetric) = 1 - (0.12 + 0.16 - 0.02) = 1 - 0.26 = 0.74
Rules of Thumb (1)
1) "and" means multiply when the events are independent -toss a coin three times. What is the probability of all three being tails? -that is, tails first AND tails second AND tails third -since coin flips are independent we multiply: .5 x .5 x .5 = .125
record the number of people that walk into a post office each day. a) what is the sample space? b) How do you think the outcomes will be distributed (what shape)
a) S = {0, 1, 2, 3, ...}, |S| = infinity b) skewed right
CLASS PROBLEM: The probability of encountering heavy traffic on a Monday is 0.8, and the probability of encountering heavy traffic on a Tuesday is 0.6 1. someone claims the probability of heavy traffic occurring both days is .3, why is this impossible? 2. the person retracts their claim, but insists that Monday and Tuesday are independent of each other. What is the probability of encountering heavy traffic on Monday or Tuesday? 3. What is the probability of encountering heavy traffic at least one Tuesday in a Month? (successive Tuesdays are independent)
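1. By the addition rule, P(M or T) would be 0.8+0.6-0.3 = 1.1, and a probability cannot exceed 1.
2. P(M or T) = P(M)+P(T)-P(M and T) = 0.8+0.6-(0.8)(0.6) = 0.92
3. Assuming four Tuesdays in the month: P(at least 1) = 1-P(none) = 1-(0.4)^4 = 0.9744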
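The same three answers as a short calculation. This is only a sketch; it assumes a month with exactly four Tuesdays, as the worked answer does.

```python
p_mon, p_tue = 0.8, 0.6

# 1. the claimed overlap of 0.3 would push P(M or T) above 1
claimed_union = p_mon + p_tue - 0.3

# 2. under independence, P(M and T) = P(M) * P(T)
p_either = p_mon + p_tue - p_mon * p_tue

# 3. at least one heavy Tuesday out of four independent Tuesdays (assumed count)
p_at_least_one = 1 - (1 - p_tue) ** 4

print(round(claimed_union, 2), round(p_either, 2), round(p_at_least_one, 4))
```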
Rules of Thumb (2)
2) "or" means add when the events are disjoint -roll two dice. What is the probability that the sum of the faces is 5 or 11? -since the sum cannot be 5 and 11 at the same time, these are disjoint outcomes, so we add: P(sum=5)+P(sum=11)=4/36+2/36=1/6
Rules of Thumb (3)
3) for any probability question, first decide whether it is easier to calculate it directly, or easier to calculate the opposite and subtract from 1. -a coin is tossed 7 times, what is the probability of tails occurring at least once? -easier to answer the opposite: P(tails at least once)=1-P(no tails) -"no tails" means "heads first AND heads second AND..." P(no tails)=.5 x .5 x .5 x .5 x .5 x .5 x .5= .0078 P(at least once)=1-P(no tails)=1-0.0078=.9922
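To see that the shortcut and the direct count agree, the 7-toss example can be brute-forced. A small sketch, assuming a fair coin: it lists all 2^7 sequences and counts those containing at least one tail.

```python
from itertools import product

# every possible sequence of 7 tosses
outcomes = list(product("HT", repeat=7))

# direct approach: count sequences with at least one tail
p_direct = sum(1 for seq in outcomes if "T" in seq) / len(outcomes)

# complement approach: 1 - P(all heads)
p_complement = 1 - 0.5 ** 7

print(len(outcomes), p_direct, p_complement)
```

Only one of the 128 sequences (all heads) has no tails, which is why the complement is so much quicker to compute by hand.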
Example: S = {0,1,2,3,4,5,6,7,8} A = {2,3,6,7} B = {0,3,6,8}
A and B = {3,6}
A or B = {0,2,3,6,7,8}
A^c and B = {0,8}
A^c or B^c = {0,1,2,4,5,7,8}
(A and B)^c = {0,1,2,4,5,7,8}
(A or B)^c = {1,4,5}
A and A^c = ø (the empty set, not a set containing ø)
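These set operations map directly onto Python's built-in set type, which makes a handy way to check answers. In this sketch `&` plays the role of "and", `|` the role of "or", and subtracting from S gives the complement.

```python
S = set(range(9))      # {0, 1, ..., 8}
A = {2, 3, 6, 7}
B = {0, 3, 6, 8}

A_c = S - A            # complement of A within S
B_c = S - B            # complement of B within S

print(sorted(A & B))         # A and B
print(sorted(A | B))         # A or B
print(sorted(A_c & B))       # A^c and B
print(sorted(A_c | B_c))     # A^c or B^c
print(sorted(S - (A & B)))   # (A and B)^c
print(sorted(S - (A | B)))   # (A or B)^c
print(A & A_c)               # empty set
```

Note that A^c or B^c and (A and B)^c come out identical, as the answers above show.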
Class problem: Roll a die twice, what is the probability that the number on the second cast is greater than the one on the first cast?
P(2nd>1st) = 15/36=5/12
Calculating probability: Roll a die twice, what is the probability that the sum of the faces will be 8?
P(Sum=8)=5/36
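Both two-dice answers above can be confirmed by listing all 36 equally likely rolls and counting. A minimal sketch, assuming fair dice:

```python
from fractions import Fraction
from itertools import product

# all ordered pairs (first roll, second roll)
rolls = list(product(range(1, 7), repeat=2))

p_second_greater = Fraction(sum(1 for a, b in rolls if b > a), len(rolls))
p_sum_is_8 = Fraction(sum(1 for a, b in rolls if a + b == 8), len(rolls))

print(p_second_greater, p_sum_is_8)
```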
sample space
the sample space is the set of all possible outcomes, denoted S
-example: toss a coin three times. The sample space is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
-the size of S is denoted |S|
-example: toss a die twice. The sample space is S = {(1,1), (1,2), ..., (1,6), (2,1), (2,2), ..., (2,6), ..., (6,6)}
-example: pull two cards from a well-shuffled deck. How many elements are in the sample space?
Class Problem: A card is drawn from a deck of 52 cards. -what is the probability that it is neither a diamond nor an ace? -What is the probability that it is either not a diamond or it is not an ace?
-13 cards are diamonds and 3 more are aces, that leaves 36 cards, so 36/52= .6923 -there is only one card that doesn't fit either category-the ace of diamonds, so 51/52= .9808
Events
-An event is some set of outcomes from the sample space. events are denoted by capital letters A,B,C.... -The complement of an event A is the event that A doesn't happen -It is denoted A^c and may be thought of as the event "not A" -Two events are Independent if the probability of one occurring is not influenced by the other occurring
Events
-The event A and B is the set of outcomes that belong to both sets (their overlap) -The event A or B is both sets taken together. -the Empty set, denoted ø is the set containing no elements at all -two events are disjoint if they cannot both occur -disjoint events are sometimes called mutually exclusive, since the occurrence of one excludes the possibility of the other occurring
Prosecutor's fallacy
-a man is on trial for a crime, and forensic evidence is found at the scene which implicates him.
-a prosecutor has an expert witness testify that the probability of finding this forensic evidence is 1 in 20,000 if the person is innocent
-by itself, this argument is misleading...
-the defense counters that there are 1,000,000 people in this city, so there are about 1,000,000/20,000 = 50 people who could have left this evidence.
-thus there is still only a 1 in 50 chance that the defendant is the one who left this evidence
-the prosecutor would have to make an argument that significantly narrows down this pool of 50 people, such as additional evidence.
-this is tantamount to someone winning the lottery, and the prosecutor charging them with cheating because the odds of winning were so low.
Rules of Thumb (3 continued)
-if there are 23 people in a room, what is the probability that at least two of them have the same birthday?
-P(at least 2 share) = 1 - P(all different), where P(all different) = (# ways 23 people can have different birthdays)/(# possible birthday assignments for 23 people)
-P(at least 2 share) = 1 - (365*364*363*...*343)/(365^23) = 1 - 0.4927 = 0.5073, or 50.73%
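The long product is tedious by hand but short in code. A sketch, assuming 365 equally likely birthdays (no leap years); `p_shared_birthday` is just an illustrative name.

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_different = 1.0
    for k in range(n):
        # person k+1 must avoid the k birthdays already taken
        p_all_different *= (days - k) / days
    return 1 - p_all_different

print(round(p_shared_birthday(23), 4))
```

Trying `p_shared_birthday(22)` shows that 23 is the first group size where the probability passes one half.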
4.3 random variables
-a random variable is a variable that assigns a number to each outcome of an experiment. This is not to be confused with an algebraic variable.
-the probability distribution of a random variable is a listing of each possible value of the random variable together with that value's probability
-X: X1, X2, X3...
-P(X): P1, P2, P3...
-example: toss a coin 3 times. Let X = the number of heads
-X: 0, 1, 2, 3
-P(X): 1/8, 3/8, 3/8, 1/8
Better example
-a woman visits her doctor and gets tested for a rare disease
-the doctor indicates that the test is 99% accurate (false positive rate = 1%)
-the woman tests positive, and she concludes there is a 99% chance she has the disease
-this is a rare disease; suppose the incidence in the population is 1 in 50,000.
-if 50,000 people are tested, we would expect about 500 to test positive (1% of 50,000) even though only one person has the disease
-thus, even after testing positive, she only has about a 1 in 500 chance of having the disease
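The counting argument above is an informal version of Bayes' rule. The sketch below does the exact calculation; it assumes that "99% accurate" also means the test catches 99% of true cases, which the notes do not state.

```python
from fractions import Fraction

prevalence = Fraction(1, 50_000)        # 1 in 50,000 has the disease
false_positive_rate = Fraction(1, 100)  # 1% of healthy people test positive
sensitivity = Fraction(99, 100)         # ASSUMED: 99% of sick people test positive

# total probability of a positive test (true positives + false positives)
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

# Bayes' rule: P(disease | positive)
p_disease_given_positive = sensitivity * prevalence / p_positive

print(float(p_disease_given_positive))
```

The exact answer is slightly below 1 in 500, so the rule-of-thumb count in the notes is a good approximation.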
4.1 randomness
...
4.2 probability models
...
Class Problem: Out of 125 students surveyed, 12 were accounting majors, 24 were business majors, and 34 were either an accounting major or business major (or both). Draw and label a Venn Diagram
Both = 12 + 24 - 34 = 2, Accounting only = 12 - 2 = 10, Business only = 24 - 2 = 22, Neither = 125 - 34 = 91
Class problem: employee bonuses are awarded at the end of the year. Thomas realizes it is possible for him to get a $5000 bonus, but it is unlikely. He is twice as likely to get a $2000 bonus, seven times as likely to get a $1000 bonus, and ten times as likely to get a $500 bonus. -construct the probability distribution for Thomas's bonus (first call the probability of getting a $5000 bonus p)
Bonus: 5000 2000 1000 500
Probability: p 2p 7p 10p
-sum of probabilities = 1, so 20p = 1 and p = 0.05
Bonus: 5000 2000 1000 500
Probability: .05 .10 .35 .50
E(X) = (5000)(.05)+(2000)(.10)+(1000)(.35)+(500)(.5) = 1050
V(X) = (5000-1050)^2(.05)+(2000-1050)^2(.10)+(1000-1050)^2(.35)+(500-1050)^2(.5) = 780,125+90,250+875+151,250 = 1,022,500
SD(X) = sqrt(1,022,500) = $1011.19
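Because the variance here involves large squared numbers, it is easy to slip on a calculator. The sketch below recomputes the distribution, mean, and variance with exact fractions; the list names are only illustrative.

```python
from fractions import Fraction
from math import sqrt

bonuses = [5000, 2000, 1000, 500]
weights = [1, 2, 7, 10]          # p, 2p, 7p, 10p

# probabilities must sum to 1, so divide each weight by the total (20)
probs = [Fraction(w, sum(weights)) for w in weights]

mean = sum(x * p for x, p in zip(bonuses, probs))
variance = sum((x - mean) ** 2 * p for x, p in zip(bonuses, probs))

print(mean, variance, round(sqrt(variance), 2))
```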
Properties of Mean and Variance
E(c) = c
V(c) = 0
E(X ± Y) = E(X) ± E(Y)
E(cX) = cE(X)
V(cX) = c^2 V(X)
If X and Y are independent: V(X ± Y) = V(X) + V(Y)
INDEPENDENCE: two events A and B are independent if P(A and B)=P(A)*P(B)
Example: P(A) = .3, P(B) = .5, P(A and B) = .10. Here P(A)*P(B) = .15, which does not equal .10, so A and B are not independent.
Example: P(A) = .2, P(B) = .6, P(A or B) = .68. Are A and B independent? (first use the addition rule) By the addition rule, P(A and B) = .2 + .6 - .68 = .12, and (.2)(.6) = .12, so A and B are independent.
-do not confuse independence with disjointness. Independence cannot be illustrated on a Venn diagram.
CLASS PROBLEM: In real estate ads it is found that 64% of homes have garages, 9% have pools, and 28% have a finished basement. 5% have a garage and a pool, 19% have a garage and a basement, 4% have a basement and a pool, and 2% have all three. What percentage of homes do not have any of these three?
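Work outward from the center of the Venn diagram:
All three = 2
G&P only = 5-2 = 3, G&B only = 19-2 = 17, B&P only = 4-2 = 2
G only = 64-3-17-2 = 42, P only = 9-3-2-2 = 2, B only = 28-17-2-2 = 7
None = 100-42-2-7-3-17-2-2 = 25, so 25% of homes have none of the three.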
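The same answer falls out of the three-set version of the addition rule (inclusion-exclusion), which avoids filling in each Venn region. This is a sketch of that alternative route, not the method used in class; all figures are percentages from the problem.

```python
g, p, b = 64, 9, 28        # garage, pool, finished basement
gp, gb, bp = 5, 19, 4      # pairwise overlaps
gpb = 2                    # all three

# add singles, subtract pairwise overlaps, add back the triple overlap
at_least_one = g + p + b - gp - gb - bp + gpb
none = 100 - at_least_one

print(at_least_one, none)
```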
Examples:
In a game, a die is thrown. Alan pays Sally $1 if the die falls 1, 2, or 3, and $3 if the die falls 4 or 5. If the die falls 6, Sally has to pay Alan $8. What is the expected value and standard deviation of the amount Sally wins?
Winnings X: 1 3 -8
P(X): 1/2 1/3 1/6
-E(X) = (1)(1/2)+(3)(1/3)+(-8)(1/6) = 1/6 = $0.1667 (rounding 1/3 to 0.333 before multiplying gives 0.1654; the gap is only rounding error)
-V(X) = (1-0.1667)^2(1/2)+(3-0.1667)^2(1/3)+(-8-0.1667)^2(1/6) = 14.14
-SD(X) = sqrt(14.14) = $3.76
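Small disagreements in answers like this one usually come from rounding 1/3 and 1/6 too early. The sketch below keeps the probabilities as exact fractions and, for comparison, repeats the mean with the rounded decimals.

```python
from fractions import Fraction
from math import sqrt

winnings = [1, 3, -8]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

mean = sum(x * p for x, p in zip(winnings, probs))
variance = sum((x - mean) ** 2 * p for x, p in zip(winnings, probs))

# same mean, but with probabilities rounded before multiplying
rounded_mean = 1 * 0.5 + 3 * 0.333 + (-8) * 0.1667

print(mean, variance, round(sqrt(variance), 2), round(rounded_mean, 4))
```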
The Addition Rule: P(A or B)=P(A) + P(B)-P(A and B) Overlap counted twice, subtract out once.
In an office building of 80 people, 28 work on Saturday, 11 work on Sunday, and 3 people work on both Sunday and Saturday. What is the probability that a person in this office works at least one of these days? P(Sat or Sun)= P(Sat) + P(Sun) - P(Both) = 28/80+11/80-3/80=.45
Class problem: John is suing his landlord. If he wins. he will be awarded $6000 and will not have to pay any court costs. If he loses, he will have to pay court fees totaling $200. -john has found a lawyer that will represent him for $1200. If he hires this lawyer, there is an 80% chance he will win, and if he represents himself there is only a 60% chance that he will win. -should john hire this lawyer? (calculate his expected net winnings using the lawyer and his expected net winnings not using the lawyer)
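With lawyer: X: 4800 -1400 P(X): .8 .2 -E(X) = (4800)(.8)+(-1400)(.2) = 3560
Without lawyer: X: 6000 -200 P(X): .6 .4 -E(X) = (6000)(.6)+(-200)(.4) = 3520
-the net winnings with the lawyer already subtract the $1200 fee (6000-1200 = 4800 and -200-1200 = -1400). Since 3560 > 3520, John should hire the lawyer, though only by $40 on average.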
Class problem: K, A, and M have completed several relay triathlons. K-swimming, A-bikes, M-runs. Their respective completion times (in hours) have means .77, 1.33, and .9, and their respective standard deviations are .05, .08, and .06. a) what is their expected team finish time? b) what is the standard deviation of the team finish time? c) assume their team finish times are normally distributed. What is the probability that they finish the triathlon 15 minutes earlier than usual?
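a) E(K+A+M) = E(K)+E(A)+E(M) = .77+1.33+.9 = 3 hours
b) Assuming the three legs are independent: V(K+A+M) = V(K)+V(A)+V(M) = .0025+.0064+.0036 = .0125, so σ(K+A+M) = sqrt(.0125) = .1118 hours
c) T ~ N(3, .1118); 15 minutes = .25 hours, so P(T<2.75) = P(Z<(2.75-3)/.1118) = P(Z<-2.236) = 0.0127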
Examples:
calculate the mean and standard deviation of the following random variable:
-X: -2 3 7
-P(X): .3 .1 .6
-E(X) = (-2)(.3)+(3)(.1)+(7)(.6) = 3.9
-V(X) = (-2-3.9)^2(.3)+(3-3.9)^2(.1)+(7-3.9)^2(.6) = 16.29
-SD(X) = sqrt(16.29) = 4.036
4.4 properties of random variables
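definitions:
-expected value (or mean) of a random variable: denoted E(X); E(X) = sum of x*P(x) over all values x
-variance of a random variable: denoted V(X); V(X) = sum of (x-E(X))^2*P(x) over all values x
-the standard deviation is the square root of the variance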
probability
the probability of an outcome is the proportion of times that it would occur over many repetitions. -often, people expect the outcomes to settle into some regularity much sooner than they actually do.
Gambler's fallacy, or "law of averages"
the mistaken belief that observations will settle into their expected pattern sooner than they actually do. In other words, thinking an event is "due" or "not due"
-playing a different lottery number than last week's winning number because the chances it would come up twice in a row are so small.
-building your home in the exact spot that a meteor struck, reasoning it would be almost impossible for a meteor to strike in the same place twice.
-a man brings a bomb on a plane. He reasons "the chances of there being a bomb on a plane are so small, so the chances of there being another one are almost zero"
Law of Large Numbers
states that as an experiment is repeated over and over, the observed frequency of an outcome gets closer to its expected frequency.
benford's law, also called the first-digit law
states that for certain kinds of data, the first digit in each data value has a curious frequency
-this can be used to assess the legitimacy of certain data
-for appropriate data, first digits have the following distribution (with the last value missing)
-first digit: 1 2 3 4 5 6 7 8 9
-Probability: .301 .176 .125 .097 .079 .067 .058 .051 ?
-1. What is the probability that the first digit is 9? The probabilities must sum to 1, so 1-.954 = .046
-2. What is the probability that the first digit is at least 2? "At least 2" is the opposite of "first digit is 1", so 1-.301 = .699
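The table values agree with the standard Benford formula P(d) = log10(1 + 1/d), which is not stated in the notes but is brought in here as a check. A short sketch reproducing both answers:

```python
from math import log10

# Benford probabilities for first digits 1 through 9
benford = {d: log10(1 + 1 / d) for d in range(1, 10)}

p_nine = benford[9]
p_at_least_two = 1 - benford[1]   # complement of "first digit is 1"

for digit, prob in benford.items():
    print(digit, round(prob, 3))
print(round(p_nine, 3), round(p_at_least_two, 3))
```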
Example:
suppose X and Y are independent, with E(X) = 120, σX = 12, E(Y) = 300, σY = 16. Find the mean and standard deviation of 2X-5Y
E(2X-5Y) = 2E(X)-5E(Y) = 2(120)-5(300) = -1260
V(2X-5Y) = V(2X)+V(5Y) = 4V(X)+25V(Y) = 4(144)+25(256) = 6976, so σ(2X-5Y) = sqrt(6976) = 83.522
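The two rules used here (means combine linearly, variances pick up squared coefficients and always add) can be wrapped in a small helper. `linear_combo` is a hypothetical name, and the function is only valid when X and Y are independent.

```python
from math import sqrt

def linear_combo(a, mean_x, sd_x, b, mean_y, sd_y):
    """Mean and SD of aX + bY, assuming X and Y are independent."""
    mean = a * mean_x + b * mean_y
    # coefficients are squared, so a negative b still adds variance
    variance = a**2 * sd_x**2 + b**2 * sd_y**2
    return mean, sqrt(variance)

mean, sd = linear_combo(2, 120, 12, -5, 300, 16)
print(mean, round(sd, 3))
```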