Unit 8: Random Variables and Probability Distributions.

Another way of writing variance:

For independent X and Y, Var(X + Y) = Var(X) + Var(Y) can also be written as (σ_(X+Y))^2 = (σx)^2 + (σy)^2. Here σ carries the variable as a subscript and is squared at the same time.

Variance, formula form.

Var(X) = E((X - μx)^2), also written (σx)^2

Geometric Standard Deviation

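σ = Sq root of[ 1 - P ] / P, where P = chance of a win on each trial. The square root covers only the numerator; equivalently, σ = Sq root of[ (1 - P)/P^2 ].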

μ

mu, or mean/expected value

Note about "(N choose K)" setups

Things like (6 choose 6) and (6 choose 0) both come out to 1.

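10% rule of independence (binomials)

"If our sample is less than or equal to 10% of the population, then it is ok to assume (approx.) independence."
Example: X = # of boys in 3 selections from a classroom of N students that is 50% girls and 50% boys. Selecting without replacement makes the trials slightly dependent, but if 3 is at most 10% of N (N is at least 30), X can be treated as approximately binomial.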

Using Binomial Formulas

P(one specific order, such as Win, Win, Fail) = Win * Win * Fail
"Find the number of combinations for 2 wins in 3 attempts" = (3 choose 2) = (3!)/(2!(3 - 2)!) = 3
"Find the probability of exactly 2 wins in 3 attempts" or "find P(X = 2)": P(X = 2) = (3 choose 2) * Win * Win * Fail
"Find P(X ≤ 1)" in 5 attempts (at most 1 win): P(X ≤ 1) = P(0 wins) + P(1 win) = (Fail)^5 + [(5 choose 1) * (Win) * (Fail)^4]
"Find P(X > 3)" in 5 attempts (more than 3 wins): P(X > 3) = P(X = 4) + P(X = 5)

Binomial Variable parts

- Made up of independent trials (10 flips, 5 tries, etc.)
- Each trial has only two outcomes (got the thing we needed / did not get the thing)
- There is a fixed number of trials
- Each trial has an identical chance of success

6!

6 factorial: 6 * 5 * 4 * 3 * 2 * 1 = 720

Binomial Probability Distribution

A probability distribution showing the probability of x successes in n trials of a binomial experiment.
X = # of successes in 6 attempts
A = chance of success on 1 attempt
B = chance of failure on 1 attempt (B = 1 - A)
P(X = 0) = (6 choose 0) * A^0 * B^6 = 1 * B^6
P(X = 1) = (6 choose 1) * A^1 * B^5 = 6 * A^1 * B^5
P(X = 2) = (6 choose 2) * A^2 * B^4 = 15 * A^2 * B^4
P(X = 3) = (6 choose 3) * A^3 * B^3 = 20 * A^3 * B^3
P(X = 4) = (6 choose 4) * A^4 * B^2 = 15 * A^4 * B^2
P(X = 5) = (6 choose 5) * A^5 * B^1 = 6 * A^5 * B^1
P(X = 6) = (6 choose 6) * A^6 * B^0 = 1 * A^6
NOTES:
- The "choose" coefficients are symmetrical (1, 6, 15, 20, 15, 6, 1), but the resulting probabilities are symmetrical only when A = B = 0.5. When the chances of success and failure are uneven, the filled-in distribution is not symmetrical.

Geometric Random Variable Distributions

Always right-skewed, with an infinitely long tail.

Alt method for Variance of Random Variables:

Assuming X and Y are independent, with the following info:
E(X) = μx = 16, σx = 0.8, 15 < X < 17
E(Y) = μy = 4, σy = 0.6, 3 < Y < 5
(σ_(X+Y))^2 = (σx)^2 + (σy)^2
(σ_(X+Y))^2 = (0.8)^2 + (0.6)^2
(σ_(X+Y))^2 = 0.64 + 0.36
(σ_(X+Y))^2 = 1, so σ_(X+Y) = 1

Quick Formulas (Subtracting)

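D = X - Y
E(D) = E(X) - E(Y)
Var(D) = Var(X) + Var(Y) (X and Y independent)
σ(D) = Sq root of[ (σx)^2 + (σy)^2 ]
TIPS:
- Variances add even when the variables are subtracted. Never subtract variances or standard deviations.
- Use the σ(D) formula to find the standard deviation of the difference, not the difference between two standard deviations.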

Geometric Random Mean

E(X) = 1/P, where P = chance of a win on each trial

Generalizing K scores (in N attempts)

F = chance of success on 1 attempt
P(exactly K successes in N attempts) = (N choose K) * F^K * (1 - F)^(N - K)
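
The general formula can be sketched in Python as a quick check. The function name `binomial_pmf` and the 50% win chance are illustrative assumptions, not part of the card.

```python
from math import comb


def binomial_pmf(k: int, n: int, f: float) -> float:
    """P(exactly k successes in n independent attempts), success chance f."""
    return comb(n, k) * f**k * (1 - f) ** (n - k)


# Exactly 2 wins in 3 attempts, assuming a 50% win chance (illustrative value).
p_two_of_three = binomial_pmf(2, 3, 0.5)

# "At most 1 win in 5 attempts" adds the k = 0 and k = 1 terms.
p_at_most_one = sum(binomial_pmf(k, 5, 0.5) for k in range(2))

# All the terms for a fixed n should add up to 1.
total_probability = sum(binomial_pmf(k, 6, 0.3) for k in range(7))
```

Cumulative questions such as P(X > 3) follow the same pattern: sum the function over the values of K that count.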

Finding P(|D| < N), also phrased as within N points of each other.

First, find the mean and standard deviation of the difference D. If you are given two sets of values, proceed as normal. If both items come from the same distribution (just one mean and one deviation), use it for both:
μ = 41, σ = 9
μD = 41 - 41 = 0
σD = Sq root of[ 9^2 + 9^2 ] = 12.728
Now take a normal distribution for D and mark two lines: one N points above the mean and one N points below it. Find the z-score of each line, z = (±N - μD)/σD, look up the cumulative probability for each, and subtract the lower one from the upper one. The result is P(|D| < N).

Mean of the sum/difference of random variables

For any two random variables X and Y, if T = X + Y, then the mean of T is the mean of X plus the mean of Y. If D = X - Y, then the mean of D is the mean of X minus the mean of Y. In general, the mean of the sum of several random variables is the sum of their means.
Example: Let's say we have two variables, X and Y, with means:
μx = 3
μy = 4
We're looking for E(X + Y), also written μ_(X+Y), which is just the two means added together:
μ_(X+Y) = μx + μy
μ_(X+Y) = 3 + 4
μ_(X+Y) = 7
Subtraction works the same way, but the order matters. For Y - X:
E(Y - X) = μ_(Y-X) = μy - μx
μ_(Y-X) = 4 - 3
μ_(Y-X) = 1
Reversing the order flips the sign: E(X - Y) = 3 - 4 = -1. A negative mean is fine, so use whichever order the question asks for.

Is this probability Distribution model valid?

If every probability is between 0 and 1 and all of them add to 1 (100%), then yes. If not, it is not a valid model.

Geometric random variable Probability Formula

It will (almost) always start at P(X = 1), but for any trial number Z you can use this:
P(X = Z) = P(win) * P(fail)^(Z - 1)
Examples:
P(X = 1) = P(win) * P(fail)^0
P(X = 2) = P(win) * P(fail)^1
P(X = 4) = P(win) * P(fail)^3
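
A minimal Python sketch of this formula follows. The function name `geometric_pmf` and the 0.2 win chance are assumptions chosen for illustration.

```python
def geometric_pmf(z: int, p_win: float) -> float:
    """P(first win happens on trial z): z - 1 fails, then one win."""
    if z < 1:
        raise ValueError("the first win cannot happen before trial 1")
    return p_win * (1 - p_win) ** (z - 1)


# With an assumed 0.2 win chance, the first few probabilities shrink each step.
first_three = [geometric_pmf(z, 0.2) for z in (1, 2, 3)]
```

Each term is the previous one multiplied by P(fail), which is why the distribution is right-skewed with a long tail.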

Probability with discrete random variables

Let's say Hugo plans to buy packs of baseball cards until he gets a card he wants. He can only buy 4 packs, and each pack has a 0.2 chance of containing the card. Let X be the number of packs Hugo buys. The probability distribution for X is:
P(X = 1) = 0.2
P(X = 2) = 0.16
P(X = 3) = 0.128
P(X = 4) = not given
We need P(X ≥ 2), the probability of 2, 3, or 4 packs together. All the probabilities must add to 1 and P(X = 1) = 0.2, so the rest must be 0.8:
P(X ≥ 2) = 1 - 0.2 = 0.8
TIPS:
- If you know all the probabilities you need, add them together. If one is missing, subtract the ones you don't need from 1 instead.
- Don't round to 2 decimal places unless the question asks for it.

E(X) forms

The mean of a discrete random variable, also known as its expected value, can be written as:
E(variable) = something
or
μ with a subscript of (variable) = something
Mu (μ) is the usual symbol for a mean, which is why it is used here.

Find probability with mean and standard deviation.

Let's say Shinji commutes to work every day and worries about running out of fuel. His fuel use on the way to work (W) and on the way home (H) each follow a normal distribution, with a different spread going home, and the two legs are independent. We have the following values:
μw = 10 L, σw = 1.5 L
μh = 10 L, σh = 2 L
Now suppose Shinji has 25 L and wants to go to work and back. What's the probability he runs out of fuel? We need the total, Work + Home:
T = W + H
μT = μw + μh = 20 L
(σT)^2 = (σw)^2 + (σh)^2 = 2.25 + 4 = 6.25
σT = Sq root of[ 6.25 ] = 2.5 L
Because W and H are independent and normal, T also follows a normal distribution. To run out of fuel, he needs to consume more than 25 L. To find how far 25 is from the mean (20), we calculate how many standard deviations away it is:
Z = (25 - 20)/2.5 = 2
So 25 is 2 deviations above the mean, and the table value for z = 2 is 0.9772. That is the area to the left of 25, not the part past 25 where he runs out of fuel. To find that, subtract from 1:
1 - 0.9772 = 0.0228
Therefore, the probability that Shinji runs out of fuel is about 2.28%.
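
The same calculation can be sketched with Python's standard-library `statistics.NormalDist`, which replaces the z-table lookup. The variable names are illustrative, and the sketch assumes both legs are normal and independent.

```python
from math import sqrt
from statistics import NormalDist

# Fuel use in litres for each leg, taken from the card.
mu_w, sigma_w = 10.0, 1.5
mu_h, sigma_h = 10.0, 2.0

# Means add; variances (not standard deviations) add for independent legs.
mu_t = mu_w + mu_h
sigma_t = sqrt(sigma_w**2 + sigma_h**2)

# P(T > 25) is the area to the right of 25 under the total's distribution.
total_fuel = NormalDist(mu_t, sigma_t)
p_run_out = 1 - total_fuel.cdf(25.0)
```

Here `cdf(25.0)` plays the role of the table value for z = 2, so subtracting it from 1 gives the upper tail.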

Variance of the sum of random variables

Let's say we have 2 variables, X and Y. We know their expected values and deviations, and they are constrained within a range as well, as shown here:
E(X) = μx = 16, σx = 0.8, 15 < X < 17
E(Y) = μy = 4, σy = 0.6, 3 < Y < 5
From there, we need Var(X + Y). It works much like adding means together, as long as X and Y are independent:
Var(X + Y) = Var(X) + Var(Y)
Var(X + Y) = (0.8)^2 + (0.6)^2
Var(X + Y) = 0.64 + 0.36
Var(X + Y) = 1
The ranges answer a different question: the possible values of X + Y, not its variance. Adding the largest values gives 17 + 5 = 22, and adding the smallest gives 15 + 3 = 18, so:
18 < (X + Y) < 22
Both X and Y have a range of 2 (17 - 15 and 5 - 3), and the sum has a range of 4 (22 - 18). The sum is more spread out than either part, which matches the variances adding.

Variance of the difference of random variables

Let's say we have 2 variables, X and Y. We know their expected values and deviations, and they are constrained within a range as well, as shown here:
E(X) = μx = 16, σx = 0.8, 15 < X < 17
E(Y) = μy = 4, σy = 0.6, 3 < Y < 5
From there, we need Var(X - Y). For independent X and Y, the variances still add, even though the variables are subtracted:
Var(X - Y) = Var(X) + Var(Y)
Var(X - Y) = 0.64 + 0.36
Var(X - Y) = 1
The ranges give the possible values of X - Y, not its variance. The smallest possible value pairs the smallest X with the largest Y, and the largest pairs the largest X with the smallest Y:
smallest: 15 - 5 = 10
largest: 17 - 3 = 14
It should never be big - big or small - small. The possible values look like this:
10 < (X - Y) < 14
Again, the range is 4, as opposed to its components' 2s.

Why independence in variance of random variables matters

Let's say we have two variables, X and Y, that depend on each other, and we need to add them. We know that X + Y must always equal 24, that Var(X) = 4 (σx = 2), and that Var(Y) = 4 (σy = 2). We can't add the variances as we normally would, because the variables directly affect each other. Adding them would give Var(X + Y) = 8, which contradicts the rule that X + Y is always 24. If X + Y is always exactly 24, with no wiggle room, then X + Y has no variance at all:
Var(X + Y) = 0

Shift a (random Discrete) variable (adding)

Let's take a random variable X and its (normal) distribution, with lines at the mean, 1 deviation down, and 1 deviation up. An easy way to shift it is to define a new random variable, Y, as X plus a constant k:
Y = X + k
Filled out, it might look like Y = X + 10 or Y = X - 5. Either way, the whole distribution slides to the right (or to the left for a negative k), so the mean and the lines 1 deviation above and below it all move by k. For example, if you had a mean of 6 and shifted by 2, your new mean is 8.
Keep in mind that the shifted distribution belongs to Y, not X, but we can use it to solve for the mean of X:
μy = μx + k
μy - k = μx
The standard deviation measures distance from the mean, so shifting does not change it:
σy = σx

Scale a (random Discrete) variable (multiplying)

Let's take a random variable X and its (normal) distribution, with lines at the mean, 1 deviation down, and 1 deviation up. Another way to transform it is to define a new random variable, Z, as X times a constant k:
Z = kX
Filled out, it might look like Z = 10X or Z = 5X. Either way, it scales the distribution: for k greater than 1, the curve becomes k times wider and k times shorter, so the total area stays the same. The mean is multiplied by k:
μz = μx * k
For example, if you had a mean of 6 and scaled by 2, your new mean is 12. Dividing μz by k gives back the mean of X. Unlike shifting, scaling also changes the standard deviation:
σz = σx * |k|

Transforming a (discrete random) variable example.

Let's take a variable X and have it represent the number of successes a player gets in a carnival game. The game works by shooting 2 free throws (basketball). We have a chart displaying the probability distribution of X, the number of successes in 2 attempts, and some summary statistics:
if X = 0, then P(X) is 0.16
if X = 1, then P(X) is 0.48
if X = 2, then P(X) is 0.36
μx = 1.2
σx (about) = 0.69
The game costs $15 to play and gives back $10 for every success. We want to find the mean and standard deviation of the player's net gain, represented by N. The player gets $10 back per success, so the winnings are 10X. But he also pays $15 to play, so:
N = 10X - 15
Each value of X gives a value of N with the same probability:
N = 10(0) - 15 = -15, with P(N) of 0.16
N = 10(1) - 15 = -5, with P(N) of 0.48
N = 10(2) - 15 = 5, with P(N) of 0.36
The mean is scaled and shifted the same way as the values:
μn = 10(1.2) - 15
μn = 12 - 15
μn = -3
The standard deviation is only scaled, because shifting does not change spread:
σn = 10(σx) = 10(0.69) = 6.9
So μn = -3 and σn = 6.9.
IMPORTANT: the added constant (here -15) does not go into the standard deviation equation. Only the multiplier does.
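
As a check on the carnival example, the mean and standard deviation of N can also be computed directly from the transformed distribution. This is a sketch; the helper names `mean` and `std_dev` are illustrative.

```python
from math import sqrt

# Distribution of X (successes in 2 free throws), taken from the card.
dist_x = {0: 0.16, 1: 0.48, 2: 0.36}


def mean(dist: dict) -> float:
    """Expected value: each value times its probability, summed."""
    return sum(value * prob for value, prob in dist.items())


def std_dev(dist: dict) -> float:
    """Square root of the probability-weighted squared deviations."""
    mu = mean(dist)
    return sqrt(sum((value - mu) ** 2 * prob for value, prob in dist.items()))


# Net gain N = 10X - 15 keeps each probability and transforms each value.
dist_n = {10 * x - 15: prob for x, prob in dist_x.items()}

mu_n = mean(dist_n)        # matches 10 * mean(dist_x) - 15
sigma_n = std_dev(dist_n)  # matches 10 * std_dev(dist_x); the -15 drops out
```

The unrounded standard deviation comes out slightly above 6.9 because the card rounds σx to 0.69 before multiplying.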

SRS

Simple Random Sample

Binomial Probability

P(Success) = X
P(Miss) = Y
P(exactly 2 successes in 6 attempts) = (6 choose 2) * X^2 * Y^4
P(exactly 2 successes in 6 attempts) = 15 * X^2 * Y^4
The first part is X^2 times Y^4, because X * X * Y * Y * Y * Y is just X^2 * Y^4. A different order, such as Y X Y X Y Y, gives the same product.
The second part counts all the possible orders (combinations), written "6C2" or "(6 choose 2)". Exclamation marks are factorials here, not punctuation.
6C2 = (6 choose 2) = 6!/(2!(6 - 2)!)
= (6 * 5 * 4 * 3 * 2 * 1)/((2 * 1) * (4 * 3 * 2 * 1))
The "4 * 3 * 2 * 1" part cancels out, leaving (6 * 5)/(2 * 1).
Cancel the 2 by splitting 6 into 2 * 3, leaving (3 * 5)/1 = 15.
So there are 15 possible combinations.
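
The factorial formula for "N choose K" can be sketched in Python and compared with the built-in `math.comb`. The helper name `choose` is an illustrative assumption.

```python
from math import comb, factorial


def choose(n: int, k: int) -> int:
    """Number of ways to pick k items from n: n! / (k! * (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


six_choose_two = choose(6, 2)

# The coefficients for 6 attempts are symmetrical around the middle.
coefficients = [choose(6, k) for k in range(7)]
```

In practice `math.comb(6, 2)` gives the same count without writing out the factorials.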

Cumulative Geometric Probability (fewer than a value)

P(win) = 0.1
P(lose) = 0.9
V = # of attempts until a win.
P(V < 5) = P(V = 1) + P(V = 2) + P(V = 3) + P(V = 4)
P(V < 5) = (0.1) + (0.1 * 0.9) + (0.1 * 0.9^2) + (0.1 * 0.9^3)
P(V < 5) = 0.1 + 0.09 + 0.081 + 0.0729
P(V < 5) = 0.3439 or 34.39%
OR use the complement: V < 5 means the first 4 attempts did not all fail.
P(V < 5) = 1 - P(first 4 failed) = 1 - (0.9^4) = 0.3439 or 34.39%
NOTE: P(V ≤ 2) = P(V = 1) + P(V = 2), or 1 - P(first 2 failed). P(V < 2) is just P(V = 1).
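
Both routes to P(V < 5) can be compared in a short Python sketch; the variable names are illustrative.

```python
p_win = 0.1
p_lose = 1 - p_win

# Route 1: add P(V = 1) through P(V = 4) term by term.
term_by_term = sum(p_win * p_lose ** (k - 1) for k in range(1, 5))

# Route 2: complement of "the first 4 attempts all failed".
by_complement = 1 - p_lose**4
```

The complement route stays one line no matter how large the cutoff is, which is why it is usually preferred.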

Cumulative Geometric Probability (greater than a value)

P(win) = 0.12
P(lose) = 0.88
V = # of attempts until a win.
P(V > 4) is technically P(V = 5) + P(V = 6) + ... continuing forever, but there is a shortcut: V > 4 happens exactly when the first 4 attempts all fail.
P(V > 4) = P(first 4 failed)
P(V > 4) = 0.88^4
P(V > 4) = 0.5997 or 59.97%
NOTE: P(V ≥ 4) = P(first 3 failed) = 0.88^3
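
To see that the shortcut agrees with the infinite sum, a Python sketch can add up a long but finite stretch of the tail. The cutoff of 500 trials is an arbitrary assumption; the terms beyond it are negligibly small.

```python
p_win = 0.12
p_lose = 1 - p_win

# Shortcut: V > 4 exactly when the first 4 attempts all fail.
p_tail = p_lose**4

# Long partial sum of P(V = 5) + P(V = 6) + ... up to trial 499.
partial_sum = sum(p_win * p_lose ** (k - 1) for k in range(5, 500))
```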

Geometric Random Variables

Random variables that count the number of repetitions of a chance process it takes for the outcome of interest to occur.
Checklist:
- Each trial outcome is win/fail
- Trial results are independent
- Same probability of a win in each trial
- Unlimited number of trials
Also shown as "Y = # of trials until condition X has been achieved."

Difference in distributions

Suppose that:
- Men have a mean height of 178 and a deviation of 8
- Women have a mean height of 170 and a deviation of 6
- Both heights are normally distributed
- The chosen man and woman are independent
What is the probability that the woman is taller than the man? Let's put M for the man and W for the woman, and use D = M - W. From that, we can get the mean and deviation of D as well:
μD = μm - μw = 8
σD = Sq root of[ (σm)^2 + (σw)^2 ] = Sq root of[ 64 + 36 ] = 10
With the distribution of D established, we can go back to the original question and rewrite it:
P(woman is taller than the man) = P(D < 0)
Now we can put the 0 mark on D's distribution (a little closer to the mean than the first deviation down) and solve for it using the z-score:
Z = (n - μ)/σ
Z = (0 - 8)/10
Z = -0.8
The table value for z = -0.8 is 0.21186, so there is about a 21.19% chance that the woman is taller than the man.
NOTES:
- The table gives the area to the left of a z-score. Subtract from 1 when you need the area to the right instead.
- If only one set of info is provided, use it for both variables. (EX: μ1 = 65 and μ2 = 65)
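
The height comparison can be sketched with `statistics.NormalDist` in place of the table lookup. Variable names are illustrative, and the sketch relies on the card's assumptions of normal, independent heights.

```python
from math import sqrt
from statistics import NormalDist

mu_m, sigma_m = 178.0, 8.0  # men
mu_w, sigma_w = 170.0, 6.0  # women

# D = M - W: means subtract, but variances still add.
mu_d = mu_m - mu_w
sigma_d = sqrt(sigma_m**2 + sigma_w**2)

# The woman is taller exactly when D < 0, which is the area left of 0.
z = (0 - mu_d) / sigma_d
p_woman_taller = NormalDist(mu_d, sigma_d).cdf(0.0)
```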

Quick Formulas (adding)

T = X + Y
E(T) = E(X) + E(Y)
Var(T) = Var(X) + Var(Y) (X and Y independent)
σ(T) = Sq root of[ (σx)^2 + (σy)^2 ]

How to check a distribution graph for validity

The probabilities are on the vertical axis, so read off the height of each bar and add them up. For a graph with heights 0.1, 0.2, 0.1, 0.4, and 0.1:
0.1 + 0.2 + 0.1 + 0.4 + 0.1 = 0.9
The total is not 1, so the graph does not show a valid distribution.

Interpreting Expected Value/Mean

The mean/expected value of a random variable is the long-run average outcome of a random phenomenon carried out a very large number of times.
Example: A lottery ticket costs $2, and the back says "the overall odds of winning with this ticket are 1 : 50, and the expected return is $0.95". We can infer the following from this:
- The chance of winning is 1/51 (odds of 1 : 50 mean 1 win for every 50 losses).
- The expected return is $0.95 per ticket.
That means $0.95 is not the prize for a win (the prize is unknown), and there is a low chance of winning. The best answer here is one that restates or reinforces our info, such as: over many tickets, the average return would be about $0.95 per ticket.
Tips:
- Net gain is different from expected return.
- Expected return describes the average over a large number of repetitions, not a single outcome.
- Expected return cannot be used as a probability.

Valid Discrete Probability Distribution Model

We'll use this example to create a probability model: An alien abducts 97 chickens, 47 cows, and 77 humans. He randomly selects one from that sample to experiment on. Each individual has an equal probability of getting selected, so what is the probability of selecting each type?
We have 221 animals total, so to get the probability of each type we divide its count by 221:
97 chickens/221 animals = 0.44
47 cows/221 animals = 0.21
77 humans/221 animals = 0.35
Now we double-check that the probabilities add to 1 (100%):
0.44 + 0.21 + 0.35 = 1
They do, so this is a valid model.
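
The model and its validity check can be sketched in Python. The function name `is_valid_model` and the small tolerance are illustrative assumptions; the tolerance is there because decimal probabilities rarely add to exactly 1 in floating point.

```python
counts = {"chickens": 97, "cows": 47, "humans": 77}
total = sum(counts.values())

# Probability of each type = its count divided by the total.
model = {name: count / total for name, count in counts.items()}


def is_valid_model(probabilities: dict, tolerance: float = 1e-9) -> bool:
    """Valid if every probability is in [0, 1] and they add to 1."""
    values = list(probabilities.values())
    in_range = all(0 <= p <= 1 for p in values)
    return in_range and abs(sum(values) - 1) < tolerance


abduction_model_ok = is_valid_model(model)

# A set of bar heights that adds to 0.9 fails the check.
bad_graph_ok = is_valid_model({"a": 0.1, "b": 0.2, "c": 0.1, "d": 0.4, "e": 0.1})
```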

variance of a discrete random variable

Weighted average of the squared deviations of the values of the variable from their mean.
Example: X is the number of workouts in a week. We have a chart that says:
When X = 0 or X = 4, P(X) = 0.1
When X = 1, P(X) = 0.15
When X = 2, P(X) = 0.4
When X = 3, P(X) = 0.25
and E(X) = 2.1.
Now we want to find Var(X). Var(X) spans all our data points, so it looks like this:
Var(X) = (0 - 2.1)^2 * 0.1 + (1 - 2.1)^2 * 0.15 + (2 - 2.1)^2 * 0.4 + (3 - 2.1)^2 * 0.25 + (4 - 2.1)^2 * 0.1
Var(X) = 0.441 + 0.1815 + 0.004 + 0.2025 + 0.361
Var(X) = 1.19
In general:
Var(variable) = (1st value - E(variable))^2 * P(1st value) + (2nd value - E(variable))^2 * P(2nd value) + ... and so on for every value.
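
The workout example can be reproduced with a short Python sketch; the variable names are illustrative.

```python
# Distribution of X (workouts per week), taken from the card.
workouts = {0: 0.1, 1: 0.15, 2: 0.4, 3: 0.25, 4: 0.1}

# Expected value: each value times its probability, summed.
mean_x = sum(x * p for x, p in workouts.items())

# Variance: squared distance from the mean, weighted by probability.
var_x = sum((x - mean_x) ** 2 * p for x, p in workouts.items())
```

Taking the square root of `var_x` gives the standard deviation for the same data.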

Find the Mean (μ) and Standard Deviation (σ) of a binomial random variable

X = # of defects in 500. P(defect) for each trial is 0.02 (or 2%)
μx = 500 * 0.02 = 10
σx = Sq root of[ Var(X) ]
σx = Sq root of[ 500 * 0.02(1 - 0.02) ]
σx = Sq root of[ 500 * 0.02(0.98) ]
σx = Sq root of[ 500 * 0.0196 ]
σx = Sq root of[ 9.8 ]
σx is about 3.13

Binomial Variable (example)

X = # of heads after flipping a coin 10 times
- X is a binomial variable, because each flip is independent, there's a fixed number of trials (10), each trial has only two outcomes (heads/tails), and each trial has the same probability of success.
Y = # of kings after drawing 2 cards from a deck without replacement
- Y is not a binomial variable, because the trials are not independent, and therefore the probability changes between draws as well.

Variance of a Binomial Random Variable

X = # of successes after N trials, where P(success) for each trial is P
Y = result of a single trial: P(Y = 1) = P, P(Y = 0) = 1 - P
Var(X) = N * P(1 - P)
Or: Var(X) = N * Var(Y), where
Var(Y) = P(1 - P)^2 + (1 - P)(0 - P)^2 = P(1 - P)
Example: X = # of wins in 5 games, where P(win) = 50%
Var(X) = 5 * 0.5(1 - 0.5)
Var(X) = 5 * 0.5(0.5)
Var(X) = 5 * 0.25
Var(X) = 1.25

Expected value of a Binomial Random Variable

X = # of successes after N trials, where P(success) for each trial is P
Y = result of a single trial: P(Y = 1) = P, P(Y = 0) = 1 - P
E(X) = NP
Or: E(X) = N * E(Y), where
E(Y) = 1 * P + 0 * (1 - P) = P
Example: X = # of wins in 5 games, where P(win) = 50%
E(X) = 5 * 0.5 = 2.5
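
The shortcut formulas E(X) = NP and Var(X) = NP(1 - P) can be checked against the full binomial distribution. The sketch below does that for the 5-game example; the variable names are illustrative.

```python
from math import comb

n, p = 5, 0.5

# Full distribution: P(X = k) for k = 0..n.
pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

# Long way: weighted average and weighted squared deviations.
mean_long = sum(k * prob for k, prob in pmf.items())
var_long = sum((k - mean_long) ** 2 * prob for k, prob in pmf.items())

# Shortcuts from the cards.
mean_short = n * p
var_short = n * p * (1 - p)
```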

Standard deviation of a binomial random variable

X = # of successes after N trials, where P(success) for each trial is P
σx = Sq root of[ Var(X) ] = Sq root of[ N * P(1 - P) ]

Graphing Binomial Probability Distribution

X = # of successes in 6 attempts
A = chance of success on 1 attempt
B = chance of failure on 1 attempt
P(X = 0) = (6 choose 0) * A^0 * B^6
P(X = 1) = (6 choose 1) * A^1 * B^5
P(X = 2) = (6 choose 2) * A^2 * B^4
P(X = 3) = (6 choose 3) * A^3 * B^3
P(X = 4) = (6 choose 4) * A^4 * B^2
P(X = 5) = (6 choose 5) * A^5 * B^1
P(X = 6) = (6 choose 6) * A^6 * B^0
Graphing:
- Take the highest resulting percent, round it up to the nearest ten, and put it at the top of the Y-axis. Fill in the rest of the axis with percents as well. (If 40 is your top, put 20 in the middle and fill in from there.)
- Put the values of X on the X-axis. (In this case, 0 as the first tick, 1 as the next, and so on until you reach 6.)
- Mark your percents on the graph as you would normally. (We used a bar graph/histogram here.)

Nested Fractions

Y / (1/X) means Y divided by 1/X, which is the same as Y * (X/1) = Y * X.

Mean of a Discrete Random Variable

Multiply each possible value by its probability, then add all the products.
Example: X is the number of workouts in a week. We have a chart that says:
When X = 0 or X = 4, P(X) = 0.1
When X = 1, P(X) = 0.15
When X = 2, P(X) = 0.4
When X = 3, P(X) = 0.25
From that, we can tell that X is a discrete random variable. We're looking for its expected value (mean), or E(X). Take each X value and multiply it by its probability:
0 * 0.1 = 0
1 * 0.15 = 0.15
2 * 0.4 = 0.8
3 * 0.25 = 0.75
4 * 0.1 = 0.4
Add the results together:
0 + 0.15 + 0.8 + 0.75 + 0.4 = 2.1
And we have the answer: E(X) = 2.1
Tips:
- If there are other numbers in the chart, ignore them and focus only on X and P(X).
- X's values can be any numbers.

σ

sigma (lowercase), standard deviation

Standard Deviation of a discrete random variable

The square root of the variance.
Example: X is the number of workouts in a week. We have a chart that says:
When X = 0 or X = 4, P(X) = 0.1
When X = 1, P(X) = 0.15
When X = 2, P(X) = 0.4
When X = 3, P(X) = 0.25
We also know E(X) = 2.1 and Var(X) = 1.19. Now all we need is σx, the standard deviation. σx is just the square root of 1.19, so roughly 1.09.

