CSCI 154 Midterm 1
Trapezoid Rule
A definite integral can be approximated by a trapezoid: ∫ from a to b ≈ (b - a)(f(a) + f(b)) / 2. This is a rough approximation, but as the number of intervals N increases, so does the accuracy. In the limit (an infinite number of trapezoids) we would approximate the integral with absolute accuracy.
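The rule extends to N trapezoids by summing over equal-width intervals; a minimal Python sketch (function and variable names are illustrative):

```python
def trapezoid(f, a, b, n):
    """Composite trapezoid rule: approximate the integral of f over
    [a, b] using n equal-width trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))   # endpoints appear in only one trapezoid
    for i in range(1, n):
        total += f(a + i * h)     # interior points are shared by two trapezoids
    return total * h

# Integral of x^2 from 0 to 1 is exactly 1/3; the approximation
# approaches it as n grows.
approx = trapezoid(lambda x: x * x, 0.0, 1.0, 1000)
```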
Random variables
A function that assigns a real number to each outcome in the sample space of a random experiment; a variable whose value depends on the outcome of a random phenomenon (experiment). Must be numerical. Usually denoted with capital letters. Ex: Define X by assigning the value 1 or 0 depending on whether a coin flip comes up heads or tails, respectively.
Probability Space
A mathematical construct that provides a formal model of a random experiment. It is a triple (𝛀, F, P).
Convolution
A mathematical operation on two functions (f and g) that produces a third function expressing how the shape of one is modified by the other. Written as f * g and defined as (f * g)(t) = ∫ f(𝛕)g(t - 𝛕) d𝛕, integrated from negative infinity to infinity.
Mean of continuous random variable
A measure of the center of the distribution: a weighted average of the possible values of the random variable, with the PDF providing the weights. 𝜇_X = E(X) = ∫ x f_X(x) dx, integrated from negative infinity to infinity. It is the point where a pivot would be placed so that the PDF balances.
Random variate
A particular outcome of a random variable; other random variates drawn from the same random variable may have different values.
Statistic
A quantitative characteristic of a sample that often helps estimate or test the population parameter (such as the sample mean or sample variance)
Parameter
A quantitative characteristic of the population that you are interested in estimating or testing (such as the population mean or variance)
Process
A series of events that achieves a particular result, or a series of changes that happen naturally. Always exists in the context of a system.
System
A set of things working together as parts of a mechanism or an interconnecting network
Pseudorandom Number Generator
Algorithms that produce pseudorandom variates. More formally: an algorithm that generates a sequence of numbers that approximates the properties of a sequence of random numbers. The numbers generated are not truly random and are determined by an initial value called the seed. Good statistical properties are a central requirement for the output.
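Python's `random.Random` illustrates the seed determinism described above; a small sketch (names are illustrative):

```python
import random

def pseudo_sequence(seed, n):
    """Generate n pseudorandom numbers in [0, 1) from a given seed.
    The same seed always reproduces the same sequence."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

a = pseudo_sequence(42, 5)
b = pseudo_sequence(42, 5)  # identical to a: the generator is deterministic
c = pseudo_sequence(7, 5)   # a different seed gives a different sequence
```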
Error Function
Also called the Gauss error function. No closed-form solution exists; it cannot be evaluated in a finite number of operations (informal definition). erf(x) = (1/√𝜋) ∫ e^(-t^2) dt from -x to x = (2/√𝜋) ∫ e^(-t^2) dt from 0 to x. We can approximate a solution with numerical methods. Gives the probability of a random variable falling in the range -x to x.
Normal Distribution
Also known as the Gaussian distribution. Characterized by a bell-shaped probability density function. A large number of random variables observed in nature possess a frequency distribution that is approximately mound-shaped and can be modeled by this PDF. Characteristics: - f(x) = 1/√(2𝛑𝜎^2) * e^(-(x-𝜇)^2 / (2𝜎^2)) - F(x) = 1/2[1 + erf((x-𝜇) / (𝜎√2))]
Experiment
Also known as a trial. A procedure that can be infinitely repeated and has a well-defined set of possible outcomes. There are two types: random and deterministic.
Simulation
An approximate imitation of the operation of a system or a process
Probability Mass Function (PMF)
Assigns the probability that a discrete random variable is exactly equal to some value. Typically depicted as a table, plot, or equation. The notation p(x) is typically used for the PMF of a discrete random variable X. Basic Properties - 0 ≤ p(x) ≤ 1 (the probability of any outcome is between 0 and 1) - ∑p(x_i) = 1 (the probabilities of all events sum to 1) - p(x) = P(X = x) (the probability of the event that the random variable takes this value) Example: The random variable W is assigned 0 if a mouse is underweight, 1 if it is considered normal, 2 if it is overweight, and 3 if it is obese.
W    0    1    2    3
p(W) 0.02 0.27 0.33 0.38
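The mouse-weight example can be checked against the basic properties in a few lines of Python (a sketch, not part of the original card):

```python
# PMF from the mouse-weight example: W = 0 (underweight), 1 (normal),
# 2 (overweight), 3 (obese).
pmf = {0: 0.02, 1: 0.27, 2: 0.33, 3: 0.38}

# Property checks: each p(x) lies in [0, 1] and the probabilities sum to 1.
assert all(0 <= p <= 1 for p in pmf.values())
assert abs(sum(pmf.values()) - 1.0) < 1e-9

# p(x) = P(X = x): probability that a mouse is overweight.
p_overweight = pmf[2]
```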
P(E2 | E1) = P(E2 ∩ E1) / P(E1) = 0 / 0.8 = 0; the intersection is empty
Assume that you shoot only one bullet. Identify the conditional probability P(E2 | E1) given that P(E1) = 0.8. With only 1 bullet, if we hit one target we know for sure that we did not hit the other one. The intersection is 0.
0.4 / 0.8 = 0.5
Assume that you shoot only one bullet. Identify the conditional probability P(E2 | E1) given that P(E2 ∩ E1) = 0.4 and P(E1) = 0.8.
0.8*0.8 = 0.64
Assume that you shoot two bullets. Identify P(E1,t=2 | E1,t=1) given P(E1,t=1) = P(E1,t=2) = 0.8: the probability of hitting target 1 again after you already hit target 1. The probability stays the same; your shooting does not get better. Since the events are independent, P(E1,t=2 | E1,t=1) = 0.8, and the probability of hitting target 1 with both bullets is 0.8 * 0.8 = 0.64.
P(E1) = target 1 area / wall area
Assuming a bot shoots bullets randomly (all over the wall), identify the probability of hitting the first target, E1, with a single bullet.
P(E1 ∪ E2) = (target 1 area + target 2 area - target intersection area) / wall area
Assuming a bot shoots bullets randomly (all over the wall). Identify the probability of hitting any one of the two targets, E1 ∪ E2, with a single bullet (we know P(E1 ∩ E2) = K)
Probability of event is between 0 and 1
Probability Corollary 0 ≤ P(E) ≤ 1
Event Probability Axiom 3
E_i ∩ E_j = ∅ (∀ i, j: i ≠ j) -> P(∪ E_i) = ∑ P(E_i). If the intersection of any two events is empty (the events are disjoint), then the probability of the union is the sum of the probabilities of the events. Ex: What is the probability of rolling a 2 or a 5? Since the events are disjoint, you can just add the probabilities.
Central Limit Theorem
Establishes that, in many cases, when independent random variables are added, their properly normalized sum tends toward a normal distribution even if the original variables themselves are not normally distributed. Adding random variables that don't follow a normal distribution -> a sum following a normal distribution. The PDF of the sum of two independent real-valued random variables equals the convolution of the PDFs of the original variables. Ex: Rolling 1 die gives a uniform distribution (all outcomes equally likely); as we roll more dice, the distribution of the sum looks more and more normal.
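The dice example is easy to simulate; a hedged Python sketch (names and the choice of 10 dice are illustrative):

```python
import random

def dice_sum_samples(num_dice, trials, seed=0):
    """Sample the sum of num_dice fair dice, `trials` times."""
    rng = random.Random(seed)
    return [sum(rng.randint(1, 6) for _ in range(num_dice))
            for _ in range(trials)]

# One die is uniform over 1..6; the sum of 10 dice clusters around
# its mean 10 * 3.5 = 35 in an approximately bell-shaped way.
sums = dice_sum_samples(10, 10_000)
mean = sum(sums) / len(sums)
```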
Mean
The expected value 𝜇 of a discrete random variable X is the sum of the possible values of X, each multiplied by its probability. It is a measure of central tendency. 𝜇 = E(X) = ∑ x * p(x)
Axioms
Formalize probability in terms of a probability space which is a construct that models a random experiment
Independent
Identify whether the following events are dependent or independent: Getting a 1 and getting a 5 on different dice rolls; getting an even number and getting a 2 on different dice rolls; picking a 2 of hearts and subsequently a 4 of hearts from the deck with card replacement
Dependent
Identify whether the following events are dependent or independent: Getting an even number and a 2 on the same dice roll; getting a 2 and a 5 on the same dice roll; picking a 2 of hearts and subsequently a 4 of hearts from the deck without card replacement
False, the sample space is {HT, TT, TH, HH}
Identify whether the following statement is correct John defines an experiment to be flipping two coins. John flips two coins and observes the outcome HT. The sample space is {HT, TT, HH}
False, this is an outcome
Identify whether the following statement is correct Maris rolls a 6-sided die and records the number of dots facing up. 4 dots are facing up. The 4 dots facing up are an experiment.
P(A | B1)P(B1) + P(A | B2)P(B2) + P(A | B3)P(B3)
If B1, B2, B3 partition 𝛀 then: P(A) = P(A ∩ B1) + P(A ∩ B2) + P(A ∩ B3) =
Laplace's Definition
If S is a finite sample space of equally likely outcomes, and E is an event, that is, a subset of S, then the probability of E is p(E) = |E| / |S|. Probability problems on finite sample spaces of equally likely outcomes can generally be tackled using counting techniques. Ex: What is the probability of rolling an even number when rolling a die? 3/6 = 0.5
Dependent Events
If the occurrence of one event affects the probability of occurrence of the other (they are not independent). P(E1 | E2) = P(E1 ∩ E2) / P(E2), conditional probability cares about the past. P(E1 ∩ E2) = P(E1) + P(E2) - P(E1 ∪ E2), addition rule rearranged to calculate the intersection. P(E1 ∪ E2) = P(E1) + P(E2) - P(E1 ∩ E2), addition rule to calculate the union.
Independent events
If the occurrence of one event does not affect the probability of occurrence of the other. P(E1 ∩ E2) = P(E1)P(E2), the probability of the intersection of the two events equals the product of the two events' probabilities. P(E1 | E2) = P(E1), the probability of Event 1 given Event 2 has occurred is just the probability of Event 1. P(E1 ∪ E2) = P(E1) + P(E2) - P(E1)P(E2), addition rule.
Numerical Integration
In numerical analysis, comprises a broad family of algorithms for calculating the numerical value of a definite integral. For 1D integrals, the definite integral corresponds to estimating the area below the curve. For high-dimensional integrals, Monte Carlo methods are more suitable.
Confidence Interval
A range of values for an unknown parameter, with an associated confidence level that gives the probability with which the interval contains the parameter
Standard Deviation
Is the square root of the variance 𝜎 = √(𝜎^2)
P(A intersection B)
Joint probability: both events happening at the same time; x is a member of A AND x is a member of B
0(0.2) + 100(0.7) + 150(0.1) = 0 + 70 + 15 = 85
Kelly earns money testing websites at $10 per site. X represents Kelly's weekly earnings. She estimates the probability of testing 0 sites in a week is 20%, 10 sites is 70%, and 15 sites is 10%. What is the mean of X?
sqrt(2025) = 45
Kelly earns money testing websites at $10 per site. X represents Kelly's weekly earnings. She estimates the probability of testing 0 sites in a week is 20%, 10 sites is 70%, and 15 sites is 10%. What is the standard deviation of X? 𝜇 = 85 𝜎^2 = (0-85)^2 * 0.2 + (100 - 85)^2 * 0.7 + (150 - 85)^2 * 0.1 = 7225(0.2) + 225(0.7) + 4225(0.1) = 2025
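The arithmetic for both Kelly questions can be verified directly (a sketch; variable names are illustrative):

```python
# Kelly's weekly earnings in dollars (0, 10, or 15 sites at $10/site)
# with their estimated probabilities.
values = [0, 100, 150]
probs = [0.2, 0.7, 0.1]

mean = sum(v * p for v, p in zip(values, probs))                    # 85
variance = sum((v - mean) ** 2 * p for v, p in zip(values, probs))  # 2025
std = variance ** 0.5                                               # 45
```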
Drawbacks of simulation experiments
Less accurate Less cost-efficient Slower Can be overkill
Variance of continuous random variable
A measure of the spread of the distribution, calculated as var(X) = 𝜎_X^2 = E[(X - 𝜇_X)^2] = ∫ (x - 𝜇_X)^2 f_X(x) dx. The standard deviation is, as in the discrete case, the square root of the variance. A smaller variance and standard deviation correspond to a distribution with values closer to the mean.
Explicit what-if experiments
Most straightforward type of simulation The question is explicitly stated and considers hypothetical scenarios A series of these can also be used to find the best solution for a particular problem (optimization) Ex: What would happen if the sun explodes? What would happen if a particular virus emerges? Simulate several antiviral drugs to find the most efficient one.
Chain Rule
The order of intersection doesn't matter. Given two events E1 and E2: P(E1 ∩ E2) = P(E2 ∩ E1) = P(E2 | E1)P(E1) = P(E1 | E2)P(E2). This is the conditional probability formula rearranged to solve for the intersection.
Probability of the Union of Partition
Probability Corollary P(∪𝛱) = 1 The union of the disjoint parts of a partition has probability 1
Probability of the empty set
Probability Corollary P(∅) = 0
Addition Rule
P(A ∪ B) = P(A) + P(B) - P(A ∩ B). The probability of A plus the probability of B minus the probability of the intersection of events A and B. Used when we want the probability that at least one of the events occurs, not necessarily both.
Probability of an event that won't occur
P(A') = 1 - P(A)
Set of events
Part of the probability space triple, denoted F. Each event is a set containing zero or more outcomes. It is the power set of omega. Ex: Rolling a die, F = {∅, {1}, {2},...,{1,...,6}}
Function
Part of the probability space triple, denoted P. A function from events to probabilities: P(event) = probability. Ex: Rolling a die, P: F -> [0,1] (a real number between 0 and 1)
Sample space
Part of the probability space triple, denoted omega. The set of all possible outcomes. Ex: Rolling a die, 𝛀 = {1, 2, 3, 4, 5, 6}
Conditional Probability
The probability of something happening given that something else has happened. The probability of E2 given E1 is defined by P(E2 | E1) = P(E1 ∩ E2) / P(E1). Captures the idea of event E2 occurring given that event E1 has already occurred. In some sense E1 becomes the new sample space 𝛀 in which E1 ∩ E2 is measured.
Monte Carlo Integration
Random points are thrown, and the ratio of points that fall in the corresponding area reveals the value of the integral: (# in S) / (# in S + # above f(x)) * c^2, where S is the area under the curve, c^2 is the area of the entire sampling box over [a, b], and the remaining points fall above the curve. Works best with uniform distributions that do not favor particular sub-areas. Best suited for multidimensional integrals.
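A minimal hit-or-miss sketch of the ratio described above (names are illustrative; a uniform sampler is assumed):

```python
import random

def hit_or_miss(f, a, b, f_max, n, seed=0):
    """Throw n uniform random points into the box [a, b] x [0, f_max];
    the fraction landing under f(x), times the box area, estimates
    the integral of f over [a, b]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.uniform(a, b)
        y = rng.uniform(0.0, f_max)
        if y <= f(x):
            hits += 1
    return (hits / n) * (b - a) * f_max

# Integral of x^2 over [0, 1] is 1/3; f_max = 1 bounds the curve there.
estimate = hit_or_miss(lambda x: x * x, 0.0, 1.0, 1.0, 100_000)
```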
Discrete random variable
A random variable whose possible values come from a countable set. Ex: Whole numbers, integers
Continuous random variable
A random variable whose possible values come from an uncountable set. Ex: Real numbers, decimals. Every distinct x-value has zero width, so the probability of any single x-value is zero: P(X = x) = 0. We therefore find probabilities for intervals rather than specific values, referred to as probabilities of intervals.
Benefits of simulation experiments
Safer Legal More ethical More cost-efficient Faster Easier to tune Easier to communicate Easier to concentrate focus (can reveal information that is hidden in the complexity of the real world)
The law of large numbers
States that the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer to the expected value as more trials are performed. Derives directly from the definition of the mean. Ex: While a casino may lose money on a single spin of the roulette wheel, its earnings will tend toward a predictable percentage over a large number of spins.
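The casino intuition is easy to simulate with die rolls (a sketch; names are illustrative):

```python
import random

def running_mean_of_rolls(trials, seed=1):
    """Average of `trials` fair die rolls; by the law of large numbers
    this tends toward the expected value 3.5 as trials grows."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += rng.randint(1, 6)
    return total / trials

small = running_mean_of_rolls(10)       # may be far from 3.5
large = running_mean_of_rolls(100_000)  # very close to 3.5
```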
Sample
Subset of a population
Objectives of science
1) Describe the world 2) Explain the world 3) Predict the world 4) Change the world All above rely on the scientific method
0.05
Bea's Herbs and Teas offers five teas at a tea tasting. X is the number of teas a customer purchases after the tasting. Based on the information from previous tasting, the PMF of X is shown below. X 0 1 2 3 4 5 p(x) 0.05 0.1 0.2 0.15 0.2 0.3 What is F(0)?
0.2 + 0.3 = 0.5
Bea's Herbs and Teas offers five teas at a tea tasting. X is the number of teas a customer purchases after the tasting. Based on the information from previous tasting, the PMF of X is shown below. X 0 1 2 3 4 5 p(x) 0.05 0.1 0.2 0.15 0.2 0.3 What is the probability that a customer purchases more than 3 teas?
0.05 + 0.1 + 0.2 + 0.15 = 0.5
Bea's Herbs and Teas offers five teas at a tea tasting. X is the number of teas a customer purchases after the tasting. Based on the information from previous tasting, the PMF of X is shown below. X 0 1 2 3 4 5 p(x) 0.05 0.1 0.2 0.15 0.2 0.3 What is the probability that a customer purchases no more than 3 teas?
0.05 + 0.1 + 0.2 + 0.15 + 0.2 + 0.3 = 1
Bea's Herbs and Teas offers five teas at a tea tasting. X is the number of teas a customer purchases after the tasting. Based on the information from previous tasting, the PMF of X is shown below. X 0 1 2 3 4 5 p(x) 0.05 0.1 0.2 0.15 0.2 0.3 What is the value of F(6.1)?
Probability Density Function (PDF)
Describes the relative likelihood of all values for a continuous random variable. The notation f(x) is typically used for the PDF. Basic Properties: - f(x) ≥ 0 for all x, relative likelihood ≥ 0 - ∫f(x) dx = 1, the integral over the whole domain from negative infinity to positive infinity is 1 - P(X = x) = 0, the probability of one exact value is 0 - P(a ≤ X ≤ b) = ∫f(x) dx from a to b, integrate the density function to get the probability over the interval from a to b. No negative values. The graphical representation is the most descriptive. The area under the curve provides the probabilities. Derivative of the CDF: f_X(x) = d/dx F_X(x)
Uses of simulation
Can enhance every aspect of human activity. Can support decision making, training, entertainment, and even obedience. A perfect environment to train humans. Used to train agents in risk-free environments. Used to train collectives of agents, and collectives of both humans and agents (human-agent collectives).
simulation programming
Can often be difficult: hard to code and debug. Many simulation languages and/or simulation coding paradigms have been proposed over time. The trend today is to use, develop, and/or refine general-purpose languages and simulation libraries, instead of inventing specific new languages.
Cumulative Distributed Function (CDF) of discrete random variable
Captures the probability that, for any number x, the observed value of the random variable will be at most x. The notation F(x) is typically used for the CDF of a random variable X. F(x) = P(X ≤ x) = ∑ p(X = x_i) = ∑ p(x_i), the sum of all probabilities from the beginning up to the value we care about. Defined on the real number line. A non-decreasing function of x (it either increases or stays constant). Example:
W    0    1    2    3
p(W) 0.02 0.27 0.33 0.38
F(w) = 0 for w < 0; 0.02 for 0 ≤ w < 1; 0.29 for 1 ≤ w < 2; 0.62 for 2 ≤ w < 3; 1 for 3 ≤ w
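Building the discrete CDF from a PMF is just a cumulative sum; a sketch using the mouse-weight example:

```python
# PMF from the mouse-weight example.
pmf = {0: 0.02, 1: 0.27, 2: 0.33, 3: 0.38}

def cdf(x, pmf=pmf):
    """F(x) = P(X <= x): sum the probabilities of all values at most x.
    Non-decreasing, 0 below the smallest value, 1 at and above the largest."""
    return sum(p for value, p in pmf.items() if value <= x)

# cdf(-1) = 0, cdf(1) ≈ 0.29, cdf(2.5) ≈ 0.62, cdf(5) ≈ 1.0
```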
The set including the 6x6 = 36 tuples
Consider the random experiment of rolling two dice. Identify the sample space.
Uniform Distribution
Describes an experiment where there is an arbitrary outcome that lies between certain bounds. Has the following characteristics, where a and b are the parameters: - f(x) = 1/(b-a) for a ≤ x ≤ b - f(x) = 0 for x < a or x > b - F(x) = 0 for x < a - F(x) = (x-a)/(b-a) for a ≤ x ≤ b - F(x) = 1 for x > b - E(X) = (a+b)/2 - V(X) = (b-a)^2/12 - The slope of the CDF is 1/(b-a)
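The moment formulas can be sanity-checked by sampling (a sketch; names and the choice of Uniform(2, 8) are illustrative):

```python
import random

def uniform_sample_mean(a, b, trials, seed=0):
    """Sample mean of Uniform(a, b) draws, for comparison against the
    theoretical E(X) = (a + b) / 2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += rng.uniform(a, b)
    return total / trials

# Uniform(2, 8): E(X) = (2 + 8) / 2 = 5, V(X) = (8 - 2)^2 / 12 = 3.
sample_mean = uniform_sample_mean(2, 8, 100_000)
```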
Bayes' Theorem
Describes the probability of an event based on prior knowledge of conditions that might be related to the event. Defined as P(A | B) = P(B | A)P(A) / P(B), where the denominator expands via total probability to P(B | A)P(A) + P(B | ¬A)P(¬A). Derived from the chain rule. For independent events, P(A | B) = P(A).
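A direct transcription of the expanded form (a sketch; the disease-test numbers are a hypothetical illustration, not from the course):

```python
def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """P(A | B) = P(B | A)P(A) / [P(B | A)P(A) + P(B | ~A)P(~A)]."""
    p_not_a = 1 - p_a
    numerator = p_b_given_a * p_a
    return numerator / (numerator + p_b_given_not_a * p_not_a)

# Hypothetical example: 1% prevalence, 95% sensitivity, 10% false-positive
# rate; the posterior probability of A given a positive B is well under 10%.
posterior = bayes(0.95, 0.01, 0.10)
```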
Probability of a subset is less than the probability of the set
E_1 ⊆ E_2 -> P(E_1) ≤ P(E_2)
Sample Variance
s^2 = ∑ ((x_i - x̄)^2) / n. A biased estimator: it underestimates the population variance by a factor of (n-1)/n. Correct this using Bessel's correction when you want to estimate the population variance from a sample: s^2 = ∑ ((x_i - x̄)^2) / (n-1)
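The bias and its correction are visible on any small data set (a sketch; the data values are illustrative):

```python
def sample_variances(xs):
    """Return the biased (divide by n) and Bessel-corrected
    (divide by n - 1) sample variances of a list of observations."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)   # sum of squared deviations
    return ss / n, ss / (n - 1)

biased, corrected = sample_variances([2, 4, 4, 4, 5, 5, 7, 9])
# Here ss = 32 and n = 8, so biased = 4.0 and corrected = 32/7;
# biased = corrected * (n-1)/n, the underestimation factor.
```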
Probability Theory
The branch of mathematics concerned with probability Treats the concept in a rigorous mathematical manner by expressing it through a set of axioms
Population
The entire group of individuals you want to study
Probability
The measure of the likelihood that an event will occur; the higher the probability of an event, the more likely it is that the event will occur. Helps model uncertainty.
Event Probability Axiom 1
The probability of event E needs to satisfy Kolmogorov's axioms: P(E) ≥ 0 ∧ P(E) ∈ R. The probability of an event is a non-negative real number.
Event Probability Axiom 2
The probability of event E needs to satisfy Kolmogorov's axioms: P(𝛀) = 1. The probability of the entire sample space is 1.
Cumulative Distributed Function (CDF) of continuous random variable
The probability that, for any number x, the observed value of the random variable will be at most x (the difference between < and ≤ is irrelevant here). The notation F(x) is typically used. Basic Properties: - Always starts at 0 and ends at 1 and never decreases as the value of x increases - May only approach the limits of 0 and 1 if the possible values of x are infinite. Integral of the PDF: F_X(x) = ∫ f_X(t) dt from negative infinity to x
Total Probability
The proposition that if {B_n : n = 1, 2, 3,...} is a finite or countably infinite partition of a sample space (in other words, a set of pairwise disjoint events whose union is the entire sample space) and each event B_n is measurable, then for any event A of the same sample space, P(A) = ∑P(A ∩ B_n). Alternatively, P(A) = ∑P(A | B_n)P(B_n), where for any n for which P(B_n) = 0 these terms are simply omitted from the summation. The summation can be interpreted as a weighted average, and consequently the marginal probability P(A) is sometimes called the average probability.
(99/100)(6/10) + (95/100)(4/10) = (594 + 380) / 1000 = 974/1000
Total Probability Suppose that two factories supply light bulbs to the market. Factory X's bulbs work for over 5000 hours in 99% of cases, whereas Factory Y's bulbs work for over 5000 hours in 95% of cases. It is known that Factory X supplies 60% of the total bulbs available and Y supplies 40% of the total bulbs available. What are the chances that a purchased bulb will work for longer than 5000 hours? P(B_x) = 6/10 is the probability that the purchased bulb was manufactured by Factory X P(B_y) = 4/10 is the probability that the purchased bulb was manufactured by Factory Y P(A | B_x) = 99/100 is the probability that a bulb manufactured by X will work for over 5000 hours P(A | B_y) = 95/100 is the probability that a bulb manufactured by Y will work for over 5000 hours P(A) = P(A | B_x)P(B_x) + P(A | B_y)P(B_y)
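The bulb computation is a one-line weighted sum (a sketch; variable names are illustrative):

```python
# Market share of each factory and reliability given the factory.
p_b = {"X": 0.6, "Y": 0.4}            # P(bulb came from factory)
p_a_given_b = {"X": 0.99, "Y": 0.95}  # P(lasts > 5000 h | factory)

# Total probability: weight each factory's reliability by its share.
p_a = sum(p_a_given_b[f] * p_b[f] for f in p_b)  # 0.974
```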
Mutually Exclusive Events
Two (highly) dependent events, E1 and E2, are mutually exclusive if their intersection is empty: P(E1 ∩ E2) = 0, meaning the events can't happen at the same time; it's either one or the other. P(E1 | E2) = P(E1 ∩ E2) / P(E2) = 0, conditional probability. P(E1 ∪ E2) = P(E1) + P(E2), addition rule to find the union.
Probability Tree Diagrams
Used to represent a probability space May represent a series of independent events or conditional probabilities Each node on the diagram represents an event and is associated with the probability of that event
0, No they are highly dependent (mutually exclusive)
What is the probability of getting heads and tails at the same coin flip? Are those events independent?
0.5 * 0.5 = 0.25, yes
What is the probability of getting heads and tails at two different coin flips in this order? P(A ∩ B) = P(A)P(B) Are those events independent?
1/(50 * 49 * 48 * 47 * 46) = 1/254,251,200
What is the probability that the numbers 11, 4, 17, 39, and 23 are drawn in that order from a bin with 50 balls labeled with the numbers 1-50 if the ball selected is not returned to the bin? Sampling without replacement
1/(50^5)
What is the probability that the numbers 11, 4, 17, 39, and 23 are drawn in that order from a bin with 50 balls labeled with the numbers 1-50 if the ball selected is returned to the bin before the next ball is selected? Sampling with replacement
Random Experiment
When an experiment has more than one possible outcome
Deterministic Experiment
When an experiment has only one possible outcome
Python
Widely used for simulation programming. A multi-paradigm, general-purpose, high-level programming language with an active community, great support, and many relevant libraries.
Population Variance
𝜎^2 = ∑ ((x_i - 𝜇)^2) / n
Variance
𝜎^2 of a discrete random variable X is a measure of the spread of a distribution and is calculated as 𝜎^2 = V(X) = ∑ ((x_i - 𝜇)^2 * p(x_i)): the sum of the squared deviations from the mean, each weighted by its probability.
(n + r - 1)! / (r!(n-1)!)
r combinations Repetition is allowed Order does not matter Selecting r from n
n^r
r permutations Repetition is allowed Order matters Selecting r from n
n! / (r!(n-r)!)
r-combinations Repetition is not allowed Order does not matter Selecting r from n
n! / (n-r)!
r-permutations Repetition is not allowed Order matters Selecting r from n
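All four counting formulas above are available through Python's `math` module (a sketch; n = 10 and r = 3 are arbitrary):

```python
import math

n, r = 10, 3

perms = math.perm(n, r)              # n!/(n-r)!: no repetition, order matters -> 720
combs = math.comb(n, r)              # n!/(r!(n-r)!): no repetition, order doesn't -> 120
perms_rep = n ** r                   # n^r: repetition allowed, order matters -> 1000
combs_rep = math.comb(n + r - 1, r)  # (n+r-1)!/(r!(n-1)!): repetition allowed, order doesn't -> 220
```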
Sample Mean
x bar = ∑ xi / n Makes a good estimator of a population mean, as its expected value is equal to the population mean (unbiased estimator)
P(A U B)
x is a member of A OR x is a member of B; A happens or B happens
Population Mean
𝜇 = ∑ xi / n
Population Standard Deviation
𝜎 = sqrt(∑ ((x_i - 𝜇)^2) / n). When estimating it from a sample, apply Bessel's correction: s = sqrt(∑ ((x_i - x̄)^2) / (n-1))
Sample Standard Deviation
s = sqrt(∑ ((x_i - x̄)^2) / n), the square root of the sample variance
