CSCI 154 Midterm 1

Trapezoid Rule

A definite integral can be approximated by a trapezoid: integral from a to b ≈ (b - a)(f(a) + f(b)) / 2. This is a rough approximation, but accuracy improves as the number of subintervals N increases. In the limit (an infinite number of trapezoids) the approximation becomes exact.
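A minimal sketch of the composite version of this rule in Python (the course's simulation language); f(x) = x² on [0, 1] is a made-up test integrand with exact value 1/3:

```python
# Composite trapezoid rule: approximate a definite integral with n trapezoids.
def trapezoid(f, a, b, n):
    h = (b - a) / n
    total = (f(a) + f(b)) / 2          # endpoint terms weighted by 1/2
    for i in range(1, n):
        total += f(a + i * h)          # interior points weighted by 1
    return total * h

approx = trapezoid(lambda x: x**2, 0.0, 1.0, 1000)
# With n = 1000 the result is within about 1e-6 of the exact value 1/3,
# and the error keeps shrinking as n grows.
```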

Random variables

A function that assigns a real number to each outcome in the sample space of a random experiment; a variable whose value depends on the outcome of a random phenomenon (experiment). Must be numerical. Usually denoted by capital letters. Ex: Define X by assigning the value 1 or 0 depending on whether a coin flip comes up heads or tails, respectively.

Probability Space

A mathematical construct that provides a formal model of a random experiment. It is a triple (𝛀, F, P).

Convolution

A mathematical operation on two functions (f and g) that produces a third function expressing how the shape of one is modified by the other. Written as f * g and defined as (f * g)(t) = ∫ f(𝛕)g(t - 𝛕) d𝛕, integrated from negative infinity to infinity.

Mean of continuous random variable

A measure of the center of the distribution: a weighted average of the possible values of the random variable, with the pdf providing the weights. 𝜇_X = E(X) = ∫ x * f_X(x) dx, integrated from negative infinity to infinity. It is the point where a pivot would be placed so that the PDF balances.

Random variate

A particular outcome of a random variable; other random variates generated from the same random variable might have different values.

Statistic

A quantitative characteristic of a sample that often helps estimate or test the population parameter (such as the sample mean or sample variance)

Parameter

A quantitative characteristic of the population that you are interested in estimating or testing (such as the population mean or variance)

Process

A series of events that achieve a particular result, or a series of changes that happen naturally. A process always exists in the context of a system.

System

A set of things working together as parts of a mechanism or an interconnecting network

Pseudorandom Number Generator

Algorithms that produce pseudorandom variates. More formally: an algorithm that generates a sequence of numbers that approximates the properties of a sequence of random numbers. The numbers generated are not truly random and are determined by an initial value called the seed. Good statistical properties are a central requirement for the output.
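A quick sketch of the seed behavior using Python's built-in `random` module (a Mersenne Twister PRNG); the seed value 154 is arbitrary:

```python
import random

# Re-seeding with the same value reproduces the exact same "random"
# sequence, showing the output is deterministic, not truly random.
random.seed(154)
first_run = [random.random() for _ in range(5)]

random.seed(154)
second_run = [random.random() for _ in range(5)]
# first_run == second_run
```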

Error Function

Also called the Gauss error function. No closed-form solution exists; it cannot be evaluated in a finite number of operations (informal definition). erf(x) = (1/√𝜋) ∫ e^(-t^2) dt from -x to x = (2/√𝜋) ∫ e^(-t^2) dt from 0 to x. We can approximate a solution with numerical methods. Gives the probability that a random variable (normally distributed with mean 0 and variance 1/2) falls in the range -x to x.
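A sketch of "approximate a solution with numerical methods": a crude midpoint-rule integration of the defining integral, checked against `math.erf` (the standard library's own numerical approximation). The step count `n=10_000` is an arbitrary choice:

```python
import math

# Midpoint-rule approximation of erf(x) = (2/sqrt(pi)) * integral of
# e^(-t^2) from 0 to x.  A sketch, not production code.
def erf_numeric(x, n=10_000):
    h = x / n
    total = sum(math.exp(-(((i + 0.5) * h) ** 2)) for i in range(n))
    return 2.0 / math.sqrt(math.pi) * total * h

# erf_numeric(1.0) agrees with math.erf(1.0) to several decimal places.
```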

Normal Distribution

Also known as the Gaussian distribution. Characterized by a bell-shaped probability density function. A large number of random variables observed in nature possess a frequency distribution that is approximately mound-shaped and can be modeled by this pdf. Characteristics: - f(x) = (1 / √(2𝛑𝛔^2)) * e^(-(x-𝜇)^2 / (2𝜎^2)) - F(x) = 1/2[1 + erf((x-𝜇) / (𝜎√2))]

Experiment

Also known as a trial. A procedure that can be infinitely repeated and has a well-defined set of possible outcomes. There are two types: random and deterministic.

Simulation

An approximate imitation of the operation of a system or a process

Probability Mass Function (PMF)

Assigns the probability that a discrete random variable is exactly equal to some value. Typically depicted as a table, plot, or equation. The notation p(x) is typically used for the PMF of a discrete random variable X. Basic properties: - 0 ≤ p(x) ≤ 1 (the probability of any outcome is between 0 and 1) - ∑ p(x_i) = 1 (the probabilities of all events sum to 1) - p(x) = P(X = x) (the probability of the random variable taking this value) Example: The random variable W is assigned 0 if a mouse is underweight, 1 if it is normal, 2 if it is overweight, and 3 if it is obese. W: 0, 1, 2, 3; p(W): 0.02, 0.27, 0.33, 0.38
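The mouse-weight example can be sketched as a Python dict, with the basic properties checked directly:

```python
# PMF of the mouse-weight example: value -> probability.
pmf = {0: 0.02, 1: 0.27, 2: 0.33, 3: 0.38}

assert all(0 <= p <= 1 for p in pmf.values())   # each probability in [0, 1]
assert abs(sum(pmf.values()) - 1) < 1e-9        # all probabilities sum to 1

p_at_least_overweight = pmf[2] + pmf[3]         # P(W >= 2) = 0.71
```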

P(E2 | E1) = P(E2 ∩ E1) / P(E1) = 0 / 0.8 = 0

Assume that you shoot only one bullet. Identify the conditional probability P(E2 | E1) given that P(E1) = 0.8. With only one bullet, if we hit one target we know for certain that we did not hit the other one, so the intersection is 0.

0.4 / 0.8 = 0.5

Assume that you shoot only one bullet. Identify the conditional probability P(E2 | E1) given that P(E2 ∩ E1) = 0.4 and P(E1) = 0.8.

0.8*0.8 = 0.64

Assume that you shoot two bullets. Identify P(E1,t=1 ∩ E1,t=2), the probability of hitting target 1 with both bullets, given P(E1,t=1) = P(E1,t=2) = 0.8. The probability of hitting target 1 again does not change after you already hit it; your shooting does not get better. The events are independent, so the probabilities multiply.

P(E1) = target 1 area / wall area

Assuming a bot shoots bullets randomly (all over the wall), identify the probability of hitting the first target, E1, with a single bullet.

P(E1 ∪ E2) = (target 1 area + target 2 area - target intersection area) / wall area

Assuming a bot shoots bullets randomly (all over the wall), identify the probability of hitting either of the two targets, E1 ∪ E2, with a single bullet (we know P(E1 ∩ E2) = K).

Probability of event is between 0 and 1

Probability Corollary: 0 ≤ P(E) ≤ 1

Event Probability Axiom 3

E_i ∩ E_j = ∅ (∀ i, j: i ≠ j) -> P(∪ E_i) = ∑ P(E_i). If the intersection of any two events is empty (the events are disjoint), then the probability of the union is the sum of the probabilities of the events. Ex: What is the probability of rolling a 2 or a 5? Since the events are disjoint you can just add the probabilities.

Central Limit Theorem

Establishes that, in many cases, when independent random variables are added, their properly normalized sum tends toward a normal distribution even if the original variables themselves are not normally distributed. Adding random variables that don't follow a normal distribution -> the sum follows a normal distribution. The PDF of the sum of two independent real-valued random variables equals the convolution of the PDFs of the original variables. Ex: Rolling 1 die has uniform probability (all outcomes the same); as we roll more dice, the distribution of the sum becomes more normal.
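The dice example can be sketched with a quick simulation; the counts of dice and trials are arbitrary choices:

```python
import random

# A single roll is uniform on 1..6, but the sum of 10 rolls mounds up
# around its mean 10 * 3.5 = 35, illustrating the CLT.
random.seed(0)
n_dice, n_trials = 10, 20_000
sums = [sum(random.randint(1, 6) for _ in range(n_dice))
        for _ in range(n_trials)]

sample_mean = sum(sums) / n_trials   # close to 35
# Sums near 35 occur far more often than extreme sums like 10 or 60.
```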

Mean

The expected value 𝜇 of a discrete random variable X is the sum of the possible values of X, each multiplied by its probability. It is a measure of central tendency. 𝜇 = E(X) = ∑ x * p(x)

Axioms

Formalize probability in terms of a probability space which is a construct that models a random experiment

Independent

Identify whether the following events are dependent or independent: Getting a 1 and getting a 5 on different dice rolls. Getting an even number and getting a 2 on different dice rolls. Picking a 2 of hearts and subsequently a 4 of hearts from the deck with card replacement.

Dependent

Identify whether the following events are dependent or independent: Getting an even number and a 2 on the same dice roll. Getting a 2 and a 5 on the same dice roll. Picking a 2 of hearts and subsequently a 4 of hearts from the deck without card replacement.

False, the sample space is {HT, TT, TH, HH}

Identify whether the following statement is correct: John defines an experiment to be flipping two coins. John flips two coins and observes the outcome HT. The sample space is {HT, TT, HH}.

False, this is an outcome

Identify whether the following statement is correct: Maris rolls a 6-sided die and records the number of dots facing up. 4 dots are facing up. The 4 dots facing up are an experiment.

P(A | B1)P(B1) + P(A | B2)P(B2) + P(A | B3)P(B3)

If B1, B2, B3 partition 𝛀 then: P(A) = P(A ∩ B1) + P(A ∩ B2) + P(A ∩ B3) =

Laplace's Definition

If S is a finite sample space of equally likely outcomes, and E is an event, that is, a subset of S, then the probability of E is p(E) = |E| / |S|. Probability problems on finite sample spaces of equally likely outcomes can generally be tackled using counting techniques. Ex: What is the probability of rolling an even number when rolling a die? 3/6 = 0.5

Dependent Events

If the occurrence of one event affects the probability of occurrence of the other (they are not independent). P(E1 | E2) = P(E1 ∩ E2) / P(E2) (conditional probability; it takes the past into account). P(E1 ∩ E2) = P(E1) + P(E2) - P(E1 ∪ E2) (addition rule rearranged to calculate the intersection). P(E1 ∪ E2) = P(E1) + P(E2) - P(E1 ∩ E2) (addition rule to calculate the union).

Independent events

If the occurrence of one event does not affect the probability of occurrence of the other. P(E1 ∩ E2) = P(E1)P(E2) (the probability of the intersection of the two events equals the product of the two probabilities). P(E1 | E2) = P(E1) (the probability of Event 1 given Event 2 has occurred is just the probability of Event 1). P(E1 ∪ E2) = P(E1) + P(E2) - P(E1)P(E2) (addition rule).

Numerical Integration

In numerical analysis, comprises a broad family of algorithms for calculating the numerical value of a definite integral. For 1D integrals, the definite integral corresponds to the area under the curve. For high-dimensional integrals, Monte Carlo methods are more suitable.

Confidence Interval

A range of values for an unknown parameter, with an associated confidence level that gives the probability that the interval contains the true value of the parameter.

Standard Deviation

Is the square root of the variance 𝜎 = √(𝜎^2)

P(A intersection B)

Joint probability: both events happening at the same time; x is part of A AND x is part of B

0(0.2) + 100(0.7) + 150(0.1) = 0 + 70 + 15 = 85

Kelly earns money testing websites at $10 per site. X represents Kelly's weekly earnings. She estimates the probability of testing 0 sites in a week is 20%, 10 sites is 70%, and 15 sites is 10%. What is the mean of X?

sqrt(2025) = 45

Kelly earns money testing websites at $10 per site. X represents Kelly's weekly earnings. She estimates the probability of testing 0 sites in a week is 20%, 10 sites is 70%, and 15 sites is 10%. What is the standard deviation of X? 𝜇 = 85; 𝜎^2 = (0-85)^2 * 0.2 + (100-85)^2 * 0.7 + (150-85)^2 * 0.1 = 7225(0.2) + 225(0.7) + 4225(0.1) = 2025
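A small Python check of the mean and standard deviation computed above:

```python
import math

# Kelly's weekly earnings in dollars (0, 10, and 15 sites at $10/site)
# and the estimated probabilities of each.
earnings = [0, 100, 150]
probs    = [0.2, 0.7, 0.1]

mu = sum(x * p for x, p in zip(earnings, probs))                # 85
var = sum((x - mu) ** 2 * p for x, p in zip(earnings, probs))   # 2025
sigma = math.sqrt(var)                                          # 45
```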

Drawbacks of simulation experiments

Less accurate; less cost-efficient in some cases; slower in some cases; can be overkill

Variance of continuous random variable

A measure of the spread of the distribution, calculated as var(X) = 𝜎_X^2 = E[(X - 𝜇_X)^2] = ∫ (x - 𝜇_X)^2 f_X(x) dx. The standard deviation is, as in the discrete case, the square root of the variance. A smaller variance and standard deviation correspond to a distribution with values closer to the mean.

Explicit what-if experiments

The most straightforward type of simulation. The question is explicitly stated and considers hypothetical scenarios. A series of these can also be used to find the best solution to a particular problem (optimization). Ex: What would happen if the sun exploded? What would happen if a particular virus emerged? Simulate several antiviral drugs to find the most efficient one.

Chain Rule

The order of the intersection doesn't matter. Given two events E1 and E2: P(E1 ∩ E2) = P(E2 ∩ E1) = P(E2 | E1)P(E1) = P(E1 | E2)P(E2). This is the conditional probability formula rearranged to solve for the intersection.

Probability of the Union of Partition

Probability Corollary: P(∪𝛱) = 1. The union of the disjoint parts of a partition has probability 1.

Probability of the empty set

Probability Corollary: P(∅) = 0

Addition Rule

P(A ∪ B) = P(A) + P(B) - P(A ∩ B). The probability of A plus the probability of B minus the probability of the intersection of events A and B. Use when we want the probability that at least one of the events occurs.

Probability of an event that won't occur

P(A') = 1 - P(A)

Set of events

Part of the probability space triple, denoted F. Each event is a set containing zero or more outcomes. It is the power set of omega. Ex: Rolling a die, F = {∅, {1}, {2}, ..., {1,...,6}}

Function

Part of the probability space triple, denoted P. A function from events to probabilities: P(event) = probability. Ex: Rolling a die, P: F -> [0,1] (a real number between 0 and 1)

Sample space

Part of the probability space triple, denoted omega. The set of all possible outcomes. Ex: Rolling a die, 𝛀 = {1, 2, 3, 4, 5, 6}

Conditional Probability

The probability of something happening given that something else happened. The probability of E2 given E1 is defined by P(E2 | E1) = P(E1 ∩ E2) / P(E1). Captures the idea of event E2 given that event E1 already occurred. In some way, E1 becomes the new sample space 𝛀 when evaluating E1 ∩ E2.

Monte Carlo Integration

Random points are thrown and the ratio of points that fall in the corresponding area reveals the value of the integral: integral ≈ (# of points in s / total # of points) * c^2, where s is the area under the curve, c^2 is the area of the entire bounding box from a to b, and the points not in s fall in the area above the curve. Works best with uniform distributions that do not favor particular sub-areas. Best suited for multidimensional integrals.
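A hit-or-miss sketch of this idea in Python; f(x) = x² on [0, 1] with a unit bounding box is a made-up example (exact integral 1/3), and the point count is an arbitrary choice:

```python
import random

# Throw points uniformly into the box [a, b] x [0, height] and count
# the fraction that lands under the curve.
random.seed(1)
a, b, height, n = 0.0, 1.0, 1.0, 200_000

hits = 0
for _ in range(n):
    x = random.uniform(a, b)
    y = random.uniform(0.0, height)
    if y <= x ** 2:        # the point falls in the area s under the curve
        hits += 1

estimate = hits / n * (b - a) * height   # close to 1/3
```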

Discrete random variable

A random variable whose possible values come from a countable set. Ex: whole numbers, integers

Continuous random variable

A random variable whose possible values come from an uncountable set. Ex: real numbers, decimals. Every distinct x-value has zero width, so the probability of any single x-value is zero: P(X = x) = 0. We find probabilities for intervals rather than specific values, referred to as probabilities of intervals.

Benefits of simulation experiments

Safer; legal; more ethical; more cost-efficient; faster; easier to tune; easier to communicate; easier to focus attention (can reveal information that is hidden in the complexity of the real world)

The law of large numbers

States that the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer to the expected value as more trials are performed. Derives directly from the definition of the mean. Ex: While a casino may lose money on a single spin of the roulette wheel, its earnings will tend toward a predictable percentage over a large number of spins.
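A quick sketch with a fair die, whose expected value is 3.5; the trial count is an arbitrary choice:

```python
import random

# The average of many die rolls approaches the expected value 3.5
# as the number of trials grows.
random.seed(3)
rolls = [random.randint(1, 6) for _ in range(100_000)]

avg_all = sum(rolls) / len(rolls)   # close to 3.5
```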

Sample

Subset of a population

Objectives of science

1) Describe the world 2) Explain the world 3) Predict the world 4) Change the world All above rely on the scientific method

0.05

Bea's Herbs and Teas offers five teas at a tea tasting. X is the number of teas a customer purchases after the tasting. Based on information from previous tastings, the PMF of X is: X: 0, 1, 2, 3, 4, 5; p(x): 0.05, 0.1, 0.2, 0.15, 0.2, 0.3. What is F(0)?

0.2 + 0.3 = 0.5

Bea's Herbs and Teas offers five teas at a tea tasting. X is the number of teas a customer purchases after the tasting. Based on information from previous tastings, the PMF of X is: X: 0, 1, 2, 3, 4, 5; p(x): 0.05, 0.1, 0.2, 0.15, 0.2, 0.3. What is the probability that a customer purchases more than 3 teas?

0.05 + 0.1 + 0.2 + 0.15 = 0.5

Bea's Herbs and Teas offers five teas at a tea tasting. X is the number of teas a customer purchases after the tasting. Based on information from previous tastings, the PMF of X is: X: 0, 1, 2, 3, 4, 5; p(x): 0.05, 0.1, 0.2, 0.15, 0.2, 0.3. What is the probability that a customer purchases no more than 3 teas?

0.05 + 0.1 + 0.2 + 0.15 + 0.2 + 0.3 = 1

Bea's Herbs and Teas offers five teas at a tea tasting. X is the number of teas a customer purchases after the tasting. Based on information from previous tastings, the PMF of X is: X: 0, 1, 2, 3, 4, 5; p(x): 0.05, 0.1, 0.2, 0.15, 0.2, 0.3. What is the value of F(6.1)?
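The tea-tasting questions above can be answered from one CDF function built over the PMF, sketched as:

```python
# PMF of the number of teas purchased, as given in the questions above.
pmf = {0: 0.05, 1: 0.1, 2: 0.2, 3: 0.15, 4: 0.2, 5: 0.3}

def F(x):
    """CDF: probability that a customer purchases at most x teas."""
    return sum(p for k, p in pmf.items() if k <= x)

f_zero = F(0)              # 0.05
more_than_3 = 1 - F(3)     # 0.5
no_more_than_3 = F(3)      # 0.5
f_six_point_one = F(6.1)   # 1.0 (all probability mass lies below 6.1)
```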

Probability Density Function (PDF)

Describes the relative likelihood of all values of a continuous random variable. The notation f(x) is typically used for the pdf. Basic properties: - f(x) ≥ 0 for all x (relative likelihood is nonnegative) - ∫ f(x) dx = 1 (integrating over the whole domain, from negative infinity to positive infinity, gives 1) - P(X = x) = 0 (the probability of an exact value is 0) - P(a ≤ X ≤ b) = ∫ f(x) dx from a to b (integrate the density function to get the probability over the interval from a to b) No negative values. The graphical representation is the most descriptive. The area under the curve provides the probabilities. It is the derivative of the CDF: f_X(x) = d/dx F_X(x)

Uses of simulation

Can enhance every aspect of human activity. Can support decision making, training, entertainment, and even obedience. A perfect environment to train humans. Used to train agents in risk-free environments. Used to train collectives of agents. Used to train collectives of both humans and agents (human-agent collectives).

simulation programming

Can often be difficult: simulations are hard to code and debug. Many simulation languages and/or simulation coding paradigms have been proposed over time. The trend today is to use, develop, and/or refine general-purpose languages and simulation libraries instead of inventing specific new languages.

Cumulative Distributed Function (CDF) of discrete random variable

Captures the probability that, for any number x, the observed value of the random variable will be at most x. The notation F(x) is typically used for the CDF of a random variable X. F(x) = P(X ≤ x) = ∑ P(X = x_i) = ∑ p(x_i), the sum of all probabilities from the beginning up to the value we care about. Defined on the real number line. A non-decreasing function of x (it either increases or stays constant). Example: W: 0, 1, 2, 3; p(W): 0.02, 0.27, 0.33, 0.38. F(w) = 0 for w < 0; 0.02 for 0 ≤ w < 1; 0.29 for 1 ≤ w < 2; 0.62 for 2 ≤ w < 3; 1 for w ≥ 3
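The mouse-weight step function can be sketched by accumulating the PMF, checking the non-decreasing property along the way:

```python
# PMF of the mouse-weight example, value -> probability.
pmf_w = {0: 0.02, 1: 0.27, 2: 0.33, 3: 0.38}

def F(w):
    """CDF: sum of all probabilities for values at most w."""
    return sum(p for k, p in pmf_w.items() if k <= w)

steps = [F(w) for w in (-1, 0, 1, 2, 3)]   # 0, 0.02, 0.29, 0.62, 1.0
assert all(a <= b for a, b in zip(steps, steps[1:]))   # non-decreasing
```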

The set including the 6x6 = 36 tuples

Consider the random experiment of rolling two dice. Identify the sample space.

Uniform Distribution

Describes an experiment where there is an arbitrary outcome that lies between certain bounds. Has the following characteristics (a and b are the parameters): - f(x) = 1/(b-a) for a ≤ x ≤ b - f(x) = 0 for x < a or x > b - F(x) = 0 for x < a - F(x) = (x - a)/(b - a) for a ≤ x ≤ b - F(x) = 1 for x > b - E(X) = (a + b)/2 - V(X) = (b - a)^2/12 - The slope of the CDF is 1/(b-a)

Bayes' Theorem

Describes the probability of an event based on prior knowledge of conditions that might be related to the event. Defined as P(A | B) = P(B | A)P(A) / P(B) = P(B | A)P(A) / (P(B | A)P(A) + P(B | not A)P(not A)). Derived from the chain rule. For independent events, P(A | B) = P(A).
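A sketch with made-up numbers for a hypothetical diagnostic test: an assumed prior P(A) = 0.01, sensitivity P(B | A) = 0.99, and false-positive rate P(B | not A) = 0.05, where A is "has the condition" and B is "tests positive":

```python
# Hypothetical inputs (assumptions, not from the card above).
p_a = 0.01              # prior P(A)
p_b_given_a = 0.99      # P(B | A)
p_b_given_not_a = 0.05  # P(B | not A)

# Denominator via total probability, then Bayes' theorem.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b
# p_a_given_b works out to 1/6: even a positive test leaves P(A | B) low
# when the prior is small.
```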

Probability of a subset is at most the probability of the superset

E_1 ⊆ E_2 -> P(E_1) ≤ P(E_2)

Sample Variance

𝜎^2 = ∑ ((x_i - x̄)^2) / n is a biased estimator: it underestimates the population variance by a factor of (n-1)/n. Correct it using Bessel's correction when you want to estimate the population variance from a sample: s^2 = ∑ ((x_i - x̄)^2) / (n - 1)
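Python's `statistics` module implements both formulas, which makes the correction easy to see on a made-up sample:

```python
import statistics

# pvariance divides by n (biased when used to estimate a population
# variance from a sample); variance divides by n - 1 (Bessel's correction).
data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up sample, mean 5
n = len(data)

biased = statistics.pvariance(data)    # sum of squared deviations / n = 4
unbiased = statistics.variance(data)   # sum of squared deviations / (n-1)
# unbiased equals biased * n / (n - 1), i.e. 32/7.
```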

Probability Theory

The branch of mathematics concerned with probability Treats the concept in a rigorous mathematical manner by expressing it through a set of axioms

Population

The entire group of individuals you want to study

Probability

The measure of the likelihood that an event will occur; the higher the probability of an event, the more likely it is that the event will occur. Helps model uncertainty.

Event Probability Axiom 1

The probability of event E needs to satisfy Kolmogorov's axioms: P(E) ≥ 0 ∧ P(E) ∈ R. The probability of an event is a nonnegative real number.

Event Probability Axiom 2

The probability of event E needs to satisfy Kolmogorov's axioms: P(𝛀) = 1. The probability of the entire sample space is 1.

Cumulative Distributed Function (CDF) of continuous random variable

The probability that, for any number x, the observed value of the random variable will be at most x (the difference between < and ≤ is irrelevant here). The notation F(x) is typically used. Basic properties: - Always starts at 0 and ends at 1, and never decreases as the value of x increases - May only approach the limits of 0 and 1 if the possible values of x are infinite It is the integral of the PDF: F_X(x) = ∫ f_X(t) dt

Total Probability

The proposition that if {B_n : n = 1, 2, 3, ...} is a finite or countably infinite partition of a sample space (in other words, a set of pairwise disjoint events whose union is the entire sample space) and each event B_n is measurable, then for any event A of the same sample space, P(A) = ∑ P(A ∩ B_n). Alternatively, P(A) = ∑ P(A | B_n)P(B_n), where any term with P(B_n) = 0 is simply omitted from the summation, because P(A | B_n) is then not defined. The summation can be interpreted as a weighted average, and consequently the marginal probability P(A) is sometimes called the average probability.

(99/100)(6/10) + (95/100)(4/10) = (594 + 380) / 1000 = 974/1000

Total Probability Suppose that two factories supply light bulbs to the market. Factory X's bulbs work for over 5000 hours in 99% of cases, whereas Factory Y's bulbs work for over 5000 hours in 95% of cases. It is known that Factory X supplies 60% of the total bulbs available and Y supplies 40% of the total bulbs available. What are the chances that a purchased bulb will work for longer than 5000 hours? P(B_x) = 6/10 is the probability that the purchased bulb was manufactured by Factory X P(B_y) = 4/10 is the probability that the purchased bulb was manufactured by Factory Y P(A | B_x) = 99/100 is the probability that a bulb manufactured by X will work for over 5000 hours P(A | B_y) = 95/100 is the probability that a bulb manufactured by Y will work for over 5000 hours P(A) = P(A | B_x)P(B_x) + P(A | B_y)P(B_y)
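The bulb calculation, checked directly in Python:

```python
# Law of total probability applied to the two-factory bulb example.
p_bx, p_by = 6/10, 4/10                      # market shares of X and Y
p_a_given_bx, p_a_given_by = 99/100, 95/100  # P(lasts > 5000 h | factory)

p_a = p_a_given_bx * p_bx + p_a_given_by * p_by   # 0.974
```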

Mutually Exclusive Events

Two (highly) dependent events, E1 and E2, are mutually exclusive if the probability of their intersection is 0: P(E1 ∩ E2) = 0, meaning the events can't happen at the same time; it's either one or the other. P(E1 | E2) = P(E1 ∩ E2) / P(E2) = 0 (conditional probability). P(E1 ∪ E2) = P(E1) + P(E2) (addition rule to find the union).

Probability Tree Diagrams

Used to represent a probability space May represent a series of independent events or conditional probabilities Each node on the diagram represents an event and is associated with the probability of that event

0; no, they are highly dependent (mutually exclusive)

What is the probability of getting heads and tails on the same coin flip? Are those events independent?

0.5 * 0.5 = 0.25, yes

What is the probability of getting heads and then tails on two different coin flips, in this order? P(A ∩ B) = P(A)P(B). Are those events independent?

1/(50 * 49 * 48 * 47 * 46) = 1/254,251,200

What is the probability that the numbers 11, 4, 17, 39, and 23 are drawn in that order from a bin with 50 balls labeled with the numbers 1-50 if the ball selected is not returned to the bin? Sampling without replacement

1/(50^5)

What is the probability that the numbers 11, 4, 17, 39, and 23 are drawn in that order from a bin with 50 balls labeled with the numbers 1-50 if the ball selected is returned to the bin before the next ball is selected? Sampling with replacement
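Both sampling answers can be checked in Python; `math.perm(50, 5)` gives the falling factorial 50 · 49 · 48 · 47 · 46 used in the without-replacement case:

```python
import math

# Probability of drawing 5 specific numbers in a specific order from
# 50 labeled balls, with and without replacement.
without_replacement = 1 / (50 * 49 * 48 * 47 * 46)   # 1 / 254,251,200
with_replacement = 1 / 50**5                         # 1 / 312,500,000

# The falling factorial is exactly the number of ordered 5-draws.
assert math.perm(50, 5) == 254_251_200
```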

Random Experiment

When an experiment has more than one possible outcome

Deterministic Experiment

When an experiment has only one possible outcome

Python

Widely used for simulation programming. A multi-paradigm, general-purpose, high-level programming language with an active community, great support, and many relevant libraries.

Population Variance

𝜎^2 = ∑ ((x_i - 𝜇)^2) / n

Variance

The variance 𝜎^2 of a discrete random variable X is a measure of the spread of the distribution and is calculated as 𝜎^2 = V(X) = ∑ ((x_i - 𝜇)^2 * p(x_i)), the probability-weighted sum of squared deviations from the mean

(n + r - 1)! / (r!(n-1)!)

r-combinations Repetition is allowed Order does not matter Selecting r from n

n^r

r-permutations Repetition is allowed Order matters Selecting r from n

n! / (r!(n-r)!)

r-combinations Repetition is not allowed Order does not matter Selecting r from n

n! / (n-r)!

r-permutations Repetition is not allowed Order matters Selecting r from n

Sample Mean

x bar = ∑ xi / n Makes a good estimator of a population mean, as its expected value is equal to the population mean (unbiased estimator)

P(A U B)

x is part of A OR x is part of B; A happens or B happens

Population Mean

𝜇 = ∑ xi / n

Population Standard Deviation

𝜎 = sqrt(∑ ((x_i - 𝜇)^2) / n). When estimating from a sample, apply Bessel's correction: s = sqrt(∑ ((x_i - x̄)^2) / (n - 1))

Sample Standard Deviation

The square root of the sample variance: s = sqrt(∑ ((x_i - x̄)^2) / n)

