Numerical Methods
Union operator
E U F is the set of all outcomes that are in E, in F, or in both
marginalization theorem (sum rule)
The probability of an event equals the sum of its joint probabilities over the smaller groups: P(a) = sum over b of P(a, b)
Probability Mass Function (PMF)
a mathematical relation that assigns probabilities to all possible outcomes of a discrete random variable
quadratic exponential pdf
a pdf whose exponent is a quadratic function of x; all such pdfs are Gaussian
lower Cholesky factorization
factorization of the covariance matrix as L*L^T with L lower triangular; allows you to transform correlated Gaussian variables to and from standard Gaussian space
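A minimal sketch (assuming NumPy and an illustrative 2-D mean and covariance) of using the lower Cholesky factor to map standard Gaussian samples to correlated Gaussian samples, and back:

    import numpy as np

    rng = np.random.default_rng(0)
    mu = np.array([1.0, 2.0])                      # illustrative mean
    Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])     # illustrative covariance
    L = np.linalg.cholesky(Sigma)                  # lower Cholesky factor, Sigma = L @ L.T

    u = rng.standard_normal((10000, 2))            # standard Gaussian samples
    x = mu + u @ L.T                               # correlated samples: x = mu + L u
    u_back = (x - mu) @ np.linalg.inv(L).T         # inverse map back to standard space

    print(np.cov(x, rowvar=False))                 # approximately Sigma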
max entropy
the equilibrium (maximum-uncertainty) state; for a fixed mean and variance the maximum-entropy distribution is the Gaussian (bell curve)
total probability theorem
if the events b_i are mutually exclusive and exhaustive, the probability of a is equal to the sum over i of P(b_i) * P(a | b_i)
Aleatoric Uncertainty
inherent variation that cannot be eliminated
independence
one event provides no information about the other; equivalently, the probability of one thing happening given the other is the same as the probability of the first thing happening overall, P(a | b) = P(a)
system analysis
predict how the system output behaves using a system model
Monte Carlo Simulation
simulate independent, identically distributed samples from p(x), evaluate h(x) at each sample, and approximate H as the average of those evaluations; the average converges to the true value as the number of samples grows, with an error characterized by the central limit theorem
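A minimal sketch (assuming NumPy, a standard Gaussian input, and the illustrative h(x) = x^2) of the direct MC estimate of H = E[h(X)], together with the estimator's coefficient of variation:

    import numpy as np

    rng = np.random.default_rng(1)
    K = 100_000
    x = rng.standard_normal(K)          # i.i.d. samples from p(x)
    h = x**2                            # evaluate h at each sample (true E[h] = 1)

    H_hat = h.mean()                    # MC estimate of H
    cov_hat = h.std(ddof=1) / (np.sqrt(K) * H_hat)   # c.o.v. of the estimator, ~1/sqrt(K)
    print(H_hat, cov_hat)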
The distance β
called the 'Hasofer-Lind reliability index', or 'reliability index' in short, in the structural reliability literature; it is the distance from the origin to the MPP in standard Gaussian space
information theory
The theory that the information provided by a particular event is inversely related to the probability of its occurrence.
Model Uncertainty
Uncertainty concerning whether a selected model is correct
likelihood function
a number that summarizes how well a model's predictions fit the observed data
transformations for scalar quantities
can always be done by equating CDFs: set F_Y(y) = F_X(x), so y = F_Y^(-1)(F_X(x))
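A minimal sketch (assuming SciPy and an illustrative exponential-to-standard-Gaussian mapping) of transforming a scalar random variable by equating CDFs:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    x = rng.exponential(scale=2.0, size=10000)        # samples of X ~ Exponential(scale 2)

    u = stats.expon(scale=2.0).cdf(x)                 # F_X(x), uniform on (0, 1)
    y = stats.norm.ppf(u)                             # F_Y^(-1)(u): standard Gaussian samples

    print(y.mean(), y.std())                          # approximately 0 and 1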
predictive analysis
forecasting (the forward uncertainty quantification problem) using a multi-dimensional integral; analytical evaluation is typically impractical, while numerical quadrature is inefficient for n > 3
entropy
measure of the amount of uncertainty about which one of the variable's values is the actual one; also the average amount of information transmitted when transmitting the random variable
relative entropy (Kullback-Leibler divergence)
measure of the difference between two different distributions
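A minimal sketch (assuming NumPy and two small illustrative discrete distributions) of computing the entropy of p and the relative entropy (KL divergence) between p and q:

    import numpy as np

    p = np.array([0.5, 0.3, 0.2])                 # illustrative discrete pmf
    q = np.array([0.4, 0.4, 0.2])                 # second pmf on the same outcomes

    entropy_p = -np.sum(p * np.log(p))            # average missing information about X
    kl_pq = np.sum(p * np.log(p / q))             # relative entropy D(p || q) >= 0
    print(entropy_p, kl_pq)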
discrete to continuous
probability mass function (PMF) to probability density function (PDF), turn sums to integrals
Bayes' Theorem
P(a | b) = P(b | a) * P(a) / P(b); in words, Bayes' theorem allows us to use our knowledge of the prior to calculate a posterior (predictive) model
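A minimal sketch (with illustrative numbers) of Bayes' theorem as a prior-to-posterior update for a binary hypothesis:

    # P(A | B) = P(B | A) * P(A) / P(B), with P(B) from the total probability theorem
    p_a = 0.01                 # prior P(A)
    p_b_given_a = 0.95         # likelihood P(B | A)
    p_b_given_not_a = 0.10     # P(B | not A)

    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)   # evidence P(B)
    p_a_given_b = p_b_given_a * p_a / p_b                   # posterior P(A | B)
    print(p_a_given_b)         # about 0.088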
expectation of error
should be 0 (for additive error) or 1 (for multiplicative error) in order to get unbiased predictions; e is usually normally distributed
failure integral MC and IS
simulating "rare events" requires a relatively large number of samples before the rare events are actually observed
physics vs statistics based models
a statistical model is built directly from real-world output (no need to introduce a model discrepancy error to relate the model to y), but it is still parametric
event
subset of the sample space
Complement Ec
the outcomes in the sample space that are not in E
Intersection operator
the outcomes that are part of both events; if the intersection is empty, the events are mutually exclusive
posterior probability
the probability that a hypothesis is true after consideration of the evidence
sample space
the set of all possible outcomes
epistemic uncertainty
uncertainty from lack of knowledge
Experimental Uncertainty
uncertainty from incomplete data or limited measurement accuracy
Parameter Uncertainty
uncertainty about which values of the model parameters will give the best representation of the real system and environment
normalized IS
used when we do not know p(x) fully (e.g., only up to a normalizing constant), so samples of q(x) cannot be weighted directly; the MC estimate becomes a quotient of two MC estimates. The normalized IS estimate is biased, with bias of order 1/K (the bias vanishes asymptotically), and is usually better than standard IS
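A minimal sketch (assuming NumPy, a target known only up to a normalizing constant, and a wider Gaussian proposal) of the self-normalized IS estimate as a quotient of two MC estimates:

    import numpy as np

    rng = np.random.default_rng(3)
    K = 50_000

    p_tilde = lambda x: np.exp(-0.5 * (x - 1.0)**2)     # unnormalized target (N(1,1) up to a constant)
    h = lambda x: x**2                                  # quantity of interest
    q_mean, q_std = 0.0, 2.0                            # proposal N(0, 2^2)

    x = rng.normal(q_mean, q_std, K)
    q_pdf = np.exp(-0.5 * ((x - q_mean) / q_std)**2) / (q_std * np.sqrt(2 * np.pi))
    w = p_tilde(x) / q_pdf                              # unnormalized importance weights

    H_hat = np.sum(w * h(x)) / np.sum(w)                # quotient of two MC estimates
    print(H_hat)                                        # approximately E[X^2] = 2 for X ~ N(1, 1)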
why does IS work
the variance of the estimator changes with the proposal density; the optimal IS density is the one that minimizes E[(h(x)p(x)/q(x))^2]
engineering applications
we want the expectation of y (equal to the expectation of z if the error is unbiased) and the probability that something happens
defensive importance sampling
adjusting the weighting of the proposal density slightly to ensure robustness of the IS density, so that the ratio of the integrand to the proposal density remains bounded
uncorrelated
when covariance is zero. If X and Y are independent, then they are uncorrelated. However, uncorrelated variables are not necessarily independent
additive error
y = z + e
multiplicative error
y = z * e, or equivalently additive error on the logarithm; the same idea as the additive error, leading to the same equations but in terms of the logs
covariance
A measure of linear association between two variables. Positive values indicate a positive relationship; negative values indicate a negative relationship
stratified sampling
A type of probability sampling in which the population is divided into groups with a common attribute and a random sample is chosen within each group to reduce overall variance; it is not always clear how the groups should be separated
ANOVA
Analysis of variance. A statistical procedure examining variations between two or more sets of interval or ratio data.
Taylor series expansion of log of integrand
This approach relies on fitting a Gaussian distribution at the MPP or, equivalently, truncating the Taylor series expansion of the log of the integrand. It can approximate the integral simply by identifying the MPP and evaluating the Hessian of the log of the integrand, and the integrand itself, at that point. Unfortunately we do not know how accurate the approximation is: whether the formula gives a good approximation to the actual value of H depends on how close the integrand is to a Gaussian-type function. If multiple MPPs exist (multiple maxima), sum the contributions from each of them.
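A minimal sketch (assuming NumPy/SciPy and an illustrative one-dimensional integrand) of the Gaussian-fit approximation: locate the MPP of the log-integrand, evaluate its Hessian (here a scalar second derivative, by finite differences), and approximate H:

    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.integrate import quad

    # Illustrative positive, smooth integrand: g(x) = phi(x) * (2 + sin(x))
    g = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi) * (2.0 + np.sin(x))
    log_g = lambda x: np.log(g(x))

    # 1. Find the MPP x* (maximum of the log-integrand)
    res = minimize_scalar(lambda x: -log_g(x), bracket=(-3.0, 0.0, 3.0))
    x_star = res.x

    # 2. Second derivative of log g at x* by central finite differences
    eps = 1e-4
    d2 = (log_g(x_star + eps) - 2 * log_g(x_star) + log_g(x_star - eps)) / eps**2

    # 3. Gaussian-fit approximation of H = integral of g(x) dx
    H_approx = g(x_star) * np.sqrt(2 * np.pi / abs(d2))

    # Reference value by brute-force quadrature (true value is 2 for this integrand)
    H_true, _ = quad(g, -10, 10)
    print(H_approx, H_true)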
Quality of taylor expansion approximation
Are all MPPs interior points, and is the integrand sufficiently smooth around them? Does the integrand look like a Gaussian type of function in the neighborhood of the MPP? Can we calculate the Hessian matrix accurately at the MPP? Can we calculate all MPPs, or at least all important ones? If multiple MPPs exist, is there a significant overlap between them? If there is, then we have overestimation of the integral. It is preferable to transform the integral to standard Gaussian space.
probabilistic integrals
Because of the fast decay of most pdfs (such as the exponential family) and the smooth characteristics of the response of engineering systems, the "mass" of the integrand is concentrated in narrow regions near the maxima of the integrand. This feature can be exploited for greater efficiency. The local maxima of the integrand (if k(x) is not strictly positive, then all local extrema) are called design points or most probable points (MPPs) x* and play a critical role in estimating the probabilistic integral; they can be identified by optimization.
Laplace asymptotic expansion
Contrary to the Taylor series expansion (a Gaussian fit to the integrand), the Laplace approximation is an asymptotic result. The asymptotic relationship holds because as λ increases, the exponential term becomes sharply peaked around x*, leaving other regions unimportant in accounting for the value of the integral and making only the value of f(x) at x* relevant.
X and Y are orthogonal if
E[XY] = 0
SORM
Second-order reliability method: SORM extends FORM by establishing a quadratic approximation of the real failure surface around the MPP
point-estimate
A single value that serves as an estimate of a population parameter. Point estimates offer a good approximation if the integrand is very peaked, but provide no estimate of accuracy (we cannot tell how close the estimate is to the true value). Used for failure probabilities ("rare events"); one simply needs to locate the MPP (the peak of the integrand).
monte carlo
a class of computational methods that are sampling based, i.e., rely on repeated sampling to obtain the result. Used when it is inefficient or infeasible to compute with a deterministic algorithm. The efficiency of these methods is judged by the number of samples required to get accurate and useful results
likelihood function
a number that summarizes how well a model's predictions fit the observed data; without other knowledge, the maximum entropy approach leads us to assume no correlation between measurements
measurement noise
additional error/noise from measuring the data. We can deal with this error separately, similarly to the model prediction error: treat it as uncertain and use the maximum entropy principle. Otherwise, lump it together with the prediction error, treating y as the measured system output rather than the true output. As long as the measurement error is low, the predictions for y are essentially the same either way.
distribution estimator
after repeating the MCS multiple times and obtaining a set of different mean estimates, those estimates will be approximately normally distributed
Global sensitivity analysis
aims to identify the importance of each component of the vector x (or groups of them) towards the overall probabilistic performance. Which uncertain parameters are responsible for the variability in the QoI h(x) that I am observing? If I treat some components as deterministic instead of probabilistic, how much should I expect the results to change?
variance reduction
analytical calculation of (part of) the integral; space-filling (LHS) MC, which works when the covariance is positive; stratified sampling; control variates
cross entropy
average amount of transmitted info when transmitting the random variable with distribution p using codebook q
how to choose variance of error?
based on observations of the system. In the absence of data, the prediction error variance can be chosen explicitly based on the second-order statistics of the model (the second moment or the variance of z). The variance addresses only the error that is not modeled, while the second moment addresses error in both modeled and unmodeled parameters. The error variance can also be treated as uncertain, with its own probability density.
conditionally independent
the two events become independent given another piece of information: P(a, b | c) = P(a | c) * P(b | c)
Latin Hypercube Sampling (LHS)
a method between random sampling and quasi-random sampling: the space is partitioned into a number of deterministic groups equal to the number of samples, and one sample is randomly chosen from each group. Covers the space better than pure random MC sampling
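A minimal sketch (assuming NumPy) of Latin hypercube sampling in d dimensions: partition each axis of the unit hypercube into K equal strata, draw one point at random within each stratum, and shuffle the strata independently per dimension:

    import numpy as np

    def latin_hypercube(K, d, rng):
        # One random point per stratum [i/K, (i+1)/K) along each dimension
        u = (np.arange(K)[:, None] + rng.random((K, d))) / K
        # Shuffle stratum order independently for each dimension
        for j in range(d):
            rng.shuffle(u[:, j])
        return u

    rng = np.random.default_rng(4)
    samples = latin_hypercube(K=10, d=2, rng=rng)   # points in [0, 1)^2, one per stratum in each dimension
    print(samples)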
cv of estimator
can be obtained without multiple MCS runs: it equals the coefficient of variation of h(x) divided by sqrt(K), because it is based on K (the number of samples) rather than on repeated tests. Therefore computational efficiency is independent of the dimension n and accuracy increases only as sqrt(K); for a given K, the accuracy of H depends on the variance
original or transformed space
can solve in the original space or in a transformed space, usually the standard Gaussian space; if both distributions are already known, one just needs to map one space to the other
direct monte carlo
class of MC algorithms where each sample is independent from the next
Sobol' indices
correspond to the portion of the total variance that is attributed to each input or combination of inputs. This directly shows the global sensitivity with respect to these components (a larger value indicates greater importance)
selection of proposal density
the choice creates a dependence of the algorithm's efficiency on the dimension n of x: the more dimensions, the more samples are rejected. One can start with a uniform distribution and use the resulting samples to build a better proposal density; use a spread larger than expected, of course, for robustness
how accurate is MCS?
depends on how much p(x) and h(x)p(x) differ. Basically, we are hoping that the important regions of p(x) are similar to the important regions of h(x)p(x); if they are not, the estimate is inaccurate (if h(x) is very peaked and the pdf is not, most samples carry little useful information)
What info about b is missing given a?
equal to -log P(b | a)
uncertain prediction error
error between the real system and the model
total sensitivity index
evaluates the total effect of x_i: it measures the contribution to the output variance of x_i, including all variance caused by its interactions, of any order, with any other input variables. Sensitivity indices reveal the global importance of each random variable, or of interactions between random variables. Higher values indicate a greater contribution from the probabilistic perspective; variables with low first-order indices could potentially be ignored (treated as deterministic) without any impact on the estimated probabilistic performance H. A sketch of estimating these indices follows.
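A minimal sketch (assuming NumPy, an illustrative three-input model, and the commonly used pick-and-freeze estimators of Saltelli/Jansen) of first-order and total Sobol' indices:

    import numpy as np

    def model(x):                       # illustrative model with an interaction term
        return x[:, 0] + 2.0 * x[:, 1] + x[:, 0] * x[:, 2]

    rng = np.random.default_rng(5)
    K, d = 100_000, 3
    A = rng.random((K, d))              # two independent sample matrices on [0, 1]^3
    B = rng.random((K, d))

    fA, fB = model(A), model(B)
    var = np.var(np.concatenate([fA, fB]))

    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]             # replace column i of A with that of B
        fABi = model(ABi)
        S_i = np.mean(fB * (fABi - fA)) / var          # first-order index
        ST_i = 0.5 * np.mean((fA - fABi) ** 2) / var   # total-effect index
        print(f"x{i+1}: S = {S_i:.3f}, ST = {ST_i:.3f}")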
probability logic
extension of Boolean logic, where P(b | c) gives the degree of plausibility of b based on the information in c; basically, the probability of something given the information that is available
uncertainty quantification
the field concerned with quantifying and propagating uncertainty in models while overcoming the restrictions of computational cost
IS density selection
Options: find all local maxima of the integrand (argmax h(x)p(x)) and construct a local IS density around each maximum (usually a Gaussian mixture); find the most prominent local maximum and construct the IS density around that point globally; or generate some samples and construct the IS density from them.
FORM
first-order reliability method: approximates the failure surface with a linear surface that is tangential to the actual failure boundary at the MPP of the Gaussian PDF; this can be applied multiple times (e.g., for multiple MPPs)
Bayesian vs. Frequentist statistics
the frequentist view is that probability is the long-run fraction of outcomes after running many repetitions of an experiment, while the Bayesian view treats probability as a measure of uncertainty. The rules of probability are the same in both approaches. The frequentist approach does not explicitly indicate that the probability depends on the choice of the model/subset
importance sampling
generate samples in the region of interest to reduce the variance of the MC estimate. Uses a proposal density and "weights" the samples; useful when drawing informative samples directly from p(x) is not easy. The variance can become high in high dimensions.
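A minimal sketch (assuming NumPy, a standard Gaussian p(x), and an illustrative failure threshold of 4) of importance sampling for a small failure probability, with the proposal density shifted toward the failure region:

    import numpy as np

    rng = np.random.default_rng(6)
    K = 20_000
    threshold = 4.0                                  # failure: X > 4, P_f = Phi(-4) ~ 3.2e-5

    # Proposal q(x): Gaussian centered near the failure boundary (the "important" region)
    x = rng.normal(loc=threshold, scale=1.0, size=K)

    log_p = -0.5 * x**2                              # log of p(x) = N(0, 1); shared constants cancel
    log_q = -0.5 * (x - threshold)**2                # log of q(x) = N(threshold, 1)
    w = np.exp(log_p - log_q)                        # importance weights p(x)/q(x)

    pf_is = np.mean((x > threshold) * w)             # weighted indicator average
    print(pf_is)                                     # close to 3.17e-5 with far fewer samples than direct MC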
simulating samples
generating a sequence that, when analyzed statistically, has the characteristics of p(x); this sequence is referred to by the name of the pdf (e.g., "Gaussian samples") or simply as random samples
rejection sampling
generating samples from a target density: use a simple probability distribution to envelope the target density, sample from the simple density, and reject the samples that are not part of the target density. Choose the proposal pdf, multiply it by a constant M until it envelopes the entire target pdf, and generate each sample independently; it is best to have M just large enough to envelope the target density.
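A minimal sketch (assuming NumPy/SciPy, an illustrative Beta(2, 5) target, and a uniform proposal on [0, 1]) of rejection sampling with an enveloping constant M:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    target = stats.beta(2, 5)                 # illustrative target density on [0, 1]
    M = 2.5                                   # constant so that M * uniform pdf >= target pdf everywhere

    samples = []
    while len(samples) < 10_000:
        x = rng.random()                      # sample from the proposal (uniform on [0, 1])
        u = rng.random()                      # uniform for the accept/reject test
        if u <= target.pdf(x) / (M * 1.0):    # accept with probability p(x) / (M * q(x))
            samples.append(x)

    samples = np.array(samples)
    print(samples.mean())                     # approximately 2 / (2 + 5) = 0.286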
posterior probability distribution
obtained by applying Bayes' theorem; provides updates for both the model parameters and the prediction error. For physics-based modeling we are mostly interested in updating the system model parameters so as to make predictions for outputs other than those in the data. Posterior probability distribution = (likelihood)*(prior)/(evidence), where the evidence is the normalization constant.
system identification and estimation
the goal is to use data to improve the model or to estimate the internal state of the system
difference between H and Pf integrals (expected mean of the data versus the chance that something has failed)
h(x), and therefore the integrand, is typically smooth (continuous, differentiable) for H, while the integrand for Pf is not.
posterior analysis
how do we use the previous information to establish a posterior predictive analysis for the future behavior of the system
system modeling
how to choose the correct model, either a physics-based model or a statistical model
parameter identification
how we can update the uncertainty in our model parameters θ based on the new information that we have about the system behavior
resultant posterior distribution characteristics
if the prior is flat (not a lot of prior information), the likelihood function dominates; as the amount of data M gets larger, the likelihood becomes more peaked and the uncertainty in the posterior becomes smaller
boolean logic
knowing one thing gives us complete information about the other thing. If we know c, then we know that b is either true or false
assumptions for modeling error
the model produces unbiased predictions (does not consistently under- or over-predict); the error is independent of the model parameters (error uncertainty is not the same as parametric uncertainty)
Quasi Monte Carlo
a non-random (deterministic) method of sampling, for example picking points that are equidistant from each other
dependence on the prior
one of the critiques of Bayesian inference; however, the influence of the prior becomes negligible if there is enough data
prior
probability of something before we see the data
conditional probability
probability that something occurred given that something else already occurred
discrete random variable
random variable that can take on any value from a finite or countably infinite set X
basic random sampling
the basis of MC sampling; it starts with a deterministic algorithm that generates pseudo-random numbers (usually between 0 and 1). How those numbers are generated and where the seed starts matter (basically, which random sampling method is being used)
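A minimal sketch (assuming NumPy) of the deterministic, seeded starting point of pseudo-random sampling: the same seed reproduces the same "random" numbers in (0, 1):

    import numpy as np

    a = np.random.default_rng(seed=42).random(3)   # deterministic given the seed
    b = np.random.default_rng(seed=42).random(3)   # same seed -> identical sequence
    c = np.random.default_rng(seed=7).random(3)    # different seed -> different sequence
    print(a, b, c)
    print(np.allclose(a, b))                       # True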
features of the posterior distribution
the more data we have, the more sharply the probability of the parameters given the data peaks around the parameter values that make the predictions match the data best
principle of maximum entropy
the most appropriate probability model is the one that maximizes the entropy of x (maximizes the amount of uncertainty about which one of the values is the actual one given the info). Basically the one that incorporates the largest possible uncertainty into the system
IS is good as long as
the proposal density is appropriately chosen across all dimensions. One can partition the dimensions into two sets, one that has a significant impact on the integrand and one that does not. Prioritize robustness when using IS.
product rule
the probability of one thing happening and another thing happening is the same as the probability of one thing happening given that the other thing has happened, multiplied by the probability of the other thing happening: P(a, b) = P(a | b) * P(b)
importance sampling resampling
use existing samples from q(x) and weight them to get an idea of p(x), then resample with a proposal density closer to p(x)
control variate
used to reduce variance in MC methods: information about the errors in estimates of known quantities is used to reduce the error of an estimate of an unknown quantity. Basically, use a similar but less computationally intensive model that is correlated with the one you need but easier to work with; you need to know the relationship between the two models (e.g., the mean of the auxiliary model).
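A minimal sketch (assuming NumPy, the illustrative expensive quantity h(x) = exp(x), and the cheap correlated control g(x) = x with known mean 0 under a standard Gaussian) of the control-variate estimator:

    import numpy as np

    rng = np.random.default_rng(8)
    K = 100_000
    x = rng.standard_normal(K)

    h = np.exp(x)                         # quantity of interest (true mean e^0.5 ~ 1.6487)
    g = x                                 # control variate with known mean E[g] = 0

    # Coefficient that minimizes the variance of the corrected estimator
    c = np.cov(h, g)[0, 1] / np.var(g)
    h_cv = h - c * (g - 0.0)              # corrected samples, same mean as h

    print(h.mean(), h_cv.mean())          # both approximately 1.6487
    print(h.var(), h_cv.var())            # corrected samples have smaller variance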
when is IS efficient
when the optimal IS density is significantly different from p(x); we can also use IS to show the sensitivity of the components of x to the integral
numerical optimization
when we cannot optimize analytically, we do so numerically, using gradient-based techniques (most common and efficient) or gradient-free techniques such as the genetic algorithm
model class selection
when we have more than one candidate model description for the system, which one has the highest plausibility?