Numerical Methods


Union operator

E ∪ F is the set of all outcomes that are in E, in F, or in both

marginalization theorem (sum rule)

The probability of something happening equals the sum of the probabilities of it happening within each smaller (mutually exclusive) group.

Probability Mass Function (PMF)

a mathematical relation that assigns probabilities to all possible outcomes of a discrete random variable

quadratic exponential pdf

a pdf whose exponent is quadratic in x; all such pdfs are Gaussian

lower Cholesky factorization

the factorization Σ = L Lᵀ of the covariance matrix; allows you to transform between correlated Gaussian variables and standard Gaussian space
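
A minimal numpy sketch of the idea (the mean vector and covariance matrix below are illustrative): with Σ = L Lᵀ, x = mu + L u maps standard Gaussian samples u to correlated samples, and solving L u = x - mu maps back.

import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0])                    # illustrative mean vector
Sigma = np.array([[4.0, 1.2], [1.2, 1.0]])   # illustrative covariance matrix
L = np.linalg.cholesky(Sigma)                # lower Cholesky factor: Sigma = L @ L.T

u = rng.standard_normal((1000, 2))           # samples in standard Gaussian space
x = mu + u @ L.T                             # correlated Gaussian samples
u_back = np.linalg.solve(L, (x - mu).T).T    # transform back to standard Gaussian space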

max entropy

the equilibrium, least-informative distribution; with only mean and variance constrained, the maximum-entropy distribution is the Gaussian (bell curve)

total probability theorem

if all the different b's are mutually exclusive and exhaustive, the probability of a equals the sum, over the b's, of the probability of each b times the probability of a given that b
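
In symbols, for mutually exclusive and exhaustive events b_1, ..., b_n: P(a) = Σ_i P(a | b_i) P(b_i).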

Aleatoric Uncertainty

inherent variation that cannot be eliminated

independence

one variable provides no information about the other; equivalently, the probability of one thing happening given the other thing is the same as the probability of the first thing happening overall

system analysis

predict how the system output behaves using a system model

Monte Carlo Simulation

simulate independent, identically distributed samples of x, evaluate h(x) for each, and approximate H as the average of the sample values; by the law of large numbers / central limit theorem, the average converges to the true value of H
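
A minimal sketch of the procedure (the distribution of x and the function h below are placeholders):

import numpy as np

rng = np.random.default_rng(1)

def h(x):
    return x ** 2                                  # placeholder quantity of interest

x = rng.normal(0.0, 1.0, size=10_000)              # i.i.d. samples from an assumed p(x)
H_hat = h(x).mean()                                # MC estimate of H = E[h(x)]
std_err = h(x).std(ddof=1) / np.sqrt(x.size)       # CLT-based standard error of the estimate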

The distance β

the distance from the origin to the MPP in standard Gaussian space; called the 'Hasofer-Lind reliability index', or simply the 'reliability index', in the structural reliability literature

information theory

The theory that the information provided by a particular event is inversely related to the probability of its occurrence.

Model Uncertainty

Uncertainty concerning whether a selected model is correct

likelihood function

a function of the model parameters that quantifies how well the model's predictions fit the observed data

transformations for scalar quantities

can always be done by equating CDFs (set the CDF of the target variable equal to the CDF of the original variable and solve)

predictive analysis

forecasting (the forward uncertainty quantification problem) using a multi-dimensional integral; analytical evaluation is typically impractical, while numerical quadrature is inefficient for n > 3

entropy

a measure of the amount of uncertainty about which of the variable's values is the actual one; also the average amount of information transmitted when transmitting the random variable
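
For a discrete random variable X with PMF p(x): H(X) = -Σ_x p(x) log p(x).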

relative entropy (Kullback-Leibler divergence)

a measure of the difference between two different distributions
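
For discrete distributions p and q: D(p || q) = Σ_x p(x) log[p(x) / q(x)]; it is non-negative and zero only when p = q.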

discrete to continuous

probability mass function (PMF) to probability density function (PDF), turn sums to integrals

Bayes Theorem

probability of a given b = probability of b given a * probability of a / probability of b. In words, Bayes' theorem allows us to use our knowledge of the prior to calculate a posterior (predictive) model
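
In symbols: P(a | b) = P(b | a) P(a) / P(b).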

expectation of error

should be 0 (for additive error) or 1 (for multiplicative error) in order to get unbiased predictions; e is usually taken to be normally distributed

failure integral MC and IS

simulation of "rare events" requires a relatively large number of samples to get rare events

physics vs statistics based models

a statistical model is fit directly to real-world output (no need to introduce a model discrepancy error to relate the model to y); a statistical model is still parametric

event

subset of the sample space

Complement Ec

the outcomes in the sample space that are not in E

Intersection operator

the outcomes that are part of both groups; if the intersection is empty, the groups are mutually exclusive

posterior probability

the probability that a hypothesis is true after consideration of the evidence

sample space

the set of all possible outcomes

epistemic uncertainty

uncertainty from lack of knowledge

Experimental Uncertainty

uncertainty from incomplete data or limited measurement accuracy

Parameter Uncertainty

uncertainty about which values of the model parameters give the best representation of the real system and environment

normalized IS

used when we don't have full information about p(x) (e.g., it is known only up to a normalizing constant), so we cannot weight samples of q(x) exactly; the MC estimate becomes a quotient of two MC estimates. The normalized IS estimate is biased, with bias of order 1/K (asymptotically unbiased), and is usually better than standard IS
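
A sketch of the estimate: with samples x_k drawn from q(x) and weights w_k = p(x_k) / q(x_k) (computable even if p is known only up to a constant), H ≈ [Σ_k w_k h(x_k)] / [Σ_k w_k], i.e. a quotient of two MC estimates.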

why does IS work

the variance of the estimate changes with the choice of proposal density; the optimal IS density is the one that minimizes E[(h(x)p(x)/q(x))^2]

engineering applications

we typically want the expectation of y (equal to the expectation of z if the error is unbiased) and the probability that some event occurs

defensive importance sampling

mixing a small weight of the original density into the proposal density to ensure robustness of the IS density, so that the ratio of the integrand to the proposal density is bounded

uncorrelated

when covariance is zero. If X and Y are independent, then they are uncorrelated. However, uncorrelated variables are not necessarily independent

additive error

y = z + e

multiplicative error

y = z*e, i.e. additive error on the logarithm; the same idea as the additive error, leading to the same equations just in terms of the log

covariance

A measure of linear association between two variables. Positive values indicate a positive relationship; negative values indicate a negative relationship
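
In symbols: Cov(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y].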

stratified sampling

A type of probability sampling in which the population is divided into groups with a common attribute and a random sample is chosen within each group to reduce overall variance. Not always clear how the groups should be separated

ANOVA

Analysis of variance. A statistical procedure examining variations between two or more sets of interval or ratio data.

Taylor series expansion of log of integrand

Approach relies on fitting a Gaussian distribution at the MPP or, equivalently, truncating the Taylor series expansion of the log of the integrand. The integral can be approximated simply by identifying the MPP and evaluating the integrand and the Hessian of the log of the integrand at that point. Unfortunately we do not know how accurate the approximation is: whether the formula gives a good approximation to the actual value of H depends on how close the integrand is to a Gaussian-type function. If multiple MPPs exist (multiple maxima), sum the contributions from each of them.

Quality of taylor expansion approximation

Are all MPPs interior points, and is the integrand sufficiently smooth around them? Does the integrand look like a Gaussian type of function in the neighborhood of the MPP? Can we calculate the Hessian matrix accurately at the MPP? Can we find all MPPs, or at least all the important ones? If multiple MPPs exist, is there significant overlap between them? If there is, we overestimate the integral. It is preferable to transform the integral to standard Gaussian space.

probabilistic integrals

Because of the fast decay of most pdfs (such as the exponential family) and the smooth characteristics of the response of engineering systems, the "mass" of the integrand is concentrated in narrow regions near the maxima of the integrand. This feature can be exploited for greater efficiency. The local maxima of the integrand (if k(x) is not strictly positive, we focus on all local extrema) are called design points or most probable points (MPP), denoted x*, and play a critical role in the estimation of the probabilistic integral. They can be identified by optimization.

Laplace asymptotic expansion

Contrary to the Taylor series expansion (a Gaussian fit to the integrand), the Laplace approximation is an asymptotic result. The asymptotic relationship holds because, as λ increases, the exponential term becomes very peaked around x*, leaving other regions unimportant in accounting for the value of the integral and making only the value of f(x) at x* relevant.

X and Y are orthogonal if

E[XY] = 0

SORM

Second-order reliability method; SORM extends FORM by establishing a quadratic approximation of the real failure surface around the MPP.

point-estimate

A single value that serves as an estimate of a population parameter. Point estimates offer a good approximation if the integrand is very peaked, but there is no estimate of accuracy (we cannot tell how close the estimate is to the true value). Used for failure probabilities ("rare events"): one simply needs to locate the MPP (the peak of the integrand).

monte carlo

a class of computational methods that are sampling based, i.e. rely on repeated sampling to get the result. Used when it is inefficient or infeasible to compute the result with a deterministic algorithm. The efficiency of these methods is based on the number of samples required to get accurate and useful results

likelihood function

a function that quantifies how well a model's predictions fit the observed data; without other knowledge, the max entropy approach is to assume no correlation between measurements

measurement noise

additional error/noise from measuring the data. It can be dealt with separately, similarly to the model prediction error: treat it as uncertain and use the max entropy principle. Otherwise, lump it together with the prediction error, treating y as the measured system output rather than the true output. As long as the measurement error is low, though, the predictions for y are pretty much the same either way.

distribution estimator

after repeating MCS multiple times and getting a set of different means, the means will be approximately normally distributed

Global sensitivity analysis

Aims to identify the importance of each component of the vector x (or groups of them) towards the overall probabilistic performance. Which uncertain parameters are responsible for the variability in the QoI h(x) I am observing? If I treat some components as deterministic instead of probabilistic, how much should I expect the results to change?

variance reduction

analytical calculation of the integral; space-filling (LHS) MC (works when covariance is positive); stratified sampling; control variate

cross entropy

average amount of transmitted info when transmitting the random variable with distribution p using codebook q
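
For discrete distributions: H(p, q) = -Σ_x p(x) log q(x), which equals the entropy of p plus the relative entropy D(p || q).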

how to choose variance of error?

based on observations of the system. In the absence of data, the prediction error variance can be chosen explicitly based on the second-order statistics of the model (the second moment or the variance of z, the model output). The variance only addresses error that is not modeled, while the second moment addresses error in both modeled and unmodeled parameters. The error variance can also be treated as uncertain, with some probability density.

conditionally independent

basically, the two events become independent given another piece of information

Latin Hypercube Sampling (LHS)

between random sampling and quasi-random sampling: the space is partitioned into a number of equal-probability strata equal to the number of samples, and one sample is randomly chosen from each stratum. Covers the space better than pure random MC sampling
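
A minimal one-dimensional sketch on [0, 1) (the sample count K is illustrative; for several dimensions one would generate a column like this per dimension and permute each independently):

import numpy as np

rng = np.random.default_rng(2)
K = 10                                   # number of samples = number of strata
edges = np.arange(K) / K                 # left edges of K equal-probability strata
u = edges + rng.uniform(size=K) / K      # one random point inside each stratum
u = rng.permutation(u)                   # shuffle so the stratum order is random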

cv of estimator

can be obtained without multiple MCS runs: the cv of the estimator is 1/sqrt(K) times the cv of h(x), because it depends on K (the number of samples) and not on n (the dimension of x). Therefore computational efficiency is independent of the dimension n, and accuracy improves only as sqrt(K); for a given K, the accuracy of H depends on the variance

original or transformed space

can solve in the original space or in a transformed space; usually one transforms into standard Gaussian space. If both distributions are already known, one just needs to map one to the other

direct monte carlo

class of MC algorithms where each sample is independent from the next

Sobol' indices

correspond to the portion of the total variance that is attributed to each input or combination of inputs. This directly shows the global sensitivity with respect to these components (a larger value indicates greater importance).
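
The first-order index of input x_i is S_i = Var(E[h(x) | x_i]) / Var(h(x)), the fraction of the total variance explained by x_i alone.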

selection of proposal density

creates a dependence of the efficiency of the algorithm on the dimension n of x; in other words, the more dimensions, the more samples are rejected. One can start with a uniform distribution and use the samples from it to construct a better proposal density. Use a spread larger than what is expected, of course, for robustness.

how accurate is MCS?

depends on how much p(x) and h(x)p(x) differ. Basically, we are hoping that the important regions of p(x) are similar to the important regions of h(x)p(x). If they are not, the estimate is not accurate (if h(x) is very peaked and the pdf is not, the samples will not be informative).

What info about b is missing given a?

equal to -log P(b | a)

uncertain prediction error

error between the real system and the model

total sensitivity index

evaluates the total effect of x_i: it measures the contribution to the output variance of x_i, including all variance caused by its interactions, of any order, with any other input variables. Sensitivity indices reveal the global importance of each random variable, or of interactions between random variables. Higher importance reveals a greater contribution from the probabilistic perspective; variables having a low value for the first-order indices could potentially be ignored (treated as deterministic) without any impact on the estimated probabilistic performance H.

probability logic

extension of Boolean logic, where P(b|c) gives the degree of plausibility of b based on the information in c; basically, what is the probability of something given the information that is available

uncertainty quantification

the field concerned with characterizing and propagating uncertainties through computational models, while overcoming the restrictions of computational cost

IS density selection

Options: find all local maxima of the integrand (argmax of h(x)p(x)) and construct a local IS density around each maximum (usually a Gaussian mixture); find the prominent local maximum and construct the IS density around that point globally; or generate some samples and construct the IS density from those samples.

FORM

first-order reliability method: approximates the failure surface with a linear approximation that is tangential to the actual failure boundary at the MPP, working with the standard Gaussian PDF; can be applied multiple times (once per MPP)

Bayesian vs. Frequentist statistics

the frequentist view is that probability is the fraction of times an outcome occurs after running many repetitions of an experiment; the Bayesian view treats probability as a measure of uncertainty. The rules of probability are the same in both approaches. The frequentist approach does not explicitly indicate that the probability depends on the choice of the model/subset.

importance sampling

generate samples in the area of interest to reduce the variance of the MC estimate; uses a proposal density and "weights" the samples by p(x)/q(x). Used when drawing samples directly from the target density is not easy or efficient; can have high variance in high dimensions.
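
A minimal numpy sketch of the weighting (the nominal density, the shifted proposal, and the rare-event threshold are all illustrative choices):

import numpy as np

rng = np.random.default_rng(3)

def norm_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def h(x):
    return (x > 3.0).astype(float)                       # indicator of a rare event (placeholder)

x = rng.normal(2.5, 1.0, size=100_000)                   # samples from the proposal q(x) = N(2.5, 1)
w = norm_pdf(x, 0.0, 1.0) / norm_pdf(x, 2.5, 1.0)        # importance weights p(x)/q(x), with p(x) = N(0, 1)
H_hat = np.mean(w * h(x))                                # IS estimate of E_p[h(x)] = P(x > 3) under p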

simulating samples

generating a sequence that, when analyzed statistically, has the characteristics of p(x); this sequence is referred to by the name of the pdf (e.g., "Gaussian samples") or simply as random samples

rejection sampling

generating samples from a target density: use a simple proposal distribution that envelopes the target density, then sample from the proposal and reject the samples that do not fall under the target density. Multiply the proposal pdf by a constant M until the entire target pdf is enveloped. Choose the proposal pdf, then generate each sample independently; it is best to have the constant M just large enough to envelope the target density.
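
A minimal sketch using a uniform proposal on [0, 1] to sample a Beta(2, 5) target (the target density and the constant M are illustrative):

import numpy as np

rng = np.random.default_rng(4)

def target_pdf(x):
    return 30.0 * x * (1.0 - x) ** 4        # Beta(2, 5) density; its maximum is about 2.46

M = 2.5                                      # just large enough so M * (uniform pdf) envelopes the target

samples = []
while len(samples) < 1000:
    x = rng.uniform()                        # candidate from the uniform proposal
    if rng.uniform() * M <= target_pdf(x):   # accept with probability target(x) / (M * proposal(x))
        samples.append(x)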

posterior probability distribution

obtained by applying Bayes' theorem; provides an update for both the model parameters and the prediction error. For physics-based modeling, we are mostly interested in updating the system model parameters in order to make predictions for outputs other than those in the data. Posterior probability distribution = (likelihood * prior) / evidence, where the evidence is the normalization constant.

system identification and estimation

the goal is to use data to improve the model or to estimate the internal state of the system

difference between H and Pf integrals (expected mean of data versus chance that something has failed)

h(x) and therefore integrand is typically smooth (continuous, differentiable), while the integrand for Pf is not.

posterior analysis

how do we use the previous information to establish a posterior predictive analysis for the future behavior of the system

system modeling

how to choose the correct model, either a physics-based model or a statistical model

parameter identification

how we can update the uncertainty in our model parameters θ based on the new information that we have about the system behavior

resultant posterior distribution characteristics

if the prior is flat (not a lot of prior information), the likelihood function dominates; as the amount of data M gets larger, the likelihood becomes more peaked and the uncertainty in the posterior gets smaller

boolean logic

knowing one thing gives us complete information about the other thing. If we know c, then we know that b is either true or false

assumptions for modeling error

the model produces unbiased predictions (does not consistently under- or over-predict); the error is independent of the model parameters (error uncertainty is not the same as parametric uncertainty)

Quasi Monte Carlo

a non-random (deterministic) method of sampling; for example, picking numbers that are equidistant from each other

dependence on the prior

one of the critiques of Bayesian inference, but the influence of the prior becomes negligible if there is enough data

prior

probability of something before we see the data

conditional probability

probability that something occurred given that something else already occurred

discrete random variable

random variable that can take on any value from a finite or countably infinite set X

basic random sampling

the basis of MC sampling; starts with a deterministic algorithm that generates pseudo-random numbers (usually between 0 and 1). How those numbers are generated and where the seed starts from matter (basically, which random number generation method is being used).

features of the posterior distribution

the more data we have, the more sharply the probability of the parameters given the data peaks around the parameter values that make the predictions match the data best

principle of maximum entropy

the most appropriate probability model is the one that maximizes the entropy of x (maximizes the amount of uncertainty about which one of the values is the actual one given the info). Basically the one that incorporates the largest possible uncertainty into the system

IS is good as long as

the proposal density is appropriately chosen across all dimensions. One can partition the dimensions into two sets: one that has a significant impact on the integrand and one that does not. Prioritize robustness when using IS.

product rule

the probability of one thing happening and another thing happening together is the same as the probability of the one thing happening given the other thing has happened, multiplied by the probability of the other thing happening

importance sampling resampling

use existing samples from q(x) and weight them to get an idea of p(x), then resample using a proposal density closer to p(x)

control variate

used to reduce variance in MC methods, using information about the errors in estimates of known quantities to reduce the error of an estimate of an unknown quantity. Basically, use a similar but less computationally intensive model that is correlated with the one you need and easier to work with; but you need to know the relationship between the two models.
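
A minimal sketch (the "expensive" quantity, the cheap surrogate, and its known mean are illustrative): estimate E[exp(U)] for U ~ Uniform(0, 1) using the control variate g(U) = U, whose mean 0.5 is known.

import numpy as np

rng = np.random.default_rng(5)

u = rng.uniform(size=10_000)
hu = np.exp(u)                                        # "expensive" quantity of interest (placeholder)
gu = u                                                # cheap, correlated control variate with known mean 0.5

c = -np.cov(hu, gu)[0, 1] / np.var(gu, ddof=1)        # near-optimal coefficient from the samples
H_cv = np.mean(hu + c * (gu - 0.5))                   # control-variate estimate of E[exp(U)] = e - 1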

when is IS efficient

when the optimal density is significantly different from p(x). We can also use IS to show the sensitivity of the components of x to the integral.

numerical optimization

when we cannot optimize analytically, we do so numerically, using gradient-based techniques (the most common and efficient) or gradient-free techniques such as the genetic algorithm

model class selection

when we have more than one candidate model descriptions for the system, which one has the highest plausibility

