Chapter 11: Estimation Theory and Statistics
Hyperparameters
In Bayesian statistics, a hyperparameter is a parameter of a prior distribution; it encodes the prior belief held before any data are observed (e.g., the α and β of a Beta prior). In machine learning more broadly, hyperparameters are the configuration values that must be set before training a model rather than learned from the data.
Parameters
A model parameter is a configuration variable that is internal to the model and whose value is estimated (learned) from the data during training.
Psychometric Function
A psychometric function is an inferential model applied in detection and discrimination tasks. It models the relationship between a given feature of a physical stimulus (e.g., velocity, duration, brightness, or weight) and the forced-choice responses of a human or animal test subject.
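One common parameterization (the symbol names here are illustrative, not taken from these notes) writes the probability of a correct or "yes" response as
$$ \Psi(x) \;=\; \gamma + (1 - \gamma - \lambda)\, F(x; \alpha, \beta), $$
where F is a sigmoidal function such as a cumulative Gaussian or logistic, α its threshold (location), β its slope, γ the guess rate, and λ the lapse rate.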
Random Variables
A random variable is a variable whose values depend on outcomes of a random phenomenon.
Estimator
An estimator is a rule that uses the measurements to approximate the unknown parameters.
Estimation Theory
Estimation theory is concerned with estimating an unknown parameter from data, typically a set of i.i.d. observations of a random variable.
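In symbols, using standard notation: given i.i.d. observations $x_1, \dots, x_n$ drawn from a distribution $f(x; \theta)$ with unknown parameter $\theta$, an estimator is a function
$$ \hat{\theta} = g(x_1, \dots, x_n) $$
whose value is intended to approximate $\theta$.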
Neural Decoding Based on a Tuning Curve Model
Fisher information is minimal at the peak of the tuning curve, where the slope is zero. Thus the intuition that a neuron is most informative about its preferred stimulus is wrong. The correct intuition is that the neuron is most informative about stimuli falling on the region of the tuning curve where the slope is steep.
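As a concrete instance, under the common assumption of Poisson spiking with tuning curve $f(s)$ and counting window $T$ (standard symbols, not taken from these notes), the Fisher information about the stimulus $s$ is
$$ I(s) \;=\; T\,\frac{f'(s)^{2}}{f(s)}, $$
which vanishes at the peak of the tuning curve (where $f'(s) = 0$) and is large where the slope is steep.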
Fisher Information
Fisher information is a way of measuring the amount of information that an observable random variable X carries about an unknown parameter θ of a distribution that models X. Formally, it is the variance of the score, or the expected value of the observed information.
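In symbols, for an observation $X$ with density $f(x;\theta)$:
$$ I(\theta) \;=\; \mathrm{E}\!\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^{2}\right] \;=\; -\,\mathrm{E}\!\left[\frac{\partial^{2}}{\partial\theta^{2}}\log f(X;\theta)\right], $$
where the second equality holds under the usual regularity conditions.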
Maximum A Posteriori (MAP) Estimation
In Bayesian statistics, a maximum a posteriori probability (MAP) estimate is an estimate of an unknown quantity that equals the mode of the posterior distribution. The MAP estimate can be used to obtain a point estimate of an unobserved quantity on the basis of empirical data. It is closely related to maximum likelihood estimation (MLE), but employs an augmented optimization objective that incorporates a prior distribution over the quantity one wants to estimate. MAP estimation can therefore be seen as a regularization of maximum likelihood estimation.
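In symbols:
$$ \hat{\theta}_{\mathrm{MAP}} \;=\; \arg\max_{\theta}\; p(\theta \mid x) \;=\; \arg\max_{\theta}\; p(x \mid \theta)\, p(\theta). $$
With a flat (uniform) prior $p(\theta)$, the MAP estimate reduces to the maximum likelihood estimate, which is the sense in which the prior acts as a regularizer.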
Bayesian Estimation
In estimation theory and decision theory, a Bayes estimator or a Bayes action is an estimator or decision rule that minimizes the posterior expected value of a loss function (i.e., the posterior expected loss). Equivalently, it maximizes the posterior expectation of a utility function.
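In symbols, for a loss function $L(\theta, \hat{\theta})$:
$$ \hat{\theta}_{\mathrm{Bayes}} \;=\; \arg\min_{\hat{\theta}}\; \mathrm{E}\big[\, L(\theta, \hat{\theta}) \mid x \,\big], $$
where the expectation is taken over the posterior $p(\theta \mid x)$.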
Bayesian Inference or Least Squares Error (LS) Estimation
Instead of directly maximizing the posterior probability density function (as in MAP estimation), we choose the estimator with the least posterior expected squared error. This least squares estimator equals the conditional mean of the parameter given the data.
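In symbols, under squared-error loss the optimal estimator is the posterior mean:
$$ \hat{\theta}_{\mathrm{LS}} \;=\; \mathrm{E}[\theta \mid x] \;=\; \arg\min_{\hat{\theta}}\; \mathrm{E}\big[(\theta - \hat{\theta})^{2} \mid x\big]. $$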
Off-Axis Pointing and Optimal Localization by Echolocating Bats
Is centering a stimulus in the field of view an optimal strategy to localize and track it? Fruit bats trained to localize a target in complete darkness did not center the sonar beam on the target but instead pointed it off axis, accurately directing the maximum slope ("edge") of the beam onto the target. Information-theoretic calculations showed that using the maximum slope is optimal for localizing the target, at the cost of detection. The tradeoff between detection (optimized at stimulus peak) and localization (optimized at maximum slope) is fundamental to spatial localization and tracking accomplished through hearing, olfaction, and vision.
Statistical Inference
Statistical inference is the process of using data analysis to infer properties of an underlying probability distribution.
Cramér-Rao Lower Bound
The Cramér-Rao bound expresses a lower bound on the variance of unbiased estimators of a deterministic parameter, stating that the variance of any such estimator is at least as high as the inverse of the Fisher information.
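In symbols, for a scalar parameter and any unbiased estimator $\hat{\theta}$ (under the usual regularity conditions):
$$ \mathrm{Var}(\hat{\theta}) \;\ge\; \frac{1}{I(\theta)}, $$
where $I(\theta)$ is the Fisher information.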
Fisher Information Utility
The Fisher information tells us how good an unbiased estimator can be (in terms of MSE) given our model assumptions. We do not need to consider any data to draw this conclusion; it is valid provided the data are exactly modeled by the chosen likelihood function. If the model assumption is incorrect, the estimate will likely be even worse.
James-Stein estimator
The James-Stein estimator is a biased estimator of the mean of Gaussian random vectors. When the dimension of the mean vector is three or greater, the James-Stein estimator achieves lower total mean squared error (MSE) than the maximum likelihood estimator for every value of the true mean, i.e., it dominates the MLE.
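For a single observation $y \sim \mathcal{N}(\theta, \sigma^{2} I_m)$ with $m \ge 3$ and known $\sigma^{2}$, the estimator shrinks $y$ toward the origin: $\hat{\theta}_{\mathrm{JS}} = \big(1 - (m-2)\sigma^{2}/\lVert y \rVert^{2}\big)\, y$. The sketch below is a minimal simulation of the dominance claim; the dimension, noise variance, trial count, and the use of the positive-part variant are assumptions made for illustration, not details from these notes.

```python
# Minimal sketch: compare the MSE of the MLE (the observation itself) with the
# (positive-part) James-Stein estimator for the mean of an m-dimensional Gaussian.
import numpy as np

rng = np.random.default_rng(0)
m, sigma2, n_trials = 10, 1.0, 20_000       # assumed dimension, noise variance, trials
theta = rng.normal(size=m)                   # an arbitrary true mean vector

mse_mle, mse_js = 0.0, 0.0
for _ in range(n_trials):
    y = theta + rng.normal(scale=np.sqrt(sigma2), size=m)     # one noisy observation
    shrink = max(0.0, 1.0 - (m - 2) * sigma2 / np.dot(y, y))  # positive-part shrinkage factor
    theta_js = shrink * y
    mse_mle += np.sum((y - theta) ** 2)
    mse_js += np.sum((theta_js - theta) ** 2)

print("average squared error, MLE:        ", mse_mle / n_trials)
print("average squared error, James-Stein:", mse_js / n_trials)
```

Running this, the James-Stein average squared error should come out below that of the MLE (which is close to $m\,\sigma^{2}$), illustrating the dominance result.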
Expectation of a Random Variable
The expected value (or mean) of X, where X is a discrete random variable, is a weighted average of the possible values that X can take, each value being weighted according to the probability of that event occurring.
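In symbols:
$$ \mathrm{E}[X] \;=\; \sum_{i} x_i\, P(X = x_i). $$
For example, for a fair six-sided die, $\mathrm{E}[X] = (1 + 2 + \dots + 6)/6 = 3.5$.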
Biased Estimator
The mean of its sampling distribution is not equal to the true value of the parameter being estimated.
Unbiased Estimator
The mean of the sampling distribution equals the true value of the parameter being estimated.
Bias-Variance decomposition
The mean-squared error of an estimator can be decomposed into a bias term and a variance term.
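In symbols, with $\mathrm{Bias}(\hat{\theta}) = \mathrm{E}[\hat{\theta}] - \theta$:
$$ \mathrm{MSE}(\hat{\theta}) \;=\; \mathrm{E}\big[(\hat{\theta} - \theta)^{2}\big] \;=\; \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^{2}. $$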
Maximum Likelihood Estimator
The maximum likelihood estimate is the point in the parameter space that maximizes the likelihood function, i.e., the parameter value under which the process described by the model is most likely to have produced the data that were actually observed. The MLE is asymptotically unbiased, but it can be biased for finite sample sizes.
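In symbols:
$$ \hat{\theta}_{\mathrm{ML}} \;=\; \arg\max_{\theta}\; \prod_{i=1}^{n} f(x_i;\theta) \;=\; \arg\max_{\theta}\; \sum_{i=1}^{n} \log f(x_i;\theta). $$
A standard example of finite-sample bias: for i.i.d. Gaussian data, the ML estimate of the variance, $\hat{\sigma}^{2} = \tfrac{1}{n}\sum_i (x_i - \bar{x})^{2}$, satisfies $\mathrm{E}[\hat{\sigma}^{2}] = \tfrac{n-1}{n}\sigma^{2}$; the bias vanishes as $n \to \infty$.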
Bayesian Decision Theory
The special case in which the unknown parameter takes on discrete values; estimation then amounts to deciding among a finite set of hypotheses, choosing the one with the smallest posterior expected loss.
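For example, under 0-1 loss the optimal decision rule picks the hypothesis with the largest posterior probability:
$$ \hat{H} \;=\; \arg\max_{i}\; P(H_i \mid x). $$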