7.1.2.3. Introduction to the Bayesian approach

The probability of an event can be seen either as the limit of its rate of occurrence over repeated experiments, or as the quantification of a personal judgement regarding its realisation. This difference in interpretation is what usually distinguishes the frequentist from the Bayesian perspective. For a simple illustration, consider flipping a coin: the probability of getting heads, denoted \(\mathbb{P}[{\rm head}]\), is either the average result over a very large number of experiments (a definition that is purely empirical, but whose estimate depends strongly on the number of experiments) or the personal belief that the coin is well-balanced or not (essentially an a priori opinion, which may or may not be informed by observations).
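To make the frequentist reading concrete, here is a minimal sketch (in Python; the variable names and the number of flips are illustrative, not taken from the text) that simulates repeated coin flips and tracks the running frequency of heads, which stabilises around the underlying probability as the number of experiments grows:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

p_head = 0.5                           # underlying probability of heads (unknown in practice)
n_flips = 100_000                      # number of experiments
flips = rng.random(n_flips) < p_head   # True = head

# Running frequency of heads after 1, 2, ..., n_flips experiments:
running_freq = np.cumsum(flips) / np.arange(1, n_flips + 1)

for n in (10, 100, 1_000, 100_000):
    print(f"after {n:>6d} flips: frequency of heads = {running_freq[n - 1]:.4f}")
```

The running frequency fluctuates strongly for small samples and only slowly settles near \(p\), which is precisely why the frequentist value depends on the number of experiments performed.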

Let \((W,Z)\) be a random vector with joint probability density \(f_{(W,Z)}(w,z)\) and marginal densities \(f_{W}(w)\) and \(f_{Z}(z)\). Bayes' rule then states that:

(7.5)\[f_{W|Z}(w|z)=\frac{f_{Z|W}(z|w)\times f_{W}(w)}{f_{Z}(z)}\]

where \(f_{W|Z}(w|z)\) (respectively \(f_{Z|W}(z|w)\)) is the conditional probability density of \(W\) given \(Z=z\) (respectively of \(Z\) given \(W=w\)). These densities define the conditional distributions.
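Equation 7.5 can be checked numerically on a small discrete example, where densities become probability tables and integrals become sums. The joint table below is hypothetical, chosen only for illustration:

```python
import numpy as np

# Joint probability table P(W=w, Z=z) for two discrete variables
# (rows index w, columns index z); the entries are hypothetical.
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

p_w = joint.sum(axis=1)             # marginal P(W=w)
p_z = joint.sum(axis=0)             # marginal P(Z=z)

p_w_given_z = joint / p_z           # conditional P(W=w | Z=z); columns sum to 1
p_z_given_w = joint / p_w[:, None]  # conditional P(Z=z | W=w); rows sum to 1

# Bayes' rule: P(W=w | Z=z) = P(Z=z | W=w) * P(W=w) / P(Z=z)
bayes = p_z_given_w * p_w[:, None] / p_z
assert np.allclose(bayes, p_w_given_z)
print(p_w_given_z)
```

The assertion confirms that reconstructing \(f_{W|Z}\) from \(f_{Z|W}\), the marginal of \(W\) and the marginal of \(Z\) gives back the conditional computed directly from the joint distribution.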

Returning to the formalism introduced previously, applying Equation 7.5 shows that the probability density of the random variable \(\theta\) given the observations \(\mathbf{y}\), called the posterior distribution, can be expressed as

(7.6)\[\pi_{post} (\theta|\mathbf{y}) = \frac{L(\mathbf{y}|\theta) \pi_{prior} (\theta)} {\pi(\mathbf{y})} \propto L(\mathbf{y}|\theta) \pi_{prior} (\theta)\]

In this equation, \(L(\mathbf{y}|\theta)\) is the likelihood, i.e. the conditional probability density of the observations given the value of \(\theta\); \(\pi_{prior}(\theta)\) is the a priori probability density of \(\theta\), often referred to as the prior; and \(\pi(\mathbf{y})\) is the marginal likelihood of the observations. Since \(\pi(\mathbf{y})=\int_{\Theta} L(\mathbf{y}|\theta) \pi_{prior}(\theta)\,d\theta\) does not depend on \(\theta\), it is constant in our scope and acts only as a normalizing factor, which justifies the proportionality in Equation 7.6.
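A minimal numerical sketch of Equation 7.6, under assumed toy ingredients (a Gaussian likelihood with known observation noise and a Gaussian prior; all names and values are illustrative): the unnormalized posterior is evaluated on a grid of \(\theta\) values, and the normalizing factor \(\pi(\mathbf{y})\) is approximated by numerical integration over that grid.

```python
import numpy as np

def norm_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

y = np.array([1.2, 0.8, 1.1, 0.9])    # hypothetical observations
sigma_obs = 0.5                        # assumed known observation noise

theta = np.linspace(-2.0, 4.0, 2001)   # grid over the parameter space
dtheta = theta[1] - theta[0]

# Likelihood L(y | theta): product of the densities of the observations.
likelihood = np.prod(norm_pdf(y[:, None], theta[None, :], sigma_obs), axis=0)

# Prior pi(theta): a Gaussian centred on a prior guess.
prior = norm_pdf(theta, mu=0.0, sigma=1.0)

# Unnormalized posterior, as in Equation 7.6:
unnorm_post = likelihood * prior

# Marginal likelihood pi(y), approximated by a Riemann sum over the grid;
# it does not depend on theta and serves only as a normalizing factor.
marginal = np.sum(unnorm_post) * dtheta
posterior = unnorm_post / marginal

print("posterior mode:", theta[np.argmax(posterior)])
```

Note that the mode of the posterior can be located without ever computing \(\pi(\mathbf{y})\), since the normalization does not change where the maximum lies; this is exactly what the proportionality in Equation 7.6 expresses.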

The prior is said to be proper when it integrates to a finite value (so that it can be normalized), and improper otherwise. It is conventional to simplify the notation by writing \(\pi(\theta|\mathbf{y})\) instead of \(\pi_{post}(\theta|\mathbf{y})\) and \(\pi(\theta)\) instead of \(\pi_{prior}(\theta)\). The choice of the prior is a crucial step when defining the calibration procedure, and it must rely on the physical constraints of the problem, expert judgement and any other relevant information. If none of these are available or reliable, it is still possible to use non-informative priors, in which case calibration relies solely on the data, as illustrated below. Further discussion of non-informative priors can be found in [Bio15, Jef46].
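As a hedged sketch of the non-informative case (reusing the same hypothetical Gaussian setup as above): with a flat improper prior \(\pi(\theta)\propto 1\) on the real line, the unnormalized posterior is the likelihood itself, so the posterior is simply the likelihood renormalized over the parameter space.

```python
import numpy as np

def norm_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

y = np.array([1.2, 0.8, 1.1, 0.9])    # same hypothetical observations
sigma_obs = 0.5
theta = np.linspace(-2.0, 4.0, 2001)
dtheta = theta[1] - theta[0]

likelihood = np.prod(norm_pdf(y[:, None], theta[None, :], sigma_obs), axis=0)

# Flat improper prior: pi(theta) proportional to 1.
# The posterior is then the likelihood renormalized over the grid.
posterior_flat = likelihood / (np.sum(likelihood) * dtheta)

# With a Gaussian likelihood and a flat prior, the posterior mode
# coincides with the maximum likelihood estimate, i.e. the sample mean:
print("posterior mode:", theta[np.argmax(posterior_flat)],
      "| sample mean:", y.mean())
```

Comparing this with the previous sketch shows how an informative prior pulls the posterior away from the data-only answer, while a non-informative prior leaves the conclusion entirely to the observations.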