7.1. Brief reminder of theoretical aspects

This section presents the calibration methods provided to help estimate the parameters of a model accurately with respect to data (from either experiment or simulation). The methods implemented in Uranie range from point estimation to more advanced Bayesian techniques, and they differ mainly in the hypotheses they rely on.

In general, a calibration procedure requires an input dataset, that is, an existing set of elements (resulting from either simulations or experiments). This ensemble (of size \(n\)) can be written as

\[\mathcal{D} = \{ (\mathbf{x}^{i},\mathbf{y}^{i}), i=1,\ldots,n\}\]

where \(\mathbf{x}^{i}\) is the \(i\)-th input vector, \(\mathbf{x}^{i}=(x^{i}_1,\ldots,x^{i}_{n_X})\), and \(\mathbf{y}^{i}\) is the \(i\)-th output vector, \(\mathbf{y}^{i}=(y^{i}_1,\ldots,y^{i}_{n_Y})\).

These data will be compared with model predictions, where the model is a mathematical function \(\mathbf{f}_\theta : \mathbb{R}^{n_X} \rightarrow \mathbb{R}^{n_Y}\). From now on, and unless otherwise specified, the dimension of the output is set to 1 (\(n_Y=1\)), which means that the reference observations and the model predictions are scalars (the observations are then written \(y\) and the model predictions \(f_\theta(\mathbf{x})\)).

In addition to the input vector introduced above, the model also depends on a parameter vector \(\theta \in \Theta \subset \mathbb{R}^p\), which is constant but unknown. The model is deterministic, meaning that \(f_\theta(\mathbf{x})\) is constant once both \(\mathbf{x}\) and \(\theta\) are fixed. In the rest of this documentation, a given set of parameter values \(\theta\) is called a configuration.
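As an illustration of the notation above, the following minimal sketch stores a dataset \(\mathcal{D}\) as NumPy arrays and evaluates a deterministic model for one configuration. The data values, the linear form of the model and the name `f_theta` are assumptions chosen for the example; they are not part of Uranie.

```python
import numpy as np

# Hypothetical dataset D with n = 5 elements, n_X = 2 inputs and n_Y = 1 output.
x = np.array([[0.0, 1.0],
              [0.5, 1.5],
              [1.0, 2.0],
              [1.5, 2.5],
              [2.0, 3.0]])                  # shape (n, n_X)
y = np.array([1.1, 2.0, 2.9, 4.2, 5.0])     # shape (n,)


def f_theta(x, theta):
    """Illustrative linear model: theta[0] + theta[1]*x_1 + theta[2]*x_2."""
    x = np.atleast_2d(x)
    return theta[0] + x @ theta[1:]


# One configuration (p = 3 parameters), picked arbitrarily for the example.
theta = np.array([0.0, 1.0, 1.0])

# Deterministic model: the same (x, theta) always gives the same predictions.
predictions = f_theta(x, theta)
```

Here `predictions` holds one scalar per element of the dataset, to be compared with the observations `y`.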

The standard hypothesis for probabilistic calibration is that the observations differ from the model predictions by an amount that is treated as a random variable:

\[\varepsilon = y - f_\theta(\mathbf{x}) \tag{7.1}\]

where \(\varepsilon\) is a random variable with zero expectation, called the residual. It represents the deviation between the observation under investigation and the model prediction. It can arise from two origins, which are not mutually exclusive:

  • experimental: an uncertainty affecting the observations. For a given observation, it can be written \(\varepsilon_{\rm obs} = y - y_{\rm real}\)

  • modelling: the chosen model \(f_\theta\) is intrinsically not correct. This contribution can be written \(\varepsilon_{\rm model} = f^*_\theta - f_\theta\)

As the ultimate goal is to have \(y_{\rm real} - f^*_\theta = 0\), substituting \(y_{\rm real} = y - \varepsilon_{\rm obs}\) and \(f^*_\theta = f_\theta + \varepsilon_{\rm model}\) into this relation gives back Equation 7.1, with the residual broken down into its two contributions:

\[y - f_\theta = \varepsilon_{\rm obs} + \varepsilon_{\rm model}\]
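To make Equation 7.1 concrete, the sketch below evaluates the residual over a whole dataset for two configurations. Everything in it is an assumption made for illustration: the observations are synthetic, generated from a hypothetical one-input linear model with Gaussian noise, and none of the names correspond to Uranie classes or functions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def f_theta(x, theta):
    """Hypothetical scalar model with n_X = 1 and p = 2 parameters."""
    return theta[0] + theta[1] * x


# Synthetic reference observations: reference configuration plus zero-mean noise.
n = 200
x = np.linspace(0.0, 1.0, n)
theta_ref = np.array([1.0, 2.0])
noise = rng.normal(0.0, 0.1, size=n)
y = f_theta(x, theta_ref) + noise


def residuals(theta):
    """Equation 7.1 applied to every element of the dataset."""
    return y - f_theta(x, theta)


eps_ref = residuals(theta_ref)               # configuration used to generate y
eps_off = residuals(np.array([1.5, 2.0]))    # configuration with a shifted offset

print("mean residual, reference configuration:", eps_ref.mean())
print("mean residual, shifted configuration:  ", eps_off.mean())
```

With the configuration used to generate the data, the residuals reduce to the injected noise, so they scatter around zero. The shifted configuration moves every residual by the same amount, which is the kind of systematic deviation a calibration procedure tries to remove.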

The rest of this section introduces two important discussions that will be referenced throughout this module.

The first describes how to obtain statistics over the \(n\) samples of the reference observations when comparing them to a set of parameters, and how these statistics are computed when \(n_Y \neq 1\). The second is a general introduction, which recalls elements already introduced in other sections and discusses some of the assumptions and theoretical foundations needed to understand the methods presented later on.

On top of this description, there are several predefined calibration procedures proposed in the Uranie platform: