7.1.1. Distances and likelihoods used to compare observations and model predictions

There are many ways to quantify the agreement between the reference observations and the model predictions, given a parameter vector \(\theta\). Depending on the framework adopted (deterministic or Bayesian), different tools are required. In a deterministic setting, a distance is used to measure how far the prediction is from the observation. In contrast, within the Bayesian framework, the analogous concept is the likelihood, a function that evaluates the probability of observing the data given a parameter vector.

Since the number of variables \(n_Y\) used to perform the calibration can be greater than one, it might be useful to introduce the coefficients \(\lbrace \omega_j \rbrace_{j \in [1,n_Y]}\) to weight the contribution of each variable relative to the others. The following lists the distance functions available in Uranie (an illustrative sketch of some of them is given after the list):

  • L1 distance function (sometimes called Manhattan distance): \( \displaystyle d(\mathbf{y}, \mathbf{f_\theta}(\mathbf{x})) = \sum_{j=1}^{n_{Y}} \omega_{j} \times \bigg ( \sum_{i=1}^{n} | \mathbf{y}^j_i - \mathbf{f_\theta}(\mathbf{x})^j_i | \bigg ) \, ;\)

  • Least squares distance function: \( \displaystyle d(\mathbf{y},\mathbf{f_\theta}(\mathbf{x})) = \sum_{j=1}^{n_{Y}} \sqrt{ \omega_{j} \sum_{i=1}^{n} (\mathbf{y}^j_i - \mathbf{f_\theta} (\mathbf{x})^j_i )^2 } \, ;\)

  • Relative least squares distance function: \( \displaystyle d(\mathbf{y},\mathbf{f_\theta} (\mathbf{x})) = \sum_{j=1}^{n_{Y}} \sqrt{ \omega_{j} \sum_{i=1}^{n} \bigg (\frac{ \mathbf{y}^j_i - \mathbf{f_\theta}(\mathbf{x})^j_i}{\mathbf{y}^j_i} \bigg )^2 } \, ;\)

  • Weighted least squares distance function: \( \displaystyle d(\mathbf{y},\mathbf{f_\theta} (\mathbf{x})) = \sum_{j=1}^{n_{Y}} \sqrt{ \omega_{j} \sum_{i=1}^{n} \psi_i^j\times ( \mathbf{y}^j_i - \mathbf{f_\theta}(\mathbf{x})^j_i )^2 } \, ,\) where the coefficients \(\lbrace \psi_i^j \rbrace_{i \in [1,n]}\) are associated with the \(j\)-th variable and are used to weight each observation with respect to the others;

  • Mahalanobis distance function: \(\displaystyle d(\mathbf{y},\mathbf{f_\theta} (\mathbf{x})) = \sum_{j=1}^{n_{Y}} \sqrt{ \omega_{j} ( \mathbf{y}^j - \mathbf{f_\theta} (\mathbf{x})^j )^T \Sigma^{-1} (\mathbf{y}^j - \mathbf{f_\theta}(\mathbf{x})^j ) } \) where \(\Sigma\) is the covariance matrix of the observations.
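As an illustration, here is a minimal NumPy sketch of three of these distances. This is not the Uranie API: the function names, the \((n_Y, n)\) array layout (one row per output variable) and the default unit weights \(\omega_j = 1\) are assumptions made for the example.

```python
import numpy as np

def least_squares(y, f, omega=None):
    """Least squares distance: sum_j sqrt(omega_j * sum_i (y_ij - f_ij)^2).

    y, f  : arrays of shape (n_Y, n), one row per output variable.
    omega : per-variable weights (defaults to 1 for every variable).
    """
    y, f = np.asarray(y, float), np.asarray(f, float)
    omega = np.ones(y.shape[0]) if omega is None else np.asarray(omega, float)
    return np.sum(np.sqrt(omega * np.sum((y - f) ** 2, axis=1)))

def weighted_least_squares(y, f, psi, omega=None):
    """Weighted least squares: psi has shape (n_Y, n), one weight per observation."""
    y, f, psi = (np.asarray(a, float) for a in (y, f, psi))
    omega = np.ones(y.shape[0]) if omega is None else np.asarray(omega, float)
    return np.sum(np.sqrt(omega * np.sum(psi * (y - f) ** 2, axis=1)))

def mahalanobis(y, f, cov, omega=None):
    """Mahalanobis distance, with cov the (n, n) covariance matrix Sigma."""
    y, f = np.asarray(y, float), np.asarray(f, float)
    omega = np.ones(y.shape[0]) if omega is None else np.asarray(omega, float)
    d = y - f                                             # shape (n_Y, n)
    # per-variable quadratic form (y - f)^T Sigma^{-1} (y - f)
    quad = np.einsum('ji,ik,jk->j', d, np.linalg.inv(cov), d)
    return np.sum(np.sqrt(omega * quad))
```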

Regarding the likelihood functions already implemented, only the Gaussian log-likelihood for independent parameters is available, as it is the most commonly used. Its expression follows:

  • Gaussian log-likelihood for independent parameters: \( \displaystyle \textit{log-}\mathcal{L} \left(\theta | \mathbf{x},\mathbf{y}\right) = -\frac{1}{2}\sum_{j=1}^{n_{Y}}\sum_{i=1}^{n}\left( \log\left( 2 \pi \left(\sigma_i^j\right)^2 \right)+\left(\frac{\mathbf{y}^j_i - \mathbf{f_\theta}(\mathbf{x})^j_i }{\sigma_i^j}\right)^2\right)\) where the coefficients \(\lbrace \sigma_i^j \rbrace_{i \in [1,n]}\) are the standard deviations of the observations associated with the \(j\)-th variable.

If needed, it is still possible to define a custom likelihood (or distance).
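As an illustration, a plain-Python version of this log-likelihood, which could also serve as a starting point for a custom one, might look like the sketch below; the function name and the \((n_Y, n)\) array convention are assumptions of the example, not the Uranie interface.

```python
import numpy as np

def gaussian_log_likelihood(y, f_theta, sigma):
    """Gaussian log-likelihood for independent observations.

    y, f_theta, sigma : arrays of shape (n_Y, n); sigma[j, i] is the
    standard deviation of the i-th observation of the j-th variable.
    """
    y, f_theta, sigma = (np.asarray(a, float) for a in (y, f_theta, sigma))
    z = (y - f_theta) / sigma                          # standardised residuals
    return -0.5 * np.sum(np.log(2.0 * np.pi * sigma ** 2) + z ** 2)
```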

These definitions are not orthogonal. Indeed, if \(\lbrace \psi_i\rbrace_{i \in [1,n]}=\alpha\) with \(\alpha \in \mathbb{R}\), then the weighted least squares function reduces to the least squares one, up to an overall factor \(\sqrt{\alpha}\) that does not affect the minimisation. This situation is realistic: it corresponds to a least squares estimation weighted by an uncertainty on the observations that is constant throughout the data (meaning \(\alpha = \sigma^{-2}\)). This is called the homoscedasticity assumption, and it is important for the linear case, as discussed later on.

One can also compare the relative and weighted least squares: if \(\alpha \in \mathbb{R}\) and \(\lbrace \psi_i=(\alpha\%\times y_i)^{-2}\rbrace_{i\in[1,n]}\), these two forms become equivalent up to a multiplicative constant (the relative least squares is useful when the uncertainty on the observations is multiplicative). Finally, if one assumes that the covariance matrix of the observations is the identity (meaning \(\Sigma = \mathbf{1}\)), the Mahalanobis distance is equivalent to the least squares distance.
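These equivalences are easy to check numerically. The short sketch below does so on synthetic data with \(\omega_j = 1\) and a single output variable; all names and values are illustrative, and \(\alpha\) stands here for the bare multiplicative factor (the \(\alpha\%\) of the text).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
y = rng.uniform(1.0, 2.0, size=n)        # one output variable, n observations
f = y + rng.normal(scale=0.05, size=n)   # mock model predictions
alpha = 4.0

ls   = np.sqrt(np.sum((y - f) ** 2))                        # least squares
rel  = np.sqrt(np.sum(((y - f) / y) ** 2))                  # relative least squares
wls  = np.sqrt(np.sum(alpha * (y - f) ** 2))                # psi_i = alpha
wls2 = np.sqrt(np.sum((alpha * y) ** -2 * (y - f) ** 2))    # psi_i = (alpha * y_i)^{-2}
maha = np.sqrt((y - f) @ np.eye(n) @ (y - f))               # Sigma = identity

assert np.isclose(wls, np.sqrt(alpha) * ls)   # constant weights: LS up to sqrt(alpha)
assert np.isclose(wls2, rel / alpha)          # relative <-> weighted correspondence
assert np.isclose(maha, ls)                   # Mahalanobis with Sigma = 1 is LS
```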

Warning

It might seem natural to think that the lower the distance, the closer the parameters are to their real values, and hence that "having a null distance" is the ultimate target of calibration. This reasoning is actually dangerous. As in the general discussion in Generating Surrogate Models, the risk is to overfit the parameters by "learning" the set of observations at our disposal as the "truth", forgetting that the residuals (introduced in Equation 7.1) may precisely account for observation uncertainties. In that case, knowing the value of the uncertainty on the observations, the real target of the calibration should rather be the best agreement between observations and model predictions within that uncertainty, which translates into the reduced residuals (something like \(\lbrace (y_i-f_\theta(x_i))/\sigma_{\varepsilon_i} \rbrace_{i \in [1, n]}\) in a scalar case) being distributed like a standard normal distribution.
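To make this diagnostic concrete, here is a small sketch on synthetic data, assuming the observation uncertainties \(\sigma_{\varepsilon_i}\) are known (all names and values are illustrative): for a well-calibrated model the reduced residuals should have a mean close to 0, a standard deviation close to 1, and show no significant departure from normality.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200
sigma_eps = 0.1 * np.ones(n)                        # known observation uncertainties
f_theta = np.sin(np.linspace(0.0, 3.0, n))          # mock calibrated-model predictions
y = f_theta + sigma_eps * rng.standard_normal(n)    # observations = model + noise

reduced = (y - f_theta) / sigma_eps                 # reduced residuals
print(reduced.mean(), reduced.std(ddof=1))          # expect values close to 0 and 1
print(stats.kstest(reduced, "norm"))                # large p-value: compatible with N(0,1)
```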