# Interest in the least squares measure

The least squares distance function introduced in [](#calibration_introduction_distance_compare_model) is widely used when considering calibration issues. This is true whether or not calibration is performed within a statistical framework (see the discussion on uncertainty sources in [](#calibration_reminder)). The importance of the least squares approach can be understood by adding an extra assumption on the residuals defined previously. If the residuals are assumed to be normally distributed, one can write

```{math}
\varepsilon_i \sim \mathcal{N}(0,\sigma^2_{\varepsilon_i})\;\; {\rm for} \;\; i=1, \dots, n,
```

where each $\sigma_{\varepsilon_i}$ can quantify both sources of uncertainty and is assumed to be known. The formula above can be used to transform {eq}`epsilonCalib` into (setting $n_Y=1$ for simplicity):

```{math}
:label: eq_observationCondThetaNorm
y_i \sim Y_i | \theta := \mathcal{N}(f_\theta(\mathbf{x}_i),\sigma^2_{\varepsilon_i})
```

This particular case is very interesting: from {eq}`eq_observationCondThetaNorm` it becomes possible to write the probability of the observation set $\mathcal{D}$ as the product of the probabilities of its components, which reads

```{math}
:label: eq_MLENormCalib
L(\mathbf{y}|\theta) = \prod_{i=1}^{n} \ell(y_i|\theta) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi} \sigma_{\varepsilon_{i}}} e^{-\frac{1}{2} \Big (\frac{y_{i} -f_{\theta}(\mathbf{x}_i)}{\sigma_{\varepsilon_i}}\Big)^{2}}
```

Since the dataset $\mathcal{D}$ has actually been observed, it is natural to consider that the probability of this collection of observations should be high. The likelihood defined in {eq}`eq_MLENormCalib` can then be maximised with respect to $\theta$ in order to obtain its most probable value. This is called Maximum Likelihood Estimation (MLE), and maximising the likelihood is equivalent to maximising its logarithm, which can be written as:

```{math}
:label: eq_loglikelihoodMLECalib
\log{ L(\mathbf{y}|\theta) } = -\frac{1}{2} \sum_{i=1}^{n} \log\left(2\pi\sigma_{\varepsilon_i}^2\right) - \frac{1}{2} \sum_{i=1}^{n} \Big (\frac{y_{i} -f_{\theta}(\mathbf{x}_i)}{\sigma_{\varepsilon_i}}\Big)^{2}
```

The first term of the right-hand side does not depend on $\theta$, so maximising the log-likelihood amounts to minimising the second term, which, up to a constant factor, corresponds to the weighted least squares distance with the weights set to $\lbrace \psi_i=\sigma^{-2}_{\varepsilon_i} \rbrace_{i\in[1,n]}$. Their values depend on the underlying model assumptions; this discussion is postponed to [](#calibration_minimisation). More details on least squares concepts can be found in many references, such as {cite}`Borck1996, Hansen2013`.
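
To make the link between {eq}`eq_loglikelihoodMLECalib` and the weighted least squares distance concrete, the short Python sketch below minimises both the negative Gaussian log-likelihood and the weighted least squares objective with weights $\psi_i=\sigma^{-2}_{\varepsilon_i}$, and recovers the same parameter estimate. The linear model `f`, the synthetic data and the optimiser are purely illustrative assumptions, not part of the text above.

```python
# Minimal sketch: MLE under Gaussian residuals vs. weighted least squares.
# The model f_theta(x) = theta[0] + theta[1] * x is a hypothetical example.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def f(theta, x):
    # Illustrative computer model f_theta
    return theta[0] + theta[1] * x

# Synthetic observations with known, heteroscedastic noise levels sigma_{eps_i}
x = np.linspace(0.0, 1.0, 20)
sigma = 0.05 + 0.1 * x
theta_true = np.array([1.0, 2.0])
y = f(theta_true, x) + rng.normal(0.0, sigma)

def neg_log_likelihood(theta):
    # Negative of eq_loglikelihoodMLECalib
    r = (y - f(theta, x)) / sigma
    return 0.5 * np.sum(np.log(2.0 * np.pi * sigma**2)) + 0.5 * np.sum(r**2)

def weighted_least_squares(theta):
    # Weighted least squares distance with weights psi_i = sigma_i**-2
    return np.sum((y - f(theta, x))**2 / sigma**2)

theta0 = np.zeros(2)
theta_mle = minimize(neg_log_likelihood, theta0).x
theta_wls = minimize(weighted_least_squares, theta0).x

# Both criteria are minimised by the same theta (up to solver tolerance)
print(theta_mle, theta_wls)
```

In this sketch the constant first term of the log-likelihood plays no role in the optimisation, which is why the two minimisers coincide, as stated above.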