# Interest in the least squares measure

The least squares distance function introduced in [](#calibration_introduction_distance_compare_model) is widely used when considering calibration issues. This is true whether or not calibration is performed within a statistical framework (see the discussion on uncertainty sources in [](#calibration_reminder)). The importance of the least squares approach can be understood by adding an extra assumption on the residuals defined previously. If the residuals are assumed to be normally distributed, one can write

```{math}
\varepsilon_i \sim \mathcal{N}(0,\sigma^2_{\varepsilon_i})\;\; {\rm for} \;\; i=1, \dots, n,
```

where each $\sigma_{\varepsilon_i}$ can quantify both sources of uncertainty and is assumed to be known. The formula above can be used to transform {eq}`epsilonCalib` into (setting $n_Y=1$ for simplicity):

```{math}
:label: eq_observationCondThetaNorm
y_i \sim Y_i | \theta := \mathcal{N}(f_\theta(\mathbf{x}_i),\sigma^2_{\varepsilon_i})
```

This particular case is very interesting: from {eq}`eq_observationCondThetaNorm` it becomes possible to write the probability of the observation set $\mathcal{D}$ as the product of the probabilities of its components, which reads

```{math}
:label: eq_MLENormCalib
L(\mathbf{y}|\theta) = \prod_{i=1}^{n} \ell(y_i|\theta) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi} \sigma_{\varepsilon_{i}}} e^{-\frac{1}{2} \Big (\frac{y_{i} -f_{\theta}(\mathbf{x}_i)}{\sigma_{\varepsilon_i}}\Big)^{2}}
```

Since the dataset $\mathcal{D}$ has actually been observed, it is natural to consider that the probability of this collection of observations should be high. The likelihood defined in {eq}`eq_MLENormCalib` can then be maximised with respect to $\theta$ in order to obtain its most probable value. This is called Maximum Likelihood Estimation (MLE), and maximising the likelihood is equivalent to maximising its logarithm, which can be written as:

```{math}
:label: eq_loglikelihoodMLECalib
\log{ L(\mathbf{y}|\theta) } = -\frac{1}{2} \sum_{i=1}^{n} \log\left(2\pi\sigma_{\varepsilon_i}^2\right) - \frac{1}{2} \sum_{i=1}^{n} \Big (\frac{y_{i} -f_{\theta}(\mathbf{x}_i)}{\sigma_{\varepsilon_i}}\Big)^{2}
```

The first term of the right-hand side does not depend on $\theta$, so maximising the log-likelihood amounts to minimising the second term, which, up to a constant factor, corresponds to the weighted least squares distance with the weights set to $\lbrace \psi_i=\sigma^{-2}_{\varepsilon_i} \rbrace_{i\in[1,n]}$. Their values depend on the underlying model assumptions; this discussion is postponed to [](#calibration_minimisation). More details on least squares concepts can be found in many references, such as {cite}`Borck1996, Hansen2013`.
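
To make the link between {eq}`eq_loglikelihoodMLECalib` and the weighted least squares distance concrete, the short Python sketch below minimises both the negative Gaussian log-likelihood and the weighted least squares objective with weights $\psi_i=\sigma^{-2}_{\varepsilon_i}$, and recovers the same parameter estimate. The linear model `f`, the synthetic data and the optimiser are purely illustrative assumptions, not part of the text above.

```python
# Minimal sketch: MLE under Gaussian residuals vs. weighted least squares.
# The model f_theta(x) = theta[0] + theta[1] * x is a hypothetical example.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def f(theta, x):
    # Illustrative computer model f_theta
    return theta[0] + theta[1] * x

# Synthetic observations with known, heteroscedastic noise levels sigma_{eps_i}
x = np.linspace(0.0, 1.0, 20)
sigma = 0.05 + 0.1 * x
theta_true = np.array([1.0, 2.0])
y = f(theta_true, x) + rng.normal(0.0, sigma)

def neg_log_likelihood(theta):
    # Negative of eq_loglikelihoodMLECalib
    r = (y - f(theta, x)) / sigma
    return 0.5 * np.sum(np.log(2.0 * np.pi * sigma**2)) + 0.5 * np.sum(r**2)

def weighted_least_squares(theta):
    # Weighted least squares distance with weights psi_i = sigma_i**-2
    return np.sum((y - f(theta, x))**2 / sigma**2)

theta0 = np.zeros(2)
theta_mle = minimize(neg_log_likelihood, theta0).x
theta_wls = minimize(weighted_least_squares, theta0).x

# Both criteria are minimised by the same theta (up to solver tolerance)
print(theta_mle, theta_wls)
```

In this sketch the constant first term of the log-likelihood plays no role in the optimisation, which is why the two minimisers coincide, as stated above.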