# Rejection ABC algorithm

Rejection ABC is the simplest version of the ABC approach. Its origins date back to the 1980s {cite}`rubin1984bayesianly`, and a nice introduction to the rejection algorithm, as originally applied to a problem with a finite countable set $\mathcal{V}$ of values, can be found in {cite}`marin2012approximate`. In this specific case, it consisted of only two random draws: the parameter values according to their prior, then the model prediction according to the parameter values just drawn. If the result was an element of the reference dataset, the configuration was kept.

Things become more complicated when considering continuous sample spaces, since there is no such thing as strict equality with stochastic behaviour (not to mention the numerical issues that arise at some point). This implies the need for two important concepts:

- a distance metric in the output space, denoted $\rho(\cdot,\cdot)$;
- a tolerance parameter, denoted $\delta$, which determines the accuracy of the algorithm.

Unlike the simpler discrete case introduced above, where the aim is to have strict equality between the predictions and the reference data, here the accepted configurations are those fulfilling the following condition

```{math}
\rho(\mathbf{z},\mathbf{y}) \le \delta
```

where $\theta_{T}$ is the configuration under study, drawn from the prior $\pi(\theta)$, and $\mathbf{z}$ is the model prediction generated by $f_{\theta_T}$ when run on the same conditions as the reference dataset. This approach was first used in the late nineties, as can be seen in {cite}`pritchard1999population`.

```{warning}
One should recall that the uncertainty model is defined on the residuals, as stated in {eq}`epsilonCalib`, and that residuals are usually considered normally distributed, as in {eq}`eq_observationCondThetaNorm`. Regardless of the origin of these residuals, as discussed in [](#calibration_reminder), if the model one provides is deterministic, the calibration will focus on a single realisation of the observation without uncertainty consideration. In this case, the model prediction must be modified to include noise representative of the residual hypotheses {cite}`van2018taking`.
```

This methodology shows that accepted configurations are not directly sampled from the true posterior distribution $\pi(\theta|\mathbf{y})$ but rather come from an approximation of it that can be written $\pi(\theta|\rho(\mathbf{z},\mathbf{y}) \le \delta)$. Two interesting asymptotic regimes can be highlighted:

- when $\delta \rightarrow 0$: the algorithm is exact and converges to the true posterior $\pi(\theta|\mathbf{y})$;
- when $\delta \rightarrow \infty$: the algorithm ignores the reference data and simply returns the original prior $\pi(\theta)$.

There are many different versions of this kind of algorithm, among which one could find an extra step using summary statistics $S(\cdot)$ to project both $\mathbf{z}$ and $\mathbf{y}$ onto a lower-dimensional space. In this version, the configurations kept are drawn from $\pi(\theta|\rho(S(\mathbf{z}), S(\mathbf{y})) \le \delta)$. A minimal sketch of this rejection step is given below.
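As an illustration, here is a minimal sketch of the rejection step described above, written in Python with NumPy under a few assumptions: the names `rejection_abc`, `prior_sample`, `simulator`, `summary` and `distance` are hypothetical, the default distance is taken to be Euclidean, and the toy Gaussian example at the end is invented for demonstration only; none of these choices are prescribed by the methodology itself.

```python
import numpy as np

def rejection_abc(prior_sample, simulator, y_ref, delta, n_draws,
                  distance=None, summary=None, rng=None):
    """Basic rejection ABC: keep the draws theta ~ pi(theta) whose simulated
    output z = f_theta satisfies rho(S(z), S(y)) <= delta."""
    rng = np.random.default_rng(rng)
    if distance is None:
        # Assumed default: Euclidean distance between (summaries of) outputs.
        distance = lambda a, b: np.linalg.norm(np.asarray(a) - np.asarray(b))
    if summary is None:
        summary = lambda x: x  # no projection: compare raw outputs
    s_ref = summary(y_ref)

    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)      # draw a configuration from the prior
        z = simulator(theta, rng)      # run the model with this configuration
        if distance(summary(z), s_ref) <= delta:
            accepted.append(theta)     # keep it if close enough to the data
    return np.asarray(accepted)


# Toy usage (assumed example): infer the mean of a Gaussian with unit variance.
rng = np.random.default_rng(0)
y_obs = rng.normal(loc=2.0, scale=1.0, size=50)            # reference dataset
prior = lambda g: g.uniform(-5.0, 5.0)                      # flat prior on the mean
model = lambda mu, g: g.normal(loc=mu, scale=1.0, size=50)  # stochastic model
post = rejection_abc(prior, model, y_obs, delta=0.5, n_draws=20_000,
                     summary=np.mean)                       # S(.) = sample mean
```

Passing a `summary` callable, as in the usage line above, corresponds to the variant where configurations are drawn from $\pi(\theta|\rho(S(\mathbf{z}), S(\mathbf{y})) \le \delta)$ rather than comparing the full outputs.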
Finally, another possible way to select the best representative sub-sample is to keep a percentile of the analysed set of configurations. Although mainly recommended for high-dimensional cases (i.e., when $n$ becomes large), this solution can work as long as one keeps an eye on the residual distribution provided by the *a posteriori* estimated parameters. Indeed, if no threshold is chosen but a percentile is used, the requested number of configurations will always be obtained in the end, but the only way to check whether the uncertainty assumptions are valid is to assess how closely the predictions match the full reference dataset.
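A possible sketch of this percentile-based variant, under the same assumptions as the previous snippet (hypothetical names, Euclidean default distance): instead of fixing $\delta$ beforehand, all draws are ranked by their distance to the reference data and only the closest fraction is kept; the tolerance actually achieved is returned so it can be inspected afterwards.

```python
import numpy as np

def percentile_abc(prior_sample, simulator, y_ref, n_draws, keep_fraction=0.01,
                   distance=None, summary=None, rng=None):
    """Percentile-based selection: keep the keep_fraction of draws whose
    simulated outputs are closest to the reference data, and report the
    tolerance delta that this selection implies."""
    rng = np.random.default_rng(rng)
    if distance is None:
        distance = lambda a, b: np.linalg.norm(np.asarray(a) - np.asarray(b))
    if summary is None:
        summary = lambda x: x
    s_ref = summary(y_ref)

    thetas, dists = [], []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        z = simulator(theta, rng)
        thetas.append(theta)
        dists.append(distance(summary(z), s_ref))

    dists = np.asarray(dists)
    implied_delta = np.quantile(dists, keep_fraction)  # tolerance actually achieved
    kept = [t for t, d in zip(thetas, dists) if d <= implied_delta]
    return np.asarray(kept), implied_delta
```

Reporting `implied_delta` alongside the kept sample is one simple way to see how demanding the selection actually was; as stressed above, the accepted configurations should still be run against the full reference dataset to check that the residual assumptions hold.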