# Rejection ABC algorithm

Rejection ABC is the simplest version of the ABC approach. Its origins date back to the 1980s {cite}`rubin1984bayesianly`, and a nice introduction to the rejection algorithm, as originally applied to a problem with a finite countable set $\mathcal{V}$ of values, can be found in {cite}`marin2012approximate`. In this specific case, it consisted of only two random draws: the parameter values according to their prior, then the model prediction according to the parameter values just drawn. If the result was an element of the reference dataset, the configuration was kept.

Things become more complicated when considering continuous sample spaces, since there is no such thing as strict equality with stochastic behaviour (not to mention the numerical issues that arise at some point). This implies the need for two important concepts:

- a distance metric in the output space, denoted $\rho(\cdot,\cdot)$;
- a tolerance parameter, denoted $\delta$, which determines the accuracy of the algorithm.

Unlike the simpler discrete case introduced above, where the aim is to have strict equality between the predictions and the reference data, here the accepted configurations are those fulfilling the following condition

```{math}
\rho(\mathbf{z},\mathbf{y}) \le \delta
```

where $\theta_{T}$ is the configuration under study, drawn from the prior $\pi(\theta)$, and $\mathbf{z}$ is the model prediction generated by $f_{\theta_T}$ when run on the same conditions as the reference dataset. This approach was first used in the late nineties, as can be seen in {cite}`pritchard1999population`.

```{warning}
One should recall that the uncertainty model is defined on the residuals, as stated in {eq}`epsilonCalib`, and that residuals are usually considered normally distributed, as in {eq}`eq_observationCondThetaNorm`. Regardless of the origin of these residuals, as discussed in [](#calibration_reminder), if the model one provides is deterministic, the calibration will focus on a single realisation of the observation without uncertainty consideration. In this case, the model prediction must be modified to include noise representative of the residual hypotheses {cite}`van2018taking`.
```

This methodology shows that accepted configurations are not directly sampled from the true posterior distribution $\pi(\theta|\mathbf{y})$ but rather come from an approximation of it that can be written $\pi(\theta|\rho(\mathbf{z},\mathbf{y}) \le \delta)$. Two interesting asymptotic regimes can be highlighted:

- when $\delta \rightarrow 0$: the algorithm is exact and converges to the true posterior $\pi(\theta|\mathbf{y})$;
- when $\delta \rightarrow \infty$: the algorithm ignores the reference data and simply returns the original prior $\pi(\theta)$.

There are many different versions of this kind of algorithm, among which one could find an extra step using summary statistics $S(\cdot)$ to project both $\mathbf{z}$ and $\mathbf{y}$ onto a lower-dimensional space. In this version, the configurations kept are drawn from $\pi(\theta|\rho(S(\mathbf{z}), S(\mathbf{y})) \le \delta)$. A minimal sketch of this rejection step is given below.
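As an illustration, here is a minimal sketch of the rejection step described above, written in Python with NumPy under a few assumptions: the names `rejection_abc`, `prior_sample`, `simulator`, `summary` and `distance` are hypothetical, the default distance is taken to be Euclidean, and the toy Gaussian example at the end is invented for demonstration only; none of these choices are prescribed by the methodology itself.

```python
import numpy as np

def rejection_abc(prior_sample, simulator, y_ref, delta, n_draws,
                  distance=None, summary=None, rng=None):
    """Basic rejection ABC: keep the draws theta ~ pi(theta) whose simulated
    output z = f_theta satisfies rho(S(z), S(y)) <= delta."""
    rng = np.random.default_rng(rng)
    if distance is None:
        # Assumed default: Euclidean distance between (summaries of) outputs.
        distance = lambda a, b: np.linalg.norm(np.asarray(a) - np.asarray(b))
    if summary is None:
        summary = lambda x: x  # no projection: compare raw outputs
    s_ref = summary(y_ref)

    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)      # draw a configuration from the prior
        z = simulator(theta, rng)      # run the model with this configuration
        if distance(summary(z), s_ref) <= delta:
            accepted.append(theta)     # keep it if close enough to the data
    return np.asarray(accepted)


# Toy usage (assumed example): infer the mean of a Gaussian with unit variance.
rng = np.random.default_rng(0)
y_obs = rng.normal(loc=2.0, scale=1.0, size=50)            # reference dataset
prior = lambda g: g.uniform(-5.0, 5.0)                      # flat prior on the mean
model = lambda mu, g: g.normal(loc=mu, scale=1.0, size=50)  # stochastic model
post = rejection_abc(prior, model, y_obs, delta=0.5, n_draws=20_000,
                     summary=np.mean)                       # S(.) = sample mean
```

Passing a `summary` callable, as in the usage line above, corresponds to the variant where configurations are drawn from $\pi(\theta|\rho(S(\mathbf{z}), S(\mathbf{y})) \le \delta)$ rather than comparing the full outputs.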
Finally, another possible way to select the best representative sub-sample is to keep a percentile of the analysed set of configurations. Although mainly recommended for high-dimensional cases (i.e., when $n$ becomes large), this solution can work as long as one keeps an eye on the residual distribution provided by the *a posteriori* estimated parameters. Indeed, if no threshold is chosen but a percentile is used, the requested number of configurations will always be obtained in the end, but the only way to check whether the uncertainty assumptions are valid is to assess how closely the predictions match the full reference dataset.
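A possible sketch of this percentile-based variant, under the same assumptions as the previous snippet (hypothetical names, Euclidean default distance): instead of fixing $\delta$ beforehand, all draws are ranked by their distance to the reference data and only the closest fraction is kept; the tolerance actually achieved is returned so it can be inspected afterwards.

```python
import numpy as np

def percentile_abc(prior_sample, simulator, y_ref, n_draws, keep_fraction=0.01,
                   distance=None, summary=None, rng=None):
    """Percentile-based selection: keep the keep_fraction of draws whose
    simulated outputs are closest to the reference data, and report the
    tolerance delta that this selection implies."""
    rng = np.random.default_rng(rng)
    if distance is None:
        distance = lambda a, b: np.linalg.norm(np.asarray(a) - np.asarray(b))
    if summary is None:
        summary = lambda x: x
    s_ref = summary(y_ref)

    thetas, dists = [], []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        z = simulator(theta, rng)
        thetas.append(theta)
        dists.append(distance(summary(z), s_ref))

    dists = np.asarray(dists)
    implied_delta = np.quantile(dists, keep_fraction)  # tolerance actually achieved
    kept = [t for t, d in zip(thetas, dists) if d <= implied_delta]
    return np.asarray(kept), implied_delta
```

Reporting `implied_delta` alongside the kept sample is one simple way to see how demanding the selection actually was; as stressed above, the accepted configurations should still be run against the full reference dataset to check that the residual assumptions hold.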