English Français

Documentation / User's manual in Python : PDF version

XI.5. Approximate Bayesian Computation techniques (ABC)

XI.5. Approximate Bayesian Computation techniques (ABC)

This section covers methods grouped under the acronym ABC, which stands for Approximate Bayesian Computation. The core idea is to perform Bayesian inference without explicitly evaluating the model likelihood function. For this reason, these methods are also refered to as likelihood-free algorithms [wilkinson2013approximate].

As a reminder, the principle of the Bayesian approach is summarized in the equation , where is the conditional probability of the observations given the parameter values , is the a priori probability density of (the prior), and is the marginal likelihood of the observations, which is constant. For more details, see [metho].

From a technical perspective, the methods in this section inherit from the TABC class (which itself inherits from TCalibration, in order to benefit from all standard features). Currently, the only implemented ABC method is the Rejection algorithm, presented in [metho], whose implementation is provided through the TRejectionABC class described below.

The usage of the TRejectionABC class can be summarised in a few key steps:

  1. Prepare the data and the model:

    • The parameters to be calibrated must be instances of classes inheriting from TStochasticAttribute;

    • Select the assessor type and construct the TRejectionABC object with the appropriate distance function (see Section XI.5.1).

  2. Set the algorithm properties:

    • Define optional behaviours;

    • Specify the uncertainty hypotheses via the dedicated methods (see Section XI.5.2).

  3. Perform the estimate and analyse the results:

    • Run the estimate process;

    • Extract the results and visualise them with the standard plotting tools (see Section XI.5.3).

XI.5.1.  Constructing the TRejectionABC object

The constructors available for creating an instance of the TRejectionABC class are detailed in Section XI.2.3. As a reminder, the available prototypes are:


# Constructor with a runner
RejectionABC(tds, runner, ns=1, option="")
# Constructor with a TCode
TRejectionABC(tds, code, ns=1, option="")
# Constructor with a function using Launcher
TRejectionABC(tds, fcn, varexpinput, varexpoutput, ns=1, option="")
        

Details about these constructors can be found in Section XI.2.3.1, Section XI.2.3.2, and Section XI.2.3.3 respectively for the TRun, TCode, and TLauncherFunction-based constructor. In all cases, the number of samples must be specified, as it represents the number of retained samples in the final posterior distribution. In our implementation, the total number of computations is determined by the chosen percentile value (see Section XI.5.2.1).

This class provides one specific option, which can be used to modify the default value of the a posteriori distribution returned by the algorithm. Two possible choices are available for obtaining the single-point estimate that best represents the distribution:

  • Mean of the distribution: this is the default option;

  • Mode of the distribution: the user must specify "mode" in the option field of the TRejectionABC constructor.

The default solution is straightforward, whereas the second requires internal smoothing of the distribution in order to obtain the best estimate of the mode.

The final step is to construct the TDistanceLikelihoodFunction, a mandatory step that must always immediately follow the constructor. Although ABC methods are Bayesian, they are likelihood-free: instead of directly computing the likelihood, they calculate the distance between the observed data and the simulated data for several prior samples, until the best-fitting distribution is identified. Consequently, the TDistanceLikelihoodFunction object can be constructed via the setDistance method (following the prototype presented in Section XI.2.2.1). Advanced users also have the option to define a custom distance by following the prototype:


setDistance(distFunc, tdsRef, input, reference, weight="")
        

Warning

If the reference dataset is compared against a deterministic model (i.e. a model with no intrinsic stochasticity), it is necessary to explicitly specify the uncertainty hypotheses. This is done via the method described in Section XI.5.2.2.

XI.5.2. Defining the TRejectionABC algorithm properties

Once the TRejectionABC instance is created along with its TDistanceLikelihoodFunction, a few additional methods can be used to fine-tune the algorithm parameters. These methods are optional, since each property has a predefined default value. The available options are described in the following sub-sections.

XI.5.2.1. Defining the percentile

The first method discussed here is straightforward: the principle of rejection is to keep the best-tested configurations. This can be done either by applying a threshold value on the distance results (called in [metho]) or by retaining a fixed fraction of the tested configurations, defined through a percentile . The TRejectionABC method implements the latter approach. By default, the percentile is set to

. To modify this value, the user can call setPercentile, whose prototype is


setPercentile(eps)
          

where the argument eps specifies the fraction of configurations to be kept.

An important consequence is that the total number of configurations evaluated is computed as follows:

where is the number of retained samples in the final posterior distribution, as defined in the constructor (see Section XI.5.1).

XI.5.2.2. Introducing noise for deterministic function

As previously explained, when comparing your reference dataset to a deterministic model (i.e., a model with no intrinsic stochastic behaviour), the user can explicitly specify his own uncertainty assumptions. This can be done by calling setGaussianNoise, whose prototype is


setGaussianNoise(stdname)
          

The idea is to inject random noise (assumed Gaussian and centered) into the model predictions, using internal variables from the reference dataset to define its standard deviation. The only argument is a list of variables, formatted as "stdvaria1:stdvaria2" , where each element corresponds to a variable within the reference TDataServer. These values provide the standard deviation for each observation point (for example, to represent experimental uncertainty).

This solution allows to:

  • define a common uncertainty (applied generally across all observations in the reference dataset) by simply adding an attribute with a TAttributeFormula, where the formula is constant;

  • use experimental uncertainties that are provided along with the reference values;

  • store all hypotheses within the reference TDataServer object. For this reason, we strongly recommend saving both the parameter and reference datasets at the end of a calibration procedure.

Warning

The number of variables in the stdname list must match the number of model outputs. Even in the special case where calibration involves two outputs, with one having no associated uncertainty, a zero-valued attribute should still be added for that output if the other requires an uncertainty model.

XI.5.3. Looking at the results

Finally, once the computation is complete (using the standard estimateParameters method), the results can be examined in two main ways (in addition to inspecting the final datasets):

No additional options or visualization methods are specific to the TRejectionABC implementation; these functions behave exactly as described in their respective sections.

An example is also provided in the use-case section (see Section XIV.12.4).