11.6.4. Investigating the quality of the samples through diagnostics and plots

Running the algorithm for a given number of iterations does not guarantee that the chains have converged and are properly sampling the posterior distribution. Although convergence cannot be proven, several techniques help assess the quality of the samples. These diagnostics must be checked before plotting and analyzing the results, to determine whether the results are reliable or whether the algorithm should be run for additional iterations (see Running the estimate, exporting and loading chains, and continuing the calculation). These techniques serve two main purposes:

Ensuring convergence of the chains: Convergence means that the chains have stabilized around the same region. This must be checked by the user using the trace plot (see Drawing the trace). It is also recommended to check the stability of the acceptance ratio, which should typically lie between 20% and 50%, using the acceptance ratio plot (see Drawing the acceptation ratio). Finally, the user must define the burn-in (also called warm-up), i.e., the number of initial iterations discarded before the chain stabilizes. This is done with the setBurnin method:

setBurnin(burnin)

This method takes a single integer argument: the number of initial, non-converged iterations to discard.
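To make the two checks above concrete, here is a minimal sketch in plain Python, independent of the library. It assumes a chain stored as a list of values for one parameter, and a sampler that repeats the previous value when a proposal is rejected, so that the fraction of transitions where the value changes approximates the acceptance ratio. The helper names acceptance_ratio and discard_burnin are hypothetical and are not part of the library.

```python
def acceptance_ratio(chain):
    """Fraction of transitions where the chain moved to a new value.

    Assumption: a rejected proposal repeats the previous value.
    """
    if len(chain) < 2:
        raise ValueError("need at least two iterations")
    moves = sum(1 for prev, cur in zip(chain, chain[1:]) if cur != prev)
    return moves / (len(chain) - 1)


def discard_burnin(chain, burnin):
    """Return the chain without its first `burnin` iterations."""
    if not 0 <= burnin < len(chain):
        raise ValueError("burnin must be in [0, len(chain))")
    return chain[burnin:]


# Toy chain: it drifts upward first, then fluctuates around 1.
chain = [0.0, 0.0, 0.4, 0.4, 0.4, 0.9, 1.1, 1.1, 1.0, 1.0, 1.2]

ratio = acceptance_ratio(chain)
kept = discard_burnin(chain, 5)
print(f"acceptance ratio: {ratio:.0%}, iterations kept: {len(kept)}")
```

In practice the burn-in value is read off the trace plot, then passed to setBurnin; the sketch only illustrates what discarding the first iterations means.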

When multiple chains are initialized (at least 4), it is also possible to compute the Gelman–Rubin statistic, which compares within-chain and between-chain variances (see Checking for convergence with the Gelman–Rubin diagnostic). Values close to 1 indicate good convergence, while values well above 1 suggest that the chains have not yet settled on the same region.
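As an illustration of how within-chain and between-chain variances are compared, the following sketch implements the classic textbook form of the Gelman–Rubin statistic for a single parameter. It is not the library's implementation, which may use a different variant. The function name gelman_rubin and the toy chains are assumptions made for this example.

```python
import random
from math import sqrt
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic potential scale reduction factor for one parameter.

    `chains` is a list of equally long lists of samples.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need >= 2 chains of equal length >= 2")
    within = mean(variance(c) for c in chains)        # W
    if within == 0:
        raise ValueError("chains have zero variance")
    between = n * variance([mean(c) for c in chains])  # B
    pooled = (n - 1) / n * within + between / n
    return sqrt(pooled / within)


rng = random.Random(0)
# Four toy chains sampling the same distribution...
mixed = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]
# ...and four chains stuck around different values.
stuck = [[rng.gauss(shift, 1.0) for _ in range(500)] for shift in (0, 2, 4, 6)]

print("same region:      ", round(gelman_rubin(mixed), 3))
print("different regions:", round(gelman_rubin(stuck), 3))
```

The first value should be close to 1 and the second clearly larger, because the between-chain variance dominates when the chains disagree.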

Ensuring approximate independence of posterior samples: Because each state of a Markov chain depends on the previous one, successive samples are correlated; when this correlation is strong, the chain explores the posterior distribution inefficiently. To assess this, the Effective Sample Size (ESS) can be computed with the diagESS method (see Thinning the chains with ESS), which provides the equivalent number of independent samples. This method also suggests an appropriate lag value, i.e., the number of iterations to skip before selecting the next approximately uncorrelated sample. The lag is set using the setLag method:

setLag(lag)

The method takes a single integer argument that specifies the lag value.
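The idea behind the ESS and the lag can be sketched with a standard autocorrelation-based estimator. The source does not describe how diagESS computes its values, so the code below is only a generic illustration: the function names, the truncation rule (stop at the first non-positive autocorrelation) and the 0.1 threshold used to suggest a lag are all assumptions.

```python
from statistics import mean


def autocorrelation(chain, k):
    """Sample autocorrelation of the chain at lag k."""
    mu = mean(chain)
    var = sum((x - mu) ** 2 for x in chain)
    if var == 0:
        raise ValueError("chain is constant")
    cov = sum((chain[i] - mu) * (chain[i + k] - mu)
              for i in range(len(chain) - k))
    return cov / var


def effective_sample_size(chain):
    """n / (1 + 2 * sum of positive autocorrelations), truncated at the
    first non-positive autocorrelation (an assumed, common rule)."""
    n = len(chain)
    tau = 1.0
    for k in range(1, n // 2 + 1):
        rho = autocorrelation(chain, k)
        if rho <= 0:
            break
        tau += 2 * rho
    return n / tau


def suggest_lag(chain, threshold=0.1):
    """Smallest lag whose autocorrelation falls below `threshold`
    (the threshold value is an assumption)."""
    for k in range(1, len(chain) // 2 + 1):
        if autocorrelation(chain, k) < threshold:
            return k
    return len(chain) // 2


def thin(chain, lag):
    """Keep one sample every `lag` iterations."""
    return chain[::lag]


# A slowly moving toy chain: long runs of identical values.
slow = [0] * 5 + [1] * 5
print("ESS of the slow chain:", round(effective_sample_size(slow), 2))
print("thinned with lag 3:   ", thin(list(range(10)), 3))
```

A chain with long runs of similar values yields an ESS well below its length, which is the situation where a lag larger than 1 becomes useful.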

Warning

Setting lag and burn-in values with the setLag and setBurnin methods will affect most drawing methods, and a line will indicate which default cut and lag were applied to produce the plot. The only exception is the residuals plot, since the a posteriori residuals are computed during estimateParameters, before any burn-in or lag is applied. To re-estimate residuals with a specific lag or burn-in, use the estimateCustomResiduals method (see Estimate custom residuals). To remove the current cut and lag, use the clearDefaultCut method, which requires no arguments and clears both lag and burn-in:

clearDefaultCut()

Another important point is that these methods may discard a significant number of samples. The burn-in length and lag value should therefore be chosen carefully to balance two goals: keeping enough points to ensure a good representation of the posterior distribution, while also guaranteeing convergence and independence of the samples.
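The trade-off can be quantified before choosing the values. The sketch below counts how many samples per chain remain after a burn-in and a lag are applied, assuming that thinning starts at the first iteration after the burn-in and keeps one iteration every lag steps (how the library aligns the lag is an assumption here). The function name samples_kept is hypothetical.

```python
def samples_kept(n_iterations, burnin, lag):
    """Number of samples left in one chain after burn-in and thinning.

    Assumption: iterations burnin, burnin + lag, burnin + 2*lag, ... are kept.
    """
    if lag < 1 or not 0 <= burnin <= n_iterations:
        raise ValueError("need lag >= 1 and 0 <= burnin <= n_iterations")
    return len(range(burnin, n_iterations, lag))


# Example: 10000 iterations, 2000 discarded as burn-in, lag of 10.
n_iterations, burnin, lag = 10000, 2000, 10
kept = samples_kept(n_iterations, burnin, lag)
print(f"{kept} of {n_iterations} samples kept "
      f"({kept / n_iterations:.0%} of the chain)")
```

With these values only 800 of the 10000 iterations remain, which shows how quickly a conservative burn-in combined with a large lag reduces the number of points available to represent the posterior distribution.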