Approximate Bayesian computation (ABC), a type of likelihood-free inference, is a family of statistical techniques to perform parameter estimation and model selection. It is increasingly used in ecology and evolution, where the models used can be too complex to be handled with standard likelihood techniques. The essence of ABC techniques is to compare simulation outputs to observed data, in order to select the parameter values of the simulations which best fit the data. ABC techniques are thus computationally demanding. This constitutes a key limitation to their implementation.
We introduce the R package ‘EasyABC’ that enables one to launch a series of simulations from the R platform and to retrieve the simulation outputs in an appropriate format for post-processing. The ‘EasyABC’ package further implements several efficient parameter sampling schemes to speed up the ABC procedure: on top of the standard prior sampling, it implements various algorithms to perform sequential (ABC-sequential) and Markov chain Monte Carlo (ABC-MCMC) sampling schemes. The package functions can furthermore make use of parallel computing.
The R package ‘EasyABC’ complements the package ‘abc’ which enables various post-processing of simulation outputs. ‘EasyABC’ makes several state-of-the-art ABC implementations available to the large community of R users in the fields of ecology and evolution. It is a freely available R package under the GPL license, and it can be downloaded at http://cran.r-project.org/web/packages/EasyABC/index.html.
Approximate Bayesian computation (ABC) is increasingly used in ecology and evolution (Beaumont 2010). In these fields, many models are too complex to be handled with standard likelihood or Bayesian techniques, since the computation of the model likelihood can be intractable or prohibitively costly in computing time. In such cases, ABC techniques, a type of likelihood-free inference, can be used to estimate the model parameters. They consist in (i) simulating the model a very large number of times (on the order of millions), with parameter values drawn from a prior distribution, (ii) comparing the simulation outputs to the observed data, using summary statistics computed from both data and simulations, and (iii) retaining those simulations, together with their parameter values, whose summary statistics differ from those of the data by less than a threshold value. The retained parameter values form an approximation of the posterior distribution of the model parameters. Subsequent steps of model checking and model selection are also possible (and recommended) in ABC applications (Csilléry et al. 2010).
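The three steps above can be summarised in one expression. The notation is introduced here for illustration only and does not come from the original text: θ denotes the model parameters, π(θ) the prior, s the simulated summary statistics with sampling density p(s | θ), s_obs the observed summary statistics, ρ a distance and ε the threshold value. Under these assumptions, the retained parameter values are draws from

```latex
\pi_{\varepsilon}(\theta \mid s_{\mathrm{obs}})
  \;\propto\;
  \pi(\theta) \int \mathbb{1}\{\rho(s, s_{\mathrm{obs}}) \le \varepsilon\}\,
  p(s \mid \theta)\, \mathrm{d}s .
```

As ε tends to zero, this distribution tends to π(θ | s_obs), which coincides with the full posterior distribution only when the summary statistics are sufficient for θ.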
In the last decade, a number of improvements to this basic ABC scheme have been proposed (reviewed in Marin et al. 2012). Some of them focus on post-processing of the posterior distribution, either through local linear regressions (Beaumont, Zhang & Balding 2002) or nonlinear methods (Blum & François 2010). Such post-processing as well as model checking and selection tools have been implemented in the R package ‘abc’ (Csilléry, François & Blum 2012). Other improvements concern the way parameter values are sampled. Among them, the use of sequential parameter sampling schemes (ABC-sequential) and the coupling to Markov chain Monte Carlo (ABC-MCMC) have received much attention. Sequential Monte Carlo and Markov chain Monte Carlo algorithms are well-known procedures in computational statistics, which have long been used outside of the ABC framework and which have relatively recently been introduced in the ABC context (Marjoram et al. 2003; Sisson, Fan & Tanaka 2007). These algorithms are more efficient than plain prior sampling, since they preferentially sample the areas of the parameter space of high likelihood.
Some of these schemes are currently available in two toolboxes (ABCtoolbox, Wegmann et al. 2010 and ABC-SysBio, Liepe et al. 2010). ABC-SysBio is designed for a specific class of models encountered in the field of systems biology, while ABCtoolbox is a generic platform written in C++. These toolboxes are very useful for the scientific community, but they do not offer the advantages of an R package, such as being easy to install and use, cooperative, and easy to pipeline with other R tools such as those developed in the package ‘abc’ (Csilléry, François & Blum 2012). Furthermore, new algorithms have been proposed since the publication of these toolboxes, and they would benefit from being made widely available to the scientific community.
We present a new R package called ‘EasyABC’ that enables one to launch a series of simulations from the R platform and to retrieve the simulation outputs in an appropriate format for post-processing. The simulation code has to be an R function or a binary executable file respecting some minimal compatibility constraints. The ‘EasyABC’ package further implements several efficient parameter sampling schemes to speed up the ABC procedure: on top of the standard prior sampling, it implements various algorithms to perform sequential (ABC-sequential) and Markov chain Monte Carlo (ABC-MCMC) sampling schemes. The package functions can furthermore run simulations in parallel on several cores of a multi-core computer. ‘EasyABC’ has been tested on Linux and 32-bit Windows. It works with versions of R ≥ 2.15.0.
Three main types of ABC algorithms are implemented: the standard rejection algorithm, called ABC-rejection (Pritchard et al. 1999), sequential sampling schemes (ABC-sequential, also referred to as ABC-SMC), and schemes coupled to MCMC (ABC-MCMC, see Hartig et al. 2011 for a more detailed presentation).
The function ‘ABC_rejection’ enables one to launch a rejection scheme, which consists in simply sampling parameter values in the prior distribution. The user specifies n, the number of simulations to be performed, a list detailing the prior distribution of the parameter values, the target summary statistics and the tolerance (being the proportion of simulations whose parameters are retained). Four types of prior distributions are currently implemented in ‘EasyABC’: uniform, normal, lognormal and exponential distributions. If the target summary statistics and the tolerance are not specified, ‘ABC_rejection’ will solely launch simulations with model parameters drawn from the prior distribution and store the simulation outputs. Otherwise, ‘ABC_rejection’ will perform a basic rejection procedure (Pritchard et al. 1999). More refined post-processing can be performed by pipelining ‘EasyABC’ with the R package ‘abc’ (Csilléry, François & Blum 2012), as demonstrated in the package's examples and in the package's vignette.
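The rejection scheme described above can be sketched in a few lines. The sketch below is written in plain Python for illustration and is not the ‘EasyABC’ API: the function `abc_rejection`, the simulator `toy_model`, the uniform prior on [-5, 5] and the Euclidean distance are all assumptions made for this example. As in the text, the tolerance is interpreted as the proportion of simulations whose parameters are retained.

```python
import random


def toy_model(theta, rng):
    """Hypothetical simulator: one summary statistic, the mean of 20
    Gaussian draws centred on the parameter theta."""
    return [sum(rng.gauss(theta, 1.0) for _ in range(20)) / 20]


def abc_rejection(model, prior_sampler, n_sim, target, tol, rng):
    """Run n_sim simulations with parameters drawn from the prior and keep
    the proportion `tol` whose summary statistics are closest to `target`."""
    runs = []
    for _ in range(n_sim):
        theta = prior_sampler(rng)
        stats = model(theta, rng)
        # Euclidean distance between simulated and target summary statistics.
        dist = sum((s - t) ** 2 for s, t in zip(stats, target)) ** 0.5
        runs.append((dist, theta))
    runs.sort(key=lambda run: run[0])
    n_keep = max(1, int(round(tol * n_sim)))
    return [theta for _, theta in runs[:n_keep]]


rng = random.Random(1)
posterior = abc_rejection(
    toy_model,
    lambda r: r.uniform(-5.0, 5.0),  # assumed uniform prior on [-5, 5]
    n_sim=2000,
    target=[1.5],
    tol=0.05,
    rng=rng,
)
posterior_mean = sum(posterior) / len(posterior)
```

With these settings, 100 of the 2000 simulated parameter values are retained, and their empirical distribution serves as the approximate posterior.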
A sequential sampling of the parameter space was first proposed by Sisson, Fan & Tanaka (2007). This sequential sampling consists in performing several steps of ABC. During the first step, a standard ABC-rejection procedure is used. From this initial ensemble of simulations, a coarse approximate posterior distribution is derived. The results of this first step are then used to concentrate sampling effort on the relevant parameter space, and the procedure goes on for a number of steps. This procedure enables one to concentrate the simulations in the zones of the parameter space of high likelihood, while correcting for the unequal sampling of the prior distribution. It thus speeds up the ABC procedure by producing a larger percentage of simulations close to the data. Beaumont et al. (2009) identified a bias in the method of Sisson, Fan & Tanaka (2007) and proposed a correction for it. Other sequential algorithms have been proposed (Toni et al. 2009; Drovandi & Pettitt 2011; Del Moral, Doucet & Jasra 2012; Lenormand, Jabot & Deffuant 2012). Four of these sequential algorithms are available in the package ‘EasyABC’ and can be selected with the option ‘method’ in the function ‘ABC_sequential’: those of Beaumont et al. (2009), Drovandi & Pettitt (2011), Del Moral, Doucet & Jasra (2012) and Lenormand, Jabot & Deffuant (2012). The function ‘ABC_sequential’ computes the final approximate posterior distribution, while intermediary results can be stored using the option ‘verbose’. We demonstrate the use of ‘ABC_sequential’ in Fig. 1 with a toy example, where one can see the progressive improvement in the approximation of the posterior distribution. Note however that the ABC-SMC algorithm will not converge exactly to the true posterior distribution, because of the approximation error controlled by the tolerance.
Several other examples are included in the help pages of the package, as well as in the package's vignette, including a recent stochastic ecological model (Jabot 2010) which has been incorporated in the package.
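To make the sequential idea concrete, here is a minimal sketch in the spirit of the weighted scheme of Beaumont et al. (2009), for a single parameter with a uniform prior. It is written in plain Python and is not the ‘EasyABC’ implementation: the function names, the toy simulator, the decreasing tolerance sequence and the Gaussian perturbation kernel with variance equal to twice the weighted variance of the previous particles are assumptions of this example. The importance weights are what corrects for no longer sampling from the prior.

```python
import math
import random


def simulate(theta, rng):
    """Hypothetical toy model: mean of 20 Gaussian draws centred on theta."""
    return sum(rng.gauss(theta, 1.0) for _ in range(20)) / 20


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def weighted_var(thetas, weights):
    mean = sum(w * t for w, t in zip(weights, thetas))
    return sum(w * (t - mean) ** 2 for w, t in zip(weights, thetas))


def abc_sequential(observed, tolerances, n_particles, lo, hi, rng):
    """Sequential ABC with a uniform prior on [lo, hi] and one tolerance per step."""
    prior_pdf = 1.0 / (hi - lo)
    thetas, weights = [], []
    for step, eps in enumerate(tolerances):
        if step > 0:
            kernel_var = 2.0 * weighted_var(thetas, weights)
        new_thetas, new_weights = [], []
        while len(new_thetas) < n_particles:
            if step == 0:
                # First step: plain rejection sampling from the prior.
                cand = rng.uniform(lo, hi)
            else:
                # Later steps: resample a previous particle and perturb it.
                centre = rng.choices(thetas, weights=weights)[0]
                cand = rng.gauss(centre, math.sqrt(kernel_var))
                if not lo <= cand <= hi:
                    continue  # zero prior density
            if abs(simulate(cand, rng) - observed) > eps:
                continue
            if step == 0:
                weight = 1.0
            else:
                # Importance weight: prior density over proposal mixture density.
                mix = sum(w * normal_pdf(cand, t, kernel_var)
                          for t, w in zip(thetas, weights))
                weight = prior_pdf / mix
            new_thetas.append(cand)
            new_weights.append(weight)
        total = sum(new_weights)
        thetas = new_thetas
        weights = [w / total for w in new_weights]
    return thetas, weights


rng = random.Random(2)
particles, weights = abc_sequential(
    observed=1.5, tolerances=[2.0, 1.0, 0.5, 0.25],
    n_particles=200, lo=-5.0, hi=5.0, rng=rng,
)
posterior_mean = sum(w * t for w, t in zip(weights, particles))
```

Each step reuses the particles of the previous one as a proposal distribution, so later steps spend far fewer simulations in regions that are incompatible with the data than a rejection run at the final tolerance would.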
Another way of speeding up the ABC procedure is to embed it in a Markov chain Monte Carlo (MCMC) scheme. An MCMC chain is started with an initial, random parameter combination. At each step of the algorithm, a new simulation is launched with new parameter values (the proposal) that are randomly drawn close to the current parameter values. This proposal is accepted if the distance between this simulation and the data is below a threshold value called the tolerance. More details about this scheme can be found in Marjoram et al. (2003) and Wegmann, Leuenberger & Excoffier (2009). Three ABC-MCMC algorithms are coded in the package ‘EasyABC’ and can be selected with the option ‘method’ in the function ‘ABC_MCMC’: those of Marjoram et al. (2003) and Wegmann, Leuenberger & Excoffier (2009), and a slight modification of Marjoram et al. (2003) in which the tolerance and the proposal range are determined by the algorithm via an initial calibration step, following the modifications of Wegmann, Leuenberger & Excoffier (2009). We demonstrate the use of ‘ABC_MCMC’ in Fig. 1 with a toy example.
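The acceptance rule described above can be sketched as follows, in the spirit of Marjoram et al. (2003). This is an illustrative plain-Python sketch, not the ‘EasyABC’ code: the function `abc_mcmc`, the toy simulator, the uniform prior and the Gaussian random-walk proposal are assumptions. With a uniform prior and a symmetric proposal, the Metropolis-Hastings ratio equals one inside the prior support, so acceptance reduces to the distance test; other priors or proposals would require an additional acceptance probability. The starting point is drawn from the prior until its simulation falls within the tolerance, a choice made here to avoid a chain that never moves.

```python
import random


def simulate(theta, rng):
    """Hypothetical toy model: mean of 20 Gaussian draws centred on theta."""
    return sum(rng.gauss(theta, 1.0) for _ in range(20)) / 20


def abc_mcmc(observed, n_steps, tol, step_sd, lo, hi, rng):
    """ABC-MCMC with a uniform prior on [lo, hi] and a Gaussian random walk."""
    # Initial point: redraw from the prior until the simulation is within tol.
    current = rng.uniform(lo, hi)
    while abs(simulate(current, rng) - observed) > tol:
        current = rng.uniform(lo, hi)
    chain = [current]
    for _ in range(n_steps):
        # Proposal drawn close to the current parameter value.
        proposal = rng.gauss(current, step_sd)
        inside_prior = lo <= proposal <= hi
        # Accept only if the proposal has non-zero prior density and its
        # simulation falls within the tolerance; otherwise stay in place.
        if inside_prior and abs(simulate(proposal, rng) - observed) <= tol:
            current = proposal
        chain.append(current)
    return chain


rng = random.Random(3)
chain = abc_mcmc(observed=1.5, n_steps=5000, tol=0.25,
                 step_sd=0.5, lo=-5.0, hi=5.0, rng=rng)
kept = chain[1000:]  # discard an assumed burn-in of 1000 steps
posterior_mean = sum(kept) / len(kept)
```

Successive values of the chain are correlated, so the retained values carry less information than the same number of independent posterior draws.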
Conclusion and perspectives
The ABC-sequential and ABC-MCMC schemes enable a considerable speed-up of the ABC procedure, as shown by the various authors who proposed such schemes (Marjoram et al. 2003; Sisson, Fan & Tanaka 2007; Beaumont et al. 2009; Toni et al. 2009; Wegmann, Leuenberger & Excoffier 2009; Drovandi & Pettitt 2011; Del Moral, Doucet & Jasra 2012; Lenormand, Jabot & Deffuant 2012). Each method has its own advantages, so that the choice of an optimal method is likely to depend on the precise characteristics of the model studied. However, growing evidence suggests that, whatever the method used, the gain in efficiency brought by the refined ABC-sequential and ABC-MCMC schemes is likely to be large, so that ABC users should generally prefer ABC-sequential and ABC-MCMC over the ABC-rejection scheme.
The R package ‘EasyABC’ enables efficient ABC schemes to be launched from R. It can thus be easily pipelined with existing R tools, among which the package ‘abc’ for post-processing (Csilléry, François & Blum 2012). It should facilitate the dissemination of state-of-the-art approximate Bayesian computation implementations in ecology and evolution. We direct the readers to the package's vignette for a more detailed tutorial.
This work has been supported by the Irstea project DynIndic and by the French National Research Agency (ANR) within the SYSCOMM project DISCO (ANR-09-SYSC-003). We thank two anonymous reviewers and the handling editor for their useful suggestions that helped us improve both the R package and this manuscript.