Sensitivity analysis of Repast computational ecology models with R/Repast

Abstract Computational ecology is an emerging interdisciplinary discipline founded mainly on modeling and simulation methods for studying ecological systems. Among the existing modeling formalisms, individual-based modeling is particularly well suited for capturing the complex temporal and spatial dynamics, as well as the nonlinearities, arising in ecosystems, communities, or populations due to individual variability. In addition, being a bottom-up approach, it is useful for providing new insights into the local mechanisms generating some observed global dynamics. Of course, no conclusions about model results can be taken seriously if they are based on a single model execution or are not analyzed carefully. Therefore, a sound methodology should always be used for underpinning the interpretation of model results. Sensitivity analysis is a methodology for quantitatively assessing the effect of input uncertainty on the simulation output, and it should be an obligatory part of every work based on an in-silico experimental setup. In this article, we present R/Repast, a GNU R package for running and analyzing Repast Simphony models, accompanied by worked examples on how to perform global sensitivity analysis and how to interpret the results.


Antonio Prestes García | Alfonso Rodríguez-Patón

1 | INTRODUCTION
Computational ecology is a relatively young field which relies extensively on mathematical and computational methods and models for studying ecological and evolutionary processes. It is based on the construction of predictive and explanatory models as well as the quantitative description and analysis of ecological data (Helly, Case, Davis, Levin, & Michener, 1995; Petrovskii et al., 2012). The continuous growth of computational power available to end users, the existence of tools, and the constant increase in available empirical data make it viable for many scientists to develop and simulate tremendously complex models from their desktops. In addition, the intrinsic characteristics of ecological processes, mainly their temporal and spatial scales (Dieckmann, Law, & Metz, 2000), make carrying out controlled experiments a physical impossibility. Hence, in most cases, the only feasible alternative is to simulate the process in order to run experiments spanning the full length of ecological and evolutionary scales. Computational ecology has its roots in the successful results of mathematical ecology, which has proven to be an essential tool for understanding the complexities arising from ecological interactions.
It is widely accepted that simple models with a small number of state variables and parameters provide better generalizations than complex ones (Evans et al., 2013; Smith, 1974), with a clear distinction between simulation models and theories as separate entities handling different kinds of problems. The correctness of the idea that simple models lead to generality in ecology has, however, been recently questioned (Evans et al., 2013). We believe that the parsimony principle must always be taken into account when developing models, but it has a different meaning depending on the modeling formalism being used. Simplicity does not mean the same when the referred modeling formalism is a deterministic ordinary differential equation (ODE) as when it is applied to agent-based modeling, as long as every modeling technique has its own idiosyncrasies and constraints. Agent-based modeling is a flexible and versatile abstraction where the whole system under study is described or formalized by its component units, which facilitates a more natural description of a system and the comprehension of the individual properties leading to emergent phenomena (Bonabeau, 2002).

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
Agent-based models (AbM) are much more fine-grained than their whole-population aggregated counterparts and, as a consequence, tend to be more complex, requiring more equations, parameters, and processes in order to represent the same phenomenon. That is not intrinsically a problem or a quality but simply a constraint imposed by the modeling formalism in use, and it is up to the modelers to find the correct trade-off between the purpose of the model and the level of detail which should be part of the model structure.
AbM is being progressively established as a mainstream and valuable tool for modeling complex adaptive systems in many distinct areas of knowledge, ranging from social science and economics to any flavor of computational and systems science, such as biology, ecology, and so on (Grimm & Railsback, 2005). The reason is, among other things, the relative ease with which detailed structural information can be incorporated into a model without the constraints of other methodologies (Hellweger & Bucci, 2009). Nonetheless, the possibility of incorporating many details comes at the cost of models with a high level of complexity, containing many rules and parameters whose exact values are, in many cases, hard or impossible to determine experimentally; this is what is known as parameter uncertainty. When used in the context of ecological systems, agent-based modeling is also known as individual-based modeling (IbM; Grimm & Railsback, 2005).
The distinctive aspect defining an IbM is that individuals are represented by discrete entities which have properties or state variables that are unique in the population being simulated (Berec, 2002). Hence, IbM is a valuable abstraction for simulating populations, communities, or ecosystems, capturing the individual variability, randomness, and their complex dynamics. It is a bottom-up approach where the system under study is modeled using mechanistic explanations of the interacting system parts (Ferrer, Prats, & López, 2008). Therefore, the global behavior shown by the system as a whole is an emergent property derived from the local rules defining the individuals, which is particularly useful for testing different hypotheses or phenomenological explanations for the individual processes in order to verify which of them are producing the globally observed behavior (Pascual, 2005). Moreover, differently from aggregate models, it is customary that IbM have a large number of state variables and parameters which in most cases are hard or directly impossible to elucidate experimentally, leading to many levels of uncertainty in this kind of model. In order to tackle the uncertainty and to make robust predictions, we have to use a sound methodology for applying what-if analysis to check how stable the model outputs are when the input parameters vary (Thiele, Kurth, & Grimm, 2014). There exists a large set of mathematical tools for analyzing the model output which are known generically as sensitivity analysis. Normally, applying these techniques is cumbersome, requiring much effort from modelers and hindering the thorough analysis of computational models.
According to Thiele et al. (2014), most published individual-based models tend to omit the systematic analysis of model output, mainly because modelers normally do not have the specific knowledge to implement the required methods. Therefore, it seems clear that the availability of simple and user-friendly tools for experiment design and analysis would greatly help modelers to improve the formal quality of their models.
In other scientific fields, which are strongly rooted in an extensive experimentalism, it is practically impossible to conduct any kind of research without a well-designed experimental setup and further statistical analysis and hypothesis testing. Perhaps the reason is that these experimental fields already have a complete and mature toolbox for the design and evaluation of experiments (Little & Hills, 1978; Myers & Well, 1995), leaving no room for deviation from these standards. On the other hand, in-silico experiments are still at an early stage, and verification and validation procedures are not well established yet. In addition, the real value of a computational model depends much on the ability of other researchers to reproduce and enhance the results elsewhere; in other words, results must be reproducible. Hence, in order to achieve reproducibility, research methods should be stated clearly and should preferentially be backed by standard methods and software tools.
Bearing this in mind, we introduce R/Repast, a GNU R package for running Repast (North et al., 2013) models from the GNU R environment as well as for carrying out global sensitivity analysis on the model results. In the following sections, we will contextualize the problem, providing a basic background for understanding what is being addressed in this study, and we will also provide a basic description of the package functionalities. Finally, we will show three worked examples on how the package can help modelers to make the conclusions drawn from model results much more robust. The first example explores the basic aspects of the bacterial conjugation process. The second is an individual-based implementation of the classic predator-prey model enclosed as part of the standard Repast Simphony distribution. Finally, the last example was developed ex professo for this study and is an instance of the common pool problem in the context of two plasmids "sharing" the genes required for the expression of the conjugative system.

| Model development
Model development is an iterative and objective-driven activity, and the first step required to develop a model is having a clear and ideally unambiguous statement about the model purpose. Therefore, every experimental study carried out using modeling and simulation should follow the experimental life cycle based on the successive sequence of four cyclic steps, starting from (1) conjecture, which defines the model purpose and why the model is being developed; (2) design phase, where the model is translated to some runnable implementation; (3) experiment step, which means the execution of model following a well-established plan oriented to confirm or reject the initial conjecture; and finally, the (4) analysis step, where the data generated in the previous step is analyzed with a sound methodology which will generate new insights, uncover model flaws, and iteratively improve the initial conjecture and design (Box & Draper, 1987). A simple graphical representation of these four iterative steps is shown in Figure 1.
Part of the design phase consists of converting the model equations and rules into a computer code implementation. Currently, there are several frameworks available for developing individual-based models.
These frameworks are designed to address specific requirements such as usability (Tisue & Wilensky, 2004), flexibility or scalability (Luke, Cioffi-Revilla, Panait, Sullivan, & Balan, 2005; North et al., 2013), or support for multiple paradigms, such as AnyLogic (Emrich, Suslov, & Judex, 2007). Certainly, the most widespread framework in ecological modeling is NetLogo (Tisue & Wilensky, 2004), which is considered to provide an easier development environment, based on extensions to the Logo paradigm, especially suited for those who are not very familiar with modern programming languages. One of the main drawbacks of NetLogo is scalability: it tends to show some performance issues when simulating a large number of agents. On the other hand, the Repast Simphony framework has a steep learning curve but provides a fast and flexible Java-based environment with many interesting features for simulating large-scale computational ecology models. These features include, among other things, the integration with Weka, exporting the model output to the R environment, support for running distributed batch simulations, and some built-in facilities for parameter sweeping (North et al., 2013). Finally, Mason is, to some extent, very similar to Repast but less mature; it has been designed focusing on providing faster execution speeds (Luke et al., 2005). Of these frameworks, only AnyLogic provides integrated sensitivity analysis capabilities, whereas the other frameworks NetLogo, Repast, and Mason, which are all free software, do not have built-in support for sensitivity analysis.
The Repast framework is widely used in many different fields for building individual-based simulation models of dynamic processes (Gutfraind et al., 2015; Tack, Logist, Noriega Fernández, & Van Impe, 2015; Watkins, Noble, Foster, Harmsen, & Doncaster, 2015). In addition, Repast also has a framework for high-performance computing using the C++ programming language, with conceptual entities similar to those found in Repast-Java. Repast also has support for running GNU R code (Crawley, 2007; R Core Team, 2015) from inside the user interface but, until now, it has not been feasible to run Repast models from the R environment so as to control the model for implementing experimental designs, calibration, parameter estimation, and sensitivity analysis, therefore hindering a thorough and comprehensive validation of individual-based models developed using Repast Simphony.

| Sensitivity analysis
Because sensitivity analysis is a broad and complex subject, a thorough discussion would be lengthy and out of the scope of this work. Instead, we will try to provide a more amenable and practical approach, keeping the discussion at a general level but rigorous enough to let practitioners gain the knowledge required to understand, apply, and interpret the results. For a more detailed review, please refer to Saltelli, Tarantola, Campolongo, and Ratto (2004) and Pianosi et al. (2016). It is interesting to start the discussion by providing the exact meaning of some of the many expressions which are commonly used in the analysis of models. There are several terms used in the context of sensitivity analysis for which it is important to provide the formal meaning. For instance, the jargon of sensitivity analysis includes model calibration and parameter estimation, which many times are used as if they were equivalent, even though they have different objectives. Other terms such as uncertainty analysis, omitted variable bias, objective function, or cost function are also an important part of the SA lexicon.
Generally speaking, the objective of SA is to understand the effect of varying input factors on the model output (Saltelli et al., 2004).
Under this very general statement, we have a wide range of methods and techniques which are suitable for distinct kinds of models. In order to improve this definition, it is convenient to give a more formal definition of the entity which is the target of SA: the model. Formally speaking, a model is a functional relation between a number k of input factors, also called independent or predictor variables, and the output variable, sometimes referred to as dependent or response variable (Box & Draper, 1987), as depicted by the expression η = f(x_1, x_2, …, x_k), being η the average value of the response variable for any specific setting of the input factors x_i. Therefore, the value of a single model run is given by y = f(x_1, x_2, …, x_k) + ε, where ε is the difference between the value of y and the expected value E(y) = η. The error ε is a consequence of the stochasticity introduced by design in the structure of the model to capture the population variability. Finally, recognizing that most real-world models usually have more than one response variable, the structure of an individual-based model M can be generalized for n outputs as y_i = f_i(x_1, x_2, …, x_k) + ε_i, for i = 1, …, n.

FIGURE 1  The iterative model development life cycle. This figure shows the relationship between the modeling phases and their associated tasks when applied to an individual-based model

Therefore, being y_i some output of model M, the model calibration process consists in comparing these outputs to some reference values (Zeigler, Praehofer, & Kim, 2000), which are normally, in the case of ecological or biological studies, experimental or observed data. The target of the calibration process is minimizing the discrepancies between simulated and reference values. The function used for computing how far the output y_i is from the reference values is known as objective function or cost function.
There are many options for implementing the objective function, and the only requirement is that the return of the objective function should be inversely proportional to the quality of fit, zero being the return value for the perfect fit. Common implementations of objective functions are based on the definition of acceptable ranges, least squares, or even a combination of both. For instance, let y_i be the output of some hypothetical model M, assuming this variable represents the net reproductive rate R_0, and let the reference values R_y for the output variable fall between 0.8 and 1.2; hence, any y_i value within this interval is considered to have a perfect fit. Bearing this in mind, the cost function could be given by the following expression: cost(y_i) = 0 if 0.8 ≤ y_i ≤ 1.2, and cost(y_i) = 1 otherwise. That is what is known as a categorical calibration criterion (Thiele et al., 2014). The main drawback of this approach is that it does not provide any information about how far the response value is from the reference value. A better alternative is to apply some distance function d(y_i, R_y) to the output and the reference values, either standalone or in combination with categorical calibration. The most commonly used metric is one of the multiple forms of squared deviation, but any distance function can be alternatively employed as long as two properties hold: d(y_i, R_y) = 0 if y_i and R_y are equal, and d(y_i, R_y) > 0 when y_i and R_y are not equal.
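As an illustration of these criteria, the following sketch (written in plain Python for concreteness, independent of the R/Repast API) implements a categorical cost, a squared-deviation cost, and a combination of both; the interval bounds 0.8 and 1.2 follow the R_0 example above.

```python
def categorical_cost(y, lower=0.8, upper=1.2):
    # Categorical criterion: zero inside the acceptable range, one outside.
    return 0.0 if lower <= y <= upper else 1.0

def squared_cost(y, reference):
    # Distance-based criterion: zero only for a perfect fit, grows with the error.
    return (y - reference) ** 2

def combined_cost(y, reference, lower=0.8, upper=1.2):
    # Combination of both: the distance penalty applies only outside the range.
    return 0.0 if lower <= y <= upper else (y - reference) ** 2
```

Note that `squared_cost` satisfies both required properties of a distance function, while `categorical_cost` only distinguishes acceptable from unacceptable outputs.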
While calibration is a general term, meaning fundamentally the comparison of some value to a reference value, the term parameter estimation has a more subtle and specific goal. Parameter estimation is normally considered an inverse problem because the objective is finding the values of the model parameters providing the best adjustment to the reference values. In other words, knowing the expected values of the response variable, the target is estimating suitable values for the model parameters. Usually, the term parameter refers to the constants which are part of the model, with a clear distinction between parameters and independent variables (Beck & Arnold, 1977); for instance, in the growth differential equation dN/dt = rN, the model parameter would be only the growth rate r and the independent variable the time, but for the purpose of this study, we consider the model constants and independent variables indistinctly as parameters.
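The inverse-problem character of parameter estimation can be sketched in a few lines: given observations of N(t) and the solution N(t) = N_0 e^{rt} of the growth equation, a grid of candidate r values is searched for the one minimizing a least-squares cost (a Python sketch on synthetic data; a real study would use a proper optimizer instead of a fixed grid):

```python
import math

def estimate_growth_rate(times, observations, candidates):
    """Grid-search least-squares estimate of r in N(t) = N0 * exp(r * t),
    taking the first observation as N0."""
    n0 = observations[0]
    best_r, best_cost = None, float("inf")
    for r in candidates:
        cost = sum((n0 * math.exp(r * t) - y) ** 2
                   for t, y in zip(times, observations))
        if cost < best_cost:
            best_r, best_cost = r, cost
    return best_r

times = [0.0, 1.0, 2.0, 3.0, 4.0]
data = [math.exp(0.5 * t) for t in times]        # synthetic data with true r = 0.5
r_hat = estimate_growth_rate(times, data, [i / 100 for i in range(101)])
# r_hat → 0.5
```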
The two main objectives of sensitivity analysis are understanding how robust the model results are considering the existing uncertainties and quantifying the effect of the input factors on the variance of the output (Law, 2005; Pianosi et al., 2016; Saltelli et al., 2004). The intrinsic characteristics of individual-based models, which rely on mechanistic descriptions, favor the production of models with many subprocesses, state variables, and parameters. The design is normally based on incomplete knowledge, resulting in several levels of uncertainty in the model parameters, in the model response variables, and in the model structure itself. The model structure is also related to the identifiability problem, where not all model parameters can be uniquely estimated. Sensitivity analysis can also be used to assess the effect of the model structure on the output, considering the alternative model implementations as being another parameter. This can be useful for analyzing the omitted variable bias, which basically means that some parameter of the model can be over- or underestimated because another important parameter was not included in the model structure. The sensitivity analysis can be carried out letting the parameters vary over the full range of the parameter space or restricted to a small region close to the average value, respectively referred to as global sensitivity analysis and local sensitivity analysis. Sensitivity analysis can also be performed varying one factor at a time (OAT), leaving all others fixed, or varying all factors at the same time (AAT). The second method is required in order to capture interactions between parameters and nonlinear effects.
The central point of the SA methodology is the estimation of sensitivity indices or coefficients. The sensitivity coefficients allow the quantitative comparison of the contributions of distinct parameters to the model output. In its classical form (Beck & Arnold, 1977), the sensitivity index is defined as the first derivative of the output with respect to some model parameter x_i. Considering the general model y = f(X), being X the parameter vector of size k, the sensitivity index S_i is given by S_i = ∂y/∂x_i. It is also important to take into account that the partial derivatives can have different units; hence, it can be necessary to scale them in order to make them comparable. In this approach, input factors are perturbed one at a time, making this measure of sensitivity suitable for local SA (Pianosi et al., 2016).
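The derivative-based index can be approximated numerically by a central finite difference. The sketch below (plain Python; the model and the nominal point are hypothetical) also shows the usual scaling by x_i/y that makes indices of factors measured in different units comparable:

```python
def local_sensitivity(model, x0, i, h=1e-6):
    """Central finite-difference approximation of S_i = dy/dx_i at x0."""
    x_up, x_dn = list(x0), list(x0)
    x_up[i] += h
    x_dn[i] -= h
    return (model(x_up) - model(x_dn)) / (2.0 * h)

def scaled_sensitivity(model, x0, i, h=1e-6):
    # Scaling by x_i / y yields a dimensionless (elasticity-like) index,
    # making factors with different units comparable.
    return local_sensitivity(model, x0, i, h) * x0[i] / model(list(x0))

model = lambda x: x[0] ** 2 * x[1]   # hypothetical model; dy/dx_0 = 2*x_0*x_1
S0 = local_sensitivity(model, [2.0, 3.0], 0)     # analytic value: 12
E0 = scaled_sensitivity(model, [2.0, 3.0], 0)    # analytic value: 2
```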
Several methods to estimate sensitivity indices which are adequate for global sensitivity analysis are available, such as the metamodeling approach (Happe, Kellermann, & Balmann, 2006), correlation-based methods, regression-based methods, and the Fourier amplitude sensitivity test (FAST; Xu & Gertner, 2011); for a more in-depth discussion, please refer to Thiele et al. (2014), Saltelli et al. (2004), Saltelli (2008), Pianosi et al. (2016), and Pujol et al. (2015). Figure 2 shows how the different methods for assessing the importance of input factors in simulation models are related, also including screening techniques (Saltelli, Andres, & Homma, 1995; Bettonvil & Kleijnen, 1996). In this study, we will focus on those methods based on variance decomposition, which are suitable for a wide range of situations, including those commonly found in individual-based models, such as nonlinear mappings between input factors and output variables. In addition to first-order effects, the variance decomposition methods also allow the quantification of higher-order and total-order effects. Total-order effect indices are useful for the assessment of the interaction between factors which cannot be expressed by a simple linear superposition.
One of the main drawbacks of applying variance decomposition methods to large spatially explicit individual-based models is the requirement of a very high number of model evaluations in order to produce consistent results (Herman, Kollat, Reed, & Wagener, 2013). An alternative approach, in those cases where a fully quantitative analysis is impractical or computationally unfeasible, is the application of the Morris screening method. The Morris method delivers qualitative information allowing to rank the importance of input factors while requiring fewer model evaluations, in some cases one order of magnitude fewer than the Sobol method (Saltelli, 2008).
Sobol is a method for sensitivity analysis based on the decomposition of the variance of the model output and is particularly suitable for discovering the effect of high-order interactions between input factors. Interaction means nonlinearity, where the total effect of two input factors x_1 and x_2 on the model output Y is not equivalent to the sum of their individual effects. The general forms of the sensitivity indices for the Sobol method are shown in Equations 1 and 2, respectively the first-order and total-order indices: S_i = V_i/V(Y) (1) and S_Ti = 1 − V_~i/V(Y) (2),
where the terms V_i and V(Y) are, respectively, the variance contribution attributed to the ith parameter and the total variance, and V_~i is the variance contribution of all input parameters but the ith. The expression V(Y) − V_~i thus represents the part of the total variance which involves the ith parameter, and the total-order index S_Ti estimates the effect of that parameter, including all its interactions, on the variance (Saltelli, 2008).
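A minimal Monte Carlo sketch of these estimates may help fix the ideas. The Python code below uses the pick-freeze sampling scheme described by Saltelli (2008) on a hypothetical additive test model with uniform inputs, Y = 2x_1 + x_2, whose analytic first-order indices are 0.8 and 0.2; everything here is an illustration, not the R/Repast implementation.

```python
import random

def sobol_first_order(model, k, n, seed=42):
    """Monte Carlo pick-freeze estimate of the first-order Sobol indices
    of a model over k independent U(0, 1) input factors."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(k)] for _ in range(n)]
    B = [[rng.random() for _ in range(k)] for _ in range(n)]
    yA = [model(a) for a in A]
    f0 = sum(yA) / n                              # estimated mean E(Y)
    V = sum(y * y for y in yA) / n - f0 * f0      # estimated total variance V(Y)
    indices = []
    for i in range(k):
        # C_i takes all columns from B except the ith, which comes from A
        yC = [model(b[:i] + [a[i]] + b[i + 1:]) for a, b in zip(A, B)]
        Vi = sum(ya * yc for ya, yc in zip(yA, yC)) / n - f0 * f0
        indices.append(Vi / V)
    return indices

# Hypothetical additive test model Y = 2*x1 + x2: analytically S1 = 0.8, S2 = 0.2
S = sobol_first_order(lambda x: 2.0 * x[0] + x[1], k=2, n=100000)
```

For an additive model such as this one, the first-order indices sum to one; any shortfall of the sum from one indicates interaction effects captured only by the total-order indices.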
The total variance V(Y) of a model with n input parameters can be expressed as shown in Equation 3, as long as the precondition of orthogonality of the input factors holds: V(Y) = Σ_i V_i + Σ_i Σ_{j>i} V_ij + … + V_12…n (3),
being V(Y) the total variance of the model output and V_i, V_ij, …, V_12…n its partial variance components. The Morris elementary effects screening method (Morris, 1991) is considered to be more effective when the most significant input parameters are a small subset of the model parameters (Saltelli et al., 2004).
The original work of Morris (1991) defines the sensitivity measure in terms of so-called elementary effects. Therefore, considering a model with k input parameters and being x = (x_1, x_2, …, x_k) any value from the region of experimentation Ω, the elementary effect of the ith input factor is calculated according to Equation 4: EE_i = [y(x_1, …, x_i + Δ, …, x_k) − y(x)]/Δ (4).
The region of experimentation Ω is a grid defined by the number k of input factors and by the p discrete levels of every parameter. The recommendations for the values of p and Δ are, respectively, that the first should be an even number of levels and the second calculated by the expression Δ = p/(2(p − 1)) (Morris, 1991; Saltelli et al., 2004). The method summarizes the distribution of elementary effects of each factor with three metrics: the mean μ, the mean of absolute values μ*, and the standard deviation σ. These three metrics can be used to extract valuable information about the model behavior, in addition to ranking the input factors. For instance, a low value of μ together with a high value of μ* points out that the input factor under scrutiny possibly has a nonlinear behavior, producing effects with different signs in function of the system trajectory (Saltelli et al., 2004). A high value of μ indicates that the input has a monotonic effect on the model output.
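The elementary effects computation can be sketched as follows: a simplified Python version that, in each of r repetitions, perturbs every factor once from a random base point on the grid (the hypothetical test model combines a linear factor, whose elementary effect is constant, with a quadratic one, whose effects vary and therefore yield a larger σ).

```python
import random
import statistics

def morris_screen(model, k, p=4, r=10, seed=1):
    """Elementary effects on the unit hypercube with Delta = p/(2(p-1)),
    as recommended by Morris (1991). Returns (mu, mu_star, sigma)."""
    rng = random.Random(seed)
    delta = p / (2.0 * (p - 1))
    # base levels are chosen so that x_i + delta stays inside [0, 1]
    base_levels = [i / (p - 1) for i in range(p // 2)]
    effects = [[] for _ in range(k)]
    for _ in range(r):
        x = [rng.choice(base_levels) for _ in range(k)]
        y0 = model(x)
        for i in range(k):
            xi = list(x)
            xi[i] += delta
            effects[i].append((model(xi) - y0) / delta)
    mu = [statistics.mean(e) for e in effects]
    mu_star = [statistics.mean(abs(v) for v in e) for e in effects]
    sigma = [statistics.stdev(e) for e in effects]
    return mu, mu_star, sigma

# Hypothetical test model: a linear factor plus a quadratic (nonlinear) factor
mu, mu_star, sigma = morris_screen(lambda x: 2.0 * x[0] + x[1] * x[1], k=2)
```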
The sensitivity analysis methods require significant samples of the input space in order to provide reliable results. It is customary to choose some experimental design (Hicks, 1993) for generating the collection of input parameter sets needed for evaluating the model and allocating the variance contribution of every model parameter. The most commonly applied sampling schemas are based on random sampling, full factorial designs, or Latin hypercube sampling.
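A Latin hypercube sample stratifies each input dimension into n equally probable intervals and draws exactly one point per interval, pairing the strata at random across dimensions. A minimal Python sketch on the unit hypercube (rescaling to the actual parameter ranges is left to the caller):

```python
import random

def latin_hypercube(n, k, seed=7):
    """Draw an n-point Latin hypercube sample on the k-dimensional unit cube:
    every dimension is split into n strata and each stratum is hit exactly once."""
    rng = random.Random(seed)
    columns = []
    for _ in range(k):
        strata = list(range(n))
        rng.shuffle(strata)                      # random pairing across dimensions
        columns.append([(s + rng.random()) / n for s in strata])
    return [[columns[j][i] for j in range(k)] for i in range(n)]

X = latin_hypercube(10, 3)
```

Compared with purely random sampling, this guarantees that no region of any single parameter's range is left unexplored for the same number of model evaluations.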

| OVERVIEW OF R/REPAST PACKAGE
In the previous sections, we have seen some fundamental ideas on model building and the role occupied by sensitivity analysis methods in the iterative modeling life cycle. We have also introduced the basic principles of sensitivity analysis, focusing on two main techniques, namely the Morris elementary effects screening (Morris, 1991) and the Sobol GSA method for variance decomposition (Saltelli, 2008).
Both methods have a wide range of applicability, making them suitable for the analysis of individual-based models. These methods require the model to be evaluated many times with different sets of input parameters, making it completely impractical to undertake the analysis by introducing individual parameters manually on a graphical user interface. Repast is an extremely flexible framework for the object-oriented development of AbM using the Java language, but it lacks model analysis tools. On the other hand, GNU R is a superb open-source tool for data analysis, with a vast and active community developing and adding new methods to the core R system.
Bearing this in mind, we introduce our package R/Repast, which brings together the best of both worlds. Roughly speaking, the R/Repast package has two main objectives: (1) provide an interface for running Repast models from R and gathering the generated simulation data and (2) provide higher-level functions for experimental design and for the analysis of the model output.

| Design
R/Repast was intended primarily for invoking Repast Simphony models from inside the GNU R environment. Additionally, the package contains more high-level and value-added features for experimental design and experiment analysis to address the specific needs of individual-based models. The underlying implementation idea is to provide a set of turnkey features for facilitating the task of applying sensitivity analysis to models. Functionally, the package consists of four modules which interoperate for instantiating and running the Repast code inside R. These four components are (1) the Repast Integration Broker, (2) the Repast Integration Engine, (3) the R Integration Wrapper, and finally, (4) the R API for Experiment Design. A schematic view of the package architecture is shown in Figure 3.
The R/Repast integration broker and the R/Repast engine are both written in Java code and are required for instantiating and loading the Repast Simphony model in batch mode. The R/Repast engine also contains the required hooks for transferring the model output data from the Java to the R environment. The engine can transfer data from aggregated datasets defined by the modeler on the Repast model. An aggregated dataset is a Repast Simphony entity used to collect data about the simulation model agents, which can be used for plotting or for saving the model output data to a file using a file sink. A file sink is a Repast component for saving simulation data to a file. The aggregated datasets use some kind of aggregate operation, such as counting, averaging, summing, or any other user-defined aggregate operation (North et al., 2013; North et al., 2005). The R integration wrapper is the R code linking together the R and Repast subsystems. This module consists of several wrapping functions encapsulating the Java code calls, implemented using the rJava package (Urbanek, 2016). These functions are prefixed with the Engine keyword and, although exported in the R/Repast package, they are not intended for general use.

| The R/Repast R API
The module entitled R/Repast R API is the primary entry point for user-defined code and relies on the subsystems mentioned previously for providing three groups of functionality that help modelers to analyze the simulation output. These groups of functionality are the following:

• Execution and control of Repast Simphony code.
• Basic functions for experimental design.
• High-level functions for running a complete experiment in one call.

The experimental setup functions are summarized in Table 2.
Finally, the third group contains the "Easy" API functions. These functions are intended to provide a complete method implementation which is accessible using just one R function call. The user has to provide the directory location where the Repast model is installed, the objective function, and the parameters relevant to the specific method. The currently available Easy API methods are presented in Table 3. The objective function is a user-defined R function over the model output, calculating and returning a cost metric for the simulation outputs of interest. The return of the objective function is the target for the application of the analysis method.

| The objective function interface
The

| EXAMPLES OVERVIEW
In the next sections, we will provide examples on how the R/Repast can help modelers on the analysis of their simulation models. Three T A B L E 2 The experimental setup Application Programming Interface functions. These functions are used for experimental design, parameter calibration, and sensitivity analysis

Function name Description
AddFactor (f, l, k, b, u) Creates the parameter collection for the experimental setup. The function requires the data frame (f) where parameter will be added, if this parameter is not provided, a new data frame will be created. The second parameter (l) is the random function used internally, the default value is runif which will be the valid choice in many cases, the next parameter (k) is the name of factor, the value provided must match some parameter defined in the repast model. (ODD) will be given for facilitating a general idea about these models. The ODD is a protocol (Grimm et al., 2006(Grimm et al., , 2010 which has been proposed as a standard way to specify and describe individual-based models. A brief description on the model structure and parameters will be given in order to allow the readers to understand the kind of These functions are the preferred entry point for the eventual users. These "Easy" functions lump together a complete experiment task in just one call, reducing the number of lines of code required

Function name Description
Easy.Stability (d, o, t, f, s, r, v, F) Evaluates the behavior of the model output in order to determine the minimum required number of replications for the chosen experimental setup. The function accepts the following parameters: the model installation directory (d)

| EXAMPLE 1: BACTOSIM
Normally, one of the advantages of using individual-based models for biological or ecological processes is the possibility of incorporating variability at the individual level. Therefore, unlike deterministic models, in order to get trustworthy results, the simulation must be repeated a number N of times until a stable value of the output variance is achieved. The objective of the first example is to show the application of a simple method for finding the minimal number of replications required for the variance of the response variables to become stable, converging to a common value. A straightforward way to determine output stability has been suggested in Thiele et al. (2014) and Lorscheid, Heine, and Meyer (2012).
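The criterion can be sketched as follows: replicate the model for increasing sample sizes, compute the coefficient of variation (CV = σ/μ) of the output at each size, and accept the first size at which the CV stops changing appreciably. The Python sketch below uses a dummy stochastic function in place of a real Repast run; the function names and the stopping threshold are illustrative assumptions, not part of R/Repast.

```python
import random
import statistics

def model(seed):
    # Stand-in for one stochastic simulation run (hypothetical;
    # a real run would execute the Repast model once).
    rng = random.Random(seed)
    return 100 + rng.gauss(0, 5)

def cv(values):
    # Coefficient of variation: sigma / mu.
    return statistics.pstdev(values) / statistics.mean(values)

def stable_sample_size(sizes, epsilon=0.005):
    # Return the first sample size at which the CV of the output
    # changes by less than epsilon with respect to the previous
    # size; None if the CV never stabilizes over the tested sizes.
    previous = None
    for n in sizes:
        current = cv([model(seed) for seed in range(n)])
        if previous is not None and abs(current - previous) < epsilon:
            return n
        previous = current
    return None
```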

| Model description
The model description follows the ODD protocol for describing individual-based models (Grimm et al., 2006, 2010). The model is implemented in the Java language using the Repast Simphony agent-based simulation framework (North et al., 2013).

| Purpose
The objective of this model is the assessment of the best strategy for modeling and implementing the conjugation rule which provides the best fit to experimental data and better captures the most plausible process structure.

| Entities, state variables, and scales
Finally, the environment holds the initial nutrient concentration for every lattice cell. At model initialization, a fixed amount of substrate particles is distributed evenly over all lattice sites.

| Process overview and scheduling
The dynamics of bacterial conjugation is modeled as the execution of the following set of cellular processes: the cellular division, the T4SS pili expression, the shoving relaxation, which prevents bacterial cells from overlapping and allows a more realistic colony growth, and the conjugation process. The state variable update is asynchronous. The order of execution of these processes is shuffled to avoid any bias due to a purely sequential execution of the model rule base, see Figure 5. The conjugation process is modeled in three different ways with respect to the time when the conjugation event is most prone to happen, and the results are compared. Thus, the conjugation is defined by two variables: the value of the intrinsic conjugation rate (γ 0 ), which determines how many
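One plausible reading of the shuffled asynchronous update described above can be sketched as follows (Python, with hypothetical names): the agent order is re-shuffled at every tick so that no agent systematically acts before the others.

```python
import random

def step(agents, processes, rng):
    # One simulation tick with asynchronous updating: the agent
    # order is re-shuffled every tick so that no agent systematically
    # executes its rule base before the others.
    order = list(agents)           # copy, so the caller's list is untouched
    rng.shuffle(order)
    for agent in order:
        for process in processes:  # e.g. division, pili expression,
            process(agent)         # shoving relaxation, conjugation
```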

| Design concepts
Basic Principles: Three models differing in the way the conjugation rule is implemented are compared to the available experimental data. The best strategy can be used to build models which could serve as a predictive tool for synthetic biology and to explore aspects which are hard to observe directly in experimental studies of plasmid spread. The key point of this model lies in the idea of the existence of a local or intrinsic conjugation rate, which has been termed γ 0 . Sensing: All processes defined over the agents implicitly sense the local environment and the close neighborhood for their decisions.
Interaction: Bacterial cells interact with their nearby individuals for nutrient access, cellular division, mate pair formation, and plasmid transfer.
Stochasticity: Stochasticity is introduced at the individual level for all cellular processes by sampling a normal deviate and fitting the value to the corresponding process.
Collectives: No collectives are taken into account in this model.
Observation: The output target variables will be saved at intervals of 1 min of simulated time.

| Initialization
The simulation model is initialized with a population of plasmid-free (R) and plasmid-bearing (D) cells according to the input parameters. The agents are placed randomly within a circular surface centered over the central lattice position. The radius of the circle where agents are placed is calculated as a function of N 0 in order to be consistent with the desired initial cell density (Zhong, Droesch, Fox, Top, & Krone, 2012). The simulation environment is also initialized with a number of nutrient particles sufficient to support half of the estimated number of cellular divisions; the rationale behind this is to capture the intercellular competition for nutrient access.
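The text does not give the exact expression for the seeding radius. Assuming the cells are spread uniformly over a disc, the density is ρ = N0/(πr²), which gives r = √(N0/(πρ)); the sketch below encodes this assumed reading, not a formula quoted from the model.

```python
import math

def initial_radius(n0, density):
    # Radius of the circular seeding area such that n0 cells spread
    # uniformly over the disc yield the requested cell density.
    # Derived from density = n0 / (pi * r**2); an assumed reading
    # of the paper, not a formula quoted from it.
    return math.sqrt(n0 / (math.pi * density))
```

Under this reading, doubling N 0 at a fixed density scales the radius by √2, so the initial colony footprint grows with the inoculum size.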

| Model analysis
The objective of the stability analysis is to find the minimum number of replications required for the output variance to stabilize. All analysis methods return a list holding three objects, namely the experiment, the object, and the charts. The experiment contains the simulation parameters and results, the object is method specific, and, finally, the charts are pregenerated graphs for the method results. The method automatically generates one chart for each model output. One of the output charts of the model is shown in Figure 7 for the variable named X.Simulated. As can be observed, the coefficient of variation of this variable decreases as the sample size grows.

F I G U R E 7 Simulation output stability

| Entities, state variables, and scales
The model comprises three entities or agent types: the wolves, the sheep, and the grass. These agents evolve in a computational domain of 50 × 50 units with periodic boundaries, representing a large portion of space. The wolf and sheep agents are positioned in a continuous bidimensional space and are free to move. The grass agents, on the other hand, are placed on a discrete grid.

| Process overview and scheduling
The agents are defined by the execution of a set of processes depicting the agent movement and search for a food source, the consumption of food, the process incrementing the agent reserves, the reproduction, and, finally, the death process driven by predation or starvation. The fundamental idea behind the model formulation is that both predator and prey individuals increment their "energy" levels by predation or by consuming the available grass, respectively. Both agent types search for their food in the current patch where they are placed. The agents move one unit of space at a time, selecting their heading randomly.
The individual-based version of this model is a spatially explicit representation and has a few more parameters but is still very succinct. The model parameters are listed in Table 5.
The original formulation of Lotka-Volterra consists of a system of two differential equations with four parameters, namely the predator and the prey growth rates, the effect of the predator on the prey growth, and, finally, the effect of the prey on the predator growth, as can be seen in Equation 9.
There is a conceptual correspondence between the predator c 2 and prey c 1 growth rates with the model parameters wolfreproduce and sheepreproduce as well as between the parameter wolfgainfromfood and the constant c 4 .
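Equation 9 is not reproduced in this excerpt; the classical four-parameter Lotka-Volterra system consistent with the description above reads, in the standard textbook form (an assumption with respect to the paper's exact notation and sign conventions), dx/dt = c1·x − c3·x·y for the prey and dy/dt = −c2·y + c4·x·y for the predator. A minimal explicit-Euler integration sketch:

```python
def lotka_volterra_step(x, y, c1, c2, c3, c4, dt):
    # One explicit-Euler step of the classical four-parameter system:
    #   dx/dt = c1*x - c3*x*y    (prey)
    #   dy/dt = -c2*y + c4*x*y   (predator)
    dx = (c1 * x - c3 * x * y) * dt
    dy = (-c2 * y + c4 * x * y) * dt
    return x + dx, y + dy

def simulate(x0, y0, c1, c2, c3, c4, dt=0.001, steps=10000):
    # Integrate the deterministic skeleton and record the trajectory.
    x, y = x0, y0
    trajectory = [(x, y)]
    for _ in range(steps):
        x, y = lotka_volterra_step(x, y, c1, c2, c3, c4, dt)
        trajectory.append((x, y))
    return trajectory
```

With these equations, the populations oscillate around the fixed point (x*, y*) = (c2/c4, c1/c3); an individual-based run adds demographic and spatial noise on top of this deterministic skeleton, which is why sensitivity analysis of the parameters is needed at all.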

| Model analysis
The implementation code for the Morris screening exercise is shown in Figure 8 and, as has been mentioned in the previous example, follows the same sequence of steps, starting with the library loading and the selection of the random seed. Subsequently, we define the objective function. The chart of μ versus σ for the model output is shown in Figure 10.
It seems to provide very similar results, and the only significant difference is the contribution of grassregrowthtime. This input parameter was considered important, but here, it has a negative value. In order to interpret this sensitivity measure, we must recall that μ * takes the absolute values of the elementary effects.
Therefore, the elementary effects of grassregrowthtime possibly have opposite signs depending on the values of that input parameter.
Finally, we have Figure 11 showing the chart of μ* versus σ.
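The cancellation effect just described can be reproduced with a crude one-at-a-time Morris-style sketch (a toy function stands in for the simulation; this is not the sampling design used internally by R/Repast): a non-monotone input yields elementary effects of opposite signs, which cancel in μ but not in μ*.

```python
import random
import statistics

def elementary_effects(f, k, r, delta=0.1):
    # Crude Morris-style screening with r one-at-a-time samples:
    # for each of the k inputs, record the elementary effect
    # (f(x + delta*e_i) - f(x)) / delta at random base points in
    # [0, 1 - delta]^k.  Returns (mu, mu_star, sigma) per input:
    # the mean, the mean of absolute values, and the standard
    # deviation of the elementary effects.
    effects = [[] for _ in range(k)]
    for _ in range(r):
        x = [random.uniform(0, 1 - delta) for _ in range(k)]
        fx = f(x)
        for i in range(k):
            xi = list(x)
            xi[i] += delta
            effects[i].append((f(xi) - fx) / delta)
    return [(statistics.mean(e),
             statistics.mean(abs(v) for v in e),
             statistics.stdev(e)) for e in effects]
```

For a toy output such as f(x) = (x0 − 0.5)² + x1, the elementary effects of the first input change sign around x0 = 0.5, so its μ is close to zero while its μ* remains clearly positive: exactly the pattern discussed above for grassregrowthtime.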

| Purpose
The objective of this model is to explore the conditions under which two plasmids can coexist in a population while competing for a common resource required for their horizontal transfer. The common resource is the set of genes required for conjugation, because one of the two plasmids has lost these genes.

| Entities, state variables, and scales
The model uses two entity types, namely the agents representing the bacterial cells and a ValueLayer, which is a Repast-specific structure for storing values spread over the simulation space.

| Process overview and scheduling
Every bacterial cell in this model is abstracted as the execution of a series of successive processes capturing the basic tenets of the bacterial life cycle. These processes are the nutrient uptake, the bacterial cell growth, the division, and the conjugation. The input parameters required for initializing the model are shown in Table 6.

| Design concepts
Basic Principles: The plasmid dispersion depends on an intricate balance between the metabolic costs associated to horizontal and vertical transmission (Rozkov et al., 2004), but there is no consensus on the valid ranges of the metabolic costs imposed by the conjugative process. Therefore, in this model, the short-term dynamics of the two-plasmid system P1 and P2 is simulated.
The plasmid P1 is a complete conjugative plasmid containing all the genes required for horizontal transfer, while the plasmid P2 is a cheater which, having lost its conjugative genes, depends on the T4SS system of plasmid P1. In other words, the model is used to assess how large the cost difference must be for the lack of the conjugative apparatus to become a true competitive advantage, making P2 dominate over P1.
Emergence: The colony growth pattern, the population distribution, and the dominance of a plasmid over another on the bacterial population are global properties arising from local properties defining the agent behavior and the interaction constraints.
Adaptation: All agents adapt their growth rate, as well as the conjugation rates, according to the local availability of nutrient.
Fitness: The bacterial cells infected by any plasmid are considered to behave less efficiently than the plasmid-free cells. The fitness of plasmid-bearing cells is explicitly specified by the cost input parameters.

| Model analysis
The global sensitivity analysis using the Sobol variance decomposition method for the T4SS common pool model is shown in Figure 12. We can observe the same sequence of steps which has been described in the previous examples.

T A B L E 6 The input parameter collection for the conjugative plasmid common pool model

Figure 13 shows the first- and total-order indices for the model output P1. That output is the average number of bacterial cells infected just by the plasmid P1. As can be observed, the most important input parameter is the bacterial cell doubling time, followed by the probability P(γ 0 ). This is an expected result, as the rule for the conjugative transfer requires that the bacterial cells have achieved a value of around 70% of the cell mass at division. Another interesting aspect to note is the negative values of some first-order indices. Obviously, the sensitivity indices should not be negative; this is a consequence of a small sample size, and to correct the problem, we must increase it. The other important input factors for the plasmid P1 output are, in order of importance, the probability P(γ 0 ), the cost of plasmid P2, and the cost of plasmid P1, the latter two with similar sensitivity indices.
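The small-sample origin of negative first-order estimates can be illustrated with a generic Monte Carlo pick-and-freeze (Saltelli-style) estimator of the Sobol indices, sketched below on a toy function; this is the textbook scheme, not the R/Repast internals. The estimator is only asymptotically exact, so a small n can push an index estimate below zero even though the true index is non-negative.

```python
import random

def sobol_first_order(f, k, n, seed=0):
    # Monte Carlo first-order Sobol indices (pick-and-freeze):
    #   S_i = E[f(B) * (f(AB_i) - f(A))] / Var[f(A)]
    # where AB_i is the sample matrix A with column i taken from B.
    # With a small n the estimates are noisy and may be negative.
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(k)] for _ in range(n)]
    B = [[rng.random() for _ in range(k)] for _ in range(n)]
    fA = [f(x) for x in A]
    fB = [f(x) for x in B]
    mean = sum(fA) / n
    var = sum((v - mean) ** 2 for v in fA) / n
    indices = []
    for i in range(k):
        fABi = [f(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        vi = sum(fb * (fab - fa) for fb, fab, fa in zip(fB, fABi, fA)) / n
        indices.append(vi / var)
    return indices
```

For f(x) = 4x0 + x1 with independent uniform inputs, the analytical indices are S0 = 16/17 and S1 = 1/17, which the estimates approach as n grows; rerunning with a very small n shows the same kind of noisy, occasionally negative values discussed above.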
The first- and total-order indices for the model output showing the average population size of plasmid P2 can be seen in Figure 14. It is possible to appreciate again that the most important factor is the length of the cellular cycle. The reason is simple and can be attributed to the fact that plasmid P2 alone is only transferred vertically and depends on plasmid P1 for horizontal transmission, both aspects being related to the cell cycle. Following the doubling time in importance, we have the cost of plasmid P1, the cost of plasmid P2, and the probability P(γ 0 ), the sensitivity index of the P1 cost being noticeably higher than the other two. This can probably be attributed to the fact that the plasmid P2 requires a significant cost difference in order to outcompete the plasmid P1, which also transfers vertically.
Finally, in Figure 15, we have the sensitivity indices for the model output accounting for the average population size of bacterial cells infected by both plasmids. The importance of the factors is consistent with the explanations for the previous sensitivity indices. Again, the most important model parameter is the doubling time of the bacterial cell, followed by the P1 and P2 cost parameters and by the probability P(γ 0 ).

| CONCLUSIONS
Ecological modeling is a complex subject which is often perceived as being simpler than it actually is. Specifically, individual-based models are subject to many levels of uncertainty, which means that it is hard to completely fix the values of the model inputs, the model structure, and the outputs. Normally, there is no complete experimental or observational data with which to construct mechanistic descriptions of individuals, and therefore, many assumptions and simplifications must be made in order to implement a model. The same is true regarding the input values, which are particularly critical in the case of the ecology of microorganisms, as normally only very few input parameters are directly observed and most of them are estimated from whole-population experiments. Therefore, it is always important to bear in mind that modeling is an iterative task which must compulsorily incorporate some what-if analysis of the model outputs. Several methods exist for assessing the uncertainty and for estimating the relative importance of the input parameters in the model output. We have provided here an overview of those methods which are based on the variance decomposition, because they have a wider application scope and are specifically suitable for use on individual-based models. These methods, although conceptually simple, are computationally intensive and can be somewhat hard to apply because the required tools are either unavailable or do not provide an easy integration pattern. Roughly speaking, the sensitivity analysis methods require the generation of a large sample of the parameter space and the evaluation of the model for each sample point, which, of course, makes manual execution an infeasible option.
The in-silico experimentation is becoming a vital tool for understanding complex phenomena in a way that cannot be achieved without modeling. The effective application of computational ecology methods requires a high level of proficiency in many diverse domains of knowledge, which is sometimes neither feasible nor practical. Therefore, it is indispensable to have a ready-to-use arsenal of reusable computational tools for modeling and analysis. In this study, we have introduced the R/Repast package and showed how it can help modelers to improve the robustness and quality of individual-based model results by using the functionalities inside the package for systematically analyzing the model outputs. The package can save much effort for modelers by providing simple wrappers for complex methods within a simple and consistent API. We hope that these R/Repast functionalities can greatly facilitate the systematic analysis of individual-based models implemented in Repast.

F I G U R E 1 4 Results of the Sobol variance decomposition method for the T4SS common pool model. The graph shows sensitivity measures containing the first-order (a) and total-order (b) indices for the bacterial population infected by the P2 plasmid

ACKNOWLEDGMENTS
This study was supported by the European FP7-ICT-FET EU research project: 612146 (PLASWIRES "Plasmids as Wires" project) www.

CONFLICT OF INTEREST
None declared.

NOTES
1 Not to be confused with population mean and standard deviation.
2 Also known as relative standard deviation, given by CV = σ/μ, which provides a normalized version of the standard deviation expressed relative to the output mean.