SimAdapt: an individual-based genetic model for simulating landscape management impacts on populations

Authors

  • François Rebaudo,

    Corresponding author
    1. Biodiversité et évolution des complexes plantes-insectes ravageurs-antagonistes, IRD-BEI-UR072, Gif-sur-Yvette Cedex, France
    2. Laboratoire Évolution Génome et Spéciation, CNRS-LEGS-UPR9034, Gif-sur-Yvette Cedex, France
  • Arnaud Le Rouzic,

    1. Laboratoire Évolution Génome et Spéciation, CNRS-LEGS-UPR9034, Gif-sur-Yvette Cedex, France
  • Stéphane Dupas,

    1. Biodiversité et évolution des complexes plantes-insectes ravageurs-antagonistes, IRD-BEI-UR072, Gif-sur-Yvette Cedex, France
    2. Laboratoire Évolution Génome et Spéciation, CNRS-LEGS-UPR9034, Gif-sur-Yvette Cedex, France
  • Jean-François Silvain,

    1. Biodiversité et évolution des complexes plantes-insectes ravageurs-antagonistes, IRD-BEI-UR072, Gif-sur-Yvette Cedex, France
    2. Laboratoire Évolution Génome et Spéciation, CNRS-LEGS-UPR9034, Gif-sur-Yvette Cedex, France
  • Myriam Harry,

    1. Laboratoire Évolution Génome et Spéciation, CNRS-LEGS-UPR9034, Gif-sur-Yvette Cedex, France
  • Olivier Dangles

    1. Biodiversité et évolution des complexes plantes-insectes ravageurs-antagonistes, IRD-BEI-UR072, Gif-sur-Yvette Cedex, France
    2. Laboratoire Évolution Génome et Spéciation, CNRS-LEGS-UPR9034, Gif-sur-Yvette Cedex, France
    3. Facultad de Ciencias Naturales y Biológicas, Pontificia Universidad Católica del Ecuador, Quito, Ecuador

Summary

  1. Simulation models are essential tools in landscape genetics to study how genetic processes are affected by landscape heterogeneity. However, there is still a need to develop different simulation approaches in landscape genetics, so that users have additional programs at their disposal to further explore the impact of land-use and land-cover changes on population genetics.
  2. We developed a spatially explicit, individual-based, forward-time, landscape-genetic simulation model combined with a landscape cellular automaton to represent evolutionary processes of adaptation and population dynamics in changing landscapes, using the NetLogo environment.
  3. This simulation model represents a unique tool for scientists and scholars looking for a practical and pedagogical framework to explore both empirical and theoretical situations.

Introduction

Evolutionary ecologists have increasingly recognized the importance of local adaptive genetic variation to predict population response to changing environmental conditions (Manel & Segelbacher 2009). During the last 5 years, the field of landscape genetics has developed numerous simulation modelling approaches to better understand population adaptation in natural conditions (Balkenhol & Landguth 2011), in particular through backward-time simulation models allowing landscape-genetic inferences (e.g. ibdsim, Leblois, Estoup & Rousset 2008; splatche2, Ray et al. 2010; simssd, Legendre, Borcard & Peres-Neto 2005; see Hoban, Bertorelle & Gaggiotti 2012 for review). In parallel, population-genetic, forward-time simulation models (e.g. cdpop, Landguth & Cushman 2010b; simupop, Peng & Kimmel 2005; nemo, Guillaume & Rougemont 2006; quantinemo, Neuenschwander, Hospital & Goudet 2008; see Hoban, Bertorelle & Gaggiotti 2012 for review) have been developed to explore and test empirical hypotheses in silico. However, in a context where equilibrium is the exception, especially in landscapes under anthropogenic disturbance (Epperson et al. 2010), the possibility of modifying landscape characteristics in most simulation software remains rudimentary or only accessible to skilled programmers (e.g. cdpop, splatche2). Although identifying appropriate spatial and temporal scales remains a challenge (Balkenhol et al. 2009), a flexible, spatially explicit landscape submodel for landscape genetics is required so that realistic simulations can be studied and compared (Balkenhol & Landguth 2011).

To help fill this gap, we developed a landscape-genetic, forward-time simulation model in which users can easily and explicitly represent landscape changes over space and time, thereby simulating the genetic dynamics of individuals in landscapes under different management scenarios (see Table 1 for a comparison of simadapt with splatche2 and cdpop). The main innovations of our model are the dispersal submodel (a behavioural model allowing complex movements, see Supporting information) and the natural selection submodel, which can include any number of loci under selection (most traits are known to result from multiple loci). The uniqueness of our software lies in the coupling of two submodels: a population-genetic submodel taking advantage of an individual-based approach (see Landguth et al. 2010), and a cellular automaton landscape submodel. The latter allows the integration and exploration of complex landscape scenarios, drawing on the large existing literature on land-use and land-cover changes (e.g. Parker et al. 2003; Crespo-Pérez et al. 2011). The simulation model is user-friendly so as to target a broad audience of scientists [graphical user interface (GUI), see Fig. 1], cross-platform and freely available on the openabm website (http://www.openabm.org/models/). It is written in the high-level language netlogo (Wilensky 1999), which makes the annotated source code easy to extend and navigate (see Railsback & Grimm 2012 for an introduction to individual-based models and netlogo). In addition, it generates output files directly readable by the most widely used, up-to-date population genetics programs (see Excoffier & Heckel 2006 for a selected list), and can be run from the free statistics software r (R Development Core Team 2012; see Thiele, Kurth & Grimm 2012, and Appendix S1, Supporting information). Here, we describe the model and its implementation, and illustrate its potential uses.

Table 1. Features of cdpop (v1.2.08beta), simadapt and splatche2 (v2.01)

| Feature | cdpop (Landguth & Cushman 2010b) | simadapt | splatche2 (Ray et al. 2010) |
| --- | --- | --- | --- |
| Operating system and interface | Linux, Windows (GUI and console) | Any OS with Java virtual machine (GUI, console and r package) | Linux (console), Windows (GUI and console) |
| Model | Individual-based model | Individual-based model | Coalescent model |
| Data type | Allele frequencies, mtDNA option | STRs, SNPs | DNA sequences, SNPs, STRs, RFLPs |
| Dispersal model | Dispersal distance functions (sex-specific) | Migration rate and dispersal distance (behavioural submodel) | Mutation rate |
| Population dynamics | Fluctuating population size: carrying capacity, birth and death rates | Logistic growth model | Logistic growth model |
| Natural selection | One or two biallelic loci (% offspring mortality per habitat type) | Any number of biallelic loci per habitat type (s and h coefficients) | No |
| Landscape dynamics | Input files loaded at defined generations: landscape resistance (mating and dispersal) | Independent submodel (cellular automaton): natural selection, carrying capacity, landscape resistance with built-in flexible scenarios of land-use and land-cover changes | Input files loaded at defined generations: carrying capacity, landscape resistance |
| Sensitivity analysis module | Option to perform multiple runs and explore parameters | Dedicated software within NetLogo to explore parameters with multiple runs (behaviour space) | Option to perform multiple runs |
| Output files | Input files for Structure, GenePop, Genalex; csv raw files and built-in analyses | Input files for Structure, FSTAT, Arlequin, GenePop, Geneland; csv raw files and virtual sampling model | Input files for Arlequin, Nexus; built-in ascii/bmp files (analyses) |

GUI, graphical user interface.

Figure 1.

Graphical user interface from simadapt in different operating systems. simadapt has a simple layout of options divided into areas such as: (1) working directory; (2) landscape features; (3) genetic features; (4) simulation features; and (5) visualization.

Components of SimAdapt

SimAdapt simulates the evolution of both neutral and adaptive genotypes of diploid, sexually reproducing individuals introduced in a landscape. It accounts for the transmission of genes according to Mendelian inheritance laws, dispersal and adaptation to local conditions. This model combines a landscape cellular automaton submodel (CA) and an individual-based, spatially explicit submodel (IBM). Each time step in the model corresponds to one generation for individuals (with non-overlapping generations).

Cellular automaton

The CA is an area with closed boundaries and user-defined dimensions. It includes a three-layer georeferenced information system (allowing the integration of text files from classic GIS programs) to characterize the landscape: (i) the available resources (carrying capacity for individuals), (ii) the landscape resistance (permeable or semi-permeable barriers to migration) and (iii) the type of habitat (for natural selection). The landscape characteristics can vary over space and time according to five predefined scenarios: L1) no changes in the type of habitat; L2) random changes; L3) and L4) changes to an adjacent habitat type, using the Moore (eight nearest neighbours) and the von Neumann (four nearest neighbours) neighbourhoods, respectively (von Neumann 1948); and L5) transition towards a homogeneous landscape consisting of one habitat type. Additional scenarios can be defined with basic programming skills, and habitat type changes over time can be set through the GUI (see Fig. S2 in Supporting information).
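As an illustration of scenarios L3 and L4, the following Python sketch (not the NetLogo source) updates a habitat grid by letting cells adopt the habitat type of a randomly chosen neighbour. The function name `step_habitat` and the per-cell change probability `change_prob` are assumptions introduced for this example; the actual transition rules are described in the Supporting information.

```python
import random

# Neighbour offsets (row, column) for the two neighbourhoods named in the text.
MOORE = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
VON_NEUMANN = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def step_habitat(grid, offsets, change_prob, rng):
    """Return a new grid after one landscape time step.

    Each cell, with probability change_prob (a hypothetical parameter),
    takes the habitat type of one randomly chosen neighbour. Boundaries
    are closed: cells outside the grid are never considered.
    """
    rows, cols = len(grid), len(grid[0])
    new_grid = [row[:] for row in grid]  # synchronous update: read old, write new
    for r in range(rows):
        for c in range(cols):
            if rng.random() >= change_prob:
                continue  # this cell keeps its habitat type
            neighbours = [(r + dr, c + dc) for dr, dc in offsets
                          if 0 <= r + dr < rows and 0 <= c + dc < cols]
            if neighbours:
                nr, nc = rng.choice(neighbours)
                new_grid[r][c] = grid[nr][nc]
    return new_grid
```

Passing `MOORE` mimics scenario L3 and `VON_NEUMANN` scenario L4; a homogeneous grid is left unchanged by construction, since every neighbour carries the same habitat type.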

Individual-based model

The IBM represents the individuals living in the landscape (parameters of spatial dynamics and local adaptation). Initially, individuals are located either at a given set of coordinates or homogeneously over the landscape. They are characterized by a dispersal capability affecting the spatial pattern (see Supporting information for a detailed description and Epperson et al. 2010 for a discussion). Briefly, an individual can move from one cell to another located among its eight nearest neighbours, up to a maximal dispersal distance (i.e. individuals can potentially move across multiple cells). Each cell is characterized by its resistance (from permeable to impermeable cells), and each individual is characterized by its dispersion capabilities. The decision whether to move to another cell is based on a fixed probability (i.e. migration rate), and the destination cell is chosen randomly among potential destinations (see Fig. 2).
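The dispersal rule can be sketched in Python as follows. This is a simplified reading, not the behavioural submodel itself, which is detailed in the Supporting information. The sketch assumes that entering a cell costs that cell's resistance, that a destination is reachable when some path of at most `max_dist` moves has a cumulative cost no greater than the individual's dispersion capability, and that all reachable cells are equally likely. The function names and this cost rule are hypothetical.

```python
import random

# The eight nearest neighbours of a cell, as (row, column) offsets.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def potential_destinations(resistance, start, capability, max_dist):
    """Cells reachable from start in at most max_dist neighbour moves,
    with cumulative resistance (assumed cost rule) <= capability."""
    rows, cols = len(resistance), len(resistance[0])
    reachable = set()
    frontier = {start: 0}  # cheapest cost to reach each cell in exactly k moves
    for _ in range(max_dist):
        next_frontier = {}
        for (r, c), cost in frontier.items():
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols):
                    continue  # closed boundaries
                new_cost = cost + resistance[nr][nc]
                if new_cost > capability:
                    continue  # too resistant for this individual
                if new_cost < next_frontier.get((nr, nc), float("inf")):
                    next_frontier[(nr, nc)] = new_cost
        reachable.update(next_frontier)
        frontier = next_frontier
    reachable.discard(start)
    return sorted(reachable)


def disperse(position, resistance, capability, max_dist, migration_rate, rng):
    """Move with probability migration_rate to a random potential destination."""
    if rng.random() >= migration_rate:
        return position  # the individual stays in its cell
    options = potential_destinations(resistance, position, capability, max_dist)
    return rng.choice(options) if options else position
```

With a uniform resistance of 50, a dispersion capability of 100 and a maximal dispersal distance of two cells, every cell within two moves of the starting cell is a potential destination under these assumptions.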

Figure 2.

Dispersal representation of individuals across the landscape. An individual is located in initial cell with a dispersion capability of 100 and a maximal dispersal distance of two cells. Potential destinations are represented in dark grey and movements with arrows.

Markers under selection

Reproductive fitness of individuals is the result of habitat-dependent selection. The population-genetic submodel within the IBM assumes one or several bi-allelic loci under selection per habitat type. Alleles are either counter-selected (represented by ‘0’) or selected (represented by ‘1’) in each habitat type. Being adapted to a given habitat is considered advantageous while the individual is in this habitat, but harmful in other habitats. This allows the program to simulate multiple loci, which is important for analysing selection and speciation (Epperson et al. 2010). Following the notation used by Hartl & Clark (2007), the value s is the selection coefficient against the homozygous G00 genotype (where the indices refer to the first and second allele at the locus k under selection) and h is the degree of dominance of the 0 allele.

Consequently, at the locus k associated with habitat type b1, with b2 representing another habitat type, the relative fitnesses w of genotypes G11, G10 and G00 for individuals located in habitat types b1 and b2, respectively, are:

display math(eqn 1a)
display math(eqn 1b)
display math(eqn 1c)

A multiplicative model with no epistasis is assumed, and the selective value of a given genotype is the product of the selective values at each locus under selection (Wade et al. 2001):

display math(eqn 2)

with I an individual on habitat type b1.
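The selection scheme can be sketched in Python under explicit assumptions. In the habitat associated with the locus, the sketch uses the standard Hartl & Clark parameterization (fitness 1, 1 − hs and 1 − s for G11, G10 and G00). In any other habitat it assumes the mirrored values 1 − s, 1 − (1 − h)s and 1, so that h remains the dominance of the 0 allele; this mirrored form is an assumption of the example, not a transcription of the article's equations. For simplicity, a single pair of s and h values is shared by all loci.

```python
def locus_fitness(genotype, in_home_habitat, s, h):
    """Relative fitness at one biallelic locus.

    genotype: tuple of two alleles, each 0 (counter-selected) or 1 (selected).
    in_home_habitat: True if the individual sits in the habitat type the
    locus is associated with.
    """
    n_zero = genotype.count(0)
    if in_home_habitat:
        # Hartl & Clark notation: G11 -> 1, G10 -> 1 - hs, G00 -> 1 - s
        return (1.0, 1.0 - h * s, 1.0 - s)[n_zero]
    # Assumed mirror image: carrying the '1' allele is harmful elsewhere.
    return (1.0 - s, 1.0 - (1.0 - h) * s, 1.0)[n_zero]


def individual_fitness(genome, locus_habitats, habitat, s, h):
    """Multiplicative fitness across loci (no epistasis).

    genome: list of genotypes, one per locus under selection.
    locus_habitats: habitat type associated with each locus.
    habitat: habitat type of the cell the individual occupies.
    """
    w = 1.0
    for genotype, home in zip(genome, locus_habitats):
        w *= locus_fitness(genotype, home == habitat, s, h)
    return w
```

For example, an individual homozygous 11 at the b1 locus and 00 at the b2 locus has fitness 1 in habitat b1 and (1 − s)² in habitat b2 under these assumptions.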

For each location (grid cell) in the landscape, individuals (the parental generation) produce N offspring, with N following a logistic growth function based on the number of individuals in the considered location, a user-defined intrinsic growth rate and a carrying capacity defined by the habitat resources. For each offspring, two parents are drawn randomly among parental individuals at the same location, proportionally to their fitness, and a gamete is generated from each parent (allowing the genealogy to be reconstructed from output files). The genetic transmission follows the Mendelian inheritance laws, assuming free recombination between loci.
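The reproduction step for one grid cell can be sketched in Python as follows. The sketch assumes a discrete logistic form N + rN(1 − N/K) rounded to the nearest integer, and draws the two parents with replacement (so selfing is possible); both choices are assumptions of this example, and the function names are hypothetical.

```python
import random


def offspring_count(n, growth_rate, capacity):
    """Number of offspring from a discrete logistic model (assumed form)."""
    if n == 0 or capacity <= 0:
        return 0
    expected = n + growth_rate * n * (1.0 - n / capacity)
    return max(0, round(expected))


def make_gamete(genome, rng):
    """Pick one allele per locus independently (free recombination)."""
    return [rng.choice(locus) for locus in genome]


def reproduce_cell(parents, fitnesses, growth_rate, capacity, rng):
    """Build the offspring generation of one cell.

    parents: list of genomes, each a list of (allele, allele) tuples.
    fitnesses: one non-negative weight per parent, not all zero.
    """
    offspring = []
    for _ in range(offspring_count(len(parents), growth_rate, capacity)):
        # Parents are drawn proportionally to fitness, with replacement.
        mother, father = rng.choices(parents, weights=fitnesses, k=2)
        child = list(zip(make_gamete(mother, rng), make_gamete(father, rng)))
        offspring.append(child)
    return offspring
```

Because generations do not overlap, the returned list would replace the parental individuals of the cell at the next time step.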

Neutral markers

In addition to the set of loci involved in adaptation, n neutral independent loci (typically microsatellites) can be considered. Z alleles can be present simultaneously in the population at each locus (at the initialization step, alleles are drawn from a normal distribution with a user-defined standard deviation conditioning the number of alleles and the expected heterozygosity, see Fig. S6 in Appendix S1, Supporting information; alternatively, users can choose biallelic loci with ‘0’ and ‘1’ to represent SNPs). Mutational events, at a rate of μ mutations per allele per generation, replace allele z by allele z + 1 or by allele z − 1, according to a classical stepwise mutation model. As for loci under selection, Mendelian inheritance and free recombination between loci are assumed for the neutral loci.
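A minimal Python sketch of the stepwise mutation rule is given below. The equal probability of gaining or losing one repeat and the lower bound `min_allele` are assumptions made for this example, and the function names are hypothetical.

```python
import random


def mutate_allele(z, mu, rng, min_allele=1):
    """Stepwise mutation: with probability mu, allele z becomes z + 1 or z - 1.

    The two directions are assumed equally likely. min_allele is a
    hypothetical lower bound that keeps allele sizes positive.
    """
    if rng.random() < mu:
        z += rng.choice((-1, 1))
    return max(z, min_allele)


def mutate_gamete(gamete, mu, rng):
    """Apply the mutation rule independently to each neutral locus."""
    return [mutate_allele(z, mu, rng) for z in gamete]
```

In a full simulation, `mutate_gamete` would be applied to each gamete before the two gametes are combined into an offspring genotype.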

Model output

The characteristics of each individual, including habitat type and coordinates over time, are stored in a file (see Table S2 in Appendix S1, Supporting information for an exhaustive list of characteristics). To allow direct comparison between simulated and experimental data through specialized programs, the simulation model includes an empirical sampling module. A number of sampling points per habitat type are defined by the user, each of them with a given number of sampled individuals (see Zurell et al. 2010 for a discussion on sampling in simulation models). The output file contains the microsatellite genotypes of sampled individuals in a format that can be processed by the most widely used population genetics software, including genepop v4.1 (Rousset 2008), arlequin v3.1 and v3.5 (Excoffier, Laval & Schneider 2005), structure v2.3.3 (Pritchard, Stephens & Donnelly 2000), geneland v3.3 (Guillot, Mortier & Estoup 2005) and fstat (Goudet 1995).

Implementation

A complete description and documented verification of the simulation model following the Overview, Design concepts, Details (ODD) protocol for describing individual- and agent-based models (Grimm et al. 2006, 2010) and the executable code are provided as Supporting information (Appendices S1 and S2), along with the validation of SimAdapt. Basic validation tests were performed to make sure that the software results are consistent with classical population-genetic models, including the comparison of (i) allele frequencies at loci under selection with an analytical model of infinite population size, (ii) heterozygosity measures at neutral markers with theoretical expectations in a single panmictic population and (iii) fixation index between populations as a function of time with theoretical expectations in an island model (see Supporting information). The code is documented and structured so that it can be modified and extended with basic programming skills. On most computers, a reasonable simulation should not exceed 10,000 individuals in a landscape of 100 grid cells with 100 neutral markers and 100 loci under selection. The simulation model nevertheless imposes no limits, and it helps users gain first insights into landscape-genetic complexity and visualize local adaptation.
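For readers who wish to reproduce checks of types (i) and (ii), the Python sketch below gives the textbook expectations we assume such comparisons rely on: the deterministic change in allele frequency under viability selection in an infinite population, and the decay of expected heterozygosity in a single panmictic population of N diploid individuals without mutation. The exact analytical models used for validation are described in the Supporting information; the function names here are hypothetical.

```python
def next_allele_frequency(p, s, h):
    """Frequency of allele 1 after one generation of selection.

    Infinite population, random mating, fitnesses 1, 1 - hs and 1 - s
    for G11, G10 and G00 (requires s < 1 so mean fitness stays positive).
    """
    q = 1.0 - p
    w11, w10, w00 = 1.0, 1.0 - h * s, 1.0 - s
    mean_w = p * p * w11 + 2.0 * p * q * w10 + q * q * w00
    return (p * p * w11 + p * q * w10) / mean_w


def expected_heterozygosity(h0, n, t):
    """Expected heterozygosity after t generations of drift: H0 (1 - 1/(2N))^t."""
    return h0 * (1.0 - 1.0 / (2.0 * n)) ** t


def selection_trajectory(p0, s, h, generations):
    """Allele-frequency trajectory to compare with simulated frequencies."""
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(next_allele_frequency(trajectory[-1], s, h))
    return trajectory
```

Simulated allele frequencies averaged over replicates with large population sizes would be expected to track `selection_trajectory`, while heterozygosity at neutral markers in a single cell would be compared with `expected_heterozygosity`.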

Study example

We present here an example to provide a brief overview of the simulation model functionalities and coupling with existing programs. We simulated 100 individuals with n = 10 microsatellite loci introduced in one grid cell of a landscape composed of two habitat types with one locus under selection per habitat type (habitat types b1 and b2, see Fig. 3 and Supporting information for parameterization). Simulations were repeated 100 times for 100 generations, and output was sampled for each habitat type. Three different habitat configurations were tested (see Fig. 3a): C1) random location of habitat types with landscape scenario L2; C2) blocks of habitat types with scenario L1 (i.e. no changes); and C3) isolated habitat types with scenario L1. These three basic scenarios represented C1) a rapidly changing heterogeneous landscape, where individuals have to adapt constantly, C2) a spatial transition between two distinct habitats (e.g. field/forest; urban area/natural area) and C3) a more heterogeneous landscape arising from habitat fragmentation. They were provided to give a basic understanding of the program functionality and performance, but also to illustrate the ability to integrate complex landscapes changing over space and/or time. In Fig. 3b, we mapped the distribution of allelic frequencies at a locus under selection using all individuals. Using output files containing microsatellite data, we generated basic genetic analyses with arlequin, with the assumption that individuals located in different habitat types correspond to distinct populations. We then used geneland to visualize the correlation between habitat types and population structure (see Fig. 3c). Simulation examples can be reproduced through the GUI (<40 s per simulation on a laptop running MS Windows Vista, 4 GB RAM, 3.06 GHz CPU).

Figure 3.

Example of simadapt use with (a) three landscape configurations built in the graphical user interface (GUI), referred to as C1, C2 and C3, with b1 in light grey and b2 in dark grey representing two different habitat types and the cross marking the initial locations of individuals; (b) simulation results showing the frequency of one allele at a locus under selection of the first habitat type after 100 generations, ranging from zero in black to one in white; (c) population assignment analysis based on neutral markers using geneland, with each color representing a different population and each cross a sampled point.

Conclusion

Thanks to its modularity, this software represents a unique tool to explore the interactions between gene flow, population dynamics, selection and landscape management, to link landscapes and adaptive genetic variation (Parisod & Holderegger 2012). Relying on a coupled individual-based model and cellular automaton, this framework allows the integration of complex patterns including spatially and temporally explicit landscapes described at an accurate level. This singular coupling favours the identification of land-use and land-cover change management scenarios driving population structure. From theoretical cases to empirical studies, it should facilitate our understanding of landscape genetics and represents a promising tool for scientists and scholars willing to explore ecosystems subject to anthropogenic disturbance and social-ecological systems complexity.

Users are encouraged to visit and submit comments to the simadapt web page hosted by the openabm consortium (http://www.openabm.org/models/).

Acknowledgements

This work is part of the project ADAPTANTHROP ANR-097-PEXT-009- supported by the French Agence Nationale de la Recherche. This research was also in part supported by the IDEEV French institute (project GEN-SPAT) and the University Paris-Sud 11 (Attractive project). ALR was partially supported by the European program FP7/2007-2013 through the Marie Curie reintegration grant ERG-256507. The authors thank Pierre Gérard for helpful comments.
