MEMGENE: Spatial pattern detection in genetic distance data

Summary

  1. Landscape genetics studies using neutral markers have focused on the relationship between gene flow and landscape features. Spatial patterns in the genetic distances among individuals may reflect spatially uneven patterns of gene flow caused by landscape features that influence movement and dispersal.
  2. We present a method and software for identifying spatial neighbourhoods in genetic distance data. The method adopts a regression framework in which the predictors are generated using Moran's eigenvector maps (MEM), a multivariate technique developed for spatial ecological analyses and recommended for genetic applications.
  3. Using simulated genetic data, we show that our MEMGENE method can recover patterns reflecting the landscape features that influenced gene flow. We also apply MEMGENE to genetic data from a highly vagile ungulate population and demonstrate spatial genetic neighbourhoods aligned with a river likely to reduce, but not eliminate, gene flow.
  4. We developed the MEMGENE package for R in order to detect and visualize relatively weak or cryptic spatial genetic patterns and aid researchers in generating hypotheses about the ecological processes that may underlie these patterns. MEMGENE provides a flexible set of R functions that can be used to modify the analysis. Detailed supplementary documentation and tutorials are provided.

Introduction

Describing spatial genetic patterns and inferring the ecological and evolutionary processes underlying them is a central task in landscape genetics (Manel et al. 2003; Segelbacher et al. 2010; Storfer et al. 2010). Landscape genetic analyses investigating organism movement have typically associated the genetic distances among individuals or populations at sampling locations with distance-based ecological data measured among the same locations (Epps et al. 2005; Galpern, Manseau & Wilson 2012; Koen et al. 2012; Robinson et al. 2012). This link level analysis (sensu Wagner & Fortin 2013), named for its focus on the links among sampling nodes, is conceptually appealing because it can directly represent the key variables of interest (i.e. the movement of genes as well as the probability of organism dispersal among locations given the ecological context). However, link level analysis has typically relied on partial Mantel tests and multiple regression based on distance matrices (MRDM), both of which have attracted extensive criticism (Legendre & Fortin 2010; Guillot & Rousset 2013).

Here, we describe software that permits a neighbourhood level analysis of genetic data (sensu Wagner & Fortin 2013), in which variation in the links among a set of sampling nodes is summarized and mapped back onto each of those nodes. Our software retains the conceptual advantage of link level analysis, namely the comparison among locations, when identifying these neighbourhoods.

The objective of our framework is to find and visualize spatial neighbourhoods in genetic distance data. To do this, we combine the following: (i) Moran's eigenvector maps (MEM; related to the early PCNM approach by Borcard & Legendre 2002), a powerful technique for the multiscalar analysis of spatial patterns (Dray, Legendre & Peres-Neto 2006; Griffith & Peres-Neto 2006), with (ii) a regression framework proposed by McArdle & Anderson (2001), in which genetic distance matrices are regressed against raw predictors (i.e. predictors not transformed into distances), eliminating the issues related to the Mantel regression of distances on distances. MEM has been widely applied to study spatial variation in beta diversity (Dray et al. 2012) and has been recommended as a tool for spatial genetics (Jombart, Pontier & Dufour 2009; Epperson et al. 2010; Manel et al. 2012; Wagner & Fortin 2013). The main product of the software is the set of MEMGENE variables, which together represent significant spatial genetic patterns at multiple spatial scales. These can be used for visualization of patterns or as variables in other ecological analyses.
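The combination of MEM predictors with a regression on a genetic distance matrix can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the MEMGENE implementation: the toy coordinates and stand-in genetic scores are made up, the helper gower_centre() is hypothetical, the first k eigenvectors are kept by hand where a full analysis would select predictors by forward selection with permutation tests, and the truncation rule (longest edge of the minimum spanning tree, with larger distances set to four times that value) follows the PCNM convention.

```r
# Illustrative sketch only: base R, no forward selection or permutation tests.

# Gower-centre a distance matrix: G = -1/2 * C D^2 C, with C the centring matrix.
gower_centre <- function(D) {
  n <- nrow(D)
  C <- diag(n) - matrix(1 / n, n, n)
  -0.5 * C %*% (D^2) %*% C
}

# Hypothetical toy data: two spatial clusters of nine sampling locations each.
coords <- rbind(expand.grid(x = 1:3, y = 1:3),
                expand.grid(x = 11:13, y = 1:3))
n <- nrow(coords)
group <- rep(c(0, 1), each = 9)

# Stand-in "genetic" distances: Euclidean distances among made-up scores
# that differ mainly between the two clusters.
toy_scores <- cbind(2 * group + 0.1 * sin(seq_len(n)), 0.1 * cos(seq_len(n)))
genD <- as.matrix(dist(toy_scores))

# Step 1: spatial predictors in the PCNM/MEM style.
geoD <- dist(coords)
# Longest minimum-spanning-tree edge, read from the single-linkage merge heights.
threshold <- max(hclust(geoD, method = "single")$height)
truncD <- as.matrix(geoD)
truncD[truncD > threshold] <- 4 * threshold
mem <- eigen(gower_centre(truncD), symmetric = TRUE)
k <- 2  # assumption: keep the two eigenvectors with the largest eigenvalues
X <- mem$vectors[, seq_len(k), drop = FALSE]

# Step 2: regress the centred genetic distance matrix on the raw predictors.
G <- gower_centre(genD)
P <- X %*% solve(crossprod(X)) %*% t(X)  # projection ("hat") matrix
fitG <- P %*% G %*% P
r2 <- sum(diag(fitG)) / sum(diag(G))
r2_adj <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Step 3: ordinate the fitted values; the columns play the role of
# MEMGENE variables in this sketch.
fit_eig <- eigen(fitG, symmetric = TRUE)
axes <- sweep(fit_eig$vectors[, seq_len(k), drop = FALSE], 2,
              sqrt(pmax(fit_eig$values[seq_len(k)], 0)), "*")
```

In this toy case the scores in the first column of axes take opposite signs in the two clusters, which is the kind of neighbourhood pattern the mapped MEMGENE variables are meant to reveal.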

A full description of the MEMGENE analytical framework is presented in Appendix S1. Below, we show the results of simulations designed to assess the MEMGENE framework. For further illustration using field-collected data, we also apply MEMGENE to detect and visualize spatial genetic patterns within a woodland caribou (Rangifer tarandus caribou) population in the Northwest Territories, Canada (see Appendix S2).

Assessing the framework using simulations

Method

We simulated genetic data with expected spatial genetic patterns to explore the power of the MEMGENE analytical framework. We used the agent-based programming language NetLogo (Tisue & Wilensky 2004) to model sexually reproducing individuals moving and mating across multiple generations according to different levels of landscape connectivity (simulation approach described in Appendix S3). In each simulation, organism vagility and the configuration of landscape features presenting resistance to movement were manipulated to produce distinct spatial genetic outcomes (Fig. 1). We developed five cases: (i) a panmixia model, where high vagility (i.e. dispersals crossing the entire landscape) and an absence of features presenting resistance to movement should produce no spatial genetic pattern (Fig. 1a); (ii) a uniform model, where lower vagility should produce isolation by distance (IBD; Wright 1943) and a spatial genetic pattern that could not be predicted a priori (Fig. 1b); (iii) a fragmented model, where despite high vagility the configuration of high resistance features should produce a clustered spatial genetic pattern (Fig. 1c); (iv) a radial model, where three semi-permeable barrier features of different widths should also produce clustering (Fig. 1d); and (v) a river model of a sinuous linear habitat, where low vagility and restrictions on dispersal should produce a spatial genetic gradient following the path of the river (Fig. 1e).

Figure 1.

Five simulations to create spatial genetic patterns using an agent-based landscape genetic simulator. The left column gives the resistance surface used to influence dispersal and subsequent mating (grey pixels have 20× more resistance to movement than white pixels). The centre column demonstrates the meaning of the vagility parameter in the context of the resistance surface, showing a sample of 200 dispersal trajectories for the simulated organisms. The right column illustrates the approximate genetic pattern expected given the input surface and vagility. Note that in the radial case (d), each arm has a different thickness, implying different levels of permeability.

Results

The mean adjusted R2 results for 100 replicate simulations for each of the five cases are shown in Fig. 2, demonstrating that the simulations have generated genetic variation that can be explained by spatial patterns. With these simulation parameters, adjusted R2, and therefore the amount of spatial genetic pattern, reaches an equilibrium after approximately 25 generations in the panmixia, uniform, fragmented and radial cases, and by 300 generations in the river case.

Figure 2.

The amount of genetic variation explained by spatial patterns (R2adj) for the five simulations over 300 generations. Standard errors in R2adj for the 100 replicates of each simulation were all less than ± 0·02 and were not plotted. MEMGENE analyses were conducted at the plotted generations.

In the panmixia case (Fig. 3a), as expected, the amount of genetic variation explained by spatial pattern was lowest (R2adj ≈ 0·01 at equilibrium; Fig. 2) and the mapping of the scores at generation 300 did not reveal any particular spatial pattern (although groups of circles with similar size and colour are evident). In the uniform case (Fig. 3b), a particular spatial pattern is again not evident on the maps; however, the considerably higher R2adj indicates that much more of the genetic variation is explained by spatial pattern, as might be expected under IBD (R2adj ≈ 0·075 at equilibrium; Fig. 2). In the fragmented, radial and river cases, the expected spatial genetic pattern is clearly discernible at generation 300. In the fragmented case (Fig. 3c), circles of similar size and colour are found in proximity on both MEMGENE1 and MEMGENE2. In the radial case (Fig. 3d), evidence of the three clusters is discernible using MEMGENE1 only, while MEMGENE2 demonstrates that the thinnest (i.e. the most permeable) arm of the radial structure has permitted a weaker scale of genetic pattern to develop among the top two regions (Fig. 3d, black circles). In the river case (Fig. 3e), the expected gradient emerges in MEMGENE1, while MEMGENE2 reflects a more local pattern.

Figure 3.

Visualizations of the first two MEMGENE variables for generation 2 where genetic pattern should be weak in all cases, and for generation 300 where genetic pattern should be at or near an equilibrium state. Scores of individuals on these variables are superimposed on the resistance surface that generated the data. Circles of a similar size and colour indicate individuals with similar scores (large black and white circles describe opposite extremes on the MEMGENE axes). Only MEMGENE1 and MEMGENE2 are depicted to simplify presentation. In all cases, these two variables together describe the majority of the spatial genetic pattern.

Discussion

MEMGENE is intended for applications where the spatial component of genetic variation is uniquely of interest (e.g. for studying movement and dispersal using neutral markers) and may be particularly useful where a high amount of gene flow is likely and patterns are expected to be cryptic, such as within, rather than between, genetically distinct populations (Epps et al. 2005; Galpern, Manseau & Wilson 2012; Koen et al. 2012; Robinson et al. 2012). In such cases, genetic noise is inherent to the task, and our framework provides a means to capture the spatial signal either for visualization or for inference about ecological processes.

Numerous tools, applying a broad range of methods, have been developed for analysing population and spatial genetic structure (Pritchard, Stephens & Donnelly 2000; Corander et al. 2004; Miller 2005; Chen et al. 2007; Guillot, Santos & Estoup 2008; Jombart, Devillard & Balloux 2010). MEMGENE is most closely allied with tools that use multivariate ordinations of genetic variation, although many of these do not distinguish between spatial and non-spatial genetic variation (e.g. principal component, principal coordinate, and discriminant analyses of genetic variation; Novembre & Stephens 2008; Jombart, Pontier & Dufour 2009). Among ordination techniques, spatial principal component analysis (sPCA; Jombart et al. 2008) may be the most similar to MEMGENE in that it incorporates positive and negative spatial autocorrelation of genetic data and shares the objective of revealing cryptic spatial genetic patterns. Beyond these apparent similarities, however, the two methods are fundamentally different. See Appendix S4 for a full discussion of the similarities and differences between sPCA and MEMGENE and a comparison of visualizations based on simulated genetic data sets. Although additional work is required to fully assess the relative performance of these two methods, our assessment suggested that MEMGENE may be more capable than sPCA of identifying weak spatial genetic patterns.

This ability to perform well when the spatial signal is weak is a key advantage of MEMGENE that comes from the use of regression to identify significant spatial genetic patterns. Regression is also an advantage in that it enables an assessment of the amount of genetic variation that is associated with spatial pattern (i.e. adjusted R2). Another key contribution of MEMGENE is the use of Moran's eigenvector maps (MEM) to describe complex spatial genetic patterns. Together, these features make MEMGENE a powerful tool for detecting statistically significant spatial genetic neighbourhoods.
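For reference, the quantity being reported can be written out as follows. This is a sketch in our own notation, assuming the distance-based regression of McArdle & Anderson (2001) and the standard (Ezekiel) adjustment; the symbols are not taken from the appendices. Here D^{(2)} holds the squared genetic distances among the n individuals, X is the n × m matrix of selected MEM eigenvectors, and tr denotes the matrix trace.

```latex
\[
G = -\tfrac{1}{2}\, C\, D^{(2)}\, C, \qquad
C = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top}, \qquad
H = X (X^{\top} X)^{-1} X^{\top}
\]
\[
R^{2} = \frac{\operatorname{tr}(H G H)}{\operatorname{tr}(G)}, \qquad
R^{2}_{\mathrm{adj}} = 1 - \left(1 - R^{2}\right)\frac{n - 1}{n - m - 1}
\]
% Hypothetical numbers, for illustration only:
% n = 100, m = 5, R^2 = 0.12 gives
\[
R^{2}_{\mathrm{adj}} = 1 - 0.88 \times \frac{99}{94} \approx 0.073
\]
```

The adjustment matters here because the number of candidate spatial predictors can be large relative to the sample, so an unadjusted R2 would overstate the spatial component.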

We anticipate MEMGENE will be used as a tool for identifying weak and cryptic spatial genetic patterns and for generating hypotheses about the landscape processes that may be influencing these patterns. For inference about these causal relationships, the output MEMGENE variables may be useful as dependent or independent variables in subsequent analyses. It is also possible to provide MEMGENE with ecological (e.g. least cost path) distances rather than Euclidean ones when generating the Moran's eigenvector maps. Additional work is required to test the effectiveness of such analyses, but, in theory, this could permit inference about landscape hypotheses directly. In this regard, MEMGENE is a tool both for exploring ecological influences on genetic pattern and for analysing genetic data at multiple spatial scales.

MEMGENE software package

The MEMGENE package for R is available for download from the CRAN repository. The package may be installed on any operating system by typing the following at the R command prompt: install.packages("memgene"). Included with the package are tutorials (see also Appendix S5), documentation of R functions (see also Appendix S6), as well as all the simulated and caribou data sets used in this paper.
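Once installed, a first analysis might look like the sketch below. The function names (codomToPropShared, mgQuick, mgMap), the returned elements (RsqAdj, sdev, memgene) and the location of the bundled radial data set are given as we recall them from the package documentation and should be treated as assumptions; check them against the help pages of the installed version (e.g. ?mgQuick) before relying on them.

```r
# One-time installation from CRAN (uncomment to run):
# install.packages("memgene")

# codomToPropShared() is recalled as depending on adegenet, so check for both.
has_pkgs <- requireNamespace("memgene", quietly = TRUE) &&
  requireNamespace("adegenet", quietly = TRUE)

if (has_pkgs) {
  library(memgene)

  # Simulated radial data set shipped with the package (path as recalled).
  radialData <- read.csv(system.file("extdata/radial.csv", package = "memgene"))
  radialXY <- radialData[, 1:2]        # sampling coordinates
  radialGen <- radialData[, -c(1, 2)]  # codominant alleles

  # Genetic distance among individuals (proportion of shared alleles).
  radialDM <- codomToPropShared(radialGen)

  # Run the analysis and read off the adjusted R-squared.
  radialAnalysis <- mgQuick(radialDM, radialXY)
  print(radialAnalysis$RsqAdj)

  # Share of the spatial genetic pattern carried by each MEMGENE variable.
  print(radialAnalysis$sdev / sum(radialAnalysis$sdev))

  # Map scores on the first two MEMGENE variables.
  mgMap(radialXY, radialAnalysis$memgene[, 1:2])
}
```

The guard on has_pkgs simply lets the script run without error on a machine where the packages are not yet installed.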

Acknowledgements

This work was funded by the Natural Sciences and Engineering Research Council. We thank the Sahtú Renewable Resources Board and the Renewable Resource Councils of Fort Good Hope, Tulı́t'a, Délı̨nę, and Norman Wells, Northwest Territories, Canada. Genotyping analyses were provided by M. Kerr, C. Klütsch and P. Wilson, Forensic Science Program, Trent University.

Data accessibility

This manuscript describes an R package. All data that appear in the manuscript are included in the R package, which is currently accessible on the official CRAN repository: http://cran.r-project.org/web/packages/memgene/index.html.
