D. G. Jenkins (firstname.lastname@example.org), M. Carey, J. Czerniewska, J. Fletcher, T. Hether, A. Jones, S. Knight, J. Knox, T. Long, M. Mannino, M. McGuire, A. Riffle, S. Segelsky, L. Shappell, A. Sterner, T. Strickler and R. Tursi, Dept of Biology, Univ. of Central Florida, 4000 Central Florida Blvd., Orlando, FL 32816-2368, USA.
Isolation by distance (IBD) has been a common measure of genetic structure among populations and is based on Euclidean distances among populations. Whereas IBD does not incorporate geographic complexity (e.g. dispersal barriers, corridors) that may better predict genetic structure, a new approach (landscape genetics) joins landscape ecology with population genetics to better model genetic structure. Should IBD be set aside, or should it persist as the simplest model in landscape genetics? We evaluated the status of IBD by collecting and analyzing results of 240 IBD data sets among diverse taxa and study systems. IBD typically represented a low proportion of variance in genetic structure (mean r2=0.22), in part because many studies included relatively few populations (mean=11). The number of populations studied (N) was asymptotically related to IBD significance; a study with 9 populations has only a 50% probability of significance, while one with >23 populations has a 90% probability of significance. Surprisingly, ectothermic animals were significantly (p=0.018) more likely to have significant IBD than endotherms, which suggests a metabolic basis underlying gene flow rates. We also observed marginally significant effects on IBD significance for a) taxa in general and b) dispersal modes within actively dispersing endotherms. Other factors analyzed (genetic markers, genetic distance estimators, habitats, active or passive dispersal, plant growth form) did not significantly affect IBD, likely because of the small N typical of the studies. For multiple reasons we conclude that IBD should continue as the simplest reference standard against which all other, more complex models should be compared in landscape genetics research.
Gene flow is central to population genetics because the rate of gene flow into a local population contributes to population success or failure, in concert with local selection (Holt and Gomulkiewicz 1997). The converse of gene flow is genetic isolation, and Isolation By Distance (IBD; Wright 1943) analysis has long been the standard approach to express genetic differentiation as a function of distance. Euclidean distance has been used because IBD is based on the island and stepping-stone models of population genetic structure (Wright 1943, Kimura and Weiss 1964), and distance serves as a simple, estimable proxy for the myriad factors that isolate populations. Given this rich history, much has been theorized and advanced through the years regarding IBD (Slatkin 1993, Rousset 1997, Bossart and Prowell 1998, Bohonak 1999).
Throughout its history, IBD estimation evolved to include more sophisticated calculations of genetic structure (Jensen et al. 2005) that kept pace with advances in statistical methods, increasing computer power and rapidly advancing molecular methods. However, the distance component of IBD estimation continues to be based on straight-line Euclidean distance among populations. With the development of landscape ecology came more advanced approaches to represent landscape complexity (Turner and Gardner 1991). The new discipline of landscape genetics (Manel et al. 2003) fuses landscape ecology with population genetics to incorporate geographic information far more sophisticated than Euclidean distance to explain genetic structure (Guillot et al. 2005, Spear et al. 2005, Holderegger and Wagner 2006, Storfer et al. 2007). Landscape genetics was recently defined as “research that explicitly quantifies the effects of landscape composition, configuration and matrix quality on gene flow and spatial genetic variation” (Storfer et al. 2007) and is likely to become more commonly used to analyze genetic structure. Here we treat this approach as being relevant to metapopulation genetics (Olivieri et al. 1995, Hanski and Gaggiotti 2004), given the original definition of a metapopulation as a “population of populations” (Hanski and Simberloff 1997).
What value, then, remains for IBD analyses? In one possible view, IBD may be considered a 20th century paradigm, to be fully replaced by 21st century landscape genetics. In that case, IBD would soon be relegated to historical interest only and omitted from future analyses. Alternatively, IBD may continue to be useful as the simplest baseline against which more sophisticated and more difficult analyses are evaluated. For example, multi-model inference (Burnham and Anderson 2002) can be used to identify the population genetic models that capture the most information with the fewest parameters, relative to competing models. In that context, IBD may remain useful as the simplest model for relative comparisons among landscape genetics models. We note that IBD is not a null model: it predicts genetic differentiation as a function of distance, whereas a null model invokes no such mechanism and is typically based on randomization (Gotelli and Graves 1996).
Interestingly, we found no synoptic evaluation of empirical IBD analyses in the literature that might help clarify its role in an era of landscape genetics. The purpose of this meta-analysis was to evaluate the published evidence for IBD among diverse organisms to answer three questions. 1) Are IBD results sensitive to study methods (genetic markers, genetic distance estimators, number of populations)? If so, results must be interpreted accordingly, and IBD analyses may have limited application; if not, IBD may be generally applicable. 2) What patterns emerge in IBD within and among diverse groups of organisms? Some taxonomic groups may be expected to have more significant and clearer IBD patterns than other taxa. We do not explore detailed phylogenetic patterns here, but instead compare coarse taxonomic groups. We also compare ectothermic vs endothermic animals and organisms grouped by dispersal mode. 3) Given the above, what role might IBD analyses fill amidst landscape genetic approaches? Questions 1 and 2 were answered by statistical analyses of the collected data; question 3 was answered by considering the answers to questions 1 and 2 and the properties of the collected IBD studies.
We collected articles during 2008 from the peer-reviewed scientific literature that contained genetic distances among populations and geographic distances or location data for those populations. We make no claim that this sample of the IBD literature adequately represents any given taxon or geographic region. Instead, we consider our collection a substantial sample that should represent general patterns.
The population genetics literature is diverse in methods and data-reporting customs. No standard approach exists for reporting genetic and geographic distances, so we had to make decisions when processing data. Numerous papers provided only geographic coordinates or maps, from which we computed straight-line (“Euclidean”) distances among populations using great-circle calculations, so that distances were represented accurately on a spherical Earth. Some studies reported statistical outcomes (e.g. Mantel tests) but did not provide genetic or geographic distance data; in those cases we used only the reported Mantel test outcomes and could not compute further IBD statistics.
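The text does not say which great-circle formula was used. As an assumption for illustration, the sketch below uses the haversine formula with a mean Earth radius of 6371 km to turn (latitude, longitude) pairs in decimal degrees into a pairwise distance matrix; the helper names are hypothetical.

```python
import math

# Assumed constant: mean Earth radius in km (the original study does not state one).
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Hypothetical helper: haversine great-circle distance (km) between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_matrix(coords):
    """Hypothetical helper: symmetric matrix of pairwise great-circle distances for (lat, lon) tuples."""
    n = len(coords)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = great_circle_km(*coords[i], *coords[j])
    return d


# One degree along a meridian is about 111.2 km with this radius.
print(round(great_circle_km(0.0, 0.0, 1.0, 0.0), 1))  # 111.2
```

Distances digitized from published maps carry additional error beyond the choice of formula, which is one reason a model II regression is used downstream.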
Genetic distances in the literature were based on a variety of markers and statistics, though many studies reported pairwise FST values among populations. For those data sets, we calculated genetic distance as [FST/(1−FST)] (Rousset 1997) to standardize values among studies. Because RST, ΦST, and θST are analogues of FST (Halliburton 2004), we applied the equivalent transformation to those estimators. We also analyzed data sets reported as Nei's genetic distance (D), but without this transformation.
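The standardization step amounts to applying Rousset's transformation to each pairwise estimate. A minimal sketch follows; the helper names are hypothetical, and because the text does not say how negative estimates (which some FST estimators can produce) were handled, this sketch simply rejects values outside [0, 1).

```python
def linearized_fst(fst):
    """Hypothetical helper: Rousset's (1997) FST/(1 - FST), also usable for RST, PhiST or thetaST."""
    if not 0.0 <= fst < 1.0:
        # Assumption: out-of-range estimates are rejected rather than truncated.
        raise ValueError("FST-type estimates must lie in [0, 1)")
    return fst / (1.0 - fst)


def linearize_matrix(m):
    """Hypothetical helper: transform every off-diagonal entry of a square pairwise matrix."""
    n = len(m)
    return [[0.0 if i == j else linearized_fst(m[i][j]) for j in range(n)] for i in range(n)]


print(linearized_fst(0.5))  # 1.0
```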
We calculated IBD using the IBD Web Service (IBDWS; Jensen et al. 2005), as did some of the original authors. Statistical analyses in IBDWS included Mantel tests for correlation between the [FST/(1−FST)] and loge(distance) matrices (Rousset 1997, Jensen et al. 2005), and reduced major axis (RMA) regression (slope and coefficient of determination, r2). RMA regression is a form of model II regression and is most appropriate when the independent variable (geographic distance in our analyses) includes error (Hellberg 1994, Sokal and Rohlf 1995). RMA regression is standard in IBDWS and appropriate here because we estimated some distances from published maps or geographic coordinates, which introduced error. Following Rousset (1997), regressions related [FST/(1−FST)] to loge(distance).
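IBDWS performs the Mantel test internally. As a rough, self-contained illustration of the logic (not IBDWS's own code), the sketch below correlates linearized genetic distances with loge(distance) over the upper triangle, then builds a null distribution by permuting rows and columns of the distance matrix together. The one-tailed alternative, 999 permutations, fixed seed and all function names are assumptions for illustration.

```python
import math
import random


def _upper(m):
    """Upper-triangle (i < j) entries of a square matrix, in a fixed order."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(gen, geo, permutations=999, seed=1):
    """Hypothetical helper: one-tailed Mantel test for a positive genetic-geographic correlation.

    gen: linearized FST matrix; geo: geographic distances (> 0 off the diagonal).
    Returns (observed r, permutation p value).
    """
    n = len(gen)
    y = _upper(gen)
    logd = [[math.log(geo[i][j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    r_obs = _pearson(_upper(logd), y)
    rng = random.Random(seed)
    extreme = 1  # the observed arrangement counts as one of the permutations
    for _ in range(permutations):
        perm = list(range(n))
        rng.shuffle(perm)
        # Permute rows and columns of the distance matrix jointly.
        x = [logd[perm[i]][perm[j]] for i in range(n) for j in range(i + 1, n)]
        if _pearson(x, y) >= r_obs:
            extreme += 1
    return r_obs, extreme / (permutations + 1)


# Hypothetical check: six populations on a transect, differentiation exactly linear in log distance.
positions = [0, 1, 3, 7, 15, 31]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.0 if i == j else 0.02 + 0.01 * math.log(geo[i][j]) for j in range(6)] for i in range(6)]
r, p = mantel_test(gen, geo)
```

Permuting whole populations, rather than individual matrix cells, preserves the dependence among pairwise distances that share a population, which is the reason a Mantel test is used instead of an ordinary correlation test.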
Slope coefficients are not often reported in IBD analyses, apparently because of concerns about the non-independence of pairwise points. However, published IBD plots often presented the points and slope graphically, and we considered slope to be an important measure of effect (as in any regression). The correlation coefficient r is sometimes reported in IBD analyses; we were interested in r2 because landscape genetics approaches may be viewed as an effort to increase the variance “explained” (r2) relative to Euclidean IBD outcomes.
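For reference, the RMA slope is sign(r) times sd(y)/sd(x), equivalently the ordinary least-squares slope divided by |r|, and the fitted line passes through the two means. The minimal sketch below (hypothetical helper name) returns slope, intercept and r2. In an IBD setting, x would be loge(distance) and y the linearized FST over the pairwise entries; because those points are not independent, significance is taken from the Mantel test, not from regression p values.

```python
import math


def rma_regression(x, y):
    """Hypothetical helper: reduced major axis (model II) regression of y on x.

    Returns (slope, intercept, r2), where slope = sign(r) * sqrt(Syy / Sxx).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = math.copysign(math.sqrt(syy / sxx), r)
    intercept = my - slope * mx
    return slope, intercept, r * r


# Toy data: the OLS slope here would be 0.5, but the RMA slope is 0.5 / |r| = 1.0.
slope, intercept, r2 = rma_regression([1, 2, 3], [1, 3, 2])
```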
To answer question 1, we tested the hypotheses that IBD outcomes (Mantel p value, slope, r2) varied a) with the number of populations studied (N), b) among genetic markers (e.g. allozymes, microsatellites), or c) among genetic distance estimators. Analyses of Mantel p values were based on IBDWS analyses and values reported in the literature, whereas analyses of slope and r2 were based on IBDWS analyses alone. We tested for the effect of study scope (N) using regressions, with N as the independent variable and Mantel p value, slope, or r2 as the dependent variable. We selected among several regression models using Akaike's information criterion (AIC; Burnham and Anderson 2002). We tested for potential effects of genetic markers and genetic distance estimators on IBD outcomes in two ways: 1) with Mantel p values, slope, or r2 as continuous data (by factorial ANOVA, with markers, estimators, and the marker × estimator interaction as factors), and 2) with Mantel p values represented as binary data (p<0.05 or p≥0.05), where markers and estimators were tested by χ2. As an additional test that controlled for potential variation in analytical method, we also conducted these same analyses on IBDWS outcomes alone.
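The Methods say several regression models relating IBD outcomes to N were compared by AIC but do not list the candidate forms. As an illustration only, the sketch below compares two hypothetical forms (linear in N, and linear in log10 N) on made-up data, using the least-squares form AIC = n ln(RSS/n) + 2k with k = 3 (intercept, slope and error variance). Function names and data are hypothetical.

```python
import math


def ols_fit(x, y):
    """Hypothetical helper: ordinary least-squares line; returns (intercept, slope, RSS)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return intercept, slope, rss


def aic_least_squares(rss, n, k):
    """AIC for a least-squares model; k counts the error variance as a parameter."""
    return n * math.log(rss / n) + 2 * k


def compare_models(n_pops, p_values):
    """Rank two hypothetical forms of the N-to-p relationship by AIC (lower is better)."""
    candidates = {
        "linear in N": [float(v) for v in n_pops],
        "linear in log10(N)": [math.log10(v) for v in n_pops],
    }
    scores = {}
    for name, x in candidates.items():
        _, _, rss = ols_fit(x, p_values)
        scores[name] = aic_least_squares(rss, len(p_values), k=3)
    return sorted(scores.items(), key=lambda kv: kv[1])


# Hypothetical toy data (not from the meta-analysis): p falls with log10(N) plus small noise.
n_pops = [4, 6, 8, 10, 15, 20, 30, 40]
noise = [0.01, -0.01, 0.01, -0.01, 0.01, -0.01, 0.01, -0.01]
p_values = [0.6 - 0.35 * math.log10(n) + e for n, e in zip(n_pops, noise)]
ranking = compare_models(n_pops, p_values)
```

Because both candidates have the same number of parameters here, the AIC ranking reduces to comparing residual sums of squares; the penalty term matters only when candidate models differ in complexity.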
To answer question 2, we tested the hypotheses that IBD outcomes (Mantel p value, slope, r2) varied among taxa, dispersal modes of all organisms (active or passive), metabolic categories of animals (ectotherm or endotherm), or habitats (marine, terrestrial, freshwater). We used factorial ANCOVA for these analyses, with the number of populations studied, as log10(N), as the covariate.
Finally, when considering the potential future value of IBD (question 3), we computed regressions of the relationship between Mantel p values and IBD r2 and compared regression models using AIC. We also computed a multiple regression to predict IBD r2 by Mantel p values and N. Throughout, variables were transformed as necessary for homogeneity of variance, and statistical analyses were computed with SPSS v16.
We obtained 240 data sets that were analyzed by Mantel tests; of those, 143 data sets were analyzed using IBDWS by us or by authors. Most data sets were recent, based on microsatellite markers and FST (Fig. 1). Though the data may not represent the full history of IBD analyses, they do represent the status of modern IBD analysis around the time that landscape genetics began.
Overall, the average Mantel p value was 0.166, the mean number of populations studied (N) was 11.1, the mean IBD r2 value was 0.22, and the mean slope [(FST/(1−FST)):loge(km)] was 0.81. Of the study parameters we analyzed, only N significantly affected Mantel test p values; significance did not depend on the genetic markers, genetic distance estimators, or a marker × estimator interaction in the sampled studies (Table 1). As one may expect, N was inversely related to Mantel test p values: studies with more populations tended to yield lower Mantel test p values. This inverse relationship was significant but not highly predictive of IBD significance (r2=0.104). However, logistic regression of binary Mantel test significance was significant (p<0.001) and fairly predictive (64% of outcomes correctly predicted) (Table 1). The logistic prediction indicated that >9 populations were needed to achieve >50% probability of significant IBD, >17 populations to achieve 75% probability, and >24 populations to achieve 90% probability (Fig. 2). Despite its significant effects on Mantel test results, N did not significantly affect IBD r2 or slope (Table 1).
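Thresholds such as these follow from inverting a fitted logistic model, N = (logit(P) − b0)/b1. The fitted coefficients are not given in the text, so the sketch below uses hypothetical values (b0 = −1.25, b1 = 0.14 per population) and assumes N enters the model untransformed; if log10(N) was the predictor, the same inversion applies on that scale. Function names are hypothetical.

```python
import math


def prob_significant(n_pops, b0, b1):
    """Logistic model: P(significant IBD) = 1 / (1 + exp(-(b0 + b1 * N)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * n_pops)))


def populations_needed(target_prob, b0, b1):
    """Hypothetical helper: smallest whole N whose predicted P(significant) reaches target_prob."""
    logit = math.log(target_prob / (1.0 - target_prob))
    return math.ceil((logit - b0) / b1)


# Hypothetical coefficients, for illustration only (not the fitted model).
B0, B1 = -1.25, 0.14
thresholds = [populations_needed(p, B0, B1) for p in (0.5, 0.75, 0.9)]
print(thresholds)  # [9, 17, 25]
```

These made-up coefficients were picked to land near the thresholds quoted above and should not be read as the published model; with real data, b0 and b1 come from the logistic fit itself.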
Table 1. Summary of statistical analyses on IBD data. Significant outcomes (p<0.05) are highlighted in bold; marginally significant outcomes are in italics.
(2) Taxa (N)=amphibians (33), reptiles (28), birds (33), fish (42), mammals (45), arthropods (12). The significant interaction for Mantel p values was due to reptiles; when reptiles were removed from analysis, taxa remained significantly different (p=0.042) but interaction was n.s.d. (p=0.468).
(3) Dispersal mode (N)=active (191) or passive (53).
(4) Active dispersal modes within homeotherms (N)=active flying (39), active swimming (18), and active walking (21).
(5) Active dispersal modes within poikilotherms (N)=active flying (13), active swimming (48), active swimming and walking (8), passive (42).
(6) Animal metabolism (N)=poikilotherm (112; 41 n.s.d., 71 sig. Mantel p) or homeotherm (78; 42 n.s.d., 36 sig. Mantel p).
(7) The covariate (number of populations studied) was significant (p<0.05) in all ANCOVAs.
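The ectotherm/endotherm comparison can be checked against the counts in footnote (6). The sketch assumes a Pearson χ2 test without Yates' continuity correction (the text does not say which variant was used) and computes the 1-df upper-tail probability as erfc(√(χ2/2)); the helper name is hypothetical. Under that assumption it gives χ2 ≈ 5.55 and p ≈ 0.018, in line with the value reported in the Results.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Hypothetical helper: Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p) with 1 degree of freedom.
    """
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 df, P(chi-square > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Rows: poikilotherms, homeotherms; columns: significant, non-significant Mantel p (footnote 6).
stat, p = chi_square_2x2(71, 41, 36, 42)  # stat ≈ 5.55, p ≈ 0.018
```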
Concordant with results for Mantel tests, IBD slopes and r2 values (from IBDWS analyses) also did not differ significantly among genetic markers or genetic distance estimators. In addition, we found no significant interaction between genetic markers and genetic distance estimators, meaning that IBD significance did not depend on the combination of genetic markers and genetic distance estimators used. Based on the significant effect of N on IBD statistical significance, we included log10(N) as a covariate in subsequent analyses of biological variables.
Among the analyzed biological variables, taxa differed marginally in Mantel test p values, whether assessed as binary data (i.e. p<0.05 or p≥0.05) or as continuous variables (Table 1). This weak effect was revealed by a factorial ANCOVA that included a taxon × habitat interaction, although habitats themselves did not differ in IBD outcomes even after accounting for taxonomic effects and N. The interaction effect was due to reptiles, which occurred in all habitat categories (terrestrial, amphibious, and aquatic), as revealed by re-analysis after removing reptiles from the data (Table 1). Marginal differences among taxa for Mantel test outcomes did not translate to significant differences among taxa for IBD slopes or r2 values (Table 1). Grouped more broadly, animals and plants also did not have significantly different Mantel test outcomes (ANCOVA, p=0.576; not listed in Table 1).
Several surprises were found in comparisons. IBD outcomes were not significantly different between active and passive dispersers (Table 1). Most passive-dispersing data sets represented non-animal taxa (especially plants), but no significant differences were observed among plant habits (herb, shrub, tree) for IBD outcomes (Table 1). However, ectothermic animals were significantly more likely to have significant IBD than endothermic animals (χ2, p=0.018; Table 1). Within actively-dispersing ectotherms, we compared dispersal modes (walking, flying, swimming) for IBD outcomes but found no significant differences. A marginally significant difference was observed among dispersal modes of actively-dispersing endotherms, but this relatively weak effect did not translate to significant effects on slope or r2.
Finally, Mantel test p values were significantly and negatively related to IBD r2 values, as might be expected, and this relationship co-varied with N (Table 1, Fig. 3). Overall, IBD studies conducted with more populations were more likely to obtain a significant correlation between genetic and geographic distances (significant Mantel p value), but were less likely to obtain a predictive IBD trend (i.e. greater r2; Fig. 3).
Isolation by distance analyses are based on linear distances between populations and are the simplest possible model to characterize metapopulation genetic structure. A simple model may not be expected to respond to interactive details among diverse data sets, and we found that IBD results did not respond to multiple factors. For example, IBD results were insensitive to the choice of genetic marker or genetic distance estimator. Rather than indicating that IBD is robustly diagnostic across study systems, we view this result as indicating that IBD is a fuzzy instrument due to its simplicity and the often-small study scope used. Many IBD studies have not included sufficiently large numbers of populations to be assured of detecting IBD if it exists, as evidenced by ~55% probability of significant IBD for the mean number of populations studied (N=11). Studies that included more populations had greater probability to observe significant IBD, but we found relatively few such studies.
We found it interesting that predictive capability (r2) for IBD results was inversely correlated (though not strongly so) with N. We hypothesize that, as more populations across a landscape are included in a study, IBD analyses increasingly fail to capture the nonlinear effects of complex landscapes and historical processes (e.g. postglacial dispersal pathways) on genetic structure. Given the advent and rapid growth of landscape genetics approaches (Manel et al. 2003, Storfer et al. 2007), this is obviously not a novel hypothesis. However, no synoptic meta-analysis of empirical IBD outcomes has existed to support this conceptual and methodological transition.
The contrast between the effects of N on Mantel p and IBD r2 indicates the promise of landscape genetics for better understanding metapopulation genetic structure. Landscape genetics models that incorporate factors beyond Euclidean distance (e.g. potential dispersal barriers) are likely to increase the variance represented (i.e. attain greater r2), essentially filling in the empty right-hand region of Fig. 3 and enabling strongly predictive models of population genetic structure among many populations. We predict that a similar future meta-analysis of landscape genetics outcomes will find less contrast between the effects of N on Mantel p and on “explained” variance than was observed in this study of IBD outcomes.
Phylogenetic and physiological traits appeared to underlie IBD patterns, if our crude taxonomic and metabolic categories are any indication. Clearly, a more sophisticated and sensitive approach to phylogenetic signatures (Webb et al. 2002) is needed to explore this possibility more fully, given the availability of robust supertrees among diverse taxa. Whereas endothermic animals were nearly equally likely to show significant or nonsignificant IBD (36 vs 42 data sets), ectothermic animals were nearly twice as likely to show significant as nonsignificant IBD (71 vs 41 data sets). The difference between ectotherms and endotherms in IBD results suggests a metabolic basis for gene flow and population genetic structure. Ectotherms must modulate activity and/or location based on external temperatures, and are necessarily constrained to disperse within strict temperature limits and temporal windows of opportunity (Janzen 1967, Ghalambor et al. 2006). Endotherms may be less constrained by those same conditions, and thus significant IBD for an endothermic organism may be more heavily weighted toward processes other than climatic conditions, such as physical/chemical habitat, food web structure, etc. If so, then the population genetic structure of ectothermic animals may be more strongly affected by climate change than that of endotherms, but endotherms may be able to adjust ranges with climate change more readily (assuming other factors such as habitat fragmentation are not important). Ectotherms and endotherms have been compared for range geometry and evolutionary rates (Martin and Palumbi 1993, Pfrender et al. 1998), but macroecological differences in gene flow among diverse ectotherm and endotherm lineages have not been examined to date. Such a macroecological investigation will need to address body size, because body size may express potential ectotherm-endotherm differences in combination with standardized temperatures (Gillooly et al. 2001) and because it affects dispersal distances among active dispersers (Jenkins et al. 2007).
Analyses of study scale effects (Gaston and Blackburn 1996) and phylogeny (Webb et al. 2002) will also be valuable to understand and predict effects of broad-scale factors (e.g. climate change, fragmentation) on genetic structure of diverse organisms.
Different dispersal modes (flying, walking, swimming) contributed to variation among actively dispersing endotherms for Mantel p values, but this effect did not translate to IBD slope or r2. These traits should be influenced by phylogenetic effects, but further analysis with an even larger data set will be needed to separate the effects of taxa (e.g. birds, bats, and insects) from those of dispersal modes (e.g. flying). At this point, only weak evidence exists for differences in metapopulation genetic isolation among very different dispersal modes. We do not expect this statement to persist after landscape genetics approaches have been more fully applied.
Overall, our analyses do not present a compelling case for IBD as a stand-alone analytical approach to describe population genetic structure of metapopulations; this 20th century method does not suffice in the 21st century. Indeed, IBD is typically but one of several analyses in most population genetics papers we collected, though these studies did not apply landscape genetics approaches. The relative insensitivity of IBD may be related to the generally low predictive ability of many published IBD outcomes (>50% of r2 values were <0.20). In other words, it is difficult to detect subtleties when the picture is blurry.
Despite its shortcomings, we recommend that IBD analysis continue to be a vital component of landscape genetics studies. The shortcomings we report here are partly an effect of relatively few studied populations. As more populations are studied in large-scale landscape genetics efforts, IBD patterns may be better resolved. The logistic model presented here for IBD significance as a function of N (Fig. 2) may serve as one guideline for adjusting sampling scale to improve the chance of detecting IBD, and may help guide landscape genetics analyses as well.
Isolation by distance should serve as the baseline for evaluating more complex landscape genetics models that may exceed IBD's ability to represent spatial pattern in genetic structure. The extent to which a landscape model outperforms an IBD model is the key quantity, and it is this comparison that reveals the value of IBD to landscape genetics. We do not suggest that the basic theory underlying IBD will be overthrown, but rather that the actual dispersal distances among populations will become better estimated, so that the simple proxy (Euclidean distance) can be surpassed by more realistic dispersal pathways. Multimodel inference approaches (Burnham and Anderson 2002) should be used to compare alternative landscape genetic models to IBD, and we suggest that this approach be considered a requirement for landscape genetics analyses. In this manner, IBD will persist as part of 21st century analyses of metapopulation genetics, but in a role that highlights the added strength of landscape genetics approaches and, for continuity, relates 21st century approaches to those of the 20th century.
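As a sketch of the multimodel comparison recommended above, ΔAIC and Akaike weights (Burnham and Anderson 2002) express how much support each landscape model receives relative to the IBD baseline. The model names and AIC values below are hypothetical. For the comparison to be valid, every candidate must be fitted to the same response data (the same pairwise genetic distances).

```python
import math


def akaike_weights(aic_by_model):
    """Hypothetical helper: map model -> (delta AIC, Akaike weight) from a dict of model -> AIC."""
    best = min(aic_by_model.values())
    deltas = {m: a - best for m, a in aic_by_model.items()}
    rel = {m: math.exp(-0.5 * d) for m, d in deltas.items()}  # relative likelihoods
    total = sum(rel.values())
    return {m: (deltas[m], rel[m] / total) for m in aic_by_model}


# Hypothetical AIC values: IBD is the baseline every landscape model is scored against.
weights = akaike_weights({
    "IBD (log Euclidean distance)": -120.0,
    "landscape resistance model A": -126.0,
    "landscape resistance model B": -121.0,
})
```

With these made-up values the IBD baseline has ΔAIC = 6 and landscape model A carries most of the weight (about 0.88). Reporting ΔAIC against IBD in this way makes the added value of each landscape model explicit.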
We thank Eric Hoffmann, Doug Bruggeman, and an anonymous reviewer for helpful comments, Andy Bohonak and colleagues for providing IBDWS as a valuable online tool for IBD analyses, and all the authors of papers we meta-analyzed for their hard work in collecting and publishing their results.