Recent studies have increasingly implicated deep (pre-Pleistocene) events as key in vertebrate speciation, downplaying the importance of more recent (Pleistocene) climatic shifts. This work, however, has been based almost exclusively on evidence from molecular clock inferences of splitting dates. We present an independent perspective on this question, using ecological niche model reconstructions of Pleistocene Last Glacial Maximum (LGM) potential distributions for the Thrush-like Mourner (Schiffornis turdina) complex in the neotropics. LGM distributional patterns reconstructed from the niche models relate significantly to phylogroups identified in previous molecular systematic analyses. As such, patterns of differentiation and speciation in this complex are consistent with Pleistocene climate and geography, although further testing will be necessary to establish dates of origin firmly and unambiguously.
Here, we integrate the Pleistocene speciation question with niche conservatism ideas to develop a novel hypothesis regarding that question. We use as a test-bed the Schiffornis turdina complex, which has been analyzed recently in terms of molecular phylogeography by one of us (Nyári 2007). This neotropical frugivorous bird complex, long treated as one overly inclusive species taxon, shows concordant variation in qualitative vocal characters (note structure, number of notes, note frequency range) and mitochondrial molecular characters (ND2, COI, and cyt b genes), suggesting that six to seven well-supported (i.e., high bootstrap support under various manipulations) species can be recognized within the complex. The molecular studies provide detailed geographic information on the occurrence of each molecular phylogroup (Nyári 2007). Here, our goals are to (1) examine the extent of ecological niche conservatism in the group as a whole (at least in spatial dimensions) using spatial stratification methods presented previously (Peterson and Holt 2003); (2) integrate current-climate ENMs with new, fine-scale Last Glacial Maximum (LGM) Pleistocene climate summaries to estimate an LGM potential range for the complex; and (3) test the consistency of molecular phylogroups with Pleistocene refugial distributions, that is, whether the spatial distribution of Pleistocene refugia has explanatory power regarding historical patterns of speciation, as reflected in molecular differences, providing a novel perspective on the question of vertebrate speciation in the Pleistocene.
Occurrence data for the S. turdina group were drawn from data associated with natural history museum specimens (see Acknowledgments), for a total of 227 unique occurrences across the geographic range of the complex, covering all named forms and all molecular phylogroups. We assigned geographic coordinates to textual locality descriptions by means of reference to online gazetteer databases (Alexandria Digital Library Project, http://middleware.alexandria.ucsb.edu/client/gaz/adl/index.jsp), achieving a spatial precision of ∼0.1′ of latitude and longitude. Of these sites, we identified 38 localities that correspond to samples for which molecular sequence data were included in the recent molecular analysis (Nyári 2007), which form the basis of our testing for agreement between molecular and ecological datasets (see below).
Climate data for the present day (1960–1990) were drawn from the WorldClim climate archive (Hijmans et al. 2005a). In particular, in view of the broad latitudinal range of the complex under analysis and concerns regarding the effects of opposite timing of seasonality in Northern and Southern hemispheres, we used a subset of the “bioclimatic” coverages: annual mean temperature, mean diurnal range, maximum temperature of warmest month, minimum temperature of coldest month, annual total precipitation, precipitation of wettest month, and precipitation of driest month. Although some workers preselect a very small suite of variables prior to analysis (Huntley et al. 1995), we prefer to include more dimensions, and allow the evolutionary computing algorithm to seek out important variables and sets of variables for a given model. All analyses were developed at a spatial resolution of 0.04° to match the approximate spatial accuracy of our georeferencing.
The general circulation model (GCM) data had a spatial resolution of 2.8°, or roughly 300 × 300 km. Surfaces were created at 0.04° spatial resolution via the following procedure. First, the difference between the GCM output for LGM and recent, preindustrial, conditions was calculated. These differences were then interpolated to the 0.04° resolution grid using the spline function in ArcInfo (ESRI, Redlands, CA) with the tension option. Finally, the interpolated differences were added to the high-resolution current climate datasets from WorldClim, and LGM bioclimatic coverages were created. This procedure has the dual advantage of producing data at a resolution that is relevant to the spatial scale of analysis and of calibrating the simulated climate change data to the actual observed climate data.
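The three steps of this procedure (coarse anomaly, interpolation, addition to the fine baseline) can be sketched as follows. This is only an illustration under stated assumptions: grids are plain nested lists with at least two rows and two columns, the function names are hypothetical, and simple bilinear interpolation stands in for the ArcInfo tension spline used in the study.

```python
def bilinear(grid, y, x):
    """Interpolate a coarse 2-D grid at fractional row y and column x."""
    y0 = min(int(y), len(grid) - 2)
    x0 = min(int(x), len(grid[0]) - 2)
    fy, fx = y - y0, x - x0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x0 + 1] * fx
    bottom = grid[y0 + 1][x0] * (1 - fx) + grid[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bottom * fy


def downscale_lgm(gcm_lgm, gcm_present, fine_present):
    """Return a fine-resolution LGM surface from coarse GCM output.

    Assumption: all three grids cover the same geographic extent.
    """
    # Step 1: coarse-resolution difference (LGM minus preindustrial).
    anomaly = [[l - p for l, p in zip(row_l, row_p)]
               for row_l, row_p in zip(gcm_lgm, gcm_present)]
    n_rows, n_cols = len(fine_present), len(fine_present[0])
    max_y, max_x = len(anomaly) - 1, len(anomaly[0]) - 1
    fine_lgm = []
    for i in range(n_rows):
        y = i * max_y / (n_rows - 1)
        row = []
        for j in range(n_cols):
            x = j * max_x / (n_cols - 1)
            # Steps 2 and 3: interpolate the difference, add it to the baseline.
            row.append(fine_present[i][j] + bilinear(anomaly, y, x))
        fine_lgm.append(row)
    return fine_lgm
```

Because only the difference is interpolated, fine-scale structure in the observed baseline (for example, topographic gradients) carries through to the LGM surface unchanged.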
GARP is an evolutionary-computing method that builds ecological niche models based on nonrandom associations between known occurrence points for species and sets of GIS coverages describing the ecological landscape. Occurrence data are used by GARP as follows: 50% of occurrence datapoints are set aside for an independent test of model quality (extrinsic testing data), 25% are used for developing models (training data), and 25% are used for tests of model quality internal to GARP (intrinsic testing data). Distributional data are converted to raster layers, and by random sampling from areas of known presence (training and intrinsic test data) and areas of “pseudoabsence” (areas lacking known presences), two datasets are created, each of 1250 points; these datasets are used for rule generation and model testing, respectively (Stockwell and Peters 1999).
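The 50/25/25 partition of occurrence data described above can be illustrated with a short sketch. The function name and the fixed random seed are assumptions made for reproducibility of the example; GARP performs this partition internally, and its actual sampling code is not reproduced here.

```python
import random


def split_occurrences(points, seed=0):
    """Split occurrence points into extrinsic test, training, and intrinsic test sets.

    Roughly 50% / 25% / 25%, following the proportions described in the text.
    """
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_extrinsic = len(shuffled) // 2
    n_training = (len(shuffled) - n_extrinsic) // 2
    extrinsic = shuffled[:n_extrinsic]
    training = shuffled[n_extrinsic:n_extrinsic + n_training]
    intrinsic = shuffled[n_extrinsic + n_training:]
    return extrinsic, training, intrinsic
```

For the 227 occurrences used here, such a split would yield 113 extrinsic testing points and 57 points in each of the other two sets.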
The first rule is created by applying a method chosen randomly from a set of inferential tools (e.g., logistic regression, bioclimatic rules). The genetic algorithm consists of specially defined operators (e.g., crossover, mutation) that modify the initial rules, and thus the results are models that have “evolved”: after each modification, the quality of the rule is tested (to maximize both significance and predictive accuracy) and a size-limited set of best rules is retained. Because rules are tested based on independent data (intrinsic test data), performance values reflect the expected performance of the rule, an independent verification that gives a more reliable estimate of true rule performance. The final result is a set of rules that can be projected onto a map to produce a potential geographic distribution for the species under investigation.
Following recent best-practices recommendations (Anderson et al. 2003), we developed 100 replicate random-walk GARP models, and filtered out 90% based on consideration of error statistics, as follows. The “best subsets” methodology consists of an initial filter removing models that omit (omission error = predicting absence in areas of known presence) heavily based on the extrinsic testing data, and a second filter based on an index of commission error (= predicting presence in areas of known absence), in which models predicting very large and very small areas are removed from consideration. Specifically, we used a soft omission threshold of 20%, and a 50% retention based on commission considerations; the result was 10 “best subsets” models (binary raster data layers) that were summed to produce a best estimate of geographic prediction.
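A minimal sketch of this two-stage filter follows. It rests on two assumptions that go beyond the text: the soft omission threshold is read as retaining the 20% of models with the lowest extrinsic omission error, and the commission filter is read as retaining the half of those models whose predicted area lies closest to the median. The dictionary keys and function name are hypothetical.

```python
import statistics


def best_subsets(models, omission_keep=0.20, commission_keep=0.50):
    """Filter replicate models and sum the survivors.

    Each model is a dict with:
      'omission': extrinsic omission error (0-1)
      'area':     proportion of the study area predicted present (0-1)
      'grid':     flat list of 0/1 cell predictions
    """
    # Filter 1: keep the models with the lowest omission error.
    by_omission = sorted(models, key=lambda m: m["omission"])
    n_low = max(1, round(len(models) * omission_keep))
    low_omission = by_omission[:n_low]

    # Filter 2: drop models predicting very large or very small areas,
    # keeping those closest to the median predicted area.
    median_area = statistics.median(m["area"] for m in low_omission)
    by_distance = sorted(low_omission, key=lambda m: abs(m["area"] - median_area))
    n_final = max(1, round(len(low_omission) * commission_keep))
    chosen = by_distance[:n_final]

    # Sum the binary grids cell by cell (values range from 0 to len(chosen)).
    summed = [sum(cells) for cells in zip(*(m["grid"] for m in chosen))]
    return chosen, summed
```

With 100 replicate models, these defaults retain 20 models after the first filter and 10 after the second, so each cell of the summed grid takes a value from 0 to 10.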
ECOLOGICAL NICHE CONSERVATISM
We developed two sets of tests of ecological niche conservatism across the range of the group using spatial stratification methods: one, presented elsewhere (Peterson and Holt 2003), splits available occurrence data simply by their distribution across space, whereas the other splits occurrence data by their lineage membership (Nyári 2007). Specifically, in the first tests, we divided the values of latitude for the 227 unique occurrence points into quartiles (“bins”; Fig. 2), and tested for predictivity among suites of points corresponding to these regions. As such, we conducted an N − 1 jackknife of the four bins: we used each possible set of three bins to predict the distribution of the species in the fourth bin, and tested for predictivity better than that expected at random in that region. These tests were developed exclusively in the present-day climate context without inclusion of phylogeographic information, and so test only the idea that the distinct lineages that make up the complex occur under a consistent ecological regime across their range in the neotropics.
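The latitudinal binning and jackknife can be sketched as below. The function names are hypothetical, points are assumed to be (latitude, longitude) tuples, and the model-fitting step itself (GARP in this study) is left out; the sketch only shows how the training and testing sets are assembled.

```python
def latitude_quartile_bins(points):
    """Sort points by latitude and cut them into four near-equal bins (south to north)."""
    ordered = sorted(points, key=lambda p: p[0])
    n = len(ordered)
    cuts = [n * k // 4 for k in range(5)]
    return [ordered[cuts[k]:cuts[k + 1]] for k in range(4)]


def jackknife_splits(bins):
    """Yield (held_out_index, training_points, testing_points) for each bin in turn."""
    for held_out in range(len(bins)):
        training = [p for k, b in enumerate(bins) if k != held_out for p in b]
        yield held_out, training, bins[held_out]
```

Each of the four iterations trains on three bins and reserves the fourth, so every occurrence point serves as an independent test point exactly once.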
Details of the tests of predictive ability and spatial consistency of ecological niches are as follows. Within the test area, we calculated the proportion of the quartile area predicted present at each threshold of the GARP model. We also assessed success in predicting each independent test point at each threshold. We then calculated a one-tailed cumulative binomial probability associated with that level of predictive success at that proportional area predicted present (Anderson et al. 2003). We did not employ more complex approaches to model validation, such as the receiver operating characteristic (Fielding and Bell 1997) owing to concerns regarding emphases of such approaches that do not necessarily focus on prediction of the entire distributional area (Anderson et al. 2003). Under the assumption that niche stability across space will be indicative of potential niche conservatism in the lineage through time, ecological niche models that passed this test of predictive ability across broad, unsampled regions in the present day were then explored further as to their implications in the Pleistocene LGM.
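The one-tailed cumulative binomial probability used in these tests can be written out directly. In this sketch (the function name is an assumption), the proportion of the test area predicted present serves as the chance probability that any single test point falls in a predicted cell, and the result is the probability of obtaining at least the observed number of correctly predicted test points by chance alone.

```python
from math import comb


def binomial_tail(successes, trials, p_area):
    """P(X >= successes) for X ~ Binomial(trials, p_area).

    successes: test points correctly predicted present at a threshold
    trials:    total number of independent test points
    p_area:    proportion of the test area predicted present at that threshold
    """
    return sum(
        comb(trials, k) * p_area ** k * (1 - p_area) ** (trials - k)
        for k in range(successes, trials + 1)
    )
```

A model that covers little of the test area yet captures most test points gives a small probability, whereas a model predicting presence almost everywhere gives a probability near 1 even when all test points are captured.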
Second, using information on lineage membership of particular populations, we developed a similar test of niche conservatism. Seven lineages were recovered in detailed molecular phylogeographic studies of S. turdina (Nyári 2007). We divided occurrence points based on their respective phylogeographic affiliations (see Fig. 1); because only a relatively small number of localities were genotyped in the molecular studies, we grouped the remaining occurrence localities based on known morphological and plumage breaks corresponding to subspecies boundaries (Peters 1979; Snow 2004). These seven subsets of the available occurrence data ranged in sample size from 6 to 76 sites.
We developed GARP models based on all seven possible sets of six phylogroups, and tested the ability of each replicate model to anticipate the geographic distribution of the seventh phylogroup. In light of the historically limited distributions of particular lineages, we circumscribed testing areas based on a buffer around the known occurrences of the testing phylogroup with a radius equal to the longest axis of the distribution of that phylogroup. Within this testing area, we repeated our calculations of the proportion of area predicted present at each threshold of the GARP model and success in predicting each independent test point at each threshold, as described above.
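One way to delimit such a testing area is sketched below. It assumes that the "longest axis" is the greatest great-circle distance between any two occurrences of the phylogroup, and that the buffer is the set of locations within that distance of at least one occurrence; the study's GIS implementation may differ, and all names here are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def longest_axis_km(points):
    """Largest pairwise distance among the occurrences of one phylogroup."""
    return max(haversine_km(p, q) for p in points for q in points)


def in_testing_area(cell, points, radius_km):
    """True if a grid cell center lies within radius_km of any occurrence."""
    return any(haversine_km(cell, p) <= radius_km for p in points)
```

Restricting the test to this area keeps the proportion of area predicted present from being diluted by distant regions that the phylogroup could not plausibly have occupied.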
RETRODICTION OF LGM DISTRIBUTIONS
Once predictivity was confirmed in the present day, we developed an overall model for the present-day distribution and ecology of the complex based on the full complement of occurrence points available for the complex. We used the same protocols for ENM development as described above, but projected the resulting best-subsets models onto the LGM climatic coverages described above. The result was a picture of areas at LGM matching habitable present-day climates, or effectively a picture of Pleistocene refugia for the complex, from the point of view of the present-day ecological requirements of the species (Rice et al. 2003).
We tested the consistency of geographic extents of molecular phylogroups with those of Pleistocene refugial distributions. That is, we asked (statistically) whether the geographic position and continuity of areas of adequate climate conditions had explanatory power above and beyond that expected by chance regarding distributions of molecular phylogroups. We used a statistical test that was designed explicitly to consider that the numbers of georeferenced and genotyped occurrence points would be relatively small compared to the overall pool of occurrence data. Specifically, (1) we identified the 38 occurrence points for which the phylogroup was known, which were the geographic coordinates of samples included in the earlier molecular study (Nyári 2007). (2) We reduced that set to the 11 points that occurred within reconstructed Pleistocene LGM refugia, under the assumption that a population presently occurring in a Pleistocene LGM refugium is the same lineage as that which occurred there at LGM; we here ignore those points not falling into Pleistocene LGM refugia, as they represent putative subsequent expansion and cannot unambiguously be identified with a particular refugium. (3) We connected phylogroups and refugia for each sample in linked pairs (e.g., phylogroup 1 – refugium a), counting the number of such pairs as an observed value of coincidence. (4) To understand coincidence that would be expected were no phylogroup-refugium association to be present, we randomized refugia with respect to phylogroups to generate a distribution of 100 null coincidence values; we then compared the observed value from (3) to this distribution to obtain a probability value for the comparison.
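Steps (3) and (4) can be sketched as a simple permutation test. The sketch makes an interpretive assumption: the coincidence value is taken as the number of distinct phylogroup-refugium pairs, so that tighter association yields fewer distinct pairs, and the probability is the fraction of randomizations with a value no larger than the observed one. Function names and the seed are hypothetical.

```python
import random


def count_unique_pairs(phylogroups, refugia):
    """Number of distinct (phylogroup, refugium) pairs across samples."""
    return len(set(zip(phylogroups, refugia)))


def coincidence_test(phylogroups, refugia, n_random=100, seed=0):
    """Randomize refugia with respect to phylogroups and compare to the observed value.

    Returns (observed, null_values, p), where p is the proportion of
    randomizations with as few or fewer distinct pairs than observed.
    """
    rng = random.Random(seed)
    observed = count_unique_pairs(phylogroups, refugia)
    shuffled = list(refugia)
    null_values = []
    for _ in range(n_random):
        rng.shuffle(shuffled)
        null_values.append(count_unique_pairs(phylogroups, shuffled))
    p = sum(1 for v in null_values if v <= observed) / n_random
    return observed, null_values, p
```

The two input lists are parallel, with one entry per genotyped sample falling inside a reconstructed refugium (11 samples in this study).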
VISUALIZATION OF LGM BARRIERS TO DISPERSAL
The field of phylogeography has placed considerable weight on detection of interruptions of gene flow (i.e., barriers) (Avise and Walker 1998), but has not often been able to characterize those barriers ecologically. Here, we explore the possibility of visualizing barriers to gene flow by means of inspection of associations between LGM climate data and LGM projections of present-day ENMs. Specifically, as an example, we focused on a disjunction that apparently formed at LGM between the eastern and western portions of the Amazon Basin, which was noted in our earlier analyses of diverse forest taxa (Bonaccorso et al. 2006), as well as in our present analyses of Schiffornis. Within a transect linking putative refugia on either side of this barrier, we combined the ENM prediction with the LGM climate data; the resulting output grid summarizes all unique combinations of input raster values. Then, we used simple bivariate plots to visualize environmental features across this LGM barrier, comparing areas predicted potentially present by all replicate ENMs with areas predicted either absent or at low levels of suitability (i.e., < 3 of 10 replicate ENMs predicting present).
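The raster combination and the suitability contrast can be sketched as follows. Rasters are represented as flat, cell-aligned lists, the function names are hypothetical, and the thresholds follow the text (all 10 replicate models versus fewer than 3 of 10); this is an illustration, not the GIS routine used in the study.

```python
from collections import Counter


def combine(enm, *climate_layers):
    """Count cells for each unique combination of ENM score and climate values."""
    return Counter(zip(enm, *climate_layers))


def split_by_suitability(combos, n_models=10, low_cutoff=3):
    """Separate climate combinations into high- and low-suitability groups.

    high: cells predicted present by all replicate models
    low:  cells predicted present by fewer than low_cutoff models
    Each entry is (climate_values, cell_count).
    """
    high = [(key[1:], n) for key, n in combos.items() if key[0] == n_models]
    low = [(key[1:], n) for key, n in combos.items() if key[0] < low_cutoff]
    return high, low
```

Plotting the climate values of the two groups against each other, one pair of variables at a time, gives the bivariate views of the barrier described above.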
We first established that the S. turdina assemblage is conservative in its ecological niche characteristics across its broad geographic distribution. In light of this broad distribution, we considered that the most difficult challenge for predictivity among portions of the species' distribution (Peterson and Holt 2003) would be that of stratifying the known distribution latitudinally.
The result of the spatial stratification and jackknife procedure was that two of the four partitions (Fig. 2), curiously the two (A and D) that are most extreme latitudinally, were predicted statistically better than random expectations across all prediction thresholds (all P < 10⁻⁵). The other two partitions were also predicted significantly better than random expectations, but in one case (partition C) at all thresholds except the lowest (i.e., broadest area predicted), and in the other case (partition B) only at intermediate thresholds. Overall, though, the picture is one of ecological niche conservatism in the group, as distributional patterns of even the extreme northern and extreme southern populations are predicted well by the ecological niche characteristics of the remainder of the species' distributional area.
Partitioning occurrence data by lineage showed similar results. Two of the seven lineages (clades 3 and 4 in Fig. 1) had very small sample sizes (<10 points) of occurrence data, and so were omitted from analyses. However, of the remaining five lineages, three showed predictivity statistically better than random expectations at all thresholds (clades 1, 5, and 6; all P < 0.05); one showed predictivity statistically better than random expectations at 9 of 10 thresholds (clade 7; P < 0.05); and predictions for clade 2 were statistically significant (P < 0.05) at only 2 of 10 thresholds. Hence, the predictions resulting from spatial stratification by lineage membership showed almost universal predictive ability, confirming the earlier results of ecological niche stability across this clade.
We then explored the projection of present-day ENMs to LGM climates under both the CCSM and MIROC GCM climate models (Fig. 3). Here, we see that the overall reconstructed distributional limits of the complex were not dramatically different at LGM, but that habitable areas were more fragmented and discontinuous than at present. In particular, we observed reduced continuity of the species' potential distributional area across the Amazon Basin, with eastern and western sectors of the species' distribution being isolated by a northwest-to-southeast swath of less-suitable conditions.
Spatial coincidence of samples belonging to particular molecular phylogroups with particular putative Pleistocene refugia was close (Fig. 4). For example, among the 11 test points available, the four samples that were part of the molecular phylogroup that corresponds to the Guyanan Shield all fell within the one LGM refugium in that region. Indeed, of the 11 test points available, in one case a particular refugium included sample localities from two phylogroups, and in one case localities for a particular phylogroup fell into two distinct refugia. Comparing this degree of correspondence between phylogroup membership and refugium against correspondences in 50 randomizations of refugium with respect to phylogroup, the correspondence between phylogroup membership and refugium placement is better than random, so it appears that climate-reconstructed refugia indeed have significant predictive power regarding phylogroup structure (P < 0.02).
Finally, we characterized the climatic characteristics of one LGM barrier between refugia in the eastern and western portions of the Amazon Basin (Fig. 5). Highly suitable habitats can be observed to be most related to high precipitation through the year, avoiding coldest minimum temperatures, and avoiding areas that dry out significantly in any part of the year. These features generally coincide with the characteristics of the evergreen lowland rainforest in which this complex is distributed.
By integrating ecological niche characteristics drawn from the environmental characteristics of known occurrences of the complex with phylogeographic and phylogenetic information from molecular genetic studies, we can derive a more refined image of driving forces that led to the distributions and discontinuities among extant taxa. In this study, we documented significant niche conservatism over the entire present-day distribution of the complex (Fig. 2), and showed that, at LGM, the distributional area of the complex fragmented into several areas corresponding to presumptive Pleistocene refugia (Fig. 3). These areas are separated by less-suitable areas, with environments that can be identified and characterized as barriers to gene flow (Fig. 5).
This study is not without its limitations. In particular, here, we battle with issues of spatial resolution: very narrow barriers (e.g., rivers in the Amazon Basin) may simply not be “visible” in our analyses, even though they may be important to the biogeography of the group under question. Recent long-term ecological studies of Amazonian avifaunas have demonstrated clearly the effects of habitat fragmentation, even on quite fine scales, on the population biology of species such as those analyzed here (Ferraz et al. 2003). As such, the effects of the coarse resolution of our LGM projections must be borne in mind in interpreting results.
An additional complication arises due to the appearance of nonanalogous climate conditions when ENMs are projected across major climatic changes. That is, if ENMs are trained in the present, but climatic conditions in the LGM include sets of conditions not manifested at present, then modeling approaches will have unknown or unpredictable behavior in predicting into those areas (Pearson et al. 2006). Still, given that topographic features surrounding the Amazon Basin and adjacent to the Mesoamerican portions of the distribution of the S. turdina complex provide cooler conditions than the complex's lowland distribution, these problems are probably of less concern here than for forward projections to still-warmer climates that are likely to appear over the next century (Pearson and Dawson 2003; Araújo et al. 2005).
Finally, in interpreting these model results, it must be borne in mind that they provide only static pictures of environmental suitability across landscapes (Soberón and Peterson 2005). As such, LGM “refugia” are hypothetical only: if dispersal limitations are sufficient, certain refugia may have been inaccessible to a species (Araújo and Pearson 2005), effectively leading to discords between potential and actual distributional areas (Peterson 2003a; Soberón and Peterson 2005). Such complications require care and thought in interpreting what would appear to be error in the form of overly broad predicted areas.
This study adds to a growing body of literature documenting conservatism of ecological niche characteristics across short-to-medium periods of evolutionary time (Peterson et al. 1999; Andreas et al. 2001; Peterson 2003a), although several examples of nonconservative ecological niche evolution have also been documented (Peterson and Holt 2003; Graham et al. 2004a; Knouft et al. 2006). We do caution, however, based on extensive experience, about the effects of developing ENMs in overly dimensional environmental spaces and the degree to which this overfitting can produce the appearance of nonconservatism (Fitzpatrick et al. 2006; Broennimann et al. 2007): regardless of conservatism, if dimensionality of the environmental space so far outstrips the sample sizes used to train the ENM, models are unlikely to be able to predict accurately among areas or across time periods. Regardless, niche conservatism has many important implications, and can provide a key tool in understanding historical biogeography (Wiens 2004; Wiens and Graham 2005a; Peterson 2006).
The references cited in the Introduction reflect a strong current in beliefs about vertebrate speciation—that Pleistocene climatic fluctuations were not major generators of current vertebrate species diversity. Rather, most authors argue for pre-Pleistocene origins for most species (Klicka and Zink 1997; Drovetski and Ronquist 2003), perhaps owing to broad use of poorly calibrated molecular clocks in many molecular studies (Ho et al. 2005; Peterson 2007). Although the few molecular dating efforts not based on such molecular clocks and rather based on coalescent analyses have indicated younger—Pleistocene—speciation events (Griswold and Baker 2002; Carstens et al. 2005; Jennings and Edwards 2005), this point has not been addressed broadly with independent sets of evidence, and as such we address it herein.
We present here a very simple test that only begins to shed light on this question. We identify Pleistocene potential climatic refugia for species taxa in the S. turdina complex, and show that they have significant explanatory power regarding what is known about the spatial distributions of phylogroups within the complex. Although we do not as yet have data regarding climate patterns prior to LGM, and as such cannot develop more detailed tests of the timing of diversification of Schiffornis within the Pleistocene, our results do show that Pleistocene climate patterns are at least relevant to the question. That is, if Pleistocene fragmentation events can explain much of the geography of species' distributions, why is it necessary to appeal to older climate phenomena, ignoring the massive, global climate phenomena that characterized the Pleistocene?
Pleistocene climatic fluctuations are known to have occurred in repeated hot–cold cycles that approached a binary condition, with extremely short transitions between the two (Dansgaard et al. 1993). As such, our use of LGM climate data quite simply reflects the availability of LGM simulations—to our knowledge, no continent-wide simulations have been developed that allow direct comparisons among glaciation events within the Pleistocene. Already, a Last Interglacial (135,000 years before present) dataset is in preparation (R. J. Hijmans, pers. comm.) for addition to these analyses, but considerable additional climatic information will be necessary before we can use niche modeling tools to pin down dates more precisely than what we have achieved in this article.
Clearly, this question will require much more in the way of experimentation and exploration, but this analysis can be taken as a first step toward a more general, broadly based answer. Finally, here, we explore techniques for visualization of LGM barriers to gene flow. The ENM approaches allow reconstruction not just of the pattern of fragmentation of ranges, but also of the ecological correlates of range restriction and distributional barriers. This ecological interpretation of vicariant patterns opens doors to new insights and new questions that have generally been out of the realm of possibilities.
Associate Editor: K. Crandall
We thank M. Papeş for her continual help with technical GIS problems, E. Bonaccorso for helpful reflections and insights, and R. Guralnick and E. Waltari for comments on the manuscript. R. J. Hijmans kindly developed and made available the LGM climate datasets. We are grateful to the following institutions for providing locality data from specimens under their care: Louisiana State University Museum of Natural Science; Academy of Natural Sciences, Philadelphia; Field Museum of Natural History; American Museum of Natural History; Museo de Zoología, Facultad de Ciencias, Universidad Nacional Autónoma de México; U. S. National Museum of Natural History; and University of Kansas Natural History Museum. Marina Anciães kindly provided additional georeferenced localities from the National Museum of Brazil. We acknowledge the PMIP2/MOTIF data providers and the Laboratoire des Sciences du Climat et de l'Environnement for providing access to the GCM data (data downloaded on 1 March 2006).