The past and future influence of geographic information systems on hybrid zone, phylogeographic and speciation research

Authors


Nathan G. Swenson, Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA.
Tel.: +1 520 626 3336; fax: +1 520 621 9190; e-mail: swenson@email.arizona.edu

Abstract

Over the past two decades geographers have developed an increasingly sophisticated technology termed a geographic information system (GIS). A GIS has the ability to store, map and analyse spatial data. The powerful analytical capabilities of a GIS could serve to enhance our understanding of the spatial component of the evolutionary process. In particular, phylogeographers, hybrid zone and speciation researchers could benefit enormously from incorporating this sophisticated technology from the discipline of geography, as they have done so readily from other disciplines (e.g. genetics). Indeed, an increasing number of researchers in these fields are beginning to incorporate GIS analyses into their research programmes. Some of this integration has taken the form of analysing the spatial relationship between populations and hybrid zones. Several other researchers have also begun to incorporate GIS into their work through the use of GIS-based niche models. These models estimate a multidimensional niche for a species using known geo-referenced populations and digital climate maps. Here, I review the recent integration of GIS and GIS-based predictive niche models into the above evolutionary sub-disciplines. I also describe evolutionary analyses that could be further enhanced through the implementation of GIS.

Introduction

Evolutionary inquiries typically concern a change in genotype in relation to space, time and/or the environment. Spatial patterns in particular have fuelled several evolutionary hypotheses and debates. For example, the relative influences of geography and ecology on the process of speciation have long been, and remain, contentious issues in evolutionary biology (Howard & Berlocher, 1998; Coyne & Orr, 2004). Questions such as: (1) are the majority of species generated while in allopatry, parapatry or sympatry?; and (2) are environmental backdrops or biotic interactions responsible for species divergence? are routinely analysed by evolutionary biologists. Yet, such analyses are often conducted without the implementation of powerful technology designed to analyse spatial and environmental relationships in concert. Evolutionary biologists have largely performed quantitative genetic analyses of divergence and qualitative assessments of the spatial proximity of diverging species or populations. Less focus has been placed on quantitatively analysing the spatial, environmental and genetic aspects of diversification. This is beginning to change with the increasing use of quantitative spatial analyses over the past 5–10 years.

Concurrent with the technological revolution in the field of genetics, geography experienced its own revolution through the implementation of digital mapping and analyses. In particular, geographic information systems (GIS) have facilitated powerful analyses in many different disciplines through their ability to: (1) integrate large databases with the geo-referenced locations from which the data were collected; and (2) rigorously and efficiently quantify spatial patterns. These two important properties of a GIS have been assimilated successfully by civic planners, geologists, wildlife biologists, conservationists and, to an increasing extent, ecologists. One of the remaining disciplines that stands to benefit greatly from this powerful technology is evolutionary biology. Here, I briefly review the historical link between geography and evolutionary biology, provide a recent history of quantitative GIS analyses of spatial data used in hybrid zone, phylogeographic and speciation research and highlight potential ways in which GIS can be further integrated into these fields.

Historical link between geography and evolution

The historical link between geography and evolution runs deep with many of our original theoretical advances stemming from observations of the spatial proximity of closely related forms (Darwin, 1859; Wallace, 1876). Indeed, the geographic distributions of closely related species have been a major source of evolutionary hypotheses regarding species diversification (e.g. Darwin, 1859; Wagner, 1873, 1889; Wallace, 1876; Jordan, 1905, 1908; Allen, 1907; Mayr, 1963; Remington, 1968; Haffer, 1969). Intra-specific spatial distributions have also played a large role in generating evolutionary hypotheses. An obvious example of this is the pioneering work of Wright (1943) regarding genetic isolation with distance. Wright’s work, much of which had a spatial component, quickly became a cornerstone of population genetics theory (Hartl & Clark, 1997). Researchers in this field are now rapidly incorporating more and more sophisticated quantitative spatial analyses into their work (Epperson, 2003). Ultimately, the result of the underlying interest of evolutionary biologists in spatial patterns has been the tradition of verbal qualitative interpretations for the patterns found in published maps of populations or species ranges.

Perhaps the best, and most extensive, example of our long standing interest in spatial patterns in evolutionary biology is Ernst Mayr’s classic book (Mayr, 1963). It seems clear upon inspection of Mayr’s works that he was greatly influenced by maps of species distributions and their utility in providing insights into the evolutionary process. For example, of the 58 figures used by Mayr (1963), 35 (approximately 60%) are maps. The majority of these maps are found in his chapter on geographic speciation. These maps were generally used as qualitative support for his central thesis of allopatric speciation and were never quantitatively assessed.

Ultimately, large- and small-scale geographic patterns have played a key role in developing evolutionary hypotheses. As evolutionary biology becomes more rigorous and quantitative in its genetic and statistical techniques, it must also become more quantitative in its approach to geographic analyses. Recently, some evolutionary biologists have embraced GIS to conduct powerful quantitative geographic analyses and the number of publications resulting from this integration is rapidly increasing (Fig. 1). This suggests that not only is geography still central to many contemporary evolutionary hypotheses, but that the geographic component of these hypotheses can be quantitatively addressed successfully using a GIS.

Figure 1.

 A histogram of the number of hybrid zone, phylogeographic and speciation publications that use GIS through 2006. Note that a number of publications that may have used a GIS to generate map figures may be omitted, because the authors did not cite the use of a GIS in their methods.

What is a GIS and what is a GIS-based predictive niche model?

As with most technological advances, GIS arose out of a necessity to tackle a novel problem. In the 1960s, the Canadian Department of Forestry and Rural Development recognized a need to inventory and analyse the natural lands within the Canadian borders. In response, Roger Tomlinson and colleagues designed a system that would store spatial information and, more importantly, analyse and visualize spatial data (Tomlinson, 1984). The result was the first GIS, the Canada GIS, and the development of the three central properties that now broadly define a GIS: (1) storage; (2) analysis; and (3) visualization of spatial information (DeMers, 2000).

The utility and power of a GIS in biological research lies primarily in its ability to simultaneously store and analyse large data sets containing spatially explicit information. A GIS operates by linking tables of information called attributes to points, lines, polygons or grid cells with known spatial locations. For example, a single map (a map layer) may contain multiple points located on the globe with each point on the map corresponding to a row in a data table with multiple attribute columns. Therefore, a point has potentially multiple levels of information beyond simple geographic coordinates. These levels of information can be mapped, queried, and analysed. Further, a GIS allows for two or more maps to be overlaid. An overlay allows the researcher to determine which attributes of one map correspond to another. For example, a point file representing the location of an animal and its anatomical traits could be overlaid with a digital map of annual precipitation. The data underlying the two maps is then linked allowing for correlation analyses of trait by environment relationships. The geographic distance separating the points in these analyses could also be determined within the GIS allowing one to assess the degree of spatial autocorrelation in their data.
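The overlay described above can be sketched in a few lines. This is only an illustration under stated assumptions: the precipitation grid, the point records, the trait name `body_mass` and all function names are hypothetical, and a real GIS would additionally handle map projections and much larger layers. The sketch samples a raster at each point, correlates the trait with the sampled value and computes pairwise great-circle distances of the kind needed for a spatial autocorrelation analysis.

```python
import math

# Hypothetical raster layer: annual precipitation (mm) on a regular grid.
# Row 0 is the northern edge; each cell is CELL degrees on a side.
PRECIP = [
    [300, 400, 500, 600],
    [350, 450, 550, 650],
    [400, 500, 600, 700],
]
WEST, NORTH, CELL = -112.0, 34.0, 1.0  # upper-left corner and cell size

# Hypothetical point layer: coordinates plus one attribute column.
POINTS = [
    {"id": "a", "lon": -111.5, "lat": 33.5, "body_mass": 21.0},
    {"id": "b", "lon": -110.5, "lat": 32.5, "body_mass": 24.5},
    {"id": "c", "lon": -109.5, "lat": 31.5, "body_mass": 27.0},
    {"id": "d", "lon": -108.5, "lat": 33.5, "body_mass": 29.5},
]


def sample_raster(lon, lat):
    """Overlay step: return the raster value of the cell containing a point."""
    col = int((lon - WEST) // CELL)
    row = int((NORTH - lat) // CELL)
    if not (0 <= row < len(PRECIP) and 0 <= col < len(PRECIP[0])):
        raise ValueError("point lies outside the raster")
    return PRECIP[row][col]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# Link the two layers, then analyse the trait-by-environment relationship.
precip = [sample_raster(p["lon"], p["lat"]) for p in POINTS]
mass = [p["body_mass"] for p in POINTS]
trait_env_r = pearson_r(precip, mass)

# Pairwise geographic distances, the raw material for autocorrelation tests.
distances = {
    (p["id"], q["id"]): haversine_km(p["lon"], p["lat"], q["lon"], q["lat"])
    for i, p in enumerate(POINTS)
    for q in POINTS[i + 1:]
}
```

The distance table could then be compared with pairwise trait differences to gauge how much of the trait-by-environment correlation reflects spatial proximity alone.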

Spatial data may also be converted from one format to another inside a GIS. For example, zero-dimensional point data can be converted into two-dimensional polygon data using a variety of methods (Fig. 2). Each of these methods is easily repeatable inside a GIS, and each has different assumptions and uses. Point, line and polygonal maps, termed vector maps, may also be converted into gridded maps, termed raster maps, inside a GIS with each grid cell having a known dimension, geographic locality and attribute value.

Figure 2.

 A series of methodologies for converting zero-dimensional points to two-dimensional polygons: (a) a conversion using minimum and maximum latitude and longitude to define a rectangle; (b) a minimum convex polygon method where the internal angles of the polygon are less than 180°. This method is also occasionally referred to as a convex hull; (c) a maximally convex polygon in which the internal angles can be greater than 180°; and (d) a frequency weighted method where a percentage of the most clustered points are included in a minimum convex polygon. In this example, I use 80% of the points that are in the closest proximity. This method relies on a frequency distribution often used in radio tracking studies whereas the others are free from this constraint and are therefore not sensitive to sampling intensity.
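Methods (a) and (b) of Fig. 2 are simple enough to sketch directly. The code below is a minimal illustration, assuming planar coordinates and hypothetical point data; the hull is built with Andrew's monotone chain algorithm, which is one standard way to obtain a minimum convex polygon and not necessarily what any particular GIS package uses internally.

```python
def bounding_box(points):
    """Method (a): rectangle from minimum and maximum coordinates."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def _cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Method (b): minimum convex polygon, vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Planar polygon area from the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical occurrence points in planar map units.
OCCURRENCES = [(0, 0), (4, 0), (2, 4), (2, 1), (1, 1)]
hull = convex_hull(OCCURRENCES)   # interior points are dropped
hull_area = polygon_area(hull)
box = bounding_box(OCCURRENCES)
box_area = (box[2] - box[0]) * (box[3] - box[1])
```

For these five points the rectangle covers twice the area of the hull, which illustrates how strongly the choice of conversion method can affect an estimate of range size.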

Aside from correlation analyses, queries and attribute linking, a GIS can perform many other spatial analyses. For example, a GIS can generate raster maps that have grid cells containing values that describe the cost of moving from that cell to another. This is called a cost surface or a friction surface. This type of technique has several uses, ranging from deciding where to build roadways to inferring the most likely dispersal route out of a Pleistocene refuge. Cost surfaces may be generated from existing raster maps, yet they may also be generated from vector data. For example, a point data file may be used to generate a raster map surface through interpolation. A GIS can estimate the values of grid cells between two known point locations using a variety of techniques. The most common approach is a distance-based interpolation, where the value estimated decays with distance from the value at the point. This type of technique could be, and has been, used to estimate the spatial distribution of climatic and genotypic data in a region from a set of point measurements.
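To make the idea of a cost surface concrete, the following sketch finds the cheapest route between two cells of a small hypothetical friction raster. It assumes that movement is restricted to the four adjacent cells and that entering a cell incurs that cell's cost; the grid values and function name are illustrative only. The search is Dijkstra's shortest-path algorithm, a standard choice for this task, although GIS packages differ in the neighbourhoods and cost rules they apply.

```python
import heapq


def least_cost_path(cost, start, goal):
    """Return (total_cost, path) for the cheapest 4-neighbour route.

    Entering a cell costs the value stored in that cell; the start cell
    itself is free. Returns (None, []) if the goal cannot be reached.
    """
    rows, cols = len(cost), len(cost[0])
    best = {start: 0}
    previous = {}
    heap = [(0, start)]
    while heap:
        dist, cell = heapq.heappop(heap)
        if cell == goal:
            path = [cell]
            while path[-1] != start:
                path.append(previous[path[-1]])
            return dist, path[::-1]
        if dist > best[cell]:
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_dist = dist + cost[nr][nc]
                if new_dist < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_dist
                    previous[(nr, nc)] = cell
                    heapq.heappush(heap, (new_dist, (nr, nc)))
    return None, []


# Hypothetical friction surface: 9 marks costly habitat (e.g. a mountain range).
COST = [
    [1, 1, 9, 1],
    [1, 9, 9, 1],
    [1, 1, 1, 1],
]
total, route = least_cost_path(COST, start=(0, 0), goal=(0, 3))
```

Here the direct route across the top row costs 11 units, so the cheapest route detours around the costly cells along the bottom row for a total of 7.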

Another powerful facet of most GIS platforms is that they allow the user to write scripts in a variety of programming languages allowing for specialized analyses. Scripting may be done for things as simple as iterating an existing GIS analysis multiple times or for more complex analyses such as simulating the expansion of a species range. Both are extremely powerful in their own right. Iteration allows for repeated analyses of large numbers of map layers and simulation techniques allow for more rigorous spatial prediction and null modelling. I foresee the latter becoming much more prominent in the literature in the coming years.

Lastly, I will discuss one particular GIS application that has rapidly been assimilated into ecological and evolutionary research. This method is called predictive niche modelling. Predictive niche models use known geo-referenced localities of a focal species and raster climate maps to generate a predicted distribution map of the focal species. There are now a multitude of algorithms that are used for predictive niche modelling (e.g. bioclim: Nix, 1986; GARP: Stockwell & Noble, 1992; MaxEnt: Phillips et al., 2006), some of which seem to perform better than others (Elith et al., 2006). Despite their differential performance, they all follow a simple central principle. The method is to take the known geo-referenced point occurrences of the focal species and extract the climatic information from each of those localities. The algorithm then uses the observed information to determine the probability that a species will be found at any point along each climatic axis. This information is then combined for all climatic axes to generate a predicted climatic envelope in which the species is likely to occur. This prediction is then projected onto a study region either for the present time or for a historical time when maps of all the relevant climatic variables are available (Fig. 3).

Figure 3.

 A highly simplified example of a GIS-based predictive niche model. On the top left is a map layer of known point occurrences of the focal organism (grey points). These points are then overlaid with a series of raster climate maps. In this process, the model algorithm determines where the organism is known to occur along each climatic axis. This information is then used to generate a multi-dimensional model, or rule, of the probability of occurrence of the organism along each axis (middle section). This rule is then used to estimate where the species may occur in the study region. There are two types of error that can occur in this process that are not shown here. The first is omission error, where the species is predicted to be absent by the model when it is known to occur in that location from the observed data. Omission rates are generally very low (<0.05%) using most models. The second type of error is commission error, where the species is predicted present in locations where it does not actually occur. This type of error is difficult to detect unless a portion of the known occurrences data set is omitted from the model formation. These omitted data can then be used to test for commission error rates.
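The central principle shared by these models can be illustrated with a deliberately simple rectilinear envelope. The sketch below is not the bioclim implementation: it assumes a minimum-to-maximum envelope on each climatic axis (percentile limits are a common variant), and the variable names, occurrence values and grid are all hypothetical.

```python
def fit_envelope(occurrence_climate):
    """Record the observed (min, max) of every climate variable.

    `occurrence_climate` holds one dict of climate values per known
    occurrence, i.e. the result of overlaying points on climate rasters.
    """
    variables = occurrence_climate[0].keys()
    return {
        v: (min(rec[v] for rec in occurrence_climate),
            max(rec[v] for rec in occurrence_climate))
        for v in variables
    }


def in_envelope(envelope, cell_climate):
    """A cell is suitable only if it falls inside the limits on every axis."""
    return all(lo <= cell_climate[v] <= hi for v, (lo, hi) in envelope.items())


def project(envelope, climate_grid):
    """Apply the rule to every grid cell to obtain a predicted presence map."""
    return [[in_envelope(envelope, cell) for cell in row] for row in climate_grid]


# Hypothetical climate values extracted at three known occurrences.
OCCURRENCE_CLIMATE = [
    {"temp": 18, "precip": 800},
    {"temp": 22, "precip": 1200},
    {"temp": 20, "precip": 1000},
]

# Hypothetical study region: a 2 x 3 grid of climate values. Swapping in a
# palaeoclimate grid would project the same rule onto a historical period.
CLIMATE_GRID = [
    [{"temp": 16, "precip": 700}, {"temp": 19, "precip": 900}, {"temp": 21, "precip": 1100}],
    [{"temp": 20, "precip": 1500}, {"temp": 22, "precip": 1200}, {"temp": 25, "precip": 1000}],
]

envelope = fit_envelope(OCCURRENCE_CLIMATE)
predicted = project(envelope, CLIMATE_GRID)
```

Requiring suitability on every axis at once is what makes the envelope multidimensional: the cell with a suitable temperature of 20 but precipitation of 1500 is predicted absent.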

While we are interested in the predicted distributional map that a niche model will generate, we must also quantify the accuracy of the prediction and the climatic variables most important in determining species ranges. To do this, researchers will often exclude half of their known data set (climatic data or point observation data) from the modelling process and use it as an independent data set to test the predictive accuracy of the model. These excluded data sets are referred to as test data sets. Predictive niche models have generally been found to be successful in predicting the presence of test data points. They therefore tend to have low numbers of false negatives or omission errors. Conversely, predictive niche models often over-predict the range of a species. These false positives are called commission errors. Commission errors may arise from historical barriers to dispersal or through negative biotic interactions, neither of which is incorporated into niche models. True commission errors can be difficult to detect if the test data set is geographically restricted to only a portion of the realized species range. Thus the user of predictive niche models has to discern for himself or herself what is an acceptable commission error rate. In some instances, a large amount of commission may be desired due to a geographically restricted data set, while at other times the same rate of commission may not be desired. Some predictive niche models are known to have low commission error rates (e.g. MaxEnt), while others have high commission error rates (e.g. GARP) when used on the same data set. This has led some to conclude that MaxEnt may be a preferred niche model (Elith et al., 2006), but this could be debated if the user desires high rates of commission. This aspect of predictive niche models is both an asset and a complication for various research questions (see Future Directions and Challenges). The user is advised to consult the niche modelling literature in order to refine or improve their choice, implementation, and interpretation of alternative niche models.
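The logic of withholding a test data set can be sketched as follows. Several assumptions are made for illustration: the model is a simple minimum-to-maximum climatic envelope, the split takes every second record so that the example is reproducible (a random split is more typical), and commission is estimated from a hypothetical set of surveyed absence sites, which many real data sets lack.

```python
def fit_envelope(records):
    """Min and max of each climate variable across (temp, precip) records."""
    temps = [t for t, _ in records]
    precs = [p for _, p in records]
    return (min(temps), max(temps)), (min(precs), max(precs))


def predicts_presence(envelope, record):
    """True if the record falls inside the envelope on both axes."""
    (t_lo, t_hi), (p_lo, p_hi) = envelope
    temp, precip = record
    return t_lo <= temp <= t_hi and p_lo <= precip <= p_hi


def omission_rate(envelope, test_presences):
    """Fraction of withheld known presences that the model predicts absent."""
    missed = sum(not predicts_presence(envelope, r) for r in test_presences)
    return missed / len(test_presences)


def commission_rate(envelope, known_absences):
    """Fraction of known absences that the model predicts present."""
    wrong = sum(predicts_presence(envelope, r) for r in known_absences)
    return wrong / len(known_absences)


# Hypothetical (temperature, precipitation) values at known occurrences.
OCCURRENCES = [
    (18, 800), (19, 900), (20, 1000), (21, 1100),
    (22, 1200), (23, 1300), (17, 850), (20, 950),
]
# Hypothetical sites that were surveyed and found to lack the species.
ABSENCES = [(10, 500), (19, 1000), (30, 2000), (21, 900)]

training = OCCURRENCES[0::2]   # half of the records build the model
testing = OCCURRENCES[1::2]    # the other half is withheld as test data

model = fit_envelope(training)
omission = omission_rate(model, testing)
commission = commission_rate(model, ABSENCES)
```

In this toy data set one of four withheld presences falls outside the envelope and two of four absence sites fall inside it. Whether the latter are true commission errors or simply suitable but unoccupied sites is exactly the ambiguity discussed above.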

In the following sections, I will discuss examples of GIS and GIS-based niche models being used in evolutionary research. The GIS-based niche model examples use the bioclim algorithm to predict species distributions. The reason for this bias is that bioclim was the original niche model used in these fields and has traditionally been the most widely used. Recent evidence has suggested that alternative models (e.g. MaxEnt) may be more robust or accurate in predicting ranges (Elith et al., 2006). Thus my usage of bioclim examples is due to the history of its usage in these fields and not due to my promotion of bioclim as a preferred algorithm.

The recent integration of GIS into evolutionary biology

Hybrid zones

A prevalent focus in hybrid zone research through time has been the relative importance of intrinsic vs. extrinsic forces in determining the location and maintenance of hybrid zones. There is an obvious spatial component to this line of enquiry as most hypotheses make clear predictions regarding the distribution of hybrid zones in space and their relationship, or lack thereof, with spatial gradients in the environment. Therefore, it is no surprise that the integration of GIS into hybrid zone research to date has focused on the question of what are the intrinsic and extrinsic factors important in determining the location and maintenance of hybrid zones.

The history of GIS becoming integrated into hybrid zone research began with Kohlmann et al. (1988). In their study, Kohlmann et al. attempted to test if extrinsic forces could explain the location of parapatric distributional boundaries among chromosomal races within the Australian grasshopper, Caledia captiva. To do this, they used bioclim (Nix, 1986), one of the first predictive niche models, developed in the mid-1980s. The authors used bioclim and ordinations to show that precipitation limited the distributions of races on an east-west axis and temperature limited the distributions on a north-south axis. An overarching message from this work is that hybrid zone locations can be shown to be governed by extrinsic factors using GIS-based predictive niche models. Further, niche model analyses can generate new hypotheses regarding the physiology of the constituent species in hybrid zones.

Despite the interesting work of Kohlmann et al. (1988), over a decade passed until Kidd & Ritchie (2000) revived the integration of GIS into hybrid zone research. The evolutionary problem posed in this line of research was whether hybrid zones in the Ephippiger ephippiger complex result from primary divergence or from secondary contact between populations expanding out of refugia. The spatial patterns predicted by primary divergence vs. secondary contact in this study regarded the spatial concordance of allozymes and morphological and behavioural traits. Allozymes and traits were predicted to be more dissimilar in the hybrid zone if secondary contact was the mechanism responsible. The basic idea is that species should be more genetically dissimilar in a hybrid zone, as compared with nonhybrid zone populations, if they have diverged in allopatry. If they have primarily diverged in the hybrid zone then populations in the hybrid zone should be more genetically similar than nonhybrid zone populations. The approach used by Kidd and Ritchie concentrated on using the three main attributes of a GIS (data storage, analysis and visualization) and therefore represents a more prototypical GIS approach as compared with GIS-based predictive niche modelling. Specifically, they used a GIS to assign trait and genetic attributes to geo-referenced populations. The GIS was then used to interpolate trait and genetic data between populations to generate a continuous data surface. The interpolation method that they used in this initial work was a distance-based method where the closer the predicted grid cell is to a known population in Euclidean distance, the more similar the predicted value for that cell is to that of the known population. Therefore, a grid cell lying precisely between two known populations with two separate trait values will have the average value of those two populations. The authors iterated this interpolation method for each trait. These trait maps were used to show that some traits showed dissimilarity in the hybrid zone, thereby supporting the secondary contact hypothesis. Next the trait maps were used in a Principal Components Analysis, an analysis that is now easily implemented in most GIS software packages, in order to generate multivariate maps that could be analysed for discontinuities. This multivariate approach found general support for the secondary contact hypothesis by showing parental populations outside of the hybrid zone clustered in multivariate space. As noted above, the GIS work carried out by Kidd and Ritchie is a more classical approach, taking full advantage of everything a GIS can provide a research programme.
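A distance-based interpolation of this kind can be sketched with inverse distance weighting (IDW). This is an assumption made for illustration: the text above specifies only that the method was distance-based, so the weighting function, the `power` parameter, and the population coordinates and trait values below are all hypothetical.

```python
import math


def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    `samples` is a list of (x, y, value) tuples for known populations.
    Weights decay as 1 / distance**power, so nearer populations dominate.
    """
    numerator = denominator = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return float(value)  # exactly on a sampled population
        weight = 1.0 / d ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def interpolate_surface(samples, n_rows, n_cols, cell=1.0):
    """Continuous trait surface: one IDW estimate per grid cell centre."""
    return [
        [idw((c + 0.5) * cell, (r + 0.5) * cell, samples) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Two hypothetical populations with different mean trait values.
POPULATIONS = [(0.0, 0.0, 10.0), (4.0, 0.0, 20.0)]

midpoint = idw(2.0, 0.0, POPULATIONS)    # equidistant from both populations
near_first = idw(1.0, 0.0, POPULATIONS)  # three times closer to the first
surface = interpolate_surface(POPULATIONS, n_rows=2, n_cols=4)
```

A location midway between the two populations receives their average (15), whereas a location three times closer to the first population is pulled strongly towards its value. Repeating `interpolate_surface` for each trait would yield the stack of trait maps that a multivariate analysis could then summarize.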

Phylogeography

The field of phylogeography has seen the largest degree of GIS usage to date as compared with hybrid zone and speciation research. Phylogeography is by definition a spatial discipline. Thus, it is logical that researchers in the field should be prone to taking advantage of GIS not just for making maps, but also for storing and analysing spatial data. Indeed this is quickly becoming the case, prompting expanded discussions of what a ‘phylogeographic information system’ would entail (Kidd & Ritchie, 2006). By providing a platform for spatial analyses, GIS software will permit the field of phylogeography to move into a more quantitative realm with regard to central questions such as the prevalence of phylogeographic concordance and the inference of historical refugia and expansion routes. Here I discuss two separate examples that I feel best represent the breadth, but not the depth, of GIS being incorporated into phylogeographic research. Unfortunately, this results in the omission of extended discussions of other equally important examples of GIS usage in phylogeography (e.g. Kambhampati et al., 2002; Rice et al., 2003; Martínez-Meyer et al., 2004a,b; Peterson et al., 2004; Davis, 2005; Armstrong, 2006; Bond et al., 2006; Rissler et al., 2006; Weaver et al., 2006). I apologize in advance for their omission. The first example I will discuss uses a GIS-based predictive niche model to examine palaeodistributions. The second example I will discuss uses a GIS to generate and store large phylogeographic break databases for spatial analyses of clustering.

An initial example of a GIS being utilized in phylogeographic research comes from one of the many exciting articles analysing species divergence along gradients in the Australian Wet Tropics (Schneider et al., 1998, 1999; Schneider & Moritz, 1999; Moritz et al., 2000). To my knowledge the system in this region that has received the most thorough GIS-based analysis is that of the land snail Gnarosophia bellendenkerensis (Hugall et al., 2002). Hugall and colleagues address a question that is well represented in the literature. That is, how have species responded to various palaeoclimates and can this information be used to understand the mechanisms generating intra-specific divergence that lead to speciation? In this article, the authors argue that spatially explicit quantitative analyses would provide a substantial advance in phylogeographic research programmes. In support of their argument, the authors used the predictive niche model bioclim (Nix, 1986) to build a climatic envelope for their study organism using contemporary data. This climatic envelope was then projected onto a reconstructed palaeoclimate at three different points in time for their study region in order to predict where the focal species would have been distributed. This approach makes the potentially very tenuous assumption that physiological constraints have remained largely stable through time. The results showed a substantial reduction and fragmentation of the range of the species during the Last Glacial Maximum, suggesting the presence of multiple refugia. A combined investigation of the bioclim models and a nested clade analysis indicated a fair amount of overlap in the distributional inferences generated from each analysis. Lastly, the authors quantified the correlation between the change in area predicted to be occupied by their study organism through time and a likelihood estimate of the exponential growth parameter from their genetic data. This quantitative analysis found a fairly strong positive association (= 0.76), suggesting the spatial modelling and phylogenetic analyses were complementary. This synergism between a predictive niche model and phylogenetic analyses is promising and suggests further analyses testing this approach may prove informative in trying to discern the location and influence of refugia in generating intra-specific divergence and speciation. Despite this promise, there are several conceptual issues that may face this approach, including the lability of physiological tolerances and biotic interactions through long periods of time, the inherent difference between an observed realized niche and the potential fundamental niche of a species, and the reliability of predicted historical climate. I will discuss these issues further below.

A second example of GIS being incorporated into phylogeography comes from a comprehensive spatial analysis of phylogeographic break clustering performed by Soltis et al. (2006) in the eastern United States. In this work, Soltis et al. (2006) performed an exhaustive literature search and then sub-sampled the phylogeographic breaks reported in the literature and mapped their presence as lines in a GIS. The goal of this exercise was first to quantify whether or not phylogeographic breaks show higher densities in some parts of the study region and second to quantify whether these areas of clustering coincide with prominent features of the landscape. To do this, they used a GIS function that calculated the number of lines, representing the phylogeographic breaks, within grid cells throughout their study region. Next, they performed a spatial null modelling procedure that threw down the observed lines at random onto the study region so that their observed length and shape were maintained. The observed spatial position and orientation of the lines were not maintained. This was repeated 20 times and the authors quantified whether or not the observed grid cell had a density of phylogeographic breaks that was higher than expected given the random distribution. They found that only one grid cell, located in the middle of their study region, had a higher than expected number of breaks. The authors considered this to be evidence for ‘pseudo-congruence’ of phylogeographic breaks (Cunningham & Collins, 1994), because the observed grid cells generally did not show significant clustering when confronted with a randomization test, the breaks were not all aligned in the same cardinal direction and the grid cell that did show significant clustering did not coincide with any obvious physiographic feature of the landscape. The first two justifications are surely valid, yet it is still unclear how one actually quantifies the generality of phylogeographic break orientation within a study region. The third line of evidence for phylogeographic pseudo-congruence will need further exploration, as phylogeographic concordance would be expected to occur at geographic mid-points between glacial refugia, even if there is a general lack of a physiographic feature in the region. An example of this is the Gulf Coast of the southeastern United States (Swenson & Howard, 2005). Despite this minor point of contention, which becomes less important when the other two lines of support are present, the recent work of Soltis et al. (2006) far exceeds any analysis of phylogeographic concordance to date. Not only is the data set much more comprehensive than previous analyses, but it also conceptually advances the field greatly by introducing, to my knowledge, the first spatial null modelling procedure to test for phylogeographic concordance. Such null modelling analyses should become a central component in quantitatively testing for phylogeographic concordance in the future.
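A spatial null model of this general kind can be sketched as follows. This is a simplified illustration and not the procedure of Soltis et al. (2006): breaks are assumed to be straight segments in a square study region, the cells a segment crosses are found by sampling points along it (which can miss a cell that is only clipped at a corner), and the break coordinates, grid dimensions and default number of iterations are all hypothetical.

```python
import math
import random

SIZE = 10.0   # hypothetical square study region, 0..SIZE on both axes
NCELLS = 5    # grid cells per side


def length(seg):
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)


def cells_crossed(seg, steps=200):
    """Approximate the set of (row, col) cells a segment passes through."""
    (x1, y1), (x2, y2) = seg
    cell = SIZE / NCELLS
    hit = set()
    for i in range(steps + 1):
        t = i / steps
        x, y = x1 + t * (x2 - x1), y1 + t * (y2 - y1)
        hit.add((min(int(y // cell), NCELLS - 1), min(int(x // cell), NCELLS - 1)))
    return hit


def density(segments):
    """Number of breaks crossing each grid cell."""
    counts = {}
    for seg in segments:
        for c in cells_crossed(seg):
            counts[c] = counts.get(c, 0) + 1
    return counts


def randomize(seg, rng):
    """Throw a break down at a random position and orientation, keeping its length."""
    half = length(seg) / 2.0
    while True:  # retry until the whole segment lies inside the region
        cx, cy = rng.uniform(0, SIZE), rng.uniform(0, SIZE)
        angle = rng.uniform(0, math.pi)
        dx, dy = half * math.cos(angle), half * math.sin(angle)
        p, q = (cx - dx, cy - dy), (cx + dx, cy + dy)
        if all(0 <= v <= SIZE for v in (*p, *q)):
            return (p, q)


def null_test(segments, n_iter=200, seed=1):
    """Per-cell p-value: how often random placement matches or exceeds the observed count."""
    rng = random.Random(seed)
    observed = density(segments)
    exceed = dict.fromkeys(observed, 0)
    for _ in range(n_iter):
        null = density([randomize(s, rng) for s in segments])
        for c in observed:
            if null.get(c, 0) >= observed[c]:
                exceed[c] += 1
    return {c: (exceed[c] + 1) / (n_iter + 1) for c in observed}, observed


# Three hypothetical breaks that all pass through the central cell (2, 2).
BREAKS = [
    ((4.5, 3.0), (5.5, 7.0)),
    ((3.0, 4.5), (7.0, 5.5)),
    ((3.5, 3.5), (6.5, 6.5)),
]
P_VALUES, OBSERVED = null_test(BREAKS)
```

A small p-value for a cell indicates that random placement rarely produces as many breaks there as were observed. With many cells tested at once, a correction for multiple comparisons would be advisable before declaring any single cell significant.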

Speciation

The use of a GIS to study speciation has been rare. This is surprising, considering the substantial debate surrounding the relative importance of allopatric, parapatric and sympatric speciation in generating biodiversity. Therefore, I will discuss the few examples of which I am aware. The preexisting research articles that have used a GIS to study speciation have used similar approaches to those used to study phylogeography that I have described above. Because the approaches have been similar and because phylogeographic and speciation research can be so tightly linked, I had difficulty placing preexisting work into one of the two categories. Nonetheless, below I discuss two examples that I feel have a strong speciation component.

The first example stems from the use of the GIS-based predictive niche model bioclim (Nix, 1986). Building upon preexisting techniques for determining the geographical mode of speciation from the amount of range overlap between closely related species (Lynch, 1989; Chesser & Zink, 1994; Barraclough et al., 1998; Barraclough & Vogler, 2000), Graham et al. (2004a) used bioclim to further investigate the amount of ecological overlap between five species of dendrobatid frogs in Ecuador and thereby attempted to determine the mechanisms responsible for speciation within this clade. Specifically, the authors used bioclim models to predict the geographical ranges of the focal species and their ancestors to infer the geographical mode of speciation by quantifying range overlap. Next, the authors used the climatic envelopes generated for each species by bioclim and performed a Principal Components Analysis to determine the degree of overlap amongst closely related species in multivariate environmental space. Of the seven sister comparisons in the study, the authors found that four sister lineages separated in multivariate environmental space. Two of these cases suggested ecological divergence while in allopatry and two suggested ecological divergence while in parapatry. Although this method of inferring geographical modes of speciation from range overlap of closely related species can be problematic (see Losos & Glor, 2003), the relatively novel approach used by Graham et al. (2004a) for determining ecological overlap between closely related species (see also Peterson et al., 1999) provided potential insights into the prevalence of ecological divergence in the process of speciation. In this regard, it is a good first step towards using a GIS to better understand modes of speciation.
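Range overlap between two predicted distributions is straightforward to quantify once each range is held as a set of occupied grid cells. The sketch below uses one commonly applied index, the area of overlap divided by the area of the smaller range; I assume, without claiming, that this matches the index used in the studies cited above. The two ranges and the equal-area grid are hypothetical.

```python
def range_overlap(range_a, range_b):
    """Overlap index: shared cells divided by the size of the smaller range.

    Ranges are sets of (row, col) cells on a common equal-area grid.
    0.0 means fully allopatric; 1.0 means the smaller range is nested
    entirely within the larger one.
    """
    smaller = min(len(range_a), len(range_b))
    if smaller == 0:
        raise ValueError("both ranges must contain at least one cell")
    return len(range_a & range_b) / smaller


def cells(rows, cols):
    """Helper: the set of cells spanned by iterables of rows and columns."""
    return {(r, c) for r in rows for c in cols}


# Hypothetical predicted ranges for a pair of sister species.
SPECIES_A = cells(range(0, 2), range(0, 4))   # 8 cells
SPECIES_B = cells(range(0, 2), range(3, 5))   # 4 cells, sharing column 3

overlap = range_overlap(SPECIES_A, SPECIES_B)
allopatric = range_overlap(SPECIES_A, cells(range(5, 7), range(0, 2)))
```

Here the sister species share two of the four cells in the smaller range, giving an index of 0.5. Plotting such values against node age for many sister pairs is the basis of the range-overlap approach to inferring geographical modes of speciation, with the caveats noted above.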

Although this was a good first step, future research will be needed that explicitly confronts the conceptual issues raised by Losos & Glor (2003) regarding inferring speciation patterns from ranges (Fig. 4). Further, we will also need to confront the conceptual problem that it may not be possible to know the fundamental niche of a species from a GIS-based niche model that uses observational data representing the realized niche. It is well known, and well described in basic ecology texts, that the observed distribution of a species does not necessarily describe its entire potential niche and therefore its potential distribution. Processes such as negative biotic interactions and historical contingencies can limit the distribution of a species. While this issue is often paid lip service in the literature, it is generally not adequately confronted. I believe this is the most substantial issue facing the field of GIS-based niche modelling. For example, imagine a sister species pair where one species competitively excludes the other from its range due to an earlier colonization of the region. Thus the late colonizer is restricted to an alternative habitat. If the early colonizer were removed, the sister species could potentially occupy both habitats due to a shared preference. Thus the observed distributional data show that the sister species occupy dissimilar habitats when in actuality one could potentially occupy both habitats. An example of this is provided in Fig. 5. Ultimately, this hidden layer of complexity would cause a niche modeller to erroneously conclude that these species had diverged ecologically when in fact they had not. There is no easy way around this problem without explicit experimental investigations of the fundamental niche.

Figure 4.

 A graphical model of how geographic range overlap between sister species allows for prediction of the geographical mode of speciation as presented by Berlocher (1998). The bottom panel shows that such analyses can often produce ambiguous results and therefore should be approached with caution (Losos & Glor, 2003).

Figure 5.

 A graphical model showing how two different models of fundamental niche differentiation can produce the same observed pattern of realized niches. (a) This panel depicts a shared preference model of the fundamental niche (also known as a competition-tolerance trade off niche model), where species both perform best in the same environment in the absence of the other species. Competitive dominance in the species represented by a solid line and enhanced tolerance in the species represented by the dashed line allows the two species to segregate and co-exist along a gradient; (b) this panel depicts a distinct preferences model of fundamental niches where the two species perform best in different locations along the gradient even in the absence of the other species. Thus distinct fundamental niches allow species to segregate and co-exist by using different parts of a resource gradient; (c) this panel depicts the observed distributional pattern in nature that could be the outcome of panel a or b. If the observed distribution was derived from scenario b, then a GIS-based niche model may potentially predict the past, present and future distribution of a species with a reasonable level of accuracy. If the observed distribution was derived from scenario a, then a GIS-based niche model will fail to produce accurate predictions of the past, present and future distribution of a species. Adapted from Wisheu (1998).

The second example of GIS usage in speciation research that I will present comes from the palaeobiology literature. This literature has recently seen a rapid increase in GIS analyses. Some of this is likely due to the strong history of GIS being incorporated into geological research, but it is also likely due to data availability. Specifically, large efforts have focused on geo-referencing fossil locations and storing these data in file formats conducive to GIS analyses. The most obvious example of this effort comes from FaunMap (Graham et al., 1996). Concurrently, a group of researchers have begun to generate GIS compatible digital maps of the palaeodistribution of landmasses and plate movements, compiling them into a package called paleogis. This software allows users to plot their fossil data on palaeolandscapes in order to allow for more accurate inferences of things such as geographic range size. Rode & Lieberman (2004, 2005) have taken advantage of this new technology to test hypotheses regarding speciation rates, extinction rates and geographic range size. In the example I will discuss, Rode & Lieberman (2005) used a group of Devonian crustaceans called phyllocarids. Initially, the authors plotted the point locations of phyllocarids at different points in time using different digital maps available in paleogis and geo-referenced fossil locations. These point locations were then converted into two-dimensional polygons using a standardized method. This allowed the authors to estimate the range size of each of the phyllocarids extant at that point in geological history. Of course, this makes the assumption that species ranges are continuous and uniform within the polygonal range. Next, the authors estimated the speciation rate of phyllocarids by taking the difference between the natural log of the number of phyllocarids at the start of a time period (i.e. Givetian) and the natural log of the number of phyllocarids at the end of that same time period and then dividing the difference by the length of that time period. The authors then asked how well the median range size of the constituent species at the start of a particular time period predicts the speciation rate during that time period. By analysing multiple time periods, the authors found that the median range size of the species at the start of a time period was strongly positively correlated with the speciation rate (= 0.81; Rode & Lieberman, 2005). One mechanism that could account for this correlation is that larger ranges are more likely to be fragmented by physical barriers and therefore more likely to speciate allopatrically, a prediction made by Provincial Diversity Theory (Rosenzweig, 1975, 1978, 1995). The work of Rode & Lieberman (2005) provides a nice first example of how GIS can be used to address large-scale questions regarding species diversification. I fully expect future macro-evolutionary research, which takes advantage of data such as paleogis, to be quite fruitful.
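
The rate calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code used by Rode & Lieberman (2005): the function names are hypothetical, the sign convention (end minus start) is an assumption, and the correlation is computed here as a plain Pearson coefficient, which may differ from the statistic the authors reported.

```python
import math


def log_rate(n_start, n_end, duration):
    """Difference in ln(species count) across a time period, per unit time.

    Assumption: the difference is taken as end minus start, so a growing
    clade yields a positive rate.
    """
    if n_start <= 0 or n_end <= 0:
        raise ValueError("species counts must be positive")
    if duration <= 0:
        raise ValueError("duration must be positive")
    return (math.log(n_end) - math.log(n_start)) / duration


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

With one median range size and one rate per time period, `pearson_r(median_range_sizes, rates)` would give the kind of range size versus rate comparison described in the text.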

Future directions and challenges

Hybrid Zones

The published literature concerning the integration of GIS and hybrid zone research is sparse. To my knowledge, there have been nine articles that use GIS-based technology to analyse hybrid zones (Kohlmann et al., 1988; Kidd & Ritchie, 2000, 2001; Ritchie et al., 2001; Adams et al., 2003; Jones & Searle, 2003; Swenson & Howard, 2004, 2005; Tauleigne-Gomes & Lefebvre, 2005; Swenson, 2006). Although this number is quite low, I expect the future integration of GIS into hybrid zone research to be productive. One reason for this is the strong tradition of reporting the geographic locations of hybrids in the literature due to interest in the width and location of hybrid zones. Further, museum collections, many of which are rapidly becoming geo-referenced (Graham et al., 2004b; Guralnick et al., 2006), contain natural hybrids that provide long-term records of hybrid zone locations. This information is also occasionally supplemented with long-term survey data that have recorded the location of hybrids for taxa such as birds (USGS, 2004). Thus, there is a large amount of spatial information regarding hybrid and parental locations already available for analyses across multiple taxa. As such, I believe there are multiple unexplored avenues for evolutionary analyses in general and hybrid zone analyses in particular. These avenues have remained unexplored due to a lack of the sophisticated technology necessary to store, visualize and analyse spatial data. Here I will briefly outline how a GIS may be used to overcome these barriers. I will also discuss potential challenges that will face this integration.

The first article to analyse a hybrid zone using a GIS concerned the use of a predictive niche model to better understand the structure and maintenance of a grasshopper hybrid zone (Kohlmann et al., 1988). Despite the ability of predictive niche models to provide a large amount of information concerning hybrid zone structure, this method has only been used once since (Swenson, 2006). The way in which predictive niche models can be useful to future hybrid zone research is through their ability to test the assumptions underlying various theories regarding hybrid zone structure and maintenance. For example, two prominent theories make predictions regarding the link between parental species and their hybrids with the environmental backdrop. A tension zone model of hybrid zone formation suggests that the combined effects of selection against hybrids, parental species dispersal ability and parental densities govern the location and width of a hybrid zone (Barton & Hewitt, 1985). Thus predictive niche models should rarely match the observed distributions of parental species and hybrids, because tension zones are not directly governed by the abiotic environment, but the researcher should be warned that tension zones could potentially be indirectly correlated with the environment through the formation of density troughs (Barton & Hewitt, 1985). A bounded hybrid superiority model of hybrid zone structure predicts that the hybrids should be linked to the environmental backdrop while parental species are competitively excluded from the hybrid zone (Moore, 1977). Specifically, the predicted distribution of the hybrids should not extend beyond the observed range boundaries of the hybrid zone while the predictions of parental species ranges should extend into the observed boundaries of the hybrid zone. 
This is easily tested using a GIS-based niche model, but the quality of the results will depend on whether or not the researcher is analysing the environmental variables that are the most influential in their system.

A second potential avenue of hybrid zone GIS research that I will outline regards the stability of a hybrid zone through time. Traditionally hybrid zones were thought to be ephemeral with divergence between parental genotypes being so great or so little that hybrids would be heavily selected against or parental genotypes would merge back together respectively (e.g. Remington, 1968). The hybrid zone may also float across the landscape if it is governed by selection against hybrids, dispersal ability and densities of the parental species (Barton & Hewitt, 1985). Alternatively, if the hybrids are not unfit intermediates the hybrid zone may persist through time. An example of this would be the bounded hybrid superiority model of Moore (1977) or the cline model of Endler (1977), where the hybrids are linked to an environment intermediate between the environments that the parental species inhabit. Of course these two types of hybrid zones would predict a shift in the hybrid zone location, if the abiotic backdrop shifted through time. The above hypotheses provide alternative expectations regarding the stability of hybrid zone locations. In order to test the stability of a hybrid zone using a GIS, long-term geo-referenced records of parental species and hybrids are required as well as long-term climatic records. Fortunately there are some hybrid zones that have a long history of spatial monitoring and many countries have long-term weather station data. In particular, many avian hybrid zones have been monitored for decades. By using present day hybrid locations to generate a predictive niche model one could predict the location of the hybrid zone in previous environments and then test the predictions with the historical records of the hybrid locations. If the hybrid zone has been stable due to its link to the environment, then the historical hybrid locations should fall within the predictive niche model output. 
Conversely, if the hybrid zone has shifted as expected under a tension zone model (Barton & Hewitt, 1985), then the historical hybrid locations should fall outside the predictive niche model output. A preexisting difficulty that will continue to arise in this research trajectory is the possibility of extremely low rates of dispersal that would make the detection of hybrid zone shifts difficult on shorter time-scales (Barrowclough, 1978, 1980). A second difficulty that will need to be addressed is that previous long-term research has shown that year-to-year climatic variability can cause shifts in the abundance of parental species and hybrids in environmentally determined hybrid zones (Britch et al., 2001). Thus hybrid zone, or genotype frequency, shifts on short temporal scales may be difficult to detect without very detailed climatic and demographic data.
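
The stability test outlined above can be sketched as follows. This is a deliberately simplified, bioclim-style rectangular envelope and not the bioclim software itself: the function names, the percentile limits and the assumption that climate values have already been extracted at each locality (one row per locality, one column per climate variable) are all illustrative choices.

```python
import numpy as np


def fit_envelope(present_climate, lower_pct=5.0, upper_pct=95.0):
    """Per-variable climatic limits from present-day hybrid localities.

    present_climate: array of shape (n_localities, n_variables).
    The percentile cut-offs are an assumption, not a published standard.
    """
    values = np.asarray(present_climate, dtype=float)
    lower = np.percentile(values, lower_pct, axis=0)
    upper = np.percentile(values, upper_pct, axis=0)
    return lower, upper


def inside_envelope(climate, lower, upper):
    """True for each locality whose every variable lies within the limits."""
    values = np.asarray(climate, dtype=float)
    return np.all((values >= lower) & (values <= upper), axis=1)


def fraction_inside(historical_climate, lower, upper):
    """Share of historical hybrid localities predicted by the envelope.

    historical_climate holds the climate of the historical period at the
    historical localities, in the same variable order as the fitted data.
    """
    return float(np.mean(inside_envelope(historical_climate, lower, upper)))
```

Under these assumptions, a fraction near one would be consistent with a hybrid zone that has tracked its environment, whereas a low fraction would be consistent with a zone that has moved independently of it. Any real test would also need a null expectation for how many localities fall inside the envelope by chance.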

The problem of scale in GIS analyses of hybrid zones is not only relevant for the temporal axis, but it is also relevant for the spatial axis. The problem of spatial scale can best be understood with an example from a mosaic hybrid zone (Rand & Harrison, 1989). The problem will arise if a researcher attempts to predict the structure of a hybrid zone using coarse-grained environmental data that may play a secondary role in governing the hybrid zone's location. For example, if the structure of the well-studied Gryllus hybrid zone were to be estimated using coarse-grained temperature and precipitation maps to parameterize a predictive niche model, the output may seemingly provide a relatively decent prediction of the hybrid zone (Fig. 6). Yet, detailed research of this hybrid zone has shown that soil heterogeneity is the best predictor of the location of hybrids (Rand & Harrison, 1989). The soil maps that would be required to predict a mosaic hybrid zone such as the Gryllus hybrid zone would need to be rather fine scaled to generate a more accurate model of the hybrid zone (Fig. 7). Thus, if the researcher does not have at least some minimal knowledge of the ecology of the species, and the hybrids of interest, they may well choose the wrong spatial scale and climatic variables for analysis.

Figure 6.

 An example of the problem of spatial scale in predicting a mosaic hybrid zone with a GIS-based predictive niche model. The top panel displays the result of using coarse scale environmental maps to generate the model that predicted a clinal hybrid zone. The bottom panel displays the result of using fine scale environmental maps to generate a model of the same hybrid zone. The finer scale prediction reveals that the hybrid zone is likely to be a mosaic hybrid zone.

Figure 7.

 Maps of friction surfaces between hypothesized Pleistocene refugia (black points). In each map cooler colours reflect low friction or pathways of little resistance to dispersal out of refugia. Warmer colours reflect high friction or pathways of high resistance to dispersal out of refugia. The top panel represents a Euclidean distance friction surface where friction increased with distance from the refugia. The middle panel is a friction surface based on elevation where high elevations increase friction. The bottom panel is a composite friction surface of the Euclidean distance and elevation surfaces.

Phylogeography

The incorporation of a GIS into the field of phylogeography has largely come in the form of predictive niche modelling. The future for the integration of predictive niche models and phylogeography could be bright due to the increasing awareness of modelling techniques and our ability to generate more sophisticated and detailed estimates of historical climates. An approach that has yet to be thoroughly explored would be to code different haplotypes as ‘species’ in predictive niche models to determine whether abiotic or biotic factors play the predominant role in the distribution of each haplotype. Ecological differentiation between haplotypes would provide niche models that predict haplotype boundaries to occur close to their known boundaries, while no ecological differentiation would predict range boundaries to extend beyond the current position of the phylogeographic break. It would prove interesting to know if there are any general outcomes across taxa due to ecological differentiation.

A further set of techniques that should become more commonplace in the spatial analysis toolbox of phylogeographers is the generation of friction surfaces (also referred to as cost surfaces). A friction surface can be defined as an attribute of the Earth’s surface that slows the distribution of an organism (DeMers, 2002). Broadly defined, a friction surface could be the Euclidean distance between two points on a map or the change in elevation, temperature, precipitation etc. between two points on a map. An example of a Euclidean distance friction surface and an elevation friction surface between hypothesized North American Pleistocene refugia is shown in Fig. 7. In the North American example given here, regions of phylogeographic break clustering such as the south-eastern United States may be best defined by a Euclidean distance friction surface suggesting that the clustering of terrestrial phylogeographic breaks in this location is primarily the product of it being the geographic mid-point between two refugia. Conversely, if a study were to find a large number of phylogeographic breaks in the Rocky Mountains, the elevation friction surface would serve as a better predictor of clustering. Alternative friction surfaces, which utilize climatic variables could, and should, be generated in such a study to more fully understand the mechanisms controlling for phylogeographic concordance. Combined, the above two techniques of predictive niche models and friction surface generation may provide very useful and novel information regarding the historical and present day factors influencing the positioning of phylogeographic breaks and past refugia.
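
A minimal sketch of the three kinds of friction surface discussed above (Euclidean distance, elevation and a composite of the two) is given below. It assumes a regular grid in which refugia are given as (row, column) cell indices and elevation is a raster of the same shape. The function names, the rescaling of each surface to the range 0 to 1 and the equal default weighting of the composite are assumptions made for illustration. Note that this produces only the friction layers. A full analysis would normally accumulate cost along least-cost paths inside a GIS.

```python
import numpy as np


def rescale(surface):
    """Rescale a raster to the range 0 (least friction) to 1 (most friction)."""
    s = np.asarray(surface, dtype=float)
    span = s.max() - s.min()
    if span == 0:
        return np.zeros_like(s)
    return (s - s.min()) / span


def distance_friction(shape, refugia):
    """Friction that increases with distance (in cells) to the nearest refugium."""
    rows, cols = np.indices(shape)
    dists = [np.hypot(rows - r, cols - c) for r, c in refugia]
    return rescale(np.min(dists, axis=0))


def elevation_friction(elevation):
    """Friction that increases with elevation."""
    return rescale(elevation)


def composite_friction(surfaces, weights=None):
    """Weighted mean of several friction surfaces (equal weights by default)."""
    stack = np.stack([rescale(s) for s in surfaces])
    return np.average(stack, axis=0, weights=weights)
```

For two refugia at opposite ends of a grid, `distance_friction` peaks near the geographic mid-point between them, which is the pattern invoked above for the south-eastern United States, while `elevation_friction` peaks along mountain ranges regardless of where the refugia lie.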

Although the usefulness of GIS-based mapping of multiple phylogeographic breaks and predictive niche modelling is clear, there are two major methodological challenges that will need to be addressed in the near future. The first concerns predictive niche modelling. The models assume that the sample points adequately sample and estimate the entire multi-dimensional realized niche of the species, but we do not know the true fundamental niche (Hutchinson, 1957). This is problematic for studies that estimate present day ranges, but it is even more problematic when estimating the ancestral distributions, if the historical and biotic forces that presently restrict ranges were absent. The consequences of this type of scenario would be that predictive niche models would project a species to have an overly restricted ancestral range. This would lead the researcher to erroneously conclude that there was historically less overlap between the ranges of populations or species. Thus caution in the interpretation of niche models is required and experiments that test the physiological tolerances of the study species in the laboratory would prove useful in supporting model predictions.

A second methodological challenge for the integration of GIS and phylogeography will be the generation of appropriate null models to test spatial hypotheses regarding phylogeographic concordance. Spatially explicit null models that generate multiple random distributions of phylogeographic breaks will be required. As mentioned above, Soltis et al. (2006) have taken the first steps towards solving this problem by randomizing the location of phylogeographic breaks in their study region to generate a null distribution. An alternative to this approach will be to incorporate algorithms that use cellular automata. This would be done by randomly placing a point or kernel into the study region and allowing it to ‘grow’ at random until it achieved the same area as the observed phylogeographic break. This process would then be carried out for all phylogeographic breaks in the study. The maps would then be summed giving a single map that predicts the number of breaks in each grid cell. This would then be iterated multiple times giving multiple ‘random’ maps of phylogeographic break clustering. This collection of ‘random’ maps would then serve as the null distribution to which the observed map could be compared. Cellular automata algorithms such as the one described here, that are compatible with GIS file formats, currently exist and are beginning to be used to test various biogeographic hypotheses regarding range sizes and latitudinal gradients in species diversity (Jetz & Rahbek, 2001). These methods should be somewhat easily adaptable to the field of phylogeography and would make quantitative assessments of phylogeographic concordance much more rigorous and are therefore strongly encouraged.
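
The cellular automaton procedure described above can be sketched as follows. This is one possible reading of the description rather than an existing published implementation: it assumes a rectangular study region divided into grid cells, break areas expressed as numbers of cells, growth into the four edge-sharing neighbours only, and hypothetical function names throughout.

```python
import numpy as np

# Assumption: a patch grows only into cells that share an edge with it.
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def grow_patch(shape, area, rng):
    """Grow one contiguous random patch of `area` cells from a random seed cell."""
    n_rows, n_cols = shape
    if not 0 < area <= n_rows * n_cols:
        raise ValueError("area must be between 1 and the number of grid cells")
    occupied = np.zeros(shape, dtype=bool)
    # The frontier holds candidate cells; it starts with the random seed cell.
    frontier = [(int(rng.integers(n_rows)), int(rng.integers(n_cols)))]
    filled = 0
    while filled < area:
        # Draw a random candidate and remove it from the frontier.
        i = int(rng.integers(len(frontier)))
        frontier[i], frontier[-1] = frontier[-1], frontier[i]
        r, c = frontier.pop()
        if occupied[r, c]:
            continue  # duplicate candidate that was already filled
        occupied[r, c] = True
        filled += 1
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < n_rows and 0 <= nc < n_cols and not occupied[nr, nc]:
                frontier.append((nr, nc))
    return occupied


def null_maps(shape, break_areas, n_iter, seed=None):
    """Stack of `n_iter` random maps, each counting simulated breaks per cell."""
    rng = np.random.default_rng(seed)
    maps = np.zeros((n_iter, *shape), dtype=int)
    for i in range(n_iter):
        for area in break_areas:
            maps[i] += grow_patch(shape, area, rng)
    return maps
```

Given an observed map of break counts with the same shape, the stack returned by `null_maps` could then be used, for example, to ask in what fraction of random maps a cell contains at least as many breaks as observed. An irregular study region would require masking out cells that fall outside it, which this sketch does not do.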

Speciation

The study of speciation has often involved the relative importance of allopatry vs. sympatry. Allopatric speciation has traditionally enjoyed more attention, but recent evidence of sympatric divergence due to mechanisms such as host-shifting (e.g. Bush, 1969, 1975) and ecological divergence (e.g. Schluter, 1998) has spawned many research programmes that aim to investigate sympatric speciation. All allopatric models and some models of sympatric speciation contain information and predictions regarding the spatial distribution of sister taxa, hosts, or environments (Coyne & Orr, 2004). It is here where a GIS holds the greatest potential to assist in a speciation research programme. Below, I will briefly outline potential avenues of speciation research that involve a GIS with a special focus on allopatric speciation.

Allopatric models of speciation generate multiple predictions regarding the spatial distribution of species and their range boundaries. For example, Coyne & Orr (2004) outline six lines of geographic evidence of allopatric speciation. One such line of evidence is the geographic clustering of hybrid zones (i.e. suture zones sensu stricto). Previous studies have quantified the amount of spatial clustering of hybrid zones in North America (Swenson & Howard, 2004, 2005; Swenson, 2006), yet regions such as Europe where hybrid zone clustering is likely (Hewitt, 1996, 1999, 2000, 2001; Comes & Kadereit, 1998; Taberlet et al., 1998) and regions where Pleistocene refugia and hybrid zone clustering are less likely (Endler, 1982; Whinnett et al., 2005) have yet to be analysed using a GIS.

Other lines of evidence that would support an allopatric model of speciation involve quantifying the distribution of species range boundaries. One example is the clustering of species range borders along geographic or climatic barriers (Coyne & Orr, 2004). Presently, there are large data sets of digital range maps that are in GIS file formats. These include the mammals and birds of the New World (http://www.natureserve.org), a majority of the North American tree species (http://www.usgs.gov), and the amphibians of the world (http://www.natureserve.org). Each of these data sets has been generated using preexisting hand drawn maps, museum specimens and expert evaluation making them an invaluable resource to conservationists, ecologists and evolutionary biologists. Through the use of such digital maps, it is possible to generate linear representations of the borders of polygons that represent species ranges for thousands of species. The concordance of these borders with digital maps of topography and climate can then be quantified all inside a GIS. In Fig. 8, I present an example of all the mammalian range boundaries in South America where the density of boundaries, the concordance of boundaries with abiotic variables, and the relatedness of concordant boundaries can be quantified and subjected to null modelling analyses inside a GIS (M.D. Weiser & N.G. Swenson, unpublished data). The major challenge in mapping range boundaries will be to determine the spatial resolution at which the range maps are reliable as well as the spatial resolution at which physiographic and climatic maps are reliable (e.g. Qian & Ricklefs, 2004). For example, a large enough grid cell could potentially cover the entire Andean elevational gradient. This would bin multiple range boundaries into one grid cell when in actuality they are spread across that gradient and cluster on much finer spatial scales. 
Thus care must be taken in future studies to understand and justify the choice of the spatial scale used in a GIS-based evolutionary analysis.
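
The effect of grid cell size on range boundary counts can be illustrated with a short sketch. It assumes that each species range has already been rasterized to a boolean grid of equal shape (True inside the range). The function names are hypothetical, and cells beyond the edge of the grid are treated as lying outside every range, so ranges that touch the map edge gain an artificial boundary there.

```python
import numpy as np


def boundary_cells(range_mask):
    """Cells inside a range that have at least one edge-neighbour outside it."""
    mask = np.asarray(range_mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    # A cell is interior when all four edge-sharing neighbours are in the range.
    interior = (
        padded[:-2, 1:-1] & padded[2:, 1:-1]
        & padded[1:-1, :-2] & padded[1:-1, 2:]
    )
    return mask & ~interior


def boundary_density(range_masks):
    """Number of species range boundaries passing through each grid cell."""
    return sum(boundary_cells(m).astype(int) for m in range_masks)


def coarsen(density, factor):
    """Sum boundary counts into larger cells, each `factor` x `factor` fine cells."""
    density = np.asarray(density)
    rows, cols = density.shape
    trimmed = density[: rows - rows % factor, : cols - cols % factor]
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.sum(axis=(1, 3))
```

Comparing `boundary_density(masks)` with `coarsen(boundary_density(masks), factor)` for increasing values of `factor` shows how boundaries that are spread along a steep gradient at fine resolution collapse into a single apparent cluster once the cells become large.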

Figure 8.

 A map of the geographic range boundaries of all of the mammals that have at least part of their range in South America. The map was generated inside a GIS using freely available digital range maps (http://www.natureserve.org). Separate colours represent separate range boundaries.

Conclusions

Geography has played a major role in generating evolutionary hypotheses in general and maps have often been used to support these hypotheses in particular. Initially, evolutionary biologists were constrained to qualitative assessments of geographic patterns due to a general lack of the sophisticated technology and analytical tools necessary to store and analyse large spatial data sets. Over the last two decades, the tools and technology necessary have become widely available to the public in the form of a GIS. As a result, analyses of geographic patterns are no longer constrained and should no longer be performed qualitatively. Here I have reviewed some of the recent articles in the hybrid zone, phylogeographic and speciation literature that have taken the first steps towards quantitative analyses of geographic patterns and spatial modelling using a GIS. These studies therefore represent the first steps away from qualitative assessments of maps. I have also proposed potential avenues of future GIS-based research in these fields while highlighting pervasive conceptual issues that will challenge us along the way. It is my hope that this article has given evolutionary biologists the necessary background in GIS to inspire them to incorporate GIS into their future research and to take an active part in the rapidly accelerating integration of GIS into hybrid zone, phylogeographic and speciation research.

Acknowledgments

I am extremely grateful for receiving generous funding from the following sources to support my GIS research: the Geospatial Information and Technology Association, Sigma Xi, NSF-EPSCor with Los Alamos National Laboratory, the University of Arizona Institute for the Study of Planet Earth, and the United States Geological Survey Geographic Analysis and Monitoring Program.

I would especially like to thank Dan Howard for having the foresight to encourage me to integrate GIS into my research. I would also like to thank Mike DeMers for introducing me to GIS and for his excitement for exploring the usefulness of this technology in evolutionary biology. I would especially like to thank Jason Pither for extended discussions regarding how a fundamental – realized niche mismatch may be problematic in niche modelling. I thank Mike Weiser and Jason Pither for useful discussions that helped shape my thoughts on the future of GIS in evolutionary biology. Thanks to Doug Soltis and Jason McLachlan for discussions regarding the spatial null models used in their research. This manuscript benefited from comments provided by anonymous reviewers.
