Reconstructing ancestral ranges in historical biogeography: properties and prospects



Abstract

Recent years have witnessed a proliferation of quantitative methods for biogeographic inference. In particular, novel parametric approaches offer exciting new opportunities for the study of range evolution. Here, we review a selection of current methods for biogeographic analysis and discuss their respective properties. These methods include generalized parsimony approaches, weighted ancestral area analysis, dispersal–vicariance analysis, the dispersal–extinction–cladogenesis model and other maximum likelihood approaches, and Bayesian stochastic mapping of ancestral ranges, including a novel approach to inferring range evolution in the context of island biogeography. Some of these methods were developed specifically for problems of ancestral range reconstruction, whereas others were designed for more general problems of character state reconstruction and subsequently applied to the study of ancestral ranges. Methods for reconstructing ancestral history on a phylogenetic tree differ not only in the types of ancestral range states that are allowed, but also in the historical events that may change ancestral ranges. We explore how both the form of allowed ancestral ranges and the allowed transitions can affect the outcome of ancestral range estimation. Finally, we mention some promising avenues for future work in the development of model-based approaches to biogeographic analysis.

Biogeography is the study of the distribution of organisms in space and time (Wiley, 1981), a simple definition that belies the breadth and complexity of the field. Foundational efforts to describe and explain the observed relationships among geology, geography and biology appear in 18th-century scientific works, and a close relationship between biogeography and speciation processes was recognized concomitant with the articulation of evolution through natural selection. The advent of phylogenetic methods, plate tectonic theory and molecular systematics motivated corresponding advances in the field of biogeography, leading to the development of quantitative methods for biogeographic analysis (Lomolino et al., 2004). In recent years, there has been growing interest in model-based parametric approaches to biogeographic inference. These approaches are at an early stage, yet they represent exciting new possibilities for quantitative analysis in biogeography. Here, we summarize several methods that are in current use and explore their respective properties. We hope that this review will be instructive, but we do not intend it to be exhaustive.

In particular, we do not address the topic of phylogeography, which is a branch of biogeography that explicitly incorporates genealogical information and focuses on populations of a single species or of a few closely related species (Avise, 2000, 2009). However, we note that advances in coalescent-based population genetic models have led to the development of novel statistical methods in phylogeographic analysis (Knowles & Maddison, 2002; Templeton, 2004; Knowles, 2009). Phylogeographic approaches may be used to explore patterns of species distribution and aspects of range evolution (e.g., Weaver et al., 2006; Lemmon & Lemmon, 2008; Pearman et al., 2008). These approaches become indispensable at scales where genealogical information below the species level is important; moreover, better understanding of recent range dynamics could be used to inform models seeking to address events in the more distant past.

The link between range evolution and speciation has long been recognized. Allopatric speciation is the phenomenon by which populations are physically isolated by an extrinsic barrier and subsequently evolve intrinsic reproductive isolation. This is in contrast to sympatric speciation, which is the genetic divergence of a single parent species into daughter populations inhabiting an identical geographic region, so that the daughter populations achieve intrinsic reproductive isolation and speciate in the absence of an extrinsic physical obstacle. Allopatric speciation is generally agreed to be the most common mechanism by which new species arise, whereas the relative importance of sympatric speciation is debated (Platnick & Nelson, 1978).

Allopatry may be achieved when a species is subdivided into two disjunct populations by some event that splits the parent range into two distinct daughter ranges. Speciation then occurs as the populations confined to each daughter range achieve intrinsic reproductive isolation and can no longer interbreed even if they are brought back into contact with one another. If the two daughter populations are large, adaptive evolution may be the primary mechanism by which reproductive isolation is achieved (Hartl & Clark, 2007). The situation where a barrier arises and subdivides the range of a widespread species into two disjoint ranges is generally termed vicariance.

Alternatively, allopatry may be accomplished when a number of individuals undergo dispersal across a pre-existing barrier and colonize a new, isolated area. Here, the dispersing population is likely to be small, and if the area to which it disperses is unlikely to receive new migrants, gene flow is expected to be low, and speciation is expected to occur rapidly. We follow the example of Clark et al. (2008) and refer to this scenario as dispersal-mediated allopatric speciation.

In this paper, we follow the convention of referring to the entire geographic distribution of a taxon as that taxon's range. Areas are geographic units defined for the purpose of analysis, and a range may comprise one or more areas, depending on the assumptions of the method of biogeographic inference being applied. These assumptions are discussed in more detail below.

Multiple explicit optimization techniques have been proposed to infer range evolution in biogeographic analysis. Some of these approaches were developed specifically for problems of ancestral range reconstruction, whereas others were designed for more general problems of character state reconstruction. These methods have varying assumptions, and comparative studies reveal that substantially different results may be obtained by applying different methods to the same dataset (e.g., Clark et al., 2008; Drummond, 2008).

When evaluating the performance of ancestral range reconstruction methods applied to empirical systems, the actual sequence of events leading to the observed ranges cannot be known. Therefore, results must be interpreted in the context of expectations regarding ancestral ranges. In general, it is expected that ancestral ranges should be similar in spatial extent to those of living species (Bremer, 1992; Hausdorf, 1998; Clark et al., 2008). Moreover, lineages that are endemic to a particular area and that arose through a single incidence of dispersal-mediated allopatric speciation cannot have arisen before that area was available to colonize (Price & Clague, 2002). It is desirable for a method to indicate how likely each area is to belong to the ancestral range, assigning negligible support to areas that could not have been occupied.

1 Overview of methods

1.1 Generalized parsimony approaches

Under a generalized parsimony approach, the reconstruction that requires the fewest changes in character state over the phylogenetic tree is preferred (Maddison & Maddison, 2003). Generalized parsimony assigns a cost for the transformation of any character state into each of the other possible character states. All costs may be equal, or, if there is reason to expect that transitions between some states might occur more frequently than changes between others, character state transitions may be differentially weighted.
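Generalized parsimony with a state-to-state cost matrix is commonly computed with Sankoff's dynamic-programming algorithm. The minimal sketch below assumes a rooted tree written as nested tuples whose leaves are the observed area labels; the tree, the areas and the cost values are hypothetical. Unequal costs would be expressed simply by changing entries of the matrix.

```python
import math


def sankoff(tree, states, cost):
    """Minimal total cost of the subtree for each possible state at its root.

    `tree` is a leaf label (the observed area) or a tuple of subtrees;
    `cost[i][j]` is the cost of changing from state i to state j.
    """
    if isinstance(tree, str):
        # A leaf costs nothing in its observed state and is impossible otherwise.
        return {s: (0.0 if s == tree else math.inf) for s in states}
    children = [sankoff(child, states, cost) for child in tree]
    # For each parent state, each child independently picks its cheapest state.
    return {
        s: sum(min(cost[s][t] + child[t] for t in states) for child in children)
        for s in states
    }


areas = ["A", "B"]
equal_costs = {"A": {"A": 0, "B": 1}, "B": {"A": 1, "B": 0}}
root = sankoff((("A", "A"), ("A", "B")), areas, equal_costs)
print(root)  # {'A': 1.0, 'B': 2.0}: area A at the root needs only one change
```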

Geographic ranges may be coded as multiple binary-state characters to denote the presence/absence of a taxon in each area. This allows the reconstruction of ancestral areas that are polymorphic with respect to range (Hardy & Linder, 2005; Harbaugh & Baldwin, 2007). However, geographic ranges are more often coded as discrete, multistate characters that do not allow for ranges spanning more than one area (e.g., Clark et al., 2008). When the directional costs of transitions between such states are weighted equally, no assumptions are made about whether change occurs along a branch connecting an ancestor to its descendant, or coincident with a speciation event at an internal node. If change is presumed to occur along a branch, dispersal to a new area followed by extinction in the original area is implied. In contrast, dispersal-mediated allopatric speciation is suggested if the change is presumed to occur at an internal node (see Fig. 1).
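To make the two coding schemes concrete, here is a small sketch; the taxa, areas and function names are hypothetical. The binary scheme can represent a taxon found in both areas, whereas a single multistate character has no state for it.

```python
# Hypothetical observed ranges; taxon names and areas are illustrative only.
ranges = {"t1": {"A"}, "t2": {"B"}, "t3": {"A", "B"}}
areas = ["A", "B"]


def binary_coding(ranges, areas):
    """One presence/absence (1/0) character per area for each taxon."""
    return {taxon: [int(a in r) for a in areas] for taxon, r in ranges.items()}


def multistate_coding(ranges):
    """One multistate character whose state is a single area.

    Raises ValueError for taxa whose range spans more than one area,
    because this coding has no state for them.
    """
    coded = {}
    for taxon, r in ranges.items():
        if len(r) != 1:
            raise ValueError(f"{taxon} occupies {sorted(r)}; not representable")
        coded[taxon] = next(iter(r))
    return coded


matrix = binary_coding(ranges, areas)
print(matrix)  # {'t1': [1, 0], 't2': [0, 1], 't3': [1, 1]}
```

Calling `multistate_coding(ranges)` on the same data raises an error for `t3`, which is exactly the limitation described in the text.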

Figure 1.

A hypothetical example of a parsimony approach to ancestral area reconstruction in a two-island system comprising areas A and B. In this example, geographic ranges are coded as discrete, multistate characters that do not allow for ranges spanning more than one area, and the directional costs of transitions between states are weighted equally. Hence, no assumptions are made about whether change occurs along a branch connecting an ancestor to its descendant, or coincident with a speciation event at an internal node. A, Change is presumed to occur along branches. This implies (i) a speciation event within the original area A and subsequent dispersal of the novel species to a new area B. This is followed by (ii) an extinction of the novel species in area A. B, In contrast, dispersal-mediated allopatric speciation is suggested if the change is presumed to occur at an internal node. Here, some individuals from area A undergo dispersal to area B, and speciation occurs shortly thereafter. Dagger, extinction event; dot, species; open arrow, dispersal event.

Such approaches have been criticized for their inability to indicate the probability of estimated ancestral states, as well as for failing to incorporate branch length information (Cunningham et al., 1998; Nielsen, 2002; Huelsenbeck et al., 2003; Nepokroeff et al., 2003). Because branch lengths are ignored, change is underestimated when it occurs frequently relative to speciation.

When range evolution scenarios are complicated, involving frequent dispersal and extinction relative to speciation and including a large set of areas distributed in a complex pattern across the tips of the tree, these parsimony approaches tend to recover a large number of equally most parsimonious reconstructions. Hence, there may be low potential for these methods to detect phylogenetic signal in datasets with these properties (Nepokroeff et al., 2003; Clark et al., 2008). However, when few geographic areas are considered, the distribution of these areas across the tips of the cladograms is not complex, and there is little reason to be concerned with underestimating the amount of change, these approaches may be of interest (e.g., Ksepka & Clarke, 2009). As indicated by Ree and Smith (2008), high rates of evolution pose a challenge not only for parsimony-based approaches but for all types of inference.

1.2 Weighted ancestral area analysis (WAAA)

WAAA (Hausdorf, 1998) is an interesting but conceptually flawed method for reconstructing ancestral areas under parsimony. We address it here because it is not infrequently seen in published reports (e.g., Swenson et al., 2000; Outlaw et al., 2007). WAAA seeks to assign higher weight to areas occupied by “basal” or “early-diverging” lineages, but, as we discuss below, its implementation is predicated on a problematic interpretation of tree topology.

In WAAA, the ranges of extant species and their ancestors are described in terms of a set of predefined unit areas. A worked example is presented in Fig. 2. A ratio of weighted gain steps to weighted loss steps is calculated for each area at each node on a cladogram. Areas for which the value of this ratio is high are considered more likely to be part of the ancestral area than are those for which this value is lower. Ancestral ranges might therefore comprise multiple areas. To reduce the number of areas recovered as ancestral for each node, a minimum threshold value for this ratio may be established.

Figure 2.

A worked example of weighted ancestral area analysis in a two-area system comprising areas A and B, after Hausdorf (1998). A, Each area is optimized onto the tree under the assumption that it was not part of the ancestral range. Changes are presumed to occur along branches. B, For each area, the weighted gain steps (GSW) are computed. Count the number of times $n$ that the area is gained under the optimization from A. For each gain $i$, count the number of nodes $x_i$ between that gain and the common ancestor. Then $\mathrm{GSW} = \sum_{i=1}^{n} 1/x_i$. C, Each area is again optimized onto the tree, this time under the assumption that it was part of the ancestral range. D, For each area, the weighted loss steps (LSW) are computed as in B, counting the losses under the optimization from C. E, The probability index (PI) is calculated for each area as $\mathrm{PI} = \mathrm{GSW}/\mathrm{LSW}$. In this example, the PI for area A is an order of magnitude greater than the PI for area B. This is interpreted to mean that area A is much more likely than area B to be part of the ancestral range. We emphasize that this approach relies on a problematic interpretation of tree structure.
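The arithmetic of the index in Figure 2 can be sketched as follows. This reproduces only the calculation, not the two parsimony optimizations that produce the node counts, and it does not endorse the interpretation criticized below. The node counts are hypothetical and are assumed to be at least 1.

```python
def weighted_steps(node_counts):
    """Sum of 1/x_i over all steps; steps nearer the common ancestor weigh more."""
    return sum(1.0 / x for x in node_counts)


def probability_index(gain_node_counts, loss_node_counts):
    """PI = GSW / LSW for a single area.

    gain_node_counts: x_i for each gain under the 'area absent at the root' optimization.
    loss_node_counts: x_i for each loss under the 'area present at the root' optimization.
    """
    return weighted_steps(gain_node_counts) / weighted_steps(loss_node_counts)


# Hypothetical node counts for two areas.
pi_a = probability_index(gain_node_counts=[1, 2], loss_node_counts=[3])
pi_b = probability_index(gain_node_counts=[3], loss_node_counts=[1, 2])
print(pi_a, pi_b)  # roughly 4.5 and 0.222
```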

This approach attempts to indicate the relative chances of different areas belonging to the ancestral range of some monophyletic group, while also yielding ancestral ranges that are about the same size as those observed for extant taxa. It relies on the assumption that an area is more likely to be part of the ancestral range when that area is observed to be occupied more frequently by the descendants of that ancestor, and that “basal” branches of the tree are more informative about ancestral ranges (Hausdorf, 1998).

However, it appears that the latter statement contains a misperception regarding tree structure, predicated on a problematic understanding of what it means to be “basal.” We attempt to clarify this point in Figs. 3 and 4. When a lineage has fewer extant taxa than its sister group, it is not correct to consider the species-poor lineage to be “basal” relative to its species-rich sister, because both groups arise simultaneously from their most recent common ancestor, as depicted in Fig. 4 (Crisp & Cook, 2005). Because WAAA does not consider branch length information, the method appears to rely on the troubling assumption that a trait found in the species-poor sister clade is more likely to represent a primitive condition present in the common ancestor of both groups (Santos, 2007). In fact, Hausdorf concludes his 1998 paper by stating, “If this assumption is wrong, then the results of the weighted ancestral area analysis will be wrong too.”

Figure 3.

A, “Basal” and “derived” as directional terms on a rooted tree. Relative to a reference node (here denoted by a star), if an internal node can be reached by traveling along a one-directional path toward the tips of the tree, then that node is derived. If an internal node can be reached by traveling along a one-directional path toward the root of the tree, then it is basal. If an internal node cannot be reached by a one-directional path, then it can be considered neither basal nor derived relative to the reference node. B, Here, we attempt to clarify what is meant when the term “basal clade” is used. Clade A contains species A1 and A2, depicted in dark grey. Clade B contains species B1–B6, depicted in light grey. The basal divergence between clades A and B is represented by a dark grey dot, and clade A is the less speciose of the two clades subtended by the basal divergence. If A is referred to as the “basal clade,” it should be emphasized that a species in clade A is no more likely to represent a primitive condition present in the common ancestor of clades A and B than is a species belonging to clade B.

Figure 4.

It can be misleading to describe a species-poor clade as “basal” relative to its species-rich sister, because tree balance need not remain constant through time. A, A hypothetical evolutionary tree comprising clade A, depicted in dark grey, and clade B, depicted in light grey. Here, the letters A and B denote members of the respective lineages A and B. B, At time t0, clades A and B originate at a speciation event. Note that both A and B are the same age. C, At time t1, there are the same number of species in clade A as there are in clade B. When only the species that are extant at time t1 are depicted, the tree is balanced; that is, there are the same number of species on each side of every node. D, At time t2, clade A contains one species and clade B contains four species. When only the species that are extant at time t2 are depicted, the tree is not balanced. This scenario should not be interpreted to imply a progression from “basal” on the left to “derived” on the right. E, At time t3, clade A contains four species and clade B contains one species. Compared to time t2, the tree balance is now reversed.

1.3 Dispersal–vicariance analysis (DIVA)

DIVA (Ronquist, 1996, 1997) seeks to model processes of range evolution in a parsimony framework. The ranges of extant species and their ancestors are described in terms of a set of predefined unit areas. Each range might comprise multiple areas, and ancestral ranges are estimated by minimizing the tree length under the specified cost matrices.

In DIVA, dispersal events (Fig. 5: A) incur a cost of one for each area added to a range, and extinction events (Fig. 5: B) cost one for each area deleted from a range. Speciation is assumed to occur in one of two ways. In the first, vicariance may separate a wide ancestral range into exactly two mutually exclusive sets of areas, each of which is inherited by one of the two daughter taxa, as depicted in Fig. 5: C. In the second way, if an ancestral range is restricted to a single area, speciation within the area may give rise to exactly two daughter taxa sharing that same area, shown in Fig. 5: D. (Note that this latter scenario does not explicitly differentiate between sympatric speciation and allopatric speciation occurring within subsections of the area.) In both cases, speciation events cost nothing. DIVA does not include a mechanism by which two daughter species may identically inherit an ancestral range comprising multiple unit areas; instead, secondary dispersal events are required to explain this scenario, as shown in Fig. 5: E.
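The cost rules just described can be sketched for a single speciation node and its two daughter branches. This is a minimal illustration, not DIVA itself: it scores one given ancestral range, whereas DIVA optimizes ancestral ranges over the whole tree. Function names and area labels are hypothetical.

```python
from itertools import combinations


def vicariance_splits(ancestor):
    """Daughter-range pairs that DIVA allows at no cost when a lineage speciates."""
    areas = sorted(ancestor)
    whole = frozenset(areas)
    if len(areas) == 1:
        # Speciation within a single area: both daughters inherit that area.
        return [(whole, whole)]
    # Vicariance: every split into two non-empty, mutually exclusive sets of areas.
    return [
        (frozenset(part), whole - frozenset(part))
        for k in range(1, len(areas))
        for part in combinations(areas, k)
    ]


def branch_cost(start, end):
    """One unit per area added (dispersal) plus one per area deleted (extinction)."""
    return len(set(end) - set(start)) + len(set(start) - set(end))


def diva_node_cost(ancestor, left_observed, right_observed):
    """Cheapest way to reach two daughter ranges from one ancestral range."""
    return min(
        branch_cost(left, left_observed) + branch_cost(right, right_observed)
        for left, right in vicariance_splits(ancestor)
    )


# As in Fig. 5: E, ancestor AB with both descendants AB requires two secondary dispersals.
print(diva_node_cost({"A", "B"}, {"A", "B"}, {"A", "B"}))  # 2
```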

Figure 5.

The rules by which dispersal–vicariance analysis reconstructs ancestral distributions. A, Dispersal costs one per area added to a distribution. B, Extinction costs one per area deleted from a distribution. C, Where speciation occurs by vicariance separating a widespread ancestor into two mutually exclusive sets of areas, a cost of zero is incurred. D, A species occurring within a single area might speciate within that area, giving rise to two descendants occupying the same area. This event also has a cost of zero. E, When an ancestral species has a range comprising more than one unit area, and each of the two descendant species has the same distribution as the ancestor, the cost is equivalent to the number of secondary dispersals needed for two initially allopatric descendants to come to occupy the same set of unit areas as the ancestor. In this example, a cost of two is incurred. Letters A–D refer to areas. Dagger, extinction event; dot, species; open arrow, dispersal event.

Because the optimal ancestral ranges are those that minimize the number of implied dispersal and extinction events, DIVA is biased against early dispersal and will tend to reconstruct wide ancestral ranges (Ronquist, 1996; Ree et al., 2005). In the manual for DIVA 1.1 (Ronquist, 1996), it is suggested that users who wish to obtain ancestral range estimates similar in size to the ranges observed for extant species should include additional outgroups in the analysis, so that the ancestral node of interest is no longer at, or as close to, the root. Alternatively, it is possible to restrict the maximum number of areas allowed for the ancestral range. This latter approach is more often seen in practice; however, restricting the maximum number of areas allowed for the ancestral range could lead to unrealistic discontinuous geographic distributions for basal nodes (Clark et al., 2008; Santos et al., 2009) or a large number of equally parsimonious reconstructions that cannot be meaningfully interpreted (Santos et al., 2009).

1.4 Dispersal–extinction–cladogenesis (DEC) model

The DEC model (Ree et al., 2005; Ree & Smith, 2008) is a continuous-time model for geographic range evolution that enables the inference of ancestral ranges in a likelihood framework. Ranges are described in terms of a set of predefined unit areas, and each range might include multiple areas. Range expansion and contraction events are caused by dispersal into a previously unoccupied area and by local extinction within an area, shown in Fig. 6: A and Fig. 6: B, respectively. These events are modeled as stochastic processes governed by rate parameters, so that waiting times between events are exponentially distributed. The expected number of each kind of event along a phylogenetic branch is proportional to branch length.
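The anagenetic part of such a model can be written as a rate matrix Q over all non-empty ranges; transition probabilities along a branch of length t would then be the matrix exponential of Qt, which is why expected event counts scale with branch length. The sketch below assumes a deliberately simple parameterization: an empty area is colonized at rate d times the number of occupied areas (each occupied area is a possible source), each occupied area is lost at rate e, and the empty range is excluded. It is not a reimplementation of any published DEC software.

```python
from itertools import combinations


def dec_ranges(areas):
    """All non-empty ranges (subsets of the unit areas), smallest first."""
    return [frozenset(c) for k in range(1, len(areas) + 1)
            for c in combinations(areas, k)]


def dec_rate_matrix(areas, d, e):
    """Rate matrix for range change along a branch (simplified DEC-style sketch)."""
    ranges = dec_ranges(areas)
    index = {r: n for n, r in enumerate(ranges)}
    q = [[0.0] * len(ranges) for _ in ranges]
    for r in ranges:
        row = q[index[r]]
        # Dispersal: any occupied area may send colonists to an empty area j.
        for j in set(areas) - r:
            row[index[r | {j}]] = d * len(r)
        # Local extinction of any one occupied area (the empty range is excluded).
        if len(r) > 1:
            for i in r:
                row[index[r - {i}]] = e
        # Diagonal makes each row sum to zero, as required for a rate matrix.
        row[index[r]] = -sum(row)
    return ranges, q


ranges, q = dec_rate_matrix(["A", "B"], d=0.1, e=0.05)
# ranges is [A, B, AB]; q[0][2] is the rate A -> AB and q[2][0] the rate AB -> A.
```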

Figure 6.

The area transitions allowed by the dispersal–extinction–cladogenesis model when reconstructing ancestral distributions. A, Dispersal from an occupied area A into a previously unoccupied area B. B, Extinction in previously occupied area B. C, A species occurring within a single area might speciate within that area, giving rise to two descendants occupying the same area. D, Where the range of the ancestral species comprises multiple areas A and B, one of the daughter species may inherit a single area A, and the other inherits the remainder of the ancestral range B. E, Alternatively, where the range of the ancestral species comprises multiple areas A and B, one daughter species may inherit a single area A, while the other daughter species identically inherits the ancestral range AB. F, Secondary dispersals are required to explain the situation where an ancestral species has a range comprising more than one unit area and each of the two descendant species has the same distribution as the ancestor. G, Similarly, a widespread ancestral range comprising more than three unit areas cannot be separated into two mutually exclusive sets of multiple areas in a single step. Letters A–D refer to areas. Dagger, extinction event; dot, species; open arrow, dispersal event.

If an ancestral range is limited to a single area, then that area is inherited identically by both daughter lineages, as depicted in Fig. 6: C. If the ancestral range comprises multiple areas, then one of two scenarios is permitted. In the first, shown in Fig. 6: D, one daughter lineage inherits a single area and the other daughter lineage inherits the remainder of the ancestral range. In the second scenario, shown in Fig. 6: E, one daughter lineage inherits a single area while the other daughter lineage inherits the entire ancestral range. These scenarios of range subdivision and inheritance are inferred for the internal nodes of the phylogenetic tree.
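The inheritance scenarios above can be enumerated directly. In this sketch each ordered (left daughter, right daughter) pair is one scenario; giving every allowed scenario equal weight is our assumption for illustration, and the function names are hypothetical.

```python
def dec_splits(ancestor):
    """Ordered (left, right) daughter-range pairs allowed at a DEC speciation node."""
    r = frozenset(ancestor)
    if len(r) == 1:
        return {(r, r)}  # Fig. 6: C, both daughters inherit the single area
    splits = set()
    for a in r:
        single = frozenset({a})
        rest = r - single
        splits |= {(single, rest), (rest, single)}  # Fig. 6: D
        splits |= {(single, r), (r, single)}        # Fig. 6: E
    return splits  # a set, so duplicate pairs for two-area ranges collapse


def split_weight(ancestor):
    """Equal weight per allowed scenario (an assumption of this sketch)."""
    return 1.0 / len(dec_splits(ancestor))


print(len(dec_splits("AB")), len(dec_splits("ABC")))  # 6 12
```

Note that neither (AB, AB) nor (AB, CD) ever appears, matching the exclusions shown in Fig. 6: F and Fig. 6: G.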

Unlike DIVA, DEC does not include a mechanism for the vicariance event in which a species’ range is subdivided into two daughter ranges each comprising multiple unit areas. Under the DEC model, secondary dispersal and extinction events are required to explain such a distribution (Fig. 6: G). Ree et al. (2005) suggest that inferring such an event invokes a particular geographic history without consideration of the spatial and temporal context in which the event occurred. They argue that speciation events involving only single areas are less likely to involve this kind of ad hoc hypothesis. Likewise, DEC does not include a mechanism by which two daughter species might identically inherit an ancestral range comprising multiple unit areas. Secondary dispersal is required to explain this scenario, as shown in Fig. 6: F.

DEC does not include speciation rate as a free parameter. Moreover, dispersal rate does not affect the geographic pattern of divergence within versus between areas. Hence, it is not possible to directly infer instances of dispersal-mediated allopatry, where divergence between areas is expected to follow dispersal, under the DEC model (Ree & Smith, 2008).

Given a phylogeny, a set of observed ranges for the terminal taxa and a DEC model, dispersal and extinction rates are optimized using maximum likelihood, integrating over all possible ancestral states for range inheritance. These rates are then fixed, and the likelihood of the data is iteratively recalculated for each ancestral state at each internal node. This does not condition on assumptions regarding ancestral states elsewhere in the tree. Alternative scenarios for ancestral range inheritance can thereby be ranked by their contributions to the overall likelihood.

It has been suggested that DEC could be thought of as a parametric, extended version of DIVA (Ree & Sanmartín, 2009). Like DIVA, the DEC model allows both dispersal and extinction. However, the costs for rare events like extinction and dispersal have been replaced with rate parameters that may be estimated from the data; a high rate for an event corresponds to a low cost. In the DEC probability model, all unit areas have the same extinction rate, and extinction in each area is independent of events in other areas. This approach is comparable to that taken by DIVA, in which a fixed cost is incurred for each extinction event by which an area is removed from a range, no matter how many areas are occupied. However, in DIVA the cost of extinction is always one, whereas the extinction rate that corresponds to this in the DEC model can vary.

Similarly, the DEC model assumes that the rate of dispersal from a particular unit area in a range is identical for all unit areas, and is independent of dispersal and extinction events in other occupied areas. Each dispersal event is assigned uniformly at random to one of the available dispersal routes between areas and then succeeds with a probability that depends on both the specific route and time. Adding connections lowers the chance of selecting any particular one, but the sum of the rates of dispersal across connections remains constant. This has a somewhat peculiar side effect: adding dispersal routes with zero probability of success can lower the rate of successful dispersal, because those routes compete with routes that are more likely to succeed.
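The side effect can be illustrated with a few lines of arithmetic, under our reading of the verbal description: attempts occur at rate d, each is assigned to one route uniformly at random, and then succeeds with that route's probability. The route probabilities are hypothetical, and this is not code from any DEC implementation.

```python
def successful_dispersal_rate(d, success_probs):
    """Rate of successful dispersal out of one occupied area.

    d: total rate of dispersal attempts from the area.
    success_probs: success probability of each available route.
    Attempts are spread uniformly over routes, so the result is d times
    the mean success probability.
    """
    return d * sum(success_probs) / len(success_probs)


two_routes = successful_dispersal_rate(1.0, [0.5, 0.5])
with_dead_route = successful_dispersal_rate(1.0, [0.5, 0.5, 0.0])
print(two_routes, with_dead_route)  # 0.5 0.333...
```

Adding a route that can never succeed drops the successful rate from 0.5 to one third, even though no useful route was removed.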

For reasons explored in a later section of this paper, DEC tends to reconstruct wide ancestral ranges. Like DIVA, DEC allows ranges that are much larger than those for extant species to be removed from consideration. Additionally, it permits the exclusion of implausible distributions, such as those that are geographically discontinuous or inconsistent with the known history of a region.
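Both kinds of restriction amount to filtering the set of candidate ranges before analysis. A minimal sketch, assuming a hypothetical adjacency graph among areas and a maximum range size, keeps only ranges that are small enough and geographically contiguous:

```python
from itertools import combinations


def is_connected(rng, adjacency):
    """True if the areas in rng form one contiguous block in the adjacency graph."""
    rng = set(rng)
    start = next(iter(rng))
    seen, stack = {start}, [start]
    while stack:  # depth-first search restricted to areas inside the range
        here = stack.pop()
        for nxt in adjacency.get(here, ()):
            if nxt in rng and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == rng


def allowed_ranges(areas, adjacency, max_areas):
    """Ranges with at most max_areas areas that are geographically contiguous."""
    return [frozenset(c)
            for k in range(1, max_areas + 1)
            for c in combinations(areas, k)
            if is_connected(c, adjacency)]


# Hypothetical linear chain of areas A - B - C.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
kept = allowed_ranges(["A", "B", "C"], chain, max_areas=2)
# kept contains A, B, C, AB and BC; the discontinuous range AC is excluded.
```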

Moreover, scale parameters that affect overall dispersal rate may be introduced to limit dispersal between areas. This is intended to accommodate cases where specific biological scenarios motivate particular dispersal regimes. For example, scaling the rate to zero between non-adjacent areas only allows dispersal to occur between areas that are adjacent to one another, and scaling the rate inversely to distance favors short-range dispersal over long-range dispersal (Ree & Sanmartín, 2009). Separate scaling matrices can be introduced for discrete time periods, allowing different expectations for dispersal opportunity through time to be considered.
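The scaling schemes mentioned above can be sketched as multipliers on a base dispersal rate d, with one scaling table per time slice. The areas, distances, slice boundaries and function names below are hypothetical, and the data layout is our own rather than the input format of any particular program.

```python
def adjacency_scaling(neighbours, areas):
    """1.0 between adjacent areas and 0.0 otherwise."""
    return {(i, j): 1.0 if j in neighbours[i] else 0.0
            for i in areas for j in areas if i != j}


def inverse_distance_scaling(distance, areas):
    """Scale each route by 1/distance so that short-range dispersal is favoured."""
    return {(i, j): 1.0 / distance[frozenset((i, j))]
            for i in areas for j in areas if i != j}


def scaled_rate(d, i, j, age, slices):
    """Dispersal rate from i to j at a given age (time before present).

    `slices` is a list of (oldest_age, scaling) pairs sorted youngest first.
    """
    for oldest_age, scaling in slices:
        if age <= oldest_age:
            return d * scaling[(i, j)]
    raise ValueError("age lies beyond the oldest time slice")


areas = ["A", "B", "C"]
neighbours = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
distance = {frozenset("AB"): 1.0, frozenset("BC"): 2.0, frozenset("AC"): 3.0}
slices = [(5.0, inverse_distance_scaling(distance, areas)),
          (10.0, adjacency_scaling(neighbours, areas))]
```

Here dispersal from A to C is possible but slow in the recent slice, and impossible in the older slice where only adjacent areas are connected.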

We note that the DEC model could be extended in various other ways. For instance, the extinction rate could be allowed to vary for each area, so that extinction would be more likely in areas encompassing less favorable environments. However, the probabilistic model used in DEC is intentionally kept simple to reduce the number of parameters, as there is often not enough data available to estimate a larger number of parameters.

1.5 Borrowing models from biological sequence evolution

Likelihood models used to study the evolution of biological sequences have been co-opted for the reconstruction of ancestral geographic ranges (e.g., Nepokroeff et al., 2003; Pereira et al., 2007). Such approaches incorporate a number of assumptions that are germane to problems of sequence evolution, and as we discuss below, these methods can be applied to a subset of biogeographic scenarios for which these assumptions are met. In many situations, however, geographic range evolution is not analogous to the evolution of sequences across phylogeny. In such cases, models borrowed from the study of sequence evolution are not adequate, and meaningful interpretation of results could be difficult to achieve.

Under likelihood approaches, range evolution can be described using a continuous-time Markov model of evolutionary change. Areas are taken to be discrete character states, and each range is restricted to a single area. A lineage may either switch from one area to another or remain in the same area, according to a certain probability distribution. The probability of changing from one area to another is assumed to depend only on the area currently occupied.

Likelihood approaches can incorporate branch length information and consider the rate of change along each branch. Therefore, they are expected to function more realistically than parsimony methods in situations where change is frequent relative to speciation. Transition rates from each geographical area to every other area can be estimated through maximum likelihood (e.g., Nepokroeff et al., 2003). Given a Markov model with these transition rates specified, a phylogeny with information about branch lengths, and the observed ranges for extant species, the relative probability of each ancestral range can be calculated.
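A minimal sketch of this calculation, assuming the simplest such model (equal rates q between every pair of areas) and a uniform prior on the root area, uses Felsenstein's pruning algorithm. The tree format (leaves are area labels; internal nodes are tuples of (child, branch length) pairs), the rate and the data are hypothetical. Only the root is shown; other internal nodes would require re-rooting or an additional pass.

```python
import math


def p_change(k, q, t):
    """Equal-rates k-state Markov model: (P_same, P_different) over time t."""
    decay = math.exp(-k * q * t)
    return 1.0 / k + (1.0 - 1.0 / k) * decay, (1.0 - decay) / k


def partials(tree, areas, q):
    """Felsenstein pruning: likelihood of the tips below `tree` for each area at its root."""
    if isinstance(tree, str):
        return {a: 1.0 if a == tree else 0.0 for a in areas}
    out = {a: 1.0 for a in areas}
    for child, length in tree:
        below = partials(child, areas, q)
        same, diff = p_change(len(areas), q, length)
        for a in areas:
            out[a] *= sum((same if a == b else diff) * below[b] for b in areas)
    return out


def root_probabilities(tree, areas, q):
    """Relative probability of each area at the root (uniform root prior)."""
    lik = partials(tree, areas, q)
    total = sum(lik.values())
    return {a: v / total for a, v in lik.items()}


inner = (("A", 0.5), ("A", 0.5))     # two sister tips, both in area A
tree = ((inner, 1.0), ("B", 1.5))    # plus one tip in area B
probs = root_probabilities(tree, ["A", "B", "C"], q=0.3)
```

With these values the root is most probably in A, then B, then the unobserved area C.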

Transition rate estimates come with some associated error, but in order for the relative probabilities for each ancestral state to be valid, it must be assumed that the transition rates are exactly known. Similarly, the phylogeny on which ancestral areas are inferred reflects a consensus for a set of plausible alternative trees, as the true tree cannot be exactly known. As formulated above, these likelihood approaches ignore both phylogenetic uncertainty and uncertainty in transition rate estimates (Ronquist, 2004).

1.6 Stochastic mapping

Bayesian stochastic mapping (Jensen & Pedersen, 2000; Nielsen, 2002; Huelsenbeck et al., 2003) estimates the probability of character state transformations by accommodating uncertainty in the rate of evolution and in phylogenetic relationships. In Bayesian analysis, the transition rate is allowed to vary over a range of possible values described by a prior probability distribution, and a posterior probability proportional to the prior probability multiplied by the likelihood is obtained. When the probabilities of ancestral states are determined, each value for the transition rate is weighted according to its posterior probability. This approach accounts for uncertainty in transition rate estimates.
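The weighting step can be sketched with a grid approximation in place of MCMC: for each rate on a grid, compute the likelihood and the root-area probabilities, then average the latter with weights proportional to prior times likelihood. The equal-rates model, the exponential prior, the grid and the tree are all assumptions made for illustration.

```python
import math


def conditional_likelihoods(tree, areas, q):
    """Pruning under an equal-rates model; leaves are area labels and internal
    nodes are tuples of (child, branch_length) pairs."""
    if isinstance(tree, str):
        return {a: float(a == tree) for a in areas}
    k = len(areas)
    out = {a: 1.0 for a in areas}
    for child, t in tree:
        below = conditional_likelihoods(child, areas, q)
        decay = math.exp(-k * q * t)
        same, diff = 1 / k + (1 - 1 / k) * decay, (1 - decay) / k
        for a in areas:
            out[a] *= sum((same if a == b else diff) * below[b] for b in areas)
    return out


def exponential_prior(q, mean=1.0):
    """Exponential prior density on the rate (an assumption for illustration)."""
    return math.exp(-q / mean) / mean


def averaged_root_probabilities(tree, areas, rate_grid, prior=exponential_prior):
    """Root-area probabilities averaged over rates, each rate weighted by its
    grid-approximated posterior probability."""
    weights, conditionals = [], []
    for q in rate_grid:
        lik = conditional_likelihoods(tree, areas, q)
        total = sum(lik.values())
        weights.append(prior(q) * total / len(areas))  # uniform prior on root area
        conditionals.append({a: v / total for a, v in lik.items()})
    z = sum(weights)
    return {a: sum(w * c[a] for w, c in zip(weights, conditionals)) / z
            for a in areas}


inner = (("A", 0.5), ("A", 0.5))
tree = ((inner, 1.0), ("B", 1.5))
grid = [0.05 * n for n in range(1, 41)]  # hypothetical rates from 0.05 to 2.0
probs = averaged_root_probabilities(tree, ["A", "B", "C"], grid)
```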

Likewise, phylogenetic uncertainty can be taken into account by considering a set of plausible alternative trees rather than a single consensus. These trees can be generated by sampling the posterior of a Bayesian phylogenetic analysis using Markov chain Monte Carlo. Instead of obtaining a single transition rate estimate for each tree in this set, combinations of tree and transition rate are sampled in proportion to their joint posterior probability. From such a joint sample, the marginal distribution of the transition rate can be calculated by summing over trees. Thus, the Bayesian marginal distribution of transition rate accounts for both phylogenetic uncertainty and uncertainty in transition rate estimates.
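For a tiny example, the joint posterior over (tree, rate) pairs can be enumerated exhaustively instead of sampled by MCMC, which makes the marginalization over trees explicit. In this sketch each tree in a hypothetical posterior sample receives equal prior weight, the rate has an exponential prior on a grid, and ranges evolve under an equal-rates model; all of these are illustrative assumptions.

```python
import math


def tip_likelihood(tree, areas, q):
    """Probability of the tip areas given one tree and one rate
    (equal-rates model, uniform prior on the root area)."""
    k = len(areas)

    def partial(node):
        if isinstance(node, str):
            return {a: float(a == node) for a in areas}
        out = {a: 1.0 for a in areas}
        for child, t in node:
            below = partial(child)
            decay = math.exp(-k * q * t)
            same, diff = 1 / k + (1 - 1 / k) * decay, (1 - decay) / k
            for a in areas:
                out[a] *= sum((same if a == b else diff) * below[b] for b in areas)
        return out

    return sum(partial(tree).values()) / k


def joint_posterior(trees, rate_grid, areas, rate_prior):
    """Normalized weight of every (tree index, rate) pair; each sampled tree
    is given the same prior weight."""
    joint = {(n, q): rate_prior(q) * tip_likelihood(tree, areas, q)
             for n, tree in enumerate(trees) for q in rate_grid}
    z = sum(joint.values())
    return {key: w / z for key, w in joint.items()}


def rate_marginal(joint):
    """Sum over trees to obtain the marginal posterior of the rate."""
    marginal = {}
    for (_, q), w in joint.items():
        marginal[q] = marginal.get(q, 0.0) + w
    return marginal


# Two hypothetical alternative trees for tips in areas A, A and B.
tree_1 = (((("A", 0.5), ("A", 0.5)), 1.0), ("B", 1.5))
tree_2 = (((("A", 0.5), ("B", 0.5)), 1.0), ("A", 1.5))
grid = [0.05 * n for n in range(1, 41)]
joint = joint_posterior([tree_1, tree_2], grid, ["A", "B", "C"],
                        rate_prior=lambda q: math.exp(-q))
marginal = rate_marginal(joint)
```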

In all implementations of which we are aware, stochastic mapping of range evolution presupposes that ancestral ranges are restricted to a single area. Moreover, it is assumed that extinction or vicariance events have not subdivided ancestral ranges (e.g., McGuire et al., 2007; Dacosta & Klicka, 2008). This restriction makes sense in the context of biological sequence evolution, where, for example, it is not meaningful for a nucleotide site to occupy states A and T at the same time. In studies of range evolution, however, terminal taxa may have ranges comprising multiple geographic areas. Some authors seek to circumvent this limitation by defining additional composite area states (e.g., Pereira et al., 2007), but this does not correspond to a biologically motivated scenario. Such workarounds highlight the limitations of implementations that have been co-opted from other disciplines and applied directly to the study of range evolution. Precluding multi-area ranges is undesirable when the ancestral ranges of interest are expected to comprise multiple areas.

However, this limitation could be less problematic if one is interested in the general dispersal patterns of multiple clades among disjunct areas, such as island systems, watershed drainages, or mountain peaks separated by intervening valleys. Such areas are effectively isolated from one another by barriers to gene flow. Although dispersal events are modeled along phylogenetic branches and not at nodes, in this setting they could be argued to correspond roughly to speciation events, because allopatric speciation is expected to follow dispersal rapidly in such a regime.
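The state-space cost of the composite-state workaround can be seen by enumerating the states it requires. This small sketch (the function name is hypothetical) lists every non-empty combination of n areas, giving 2^n - 1 states. In an unconstrained single-character model each composite state is just another label, so nothing in the model records that AB contains A, and a jump such as A → BC is as available a parameter as the range expansion A → AB.

```python
from itertools import combinations

def composite_states(areas):
    """Every non-empty combination of unit areas, each treated as one categorical state."""
    return ["".join(c) for k in range(1, len(areas) + 1) for c in combinations(areas, k)]

states = composite_states("ABC")
print(states)  # ['A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC']
n_states = len(states)
print(n_states, n_states * (n_states - 1))  # 7 42: states, and off-diagonal rates if unconstrained
```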

In a Bayesian approach to island biogeography proposed by Sanmartín et al. (2008), the transition rates between unit ranges are allowed to differ and are estimated from the observed data. In this model, lineages move between unit ranges according to parameters for dispersal rate and equilibrium frequencies of species diversity. The method uses data for multiple groups of species to infer shared patterns of movement and distribution of species in an island system. For each group of species, a DNA sequence alignment and a set of ranges are required. Separate molecular parameters and phylogenies are estimated for each group, so each dataset is allowed to evolve on its own topology and set of branch lengths, whereas the biogeographic parameters are estimated across all groups jointly. A Markov chain Monte Carlo approach samples biogeographic parameters, molecular parameters, branch lengths, and tree topologies simultaneously, yielding estimates of their joint posterior distribution given the data. It is therefore possible to obtain marginal probabilities of the biogeographic parameters that do not condition on any particular phylogeny or set of branch lengths.
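One way to picture a rate matrix built from a dispersal rate and equilibrium frequencies is an F81-style form, in which the rate from island i to island j is a single dispersal rate times the equilibrium frequency of j. This form is our assumption for illustration; Sanmartín et al. (2008) may parameterize relative rates differently. Under this form the transition probabilities have a closed form, and the equilibrium frequencies are stationary, which the sketch checks. Function names and values are hypothetical.

```python
import math

def dispersal_rate_matrix(mu, pi):
    """Hypothetical F81-style form: rate i -> j equals mu * pi[j] for i != j."""
    n = len(pi)
    Q = [[mu * pi[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        Q[i][i] = -sum(Q[i])   # rows of a rate matrix sum to zero
    return Q

def transition_probs(mu, pi, t):
    """Closed form for this Q: P(t) = exp(-mu t) I + (1 - exp(-mu t)) 1 pi^T."""
    decay = math.exp(-mu * t)
    n = len(pi)
    return [[decay * float(i == j) + (1.0 - decay) * pi[j] for j in range(n)] for i in range(n)]

pi = [0.5, 0.3, 0.2]   # hypothetical equilibrium frequencies for three islands
P = transition_probs(mu=0.2, pi=pi, t=5.0)
for row in P:
    print([round(p, 3) for p in row])
```

As t grows, every row approaches pi, so a lineage's island becomes independent of where it started.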

Because the rate of molecular evolution is unlikely to be the same for all groups of species, branch lengths are scaled according to group-specific molecular clocks. Additionally, scaling parameters that account for differences in dispersal rate across groups are used to obtain the expected number of dispersal events per unit time. We note that the transition rates in this model explicitly account for dispersal and subsequent survival in the new area, but do not take into account that a range transition may require an extinction event in the parent range, as illustrated in Fig. 7.
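The two scalings can be sketched as simple arithmetic on a single branch. All argument names and numbers here are hypothetical: a branch length in substitutions per site is converted to time with a group-specific clock rate, and the expected number of dispersal events is a shared base rate times a group-specific multiplier times that time. The Poisson step at the end is an added assumption used only to turn the expectation into a probability of at least one event.

```python
import math

def expected_dispersals(branch_length, clock_rate, base_rate, group_scaler):
    """Expected dispersal events on one branch (hypothetical parameter names).

    branch_length: substitutions per site; clock_rate: substitutions per site per Myr;
    base_rate: dispersal events per Myr shared across groups; group_scaler: group multiplier.
    """
    time_myr = branch_length / clock_rate       # group-specific clock converts length to time
    return base_rate * group_scaler * time_myr  # group-scaled dispersal rate times time

lam = expected_dispersals(branch_length=0.02, clock_rate=0.01, base_rate=0.1, group_scaler=1.5)
p_any = 1.0 - math.exp(-lam)   # P(at least one dispersal), assuming Poisson-distributed events
print(lam, p_any)
```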

Figure 7.

A, In a single area model, transitions of the form A → B imply a wholesale relocation of a species from one island to another. B, Alternatively, the true history may consist of multiple biological events, such as a range expansion event A → AB followed by a local extinction event AB → B. Letters A and B refer to areas. Dagger, extinction event; dot, species; open arrow, dispersal event.

2 State transitions: what types of historical range change events do methods allow?

As indicated above, methods for reconstructing ancestral history on a phylogenetic tree differ in the types of ancestral range states that are allowed. Moreover, they differ in the various historical events that they allow to change the ancestral ranges. The form of allowed ancestral ranges and the allowed transitions can both affect the outcome of ancestral range estimation, as we discuss below.

The methods that we consider can be broadly divided into methods like DIVA and DEC, which allow ancestral ranges to include several unit areas, and methods like stochastic mapping, which only allow ancestors to occupy a single area at a time. This relatively simple difference can lead to different types of range transitions, and can change how these transitions correspond to historical events such as dispersal, local extinction, and vicariance.

2.1 Two general forms of area transitions

To illustrate this point, we note that range transitions can occur either on a branch or at an internal node. If range changes occur on a branch, then they must be of the form R1 → R2. Range transitions that occur at internal nodes are more complex. To see why, consider that a transition at an internal node corresponds to a speciation event in which an ancestral species yields two daughter species that have not gone extinct and also happen to be sampled. Transitions at internal nodes are therefore of the form R1 → (R2a, R2b), in which the range before speciation yields two daughter ranges, R2a and R2b, which are the ranges of the two daughter species.

2.2 Transitions in single area models, and their biological interpretation

Given this formalism, we can see that methods requiring a single-area ancestral range must have area transitions of the form A → B occurring on branches or transitions of the form A → (B, C) occurring at internal nodes, where A, B, and C each represent a unit area. Many methods allow the former, but we are not aware of any models that allow the latter. In what has sometimes been termed the punctuational evolution model (Nepokroeff et al., 2003), a likelihood approach in which all branch lengths are set to one has been argued to reflect the case where evolutionary change occurs only at speciation. However, under the punctuational evolution model all area transitions must still occur on branches and not at nodes, and are thus of the form A → B and not A → (B, C).
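The two transition forms available to single-area models can be enumerated directly. In this hypothetical sketch, branch transitions are all ordered pairs of distinct areas; at nodes, the inheritance case A → (A, A) is listed separately from the general A → (B, C) family, which no model we are aware of implements. The punctuational variant does not change these lists; it only sets every branch length to one.

```python
from itertools import product

def single_area_transitions(areas):
    """Enumerate transition forms when every range is exactly one unit area."""
    on_branches = [(a, b) for a, b in product(areas, repeat=2) if a != b]      # A -> B
    at_nodes_inherit = [(a, (a, a)) for a in areas]                           # A -> (A, A)
    at_nodes_general = [(a, (b, c)) for a, b, c in product(areas, repeat=3)]  # A -> (B, C)
    return on_branches, at_nodes_inherit, at_nodes_general

branch, inherit, general = single_area_transitions("ABC")
print(len(branch), len(inherit), len(general))  # 6 3 27
```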

We suggest that transitions of the form A → B may not correspond to single biological events. Especially in island models, it is unlikely that a species would simply move entirely from one unit area to another, as depicted in Fig. 7: A. Instead, when events of this form are inferred, the true history may well consist of a range expansion event A → AB followed by an eventual local extinction event AB → B, as in Fig. 7: B. This scenario could be slightly modified to include the situation in which the range AB is disconnected and leads to rapid speciation of two daughter species in areas A and B, followed by eventual extinction of all descendants of the A lineage.
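The claim that an inferred A → B may hide A → AB followed by AB → B can be checked numerically with a minimal two-area, DEC-style anagenetic model (our sketch, not any published implementation). States are the empty range, A, B, and AB; dispersal adds an area at rate d and local extinction removes one at rate e. The direct rate from A to B is zero, yet the transition probability over a branch is positive, and for short branches it is close to d·e·t²/2, the probability of one dispersal followed by one extinction. The matrix exponential uses a plain Taylor series with scaling and squaring, which is adequate only for tiny, well-scaled matrices like these.

```python
STATES = ["", "A", "B", "AB"]                 # "" is the empty (globally extinct) range
IDX = {s: i for i, s in enumerate(STATES)}

def dec_rate_matrix(d, e):
    """Anagenetic rates: dispersal adds an area (rate d), extinction removes one (rate e)."""
    Q = [[0.0] * 4 for _ in range(4)]
    events = [("A", "AB", d), ("B", "AB", d),   # range expansion
              ("A", "", e), ("B", "", e),       # loss of the only occupied area
              ("AB", "A", e), ("AB", "B", e)]   # local extinction in one area
    for src, dst, rate in events:
        Q[IDX[src]][IDX[dst]] = rate
    for i in range(4):
        Q[i][i] = -sum(Q[i])                    # each row sums to zero
    return Q

def expm(Q, t, squarings=10, terms=20):
    """exp(Q t) via a truncated Taylor series with scaling and squaring."""
    n = len(Q)
    A = [[x * t / 2 ** squarings for x in row] for row in Q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):                   # term becomes A^k / k!
        term = [[sum(term[i][m] * A[m][j] for m in range(n)) / k for j in range(n)] for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):                  # undo the scaling by repeated squaring
        result = [[sum(result[i][m] * result[m][j] for m in range(n)) for j in range(n)] for i in range(n)]
    return result

d, e, t = 0.1, 0.1, 0.1
Q = dec_rate_matrix(d, e)
P = expm(Q, t)
print(Q[IDX["A"]][IDX["B"]])   # 0.0: no direct A -> B event
print(P[IDX["A"]][IDX["B"]])   # positive: reached through AB
print(d * e * t * t / 2)       # one dispersal then one extinction, to second order
```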

For transitions occurring at internal nodes, it would seem natural for events of the form A → (A, B) to occur in situations where gene flow between regions A and B is low enough to lead to rapid speciation following dispersal.

2.3 Transitions in multi-area models

In multi-area models, the types of transitions are much less restricted. DIVA allows transitions on branches that represent dispersal (area gain; see Fig. 5: A) or local extinction (area loss; see Fig. 5: B), and transitions at internal nodes that represent vicariance. When the ancestral range consists of a single unit area, the vicariance event is assumed to divide it into smaller, indistinguishable ranges, leading to range inheritance of the form A → (A, A), as depicted in Fig. 5: D. When the ancestral range includes multiple areas, it is divided into disjoint daughter ranges, R1 ∪ R2 → (R1, R2), as shown in Fig. 5: C. This scenario is intended to correspond to a historical event in which a geographic barrier divides a species range into two parts between which gene flow is no longer possible. DIVA tends to reconstruct widespread ancestral ranges at the root of the tree. This is a side effect of the vicariance rule: whenever daughter ranges are disjoint, the ancestral range is estimated as their union at no cost. Dispersal events carry an additional cost and need only be invoked when daughter ranges overlap. Furthermore, extinction events are rarely inferred if the size of ancestral ranges is not restricted.
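The widespread-root effect of the vicariance rule can be reproduced in a few lines. This sketch applies only DIVA's zero-cost vicariance rule R1 ∪ R2 → (R1, R2), with no limit on ancestral range size, to a hypothetical seven-species topology in the spirit of Fig. 8: A (the figure's exact topology is not specified in the text). It is not a full DIVA optimization: overlapping daughter ranges, which would require costed dispersal, are rejected rather than resolved.

```python
def vicariance_ancestor(tree):
    """Return (ancestral range, cost) using only the zero-cost vicariance rule.

    tree is a tip range (str) or a (left, right) tuple. Only disjoint daughter
    ranges are handled; overlap would need dispersal and full DIVA optimization.
    """
    if isinstance(tree, str):
        return frozenset(tree), 0
    (left, lcost), (right, rcost) = vicariance_ancestor(tree[0]), vicariance_ancestor(tree[1])
    if not left.isdisjoint(right):
        raise ValueError("overlapping daughter ranges need dispersal; not covered here")
    return left | right, lcost + rcost   # R1 ∪ R2 -> (R1, R2) at no cost

# Hypothetical topology for seven species, each endemic to one area, A to G.
tree = ((("A", "B"), ("C", "D")), (("E", "F"), "G"))
root, cost = vicariance_ancestor(tree)
print("".join(sorted(root)), cost)  # ABCDEFG 0
```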

The DEC model allows dispersal (Fig. 6: A) and extinction (Fig. 6: B) transitions along branches just as DIVA does, but differs in the transitions that are allowed at internal nodes. The DEC model does not include a mechanism for the vicariance scenario in which a species' range is subdivided into two daughter ranges that each comprise multiple unit areas. Thus, one of the two daughter species must have a range consisting of a single area. For the range of the other daughter species, there are two distinct scenarios. In the first, organisms in some area A within range R become a new species: R → (A, R−A), depicted in Fig. 6: D. In the second, speciation occurs in some sub-area of A, so that both species still occur in A: R → (R, A), as shown in Fig. 6: E. Like DIVA, DEC allows range inheritance of the form A → (A, A) when the ancestral range consists of a single area, shown in Fig. 6: C. Although speciation is presumed to be allopatric under the DEC model, these latter scenarios do not explicitly differentiate between sympatric speciation and allopatric speciation occurring within subsections of area A.
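The DEC cladogenetic options described above can be enumerated in a short sketch (the function name is hypothetical). Left and right daughters are kept distinct, and a set removes the duplicates that arise when R − A is itself a single area. Following the description of the DEC stochastic model later in this section, each listed outcome is treated as equally likely; actual implementations may weight them differently.

```python
def dec_splits(r):
    """DEC-style cladogenetic outcomes (left, right) for ancestral range r."""
    r = frozenset(r)
    if len(r) == 1:
        return {(r, r)}                              # A -> (A, A)
    splits = set()
    for a in r:
        single = frozenset([a])
        rest = r - single
        splits |= {(single, rest), (rest, single)}   # vicariance: R -> (A, R - A)
        splits |= {(r, single), (single, r)}         # subset speciation: R -> (R, A)
    return splits

for areas in ("A", "AB", "ABC"):
    outcomes = sorted(tuple("".join(sorted(x)) for x in split) for split in dec_splits(areas))
    print(areas, len(outcomes), outcomes)   # 1, 6 and 12 outcomes respectively
```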

We suspect that the DEC model's transitions should be less likely than DIVA's to produce widespread ancestral ranges (Fig. 8). If two sister taxa have non-overlapping ranges, their ancestor is likely to include all the unit areas of the species with the larger range, but only one area of the other species. This is because the speciation event that generated the two species gave rise to one species occupying only a single area. That species will likely be inferred to be the one with fewer areas in its range, because any other areas in its range must then be added by dispersal, which is a rare event. However, if there are only two unit areas, then the DEC and DIVA transition rules are identical.
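The argument above can be phrased as a dispersal-counting heuristic. This hypothetical sketch lists, for two disjoint sister ranges, the ancestral ranges that DEC-style splits could produce with the fewest subsequent dispersals, and compares them with the union that the DIVA vicariance rule yields. It ignores extinction paths and branch lengths, so it illustrates the direction of the effect rather than computing DEC likelihoods.

```python
def dec_candidate_ancestors(d1, d2):
    """Ancestral ranges needing the fewest branch dispersals under DEC-style splits.

    Assumes d1 and d2 are disjoint daughter ranges. One daughter of the split gets a
    single area and must regain the rest of its observed range by dispersal.
    Extinction paths are ignored: a counting heuristic, not a likelihood.
    """
    d1, d2 = frozenset(d1), frozenset(d2)
    candidates = []
    for kept, other in ((d1, d2), (d2, d1)):
        for a in other:
            ancestor = "".join(sorted(kept | {a}))            # split: (kept, {a})
            candidates.append((len(other) - 1, ancestor))    # dispersals to rebuild `other`
    fewest = min(n for n, _ in candidates)
    return fewest, sorted({anc for n, anc in candidates if n == fewest})

print(dec_candidate_ancestors("ABC", "DE"))      # (1, ['ABCD', 'ABCE'])
print("".join(sorted(set("ABC") | set("DE"))))   # DIVA-style union: ABCDE
print(dec_candidate_ancestors("A", "B"))         # (0, ['AB']): two areas, same as DIVA
```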

Figure 8.

Allowed transitions and widespread ancestors in a hypothetical example comprising seven species, each with a range consisting of a single, distinct unit area represented by a letter A–G. A, Dispersal–vicariance analysis (DIVA)'s reconstruction of ancestral ranges. Because vicariance events of the form R1 ∪ R2 → (R1, R2) are allowed by DIVA, this method will reconstruct a widespread ancestor as long as the ancestral range is not restricted. B, The dispersal–extinction–cladogenesis model will reconstruct a smaller ancestral range than DIVA. Because only transitions of the form R → (R, A) or R → (R−A, A) are permitted at internal nodes, dispersal events are required to explain the observed distribution under this approach.

Despite this feature, DEC still appears to reconstruct widespread ancestral ranges (Ree et al., 2005; see also Drummond, 2008; Santos et al., 2009). The explanation lies in the fact that the scenario A → (A, B) requires an extinction event, and DEC tends to underestimate the extinction rate because of ascertainment bias, rendering extinction events even less likely than they truly are. The scenario A → (A, B) requires A → (A, A) at the root node, followed by dispersal A → AB and extinction AB → B on one branch (Fig. 9: A). In contrast, the scenario AB → (A, B) requires no dispersal or extinction events (Fig. 9: B). The underestimation of the extinction rate stems from the way the DEC stochastic model is posed.
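The effect of a low extinction rate on the two scenarios of Fig. 9 can be illustrated with a toy likelihood calculation. This sketch assumes a two-tip tree with tip ranges A and B, both branches of length 1, a dispersal rate d = 0.1, a two-area DEC-style rate matrix with an absorbing empty range, and equally weighted cladogenetic outcomes. It compares the likelihood of root range AB with that of root range A as the extinction rate e shrinks. All names and values are hypothetical.

```python
STATES = ["", "A", "B", "AB"]                 # "" is the empty range
IDX = {s: i for i, s in enumerate(STATES)}
# Two-area DEC-style cladogenetic outcomes (left, right), each taken as equally likely.
SPLITS = {
    "A": [("A", "A")],
    "B": [("B", "B")],
    "AB": [("A", "B"), ("B", "A"), ("AB", "A"), ("A", "AB"), ("AB", "B"), ("B", "AB")],
}

def dec_rate_matrix(d, e):
    Q = [[0.0] * 4 for _ in range(4)]
    for src, dst, rate in [("A", "AB", d), ("B", "AB", d), ("A", "", e),
                           ("B", "", e), ("AB", "A", e), ("AB", "B", e)]:
        Q[IDX[src]][IDX[dst]] = rate
    for i in range(4):
        Q[i][i] = -sum(Q[i])
    return Q

def expm(Q, t, squarings=10, terms=20):
    """exp(Q t) via a truncated Taylor series with scaling and squaring."""
    n = len(Q)
    A = [[x * t / 2 ** squarings for x in row] for row in Q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[sum(term[i][m] * A[m][j] for m in range(n)) / k for j in range(n)] for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = [[sum(result[i][m] * result[m][j] for m in range(n)) for j in range(n)] for i in range(n)]
    return result

def root_likelihood(root, d, e, t=1.0, tips=("A", "B")):
    """P(tip ranges | root range): average over splits of the two branch probabilities."""
    P = expm(dec_rate_matrix(d, e), t)
    options = SPLITS[root]
    return sum(P[IDX[left]][IDX[tips[0]]] * P[IDX[right]][IDX[tips[1]]]
               for left, right in options) / len(options)

for e in (0.1, 0.01, 0.001):
    ratio = root_likelihood("AB", 0.1, e) / root_likelihood("A", 0.1, e)
    print(f"e={e}: L(root=AB) / L(root=A) = {ratio:.1f}")
```

The ratio grows as e decreases, because the only route from root A to tip B needs an extinction, whereas root AB can split directly into (A, B).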

Figure 9.

Under the dispersal–extinction–cladogenesis model in a two area system comprising areas A and B, fewer extinction events are needed to explain the scenario AB → (A, B) than the scenario A → (A, B). Thus, dispersal–extinction–cladogenesis still tends to reconstruct widespread ancestors. A, A → (A, B) must have A → (A, A) occur at the root node, followed by dispersal A → AB and extinction AB → B on one branch. B, AB → (A, B) requires no dispersal or extinction events. Dagger, extinction event; dot, species; open arrow, dispersal event.

In the DEC stochastic model, the range character evolves on a fixed tree according to a continuous-time Markov chain on the branches of the tree, and follows either the R → (R, A) or R → (R−A, A) transitions at internal nodes with equal probability. This process generates observed ranges at the tips of the tree. However, the Markov model on the branches may generate an empty range in which the species has gone extinct in all areas. Once this occurs, the range stays empty, and speciations in an empty range must generate two daughter species with empty ranges. As the extinction rate increases, the probability of generating an empty range for at least one tip species also increases.

Under the DEC model, the extinction rate is estimated by choosing the value that maximizes the probability of generating the observed tip ranges. Because no observed tip range can be empty, this favors a low extinction rate, which increases the probability that all tip ranges are non-empty. It seems to us that this problem might be fixed by conditioning the likelihood on generating non-empty tip ranges.
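Conditioning on non-empty tips can be sketched with a hypothetical two-area toy (two tips with ranges A and B, branches of length 1, root range AB held fixed, dispersal rate 0.1, equally weighted splits). The probability that neither tip range is empty is computed from the same split and branch probabilities as the likelihood; the conditioned likelihood divides the ordinary likelihood by it, and both are maximized over a grid of extinction rates. Because the conditioning factor rises with e, the conditioned estimate cannot fall below the unconditioned one. This is our illustration of the proposed fix, not an option of any existing DEC software.

```python
STATES = ["", "A", "B", "AB"]                 # "" is the empty range
IDX = {s: i for i, s in enumerate(STATES)}
SPLITS = {"AB": [("A", "B"), ("B", "A"), ("AB", "A"), ("A", "AB"), ("AB", "B"), ("B", "AB")]}

def dec_rate_matrix(d, e):
    Q = [[0.0] * 4 for _ in range(4)]
    for src, dst, rate in [("A", "AB", d), ("B", "AB", d), ("A", "", e),
                           ("B", "", e), ("AB", "A", e), ("AB", "B", e)]:
        Q[IDX[src]][IDX[dst]] = rate
    for i in range(4):
        Q[i][i] = -sum(Q[i])
    return Q

def expm(Q, t, squarings=10, terms=20):
    """exp(Q t) via a truncated Taylor series with scaling and squaring."""
    n = len(Q)
    A = [[x * t / 2 ** squarings for x in row] for row in Q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[sum(term[i][m] * A[m][j] for m in range(n)) / k for j in range(n)] for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = [[sum(result[i][m] * result[m][j] for m in range(n)) for j in range(n)] for i in range(n)]
    return result

def root_likelihood(root, d, e, t=1.0, tips=("A", "B")):
    P = expm(dec_rate_matrix(d, e), t)
    options = SPLITS[root]
    return sum(P[IDX[left]][IDX[tips[0]]] * P[IDX[right]][IDX[tips[1]]]
               for left, right in options) / len(options)

def p_no_empty_tip(root, d, e, t=1.0):
    """Probability that neither tip ends in the empty (absorbing) range."""
    P = expm(dec_rate_matrix(d, e), t)
    empty = IDX[""]
    options = SPLITS[root]
    return sum((1 - P[IDX[left]][empty]) * (1 - P[IDX[right]][empty])
               for left, right in options) / len(options)

def conditioned_likelihood(root, d, e, t=1.0):
    return root_likelihood(root, d, e, t) / p_no_empty_tip(root, d, e, t)

grid = [0.01 * k for k in range(1, 51)]
mle_plain = max(grid, key=lambda e: root_likelihood("AB", 0.1, e))
mle_cond = max(grid, key=lambda e: conditioned_likelihood("AB", 0.1, e))
print(mle_plain, mle_cond)
```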

3 Speciation rate should depend on range

There is clear biological motivation for accommodating the scenario in which a dispersal event is followed by an immediate and favored speciation; that is, A → AB → (A, B). When a population of organisms disperses to a new area, and when that area is unlikely to receive new migrants, gene flow is expected to be low, and speciation is expected to be swift.

For different ranges to lead to different speciation rates, speciation must be treated as an event with a rate. If the dispersal rate were higher than the within-area speciation rate, allowing A → AB → (A, B) to occur as a dispersal followed by immediate speciation would actually favor ancestral ranges consisting of a single area. This offers a possible avenue toward obtaining ancestral ranges comparable in extent to those observed for extant taxa.
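Treating speciation as an event with a rate can be illustrated with competing exponential clocks. In this hypothetical sketch, a lineage confined to area A waits for either a dispersal (rate d), assumed to be followed at once by speciation as A → AB → (A, B), or a within-area speciation (rate s). The first event is dispersal-driven with probability d/(d + s), which exceeds one half when d > s; a seeded simulation checks the formula.

```python
import random

def p_disperse_first(d, s):
    """Competing exponential clocks: dispersal (rate d) vs within-area speciation (rate s)."""
    return d / (d + s)

def simulate(d, s, n=20000, seed=7):
    """Fraction of lineages whose dispersal clock rings before the speciation clock."""
    rng = random.Random(seed)
    wins = sum(rng.expovariate(d) < rng.expovariate(s) for _ in range(n))
    return wins / n

d, s = 0.3, 0.1   # hypothetical rates, with dispersal faster than within-area speciation
print(p_disperse_first(d, s), simulate(d, s))
```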

As noted by Ree and Sanmartín (2009), a model has been proposed by Maddison et al. (2007) that not only describes the evolution of a discrete binary character, but also generates the phylogenetic tree. In this model, the rates of speciation and extinction depend on the state of this character, and character transition rates are allowed to vary. Modifying this approach to treat multi-area ranges and scenarios where the daughter lineages do not identically inherit the ancestral range would make this a very attractive method in biogeography. To our knowledge, such a method awaits implementation.
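For orientation, the core of the Maddison et al. (2007) model can be sketched in a few dozen lines. The code integrates the BiSSE differential equations for D (the probability that a lineage in a given state produces the observed subtree) and E (the probability that a lineage leaves no sampled descendants) along branches with fourth-order Runge-Kutta, and combines them at a speciation node. The two-tip tree, the parameter values, the equal root weights, and the absence of conditioning on survival are simplifying assumptions of this sketch; the character is binary, not a multi-area range.

```python
import math

def bisse_rhs(y, lam, mu, q):
    """BiSSE equations; y = [D0, D1, E0, E1], lam/mu per state, q = (q01, q10)."""
    D0, D1, E0, E1 = y
    return [
        -(lam[0] + mu[0] + q[0]) * D0 + q[0] * D1 + 2 * lam[0] * E0 * D0,
        -(lam[1] + mu[1] + q[1]) * D1 + q[1] * D0 + 2 * lam[1] * E1 * D1,
        mu[0] - (lam[0] + mu[0] + q[0]) * E0 + q[0] * E1 + lam[0] * E0 * E0,
        mu[1] - (lam[1] + mu[1] + q[1]) * E1 + q[1] * E0 + lam[1] * E1 * E1,
    ]

def integrate_branch(y, t, lam, mu, q, steps=1000):
    """Classical fourth-order Runge-Kutta from the tip end of a branch toward the root."""
    h = t / steps
    for _ in range(steps):
        k1 = bisse_rhs(y, lam, mu, q)
        k2 = bisse_rhs([a + 0.5 * h * b for a, b in zip(y, k1)], lam, mu, q)
        k3 = bisse_rhs([a + 0.5 * h * b for a, b in zip(y, k2)], lam, mu, q)
        k4 = bisse_rhs([a + h * b for a, b in zip(y, k3)], lam, mu, q)
        y = [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4) for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
    return y

def combine_at_node(left, right, lam):
    """Speciation node: D_i = D_left,i * D_right,i * lambda_i; E carries over unchanged."""
    return [left[0] * right[0] * lam[0], left[1] * right[1] * lam[1], left[2], left[3]]

# Hypothetical two-tip tree: tips in states 0 and 1, branches of length 1, full sampling.
lam, mu, q = (0.5, 0.8), (0.1, 0.1), (0.05, 0.05)
left = integrate_branch([1.0, 0.0, 0.0, 0.0], 1.0, lam, mu, q)
right = integrate_branch([0.0, 1.0, 0.0, 0.0], 1.0, lam, mu, q)
root = combine_at_node(left, right, lam)
print(0.5 * root[0] + 0.5 * root[1])   # likelihood with equal root-state weights
```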

4 Conclusions and future directions

Recent years have witnessed a proliferation of quantitative methods in biogeographic inference. Novel parametric approaches such as the DEC model and Sanmartín et al.'s (2008) Bayesian approach to island biogeography represent exciting new opportunities in the study of range evolution. Such methods enable biogeographic processes to be modeled explicitly in probabilistic terms. In particular, the improved ability to incorporate diverse sources of temporal and historical information in biogeographic analyses (see Donoghue & Moore, 2003) is a desirable feature of working in a probabilistic framework. As these approaches are extended, refined, and modified, balancing statistical power with model complexity and realism will be an important concern. We anticipate further innovations in model-based biogeographic inference, and look forward to the implementation of a model that treats speciation as an event with a rate.

Acknowledgments

We thank Daniel Ksepka, Jeff Thorne, and the MEAS Writing Group for their invaluable comments and suggestions. Both authors received support from the National Institute of Environmental Health Sciences (USA) training grant to the NCSU Bioinformatics Research Center; in addition, BDR was partially supported by National Institutes of Health (USA) grant no. GM070806.
