1.1 Generalized parsimony approaches
Under a generalized parsimony approach, the reconstruction that requires the fewest changes in character state over the phylogenetic tree is preferred (Maddison & Maddison, 2003). Generalized parsimony assigns a cost for the transformation of any character state into each of the other possible character states. All costs may be equal, or, if there is reason to expect that transitions between some states might occur more frequently than changes between others, character state transitions may be differentially weighted.
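The cost-matrix idea can be illustrated with a minimal sketch of a Sankoff-style dynamic program, a standard way to minimize total transformation cost on a fixed tree. It is not taken from the cited sources. The function name `sankoff`, the nested-tuple tree encoding, and the example taxa and costs are all hypothetical choices made for illustration.

```python
from math import inf

def sankoff(tree, tip_states, states, cost):
    """Return (minimum total cost, per-state cost vector at the root).

    tree: nested tuples of tip names, e.g. (("sp1", "sp2"), "sp3")  (assumed encoding)
    tip_states: dict mapping tip name -> observed state
    cost: dict mapping (from_state, to_state) -> transformation cost
    """
    def down(node):
        if isinstance(node, str):  # tip: only the observed state is free of charge
            return {s: (0 if s == tip_states[node] else inf) for s in states}
        child_vectors = [down(child) for child in node]
        # Cost of assigning state s here: for each child, the cheapest
        # combination of a transformation s -> t and the child's cost for t.
        return {
            s: sum(min(cost[(s, t)] + vec[t] for t in states) for vec in child_vectors)
            for s in states
        }

    root = down(tree)
    return min(root.values()), root

# Hypothetical example: three taxa, two areas, all changes weighted equally.
states = ["A", "B"]
equal_cost = {(s, t): (0 if s == t else 1) for s in states for t in states}
tree = (("sp1", "sp2"), "sp3")
tips = {"sp1": "A", "sp2": "B", "sp3": "A"}
best, root_vector = sankoff(tree, tips, states, equal_cost)
```

Differential weighting amounts to replacing `equal_cost` with an asymmetric matrix, for example by making transitions from B to A more expensive than transitions from A to B.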
Geographic ranges may be coded as multiple binary-state characters to denote the presence/absence of a taxon in each area. This allows the reconstruction of ancestral areas that are polymorphic with respect to range (Hardy & Linder, 2005; Harbaugh & Baldwin, 2007). However, geographic ranges are more often coded as discrete, multistate characters that do not allow for ranges spanning more than one area (e.g., Clark et al., 2008). Here, where the directional costs of transitions between states are weighted equally, no assumptions are made about whether change occurs along a branch connecting an ancestor to its descendant, or coincident with a speciation event at an internal node. If change is presumed to occur along a branch, dispersal to a new area followed by extinction in the original area is implied. In contrast, dispersal-mediated allopatric speciation is suggested if the change is presumed to occur at an internal node (see Fig. 1).
Figure 1. A hypothetical example of a parsimony approach to ancestral area reconstruction in a two-island system comprising areas A and B. In this example, geographic ranges are coded as discrete, multistate characters that do not allow for ranges spanning more than one area, and the directional costs of transitions between states are weighted equally. Hence, no assumptions are made about whether change occurs along a branch connecting an ancestor to its descendant, or coincident with a speciation event at an internal node. A, Change is presumed to occur along branches. This implies (i) a speciation event within the original area A and subsequent dispersal of the novel species to a new area B. This is followed by (ii) an extinction of the novel species in area A. B, In contrast, dispersal-mediated allopatric speciation is suggested if the change is presumed to occur at an internal node. Here, some individuals from area A undergo dispersal to area B, and speciation occurs shortly thereafter. Dagger, extinction event; dot, species; open arrow, dispersal event.
When range evolution scenarios are complicated, involving frequent dispersal and extinction relative to speciation and including a large set of areas distributed in a complex pattern across the tips of the tree, these parsimony approaches tend to recover a large number of equally most parsimonious reconstructions. Hence, there may be low potential for these methods to detect phylogenetic signal in datasets with these properties (Nepokroeff et al., 2003; Clark et al., 2008). However, when few geographic areas are considered, the distribution of these areas across the tips of the cladograms is not complex, and there is little reason to be concerned with underestimating the amount of change, these approaches may be of interest (e.g., Ksepka & Clarke, 2009). As indicated by Ree and Smith (2008), high rates of evolution pose a challenge not only for parsimony-based approaches, but also for all inference types.
1.2 Weighted ancestral area analysis (WAAA)
WAAA (Hausdorf, 1998) is an interesting but conceptually flawed method for reconstructing ancestral areas under parsimony. We address it here because it is not infrequently seen in published reports (e.g., Swenson et al., 2000; Outlaw et al., 2007). WAAA seeks to assign higher weight to areas occupied by “basal” or “early-diverging” lineages, but, as we discuss below, its implementation is predicated on a problematic interpretation of tree topology.
In WAAA, the ranges of extant species and their ancestors are described in terms of a set of predefined unit areas. A worked example is presented in Fig. 2. A ratio of weighted gain steps to weighted loss steps is calculated for each area at each node on a cladogram. Areas for which the value of this ratio is high are considered more likely to be part of the ancestral area than are those for which this value is lower. Ancestral ranges might therefore comprise multiple areas. To reduce the number of areas recovered as ancestral for each node, a minimum threshold value for this ratio may be established.
Figure 2. A worked example of weighted ancestral area analysis in a two-area system comprising areas A and B, after Hausdorf (1998). A, Each area is optimized onto the tree, under the assumption that it was not part of the ancestral range. Changes are presumed to occur along branches. B, For each area, the weighted gain steps (GSW) are computed. Count the number of times $n$ that the area is gained under the optimization from panel A. For each gain $i$, count the number of nodes $x_i$ between that gain and the common ancestor. Then $\mathrm{GSW} = \sum_{i=1}^{n} 1/x_i$. C, Each area is again optimized onto the tree, this time under the assumption that it was part of the ancestral range. D, For each area, the weighted loss steps (LSW) are computed in the same way as in panel B, using the losses under the optimization from panel C. E, The probability index (PI) is calculated for each area: $\mathrm{PI} = \mathrm{GSW}/\mathrm{LSW}$. In this example, the PI for area A is an order of magnitude greater than the PI for area B. This is interpreted to mean that area A is much more likely than area B to be part of the ancestral range. We emphasize that this approach relies on a problematic interpretation of tree structure.
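The arithmetic behind the probability index can be sketched in a few lines. The node counts used here are hypothetical and are not the values from Fig. 2. The function names are illustrative, and the sketch assumes the node counts for each gain and loss have already been read off the two optimizations.

```python
def weighted_steps(node_counts):
    """Sum 1/x_i over the events (gains or losses) recorded for one area.

    node_counts: list of x_i, the number of nodes between each event
    and the common ancestor (assumed to be supplied by the user).
    """
    return sum(1.0 / x for x in node_counts)

def probability_index(gain_node_counts, loss_node_counts):
    """PI = GSW / LSW for one area."""
    gsw = weighted_steps(gain_node_counts)
    lsw = weighted_steps(loss_node_counts)
    if lsw == 0:
        raise ValueError("PI is undefined when no losses are inferred")
    return gsw / lsw

# Hypothetical area: gains at 1 and 2 nodes from the ancestor,
# losses at 2 and 4 nodes from the ancestor.
pi_value = probability_index([1, 2], [2, 4])  # GSW = 1.5, LSW = 0.75
```

Because each event is weighted by the reciprocal of its node count, events close to the common ancestor dominate the index, which is exactly where the interpretation of "basal" branches enters the method.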
This approach attempts to indicate the relative chances of different areas belonging to the ancestral range of some monophyletic group, while also yielding ancestral ranges that are about the same size as those observed for extant taxa. It relies on the assumption that an area is more likely to be part of the ancestral range when that area is observed to be occupied more frequently by the descendants of that ancestor, and that “basal” branches of the tree are more informative about ancestral ranges (Hausdorf, 1998).
However, it appears that the latter statement contains a misperception regarding tree structure, predicated on a problematic understanding of what it means to be “basal.” We attempt to clarify this point in Figs. 3 and 4. When a lineage has fewer extant taxa than its sister group, it is not correct to consider the species-poor lineage to be “basal” relative to its species-rich sister, because both groups arise simultaneously from their most recent common ancestor, as depicted in Fig. 4 (Crisp & Cook, 2005). Noting that branch length information is not considered in WAAA, it seems that this method relies on the troubling assumption that a trait found in the species-poor sister clade is more likely to represent a primitive condition present in the common ancestor of both groups (Santos, 2007). In fact, Hausdorf concludes his 1998 paper by stating, “If this assumption is wrong, then the results of the weighted ancestral area analysis will be wrong too.”
Figure 3. A, “Basal” and “derived” as directional terms on a rooted tree. Relative to a reference node (here denoted by a star), if an internal node can be reached by traveling along a one-directional path toward the tips of the tree, then that node is derived. If an internal node can be reached by traveling along a one-directional path toward the root of the tree, then it is basal. If an internal node cannot be reached by a one-directional path, then it can be considered neither basal nor derived relative to the reference node. B, Here, we attempt to clarify what is meant when the term “basal clade” is used. Clade A contains species A1 and A2, depicted in dark grey. Clade B contains species B1–B6, depicted in light grey. The basal divergence between clades A and B is represented by a dark grey dot, and clade A is the less speciose of the two clades subtended by the basal divergence. If A is referred to as the “basal clade,” it should be emphasized that a species in clade A is no more likely to represent a primitive condition present in the common ancestor of clades A and B than is a species belonging to clade B.
Figure 4. It can be misleading to describe a species-poor clade as “basal” relative to its species-rich sister, because tree balance need not remain constant through time. A, A hypothetical evolutionary tree comprising clade A, depicted in dark grey, and clade B, depicted in light grey. Here, the letters A and B denote members of the respective lineages A and B. B, At time t0, clades A and B originate at a speciation event. Note that both A and B are the same age. C, At time t1, there are the same number of species in clade A as there are in clade B. When only the species that are extant at time t1 are depicted, the tree is balanced; that is, there are the same number of species on each side of every node. D, At time t2, clade A contains one species and clade B contains four species. When only the species that are extant at time t2 are depicted, the tree is not balanced. This scenario should not be interpreted to imply a progression from “basal” on the left to “derived” on the right. E, At time t3, clade A contains four species and clade B contains one species. Compared to time t2, the tree balance is now reversed.
1.3 Dispersal–vicariance analysis (DIVA)
DIVA (Ronquist, 1996, 1997) seeks to model processes of range evolution in a parsimony framework. The ranges of extant species and their ancestors are described in terms of a set of predefined unit areas. Each range might comprise multiple areas, and ancestral ranges are estimated by minimizing the tree length under the specified cost matrices.
In DIVA, dispersal events (Fig. 5: A) incur a cost of one for each area added to a range, and extinction events (Fig. 5: B) cost one for each area deleted from a range. Speciation is assumed to occur in one of two ways. In the first, vicariance may separate a wide ancestral range into exactly two mutually exclusive sets of areas, each of which is inherited by one of the two daughter taxa, as depicted in Fig. 5: C. In the second way, if an ancestral range is restricted to a single area, speciation within the area may give rise to exactly two daughter taxa sharing that same area, shown in Fig. 5: D. (Note that this latter scenario does not explicitly differentiate between sympatric speciation and allopatric speciation occurring within subsections of the area.) In both cases, speciation events cost nothing. DIVA does not include a mechanism by which two daughter species may identically inherit an ancestral range comprising multiple unit areas; instead, secondary dispersal events are required to explain this scenario, as shown in Fig. 5: E.
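The cost rules in this paragraph can be sketched as follows. This is an illustrative reading of the rules as stated here, not the DIVA program itself. The function names and the representation of ranges as sets of single-letter areas are assumptions.

```python
from itertools import combinations

def branch_cost(start, end):
    """Cost along a branch: one per area added (dispersal) plus one per area deleted (extinction)."""
    start, end = frozenset(start), frozenset(end)
    return len(end - start) + len(start - end)

def zero_cost_speciation(ancestor):
    """List the daughter-range pairs that speciation can produce at no cost."""
    areas = sorted(ancestor)
    if len(areas) == 1:
        # Speciation within a single area: both daughters share it.
        return [(frozenset(areas), frozenset(areas))]
    # Vicariance: every split into two non-empty, mutually exclusive sets.
    # Fixing the first area on the left avoids listing each split twice.
    first, rest = areas[0], areas[1:]
    splits = []
    for k in range(len(rest)):
        for group in combinations(rest, k):
            left = frozenset((first, *group))
            splits.append((left, frozenset(areas) - left))
    return splits

def identical_inheritance_cost(ancestor):
    """Cheapest way for both daughters to end up with the full multi-area range."""
    return min(
        branch_cost(left, ancestor) + branch_cost(right, ancestor)
        for left, right in zero_cost_speciation(ancestor)
    )

# A hypothetical two-area ancestor AB whose daughters both occupy AB:
cost_ab = identical_inheritance_cost("AB")  # vicariance A|B, then two dispersals
```

Under these rules every vicariant split of a range leaves each daughter missing the areas inherited by its sister, so identical inheritance of a multi-area range always requires as many secondary dispersals as there are areas in the range.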
Figure 5. The rules by which dispersal–vicariance analysis reconstructs ancestral distributions. A, Dispersal costs one per area added to a distribution. B, Extinction costs one per area deleted from a distribution. C, Where speciation occurs by vicariance separating a widespread ancestor into two mutually exclusive sets of areas, a cost of zero is incurred. D, A species occurring within a single area might speciate within that area, giving rise to two descendants occupying the same area. This event also has a cost of zero. E, When an ancestral species has a range comprising more than one unit area, and each of the two descendant species has the same distribution as the ancestor, the cost is equivalent to the number of secondary dispersals needed for two initially allopatric descendants to come to occupy the same set of unit areas as the ancestor. In this example, a cost of two is incurred. Letters A–D refer to areas. Dagger, extinction event; dot, species; open arrow, dispersal event.
Because the optimal ancestral ranges are those that minimize the number of implied dispersal and extinction events, DIVA is biased against early dispersal and will tend to reconstruct wide ancestral ranges (Ronquist, 1996; Ree et al., 2005). In the manual for DIVA 1.1 (Ronquist, 1996), it is suggested that users who wish to obtain ancestral range estimates similar in size to the ranges observed for extant species should include additional outgroups in the analysis, so that the ancestral node of interest is no longer the root or as close to the root. Alternatively, it is possible to restrict the maximum number of areas allowed for the ancestral range. This latter approach is more often seen in practice; however, restricting the maximum number of areas allowed for the ancestral range could lead to unrealistic discontinuous geographic distributions for basal nodes (Clark et al., 2008; Santos et al., 2009) or a large number of equally parsimonious reconstructions that cannot be meaningfully interpreted (Santos et al., 2009).
1.4 Dispersal–extinction–cladogenesis (DEC) model
The DEC model (Ree et al., 2005; Ree & Smith, 2008) is a continuous-time model for geographic range evolution that enables the inference of ancestral ranges in a likelihood framework. Ranges are described in terms of a set of predefined unit areas, and each range might include multiple areas. Range expansion and contraction events are caused by dispersal into a previously unoccupied area and local extinction within an area, shown in Fig. 6: A and Fig. 6: B, respectively. These are treated as stochastic processes with exponential rate parameters. The expected number of each kind of event along a phylogenetic branch is proportional to branch length.
Figure 6. The area transitions allowed by the dispersal–extinction–cladogenesis model when reconstructing ancestral distributions. A, Dispersal from an occupied area A into a previously unoccupied area B. B, Extinction in previously occupied area B. C, A species occurring within a single area might speciate within that area, giving rise to two descendants occupying the same area. D, Where the range of ancestral species comprises multiple areas A and B, one of the daughter species may inherit a single area A, and the other inherits the remainder of the ancestral range B. E, Alternatively, where the range of ancestral species comprises multiple areas A and B, one daughter species may inherit a single area A, while the other daughter species identically inherits the ancestral range AB. F, Secondary dispersals are required to explain the situation where an ancestral species has a range comprising more than one unit area and each of the two descendant species has the same distribution as the ancestor. G, Similarly, a widespread ancestral range comprising more than three unit areas cannot be separated into two mutually exclusive sets of multiple areas in a single step. Letters A–D refer to areas. Dagger, extinction event; dot, species; open arrow, dispersal event.
If an ancestral range is limited to a single area, then that area is inherited identically by both daughter lineages, as depicted in Fig. 6: C. If the ancestral range comprises multiple areas, then one of two scenarios is permitted. In the first, shown in Fig. 6: D, one daughter lineage inherits a single area and the other daughter lineage inherits the remainder of the ancestral range. In the second scenario, shown in Fig. 6: E, one daughter lineage inherits a single area while the other daughter lineage inherits the entire ancestral range. These scenarios of range subdivision and inheritance are inferred for the internal nodes of the phylogenetic tree.
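The permitted inheritance scenarios can be enumerated with a short sketch. It follows only the rules stated in this paragraph and is not the authors' implementation. The function name, the scenario labels, and the choice to treat the two daughters as unordered are assumptions made for illustration.

```python
def dec_inheritance(ancestor):
    """Enumerate (label, daughter_1, daughter_2) scenarios allowed at a node."""
    ancestor = frozenset(ancestor)
    if len(ancestor) == 1:
        # Single-area ancestor: both daughters inherit that area.
        return [("within-area", ancestor, ancestor)]
    scenarios, seen = [], set()
    for area in sorted(ancestor):
        single = frozenset([area])
        options = (
            ("split", ancestor - single),  # other daughter gets the remainder
            ("subset", ancestor),          # other daughter keeps the whole range
        )
        for label, other in options:
            key = frozenset([single, other])  # daughters treated as unordered
            if key not in seen:
                seen.add(key)
                scenarios.append((label, single, other))
    return scenarios

# Hypothetical three-area ancestor.
scenarios_abc = dec_inheritance("ABC")
```

In every scenario produced for a multi-area ancestor, at least one daughter occupies a single area, which makes the restriction on cladogenesis explicit.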
Unlike DIVA, DEC does not include a mechanism for the vicariance event in which a species’ range is subdivided into two daughter ranges each comprising multiple unit areas. Under the DEC model, secondary dispersal and extinction events are required to explain such a distribution (Fig. 6: G). The authors suggest that inferring such an event invokes a particular geographic history without consideration of the spatial and temporal context in which the event occurred (Ree et al., 2005). They argue that speciation events involving only single areas are less likely to involve this kind of ad hoc hypothesis. Likewise, DEC does not include a mechanism by which two daughter species might identically inherit an ancestral range comprising multiple unit areas. Secondary dispersal is required to explain this scenario, as shown in Fig. 6: F.
DEC does not include speciation rate as a free parameter. Moreover, dispersal rate does not affect the geographic pattern of divergence within versus between areas. Hence, it is not possible to directly infer instances of dispersal-mediated allopatry, where divergence between areas is expected to follow dispersal, under the DEC model (Ree & Smith, 2008).
Given a phylogeny, a set of observed ranges for the terminal taxa and a DEC model, dispersal and extinction rates are optimized using maximum likelihood, integrating over all possible ancestral states for range inheritance. These rates are then fixed, and the likelihood of the data is iteratively recalculated for each ancestral state at each internal node. This does not condition on assumptions regarding ancestral states elsewhere in the tree. Alternative scenarios for ancestral range inheritance can thereby be ranked by their contributions to the overall likelihood.
It has been suggested that DEC could be thought of as a parametric, extended version of DIVA (Ree & Sanmartín, 2009). Like DIVA, the DEC model allows both dispersal and extinction. However, the costs for rare events like extinction and dispersal have been replaced with rate parameters that may be estimated from the data; a high rate for an event corresponds to a low cost. In the DEC probability model, all unit areas have the same extinction rate, and extinction in each area is independent of events in other areas. This approach is comparable to that taken by DIVA, in which a fixed cost is incurred for each extinction event by which an area is removed from a range, no matter how many areas are occupied. However, in DIVA the cost of extinction is always one, whereas the extinction rate that corresponds to this in the DEC model can vary.
Similarly, the DEC model assumes that the rate of dispersal from a particular unit area in a range is identical for all unit areas, and is independent of dispersal and extinction events in other occupied areas. When a dispersal event occurs, this dispersal event is assigned uniformly at random to one of the available dispersal routes between areas, and is then chosen to succeed with some probability that depends on both the specific route and on time. The chance of selecting a particular connection decreases when more connections are added, but the sum of the rates of dispersal across connections remains constant. This has a somewhat peculiar side-effect; namely, the addition of dispersal routes with zero probability of success can lower the rate of successful dispersal by competing with other dispersal routes that are more likely to lead to success.
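The side-effect described above can be made concrete with a deliberately simplified calculation. It assumes that each dispersal event picks one route uniformly at random and then succeeds with that route's probability, and it ignores the dependence on time. The function name and the numerical values are hypothetical.

```python
def successful_dispersal_rate(event_rate, route_success):
    """Rate of successful dispersal under uniform route choice.

    event_rate: total rate at which dispersal events are proposed
    route_success: success probability of each available route
    """
    if not route_success:
        return 0.0
    # Each route is chosen with probability 1/len(route_success).
    return event_rate * sum(route_success) / len(route_success)

# Two routes, each succeeding half of the time.
before = successful_dispersal_rate(1.0, [0.5, 0.5])
# Adding two routes that can never succeed dilutes the useful ones.
after = successful_dispersal_rate(1.0, [0.5, 0.5, 0.0, 0.0])
```

In this toy case the rate of successful dispersal falls from 0.5 to 0.25 even though no useful route was removed.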
For reasons explored in a later section of this paper, DEC tends to reconstruct wide ancestral ranges. Like DIVA, DEC allows ranges that are much larger than those for extant species to be removed from consideration. Additionally, it permits the exclusion of implausible distributions, such as those that are geographically discontinuous or inconsistent with the known history of a region.
Moreover, scale parameters that affect overall dispersal rate may be introduced to limit dispersal between areas. This is intended to accommodate cases where specific biological scenarios motivate particular dispersal regimes. For example, scaling the rate to zero between non-adjacent areas only allows dispersal to occur between areas that are adjacent to one another, and scaling the rate inversely to distance favors short-range dispersal over long-range dispersal (Ree & Sanmartín, 2009). Separate scaling matrices can be introduced for discrete time periods, allowing different expectations for dispersal opportunity through time to be considered.
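The two scaling schemes mentioned here can be sketched as small lookup tables keyed by ordered area pairs. The area names, adjacency list, and distances are hypothetical. Published implementations may normalize or parameterize the scalers differently.

```python
def adjacency_scalers(areas, adjacent_pairs):
    """Scaler of 1 between adjacent areas and 0 otherwise."""
    adjacent = {frozenset(pair) for pair in adjacent_pairs}
    return {
        (i, j): (1.0 if frozenset((i, j)) in adjacent else 0.0)
        for i in areas for j in areas if i != j
    }

def inverse_distance_scalers(distances):
    """Scaler inversely proportional to the distance between two areas."""
    return {pair: 1.0 / d for pair, d in distances.items()}

def scaled_rates(base_rate, scalers):
    """Multiply a baseline dispersal rate by each pairwise scaler."""
    return {pair: base_rate * s for pair, s in scalers.items()}

# Three hypothetical areas arranged in a line: A - B - C.
areas = ["A", "B", "C"]
stepping_stone = adjacency_scalers(areas, [("A", "B"), ("B", "C")])
rates = scaled_rates(0.2, stepping_stone)  # no direct A <-> C dispersal

# Hypothetical distances in arbitrary units.
by_distance = inverse_distance_scalers({("A", "B"): 1.0, ("A", "C"): 4.0})
```

Time-stratified analyses would simply hold one such table per time period and apply the table that matches the age of the branch segment being evaluated.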
We note that the DEC model could be extended in various other ways. For instance, the extinction rate could be allowed to vary for each area, so that extinction would be more likely in areas encompassing less favorable environments. However, the probabilistic model used in DEC is intentionally kept simple to reduce the number of parameters, as there is often not enough data available to estimate a larger number of parameters.
1.5 Borrowing models from biological sequence evolution
Likelihood models used to study the evolution of biological sequences have been co-opted for the reconstruction of ancestral geographic ranges (e.g., Nepokroeff et al., 2003; Pereira et al., 2007). Such approaches incorporate a number of assumptions that are germane to problems of sequence evolution, and as we discuss below, these methods can be applied to a subset of biogeographic scenarios for which these assumptions are met. In many situations, however, geographic range evolution is not analogous to the evolution of sequences across phylogeny. In such cases, models borrowed from the study of sequence evolution are not adequate, and meaningful interpretation of results could be difficult to achieve.
Under likelihood approaches, range evolution can be described using a continuous-time Markov model of evolutionary change. Areas are treated as discrete character states, and each range is restricted to a single area. A lineage may either switch from one area to another or remain in the same area, according to a certain probability distribution. The probability of changing from one area to another is assumed to depend only on the area currently occupied.
Likelihood approaches can incorporate branch length information and consider the rate of change along each branch. Therefore, they are expected to function more realistically than parsimony methods in situations where change is frequent relative to speciation. Transition rates from each geographical area to every other area can be estimated through maximum likelihood (e.g., Nepokroeff et al., 2003). Given a Markov model with these transition rates specified, a phylogeny with information about branch lengths, and the observed ranges for extant species, the relative probability of each ancestral range can be calculated.
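The calculation of relative probabilities for an ancestral range can be sketched with a pruning pass over a small tree. For brevity the sketch assumes a single transition rate shared by all pairs of areas and a flat prior at the root, rather than the separately estimated rates described above. The rate is taken as given rather than optimized. The tree encoding and function names are hypothetical.

```python
from math import exp

def transition_prob(same, rate, t, k):
    """Transition probability over time t under an equal-rates model with k areas."""
    decay = exp(-k * rate * t)
    if same:
        return 1.0 / k + (k - 1) / k * decay
    return 1.0 / k - decay / k

def partial_likelihoods(node, tip_areas, areas, rate):
    """Likelihood of the tips below `node`, conditional on each area at `node`.

    node: a tip name, or a tuple of (child, branch_length) pairs (assumed encoding).
    """
    if isinstance(node, str):
        return {a: (1.0 if a == tip_areas[node] else 0.0) for a in areas}
    result = {a: 1.0 for a in areas}
    for child, length in node:
        below = partial_likelihoods(child, tip_areas, areas, rate)
        for a in areas:
            result[a] *= sum(
                transition_prob(a == b, rate, length, len(areas)) * below[b]
                for b in areas
            )
    return result

def root_area_probabilities(tree, tip_areas, areas, rate):
    """Relative probability of each area at the root, assuming a flat root prior."""
    like = partial_likelihoods(tree, tip_areas, areas, rate)
    total = sum(like.values())
    return {a: like[a] / total for a in areas}

# Hypothetical three-taxon tree with branch lengths.
tree = (((("sp1", 1.0), ("sp2", 1.0)), 1.0), ("sp3", 2.0))
tips = {"sp1": "A", "sp2": "A", "sp3": "B"}
probs = root_area_probabilities(tree, tips, ["A", "B"], rate=0.1)
```

Because the transition probabilities depend on branch length, lengthening the branch to a tip weakens the evidence that tip provides about the root, which is the information parsimony methods discard.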
Transition rate estimates come with some associated error, but in order for the relative probabilities for each ancestral state to be valid, it must be assumed that the transition rates are exactly known. Similarly, the phylogeny on which ancestral areas are inferred reflects a consensus for a set of plausible alternative trees, as the true tree cannot be exactly known. As formulated above, these likelihood approaches ignore both phylogenetic uncertainty and uncertainty in transition rate estimates (Ronquist, 2004).
1.6 Stochastic mapping
Bayesian stochastic mapping (Jensen & Pedersen, 2000; Nielsen, 2002; Huelsenbeck et al., 2003) estimates the probability of character state transformations by accommodating uncertainty in the rate of evolution and in phylogenetic relationships. In Bayesian analysis, transition rate is allowed to vary over a range of possible values defined by a prior probability, and a posterior probability proportional to the prior probability multiplied by the likelihood is obtained. When the probabilities of ancestral states are determined, each value for the transition rate is weighted according to its posterior probability. This approach accounts for uncertainty in transition rate estimates.
Likewise, phylogenetic uncertainty can be taken into account by considering a set of plausible alternative trees rather than a single consensus. These trees can be generated by sampling the posterior of a Bayesian phylogenetic analysis using Markov chain Monte Carlo. Instead of obtaining a single transition rate estimate for each tree in this set, each tree would be sampled for all possible values for the transition rate, with each combination of tree and rate being sampled proportionate to its posterior probability. After obtaining a sample of joint probabilities, the marginal distribution for the transition rate can be calculated. Thus, the Bayesian marginal distribution of transition rate accounts for both phylogenetic uncertainty and uncertainty in transition rate estimates.
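The averaging described in the two preceding paragraphs can be written compactly. The notation is introduced here for illustration and does not come from the cited papers: $D$ denotes the data, $\tau$ a tree, $q$ the transition rate, $x$ an ancestral state of interest, and $(\tau_i, q_i)$, $i = 1, \dots, N$, the samples drawn from the joint posterior by Markov chain Monte Carlo.

```latex
p(q, \tau \mid D) \;\propto\; p(D \mid q, \tau)\, p(q)\, p(\tau)

P(x \mid D) \;=\; \sum_{\tau} \int P(x \mid q, \tau, D)\, p(q, \tau \mid D)\, \mathrm{d}q
\;\approx\; \frac{1}{N} \sum_{i=1}^{N} P(x \mid q_i, \tau_i, D)
```

The first line assumes independent priors on the rate and the tree. The second line shows why no single tree or rate estimate needs to be chosen: each sampled combination contributes in proportion to how often it appears in the posterior sample.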
In all implementations of which we are aware, stochastic mapping of range evolution presupposes that ancestral ranges are restricted to a single area. Moreover, it is assumed that extinctions or vicariance events have not subdivided ancestral ranges (e.g., McGuire et al., 2007; Dacosta & Klicka, 2008). This restriction makes sense in the context of biological sequence evolution, where, for example, it is not meaningful to think of a nucleotide residue occupying states A and T at the same time. However, in studies of range evolution, terminal taxa might have ranges comprising multiple geographic areas. Some authors seek to circumvent this limitation by defining additional composite area states (e.g., Pereira et al., 2007), but this does not correspond to a biologically motivated scenario. Such workarounds highlight the limitation of implementations that have been co-opted from other disciplines and directly applied to the study of range evolution. Precluding the reconstruction of ranges comprising multiple areas is undesirable when attempting to infer ancestral ranges that are expected to comprise multiple areas. However, this limitation could be less problematic if one is interested in the general dispersal patterns of multiple clades among disjunct areas, such as island systems, watershed drainages, or mountain peaks separated by intervening valleys. These types of areas are effectively isolated from one another by barriers to gene flow. Even though dispersal events are modeled along phylogenetic branches and not at nodes, they could be argued to roughly correspond to speciation events, because allopatric speciation is expected to rapidly follow dispersal in such a regime.
In a Bayesian approach to island biogeography proposed by Sanmartín et al. (2008), the transition rates between unit ranges are allowed to differ, and are estimated from observed data. In this model, lineages undergo transitions between unit ranges determined by parameters for dispersal rate and equilibrium frequencies of species diversity. This method uses data for multiple groups of species to infer shared patterns of movement and distribution of species in an island system. For each group of species, a DNA sequence alignment and set of ranges are required. Separate molecular parameters and phylogenies are estimated for each group of species. Thus, each dataset is allowed to evolve on its own topology and set of branch lengths; biogeographic parameters are estimated across all groups together. A Markov chain Monte Carlo approach is used to sample biogeographic parameters, molecular parameters, branch length, and tree topology simultaneously, obtaining estimates of their posterior distribution given the data. Thus, it is possible to obtain marginal probabilities of the biogeographic parameters that do not condition on any particular phylogeny or set of branch lengths.
Because rate of molecular evolution is unlikely to be the same for all groups of species, branch lengths are scaled according to group-specific molecular clocks. Additionally, scaling parameters that account for differences in dispersal rate across groups are used to obtain the expected number of dispersal events per unit time. We note that the transition rates in this model explicitly account for dispersal and subsequent survival in the new area, but do not take into account the fact that range transitions may require an extinction event in the parent range, as described in Fig. 7.