- 1 Material and methods
- 2 Results
- 3 Discussion
- 4 Conclusions
Abstract We propose a simple statistical approach for using Dispersal–Vicariance Analysis (DIVA) software to infer biogeographic histories without fully bifurcating trees. In this approach, ancestral ranges are first optimized for a sample of Bayesian trees. The probability P of an ancestral range r at a node is then calculated as $P(r_Y) = \sum_{t=1}^{n} F(r_Y)_t \, P_t$, where Y is a node, F(rY) is the frequency of range r among all the optimal solutions resulting from DIVA optimization at node Y, t is one of n topologies optimized, and Pt is the probability of topology t. Node Y is a hypothesized ancestor shared by a specific crown lineage and the sister of that lineage, “x”, where x may vary due to phylogenetic uncertainty (polytomies and nodes with posterior probability <100%). Using this method, the ancestral distribution at Y can be estimated to infer the geographic origin of the specific crown group of interest. This approach takes into account phylogenetic uncertainty as well as uncertainty from DIVA optimization. It is an extension of the previously described method called Bayes-DIVA, which pairs Bayesian phylogenetic analysis with biogeographic analysis using DIVA. Further, we show that the probability P of an ancestral range at Y calculated using this method is not equal to pp*F(rY) on the Bayesian consensus tree when both variables are <100%, where pp is the posterior probability and F(rY) is the frequency of range r for the node containing the specific crown group. We tested our DIVA-Bayes approach using Aesculus L., in which the major lineages are unresolved as a polytomy. We inferred the most probable geographic origins of the five traditional sections of Aesculus and of Aesculus californica Nutt. and examined range subdivisions at the parental nodes of these lineages. Additionally, we used the DIVA-Bayes data from Aesculus to quantify the effects on biogeographic inference of including two wildcard fossil taxa in the phylogenetic analysis. 
Our analysis resolved the geographic ranges of the parental nodes of the lineages of Aesculus with moderate to high probabilities. For two of the six lineages, these probabilities were significantly greater than those estimated using the simple calculation pp*F(rY). We also found that adding fossil wildcard taxa to the phylogenetic analysis generally increased P for ancestral ranges that include the fossil's distribution area. The ΔP was more dramatic for ranges that include the area of a wildcard fossil whose distribution area is underrepresented among extant taxa. This indicates the importance of including fossils in biogeographic analysis. Examination of range subdivision at the parental nodes revealed potential range evolution (extinction and dispersal events) along the stems of A. californica and sect. Parryana.
Studies in historical biogeography based on phylogeny have accumulated rapidly due to the recent increase in the availability of molecular phylogenetic data (see Xiang et al., 1998a, 2004, 2005, 2006; Wen, 1999; Sanmartín et al., 2001; Donoghue & Smith, 2004; Sanmartín & Ronquist, 2004; Soltis et al., 2006). One of the most widely used methods of inferring biogeographic histories based on phylogeny is Dispersal–Vicariance Analysis (DIVA) (Ronquist, 1997, 2001). DIVA is a method of reconstructing biogeographic history that falls under the broad heading of event-based methods, in which biogeographic processes that help drive speciation are incorporated a priori into the methodology (Ronquist, 1996, 1997; Sanmartín et al., 2001). Specifically, DIVA uses a parsimony approach that minimizes extinctions and dispersals and assumes vicariance as the null hypothesis (Ronquist, 1996). The program estimates the distributions of hypothesized ancestors at internal nodes of a fully bifurcating phylogenetic tree from the distributions of terminal taxa (Ronquist, 1996). The results of a DIVA analysis are the optimized ancestral ranges at each internal node under the parsimony criterion. Frequently, multiple equally parsimonious biogeographic pathways (MP pathways) are obtained from a given tree, and these are summarized as multiple optimal solutions at some or all internal nodes of the tree. Although new model-based likelihood and Bayesian methods of reconstructing biogeographic histories have recently been developed (Ree et al., 2005; Ree & Smith, 2008; Sanmartín et al., 2008; also Lemmon & Lemmon, 2008), a quick advanced Google Scholar search for reports published in 2008 containing the words “biogeography” and “DIVA” illustrates that DIVA continues to be widely used in historical biogeographic studies. The primary advantage of DIVA over the likelihood method of Ree et al. (2005) is that it requires less prior information (Ree et al., 2005; Ree & Smith, 2008). 
DIVA is also fast, simple, and user-friendly. It gives results congruent with those of the model-based likelihood method Lagrange (http://code.google.com/p/lagrange/) for most lineages that have been compared (Ree et al., 2005; Burbrink & Lawson, 2007; Ree & Smith, 2008; Velazco & Patterson, 2008; Xiang & Thomas, 2008; Xiang et al., 2009). This holds when the DIVA analyses either included outgroups that are not widely distributed or used the root range to code areas for outgroups ranked above the species level (see Ronquist, 1996).
Running the DIVA program requires two inputs: the phylogeny and the distributions of terminal taxa. Setting aside any questions about the assumptions implemented in the program, uncertainty in the results of DIVA arises from two sources: phylogenetic uncertainty and uncertainty in DIVA optimization. Biogeographic reconstruction using DIVA is typically carried out on a single tree topology, the authors' “best” tree, taken to represent the true phylogeny (e.g., Fiz et al., 2008; Jeandroz et al., 2008; Lim, 2008). The single-tree approach is common practice in phylogenetic biogeography across many methods, including Component analysis (Page, 1993a, 1993b), Bremer's ancestral area analysis (Bremer, 1992), and the model-based likelihood methods of Ree et al. (2005) and Ree & Smith (2008). Of the five reports published in the American Journal of Botany and Systematic Biology in 2008 in which a primary research goal was to reconstruct historical biogeography, all five used DIVA. Four of them used a single tree (Calviño et al., 2008; Hines, 2008; Huttunen et al., 2008; Mansion et al., 2008), and one showed that alternative resolutions of polytomies had no effect on the biogeographic reconstruction (Mast et al., 2008). A single tree rarely accounts for the full range of possible, slightly less optimal topologies given the data. In addition, the “best” phylogeny is not always fully resolved or strongly supported at every node: some clades may be weakly supported, or there may be polytomies. Polytomies are particularly problematic. The backbone phylogeny used in DIVA analysis must be fully bifurcating because the program cannot accept polytomies. More generally, polytomies present a problem for most phylogeny-based methods of biogeographic analysis, because reconstruction necessarily breaks down at these unresolved nodes. The other source of uncertainty in DIVA is the existence of multiple, equally parsimonious biogeographic scenarios for a given phylogeny. 
The program does not provide any quantitative method of choosing among these multiple possibilities. However, authors can use information on area connections and divergence times to rule out certain hypotheses or to favor one hypothesis over another, as also discussed by Ronquist (1996). Both types of uncertainty in DIVA have been recognized and handled by Nylander et al. (2008) using posterior probabilities (pp).
Nylander et al. (2008) recently showed the utility of a probabilistic approach to DIVA in reconstructing the biogeographic history of the avian genus Turdus L. Specifically, they optimized 20,000 Bayesian trees in DIVA and used the results of these optimizations to determine the marginal distributions of alternative ancestral ranges at each node of interest, dependent on the node's occurrence in the sampled topologies. Thus, each alternative ancestral range at a node in the tree (Fig. 1a of Nylander et al., 2008) can be assigned a probability equal to the product of the clade pp (phylogenetic uncertainty) and the occurrence of that range for the clade in DIVA (uncertainty in the biogeographic reconstruction). The occurrence of each alternative range was determined as a fraction of all optimal ranges. For example, if a node on a given tree had three optimal ancestral ranges, “A, B, or AB”, the occurrence of each range was recorded as “A:1/3, B:1/3, AB:1/3”. This approach accounts for both uncertainty in the location of a node in the broader tree topology (i.e., phylogenetic uncertainty) and uncertainty in ancestral range reconstruction (multiple, equally parsimonious DIVA optimizations). Nylander et al. (2008) referred to this as a Bayes-DIVA analysis. Sampling Bayesian trees to account for phylogenetic uncertainty has been done before (e.g., Lutzoni et al., 2001; Pagel et al., 2004). In biogeography, this methodology was also suggested by Lemmon and Lemmon (2008) and was previously used by Huelsenbeck and Imennov (2002). Nylander et al. (2008) were the first to apply this approach to DIVA.
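The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of Nylander et al. (2008). It assumes that each sampled tree has already been optimized in DIVA and reduced to a dictionary, in a hypothetical format, that maps each clade (a frozenset of terminal names) to the distinct optimal ranges reported for that node. Trees are weighted equally, and a tree lacking the clade contributes nothing. Each of the N optimal ranges receives 1/N, so the values for a clade sum to its pp.

```python
from collections import defaultdict
from fractions import Fraction


def bayes_diva_clade_probs(samples, clade):
    """Nylander-style marginal probability of each ancestral range at `clade`.

    samples: one dict per sampled tree, mapping a clade (frozenset of
             terminal names) to the distinct optimal ranges DIVA reported
             for that node on that tree (hypothetical input format).
    """
    weight = Fraction(1, len(samples))    # every sampled tree counts equally
    probs = defaultdict(Fraction)         # Fraction() == 0
    for node_ranges in samples:
        ranges = node_ranges.get(clade)
        if not ranges:
            continue                      # clade absent: contributes nothing
        share = Fraction(1, len(ranges))  # each of N optimal ranges gets 1/N
        for r in ranges:
            probs[r] += weight * share
    return dict(probs)


# Hypothetical example: the clade appears in 3 of 4 sampled trees (pp = 3/4).
clade = frozenset({"t1", "t2"})
samples = [
    {clade: ["A", "B", "AB"]},
    {clade: ["A"]},
    {clade: ["A", "AB"]},
    {},
]
probs = bayes_diva_clade_probs(samples, clade)
print(probs["A"], sum(probs.values()))  # 11/24 3/4
```

Exact fractions are used only to make the arithmetic easy to check by hand. Floats would serve equally well in practice.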
Here, we extend the Bayes-DIVA method to allow estimation of the geographic origin of a lineage in a polytomy. We first define the node of interest as the parent node (parent node, hereafter) of a crown group node. A crown group node (crown node, hereafter) represents the last common ancestor of all members of a crown group whose sister (x) is left undefined (Fig. 1). The parent node is therefore present on every tree in the posterior distribution of phylogenetic trees in which the crown group occurs, regardless of the relationship of the crown group to other groups. This definition allows estimation of the ancestral range of the stem lineage of a highly supported terminal taxon or crown group even if the lineage is resolved as a member of a polytomy in the phylogeny (Fig. 1). The probability (P) of an ancestral range r at a node of interest is calculated as
$$P(r_Y) = \sum_{t=1}^{n} F(r_Y)_t \, P_t$$
where Y is the parent node, t is one of the randomly selected Bayesian trees, and n is the total number of sampled trees. F(rY)t is the occurrence of an ancestral range r at node Y for tree t. Pt is the probability of tree t, which is the proportion of that tree in the pool of sampled trees; this can be extended to the proportion of the tree in the entire posterior distribution of trees. F(rY)t is calculated as the actual frequency of r within the pool of biogeographic pathways optimized using DIVA for each sampled tree:
$$F(r_Y)_t = \frac{i}{R_t}$$
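Computing P(rY) as the sum over sampled trees of F(rY)t × Pt, with F(rY)t taken as the fraction i/Rt of a tree's MP pathways that assign r to Y, can be sketched in Python. The sketch assumes, in a hypothetical input format, that each sampled tree t has already been reduced to its Pt and a list of the ranges assigned to Y, one entry per MP pathway, so a range may appear more than once. Reading the actual DIVA output is not shown.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def range_probabilities(trees):
    """P(r_Y) = sum over t of F(r_Y)_t * P_t, with F(r_Y)_t = i / R_t.

    trees: list of (P_t, pathway_ranges) pairs; pathway_ranges holds the
           range assigned to node Y in each of the R_t equally parsimonious
           pathways for tree t, so a range may appear more than once
           (hypothetical input format).
    """
    probs = defaultdict(Fraction)
    for p_t, pathway_ranges in trees:
        r_t = len(pathway_ranges)                # R_t
        for r, i in Counter(pathway_ranges).items():
            probs[r] += Fraction(i, r_t) * p_t   # F(r_Y)_t * P_t
    return dict(probs)


# Hypothetical sample of three trees, each with P_t = 1/3.
trees = [
    (Fraction(1, 3), ["A", "A", "AB"]),
    (Fraction(1, 3), ["A"]),
    (Fraction(1, 3), ["AB", "B"]),
]
P = range_probabilities(trees)
print(P["A"], P["AB"], P["B"])  # 5/9 5/18 1/6
```

Every tree in this toy sample contains Y and the Pt values sum to 1, so the resulting probabilities also sum to 1.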
Figure 1. Graphical explanation of parent nodes, crown nodes, and unspecified sister groups. A, Hypothetical phylogeny containing well-supported crown groups marked by triangular symbols and incomplete resolution of relationships among them. Open circles indicate crown nodes of crown groups 1–4. Closed circles indicate parent nodes (node, sensu this study). Numbered parent nodes correspond to numbered crown groups. B, Unspecified sister groups (x) for crown groups 1–4. Node numbers in closed circles correspond to those in A.
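The parent-node definition can be made concrete with a small Python sketch. For brevity, trees are written as nested tuples, a representation that is our own assumption: a string is a terminal and a tuple is an internal node. For a crown group that is monophyletic in a given tree, the hypothetical function below returns the parent node Y and the terminals of its sister x. Because x is simply whatever else descends from Y, it can differ from tree to tree while Y is still found.

```python
def leaves(node):
    """Terminal names below a node; a str is a terminal, a tuple an internal node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def parent_node_and_sister(tree, crown):
    """Return (Y, x) for a crown group, or None if it is not a clade in `tree`.

    Y is the parent of the crown node; x is the set of terminals in the
    crown group's sister, i.e. everything else descending from Y.
    """
    crown = frozenset(crown)
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue
        for child in node:
            if leaves(child) == crown:       # child is the crown node
                return node, leaves(node) - crown
        stack.extend(node)
    return None


# Two hypothetical sampled trees in which the crown group (c1, c2) has
# different sisters; Y is found in both.
t1 = ((("c1", "c2"), "a"), "b")
t2 = ((("c1", "c2"), "b"), "a")
for tree in (t1, t2):
    y, x = parent_node_and_sister(tree, {"c1", "c2"})
    print(sorted(x))  # ['a'] for t1, ['b'] for t2
```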
The value i is the number of times range r is assigned to node Y among the Rt MP pathways for tree t. The actual frequency can be obtained using the DIVA command “printrecs”. An alternative estimate of F(rY), following Nylander et al. (2008), is 1/N, where N is the total number of alternative ancestral distributions at node Y. An example of this probability calculation and of both methods of deriving F(rY) is illustrated in Fig. 2. This revised Bayes-DIVA approach can provide statistical confidence for inferred biogeographic origins of lineages of interest whose phylogenetic placement is unresolved or poorly supported. For such lineages, both the traditional DIVA analysis and the Bayes-DIVA approach of Nylander et al. (2008) are uninformative.
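The two ways of deriving F(rY) for a single tree can be contrasted in a short Python sketch. The input is a hypothetical list of the ranges assigned to node Y, one per MP pathway, as might be tabulated from the “printrecs” output. Parsing that output is not shown, and the function names are illustrative. The two estimates diverge whenever some optimal ranges occur in more pathways than others.

```python
from collections import Counter
from fractions import Fraction


def f_actual(pathway_ranges):
    """F(r_Y)_t = i / R_t: share of the R_t MP pathways that assign r to Y."""
    counts = Counter(pathway_ranges)
    r_t = len(pathway_ranges)
    return {r: Fraction(i, r_t) for r, i in counts.items()}


def f_equal(pathway_ranges):
    """F(r_Y)_t = 1 / N: each of the N distinct optimal ranges weighted equally."""
    distinct = set(pathway_ranges)
    return {r: Fraction(1, len(distinct)) for r in distinct}


# Hypothetical tree: Y is "A" in 3 of its 4 MP pathways and "AB" in 1.
pathways = ["A", "A", "A", "AB"]
print(f_actual(pathways)["A"], f_equal(pathways)["A"])  # 3/4 1/2
```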
Figure 2. Example of calculation of P(rY) and of F(rY) using two methods. A, Hypothetical sample of three Bayesian trees, T1–T3. Node Y (circles) is the parent node of Lineage 1. A, B, C, and D are distribution areas. Ranges of terminals are given below lineage names. Possible ranges for node Y include A, B, C, D and widespread areas including two or more of these. In B and C, only areas with F(rY) > 0 for at least one tree are shown. B, Calculation of F(rY) using the actual frequency of areas from dispersal–vicariance analysis output (i.e., $F(r_Y)_t = i/R_t$). C, Calculation of F(rY) assuming all optimal areas are equally probable for each t (i.e., 1/N).
The parent node Y in this study is similar to the floating node described by Pagel et al. (2004) in that neither Y nor the floating node always includes the same crown groups. However, the floating node must include two specific crown groups of interest, although it may contain other clades or taxa as well (Pagel et al., 2004). Y differs in that it is the parent of exactly two groups: a specific crown group of interest and its sister x, which is undefined. Another important difference concerns support. The two clades of interest at a floating node of Pagel et al. (2004) can have any level of support, whereas Y applies only to nodes connecting a well-supported crown clade and its unspecified sister. Therefore, the floating node is not a suitable substitute for Y.
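Using simulated data, we tested whether the range probabilities at a parent node can be accurately inferred by a simple product of two quantities. The first is the pp of the node containing the crown group and a defined sister. The second is the frequency of occurrence of the range at that node, optimized by DIVA on the Bayesian consensus tree topology. That is,
$$P(r_Y) = pp \times F(r_Y)$$
where pp is the posterior probability of that node and F(r_Y) is the frequency of r at that node on the consensus tree.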
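A toy example in Python shows why this shortcut can fail. The numbers are invented for illustration and are not results from this study. The sample has four equally weighted trees. In two of them the crown group's sister is lineage B, and in the other two it is lineage D, with different DIVA frequencies for range A at Y. The shortcut sees only the (crown + B) node of the consensus tree. It therefore ignores the trees in which x = D, even though Y is present in every tree.

```python
from fractions import Fraction

# Invented sample of four equally probable trees (P_t = 1/4). Each entry is
# (sister x of the crown group, F(r_Y)_t for range "A" at Y on that tree).
sample = [("B", Fraction(1, 2)), ("B", Fraction(1, 2)),
          ("D", Fraction(1)), ("D", Fraction(1))]
p_t = Fraction(1, len(sample))

# Revised Bayes-DIVA: Y exists on every tree, so all four trees contribute.
P_A = sum(f * p_t for _, f in sample)

# Shortcut: pp of the (crown + B) node on the consensus tree times F(r_Y) at
# that node; F on the consensus tree is assumed here to be 1/2.
pp = Fraction(sum(1 for x, _ in sample if x == "B"), len(sample))
F_consensus = Fraction(1, 2)
shortcut = pp * F_consensus

print(P_A, shortcut)  # 3/4 1/4
```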
We further tested the utility of our approach using data from Aesculus L., a genus of woody trees and shrubs with a disjunct Laurasian distribution. We also illustrate two additional applications of this method. First, we estimated the impact of two fossil wildcard taxa (sensu Nixon & Wheeler, 1992) on biogeographic reconstruction of Aesculus. Second, we examined range subdivisions at the parental nodes of lineages of interest and estimated the most probable ranges inherited by these lineages (referred to as post-Y ranges hereafter) to gain insight into range evolution along the stem branches. The primary goals of this study are twofold. The first is to describe an alternative method of using Bayes-DIVA analysis under phylogenetic uncertainty that can estimate the geographic origin of crown groups with unknown sister relationships. The second is to test the method and its possible applications using Aesculus.
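One way to tally range subdivisions and post-Y ranges is to weight them exactly as in P(rY), as in the Python sketch below. It assumes, in a hypothetical input format, that each MP pathway for tree t has been reduced to a pair: the range at Y and the part of it inherited by the crown lineage. The sketch follows the description above but is not necessarily how the subdivisions were computed in this study.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def post_y_probabilities(trees):
    """Weight range subdivisions at Y and post-Y ranges in the same way as P(r_Y).

    trees: list of (P_t, subdivisions) pairs; subdivisions holds one
           (range_at_Y, range_inherited_by_crown_lineage) pair per MP
           pathway for tree t (hypothetical input format).
    Returns (probability of each subdivision, probability of each post-Y range).
    """
    split_probs = defaultdict(Fraction)
    post_y = defaultdict(Fraction)
    for p_t, subdivisions in trees:
        r_t = len(subdivisions)                  # R_t
        for (anc, inherited), i in Counter(subdivisions).items():
            w = Fraction(i, r_t) * p_t           # (i / R_t) * P_t
            split_probs[(anc, inherited)] += w
            post_y[inherited] += w
    return dict(split_probs), dict(post_y)


# Hypothetical two-tree sample.
trees = [
    (Fraction(1, 2), [("AB", "A"), ("AB", "A"), ("A", "A")]),
    (Fraction(1, 2), [("AB", "B")]),
]
splits, post = post_y_probabilities(trees)
print(splits[("AB", "A")], post["A"], post["B"])  # 1/3 1/2 1/2
```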
Aesculus (Sapindales, Sapindaceae) is a genus of 13–19 species belonging to six major lineages. These lineages are supported by phylogenetic studies using molecular and morphological data: sect. Aesculus (2 species), sect. Macrothyrsus (1 species), sect. Parryana (1 species), sect. Pavia (4 species), an Asian clade (3–10 species), and the species Aesculus californica Nutt. (Xiang et al., 1998b; Forest et al., 2001; Harris et al., 2009). Extant Aesculus species are distributed across the Northern Hemisphere. Each lineage is restricted to one of the following areas: East Asia (EA), western North America (wNA), eastern North America (eNA), or Europe (EU). The exception is sect. Aesculus, which is disjunct between EA and EU. Aesculus has a rich fossil record from EA, EU, and wNA, with fossils found in strata ranging from the Paleocene to the Quaternary (Hu & Chaney, 1940; Condit, 1944; Puri, 1945; Szafer, 1947, 1954; Tanai, 1952; Schloemer-Jäger, 1958; Prakash & Barghoorn, 1961; Axelrod, 1966; Budantsev, 1983; de Lumley, 1988; Mai & Walther, 1988; Wehr, 1998; Golovneva, 2000; Manchester, 2001; Jeong et al., 2004; Dilhoff et al., 2005).
Aesculus is an ideal genus for biogeographic study owing to its small number of species, pan-Northern Hemisphere distribution, extensive fossil record, and the continental endemism of most lineages and all species. However, molecular phylogenetic studies of Aesculus using several DNA regions (Xiang et al., 1998b; Harris et al., 2009) have resulted in poorly supported or unresolved relationships among the six major lineages, despite strong support for the polytypic lineages (i.e., crown groups). Thus, the utility of DIVA applied in the traditional way for biogeographic reconstruction of the genus is limited. In addition to deep-node polytomies, biogeographic reconstruction of Aesculus presents another challenge: uncertainty in the positions of some fossil species. Recently, many authors have emphasized the need to include fossils in phylogenetic reconstruction and phylogeny-based biogeographic analyses (Manchester, 1999; Rothwell, 1999; Wen, 1999; Lieberman, 2003; Crane et al., 2004; Donoghue & Smith, 2004; Xiang et al., 2005, 2006, 2009; Hilton & Bateman, 2006; Rothwell & Nixon, 2006). Excluding fossils can produce a false or incomplete biogeographic history of a group (Manchester, 1999; Lieberman, 2003; Crane et al., 2004). Often only incomplete morphological data, and rarely ancient DNA data, are available for fossils. The limitations this places on including fossils have been discussed (Nixon & Wheeler, 1992; Kearney, 2002; Kearney & Clark, 2003; Wiens, 2003, 2006) and observed in empirical studies (e.g., Rothwell & Nixon, 2006; Harris et al., 2009; but see Manos et al., 2007). Fossil taxa for which few informative data are available may act as wildcard taxa (Nixon & Wheeler, 1992) in phylogenetic analysis. Wildcard taxa are defined as those that, because of extensive missing characters, may be placed algorithmically at many or all nodes of the tree topology (Nixon & Wheeler, 1992; Kearney & Clark, 2003). 
Two geographically and temporally important fossil species of Aesculus, known from complete leaves (leaflets attached to a petiole), offer few phylogenetically informative characters. These are Aesculus longipedunculus Schloemer-Jäger (Eocene, EU) and Aesculus “magnificum” (Budantsev, 1983; Manchester, 2001) (Paleocene, EA). In preliminary analyses, these fossil species behaved as wildcards, limiting phylogenetic resolution both for the fossils and for otherwise well-supported groups. In the Aesculus example, we use the revised Bayes-DIVA to provide a statistical measure of shifts in ancestral range probabilities when fossils are included versus excluded.
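The shift can be expressed range by range as ΔP(r), the P(r) with fossils included minus the P(r) with fossils excluded. The Python sketch below assumes that both analyses have already produced range-probability dictionaries for the same parent node. The function name and the area combinations and values shown are invented for illustration, and any test of whether a shift is significant is outside the scope of the sketch.

```python
from fractions import Fraction


def delta_p(p_with, p_without):
    """ΔP(r) = P(r) with fossils included minus P(r) with fossils excluded.

    Ranges found in only one analysis are treated as having P = 0 in the other.
    """
    ranges = sorted(set(p_with) | set(p_without))
    return {r: p_with.get(r, Fraction(0)) - p_without.get(r, Fraction(0))
            for r in ranges}


# Invented range probabilities at one parent node, for illustration only.
with_fossils = {"EA": Fraction(3, 5), "EA+EU": Fraction(2, 5)}
without_fossils = {"EA": Fraction(4, 5), "EA+wNA": Fraction(1, 5)}
for r, d in delta_p(with_fossils, without_fossils).items():
    print(r, d)
# EA -1/5
# EA+EU 2/5
# EA+wNA -1/5
```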