Geological and ecological factors drive cryptic speciation of yews in a biodiversity hotspot


  • Jie Liu,

    1. Key Laboratory of Biodiversity and Biogeography, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
    2. Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
    3. University of Chinese Academy of Sciences, Beijing, China
  • Michael Möller,

    1. Key Laboratory of Biodiversity and Biogeography, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
    2. Royal Botanic Garden Edinburgh, Edinburgh, UK
  • Jim Provan,

    1. School of Biological Sciences, Queen's University Belfast, Belfast, UK
  • Lian-Ming Gao,

    Corresponding author
    1. Key Laboratory of Biodiversity and Biogeography, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
  • Ram Chandra Poudel,

    1. Key Laboratory of Biodiversity and Biogeography, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
    2. Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
    3. University of Chinese Academy of Sciences, Beijing, China
  • De-Zhu Li

    Corresponding author
    1. Key Laboratory of Biodiversity and Biogeography, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
    2. Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China


  • The interplay of orographic uplift and climatic change in the Himalaya-Hengduan Mountains region (HHM) has played a key role in speciation and population demography. To gain further insight into these processes, we investigated their effects on Taxus wallichiana by combining molecular phylogeography and species distribution modeling.
  • Molecular data were obtained from 43 populations of T. wallichiana. Nineteen climatic variables were analyzed alongside genetic discontinuities. Species distribution modeling was carried out to predict potential past distribution ranges.
  • Two distinct lineages were identified, which diverged c. 4.2 (2.0–6.5) million years ago (Ma), a timescale that corresponds well with the recent uplift of the Qinghai-Tibet Plateau and subsequent climatic changes of the region. Correlations with climatic variables also suggest that ecological factors may have further reinforced the separation of the two lineages. Both lineages experienced population expansion during the last glaciation.
  • The high genetic divergence, long-term isolation and ecological differentiation suggest a scenario of cryptic speciation in T. wallichiana associated with geological and climatic changes in the HHM. Our findings also challenge the notion of general population ‘contraction’ during the last glaciation in the HHM.


The origin and evolution of biodiversity are strongly linked to many historical and ecological factors. These include geological and/or climatic processes such as continental drift, the uplift of mountain chains, and climatic oscillations associated with ice ages. The interaction of these processes can create new ecological niches and thus provide opportunities for speciation. Indeed, divergent selection and adaptation associated with different habitats or ecological niches are increasingly viewed as a major cause of speciation in plants (Rieseberg & Burke, 2001; Funk et al., 2006; Harmon et al., 2008). A range of ecological factors has been proposed as drivers of speciation, including climatic variables such as precipitation and temperature (see Givnish, 2010; Keller & Seehausen, 2012; Wagner et al., 2012), and differential climatic adaptation has been implicated in cases of incipient or completed speciation (Nosil et al., 2005; Lowry et al., 2008).

The Himalaya-Hengduan Mountains (HHM) region is considered a key biodiversity hotspot (Myers et al., 2000). The region was formed by the collision of the Indian continent with the Eurasian plate 40–50 million yr ago (Ma) (Yin & Harrison, 2000) and the consequent rise of the Himalaya and the Qinghai-Tibet Plateau (QTP). Despite some controversy over the exact timing of the formation of the QTP (e.g. Spicer et al., 2003), episodes of uplift continued through the late Pliocene (c. 3 Ma) and well into the Quaternary (c. 2.6 Ma; Li & Fang, 1999). The rise of the QTP modified global and East Asian climate dramatically (Ruddiman & Kutzbach, 1989; Shi et al., 1998), triggering and intensifying the Asian monsoon, which in turn profoundly influenced biological processes in the QTP (Li & Fang, 1999). Present-day atmospheric circulation over the QTP and surrounding areas is characterized by the Indian and East Asian monsoons in summer and the Westerlies in winter (An et al., 2001; Yao et al., 2012b).

The HHM region, which extends along the southern and southeastern fringe of the QTP (Fig. 1), is characterized by extreme elevational changes over relatively short distances and a corresponding diversity of habitats. Here, the wet summer monsoon is blocked by north-south oriented mountain ranges, such as the Gaoligong Mountains and the Mekong-Salween Divide, resulting in a longitudinal moisture gradient (Fig. 1). The area shows pronounced ecological heterogeneity and high species diversity, which have been linked to the geological and climatic effects associated with the uplift of the QTP. More recently, climatic fluctuations during the Quaternary (2.6 Ma to present) might have further driven allopatric divergence and speciation, as populations contracted into glacial refugia during the four major periods of glaciation in the area (e.g. Wang et al., 2010; Xu et al., 2010; Jia et al., 2012). Such climatic oscillations had profound effects on population demography, causing range shifts or extinctions, and possibly driving local adaptation (Davis & Shaw, 2001; Hewitt, 2004).

Figure 1.

Geographic locations of the 43 Taxus wallichiana populations analyzed in the present study and the distribution of the 29 cpDNA haplotypes detected (see Table 1 for population codes; boxed haplotypes represent those found in the HM lineage). Pie chart size corresponds to the sample size of each population. Present atmospheric circulation patterns over the Qinghai-Tibet Plateau (QTP) and surroundings are shown in the lower-right corner (adapted from An et al., 2001 and Yao et al., 2012a).

With such complex geological, climatic and ecological diversity, the HHM region is ideal for studying the effects of different factors on species diversification and evolution. Previous phylogenetic and biogeographical studies in the QTP and neighboring regions have focused on species-level diversification resulting from the uplift of the QTP (e.g. Liu et al., 2002, 2006; Wang et al., 2009a,b). Phylogeographical analyses at the intraspecific level have suggested that the divergence and demography of populations were also profoundly affected by the rise of the QTP and the Quaternary climatic oscillations in this area (e.g. Zhang et al., 2005; Yang et al., 2008, 2012; Wang et al., 2010; Xu et al., 2010; Jia et al., 2011, 2012; Li et al., 2011), with deep intraspecific divergences formed over a range of timescales (e.g. Jia et al., 2012; Li et al., 2011; Wang et al., 2009a). Among these studies, however, very few (Cun & Wang, 2010; Opgenoorth et al., 2010; Xu et al., 2010) have focused specifically on the HHM region, owing to its remoteness and inaccessibility; consequently, the mechanisms by which geological and climatic events interacted to drive speciation and evolution remain understudied.

Taxus wallichiana Zucc. var. wallichiana, as defined in the Flora of China (Fu et al., 1999), was recently raised to species level (Farjon, 2010). It comprises two genetically closely related lineages, designated T. wallichiana proper and the Hengduan type (Liu et al., 2011b). The species is confined to the East Himalaya and the Yunnan plateau region (Gao et al., 2007; Poudel et al., 2012; Fig. 1), where it grows as scattered understory trees in subtropical forest at altitudes between 2000 and 3500 m. This species is therefore ideal for studying the interplay of geological and climatic factors in population divergence and demographic history in the HHM region.

Most previous studies in the QTP and/or HHM region that used genetic discontinuities to demonstrate incipient or cryptic speciation attributed them to geological events and presumed associated habitat and climatic variation, without explicitly examining ecological data to support these processes. Here, we use an integrative approach, combining genetic data with climatic and paleoenvironmental data, to elucidate the speciation and demographic history of T. wallichiana and to determine the role of a range of climatic variables in this scenario. The specific objectives were: to quantify the genetic differences between the two lineages of T. wallichiana; to date the divergence between the two lineages and relate its age to geological and climatic events; to identify in detail the main ecological (climatic) factors separating the two lineages; and to reconstruct the demographic history of T. wallichiana. The outcome of these analyses will contribute to a better understanding of how environmental factors interacted with the orogeny of the QTP to produce the present-day species diversity of the HHM region.

Materials and Methods

Sample collection and DNA isolation

A total of 815 individuals were sampled from 43 populations of Taxus wallichiana Zucc. for this study, with 16 populations originating from the South Hengduan Mountains region (HM group) and 27 from the East Himalaya to the Yunnan Plateau region (EH group) (Fig. 1; Supporting Information Table S1). Eight to 23 individuals were collected per population (average 19 plants per population). Young and healthy leaves were collected and dried immediately in silica gel for DNA extraction. Voucher specimens of each sampled individual are deposited at the Herbarium of the Kunming Institute of Botany, Chinese Academy of Sciences (KUN). Taxus brevifolia was used as an outgroup for the phylogenetic analyses of the cpDNA haplotypes, based on a previous study (Hao et al., 2008).

Total genomic DNA was isolated using a modified CTAB method as described by Liu & Gao (2011). The quality and quantity of DNA were assessed on 1% TAE agarose gels and with a NanoDrop® ND-1000 spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA). The DNA was diluted to a final concentration of 30–50 ng μl−1.

Chloroplast DNA sequencing

The plastid trnL (UAA)-trnF (GAA) region (trnL-trnF) was amplified for all accessions with primers ‘c’ and ‘f’ described in Taberlet et al. (1991). The 25 μl PCR mix contained 2.5 μl 10× PCR buffer, 2.5 μl MgCl2 (25 mM), 2.0 μl dNTP mixture (2.5 mM), 0.3 μl each primer (10 μM), 0.2 μl Taq polymerase (5 U μl−1) (TaKaRa, Dalian, China), 1 μl template DNA (c. 30–50 ng genomic DNA), and finally 16.2 μl distilled deionized water. PCRs were performed on a GeneAmp PCR System 9700 thermal cycler (Perkin Elmer, Foster City, CA, USA) with the following profile: initial denaturation at 94°C for 3 min, followed by 30 cycles of 95°C for 1 min, 50°C for 1.5 min, 72°C for 2 min, and a final step of 72°C for 10 min.

PCR products were visualized on 1% TAE agarose gels and then purified using a Sangon Purification Kit (Sangon, Shanghai, China). The purified products were used for bi-directional sequencing using the PCR primers and the PRISM Dye Terminator Cycle Sequencing Ready Reaction Kit (Applied Biosystems, Foster City, CA, USA). The products were run on an ABI 3730 xl DNA Sequencer. All newly acquired sequences have been submitted to GenBank (accession numbers in Table S2; T. brevifolia, JQ406921).

Microsatellite genotyping

The nuclear genetic diversity of all 815 accessions was assessed using the following nine microsatellite loci: TA116, TG111, TG06, TG34, TG47, TC04, Tax86, TW01, TS03 (Liu et al., 2011a). Forward primers were 5′-end fluorescently labelled using either FAM or TAMRA (Applied Biosystems). PCR was carried out on a GeneAmp PCR System 9700 thermal cycler (Perkin Elmer) following the protocols described in Liu et al. (2011a). PCR products were separated on an ABI 3730 xl DNA Sequencer and individuals genotyped with GENEMAPPER v3.2 (Applied Biosystems).

Chloroplast DNA analysis

The chromatograms of each trnL-trnF sequence were assembled with SEQMAN (DNAStar Inc., Madison, WI, USA). Consensus sequences were aligned in CLUSTAL X v2.0 (Larkin et al., 2007) and manually adjusted where necessary. Sequences were assigned to haplotypes using DNASP v5.10 (Librado & Rozas, 2009). All indels were treated as single mutation events. A median-joining haplotype network was constructed using the program NETWORK v4.5.1 with the MP criterion.
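To illustrate the haplotype-assignment step and the convention that each indel counts as a single mutation event, here is a minimal Python sketch. It is not how DNASP works internally; the function names, the gap character '-' and the first-appearance labelling scheme are assumptions made for illustration.

```python
def assign_haplotypes(seqs):
    """Collapse aligned sequences into haplotypes by exact identity.

    seqs: dict mapping accession -> aligned sequence (gaps written as '-').
    Returns (assignment, haplotypes): accession -> label, and label -> sequence.
    Labels H1, H2, ... follow order of first appearance (a hypothetical
    convention; the labels in Table 1 need not follow this rule).
    """
    labels = {}  # sequence -> label; dicts keep insertion order (Python 3.7+)
    assignment = {}
    for accession, seq in seqs.items():
        if seq not in labels:
            labels[seq] = f"H{len(labels) + 1}"
        assignment[accession] = labels[seq]
    haplotypes = {label: seq for seq, label in labels.items()}
    return assignment, haplotypes


def mutation_steps(a, b):
    """Number of mutational steps between two aligned sequences.

    Each nucleotide mismatch counts as one step; each contiguous run of
    positions where exactly one sequence has a gap counts as a single indel
    event. Simplification: directly adjacent gap runs are merged into one event.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    steps = 0
    in_indel = False
    for x, y in zip(a, b):
        if (x == "-") != (y == "-"):  # gap in exactly one sequence
            if not in_indel:
                steps += 1  # first position of a new indel event
            in_indel = True
        else:
            in_indel = False
            if x != y:  # both are nucleotides and they differ
                steps += 1
    return steps
```

For example, `mutation_steps("ACGTTT", "AC---T")` counts the three-base gap as one step, whereas a per-site count would give three.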

The spatial genetic structure of the chloroplast haplotypes was analyzed using spatial analysis of molecular variance (SAMOVA v1.0; Dupanloup et al., 2002). Molecular diversity indices, including the number of haplotypes, haplotype diversity (Hd) and nucleotide diversity (π), were estimated using ARLEQUIN v3.5.1 (Excoffier & Lischer, 2010).
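Both diversity indices can be written down directly. The sketch below assumes per-population haplotype counts and aligned, ungapped sequences; it illustrates the standard definitions (Nei's unbiased haplotype diversity and mean pairwise differences per site) and is not ARLEQUIN's code.

```python
from itertools import combinations


def haplotype_diversity(counts):
    """Nei's unbiased haplotype diversity, Hd = n/(n - 1) * (1 - sum(p_i^2)).

    counts: number of individuals carrying each haplotype in one population.
    """
    n = sum(counts)
    if n < 2:
        return 0.0
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts))


def nucleotide_diversity(seqs):
    """Nucleotide diversity pi: mean number of pairwise differences per site.

    seqs: aligned, ungapped sequences of equal length (one per individual).
    Gap handling is deliberately left out of this sketch.
    """
    if len(seqs) < 2:
        return 0.0
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs)
    return total / len(pairs) / length
```

As a check against Table 1, population KA (H5 ×9, H7 ×3, H8 ×8) gives `haplotype_diversity([9, 3, 8])` ≈ 0.647, and ND (H5 ×7, H7 ×2, H8 ×11) gives ≈ 0.595, matching the tabulated Hd values.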

The existence of phylogeographic structure was tested following Pons & Petit (1996). Two estimates of population diversity (HS, HT) and two of differentiation (GST, NST) were obtained using PERMUT v1.0 with 1000 permutations. An analysis of molecular variance (AMOVA) was performed in ARLEQUIN to partition variation within and among defined groups and populations.
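The logic of the GST/NST comparison can be sketched as follows: GST uses only haplotype frequencies, NST additionally weights each pair of haplotypes by their distance, and permuting haplotype identities in the distance matrix yields a null distribution for NST. This is a simplified plug-in version that assumes plain frequency proportions; it does not reproduce the unbiased estimators implemented in PERMUT, and the function names are hypothetical.

```python
import random


def _diversity(freqs, weight):
    """Sum over i != j of p_i * p_j * weight(i, j)."""
    n = len(freqs)
    return sum(freqs[i] * freqs[j] * weight(i, j)
               for i in range(n) for j in range(n) if i != j)


def gst_nst(pop_counts, dist):
    """Plug-in GST and NST for haplotype counts.

    pop_counts[k][i]: number of copies of haplotype i in population k.
    dist[i][j]: distance between haplotypes i and j (e.g. mutational steps).
    Frequencies are simple proportions (no sample-size correction).
    """
    freqs = [[c / sum(row) for c in row] for row in pop_counts]
    n_pop, n_hap = len(freqs), len(freqs[0])
    pooled = [sum(row[i] for row in freqs) / n_pop for i in range(n_hap)]

    def unordered(i, j):
        return 1.0  # every pair of different haplotypes counts equally

    def ordered(i, j):
        return dist[i][j]  # pairs weighted by how different they are

    hs = sum(_diversity(row, unordered) for row in freqs) / n_pop
    ht = _diversity(pooled, unordered)
    vs = sum(_diversity(row, ordered) for row in freqs) / n_pop
    vt = _diversity(pooled, ordered)
    return 1 - hs / ht, 1 - vs / vt


def nst_permutation_test(pop_counts, dist, n_perm=1000, seed=1):
    """Permute haplotype labels in the distance matrix (GST is unaffected).

    Returns (gst, nst, p), where p is the proportion of permuted NST values
    >= the observed NST, using the usual +1 correction.
    """
    rng = random.Random(seed)
    gst, nst = gst_nst(pop_counts, dist)
    n_hap = len(dist)
    exceed = 0
    for _ in range(n_perm):
        perm = list(range(n_hap))
        rng.shuffle(perm)
        shuffled = [[dist[perm[i]][perm[j]] for j in range(n_hap)]
                    for i in range(n_hap)]
        if gst_nst(pop_counts, shuffled)[1] >= nst:
            exceed += 1
    return gst, nst, (exceed + 1) / (n_perm + 1)
```

When closely related haplotypes tend to occur in the same populations, as within the EH and HM lineages, NST exceeds GST and few permutations reach the observed NST.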

Neutrality tests (Tajima's D and Fu's FS) were implemented in ARLEQUIN to test for deviations from neutrality. To further investigate the demographic history of T. wallichiana at the lineage level, pairwise mismatch distributions were analyzed in ARLEQUIN. The sum of squared deviations (SSD) and Harpending's raggedness index (HRI) between observed and expected mismatch distributions were used as test statistics. Where the null hypothesis of population expansion was not rejected, the formula t = τ/(2u) was used to estimate the time since expansion (t), where u = μkg, μ is the mutation rate of the sequence in substitutions per site per year (s/s/y), k is the number of nucleotides and g is the generation time in years. Because no evolutionary rates for the trnL-trnF region are available for gymnosperms, we estimated the average trnL-trnF rate for Taxaceae using BEAST (see next section). For the generation time g, a value of 25 yr was taken following Wang et al. (2006).
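A minimal sketch of the observed mismatch distribution and Harpending's raggedness index, assuming aligned sequences with substitutions only. The SSD test additionally needs the expected distribution under the fitted expansion model, which is not reproduced here; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Observed mismatch distribution.

    Returns a list x where x[i] is the proportion of sequence pairs that
    differ at exactly i sites, for i = 0 .. maximum observed difference.
    """
    diffs = [sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)]
    counts = Counter(diffs)
    total = len(diffs)
    return [counts.get(i, 0) / total for i in range(max(diffs) + 1)]


def raggedness(x):
    """Harpending's raggedness index r = sum_{i=1}^{d+1} (x_i - x_{i-1})^2.

    x: mismatch distribution as returned above; d = len(x) - 1 is the largest
    observed number of differences, and x_{d+1} is taken as 0.
    """
    x = list(x) + [0.0]
    return sum((x[i] - x[i - 1]) ** 2 for i in range(1, len(x)))
```

Smooth, unimodal distributions (as expected after a sudden expansion) give small r, whereas ragged, multimodal distributions give large r.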

trnL-trnF mutation rate and lineage divergence time

In order to estimate the trnL-trnF mutation rate of Taxaceae, we first estimated the divergence time of Taxaceae and related genera. Sequences of two cpDNA regions (rbcL and matK) were retrieved from GenBank or newly generated as described in Liu et al. (2011b) for 66 samples representing 39 genera of extant gymnosperms (dataset I: Table S3). The fossil record of modern Taxaceae is generally scarce (Hao et al., 2008). Thus, only one fossil, Marskea jurassica (Harris, 1976), was chosen as the earliest known fossil of Taxaceae s.s., designated ‘K’, with an age of 165 Ma (Florin, 1958) (Table S4; Fig. S2). In addition, ten well-identified fossils of other gymnosperms were used to calibrate the tree following Mao et al. (2012) (Table S4). We then used the age estimates for Taxaceae and related genera from dataset I, together with dataset II (trnL-trnF sequences of 39 species of Taxaceae plus four outgroups, Cunninghamia lanceolata var. konishii, Cupressus gigantea, Juniperus taxifolia and Taiwania cryptomerioides, retrieved from GenBank; Table S3), to estimate the trnL-trnF mutation rate for Taxaceae. With this rate, we estimated the divergence time of the two lineages of T. wallichiana using dataset III (the 29 haplotypes of T. wallichiana and the outgroup T. brevifolia).

For all three datasets, phylogenetic analyses were carried out using BEAST v1.6.2 (Drummond & Rambaut, 2007) with a GTR + G + I substitution model, selected by MODELTEST v3.6 (Posada & Crandall, 1998), and an uncorrelated lognormal relaxed clock (Drummond et al., 2002). A speciation birth–death process was specified as the tree prior. A uniform distribution was chosen for dataset I and a normal distribution for datasets II and III, as suggested in the BEAST manual. Each Bayesian MCMC was run for 70 million generations, sampling every 7000th generation, with the first 10% discarded as burn-in. All analyses were carried out on the Oslo Bioportal. Convergence was checked using TRACER v1.5, and trees were summarized with TREEANNOTATOR v1.6.2 from the BEAST package.

Microsatellite data analysis

For all nine loci, MICROCHECKER v2.2.3 (Van Oosterhout et al., 2004) was used with 1000 randomizations to detect null alleles and genotyping errors such as large allele dropout or stuttering. Tests for departures from Hardy–Weinberg equilibrium (HWE) and for linkage disequilibrium (LD) were performed for each population and for all populations pooled, using FSTAT v2.9.3 (Goudet, 2001). Significance levels were adjusted for multiple comparisons using the sequential Bonferroni correction (Rice, 1989).
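The sequential Bonferroni correction of Rice (1989) is Holm's step-down procedure. A minimal sketch, with a hypothetical function name, assuming one P value per test:

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Holm's step-down (sequential Bonferroni) procedure.

    The smallest P is compared with alpha/m, the next with alpha/(m - 1), and
    so on; testing stops at the first P that exceeds its threshold.
    Returns a list of booleans (True = significant) in the input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] > alpha / (m - rank):
            break  # this and all larger P values are non-significant
        significant[idx] = True
    return significant
```

With 43 populations × 9 loci = 387 HWE tests, the strictest threshold is 0.05/387 ≈ 0.00013, which coincides with the corrected α reported in the Results.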

Genetic diversity indices (total number of alleles, NA; observed heterozygosity, HO; expected heterozygosity, HE) were calculated with GENALEX v6.4 (Peakall & Smouse, 2006) for each population, locus and group. FSTAT was used to estimate allelic richness (AR) at population, locus and group levels.
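For a single locus in a single population, NA, HO and HE reduce to allele counts and frequencies. The sketch below assumes diploid genotypes with missing data already removed and uses the uncorrected HE = 1 − Σpi²; a sample-size-corrected version multiplies this by 2N/(2N − 1). It illustrates the definitions rather than GENALEX's implementation, and allelic richness (which requires rarefaction) is omitted.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Diversity at one microsatellite locus for one population.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual,
    with missing genotypes removed beforehand.
    Returns (NA, HO, HE) with HE = 1 - sum(p_i^2).
    """
    alleles = Counter(a for g in genotypes for a in g)
    n_copies = 2 * len(genotypes)
    ho = sum(a != b for a, b in genotypes) / len(genotypes)  # share of heterozygotes
    he = 1 - sum((c / n_copies) ** 2 for c in alleles.values())
    return len(alleles), ho, he
```

For four hypothetical individuals with genotypes (150, 152), (150, 150), (152, 154) and (150, 154), this gives NA = 3, HO = 0.75 and HE = 0.625.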

We estimated global FST (Weir & Cockerham, 1984) and pairwise FST between populations using ARLEQUIN. In addition, FST was estimated for each locus across all populations using FSTAT. The standardized genetic differentiation (G′ST; Hedrick, 2005) was calculated with SMOGD (Crawford, 2010). G′ST is a more suitable measure of differentiation than traditional measures (e.g. FST and GST) for highly polymorphic markers such as microsatellites (Hedrick, 2005; Crawford, 2010). Partitioning of total genetic variation within and among populations was further analyzed by AMOVA with 1000 permutations using ARLEQUIN.
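Hedrick's standardization divides GST by the maximum value it can reach given the within-population heterozygosity, GST(max) = (k − 1)(1 − HS)/(k − 1 + HS), for k populations. A minimal sketch of that rescaling, with a hypothetical function name; it performs no sample-size correction of HS and is not SMOGD's code.

```python
def hedrick_g_prime_st(gst, hs, k):
    """Hedrick's (2005) standardized differentiation.

    G'ST = GST * (k - 1 + HS) / ((k - 1) * (1 - HS)), i.e. GST / GST(max),
    where k is the number of populations and HS the mean within-population
    heterozygosity.
    """
    if k < 2 or hs >= 1:
        raise ValueError("need k >= 2 and HS < 1")
    return gst * (k - 1 + hs) / ((k - 1) * (1 - hs))
```

For example, with GST = 0.1, HS = 0.5 and two populations, G′ST = 0.3: the same GST is rescaled upward when within-population diversity is high, which is why G′ST exceeds FST for polymorphic microsatellites.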

Starting from individual genotypes, the number of genetic clusters was inferred using the program STRUCTURE v2.3 (Pritchard et al., 2000). We applied the admixture model with correlated allele frequencies between populations, as recommended by Falush et al. (2003). Simulations were run for numbers of clusters (K) from one to twenty, with twenty replicates for each K. Each run consisted of a burn-in period of 10^5 iterations followed by 10^5 Markov chain Monte Carlo (MCMC) steps. The most likely number of clusters (K) was determined from the log-likelihood values, ln Pr(X|K), and from ΔK, the second-order rate of change of the log-likelihood with respect to K (Evanno et al., 2005).
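The ΔK statistic can be computed from the replicate log-likelihoods of the STRUCTURE runs. In this sketch the second difference is taken from the replicate means, which is an implementation choice; the function name and input layout are assumptions.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp):
    """Delta K of Evanno et al. (2005) from replicate STRUCTURE runs.

    lnp: dict mapping K -> list of ln Pr(X|K) values (>= 2 replicates per K).
    Delta K(K) = |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd(L(K)),
    defined only when K-1 and K+1 were also run.
    """
    delta = {}
    for k in sorted(lnp):
        if k - 1 in lnp and k + 1 in lnp:
            second_diff = abs(mean(lnp[k + 1]) - 2 * mean(lnp[k]) + mean(lnp[k - 1]))
            sd = stdev(lnp[k])
            delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta
```

The K with the largest ΔK is taken as the most likely number of clusters; ΔK is undefined for the smallest and largest K tested.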

BARRIER v2.2 (Manni et al., 2004) was used to apply Monmonier's maximum-difference algorithm to identify biogeographic boundaries, that is, the largest genetic discontinuities between population pairs. The robustness of these boundaries was assessed by running BARRIER on 1000 Nei's distance matrices produced by bootstrapping over loci with MICROSATELLITE ANALYSER (MSA) v4.05 (Dieringer & Schlötterer, 2003).
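The bootstrap input for BARRIER can be sketched as Nei's standard distance computed on loci resampled with replacement. This is an illustration, not MSA's implementation; the data layout (one allele-frequency dict per locus) and function names are assumptions, and Monmonier's algorithm itself, which needs a triangulation of the sampling sites, is not shown.

```python
import math
import random


def nei_distance(pop_x, pop_y):
    """Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy)).

    pop_x, pop_y: lists with one dict per locus, mapping allele -> frequency.
    Jx, Jy and Jxy are means over loci; the averaging cancels in the ratio,
    so sums over loci are used directly.
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        jx += sum(p * p for p in fx.values())
        jy += sum(q * q for q in fy.values())
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
    if jxy == 0.0:
        return math.inf  # no shared alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))


def bootstrap_distance_matrices(pops, n_boot=1000, seed=1):
    """Yield one Nei's D matrix per replicate, resampling loci with replacement.

    pops: list of populations, each a list of per-locus allele-frequency dicts
    (all populations must list the same loci in the same order).
    """
    rng = random.Random(seed)
    n_loci = len(pops[0])
    for _ in range(n_boot):
        loci = [rng.randrange(n_loci) for _ in range(n_loci)]
        resampled = [[pop[i] for i in loci] for pop in pops]
        yield [[nei_distance(a, b) for b in resampled] for a in resampled]
```

Each yielded matrix would be passed to BARRIER, and the proportion of replicates in which a boundary segment reappears gives its bootstrap support.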

Climatic data analysis

Based on the georeferenced locality data for our populations (Table S1), we used recent (c. 1950–2000) data for 19 BIOCLIM variables (Hijmans et al., 2005), which summarize temperature and precipitation (Table S5), to identify climatic factors potentially associated with the genetic structure and divergence between the EH and HM populations. Because the landscape of the HHM region is topographically heterogeneous, for each set of coordinates we extracted monthly temperature and precipitation values from high-resolution global gridded climate layers with a grid size of 30″ (approximately 1 km2 at the equator). We used DIVA-GIS (Hijmans et al., 2001) to extract the climatic variables, calculated mean values and standard errors for each population with MICROSOFT EXCEL, and performed two-tailed t-tests comparing the EH and HM groups. We further conducted paired two-tailed t-tests on pairs of EH and HM populations along the westerly and southerly gene flow barrier between them, in order to characterize the strength of ecological factors at the barrier. Pairs were matched by shortest distance across the barrier or, where possible, by altitude, to eliminate the effect of altitude on temperature per se (Table S6).
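The paired comparison across the barrier amounts to a one-sample t-test on the within-pair differences. The sketch below computes the statistic with the standard library only; the function name and the example values are hypothetical, and the two-tailed P would be obtained from a t distribution (for instance with SciPy, if available).

```python
import math
from statistics import mean, stdev


def paired_t(eh_values, hm_values):
    """Paired t statistic for EH/HM population pairs matched across the barrier.

    eh_values[i] and hm_values[i] hold one climatic variable for the i-th pair.
    Returns (t, df). A two-tailed P can be obtained with, e.g.,
    2 * scipy.stats.t.sf(abs(t), df) if SciPy is installed.
    """
    diffs = [a - b for a, b in zip(eh_values, hm_values)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1
```

Pairing removes between-pair variation (e.g. altitude differences among pair locations), so a consistent EH-HM difference can be detected even when the raw values vary widely among pairs.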

Species distribution modeling (SDM)

SDM was carried out to determine suitable present and past climate envelopes for the HM and EH lineages. Models were generated for the Last Glacial Maximum (LGM; c. 21 ka), based on the Community Climate System Model (CCSM), and for the Last Interglacial (LIG; c. 120 ka). Species occurrence data were obtained from the Global Biodiversity Information Facility (GBIF) and personal observations, totaling 67 and 31 occurrences for the EH and HM lineages, respectively. Niche models based on the 19 BIOCLIM variables in the WorldClim dataset (Hijmans et al., 2005) were generated at 2.5 min resolution using the MAXENT v3.2 software package (Phillips et al., 2006) with the default parameters for convergence threshold (10−5) and number of iterations (500). Duplicate records from the same locality were removed to reduce the effects of spatial autocorrelation.
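The text states only that duplicate records from the same locality were removed. One common way to implement such filtering, assumed here rather than taken from the study, is to keep a single record per cell of the 2.5-arc-minute modelling grid; the constant and function names are hypothetical.

```python
import math

CELL_DEG = 2.5 / 60  # 2.5 arc-minutes in decimal degrees (assumed grid)


def thin_to_grid(records, cell_deg=CELL_DEG):
    """Keep the first occurrence record in each grid cell.

    records: iterable of (longitude, latitude) in decimal degrees.
    Returns the retained records in their original order.
    """
    seen = set()
    kept = []
    for lon, lat in records:
        cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        if cell not in seen:
            seen.add(cell)
            kept.append((lon, lat))
    return kept
```

Two records about 1 km apart usually fall in the same 2.5′ cell (roughly 4.6 km across at the equator) and are collapsed to one, so nearby records do not inflate the weight of a single site in MAXENT.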

Results

Chloroplast DNA sequencing data

The lengths of the cpDNA trnL-trnF sequences ranged from 812 to 912 bp and the aligned matrix was 914 bp in length. The alignment contained 20 polymorphic sites and 8 indels (Table S2). A total of 29 chloroplast haplotypes was identified among the 815 accessions (Fig. 1; Table 1 and Table S2).

Table 1. Genetic diversity within populations of Taxus wallichiana based on chloroplast haplotype and nuclear microsatellite DNA data
Columns 2–5: cpDNA trnL-trnF; columns 6–10: nuclear microsatellites.

| Population code | Haplotypes (number of individuals) | HP | Hd | π (×10−2) | NA | NP | AR | HO | HE |
|---|---|---|---|---|---|---|---|---|---|
| EH group | | | | | | | | | |
| KA | H5(9), H7(3), H8(8) | 0 | 0.647 | 0.222 | 20 | 0 | 2.05 | 0.21 | 0.29 |
| ND | H5(7), H7(2), H8(11) | 0 | 0.595 | 0.204 | 32 | 2 | 2.77 | 0.37 | 0.43 |
| XY | H5(7), H6(1), H7(2), H8(10) | 0 | 0.647 | 0.204 | 26 | 0 | 2.38 | 0.27 | 0.36 |
| CN | H5(5), H6(2), H7(3), H8(9), H16(1) | 1 | 0.737 | 0.206 | 33 | 1 | 2.86 | 0.41 | 0.48 |
| MT | H5(15), H6(3), H7(2) | 0 | 0.426 | 0.120 | 35 | 2 | 3.15 | 0.40 | 0.50 |
| XL | H5(9), H6(2), H7(5), H8(1), H17(1), H19(1), H29(1) | 1 | 0.753 | 0.242 | 35 | 0 | 3.17 | 0.48 | 0.51 |
| XB | H2(2), H5(12), H6(3), H7(3) | 1 | 0.616 | 0.164 | 35 | 0 | 2.99 | 0.35 | 0.46 |
| CY | H5(6), H6(3), H7(3), H17(2), H18(2), H25(4), H27(1) | 2 | 0.862 | 0.294 | 39 | 1 | 3.39 | 0.52 | 0.55 |
| GS | H4(4), H6(11), H7(2), H21(3) | 1 | 0.658 | 0.211 | 40 | 1 | 3.53 | 0.36 | 0.55 |
| YG | H1(1), H5(1), H6(8), H7(4), H20(2), H21(4) | 2 | 0.784 | 0.159 | 42 | 0 | 3.61 | 0.43 | 0.53 |
| FG | H5(2), H6(16), H7(4), H27(1) | 0 | 0.502 | 0.116 | 35 | 0 | 3.19 | 0.45 | 0.50 |
| BL | H5(2), H6(13), H7(5) | 0 | 0.532 | 0.103 | 32 | 0 | 3.00 | 0.42 | 0.47 |
| LK | H5(4), H6(11), H7(1), H8(4) | 0 | 0.647 | 0.136 | 28 | 0 | 2.80 | 0.44 | 0.45 |
| CJ | H5(7), H6(4), H7(6), H8(3) | 0 | 0.763 | 0.138 | 39 | 0 | 3.37 | 0.51 | 0.56 |
| YJ | H5(4), H6(2), H7(7), H8(7) | 0 | 0.742 | 0.199 | 30 | 0 | 2.90 | 0.40 | 0.50 |
| DL | H6(6), H7(12), H8(2) | 0 | 0.568 | 0.083 | 34 | 0 | 3.22 | 0.53 | 0.53 |
| DY | H5(4), H6(3), H7(8), H8(2), H9(1), H26(2) | 0 | 0.795 | 0.268 | 39 | 0 | 3.34 | 0.46 | 0.51 |
| SHL | H5(3), H6(16), H7(1) | 0 | 0.353 | 0.078 | 29 | 0 | 2.76 | 0.44 | 0.45 |
| LX | H6(8), H8(12) | 0 | 0.505 | 0.060 | 24 | 0 | 2.46 | 0.36 | 0.38 |
| YD | H5(1), H6(5), H7(11), H19(1) | 0 | 0.575 | 0.102 | 35 | 1 | 3.29 | 0.48 | 0.54 |
| FE | H6(2), H7(3), H8(6), H19(9) | 0 | 0.711 | 0.147 | 32 | 0 | 3.03 | 0.43 | 0.47 |
| YX | H6(9), H7(1), H8(8), H19(2) | 0 | 0.658 | 0.095 | 32 | 1 | 3.19 | 0.49 | 0.53 |
| SJ | H5(1), H7(11), H8(7), H26(1) | 0 | 0.600 | 0.157 | 28 | 0 | 2.72 | 0.44 | 0.42 |
| JD | H3(1), H5(4), H6(7), H7(4), H8(2), H19(2) | 1 | 0.816 | 0.196 | 33 | 0 | 3.14 | 0.52 | 0.53 |
| XP | H5(6), H6(2), H7(6), H8(2), H19(3), H28(1) | 1 | 0.816 | 0.235 | 30 | 0 | 2.98 | 0.44 | 0.51 |
| MA | H5(1), H6(2), H24(5) | 1 | 0.607 | 0.192 | 26 | 0 | 2.79 | 0.52 | 0.43 |
| HM group | | | | | | | | | |
| ML | H9(19), H15(1) | 1 | 0.100 | 0.012 | 35 | 0 | 2.85 | 0.28 | 0.36 |
| KPG | H9(16), H13(1), H14(1) | 2 | 0.216 | 0.027 | 41 | 0 | 3.47 | 0.36 | 0.47 |
| LJS | H9(17), H12(3) | 1 | 0.268 | 0.033 | 38 | 0 | 3.16 | 0.32 | 0.47 |
| LJ | H9(19), H11(1) | 1 | 0.100 | 0.012 | 36 | 0 | 3.24 | 0.36 | 0.46 |
| HB | H9(11), H10(2) | 1 | 0.282 | 0.034 | 41 | 1 | 3.58 | 0.41 | 0.46 |
| EY | H6(1), H9(19) | 0 | 0.100 | 0.085 | 32 | 0 | 2.86 | 0.24 | 0.39 |
| WB | H6(2), H9(8) | 0 | 0.356 | 0.304 | 38 | 0 | 3.79 | 0.39 | 0.49 |
| CX | H6(1), H9(18), H22(1) | 1 | 0.195 | 0.174 | 41 | 0 | 3.43 | 0.36 | 0.51 |
| Mean | | 0.442 | 0.626 | 0.160 | 33.88 | 0.28 | 3.06 | 0.39 | 0.46 |

  1. Private haplotypes are given in bold print.

  2. N, number of individuals per population; HP, number of private haplotypes; Hd, haplotype diversity; π, nucleotide diversity; NA, total number of alleles; AR, allelic richness; NP, number of private alleles; HO, observed heterozygosity; HE, expected heterozygosity; EH group, East Himalaya to the Yunnan Plateau region; HM group, South Hengduan Mountains region.

The BEAST-derived tree and haplotype network revealed similar relationships (Fig. 2). The 29 haplotypes clustered into two lineages, one including 22 haplotypes from the EH lineage, and the other consisting of seven haplotypes from the HM lineage (Fig. 2a). The haplotype network showed two clades, also corresponding to the EH and HM lineages (Fig. 2b; Table 1). Of the 22 haplotypes found in the EH lineage, 13 were private. Within this lineage, haplotypes H5, H6, H7 and H8 were most frequent, and were widely distributed across the EH populations. The remaining eighteen haplotypes were linked to the four main haplotypes in a ‘star-like’ network. Of the seven haplotypes detected in the HM lineage, haplotype H9 was most abundant and predominant in all populations within the HM lineage (Fig. 1; Table 1). The other six were private, and connected to H9 by single steps in a ‘star-like’ network (Fig. 2b). Four populations (i.e. WB, CX, EY and DY, Fig. 1; Table 1) possessed haplotypes from both the EH and HM lineages.

Figure 2.

Bayesian tree (a) with Taxus brevifolia as outgroup and MP median-joining network (b) based on the 29 cpDNA haplotypes. Numbers above branches in the Bayesian tree are posterior probabilities. The size of each circle in the network corresponds to the frequency of that haplotype. Slashes across network branches indicate the number of mutational steps; unmarked branches represent single mutational steps. Colored haplotypes are shared by two or more populations. HM lineage, South Hengduan Mountains region; EH lineage, East Himalaya to the Yunnan Plateau region; Ma, million years ago.

Diversity indices Hd and π for each population are summarized in Table 1; Hd ranged from 0.000 to 0.862, and π from 0.000 to 0.294 (×10−2). At the species level, HT was 0.809 (± 0.034), HS 0.430 (± 0.046) and π 0.00117. At the group level, the EH group showed higher genetic diversity (HT = 0.824, HS = 0.625, π = 0.00106) than the HM group (0.100, 0.095 and 0.00043, respectively; Table 2).

Table 2. Genetic diversity and genetic differentiation of 43 populations of Taxus wallichiana at the species and group levels
| Groups | cpDNA HS | cpDNA HT | cpDNA π (×10−2) | cpDNA GST | cpDNA NST | Microsatellite HS | Microsatellite HT | Microsatellite GST | Microsatellite G′ST | Microsatellite FST |
|---|---|---|---|---|---|---|---|---|---|---|

  1. HS, mean genetic diversity within populations; HT, total genetic diversity; π, nucleotide diversity; FST, among-population differentiation; GST, genetic differentiation based only on allele (haplotype) frequencies; NST, genetic differentiation taking into account similarities between haplotypes; G′ST, standardized measure of genetic differentiation.

  2. ns, not significant. HM group, South Hengduan Mountains region; EH group, East Himalaya to the Yunnan Plateau region.

  3. a, NST is significantly different from GST (P < 0.01).


The AMOVA revealed that 61.8% of the genetic variation was partitioned among groups, 8.0% among populations within groups, and 30.2% within populations (Table 3). The coefficients of differentiation across the 43 populations were GST = 0.49 and NST = 0.72 (Table 2). The permutation test indicated that NST was significantly higher than GST (P < 0.01), pointing to strong phylogeographical structure in the T. wallichiana cpDNA haplotypes. In the SAMOVA, K = 2 yielded two well-defined groups corresponding to the EH and HM groups, although the ΦCT value was not the highest. ΦCT values changed little with increasing numbers of groups (K) and were highest at K = 4, when populations JZ and MA, with a high proportion of private haplotypes, were each separated from the EH group as independent groups.

Table 3. Results of the analysis of molecular variance (AMOVA) for chloroplast DNA data and nuclear microsatellite data of 43 Taxus wallichiana populations
| Source of variation | d.f. | Sum of squares | Variance components | Percentage of variation | Fixation index (FST) |
|---|---|---|---|---|---|
| cpDNA trnL-trnF | | | | | |
| Among groups | 1 | 3452.29 | 9.24 | 61.8 | |
| Among populations within groups | 41 | 1116.89 | 1.20 | 8.0 | |
| Within populations | 772 | 3480.06 | 4.51 | 30.2 | |
| Total | 814 | 8049.24 | 14.95 | | 0.70* |
| Nuclear microsatellite | | | | | |
| Among groups | 1 | 561.06 | 0.75 | 24.0 | |
| Among populations within groups | 41 | 457.17 | 0.24 | 7.8 | |
| Within populations | 1573 | 3333.09 | 2.12 | 68.3 | |
| Total | 1615 | 4351.32 | 3.11 | | 0.32* |

  1. Significance: *, P < 0.001.

  2. d.f., degrees of freedom.
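The percentages and fixation indices in Table 3 follow directly from the variance components. A minimal sketch using the standard hierarchical AMOVA definitions (ΦCT among groups, ΦSC among populations within groups, ΦST overall); the function name is illustrative and this is not ARLEQUIN's code.

```python
def amova_summary(var_among_groups, var_among_pops, var_within):
    """Percentages of variation and hierarchical fixation indices.

    Inputs are the three AMOVA variance components (among groups, among
    populations within groups, within populations).
    Returns (percentages, phi_ct, phi_sc, phi_st).
    """
    total = var_among_groups + var_among_pops + var_within
    pct = [100 * v / total for v in (var_among_groups, var_among_pops, var_within)]
    phi_ct = var_among_groups / total                      # among groups
    phi_sc = var_among_pops / (var_among_pops + var_within)  # among pops within groups
    phi_st = (var_among_groups + var_among_pops) / total   # among all populations
    return pct, phi_ct, phi_sc, phi_st
```

Recomputing from the rounded components in Table 3, `amova_summary(9.24, 1.20, 4.51)` gives percentages that round to 61.8, 8.0 and 30.2 and an overall index of about 0.70 for cpDNA; `amova_summary(0.75, 0.24, 2.12)` gives about 0.32 for the microsatellites, with percentages within 0.1 of the tabulated values because the components are themselves rounded.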

The mismatch distribution analyses gave unimodal distributions for both the EH and HM lineages. A sudden population expansion is supported by the close fit of the observed to the expected mismatch distribution (Fig. S1a–d), as indicated by nonsignificant SSD and HRI statistics (Table 4). Fu's FS and Tajima's D were significantly different from zero for the HM lineage, but not for the EH lineage (Table 4). Based on τ, the demographic and spatial expansion times were estimated at 60.4 (32.2–92.3) thousand years ago (ka) and 56.2 (23.9–85.4) ka, respectively, for the EH lineage, and at 85.8 (19.9–91.5) ka and 6.8 (0–20.7) ka, respectively, for the HM lineage (Table 4).

Table 4. Results of the mismatch distribution analysis and neutrality tests of the two Taxus wallichiana lineages
| Populations | Expansion type | Tau | Time (t) since expansion began, in ka | SSD | P-value | HRI | P-value | D | P-value | FS | P-value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EH lineage | Demographic expansion | 2.100 | 60.4 (32.2–92.3) | 0.00551 | 0.310 | 0.038 | 0.580 | −0.868 | 0.203 | −6.782 | 0.050 |
| | Spatial expansion | 2.026 | 56.2 (23.9–85.4) | 0.00468 | 0.350 | 0.038 | 0.580 | | | | |
| HM lineage | Demographic expansion | 3.000 | 85.8 (19.9–91.5) | 0.00002 | 0.250 | 0.768 | 0.770 | −1.817 | 0.002 | −13.018 | 0.000 |
| | Spatial expansion | 0.200 | 6.8 (0–20.7) | 0.00003 | 0.130 | 0.768 | 0.870 | | | | |

  1. EH lineage, East Himalaya to the Yunnan Plateau region; HM lineage, South Hengduan Mountains region; SSD, sum of squared deviation under expansion model; HRI, Harpending's raggedness index; D, Tajima's D test statistic; FS, Fu's FS test statistic.

trnL-trnF mutation rate and lineage divergence time

The BEAST-derived cpDNA (rbcL and matK) tree of dataset I is shown in Fig. S2. The crown age of Taxaceae was estimated at c. 210.5 Ma (95% highest posterior density, HPD: 173.6–247.4 Ma), and the mean ages of the other main genera of Taxaceae at 23.9–30.6 Ma (Table S7). Based on these node ages, the BEAST analysis of dataset II gave an average trnL-trnF substitution rate of 8.08 × 10−10 substitutions/site/year (s/s/y). Using this rate and dataset III, the most recent common ancestor of the EH and HM lineages was dated to c. 4.2 (95% HPD: 2.0–6.5) Ma (Fig. 2a), suggesting a Pliocene split between the two lineages. The crown ages of the EH and HM lineages were estimated at c. 2.4 (95% HPD: 1.2–3.8) Ma and 2.1 (95% HPD: 0.7–3.6) Ma, respectively, indicating a Pleistocene diversification of haplotypes within each lineage.

Microsatellite data

For the nuclear microsatellite markers, there was no evidence of scoring errors due to large allele dropout or stuttering at any locus. Across the nine loci, only 4.4% of population-locus combinations deviated significantly from HWE after sequential Bonferroni correction (corrected α = 0.00013, P < 0.01). There was no evidence of linkage disequilibrium.

Genetic diversity indices are summarized for each locus (Table 5) and each population (Table 1). The number of alleles per locus ranged from 6 to 15. Mean HS and HT across loci were 0.47 and 0.60, respectively. At the population level, NA and AR ranged from 20 to 42 and from 2.05 to 3.79, and HO and HE from 0.21 to 0.53 and from 0.29 to 0.56, respectively (Table 1). Population differentiation was significant for all loci (P < 0.001; Table 5), with the average multilocus FST being 0.22. The standardized genetic differentiation, G′ST, was higher than FST at all loci (G′ST = 0.42; Table 5). The AMOVA indicated significant genetic differentiation (FST = 0.32, P < 0.001), with 24% of the variation partitioned among groups, only 7.8% among populations within groups, and the remaining 68.3% within populations (Table 3). FST was also significant among populations within each of the two groups: FST = 0.11 (P < 0.001) within the EH group and 0.08 (P < 0.001) within the HM group (Table 2).
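The relationship between HS, HT, FST and G′ST can be checked roughly with the summary values above. The sketch assumes Nei's GST = (HT − HS)/HT as an analogue of FST and Hedrick's (2005) standardization G′ST = GST(k − 1 + HS)/[(k − 1)(1 − HS)] for k populations. Because it uses the across-locus means rather than averaging per-locus estimates, it only approximates the tabulated values (0.22 and 0.42).

```python
def nei_gst(hs, ht):
    """Nei's G_ST: proportion of total diversity found among populations."""
    return (ht - hs) / ht


def hedrick_gst_prime(hs, ht, k):
    """Hedrick's standardized G'_ST: G_ST divided by its maximum for k populations."""
    return nei_gst(hs, ht) * (k - 1 + hs) / ((k - 1) * (1 - hs))


HS, HT, K_POPS = 0.47, 0.60, 43  # means across loci; number of populations
print(f"G_ST  = {nei_gst(HS, HT):.3f}")                     # G_ST  = 0.217
print(f"G'_ST = {hedrick_gst_prime(HS, HT, K_POPS):.3f}")   # G'_ST = 0.413
```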

Table 5. Summary of the nine microsatellite loci used to study the population genetics among 43 populations of Taxus wallichiana
| Locus | Size range (bp) | NA | HO | HE | HS | HT | FST | G′ST |
|---|---|---|---|---|---|---|---|---|
  1. NA, number of alleles; HO, observed heterozygosity; HE, expected heterozygosity; HS, mean genetic diversity within populations; HT, total genetic diversity; FST, among population differentiation; G′ST, standardized measure of genetic differentiation.


The STRUCTURE analysis showed that the log-likelihood of the data increased considerably from K = 1 to K = 2 (Fig. 3a), after which ln Pr(X|K) did not increase significantly. Based on the second-order rate of change of the log-likelihood with respect to K (ΔK), the most likely number of genetic clusters for the complete dataset was also estimated at K = 2 (Fig. 3b). These results confirmed the existence of two distinct genetic units corresponding to the EH and HM lineages, with the exception of only a few individuals (Fig. 3c). Four populations (CX, WB, EY and DY) located on the geographical boundary between the two groups included individuals assigned to both lineages. BARRIER also identified a genetic discontinuity between the EH and HM groups (Fig. 4), with strong barriers between the groups supported by high bootstrap values (92–99.8%). However, no clear genetic barriers were detected among populations within either group (Fig. 4).
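The ΔK statistic of Evanno et al. (2005) can be sketched as follows. This version computes the second-order difference from the mean ln Pr(X|K) of each K and divides it by the standard deviation of ln Pr(X|K) across replicate runs at that K. It assumes consecutive K values with at least two replicates each. The log-likelihoods in the usage example are invented for illustration.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Evanno's Delta K from replicate STRUCTURE runs.

    lnp_by_k maps each K (consecutive integers) to a list of ln Pr(X|K)
    values, one per replicate run. Delta K is undefined for the smallest
    and largest K, so only interior values are returned.
    """
    ks = sorted(lnp_by_k)
    means = {k: mean(lnp_by_k[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:
        # |L''(K)| = |L(K+1) - 2 L(K) + L(K-1)|, using mean likelihoods
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_diff / stdev(lnp_by_k[k])
    return delta


# Hypothetical runs (three replicates per K), not the T. wallichiana data
runs = {
    1: [-5000, -5002, -4998],
    2: [-4500, -4505, -4495],
    3: [-4480, -4490, -4470],
    4: [-4470, -4485, -4455],
}
delta = evanno_delta_k(runs)
print(delta)                      # {2: 96.0, 3: 1.0}
print(max(delta, key=delta.get))  # 2
```

A sharp peak of ΔK at one value, as at K = 2 here, is what the method uses to pick the number of clusters even when ln Pr(X|K) keeps creeping upward.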

Figure 3.

Bayesian inference analysis of nuclear microsatellite data for the determination of the number of clusters (K) for the Taxus wallichiana populations analyzed. (a) The mean posterior probability of the data, L(K), for K = 1–20 (20 replicates), with the standard deviation of each mean L(K) value shown; the red rhombus indicates K = 2. (b) Distribution of ΔK; the red rhombus indicates K = 2. (c) Results for two and three clusters as detected by STRUCTURE. HM group, South Hengduan Mountains region; EH group, East Himalaya to the Yunnan Plateau region.

Figure 4.

Results of the BARRIER analysis based on microsatellite data, showing the spatial separation of Taxus wallichiana populations. (a) Delaunay triangulation and detected barrier (thick red line) separating the EH and HM populations, with bootstrap support from 1000 replicates based on Nei's (1983) genetic distance. (b) Geographic location of the genetic barrier suggested by the BARRIER analysis, indicated by a dashed red line. Circles indicate the locations of the East Himalaya to the Yunnan Plateau (EH) and South Hengduan Mountains (HM) populations, corresponding to the two genetic clusters identified by the NETWORK and STRUCTURE analyses.
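The caption cites Nei's (1983) genetic distance. Assuming this refers to the DA distance of Nei, Tajima & Tateno (1983), DA = 1 − (1/r) Σ_j Σ_i √(x_ij y_ij) over r loci, the pairwise distances passed to BARRIER could be computed as sketched below. The nested allele-frequency dictionaries are a hypothetical input format. Monmonier's algorithm, which BARRIER uses to trace the barrier across the Delaunay triangulation, is not reproduced here.

```python
from math import sqrt


def nei_da(pop_x, pop_y):
    """Nei's D_A distance between two populations.

    Each population is a dict: locus -> {allele: frequency}. Both populations
    must be scored at the same loci; alleles absent from one population
    contribute nothing to the sum.
    """
    if pop_x.keys() != pop_y.keys():
        raise ValueError("populations must share the same set of loci")
    similarity = 0.0
    for locus in pop_x:
        fx, fy = pop_x[locus], pop_y[locus]
        similarity += sum(sqrt(fx[a] * fy[a]) for a in fx.keys() & fy.keys())
    return 1.0 - similarity / len(pop_x)


# Hypothetical allele frequencies at two microsatellite loci
pop_a = {"L1": {150: 0.5, 152: 0.5}, "L2": {200: 1.0}}
pop_b = {"L1": {150: 1.0}, "L2": {200: 0.5, 204: 0.5}}
print(round(nei_da(pop_a, pop_b), 4))  # 0.2929
print(nei_da(pop_a, pop_a))            # 0.0 (identical populations)
```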

Climatic data

Of the 19 bioclimatic variables, all but four (bio2, mean diurnal temperature range; bio3, isothermality; bio7, temperature annual range; bio15, precipitation seasonality) differed significantly in mean value between populations of the HM and EH groups (Table 6 and Table S8). The variables bio1 (annual mean temperature: HM = 9.2°C vs EH = 11.5°C), bio9 (mean temperature of driest quarter: HM = 3.1°C vs EH = 5.8°C), bio11 (mean temperature of coldest quarter: HM = 2.8°C vs EH = 5.4°C) and bio12 (annual precipitation: HM = 960 mm vs EH = 1271 mm) showed the most significant differences between the two groups (P < 0.001), followed by bio6 (min temperature of coldest month) and four precipitation variables, bio13 (precipitation of wettest month), bio16 (precipitation of wettest quarter), bio18 (precipitation of warmest quarter) and bio19 (precipitation of coldest quarter) (P = 0.001).

Table 6. Means, standard errors (SE) and results of t-tests on 19 BioClim variables for 43 populations of Taxus wallichiana
| Variable | (A) All populations | (B) Paired by altitude | (B1) Western barrier ML-CX | (B2) Southern barrier EY-YY | (C) Paired by shortest distance | (C1) Western barrier KPG-CX | (C2) Southern barrier EY-YY |
|---|---|---|---|---|---|---|---|
| Average altitude (m) | < 0.001 | 0.05 | ns | ns | ns | ns | ns |
| Bio1 (°C) | < 0.001 | 0.01 | 0.05 | ns | 0.01 | 0.05 | ns |
| Bio2 (°C) | ns | ns | ns | ns | ns | ns | ns |
| Bio3 (%) | ns | ns | 0.01 | ns | ns | ns | ns |
| Bio4 (SD × 100) | 0.05 | 0.001 | 0.01 | ns | 0.05 | ns | ns |
| Bio5 (°C) | 0.01 | … | ns | ns | 0.05 | 0.05 | ns |
| Bio6 (°C) | 0.001 | … | … | ns | 0.01 | 0.05 | ns |
| Bio7 (°C) | ns | … | … | ns | ns | ns | ns |
| Bio8 (°C) | 0.01 | … | … | ns | … | … | ns |
| Bio9 (°C) | < 0.001 | … | … | ns | … | … | ns |
| Bio10 (°C) | 0.01 | … | … | ns | … | … | ns |
| Bio11 (°C) | < 0.001 | … | … | ns | … | … | ns |
| Bio12 (mm) | < 0.001 | … | … | 0.05 | … | … | 0.05 |
| Bio13 (mm) | 0.001 | … | … | ns | … | … | ns |
| Bio14 (mm) | 0.01 | … | … | 0.01 | … | … | 0.01 |
| Bio15 (CV) | ns | … | … | 0.05 | … | … | 0.05 |
| Bio16 (mm) | 0.001 | … | … | ns | … | … | ns |
| Bio17 (mm) | 0.01 | … | … | 0.001 | … | … | 0.001 |
| Bio18 (mm) | 0.001 | … | … | ns | … | … | ns |
| Bio19 (mm) | 0.001 | … | … | 0.001 | … | … | 0.001 |

  1. (A) Entire population sample, two-tailed t-test; (B) populations along the barrier paired by altitude and (C) paired by shortest distance, paired two-tailed t-tests (see Table S4 for pairs); B1 and C1, western barrier; B2 and C2, southern barrier.

  2. Cells give P; ns, not significant; EH, East Himalaya to the Yunnan Plateau region; HM, South Hengduan Mountains region.

Pairing populations of the two regions along the entire gene flow barrier showed statistically highly significant differences for 17 of the 19 BioClim variables (Table 6). Splitting the pairs between the western and southern barriers showed 16 significant variables for the former and five for the latter, all of which were precipitation variables, with bio17 (precipitation of driest quarter) and bio19 (precipitation of coldest quarter) showing highly significant differences. Results differed slightly depending on whether pairs were matched by altitude or by shortest distance, with the former giving more significant differences for some variables (Table 6).
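Each paired comparison in Table 6 reduces to a one-sample t-test on the within-pair differences. The sketch below computes the test statistic for four hypothetical population pairs with made-up annual mean temperatures. The two-tailed P-value would then be read from a t distribution with n − 1 degrees of freedom, for example via scipy.stats.ttest_rel, which returns both the statistic and the P-value.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    se = stdev(diffs) / sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / se, len(diffs) - 1


# Hypothetical bio1 values (°C) for EH and HM populations paired across the barrier
eh = [11.2, 10.8, 12.0, 11.5]
hm = [9.0, 9.5, 9.8, 9.1]
t, df = paired_t(eh, hm)
print(f"t = {t:.2f}, df = {df}")  # t = 8.22, df = 3
```

Pairing removes variation shared within each pair, such as latitude or local topography, which is why the paired tests can reach significance with few pairs.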

Species distribution modeling

For both the EH and HM lineages under all climate scenarios, the area under the receiver operating characteristic curve (AUC) was ≥ 0.999, indicating far better than random predictive performance. Predictions for the current climate generally represented the actual distributions of both lineages well, the only exception being the predicted occurrence of the HM lineage in the East Himalaya and the West Sichuan Basin, where this lineage does not currently occur. Palaeodistribution modeling for both lineages indicated a much more restricted distribution during the LIG than at present, with subsequent expansion at the LGM to cover a slightly greater area than that predicted under current climatic conditions (Fig. 5).
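The AUC can be read as the probability that a randomly chosen occurrence point receives a higher suitability score than a randomly chosen background (or absence) point. Below is a minimal sketch of that rank-based (Mann–Whitney) calculation, with ties counted as one half. The scores are hypothetical, and the modeling software's own AUC routine may differ in detail.

```python
def auc(presence_scores, background_scores):
    """ROC AUC via pairwise comparison of presence and background scores."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5  # ties count half
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical suitability scores from a niche model
print(auc([0.9, 0.8, 0.95], [0.1, 0.3, 0.2, 0.05]))  # 1.0 (perfect separation)
print(auc([0.9, 0.3], [0.5, 0.1]))                    # 0.75
print(auc([0.5], [0.5]))                              # 0.5
```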

Figure 5.

Potential distribution range of Taxus wallichiana in the HHM region simulated by ecological niche models using bioclimatic variables. Darker colors indicate areas with more suitable predicted conditions. LGM, Last Glacial Maximum; LIG, Last Interglacial; EH lineage, East Himalaya to the Yunnan Plateau region; HM lineage, South Hengduan Mountains region.


Cryptic speciation in Taxus wallichiana

The strong population genetic structure identified in T. wallichiana by both the cpDNA haplotypes and the microsatellite data indicates that the EH and HM groups represent two distinct lineages (Table 2). The estimated differentiation values (GST) were larger than the mean values reported for paternally inherited plastid markers in 37 conifer species (0.469 vs 0.165; Petit et al., 2005) and for biparentally inherited microsatellite loci (0.32 vs 0.24/0.26 for FST/RST; Nybom, 2004), indicating strong barriers to gene flow between the EH and HM groups. Geographic isolation of populations within a species and variation in ecological factors are major precursors to cryptic speciation (Hoskin et al., 2005). The Nushan Mountains, which form a large northern part of the Mekong–Salween divide, appear to act as a topographic N–S barrier for Taxus to the west of the HM group (Fig. 4), as suggested by previous studies (Gao et al., 2007; Li et al., 2011). Notably, even though Taxus is wind pollinated, pollen does not seem to be able to cross the 4000 m high Nushan Mountains. This may be explained by the fact that the plants occur as understory trees, where wind movement is limited, and the dispersal distance of Taxus pollen has been shown to be limited to a few meters (Wheeler et al., 1995).

Although the timing of the uplift that formed the QTP has long been debated, studies suggest that the plateau rose rapidly c. 10–8 Ma (Harrison et al., 1992; Royden et al., 2008) or even more recently (since 3.6 Ma; Li & Fang, 1999). Our divergence time estimates suggest that the two lineages diverged in the Pliocene (c. 4.2 Ma; 95% HPD: 2.0–6.5 Ma), a time that corresponds well with the later estimates for the uplift of the QTP and the formation of the Hengduan Mountains (Tapponnier et al., 1990; Shi et al., 1998; Akciz et al., 2008). It is likely that the Nushan Mountains had almost attained their current height by that time, and that their rapid uplift effectively divided the populations of T. wallichiana.

Both lineages mostly comprised mutually exclusive haplotypes. Along the boundary between the two groups, however, five cpDNA haplotypes from the EH region were found in three populations of the HM lineage (EY, WB, CX), and one from the HM region in population DY of the EH lineage (Fig. 1; Table 1). Based on the microsatellite data, only a few of these individuals showed signs of introgression. For instance, the two individuals in population WB possessed EH chloroplast haplotypes as well as nuclear genotypes belonging to the EH group (Fig. 3c). Population WB is the northernmost population in the HM group that exhibits such admixture, and the presence of migrants here is probably due to rare seed dispersal events across the Nushan Mountains from the west, or to northward migration along the Mekong River from the south. The remaining three admixed populations are located at the southern edge of the HM group, where there are no topographic barriers to gene flow, and the two lineages perhaps remain separated through niche specialization (see below). The observed haplotype mixtures indicate that the barriers are ‘leaky’, and demonstrate the occurrence of recent secondary contact between the two lineages.

Ecological factors reinforce divergence in Taxus wallichiana

The uplift of the QTP and its associated climatic changes have been proposed as a main cause of plant diversification in the QTP area (Cun & Wang, 2010; Xu et al., 2010; Yang et al., 2012). The current distribution of T. wallichiana is primarily determined by the Indian monsoon in summer and the Westerlies in winter, with limited influence from the East Asian monsoon (Fig. 1). Because the monsoons were gradually blocked by the N–S mountain ranges in the South Hengduan Mountains region, the climate of the HM lineage populations can be characterized as cold and dry, whereas the distribution range of the EH lineage has a warm and humid climate, as a result of the uplift of the QTP. The intensification of the Indian monsoons during the Pliocene (An et al., 2001; Wan et al., 2007; Zhang et al., 2009; Chang et al., 2010) would have increased precipitation in the HHM region. However, the synchronous rise of the Hengduan Mountains in this period, particularly the Gaoligong and Nushan Mountains, blocked the eastward flow of moist air into the range of the HM lineage, resulting in environmentally driven differentiation between the EH and HM lineages, as suggested by paleoclimatic evidence (Kou et al., 2006; Yao et al., 2012a). Such differences were further strengthened by the accelerated uplift phases of the QTP in the late Pliocene. Consequently, following geographical isolation by the Nushan Mountains, the two T. wallichiana lineages would have been exposed, and eventually adapted, to differing environmental conditions. The HM populations occur at higher altitudes (2639–3230 m, average 2987 m) than those of the EH lineage (2213–3037 m, average 2661 m), and this is reflected in their climatic differences.
While both groups experience a dry winter and a wet summer, the EH populations are generally associated with a warmer and wetter climate owing to their lower altitudes and the effects of the Indian monsoon, whereas the HM lineage experiences a colder and drier climate at higher altitudes (Table 6). Although no topographic barriers separate the two lineages at the southern edge of the Hengduan Mountains region, significant differences in climatic variables, particularly those involving winter rainfall, can be detected there, creating an ecological barrier (Table 6).
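As a rough check on how much of the temperature contrast could reflect altitude alone, the mean altitudinal difference between the two groups can be converted using the present-day lapse rate of 0.64°C per 100 m reported for the HHM region (Li & Zhang, 2010). This assumes that a single linear lapse rate applies across both regions, and the helper name is hypothetical. The predicted offset of about 2.1°C is close to the observed 2.3°C difference in annual mean temperature (bio1).

```python
LAPSE_RATE = 0.64 / 100  # °C per metre (0.64 °C per 100 m)


def altitude_temperature_offset(alt_high_m, alt_low_m, lapse=LAPSE_RATE):
    """Expected temperature drop (°C) going from alt_low_m up to alt_high_m."""
    return (alt_high_m - alt_low_m) * lapse


predicted = altitude_temperature_offset(2987, 2661)  # HM vs EH mean altitude
observed = 11.5 - 9.2                               # bio1: EH minus HM
print(f"predicted {predicted:.2f} °C, observed {observed:.2f} °C")
# predicted 2.09 °C, observed 2.30 °C
```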

The ecological differences found between the two T. wallichiana lineages studied here seem to represent species-specific characteristics and are apparently sufficient to keep both lineages separated to a high degree, as indicated by the chloroplast and nuclear data. Such ecological niche partitioning will have reinforced the divergence of the two lineages following initial spatial isolation, and may have given rise to some degree of differential adaptation to their respective environmental conditions. This, given sufficient time, might ultimately lead to reproductive isolation (Rieseberg & Burke, 2001; Nosil et al., 2009; Thorpe et al., 2010; Wagner et al., 2012), resulting in the formation of new species.

Unusual demographic history of Taxus wallichiana

Climate changes during Pleistocene glacial–interglacial cycles had a dramatic effect on species distribution ranges (Comes & Kadereit, 1998; Hewitt, 2004), causing migration and/or extinction of populations, followed by periods of isolation, divergence and subsequent expansion (Taberlet et al., 1998; Cun & Wang, 2010). Most plant taxa are believed to have shifted their latitudinal or elevational ranges in response to glaciations (Davis & Shaw, 2001). The divergence time estimated here indicates that the EH and HM lineages separated well before the Quaternary, in the Pliocene. Pleistocene climatic oscillations would therefore have acted on the two lineages separately, shaping their more recent population histories. In contrast with most previously suggested glacial evolutionary scenarios for the QTP and adjacent regions, such as peri-glacial refugia, refugia-within-refugia, and local persistence (e.g. Zhang et al., 2005; Yang et al., 2008; Cun & Wang, 2010; Opgenoorth et al., 2010; Wang et al., 2010), our analyses indicate that T. wallichiana populations expanded, rather than contracted, during the most recent glacial period. For both lineages, the star-like pattern of the minimum spanning network (Fig. 2) is characteristic of recent population expansion, the observed pairwise differences of cpDNA haplotypes fit a sudden expansion model particularly well (Fig. S1), and Tajima's D and Fu's FS are negative, significantly so for the HM lineage (Table 4). The mismatch distribution analysis suggested different phases of demographic and spatial expansion for the EH and HM lineages (Table 4). Expansion of T. wallichiana during the last glaciation was further supported by the SDM analysis, which indicated larger distribution ranges of both lineages at the LGM than at either the LIG or the present (Fig. 5).
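Tajima's D contrasts two estimates of θ: the mean number of pairwise differences (π) and the number of segregating sites (S) scaled by a1. After an expansion, an excess of rare variants makes π smaller than S/a1, so D becomes negative. A sketch of Tajima's (1989) statistic for aligned, equal-length sequences without gaps follows. The significance test behind the P-values in Table 4 is not included, and the toy sample is hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences."""
    n = len(seqs)
    pairs = list(combinations(seqs, 2))
    # pi: average number of pairwise differences
    pi = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)
    # S: number of segregating (polymorphic) sites
    s = sum(len(set(column)) > 1 for column in zip(*seqs))
    if s == 0:
        return float("nan")  # D is undefined without polymorphism
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical star-like sample: one common haplotype and four singletons
star = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]
print(round(tajimas_d(star), 3))  # -1.094
```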

This relatively unusual demographic scenario can be explained partly by the topoclimatic characteristics of the HHM region. The region lies at the southern and southeastern fringe of the QTP and is exposed to moist, warm air from the Indian monsoon to the south (Figs 1, 5). Because the QTP blocks the southward flow of cold, dry continental air toward the HHM region, the region has a warmer and wetter climate than most of the QTP, which would have provided a more stable environment for plant populations, even during glaciations. In addition, the HHM region spans a wide latitudinal range and large elevational differences, which could have provided microclimatic refugia for T. wallichiana populations during Quaternary climatic oscillations. Because of their topography, such areas would be less prevalent in the Himalayan regions than in the South Hengduan Mountains region (Fig. 1). Thus, T. wallichiana populations could have migrated downward and southward to track their optimal ecological conditions during glacial periods and expanded into larger areas of suitable habitat. By contrast, during the LIG, the temperature was at least 5°C higher than at present (Shi et al., 1998). Given that temperature in the HHM region currently decreases by 0.64°C per 100 m of elevation (Li & Zhang, 2010), and assuming niche conservatism over time, T. wallichiana populations would have had to move at least 800 m uphill from their present altitudinal range, because there was little space for a latitudinal shift. The environment in the Himalaya would have been very harsh above 3500 m, with limited suitable habitat for T. wallichiana populations, which would ultimately have resulted in population contraction and extinction in this region. Conversely, the topography of the South Hengduan Mountains offers greater potential for latitudinal shifts than the Himalaya region, which could have facilitated the persistence of T. wallichiana populations there during the LIG.
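The minimum uphill shift for the LIG follows from dividing the minimum warming, ΔT, by the present-day lapse rate, Γ. Both symbols are introduced here for illustration, and the calculation assumes the lapse rate was unchanged during the LIG:

```latex
\Delta h \;\ge\; \frac{\Delta T}{\Gamma}
  = \frac{5\ ^{\circ}\mathrm{C}}{0.64\ ^{\circ}\mathrm{C}\,(100\ \mathrm{m})^{-1}}
  \approx 781\ \mathrm{m} \approx 800\ \mathrm{m}
```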

The possibility that T. wallichiana populations expanded during the last glaciation is also supported by paleoclimatic and paleoenvironmental analyses of this region. There were four major glaciations on the QTP during the Quaternary, which became progressively less extensive after the largest, the Naynayxungla Glaciation (c. 0.72–0.5 Ma), and LGM glacier advances were significantly less extensive in this area (Shi et al., 1998; Zheng et al., 2002; Owen, 2009). It is thus likely that the last glaciation had only a limited impact on T. wallichiana. Paleoclimatic analyses also suggest wetter winters and lower annual mean temperatures than today in the HHM region during the period 40–14.2 ka (Walker, 1986; Shi et al., 2001; Xiao et al., 2009), when Taxus still persisted in the South Hengduan Mountains (Jiang et al., 1998). The expansion scenario is further supported by vegetation reconstructions based on fossil pollen data, which indicate that mixed and evergreen subtropical forests were present in SE Tibet during the LGM (e.g. Yadong, XY; Motuo, MT; and Chayu, CY; Fig. 1) (Shi et al., 1998), where populations of T. wallichiana are found today. Biome reconstructions based on recent climatology and pollen data also indicate a warm-temperate climate during the LGM for the present-day distribution area of T. wallichiana (Harrison et al., 1992).

A scenario of population expansion, or at least population stability, during the last glacial period, as opposed to the general ‘contraction–expansion’ scenario usually described, has occasionally been reported in other plants (e.g. Yan et al., 2012) and animals (e.g. Li et al., 2009) in subtropical areas of China. There is also growing evidence of demographic stability or expansion throughout the LGM in a range of other organisms elsewhere (e.g. King et al., 2009; Marko et al., 2010; Bisconti et al., 2011; Cunha et al., 2011; Pinheiro et al., 2011; Batalha-Filho et al., 2012).


Our analyses of Taxus wallichiana, combining molecular phylogeography and species distribution modeling, strongly suggest that the observed patterns of genetic variation and divergence in T. wallichiana are best explained by a combination of Miocene/Pliocene geological and climatic events and late Quaternary climatic oscillations. Initial topographic constraints, reinforced by subsequent differential ecological (climatic) adaptation, have resulted in cryptic speciation and the formation of two discrete taxonomic entities, a split indicated by both molecular (Gao et al., 2007; Liu et al., 2011b) and morphological analyses (Möller et al., 2007). Furthermore, T. wallichiana populations exhibit an unusual demographic history, having expanded their range during the last glacial period. These findings indicate that a combination of orographic and climatic factors has played a fundamental role in promoting the diversification and evolution of species in the HHM region, and that these processes may be more complex than previously thought.


The authors are extremely grateful to a large number of individuals, too numerous to list, who supported our extensive fieldwork and laboratory work. We acknowledge Pete Hollingsworth, Andrew Lowe, Richard Milne and Jian-Quan Liu for their constructive comments on earlier versions of the manuscript. The University of Oslo Bioportal provided computation and software resources. This work was supported by the Key Research Program of the Chinese Academy of Sciences (KSZD-EW-Z-011); the National Natural Science Foundation of China (30700042, 31200182); the Ministry of Science and Technology of China (2012FY110800); and the talent project of Yunnan Province, China (2008YP064). The Royal Botanic Garden Edinburgh is funded by the Rural and Environment Science and Analytical Services division (RESAS) in the Scottish Government.