POPULATION GENETIC EVIDENCE FOR COMPLEX EVOLUTIONARY HISTORIES OF FOUR HIGH ALTITUDE JUNIPER SPECIES IN THE QINGHAI–TIBETAN PLATEAU

Authors

  • Zhonghu Li,

    1. Molecular Ecology Group, State Key Laboratory of Grassland Farming System, Lanzhou University, Lanzhou 730000, Gansu, China
  • Jiabin Zou,

    1. Molecular Ecology Group, State Key Laboratory of Grassland Farming System, Lanzhou University, Lanzhou 730000, Gansu, China
  • Kangshan Mao,

    1. Molecular Ecology Group, State Key Laboratory of Grassland Farming System, Lanzhou University, Lanzhou 730000, Gansu, China
  • Kao Lin,

    1. Laboratory of Evolutionary Genomics, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai 200031, China
    2. Graduate School of the Chinese Academy of Sciences, Beijing 100039, China
  • Haipeng Li,

    1. Laboratory of Evolutionary Genomics, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai 200031, China
  • Jianquan Liu,

    1. Molecular Ecology Group, State Key Laboratory of Grassland Farming System, Lanzhou University, Lanzhou 730000, Gansu, China
    2.  E-mail: liujq@nwipb.ac.cn
  • Thomas Källman,

    1. Program in Evolutionary Functional Genomics, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, 75326 Uppsala, Sweden
  • Martin Lascoux

    1. Graduate School of the Chinese Academy of Sciences, Beijing 100039, China
    2. Program in Evolutionary Functional Genomics, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, 75326 Uppsala, Sweden

Abstract

Population genetic data based on multiple nuclear loci provide invaluable information for understanding the demographic, selective, and divergence histories of extant species. We studied nucleotide variation at 13 nuclear loci in 53 populations of four closely related but morphologically distinct juniper species from the Qinghai–Tibetan Plateau (QTP). We used a novel approach combining Approximate Bayesian Computation and a recently developed neutrality test based on the maximum frequency of derived mutations to examine the demographic and selective histories of individual species, and isolation-with-migration analyses to study the joint history of the species and detect gene flow between them. We found that (1) the four species, which diverged in response to the extensive QTP uplifts, have different demographic histories; (2) two loci, Pgi and CC0822, depart significantly from neutrality in one species, and Pgi is also marginally significant in another; and (3) shared polymorphisms are common, indicating both incomplete lineage sorting and gene flow after species divergence. In addition, the detected unidirectional gene flow provides indirect support for the theoretical prediction that introgression should mostly take place from local to invading species. Our results, together with previous studies, underscore the complex evolutionary histories of plant diversification in the QTP biodiversity hotspot.

In the last decade there have been major changes in our understanding of how populations of plant and animal species responded to climatic fluctuations during the Quaternary. Firstly, analyses of macrofossils and population genetic studies, combined with new analytical methods, have firmly established that not all plants and animals were confined to scattered and limited southern refugia during cold periods. Instead, the new data indicate that cold-tolerant species were able to survive at intermediate latitudes, and some may even have survived within or close to their current ranges. It is also likely that some species retained fairly large effective population sizes during glacial periods (e.g., Birks and Willis 2008; Binney et al. 2009). Secondly, it has become increasingly clear that gene flow and introgression between species were common during the same period (e.g., Nosil 2008; Slotte et al. 2008; Nadachowska and Babik 2009; Ross-Ibarra et al. 2009; Carling et al. 2010; Hey 2010a; Zheng and Ge 2010). In addition, especially in forest tree species with long generation times and large effective population sizes, shared ancestral polymorphisms can be extremely common and fixed differences rare (Willyard et al. 2007; Chen et al. 2010; Li et al. 2010). Hence, the evolutionary unit of interest when inferring species histories is not the individual species but rather a complex of species interlinked by gene flow. Furthermore, gene flow among populations within a species will influence the extent of gene flow between species: the more “connected” the populations within a species are, the lower the gene flow between species will be (Petit and Excoffier 2009). Thirdly, computer simulations and experimental studies have underlined the importance of population dynamics in shaping the current distribution of variation within and among species. For example, population expansion may have a strong impact on the fixation of rare alleles (the so-called surfing phenomenon; Hallatschek et al. 2007; Excoffier and Ray 2008; Hallatschek and Nelson 2008) and on the direction of introgression in contact zones (Currat et al. 2008). Finally, the relative importance of genetic drift and selection in shaping nucleotide variation is currently being reassessed, and recent genomic studies indicate that selection may play a much more important role than has generally been assumed (Hahn 2008; Wakeley 2010). Although these results may not invalidate previous demographic inferences, they suggest that, when addressing demographic processes, efforts should be made to identify loci that may have been under selection and to exclude them, or at least consider them separately. This is a difficult task, as demographic and selective processes often leave similar signatures in the genome, but new methods are emerging that should facilitate their differentiation (e.g., Gutenkunst et al. 2009; Beaumont 2010; Hey 2010a,b; Li 2011).

With its diverse landscapes and complex geological and climatic history, the Qinghai–Tibetan Plateau (QTP), a biodiversity hotspot, is a fascinating natural laboratory for examining how species have evolved and adapted. The QTP started to rise in the middle Eocene, and subsequent uplifts continued well into the Pliocene (Shi et al. 1998). In particular, the high mountain ranges that surround the southern part of the QTP platform, which are home to many coniferous species, have been significantly uplifted since the middle Miocene (Wang et al. 2006). Like the rest of Eurasia, the QTP was also subject to extensive climatic changes during its formation, which were locally magnified by the large differences in altitude between its deep valleys and high mountain ranges (Shi et al. 1998). These geological and climatic changes are likely to have led to species migrations and fluctuations in population size, as well as to diverse speciation processes. For example, previous studies of conifer species from the QTP have suggested that complex speciation processes have occurred, involving both bifurcating divergence due to allopatric isolation and the origin of new species through diploid hybridization (Ma et al. 2006; Li et al. 2010; Xu et al. 2010).

In the study presented here, we focused on four closely related juniper species from the QTP: Juniperus tibetica, J. saltuaria, J. convallium, and J. przewalskii. These species are morphologically distinguished by cone shape and color, seed scales, branch tips, and leaf apex (Fu et al. 1999; Adams 2008). Juniperus tibetica, J. saltuaria, and J. convallium have parapatric and/or sympatric distributions in the southwestern parts of the QTP. Juniperus tibetica occurs at 2700–4800 m above sea level (asl) and is thought to be endangered because it is heavily logged and grazed by goats. Juniperus saltuaria has the southernmost distribution and is found at similar altitudes (2700–4600 m asl), whereas J. convallium tends to grow at slightly lower altitudes (2200–4300 m asl). The current range of J. przewalskii does not overlap with those of the other three species; it is distributed in the northeastern part of the QTP at altitudes ranging from 2600 to 4300 m asl (Fu et al. 1999). All four species are monoecious, except for occasional dioecious J. convallium individuals (Fu et al. 1999; Adams 2008). They all have long generation times (≥50 years, based on our field observations), and their seeds are mainly dispersed by frugivorous birds. Each species forms pure forest stands on southern slopes but sometimes also occurs in alpine meadows.

Phylogenetic analyses of the whole juniper genus based on chloroplast DNA (cpDNA) variation suggested that these four species comprise a small monophyletic group with a relatively recent diversification history (Mao et al. 2010). However, they seem to have had different demographic histories during the Quaternary. Results of an extensive phylogeographic study based on chloroplast sequence variation suggest that species of the J. tibetica complex (J. tibetica, J. saltuaria, and J. convallium) were able to survive the Last Glacial Maximum (LGM) in the area where they currently grow (Opgenoorth et al. 2010). Of course, this does not preclude important movements of individual species within the area, especially before the LGM, but changes in latitude seem to have been limited. In contrast, J. przewalskii, which grows on the platform and at the edge of the QTP, to the east of the J. tibetica group, may have followed a more classical “retreat and expansion” model, as the current genetic diversity at both cpDNA and nuclear loci in populations on the QTP platform is a subset of that found in populations at the edge (Zhang et al. 2005; Li et al. 2011). In any case, it is highly likely that these and earlier changes in distribution range led to gene flow between species, as suggested by the numerous individuals with intermediate morphology that have been observed (Opgenoorth et al. 2010).

In this study, we sequenced 13 nuclear loci in these four high-altitude juniper species. DNA sequence polymorphism within species and patterns of divergence between species were then used to infer both their shared history and the demographic histories of the individual species. Since the four species appear to have diverged recently, they offer an opportunity to assess the interplay of demographic processes with ongoing species differentiation. More specifically, we addressed the following questions. Is there a lack of differentiation in nuclear DNA among species of the J. tibetica group, as previously observed in cpDNA (Opgenoorth et al. 2010)? Do the species depart significantly from the standard neutral model (SNM) and, in particular, do they show genetic diversity compatible with past bottlenecks and/or population growth? Are expanding species more affected by introgression than stable species, as predicted by Currat et al. (2008)? Do any loci show evidence of selection? Since widely applied demographic models assume that all loci evolve neutrally, but there is mounting evidence for genome-wide effects of selection (Hahn 2008), we tested for selection at individual loci using a new combination of methods. In addition to presenting the results of these analyses, we discuss their implications for our understanding of the speciation processes that have affected this group of species.

Materials and Methods

PLANT MATERIAL

We collected seeds from 19, 7, 8, and 19 populations of J. tibetica, J. convallium, J. saltuaria, and J. przewalskii, respectively (Fig. 1, Table S1). The number of individuals varied between one and 10 per population, with an average of 4.1 and a total of 37–80 individuals per species. Our sampling scheme was therefore intermediate between the scattered and pooled sampling strategies sensu Städler et al. (2009) and should not lead to strong bias when estimating summary statistics of the site frequency spectrum, such as Tajima's D (Li et al. 2011). Based on our previous phylogeographic studies and field surveys of these four species over the last 20 years (Zhang et al. 2005; Opgenoorth et al. 2010), we excluded all putative hybrid populations, classifying individuals by the morphological characters that distinguish the species (Fu et al. 1999; Adams 2008). For each species, we only selected populations with typical morphological characteristics. Hence, we have inevitably underestimated the frequency of current gene flow among species.

Figure 1.

Locations of the sampled populations of the four juniper species studied here (Juniperus tibetica, J. convallium, J. saltuaria, and J. przewalskii) on the Qinghai–Tibetan Plateau. The inset shows the location of the study area on a map of China. The x-axis gives longitude and the y-axis gives latitude.

We stored seeds at –20°C until analysis, then soaked them overnight in water at room temperature before isolating the haploid megagametophyte, from which we extracted DNA using either a QIAGEN DNeasy Plant Mini Kit (QIAGEN, Inc., Valencia, CA) or a modified CTAB procedure (Doyle and Doyle 1990).

NUCLEAR LOCI SEQUENCING

After amplification (Li et al. 2011), we sequenced a total of 13 nuclear loci (Table S2): CC0702, CC0822, CC1147, CC2196, CC2920, Myb, Chs, HemA, phosphoglucose isomerase (Pgi), CC1333, CC2241, Maldehy, and LHCA4 (Tsumura et al. 1997; Ujino-Ihara et al. 2000; Dvornyk et al. 2002; Tani et al. 2003; Kado et al. 2008). Sequencing was performed on an ABI 3130xl or 3730xl Genetic Analyzer (Applied Biosystems, Foster City, CA) using an ABI Prism BigDye Terminator Cycle Sequencing Kit (version 3.1), following the manufacturer's instructions. Singletons were verified by repeated amplification and resequencing from the same megagametophyte. Only high-quality sequences with single peaks were retained. DNA sequences were aligned with Clustal X (Thompson et al. 1997) or with Clustal W as implemented in MEGA 4.1 (Kumar et al. 2008). We further confirmed all putative polymorphic sites through visual inspection of the chromatograms. All sequence data have been deposited in the EMBL/GenBank databases under accession numbers JN099484–JN099681. Some sequence data for J. przewalskii were obtained from a previous study (Li et al. 2011).

POPULATION GENETICS ANALYSES

To save space, full details of the population genetic analyses and the corresponding references are provided in the online supplementary file (File S1); below we outline the main features of the analyses performed. We used DnaSP version 5.00 (Librado and Rozas 2009) to estimate basic population genetic parameters and LDhat 2.0 (McVean et al. 2002) to examine linkage disequilibrium (LD) and recombination events (Rm). Nuclear haplotype networks were constructed using TCS version 1.21 (Clement et al. 2000). The genetic structure of the four juniper species was assessed using Wright's fixation index (Wright 1951; Weir and Cockerham 1984), analysis of molecular variance (AMOVA) in Arlequin version 3.1.1 (Excoffier et al. 2005), and geographical grouping and genotypic clustering in STRUCTURE version 2.3 (Hubisz et al. 2009). In the STRUCTURE analyses, we used the correlated allele frequency model with admixture and with population information. A total of 53 single nucleotide polymorphisms (SNPs) were used for the STRUCTURE analysis.
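
ΦST between species is obtained from an AMOVA, which partitions the sum of squared distances between haplotypes into among-group and within-group components. As a rough illustration of that variance-component calculation, the Python sketch below computes a one-level ΦST among groups of aligned haploid sequences. It assumes gap-free alignments and uses the number of differing sites as the squared distance between two haplotypes; it is a minimal sketch rather than the Arlequin implementation, and all function names are hypothetical.

```python
from itertools import combinations


def n_differences(a: str, b: str) -> int:
    """Number of differing sites between two aligned haplotypes (used as squared distance)."""
    return sum(x != y for x, y in zip(a, b))


def sum_sq_dev(seqs: list[str]) -> float:
    """Sum of squared deviations: pairwise squared distances divided by sample size."""
    return sum(n_differences(a, b) for a, b in combinations(seqs, 2)) / len(seqs)


def phi_st(groups: list[list[str]]) -> float:
    """One-level AMOVA Phi_ST among groups (e.g., species) of haploid sequences."""
    pooled = [s for g in groups for s in g]
    n_total, n_groups = len(pooled), len(groups)
    ssd_within = sum(sum_sq_dev(g) for g in groups)
    ssd_among = sum_sq_dev(pooled) - ssd_within
    ms_among = ssd_among / (n_groups - 1)
    ms_within = ssd_within / (n_total - n_groups)
    # n_c: average group size corrected for unequal sample sizes
    n_c = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (n_groups - 1)
    var_among = (ms_among - ms_within) / n_c
    var_within = ms_within
    if var_among + var_within == 0:
        raise ValueError("no sequence variation in the sample")
    return var_among / (var_among + var_within)


# Hypothetical toy data: two species that share no haplotypes.
print(phi_st([["AACG", "AACG", "AACT"], ["TTCG", "TTCG", "TTGG"]]))  # approximately 0.75
```

Values near 1 indicate that almost all variation lies between groups; values at or below 0 indicate no differentiation beyond what is seen within groups.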

We tested the neutrality of the examined loci using diverse statistics, including Tajima's D (Tajima 1989), Fu and Li's D* and F* (Fu and Li 1993), Zeng et al.'s E (Zeng et al. 2006), and Fay and Wu's H (Fay and Wu 2000; Zeng et al. 2006). We further used the multilocus Hudson–Kreitman–Aguadé test (HKA; Hudson et al. 1987) to assess the fit of the data to the neutral equilibrium model. Finally, evidence for natural selection at individual loci was evaluated using the recently developed maximum frequency of derived mutations (MFDM) test (Li 2011). Briefly, the MFDM test examines the imbalance of the genealogical tree of a locus, exploiting the fact that, according to coalescent theory (Hudson 1990), variation in population size does not affect tree topology in a single Wright–Fisher population. Thus, if a single Wright–Fisher population is assumed, the probability of an unbalanced tree is independent of any population bottleneck or expansion. However, because population subdivision and admixture can also produce unbalanced trees through migration, a phylogenetic method and a simple sampling scheme were also used to detect evidence of such migration events.
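
The topological argument behind the MFDM test can be illustrated with a short Python sketch. Under the standard coalescent, the two lineages at the root of a genealogy of n sequences carry k and n − k of the samples, with k uniformly distributed on 1, …, n − 1, and this topology does not depend on how population size changed through time. A derived mutation can reach a count of x only on a branch subtending at least x samples, so a very high maximum derived count implies a very unbalanced root split, and the probability that the larger root subtree contains at least x samples is a conservative P-value. The sketch assumes this simplified reading; the exact statistic, including outgroup handling, is defined in Li (2011), and the function names are hypothetical.

```python
import random


def mfdm_pvalue(n: int, max_derived: int) -> float:
    """Conservative P-value: P(larger root subtree >= max_derived) under neutrality.

    Assumes the root split of a neutral coalescent topology puts k samples on one
    side, with k uniform on 1..n-1 whatever the population-size history.
    """
    if not 1 <= max_derived <= n - 1:
        raise ValueError("max_derived must lie in 1..n-1")
    hits = sum(1 for k in range(1, n) if max(k, n - k) >= max_derived)
    return hits / (n - 1)


def larger_root_subtree(n: int, rng: random.Random) -> int:
    """Build a random coalescent topology by merging random pairs of lineages.

    Only the order of merges matters for topology, so no branch lengths (and no
    population-size history) are needed; returns the larger root subtree size.
    """
    sizes = [1] * n
    while len(sizes) > 2:
        # pop the higher index first so the lower index stays valid
        i, j = sorted(rng.sample(range(len(sizes)), 2), reverse=True)
        sizes.append(sizes.pop(i) + sizes.pop(j))
    return max(sizes)


# Example: 20 sequences, the most frequent derived mutation carried by 19 of them.
p_exact = mfdm_pvalue(20, 19)  # equals 2/19
rng = random.Random(1)
sims = [larger_root_subtree(20, rng) for _ in range(20000)]
p_sim = sum(s >= 19 for s in sims) / len(sims)
print(round(p_exact, 4), round(p_sim, 4))
```

The simulated proportion should approach the exact value, and because the simulation never draws coalescence times, the same result holds for any bottleneck or expansion scenario.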

To compare the fit of the data for each of the four species to different demographic models and to identify loci departing from the best-fitting demographic model, we used Approximate Bayesian Computation (ABC) (Marjoram and Tavaré 2006; Beaumont 2010; Csilléry et al. 2010 and references therein) as implemented in the seqlib software package (De Mita et al. 2007, http://sourceforge.net/projects/seqlib/) (details in File S1). The algorithm in seqlib compares observed values of the number of segregating sites, S, the nucleotide diversity, π, and the number of haplotypes with the same summary statistics obtained through coalescent simulations. Three models were evaluated: (1) the SNM, (2) a population expansion model (PEM), and (3) a bottleneck model (BNM). The posterior distribution was used in posterior predictive simulations based on (1) the total dataset, to evaluate how well the inferred model captured the properties of the original data, and (2) data for individual loci, to test for departures of individual loci from the model and thus detect putatively selected loci (Thornton and Andolfatto 2006). Results of the latter should be interpreted cautiously, as the number of loci was limited, but they nonetheless provide an interesting comparison with the results of the MFDM test. Model selection was based on both Bayes factors (BFs) and posterior predictive simulations, over all loci and for individual loci. Evidence for a given model was considered substantial when (1) its BF exceeded 3.2 (Kass and Raftery 1995); (2) the posterior simulation included the observed values of the summary statistics; and (3) the number of loci departing from the posterior simulation distribution was minimal. Evidence in favor of a model was considered strong if its BF exceeded 10. If any of these criteria were not fulfilled, we retained the simplest model.
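
The rejection step at the core of ABC can be sketched in a few lines of Python. This toy version is not the seqlib analysis: it estimates only θ = 4Nμ for a single locus under the SNM and uses only S and π (not the number of haplotypes). The uniform prior, the relative Euclidean distance, the tolerance, and all function names are illustrative assumptions. Model choice extends the same idea: if the same number of simulations is run under each model with equal prior model probabilities, the ratio of acceptance rates approximates the Bayes factor.

```python
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's Poisson sampler (adequate for the small means used here)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_snm(n: int, theta: float, rng: random.Random) -> tuple[int, float]:
    """Simulate S and pi for one locus under the standard neutral model.

    Time is in units of 2N generations and theta = 4N*mu, so mutations fall on
    each branch at rate theta/2 per unit length (infinite-sites model).
    """
    lineages = [1] * n  # number of sampled sequences below each active lineage
    seg_sites, pair_diffs = 0, 0
    while len(lineages) > 1:
        k = len(lineages)
        t = rng.expovariate(k * (k - 1) / 2)  # waiting time to the next coalescence
        for size in lineages:
            m = poisson(theta / 2 * t, rng)
            seg_sites += m
            pair_diffs += m * size * (n - size)  # sample pairs differing at the new sites
        i, j = sorted(rng.sample(range(k), 2), reverse=True)
        lineages.append(lineages.pop(i) + lineages.pop(j))
    return seg_sites, pair_diffs / (n * (n - 1) / 2)


def abc_rejection(obs_s, obs_pi, n, n_sims, tol, rng, prior=(0.1, 10.0)):
    """Keep prior draws of theta whose simulated (S, pi) lie within tol of the data."""
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(*prior)
        s, pi = simulate_snm(n, theta, rng)
        # relative Euclidean distance between simulated and observed summaries
        dist = math.hypot((s - obs_s) / obs_s, (pi - obs_pi) / obs_pi)
        if dist <= tol:
            accepted.append(theta)
    return accepted


# Hypothetical observed data for one locus sampled in 15 sequences.
rng = random.Random(42)
posterior = abc_rejection(obs_s=10, obs_pi=3.0, n=15, n_sims=5000, tol=0.3, rng=rng)
print(len(posterior), sum(posterior) / len(posterior))
```

Shrinking the tolerance makes the accepted draws a better approximation of the true posterior, at the cost of needing more simulations.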

To estimate migration and splitting-time parameters, we used methods based on the isolation-with-migration (IM) model developed by Hey and co-workers (Wakeley and Hey 1997; Nielsen and Wakeley 2001; Hey and Nielsen 2007; Hey 2010b). We first analyzed species in a pairwise manner using a basic two-population model, and then analyzed them jointly with the multiple-population IM model. Both approaches are implemented in IMa2 (Hey 2010a,b). The multiple-population IM model requires a phylogenetic tree with species-splitting events ordered in time. To estimate the phylogenetic relationships of the four juniper species, we used BEST 2.3.1 (Liu and Pearl 2007), randomly choosing one individual from each species and using J. microsperma as an outgroup. The interspecific relationships based on all nuclear loci were as follows: {([J. convallium, J. saltuaria]: 0.73, J. tibetica): 0.70, J. przewalskii}. We also obtained the same species tree topology when sampling two and 10 individuals from each species, although support values were difficult to obtain for these analyses because they took too long to run. This topology is therefore likely to represent the true phylogenetic relationships of the four species based on nuclear loci (e.g., Liu and Pearl 2007; Hey 2010a).

The IM model is based on several simplifying assumptions, including neutrality, no recombination within loci, and random mating in ancestral and descendant populations (Nielsen and Wakeley 2001; Hey and Nielsen 2004, 2007; Hey 2010b). We therefore used only nonrecombining blocks of each locus in our analyses. The two nuclear loci that showed the most evidence of natural selection (CC0822 and Pgi) were removed from all subsequent IM analyses. Since both splitting time and effective population size parameters are given in mutational units, estimates of the absolute mutation rates of individual loci are required to convert them into years and numbers of individuals. Assuming that J. microsperma and the juniper species examined here (e.g., J. przewalskii) diverged approximately 38.1 Mya (Mao et al. 2010), we estimated the average divergence at silent sites (Ks) between J. microsperma and J. przewalskii for each locus, which can be treated as the accumulation of silent mutations over 38.1 million years. The mutation rate per year at each locus was then estimated as u = Ks/(2T), where T is the divergence time. To obtain the mutation rate per generation, per-year rates were multiplied by the generation time, which was assumed to be 50 years based on our previous field surveys and studies of other Cupressaceae species (Fujimoto et al. 2008). The resulting geometric mean mutation rate over the nine loci (0.194 × 10⁻⁹ per site per year) was used to scale the divergence time and effective population sizes. We note that this estimate is more than an order of magnitude lower than direct estimates obtained for Arabidopsis thaliana (7 × 10⁻⁹ base substitutions per site per generation; Ossowski et al. 2010), but of the same order of magnitude as indirect estimates previously obtained for conifers (0.7 × 10⁻⁹ substitutions per site per year; Willyard et al. 2007, 2009). In any case, as this mutation rate estimate rests on a series of unverified assumptions, it should be interpreted cautiously.
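
The rate calculation reduces to a few lines of arithmetic, sketched below in Python. The factor of 2 in u = Ks/(2T) reflects that silent substitutions accumulate independently along both lineages since the split, and the geometric mean is taken on the log scale. The 38.1-million-year split and the 50-year generation time are the assumptions stated in the text; the Ks values and function names are hypothetical and are not the study's data.

```python
import math

GENERATION_TIME_YEARS = 50  # assumed generation time stated in the text
SPLIT_TIME_YEARS = 38.1e6   # assumed J. microsperma divergence time (Mao et al. 2010)


def rate_per_site_per_year(ks: float, split_time_years: float = SPLIT_TIME_YEARS) -> float:
    """u = Ks / (2T): silent substitutions accumulate on both lineages since the split."""
    return ks / (2.0 * split_time_years)


def geometric_mean(values: list[float]) -> float:
    """Geometric mean, computed on the log scale for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical Ks values for three loci (illustration only, not the study's data).
ks_values = [0.0100, 0.0148, 0.0220]
per_year = [rate_per_site_per_year(ks) for ks in ks_values]
per_generation = [u * GENERATION_TIME_YEARS for u in per_year]
print(geometric_mean(per_year), geometric_mean(per_generation))
```

For example, a hypothetical Ks of 0.0148 gives about 1.94 × 10⁻¹⁰ substitutions per site per year under these assumptions.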

Results

NUCLEOTIDE DIVERSITY

Sequence data from 13 nuclear loci were collected from four juniper species: three parapatric species distributed in the southwestern parts of the QTP (J. tibetica, J. convallium, and J. saltuaria) and one northeastern QTP relative, J. przewalskii. On average, 180 megagametophytes per locus were sequenced across the species. In total, just over 8000 bp of aligned sequence was obtained, and 72, 67, 92, and 88 segregating sites were identified in J. tibetica, J. convallium, J. saltuaria, and J. przewalskii, respectively. Singletons constituted 40% of all segregating sites. Juniperus saltuaria had the highest silent nucleotide diversity, πs (0.00415), followed by J. przewalskii (0.00383) and J. tibetica (0.00292), whereas J. convallium showed the lowest diversity (0.00211). A similar pattern was observed for total nucleotide diversity (πT) (Fig. 2, Table S3).

Figure 2.

Box plots of the summary statistics for Juniperus convallium, J. przewalskii, J. saltuaria, and J. tibetica. The top row shows nucleotide diversity in the four species, with total nucleotide diversity (πT) and silent nucleotide diversity (πS) in the left and right panels, respectively. The bottom row shows Tajima's D and Fay and Wu's H for each species. Horizontal bars represent the median, the bottom and top of the boxes represent the 25th and 75th percentiles, respectively, and whiskers extend to 1.5 times the interquartile range. Dots are outliers.

LD AND RECOMBINATION

Because the number of segregating sites per locus was low, we had very limited power to estimate LD. Among the 460 pairwise comparisons between SNPs, 147 were significant after Bonferroni correction. Averaged over all sequenced regions in the four species, r² was 0.236, and r² declined to <0.2 within 200 bp (data not shown). The population recombination rate, ρ, was slightly higher for J. convallium and J. przewalskii than for J. tibetica and J. saltuaria (Table S3).
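
Because DNA was extracted from haploid megagametophytes, each sequence is a haplotype, so LD between two SNPs can be computed directly from how often their alleles co-occur, without phasing. The Python sketch below computes the standard r² for two biallelic sites; it is a plain illustration of the statistic, not the LDhat implementation, and the function name is hypothetical.

```python
def r_squared(site_a: str, site_b: str) -> float:
    """r^2 between two biallelic sites; each string lists alleles across the same haploid samples."""
    if len(site_a) != len(site_b):
        raise ValueError("sites must be scored on the same samples")
    if len(set(site_a)) != 2 or len(set(site_b)) != 2:
        raise ValueError("both sites must be biallelic")
    n = len(site_a)
    allele_a, allele_b = site_a[0], site_b[0]  # any allele can serve as reference
    p_a = site_a.count(allele_a) / n
    p_b = site_b.count(allele_b) / n
    p_ab = sum(x == allele_a and y == allele_b for x, y in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Perfect association versus independence across four haploid samples.
print(r_squared("AAGG", "CCTT"), r_squared("AAGG", "CTCT"))  # prints 1.0 0.0
```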

POPULATION GENETIC STRUCTURE

As illustrated by the ΦST values among species (Table 1), J. saltuaria appears to be the most strongly differentiated species, with ΦST values as high as 0.5304 and 0.5208 relative to J. tibetica and J. convallium, respectively; unexpectedly, these values exceed even the largest value involving the allopatric J. przewalskii (0.3595, with J. convallium). In contrast, the Bayesian clustering algorithm (Structure version 2.3) indicated that the most likely number of clusters for the entire dataset was K = 2, with the first cluster primarily containing the three species of the tibetica group (J. tibetica, J. convallium, and J. saltuaria) and the second dominated by the northeastern QTP species, J. przewalskii. Both the original method described by Pritchard et al. (2000) and the ΔK statistic of Evanno et al. (2005) suggested that the most likely number of clusters was two. However, the difference in likelihood between K = 2 and K = 4 was relatively small (average LnPD = –1399 and –1414, respectively). These results primarily reflect the fact that a single cluster is much less likely than K ≥ 2, but could also suggest a hierarchical structure in the data. Figure 3 shows the Structure clustering results for K = 2–4; the results for K = 3 and K = 4 show patterns similar to those for K = 2. Overall, J. przewalskii is clearly differentiated from the southwestern QTP species, although it appears to have genetic contributions from the other three species. Within individual species, there was no clear regional grouping of the J. tibetica and J. saltuaria populations, except that population 27 was genetically distinct from the other populations. For J. przewalskii, there was no clear geographical pattern, although a previous study found that most alleles of the plateau populations are a subset of those found in populations at the edge of the QTP (Li et al. 2011). In contrast, for J. convallium we were able to distinguish a southern (20, 21, 22, 23) and a northern group (24, 25, 26) of populations (Figs. S1–S4).
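
The ΔK statistic of Evanno et al. (2005) measures the second-order rate of change of the log probability of the data, LnP(D), across successive values of K, scaled by the standard deviation of LnP(D) over replicate runs. The Python sketch below assumes consecutive K values with several STRUCTURE runs each and uses the run means, as is commonly done; the LnP(D) values in the example are hypothetical and are not those of this study, and the function name is hypothetical too.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k: dict[int, list[float]]) -> dict[int, float]:
    """Delta K for every K that has replicate runs at both K - 1 and K + 1."""
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in lnp_by_k and k + 1 in lnp_by_k:
            # second-order difference of mean LnP(D) around K
            second_diff = mean(lnp_by_k[k + 1]) - 2 * mean(lnp_by_k[k]) + mean(lnp_by_k[k - 1])
            delta[k] = abs(second_diff) / stdev(lnp_by_k[k])
    return delta


# Hypothetical LnP(D) values from three replicate runs per K.
runs = {
    1: [-1600.0, -1602.0, -1598.0],
    2: [-1400.0, -1402.0, -1398.0],
    3: [-1405.0, -1410.0, -1415.0],
    4: [-1412.0, -1414.0, -1416.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

ΔK cannot be computed for the smallest or largest K tested, which is one reason why it is usually reported alongside the raw LnP(D) values.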

Table 1.  ΦST values over all loci among species.

                  J. tibetica   J. convallium   J. saltuaria
J. convallium     0.2259**
J. saltuaria      0.5304**      0.5208**
J. przewalskii    0.2648**      0.3595**        0.2460*

Significance level: *P < 0.01; **P < 0.001.

Figure 3.

Structure analysis of the four species when K = 2–4 clusters are assumed. For each K value, results of the run with the highest value of LnPD were used. Variation among runs was limited. Populations are presented as pie charts colored according to the mixed membership of their individuals.

A minimum spanning network of the haplotypes at 12 of the loci was constructed (Fig. S5); Chs was omitted because it was insufficiently polymorphic. In the network, the most abundant haplotypes were located near the center and were often shared by the four species, whereas the numerous closely related tip haplotypes derived from them were often private to a single species.
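
A minimum spanning network links haplotypes so that the total number of mutational steps is as small as possible. The Python sketch below builds a minimum spanning tree with Prim's algorithm on pairwise differences. It is a simplified stand-in: TCS uses statistical parsimony and can infer missing intermediate haplotypes, which this sketch does not, ties are resolved arbitrarily, and the function names are hypothetical.

```python
def n_differences(a: str, b: str) -> int:
    """Number of mutational steps (differing sites) between two aligned haplotypes."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_network(haplotypes: list[str]) -> list[tuple[str, str, int]]:
    """Prim's algorithm on pairwise differences; returns (haplotype, haplotype, steps) edges."""
    if len(set(haplotypes)) != len(haplotypes):
        raise ValueError("pass each distinct haplotype once")
    connected = [haplotypes[0]]
    remaining = list(haplotypes[1:])
    edges = []
    while remaining:
        # shortest link between the growing network and any unconnected haplotype
        a, b = min(
            ((a, b) for a in connected for b in remaining),
            key=lambda pair: n_differences(*pair),
        )
        edges.append((a, b, n_differences(a, b)))
        connected.append(b)
        remaining.remove(b)
    return edges


# Hypothetical haplotypes: a central haplotype with three one-step neighbours or descendants.
network = minimum_spanning_network(["AAAA", "AAAT", "AATT", "TAAA"])
print(network)
```

Counting how many edges touch each haplotype gives a quick way to see which haplotypes occupy the center of such a network.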

DEPARTURE FROM THE SNM

An HKA test (Hudson et al. 1987) was performed to test for departure from neutrality at individual loci. Each of the four species was tested in turn, with J. microsperma as an outgroup, to ascertain whether polymorphism within species and divergence between species deviated significantly from expectations under the SNM. In each case, the HKA test indicated no significant deviation from the neutral model at any individual locus (data not shown).

The mean Tajima's D and Fu and Li's D* and F* values were negative for all four species, with mean Tajima's D ranging from –0.162 in J. przewalskii to –0.884 in J. convallium. The mean Fay and Wu's H was negative in J. convallium (–0.719), close to zero in J. przewalskii (–0.080) and J. saltuaria (–0.066), and positive in J. tibetica (0.198) (Fig. 2, Table S4). Few individual loci departed significantly from the SNM, but most had negative Tajima's D and Fu and Li's D* and F* values. Interestingly, D, H, D*, and F* values for Pgi in J. convallium were significantly negative, and the SNM could be rejected for CC0822 in one species (J. przewalskii). Negative average values of both Tajima's D and Fay and Wu's H indicate skews toward both low-frequency variants (negative D) and high-frequency derived variants (negative H). This combination has generally been interpreted as the signature of a relatively ancient bottleneck (Haddrill et al. 2005; Pyhäjärvi et al. 2007; Li et al. 2010), which may be the case for J. convallium. In contrast, J. tibetica might have experienced an even more ancient bottleneck, because Fay and Wu's H becomes more positive as new mutations accumulate after an older bottleneck. Finally, J. saltuaria and J. przewalskii showed no skew toward high-frequency derived variants (H ≈ 0) but did show a skew toward low-frequency variants, a pattern that could simply reflect recent population growth.
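
The two statistics contrast different estimators of θ: Tajima's D compares π with Watterson's estimate from the number of segregating sites, so an excess of rare variants makes it negative, whereas Fay and Wu's H compares π with θH, which weights high-frequency derived variants heavily, so an excess of such variants makes it negative. The Python sketch below computes Tajima's D and the unnormalized H from a list of derived-allele counts, one per segregating site, as would be obtained after polarizing with an outgroup. It is an illustration only: the normalized H of Zeng et al. (2006) is not implemented, the example counts are hypothetical, and the function names are hypothetical too.

```python
import math


def theta_estimates(derived_counts: list[int], n: int) -> dict[str, float]:
    """Watterson's theta, theta_pi and theta_H from derived-allele counts (one per site)."""
    s = len(derived_counts)
    a1 = sum(1 / i for i in range(1, n))
    pairs = n * (n - 1)
    return {
        "theta_w": s / a1,
        "theta_pi": sum(2 * i * (n - i) for i in derived_counts) / pairs,
        "theta_h": sum(2 * i * i for i in derived_counts) / pairs,
    }


def tajimas_d(derived_counts: list[int], n: int) -> float:
    """Tajima's D (Tajima 1989); symmetric in derived/ancestral, so no outgroup is needed."""
    s = len(derived_counts)
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    est = theta_estimates(derived_counts, n)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return (est["theta_pi"] - est["theta_w"]) / math.sqrt(e1 * s + e2 * s * (s - 1))


def fay_wu_h(derived_counts: list[int], n: int) -> float:
    """Unnormalized Fay and Wu's H = theta_pi - theta_H; needs outgroup-polarized counts."""
    est = theta_estimates(derived_counts, n)
    return est["theta_pi"] - est["theta_h"]


# Hypothetical sample of n = 10 sequences: mostly singletons, then mostly high-frequency derived sites.
print(tajimas_d([1, 1, 1, 2, 1], 10), fay_wu_h([9, 8, 9, 2], 10))
```

Both values come out negative for these toy inputs, mirroring the combination of skews discussed above for J. convallium.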

MFDM TEST

The MFDM test was significant (P < 0.05) at three loci, CC1333, Pgi, and CC0822, all in J. przewalskii (Table 2). In addition, marginally nonsignificant P-values (0.05 < P < 0.07) were obtained for Pgi and CC0702 in J. convallium and for Pgi in J. saltuaria (Table 2). Interestingly, synonymous mutations were found at the same Pgi site in both J. convallium and J. przewalskii, together with three nonsynonymous mutations in its vicinity (at positions 192, 239, and 345). Since migration may also cause unbalanced trees, we used a migration detector (MD) to evaluate this possibility (Li 2011). For each locus in each species, we arbitrarily picked one individual from another species as the MD; because there were three other species, this yielded three MDs and hence three analyses per locus. All of these analyses indicated that migration was not responsible for the unbalanced trees (File S2).

Table 2.  Bayes factors (BFs) and results of posterior simulations over all loci and for individual loci of the Approximate Bayesian Computation analysis. The fourth column gives the number of loci with a posterior probability value less than 0.05 for at least one summary statistic. The last column gives the results of the maximum frequency of derived mutations (MFDM) test. Loci whose MFDM P-value was less than 0.05 are indicated, as are loci that departed in the posterior simulations and had MFDM P-values close to 0.05.

Species and model   BF     Posterior simulations   Posterior simulations for       MFDM test
                           over all loci           individual loci
J. convallium
  SNM               1      Tajima's D < 0.05       6
  PEM               6.44   OK                      6
  BNM               6.30   OK                      2 (Pgi, CC0702)                 Pgi (P = 0.057), CC0702 (P = 0.068)
J. przewalskii
  SNM               1      OK                      2
  PEM               0.10   OK                      5
  BNM               5.00   OK                      4 (Pgi, CC0822, CC2241, HemA)   Pgi (P = 0.032), CC1333 (P = 0.038), CC0822 (P = 0.041)
J. saltuaria
  SNM               1      Tajima's D < 0.05       2
  PEM               19.0   OK                      3 (Pgi, CC2196, HemA)           Pgi (P = 0.060)
  BNM               6.00   OK                      4
J. tibetica
  SNM               1      OK                      2 (CC0702, CC2241)
  PEM               1.00   OK                      All
  BNM               2.00   OK                      All

ABC FOR INDIVIDUAL SPECIES

Based on the model-selection criteria described in Materials and Methods, the BNM offered the best fit to the data for J. convallium and J. przewalskii, the PEM for J. saltuaria, and the SNM for J. tibetica. The strongest evidence was obtained for the PEM in J. saltuaria (BF = 19) (Table 2).
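
The three model-selection criteria can be written as an explicit rule, which makes it easier to see how the choices above follow from Table 2. The Python sketch below encodes one possible reading of the rule: an alternative to the SNM is retained only if its BF exceeds 3.2 and its posterior simulation over all loci reproduces the observed summaries; among such models, the one with the fewest departing loci is chosen; otherwise the simplest model is kept. Treating “All” in Table 2 as all 13 loci is an assumption, and the names are hypothetical.

```python
SUBSTANTIAL_BF = 3.2  # Kass and Raftery (1995) threshold used in the text
STRONG_BF = 10.0
ALL_LOCI = 13  # assumption: "All" in Table 2 taken to mean every analysed locus


def choose_model(results: dict[str, tuple[float, bool, int]]) -> tuple[str, str]:
    """Pick a model from {name: (BF vs SNM, fits over all loci, loci departing)}."""
    passing = {
        name: (bf, departing)
        for name, (bf, fits, departing) in results.items()
        if name != "SNM" and bf > SUBSTANTIAL_BF and fits
    }
    if not passing:
        return "SNM", "simplest model retained"
    best = min(passing, key=lambda name: passing[name][1])  # fewest departing loci
    strength = "strong" if passing[best][0] > STRONG_BF else "substantial"
    return best, strength


# Values transcribed from Table 2; "OK" is encoded as True.
TABLE_2 = {
    "J. convallium": {"SNM": (1.0, False, 6), "PEM": (6.44, True, 6), "BNM": (6.30, True, 2)},
    "J. przewalskii": {"SNM": (1.0, True, 2), "PEM": (0.10, True, 5), "BNM": (5.00, True, 4)},
    "J. saltuaria": {"SNM": (1.0, False, 2), "PEM": (19.0, True, 3), "BNM": (6.00, True, 4)},
    "J. tibetica": {
        "SNM": (1.0, True, 2),
        "PEM": (1.00, True, ALL_LOCI),
        "BNM": (2.00, True, ALL_LOCI),
    },
}
for species, results in TABLE_2.items():
    print(species, choose_model(results))
```

Under this reading, the rule reproduces the models reported above, including the preference for the BNM over the PEM in J. convallium despite the PEM's slightly higher BF.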

Posterior predictive simulations for individual loci indicated that Pgi departs from the best-fitting model in three of the four species, and CC0702 and CC2241 each depart in two species (Table 2). Interestingly, MFDM P-values for Pgi in these three species were below or only slightly above 0.05.

ISOLATION WITH MIGRATION MODELS

As in other conifer species (Chen et al. 2010; Li et al. 2010), shared polymorphisms among the investigated juniper species are extensive, whereas fixed differences are rare (Table S5). Repeated IMa2 runs yielded unambiguous marginal posterior probability distributions of the demographic parameters for all six species pairs (Table 3). To facilitate interpretation, we converted the parameter estimates to numbers of effective individuals or years using the average mutation rate estimated across all loci (0.194 × 10⁻⁹ per site per year). Encouragingly, the estimates of effective population size for three of the species (J. saltuaria, J. convallium, and J. przewalskii) were consistent across the pairwise analyses, suggesting that the IM model captures some general features of the population histories of these species. However, we failed to obtain a stable estimate of the population size of J. tibetica across pairwise comparisons (Fig. 4, Table 3). Juniperus saltuaria has the largest estimated effective population size of the four species, around 134,000–183,000, which is three to five times larger than its estimated ancestral population size, indicating that J. saltuaria has expanded dramatically, in accordance with the results of the ABC analysis. Juniperus convallium and J. przewalskii have effective population sizes of the same order (27,700–43,600 and 45,300–51,300, respectively), slightly smaller than those of their ancestral populations; hence there is evidence that both species have been subject to weak bottlenecks, again in agreement with the results of the ABC analysis and our previous study of J. przewalskii (Li et al. 2011).
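
The conversions from IMa2's scaled parameters to individuals, years, and population migration rates can be checked with a short Python sketch. It assumes that θ = 4Nμ and t = Tμ, with μ the per-locus mutation rate (per generation for θ, per year once multiplied out for t), and that the 50-year generation time applies. Because the per-locus rate is not reported directly, the example back-calculates it from one row of Table 3 (θ1 = 1.123 with N1 = 1.22 × 10⁵ for J. tibetica/J. convallium), an inference made only for illustration, and then checks that other entries in that row are reproduced. Function names are hypothetical.

```python
GENERATION_TIME_YEARS = 50  # assumed generation time (see Materials and Methods)


def theta_to_ne(theta: float, mu_per_generation: float) -> float:
    """Effective size from theta = 4 * Ne * mu, with mu per locus per generation."""
    return theta / (4.0 * mu_per_generation)


def t_to_years(t: float, mu_per_year: float) -> float:
    """Divergence time from t = T * mu, with mu per locus per year."""
    return t / mu_per_year


def population_migration_rate(theta: float, m: float) -> float:
    """2Nm = theta * m / 2, as defined in the notes to Table 3."""
    return theta * m / 2.0


# Back-calculated per-locus rate implied by the J. tibetica/J. convallium row (illustration only).
mu_gen = 1.123 / (4 * 1.22e5)
mu_year = mu_gen / GENERATION_TIME_YEARS
print(theta_to_ne(0.301, mu_gen))               # compare N2 = 3.27e4
print(t_to_years(0.035, mu_year))               # compare T = 7.59e5 years
print(population_migration_rate(1.123, 4.020))  # compare 2N1m1 = 2.257
```

Since a single geometric-mean rate scales every comparison, the same back-calculated rate should also reproduce the divergence times in the other rows, which is a quick consistency check on the table.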

Table 3.  Maximum-likelihood estimates (MLE) and the 95% highest posterior density (HPD) intervals of demographic parameters from pairwise IMa2 multilocus analyses.
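| Comparison | Estimate | θ1 | θ2 | θA | m1 | m2 | t | N1 | N2 | NA | T (years) | 2N1m1 | 2N2m2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| J. tibetica/J. convallium | MLE | 1.123 | 0.301 | 0.385 | 4.020 | 2.540 | 0.035 | 1.22×10⁵ | 3.27×10⁴ | 4.18×10⁴ | 7.59×10⁵ | 2.257 | 0.382 |
| | HPD95Lo | 0.669 | 0.089 | 0.171 | 0 | 0 | 0.008 | 7.26×10⁴ | 9.65×10³ | 1.86×10⁴ | 1.74×10⁵ | | |
| | HPD95Hi | 1.999 | 1.039 | 0.759 | 36.42 | 34.46 | 0.195 | 2.17×10⁵ | 1.13×10⁵ | 8.23×10⁴ | 4.23×10⁶ | | |
| J. tibetica/J. saltuaria | MLE | 0.578 | 1.358 | 0.414 | 2.775 | 3.815 | 0.134 | 6.27×10⁴ | 1.47×10⁵ | 4.49×10⁴ | 2.91×10⁶ | 0.802 | 2.590 |
| | HPD95Lo | 0.290 | 0.642 | 0.122 | 0.345 | 0.785 | 0.052 | 3.15×10⁴ | 6.96×10⁴ | 1.32×10⁴ | 1.13×10⁶ | | |
| | HPD95Hi | 1.110 | 3.146 | 0.834 | 9.445 | 9.765 | 0.298 | 1.20×10⁵ | 3.41×10⁵ | 9.05×10⁴ | 6.46×10⁶ | | |
| J. tibetica/J. przewalskii | MLE | 0.822 | 0.418 | 0.578 | 0.155 | 5.115 | 0.072 | 8.92×10⁴ | 4.53×10⁴ | 6.27×10⁴ | 1.56×10⁶ | 0.064 | 1.069 |
| | HPD95Lo | 0.426 | 0.218 | 0.250 | 0 | 1.285 | 0.021 | 4.62×10⁴ | 2.36×10⁴ | 2.71×10⁴ | 4.56×10⁵ | | |
| | HPD95Hi | 1.822 | 0.770 | 1.066 | 6.155 | 9.995 | 0.328 | 1.98×10⁵ | 8.35×10⁴ | 1.16×10⁵ | 7.12×10⁶ | | |
| J. convallium/J. saltuaria | MLE | 0.255 | 1.691 | 0.357 | 5.485 | 3.045 | 0.133 | 2.77×10⁴ | 1.83×10⁵ | 3.87×10⁴ | 2.89×10⁶ | 0.699 | 2.575 |
| | HPD95Lo | 0.125 | 0.959 | 0.023 | 1.695 | 0 | 0.029 | 1.36×10⁴ | 1.04×10⁵ | 2.50×10³ | 6.29×10⁵ | | |
| | HPD95Hi | 0.531 | 1.999 | 0.863 | 9.995 | 8.875 | 0.531 | 5.76×10⁴ | 2.17×10⁵ | 9.36×10⁴ | 1.15×10⁷ | | |
| J. convallium/J. przewalskii | MLE | 0.402 | 0.446 | 0.486 | 1.198 | 1.790 | 0.076 | 4.36×10⁴ | 4.84×10⁴ | 5.27×10⁴ | 1.65×10⁶ | 0.241 | 0.399 |
| | HPD95Lo | 0.194 | 0.238 | 0.170 | 0 | 0.222 | 0.029 | 2.10×10⁴ | 2.58×10⁴ | 1.84×10⁴ | 6.29×10⁵ | | |
| | HPD95Hi | 0.894 | 0.790 | 1.010 | 3.646 | 3.974 | 0.273 | 9.70×10⁴ | 8.57×10⁴ | 1.10×10⁵ | 5.92×10⁶ | | |
| J. saltuaria/J. przewalskii | MLE | 1.238 | 0.473 | 0.408 | 1.805 | 0.825 | 0.151 | 1.34×10⁵ | 5.13×10⁴ | 4.43×10⁴ | 3.28×10⁶ | 1.117 | 0.195 |
| | HPD95Lo | 0.588 | 0.228 | 0.008 | 0 | 0 | 0.043 | 6.38×10⁴ | 2.47×10⁴ | 8.67×10² | 9.33×10⁵ | | |
| | HPD95Hi | 2.763 | 0.853 | 1.192 | 7.025 | 6.255 | 1.997 | 3.00×10⁵ | 9.25×10⁴ | 1.29×10⁵ | 4.33×10⁷ | | |

  1. Note: all estimates are scaled by the per-gene mutation rate (μ), which is the geometric mean of the mutation rates of all loci.

  2. θ1 = population mutation rate of the first species, a scaled measure of its effective population size (N1, in individuals).

  3. θ2 = population mutation rate of the second species (N2 = its effective population size in individuals).

  4. θA = population mutation rate of the ancestral population (NA = its effective population size in individuals).

  5. m1 = migration rate from the first to the second species backward in time (coalescent direction), i.e., gene flow into the first species forward in time.

  6. m2 = migration rate from the second to the first species backward in time, i.e., gene flow into the second species forward in time.

  7. t = scaled time since species divergence; T = the same time in years.

  8. Population migration rates (2N1m1, 2N2m2) are calculated as 2Nm = θ × m/2.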
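The last two columns of Table 3 follow from the footnote formula 2Nm = θ × m/2. As a consistency check, the sketch below recomputes them from the MLE values of θ and m, pairing θ1 with m1 and θ2 with m2. It reports any species pair whose recomputed value differs from the tabulated one by more than rounding. The dictionary layout and function names are hypothetical.

```python
# MLE values from Table 3: (theta1, theta2, m1, m2, reported 2N1m1, reported 2N2m2)
TABLE3_MLE = {
    "J. tibetica/J. convallium":    (1.123, 0.301, 4.020, 2.540, 2.257, 0.382),
    "J. tibetica/J. saltuaria":     (0.578, 1.358, 2.775, 3.815, 0.802, 2.590),
    "J. tibetica/J. przewalskii":   (0.822, 0.418, 0.155, 5.115, 0.064, 1.069),
    "J. convallium/J. saltuaria":   (0.255, 1.691, 5.485, 3.045, 0.699, 2.575),
    "J. convallium/J. przewalskii": (0.402, 0.446, 1.198, 1.790, 0.241, 0.399),
    "J. saltuaria/J. przewalskii":  (1.238, 0.473, 1.805, 0.825, 1.117, 0.195),
}


def population_migration_rate(theta, m):
    """2Nm = theta * m / 2, with m scaled by the mutation rate."""
    return theta * m / 2.0


def check_table(rows, tol=0.001):
    """Return pairs whose recomputed 2Nm values differ from the table by more than tol."""
    mismatches = []
    for pair, (t1, t2, m1, m2, r1, r2) in rows.items():
        c1 = population_migration_rate(t1, m1)
        c2 = population_migration_rate(t2, m2)
        if abs(c1 - r1) > tol or abs(c2 - r2) > tol:
            mismatches.append((pair, round(c1, 3), round(c2, 3)))
    return mismatches


print(check_table(TABLE3_MLE))  # prints []
```

An empty list means every tabulated 2Nm value is reproduced to within 0.001.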
Figure 4.

Inferred histories for all six species pairs. Sampled and ancestral populations are shown as boxes, splitting times as horizontal lines, and migration as curved arrows. Time runs along the vertical axis of each panel, with the names of the sampled species given at the top, at the most recent time point. In all panels, gray arrows show the 95% highest posterior density intervals for population sizes (box widths) and splitting times (dotted lines). Only population migration rates found to be statistically significant by a likelihood-ratio test are shown; for these, the estimated value of 2Nm and its significance level are given. *P < 0.05, **P < 0.01.

The inferred population migration rates (2Nm) indicate that the simple strict isolation model should be rejected for four of the six species pairs (Fig. 4, Table 3). Indeed, nonzero gene flow between diverging populations was estimated in all comparisons, although the log-likelihood-ratio tests were not significant for two of them (J. convallium vs. J. przewalskii and J. tibetica vs. J. convallium). Effective gene flow values higher than unity are often regarded as sufficiently high to prevent population differentiation due to genetic drift (Moeller et al. 2007), and most of the pairwise comparisons revealed moderate-to-high levels of effective gene flow among species. Significant gene flow seems to have occurred in both directions between J. tibetica and J. saltuaria; otherwise, it seems to have been significantly asymmetric (Figs. 4 and 5). In particular, the pairwise comparisons indicate that gene flow has occurred predominantly from J. przewalskii toward J. saltuaria and from J. tibetica toward J. przewalskii. Encouragingly, and as previously noted by Hey (2010a) in his studies of chimpanzees, the results of the joint analysis of all four species were broadly consistent with those of the pairwise analyses. An advantage of the joint analysis is that it allows the detection of gene flow involving ancestral populations. In the present study, it indicated gene flow from J. przewalskii into the putative common ancestor of the three southwestern QTP species (Table S6).
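Because a migration rate cannot be negative, the null hypothesis m = 0 lies on the boundary of the parameter space. One standard way to handle this, assumed here because the exact test behind Fig. 4 is not described in this section, is to refer twice the log-likelihood ratio to a 50:50 mixture of a point mass at zero and a χ² distribution with one degree of freedom. The function name and the test statistics in the usage lines are hypothetical. They are not values from this study.

```python
import math


def boundary_lrt_p(two_delta_loglik):
    """P value for H0: m = 0 against m > 0 (illustrative sketch).

    Assumes the null distribution of 2 * (lnL_max - lnL_{m=0}) is a 50:50
    mixture of a point mass at 0 and a chi-square with 1 degree of freedom.
    """
    if two_delta_loglik <= 0:
        return 1.0  # no improvement in fit over m = 0
    # Upper tail of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(two_delta_loglik / 2.0))


# Hypothetical test statistics, for illustration only.
for stat in (1.0, 2.706, 5.412):
    print(f"2*dlnL = {stat}: P = {boundary_lrt_p(stat):.4f}")
```

Under this mixture, the 5% critical value of the statistic is about 2.71 rather than the 3.84 that a plain χ²₁ test would require, so ignoring the boundary makes the test conservative.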

Figure 5.

Joint IMa2 analysis of the four juniper species of the Qinghai–Tibetan Plateau: Juniperus tibetica, J. convallium, J. saltuaria, and J. przewalskii. See Figure 4 for an explanation of the symbols.

Discussion

The overall pattern of nucleotide diversity in the juniper species studied here was similar to that observed in previous studies of conifer species, with low-to-moderate levels of nucleotide diversity and extensive shared polymorphism across species (e.g., Kado et al. 2003; Bouillé and Bousquet 2005; Ma et al. 2006; Syring et al. 2007; Willyard et al. 2009; Chen et al. 2010; Li et al. 2010). Because we sampled only populations with typical morphological characteristics, nucleotide diversity per species is likely slightly underestimated. Likewise, migration rates are likely to be underestimated and divergence times between species overestimated. However, since the same sampling scheme was applied to all four species, the general pattern should still be informative about the history of the species. Our results suggest that much is to be gained by analyzing the species both jointly and separately to characterize their specific population dynamics. Together, these analyses underscore the complex evolutionary histories of closely related species in the QTP biodiversity hotspot.

DEMOGRAPHY AND INTROGRESSION

In a previous study, Opgenoorth et al. (2010) analyzed cpDNA variation and geographical distribution in the J. tibetica group and in two additional Himalayan juniper species (J. indica and J. microsperma), and suggested that current populations “are remnants of a former interstadial forest that were fragmented during the last LGM and that experienced postglacial local expansions before again experiencing fragmentation and marginalization as a result of anthropogenic influence as well as desiccation.” Because the species are difficult to separate and putative hybrids are present, these authors chose to analyze the five species as a single group. At first glance, our data might seem to justify this decision: we also found no clear grouping of the species in the Structure analysis, and shared polymorphisms were extensive, most of them ancestral, although some appear to reflect more recent gene flow. However, our results also suggest that it is important to consider the species separately. Indeed, both the ABC and IM analyses indicated that the four species have had different demographic histories, with evidence of population expansion in J. saltuaria, weak support for bottlenecks in J. convallium and J. przewalskii, and no significant departure from the SNM in J. tibetica. It should also be noted that, although the bottlenecks detected in J. convallium and J. przewalskii could be an artifact of ancient gene flow from other species (which would mimic ancestral population structure), gene flow from other species is unlikely to explain the evidence for population expansion in J. saltuaria (Peter et al. 2010). The IM analysis highlighted the importance of asymmetric gene flow among the species and provided further support for the prediction of Currat et al. (2008) that, when one species invades an area already occupied by a related species, introgression of neutral genes takes place mainly from the local species into the invader. We would thus expect expanding species to be more affected by introgression than stable species. Hence, assuming that the evidence for population expansion in the ABC analysis reflects a range expansion, we would expect gene flow to occur more frequently into J. saltuaria from the other species than vice versa. Accordingly, in all three comparisons involving J. saltuaria, the estimated gene flow was greater toward J. saltuaria than away from it, and in two of them (with J. tibetica and J. przewalskii) this trend was significant (Fig. 4, Table 3). The joint IMa2 analysis of all four species showed the same pattern of migration (Fig. 5). Note, however, that since we avoided putative hybrid populations, further studies focusing on areas with mixed populations will be needed to confirm these findings.
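The directional comparison for J. saltuaria can be tabulated directly from the MLE columns of Table 3. The sketch assumes that 2N1m1 measures gene flow into the first-listed species and 2N2m2 gene flow into the second, forward in time. This is the reading under which Table 3 agrees with the directions stated in the text. The dictionary layout and function name are hypothetical.

```python
# Forward-in-time population migration rates (MLE, Table 3), keyed by
# (source, recipient). Assumes 2N1m1 is gene flow INTO the first-listed species.
TWO_NM_INTO = {
    ("J. tibetica", "J. saltuaria"): 2.590,
    ("J. saltuaria", "J. tibetica"): 0.802,
    ("J. convallium", "J. saltuaria"): 2.575,
    ("J. saltuaria", "J. convallium"): 0.699,
    ("J. przewalskii", "J. saltuaria"): 1.117,
    ("J. saltuaria", "J. przewalskii"): 0.195,
}


def inflow_vs_outflow(focal, rates):
    """For each partner of `focal`, return (partner, into_focal, out_of_focal)."""
    partners = {src for (src, dst) in rates if dst == focal}
    return sorted((p, rates[(p, focal)], rates[(focal, p)]) for p in partners)


for partner, into, out in inflow_vs_outflow("J. saltuaria", TWO_NM_INTO):
    print(f"{partner}: into J. saltuaria {into:.3f}, out of J. saltuaria {out:.3f}")
```

Under this reading, inflow exceeds outflow for every partner, which is the pattern the Currat et al. (2008) prediction leads one to expect for an expanding species.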

GENE FLOW AND SPECIATION

As in many previous IM studies of closely related species (Nosil 2008; Nadachowska and Babik 2009; Ross-Ibarra et al. 2009; Carling et al. 2010; Hey 2010a; Zheng and Ge 2010), we detected evidence of gene flow between the examined species. Such findings have been used to question the prevalence of allopatric speciation (e.g., Nosil 2008). However, there are several reasons to be cautious before accepting parapatric or sympatric speciation as a new paradigm. First, recent genome-wide re-evaluation of the extent of gene flow in Anopheles gambiae (Turner et al. 2005; Lawniczak et al. 2010) uncovered much more divergence, and consequently less inferred gene flow, than previous studies based on more limited data. Second, computer simulations (Becquet and Przeworski 2009) indicate that results from IM analyses, especially those based on a limited number of loci, should be interpreted with caution. In particular, where gene flow can be inferred, an important question is whether it reflects secondary contact and admixture after a long period of isolation or has been more continuous between the diverging species. For the four juniper species studied here, gene flow estimates from four of the six pairwise comparisons were statistically significant. We therefore also estimated the posterior probability density of the mean time of migration events across all loci. None of these time distributions had a clear mode, and all indicated recent migration events and, possibly, some more ancient ones (Fig. S6). Note, however, that gene flow was detected as an excess of variance in coalescence time among loci relative to that expected under neutral allopatric speciation, whereas gene flow early in the speciation process may not detectably increase this variance (Becquet and Przeworski 2009). The analysis is therefore likely to be biased toward detecting recent gene flow.
Nevertheless, the joint IMa2 analysis of the four species indicated significant historical gene flow between J. przewalskii and the common ancestor of the three southwestern QTP species (Fig. 5), suggesting that gene flow may indeed have occurred during their initial divergence. In summary, there seems to be evidence of gene flow during the early stages of speciation, although a larger number of loci should be examined to confirm this conclusion. Furthermore, if gene flow did occur during the early stages of speciation, elucidating this process will require identifying nonrecombining regions of the genome that harbor reproductive isolation factors. Detecting such factors in conifers is likely to be difficult, as conifer genomes appear to be composed of small, gene-rich, highly recombining regions surrounded by large nonrecombining regions that mostly contain repetitive elements (H. Tachida, pers. comm.).

DIVERGENCE TIMES

As in previous conifer studies (Chen et al. 2010; Li et al. 2010), our IM analyses suggest that the examined species diverged very recently. The shortest divergence time was only 759,000 years (between J. tibetica and J. convallium) and the longest 3,280,000 years (between J. saltuaria and J. przewalskii); assuming a generation time of 50 years, these correspond to about 15,000 and 66,000 generations, respectively. These divergence times are even shorter than those observed among closely related spruce species from the QTP (Li et al. 2010) and among ponderosa pines (Willyard et al. 2007, 2009). For the latter, Willyard et al. (2009) proposed a stem-and-crown divergence model, with the genus originating some 15 Mya and speciation occurring some 5 Mya. A similarly accelerated speciation was proposed by Mao et al. (2010) for Juniperus. Interestingly, they estimated that the most ancient split among the four species analyzed here occurred within the past 20 million years, and the other splits much more recently. Given the high uncertainty attached to all these time estimates, especially the deeper ones (Fig. 3 in Mao et al. 2010 and Fig. 5 in the present study), and the fact that the genealogies inferred from cpDNA and nrDNA have different topologies, we can conclude only that the time estimates of Mao et al. (2010) fall within the same range as ours. The QTP underwent several extensive uplifts ca. 20, 15, 8, and 3.6 Mya (Harrison et al. 1992; Shi et al. 1998; Chung et al. 1998; Guo et al. 2002; Spicer et al. 2003), and speciation could well have been a consequence of these events. The ongoing speciation observed in both spruce and junipers may thus reflect the recent occurrence of these major geological events and the ensuing profound changes in ecological conditions.
Nonetheless, the fact that the same pattern of recent diversification seems to hold across different genera on different continents suggests that conifers share some common extrinsic or intrinsic features (or both) that were conducive to the observed pattern of a long period of apparent stasis followed by a burst of speciation.
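The divergence times and generation counts discussed above follow from two simple conversions. IMa2 splitting times t, scaled by the mutation rate, are divided by the per-locus mutation rate u to obtain years, and years are divided by the generation time to obtain generations. In the sketch, u ≈ 4.61 × 10⁻⁸ per locus per year is an assumption back-calculated from Table 3 (scaled t divided by T in years), not a value stated in the text. The function names are hypothetical.

```python
U_PER_LOCUS_PER_YEAR = 4.61e-8  # hypothetical, back-calculated from Table 3 (t / T)
GENERATION_TIME_YEARS = 50      # generation time assumed in the text


def scaled_time_to_years(t, u=U_PER_LOCUS_PER_YEAR):
    """IMa2 splitting time t is scaled by the mutation rate: T (years) = t / u."""
    return t / u


def years_to_generations(years, g=GENERATION_TIME_YEARS):
    """Number of generations elapsed for a given time in years."""
    return years / g


# Shortest and longest MLE splitting times from Table 3.
for pair, t in [("J. tibetica/J. convallium", 0.035),
                ("J. saltuaria/J. przewalskii", 0.151)]:
    years = scaled_time_to_years(t)
    print(f"{pair}: ~{years:,.0f} years, ~{years_to_generations(years):,.0f} generations")
```

Because the tabulated t values are rounded to three decimals, the recomputed years agree with Table 3 only to about three significant figures.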

DETECTING SELECTION

Distinguishing the signatures of recent selection from those of demographic events remains a major challenge in population genetics. In the present study, we combined two approaches based on different principles to jointly estimate demographic parameters and identify loci under selection. In posterior predictive simulations, loci under selection are detected as those that depart significantly from the distribution of summary statistics obtained from coalescent simulations based on the demographic parameters estimated in an initial ABC analysis; this approach thus attempts to correct for demographic history. The rationale of the MFDM test, in contrast, is to avoid the effects of demographic processes by focusing on the topology of the gene genealogy, which in theory should not be affected by demographic events such as population expansions or bottlenecks (Li 2011). Encouragingly, the sets of loci detected by the two methods overlapped. The first approach detected more departures than the second, which is perhaps not surprising as we confined our analysis to fairly simple demographic models. This also suggests that the way the MFDM method accounts for possible gene flow does help to minimize the number of false positives. Our results imply that combining ABC analysis and posterior predictive simulations with tests similar to the MFDM test may provide a useful way both to estimate demographic parameters and to detect loci under selection. One of the loci identified by both methods was the metabolic gene Pgi, which encodes phosphoglucose isomerase. Interestingly, Pgi has been identified as being under balancing selection in the Glanville fritillary butterfly (Wheat et al. 2010) and in Leavenworthia (Filatov and Charlesworth 1999), and as affecting fitness in other species (Wheat et al. 2010 and references therein). Indirect evidence of natural selection on Pgi in several conifer species has also been reported (e.g., Bergmann and Mejnartowicz 2000).
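A simplified sketch of the idea behind the MFDM test follows. Under the standard neutral coalescent, the number of sampled sequences descending from one of the two branches at the root of a genealogy is uniformly distributed on 1 to n − 1. Changes in population size alter branch lengths but not this topological distribution. Every mutation lies within one root subtree, so the largest derived-allele count k at a locus can never exceed the size of the larger root subtree. Under that uniform distribution, P(larger root subtree ≥ k) = min(1, 2(n − k)/(n − 1)), which gives a conservative P value. This is an illustration under those assumptions, not the published implementation of Li (2011), which among other things also accounts for possible gene flow. The function name and the example counts are hypothetical.

```python
def mfdm_like_p(derived_counts, n):
    """Simplified MFDM-style P value (illustrative sketch, not Li's implementation).

    derived_counts -- derived-allele counts at segregating sites (alleles must be
                      polarized with an outgroup); each must lie in 1..n-1
    n              -- number of sampled sequences
    """
    if n < 2:
        raise ValueError("need at least two sequences")
    if not derived_counts:
        return 1.0  # no segregating sites: no evidence against neutrality
    k = max(derived_counts)
    if not 1 <= k <= n - 1:
        raise ValueError("derived counts must lie between 1 and n - 1")
    # Of the n - 1 equally likely root splits, 2 * (n - k) give a larger
    # subtree of size >= k when k > n / 2; otherwise every split does.
    return min(1.0, 2.0 * (n - k) / (n - 1))


# Hypothetical counts for a sample of 42 sequences.
print(mfdm_like_p([3, 12, 41], 42))
```

Note that the smallest attainable P value is 2/(n − 1), so significance at the 5% level requires a sample of more than 41 sequences under this simplified version.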
Pgi was one of the most variable loci examined here, and its peak variability was located around the site at which MFDM detected selection (Fig. S7). However, its Tajima's D value was negative, so the gene does not seem to be under balancing selection. Interestingly, the MFDM method detected selection at the same site (Site 271) in two species (J. przewalskii and J. convallium), suggesting that the departure from neutrality is indeed not due to demographic factors. Since variation at Pgi has been associated with responses to extreme temperatures (Rank et al. 2007) and low-oxygen environments (Riddoch 1993), it is a good candidate gene for further studies in high-altitude species. In particular, it would be interesting to test for altitudinal variation at this locus.
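Tajima's D, cited above for Pgi, contrasts two estimators of θ. Mean pairwise differences (π) give extra weight to intermediate-frequency variants, whereas Watterson's estimator (S/a1) simply counts segregating sites. Balancing selection tends to produce an excess of intermediate-frequency variants and hence a positive D, which is why a negative value argues against it here. The sketch below implements the standard formula for Tajima's D from per-site allele counts in a sample of n sequences. The input format and the toy counts in the usage lines are assumptions for illustration.

```python
import math


def tajimas_d(allele_counts, n):
    """Tajima's D from per-site allele counts in a sample of n sequences.

    allele_counts -- count of one allele (derived or minor) at each site;
                     monomorphic sites (count 0 or n) are ignored
    """
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    seg = [j for j in allele_counts if 0 < j < n]
    s = len(seg)
    if s == 0:
        return float("nan")  # undefined without segregating sites
    # Mean number of pairwise differences (pi), summed over sites.
    pi = sum(2.0 * j * (n - j) / (n * (n - 1)) for j in seg)
    # Constants of the standard variance formula.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy counts (not data from this study): excess singletons vs. intermediate variants.
print(tajimas_d([1, 1, 1, 1, 1], 10))
print(tajimas_d([5, 5, 5, 5, 5], 10))
```

The first toy locus, dominated by singletons, gives a negative D. The second, with only intermediate-frequency variants, gives a positive D.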

Associate Editor: J. Pannell

ACKNOWLEDGMENTS

The authors thank X. Tian, B. Tian, L. Chen, F. Ma, and M. Ji for collecting Juniperus seeds. This research was supported by grants from the National Natural Science Foundation of China (30725004 and 40972018), Key Innovation Project of Ministry of Education of China (707056), Key Project of International Collaboration Program by the Ministry of Science and Technology of China (2010DFB63500), the International Collaboration “111” Project to J. Liu, and the National Natural Science Foundation of China (41101058) to Z. Li. K. Lin and H. Li are supported by the Basic Research Program of China (973 project, No. 2012CB316505) and the National Natural Science Foundation of China (31172073). M. Lascoux and T. Källman are part of the Evoltree Network of Excellence. M. Lascoux would like to thank the Chinese Academy of Sciences for granting him a Visiting Professorship in 2010–2011 and H. Li for his hospitality in Shanghai.
