Aquilegia is a well-known model system in the field of evolutionary biology, but obtaining a resolved and well-supported phylogenetic reconstruction for the genus has been hindered by its recent and rapid diversification.
Here, we applied 454 next-generation sequencing to PCR amplicons of 21 of the most rapidly evolving regions of the plastome to generate c. 24 kb of sequences from each of 84 individuals from throughout the genus.
The resulting phylogeny resolves the main lineages of the genus with strong support, although recent diversification, such as that of the European taxa, remains unresolved. By producing a chronogram of the whole Ranunculaceae family based on published data, we inferred calibration points for dating the Aquilegia radiation. The genus originated in Eastern Asia in the upper Miocene, c. 6.9 million yr ago (Ma), and diversification began c. 4.8 Ma with the split of two main clades, one colonizing North America and the other colonizing Western Eurasia through the mountains of Central Asia. This was followed by a back-to-Asia migration from the European stock along a North Asian route.
These results provide the first backbone phylogeny and spatiotemporal reconstruction of the Aquilegia radiation, and constitute a robust framework for addressing the adaptive nature of speciation within the group.
The genus Aquilegia (Ranunculaceae) includes c. 70 species distributed in the temperate zones of the northern hemisphere, with similar numbers of described taxa occurring in North America, Asia and Europe (Munz, 1946). In recent years, the genus has emerged as a model system for the study of plant evolution and development with extensive genetic and genomic resources (reviewed in Kramer, 2009; Kramer & Hodges, 2010), including the recent release of the Aquilegia coerulea Goldsmith whole-genome sequence (http://www.phytozome.net). Because of its phylogenetic position between the core eudicots and monocots, genomic analyses from the genus promise to shed light on functional changes that occurred during angiosperm diversification (Fang et al., 2010), as well as on angiosperm genome structure before the whole-genome duplication at the base of the core eudicots (De Bodt et al., 2005). Furthermore, Aquilegia possesses unusual floral organs such as petaloid sepals, the staminodium, and petals with a nectar spur, which allow the study of evolutionary novelties (Kramer et al., 2007; Voelckel et al., 2010). In particular, the nectar spur has been interpreted as a ‘key innovation’ that promoted a rapid and recent species diversification within the genus (Hodges & Arnold, 1995; Hodges, 1997). In North American taxa, variation in spur color and length has been shown to be adaptive for different pollinators, and the evolution of both traits is correlated with shifts in pollination syndrome and speciation (Whittall & Hodges, 2007; Hodges & Derieg, 2009).
Despite the attention that Aquilegia has received from evolutionary biologists, the available phylogeny of the entire genus is poorly resolved and currently unsuitable for generating and testing hypotheses on the evolution of the group. Phylogenetic studies based on nuclear internal transcribed spacer (ITS) sequences and a small number of cpDNA regions proved unable to resolve infrageneric relationships as a result of low nucleotide variation among taxa (Hodges & Arnold, 1994, 1995; Bastida et al., 2010). On the other hand, work utilizing amplified fragment length polymorphisms (AFLPs) has resolved a species-level phylogeny of the New World Aquilegia (Whittall & Hodges, 2007), but expanding this study to include European and Asian species would be problematic because of the anonymous nature of these markers.
Phylogenies for rapid radiations are notoriously difficult to resolve, because the short timespan between cladogenetic events allows few synapomorphies to accumulate along internodal branches. As a result, the phylogenetic signal retrieved from the data may be insufficient to overcome inconsistency caused by homoplasious changes and/or the stochastic process of coalescence underlying lineage sorting (Fishbein et al., 2001; Rannala & Yang, 2008; Whitfield & Kjer, 2008). Recently, several lines of evidence have demonstrated that using sufficiently large amounts of chloroplast sequence data is a powerful way to ameliorate poor resolution of phylogenies, such as those resulting from rapid radiations (e.g. Erixon & Oxelman, 2008; Jian et al., 2008; Parks et al., 2009, 2012; Wang et al., 2009a; Lin et al., 2010; Arakaki et al., 2011; Zhang et al., 2011; Njuguna et al., 2013). At low taxonomic levels, phylogenetic utility mostly relies on rapidly evolving, noncoding regions that accumulate structural and point mutations during population-level processes, such as those underlying species divergence (Shaw et al., 2005, 2007; Gao et al., 2010). In the absence of hybridization (Takahata & Slatkin, 1984), these markers are expected to provide a neutral phylogenetic signal largely unaffected by selective pressure (Massey et al., 2008), and the genetic properties of organellar DNA allow faster fixation of allelic variants and reciprocal monophyly among lineages compared with nuclear markers (Birky et al., 1983).
The major challenge in reconstructing cpDNA phylogenies of rapid radiations has been the need to collect a large number of characters from a large number of taxa. In the case of Aquilegia, the global distribution of the genus and the remote habitats where many species occur (Munz, 1946) make it especially demanding to collect fresh samples. Moreover, species of Aquilegia are well known to hybridize (Taylor, 1967), especially in garden conditions, making botanic garden samples suspect if they are not immediate descendants of field-collected seed. In this context, herbarium collections constitute an invaluable resource of genomic material with documented origin and identification (Pleijel et al., 2008). This resource becomes even more readily exploitable by the implementation of next-generation sequencing (NGS) technologies, which offer the additional advantage of being able to tackle the problem of character sampling by providing extremely high throughput and low cost per base (e.g. Metzker, 2010; Bybee et al., 2011; Griffin et al., 2011; Steele & Pires, 2011; Steele et al., 2012; Straub et al., 2012). In fact, the degraded DNA that often characterizes herbarium samples (Staats et al., 2011) can be sequenced very efficiently with current NGS platforms characterized by short reads.
In this study, we applied 454 technology to obtain a c. 24 kb data set from the chloroplast genome of a large taxon sampling of Aquilegia, including both fresh and herbarium material. With this work, we aimed to test whether and to what degree the use of a large portion of the most rapidly evolving intergenic regions of the chloroplast enabled the reconstruction of a resolved and well-supported phylogeny of Aquilegia; and to use the unprecedented resolution of the main lineages of Aquilegia to shed light on the evolutionary mode and tempo of the early diversification of the genus in order to support further genetic and evolutionary studies in this model system.
Materials and Methods
We sampled 84 individuals from 62 recognized species and subspecific entities of Aquilegia using only field-collected leaf material. Wild-collected herbarium specimens provided leaf tissue for all American and Asian taxa, and supplemented the sampling of European accessions (Supporting Information, Table S1). This represents c. 80% of currently recognized taxa, including a nearly complete sampling of the Eurasian diversity and an extensive sampling of North American taxa, for which relationships have already been investigated (Whittall & Hodges, 2007). We included multiple accessions from separate populations for some of the widespread European taxa (i.e. Aquilegia alpina, Aquilegia atrata, Aquilegia ottonis, Aquilegia vulgaris), as well as a large sampling of two case studies to investigate genetic structure within and among taxonomic units, that is, Aquilegia bertolonii s.l. (recently recircumscribed into A. bertolonii s. str., Aquilegia reuterii and Aquilegia iulia; Pignatti, 2013) and Aquilegia thalictrifolia. For outgroup rooting, Semiaquilegia, Isopyrum, Paropyrum, Paraquilegia and Thalictrum were chosen following the results of Wang et al. (2009b).
Generation of NGS data and assembly quality control
Polymerase chain reaction amplification was performed on 20 noncoding regions, which contained 21 intergenic spacers and three introns from the single-copy portion of the chloroplast genome (Shaw et al., 2005, 2007). We also sequenced a portion of the matK coding sequence, bringing the total to 21 regions. A set of 84 primers was selected for amplification: 47 derived from previous studies and 37 newly designed to optimize amplicon length for 454 sequencing. Amplification was carried out according to the ‘rpl16’ program of Shaw et al. (2005). A complete list of primer sequences, annealing temperatures and amplicon lengths is presented in Table S2.
Sequencing of the PCR products was performed by 454 pyrosequencing. Briefly, the PCR products from each sample were divided into two pools, one with amplicons < 600 bp (short pool, SP) and the other with amplicons > 600 bp (long pool, LP). In total, 178 equimolar pools (89 SPs and 89 LPs) were generated. Each pool was used for library preparation for the GS FLX Titanium instrument (Roche 454 Life Sciences) according to Meyer et al. (2008). Modifications to the original protocol and further details are provided in Supporting Information, Methods S1.
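Equimolar pooling of amplicons that differ in length requires converting mass concentrations into molar ones. The sketch below assumes the common approximation of c. 660 g mol⁻¹ per base pair of double-stranded DNA; the function names, amplicon names, concentrations and target amount are hypothetical, and it does not reproduce the protocol detailed in Methods S1.

```python
# Average mass of one base pair of double-stranded DNA (g/mol). This is a
# common approximation, assumed here rather than taken from Methods S1.
BP_MASS = 660.0

def pmol_per_ul(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA concentration from ng/µl to pmol/µl."""
    return conc_ng_per_ul * 1000.0 / (length_bp * BP_MASS)

def pooling_volumes(amplicons, target_pmol):
    """Volume (µl) of each amplicon needed to contribute `target_pmol` pmol.

    `amplicons` maps a hypothetical amplicon name to a tuple of
    (concentration in ng/µl, length in bp).
    """
    return {
        name: target_pmol / pmol_per_ul(conc, length)
        for name, (conc, length) in amplicons.items()
    }

# Two hypothetical amplicons destined for a short pool (< 600 bp).
short_pool = {"regionA": (20.0, 400), "regionB": (20.0, 550)}
print(pooling_volumes(short_pool, target_pmol=0.5))
```

At equal mass concentration, a longer amplicon needs proportionally more volume to contribute the same number of molecules.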
Assemblies were produced using the GS Amplicon Variant Analyzer (AVA) application version 2.3 (Roche; Methods S2), and a custom Perl script was used to calculate the per-site depth of coverage (dC), as well as the median and mean dC across each assembly (Figs 1, S2, S3, Methods S2).
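Per-site depth of coverage can be computed from read placements with a difference array. The sketch below is written in Python rather than Perl. It assumes that each read is given as a 0-based, half-open (start, end) interval on the amplicon. That input format is hypothetical and does not reproduce the AVA output parsed by the original script.

```python
from statistics import mean, median

def per_site_depth(read_spans, amplicon_length):
    """Return the depth of coverage (dC) at each site of an amplicon.

    read_spans: iterable of (start, end) pairs, 0-based and half-open.
    """
    diff = [0] * (amplicon_length + 1)
    for start, end in read_spans:
        diff[max(start, 0)] += 1               # read begins covering here
        diff[min(end, amplicon_length)] -= 1   # read stops covering here
    depth, running = [], 0
    for site in range(amplicon_length):
        running += diff[site]
        depth.append(running)
    return depth

reads = [(0, 5), (2, 8), (4, 10)]
dc = per_site_depth(reads, 10)
print(dc, mean(dc), median(dc))  # [1, 1, 2, 2, 3, 2, 2, 2, 1, 1] 1.7 2.0
```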
Sequences were aligned using MUSCLE (Edgar, 2004) as implemented in Geneious v5.3 (Drummond et al., 2011) with default settings. Alignments of indel-rich regions, such as mononucleotide stretches, were adjusted manually. The regions trnT(GGU)–psbD and trnC(GCA)–rpoB contained two inversions and one inversion, respectively, which were corrected and aligned. All sequences have been deposited in GenBank under accession numbers KC288536–KC290399.
To account for the various error sources that could lead to misidentification of variants and affect the reliability of the data used for the phylogenetic and dating analyses, we implemented a statistical framework to test the probability of a consensus variant miscall, P_false, based on the frequency of the alternative bases in the assembly (Methods S2). This approach has the additional advantage of reducing the type I and type II errors associated with the choice of subjective thresholds (e.g. minimum dC or minimum frequency of a base in the assembly).
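The exact model is given in Methods S2. As an illustration only, the sketch below assumes that a consensus call would be false if all of its supporting reads could have arisen from sequencing error. Each read is assumed to independently show the called base with an error probability f, which stands in for f_M(region; t). Under that assumption, P_false is a binomial upper-tail probability. We compute it in log space because values far below 10⁻³⁰⁰ would underflow a double-precision float. The function name and input values are hypothetical.

```python
import math

def log10_p_false(depth: int, consensus_count: int, error_rate: float) -> float:
    """log10 of P(X >= consensus_count) for X ~ Binomial(depth, error_rate).

    Hypothetical stand-in for P_false: the chance that at least
    `consensus_count` of `depth` reads show the called base by error alone.
    """
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must lie strictly between 0 and 1")
    if consensus_count <= 0:
        return 0.0  # probability 1
    if consensus_count > depth:
        return float("-inf")  # impossible event
    log_terms = [
        math.lgamma(depth + 1) - math.lgamma(i + 1) - math.lgamma(depth - i + 1)
        + i * math.log(error_rate) + (depth - i) * math.log1p(-error_rate)
        for i in range(consensus_count, depth + 1)
    ]
    top = max(log_terms)  # log-sum-exp keeps the sum numerically stable
    log_total = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return log_total / math.log(10)

# A hypothetical site covered by 74 reads, 70 of which agree, at a 1% error rate.
print(log10_p_false(74, 70, 0.01))
```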
Potentially informative characters (PICs) and percentage of variability (PV) were calculated for each of the 21 regions as described by Shaw et al. (2007). The nucleotide data matrix used for all phylogenetic analyses was obtained by concatenating all sequenced regions. In addition, an indel matrix was created following the ‘simple indel coding’ method (Simmons & Ochoterena, 2000) with Gapcode (www.reelab.net), including the scoring of inversions. Length variation in mononucleotide repeats was not scored, following the recommendations of Kelchner (2000) and because of the high error rates of pyrosequencing in mononucleotide stretches. Aligned matrices with and without indel coding are available at TreeBASE (ref. 13716).
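Simple indel coding treats every gap with distinct 5′ and 3′ termini as a binary character. A sequence scores 1 if it has a gap with exactly those termini, ? if the region lies inside a longer gap with different termini, and 0 otherwise. The minimal sketch below assumes '-' as the gap symbol. Unlike the analysis described above, it neither scores inversions nor excludes mononucleotide-repeat length variation, and it is not the Gapcode implementation. The toy alignment is hypothetical.

```python
def gap_runs(seq, gap="-"):
    """Return (start, end) half-open intervals of contiguous gaps in a sequence."""
    runs, start = [], None
    for i, char in enumerate(seq):
        if char == gap and start is None:
            start = i
        elif char != gap and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(seq)))
    return runs

def simple_indel_coding(alignment, gap="-"):
    """Score each distinct gap as a character (Simmons & Ochoterena, 2000).

    1 = the sequence has a gap with exactly these termini;
    ? = the region lies inside a longer gap with different termini;
    0 = otherwise.
    """
    runs = {name: gap_runs(seq, gap) for name, seq in alignment.items()}
    characters = sorted({g for taxon_runs in runs.values() for g in taxon_runs})
    matrix = {}
    for name, taxon_runs in runs.items():
        row = []
        for start, end in characters:
            if (start, end) in taxon_runs:
                row.append("1")
            elif any(a <= start and end <= b for a, b in taxon_runs):
                row.append("?")
            else:
                row.append("0")
        matrix[name] = "".join(row)
    return characters, matrix

aln = {"A": "ACGT--GT", "B": "AC----GT", "C": "ACGTACGT"}
print(simple_indel_coding(aln))
```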
Maximum parsimony (MP) searches were conducted in PAUP* 4.0b10 (Swofford, 2002). A heuristic search was carried out using unweighted characters with 1000 random-addition-sequence replicates and 10 trees held at each step, TBR branch swapping on best trees only, and 100 trees retained at each addition-sequence replicate. Bootstrap proportions (BPs) were estimated from 10 000 replicates of heuristic search, with 10 random-addition-sequence replicates per bootstrap replicate and one tree held at each step, TBR branch swapping and MULTREES set to ‘off’ (DeBry & Olmstead, 2000).
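Nonparametric bootstrapping builds each pseudoreplicate by sampling alignment columns with replacement. Each pseudoreplicate is then searched as described above, and the BP of a clade is the fraction of replicate trees that contain it. The sketch below shows only the resampling step, using a hypothetical toy alignment and a fixed random seed. The tree search itself is left to PAUP*.

```python
import random

def bootstrap_pseudoreplicate(alignment, rng):
    """Resample alignment columns with replacement (nonparametric bootstrap).

    alignment: dict mapping taxon name to an aligned sequence string.
    Returns a pseudoreplicate with the same number of columns.
    """
    taxa = list(alignment)
    n_sites = len(alignment[taxa[0]])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {t: "".join(alignment[t][c] for c in columns) for t in taxa}

rng = random.Random(42)  # fixed seed so the example is reproducible
aln = {"A": "ACGTAC", "B": "ACGTTC", "C": "AGGTTC"}
replicates = [bootstrap_pseudoreplicate(aln, rng) for _ in range(3)]
print(replicates)
```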
Subsequent phylogenetic analyses were run under partition-specific models, one for the matK gene and the other for all noncoding regions. The best-fit substitution models selected by ModelTest 3.7 (Posada & Crandall, 1998) under the Akaike information criterion (Posada & Buckley, 2004) were TVM + Γ for matK and TVM + Γ + I for the combined noncoding regions.
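Under the Akaike information criterion, each candidate model is scored as AIC = 2K − 2 lnL, where K is its number of free parameters, and the model with the lowest AIC is preferred. The sketch below uses invented log-likelihoods to show how TVM + Γ can be preferred over the more parameter-rich GTR + Γ even when GTR fits slightly better. Here K counts only substitution-model parameters (exchangeabilities, base frequencies and gamma shape). Branch lengths are left out because, on a fixed tree, they add the same constant to every candidate.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2K - 2 lnL."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical log-likelihoods; K = exchangeabilities + base frequencies + gamma shape.
candidates = {
    "HKY+G": (-3000.0, 5),  # kappa + 3 frequencies + alpha
    "TVM+G": (-2990.0, 8),  # 4 free rates + 3 frequencies + alpha
    "GTR+G": (-2989.8, 9),  # 5 free rates + 3 frequencies + alpha
}
scores = {model: aic(lnl, k) for model, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

In this invented case, GTR's gain of 0.2 log-likelihood units does not offset the penalty of 2 AIC units for its extra parameter.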
Maximum likelihood (ML) searches were carried out using RAxML v7.2.8 (Stamatakis, 2006). Tree searches used the GTRGAMMA model starting from 100 parsimony trees. Bootstrap proportions were assessed using the GTRCAT approximation over 10 000 replicates and values were mapped on the best-scoring ML tree. For visualization of competing phylogenetic signal, a bipartition network based on the ML bootstrap trees was constructed using SplitsTree 4 (Huson & Bryant, 2006).
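Bootstrap proportions mapped on the ML tree and the edges of a bipartition network both derive from how often each split occurs among the bootstrap trees. The sketch below assumes that every tree has already been reduced to a list of its bipartitions, each written as one side of the split. It normalizes each split to the side that excludes a chosen reference taxon, so that the two equivalent descriptions of an unrooted split are counted together. Tree parsing is omitted, and the taxa and trees are hypothetical.

```python
from collections import Counter

def normalize_split(side, taxa, ref):
    """Represent a split by the side that does not contain the reference taxon."""
    side = frozenset(side)
    return frozenset(taxa) - side if ref in side else side

def split_frequencies(trees, taxa, ref):
    """Fraction of trees containing each split (e.g. bootstrap proportions)."""
    counts = Counter()
    for splits in trees:
        # A set avoids counting the same split twice within one tree.
        counts.update({normalize_split(s, taxa, ref) for s in splits})
    return {split: n / len(trees) for split, n in counts.items()}

taxa = {"A", "B", "C", "D", "E"}
trees = [[{"A", "B"}, {"D", "E"}], [{"A", "B"}], [{"C", "D", "E"}]]
print(split_frequencies(trees, taxa, ref="A"))
```

Here AB|CDE is written as {A, B} in two trees and as {C, D, E} in the third, yet it is counted in all three.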
Bayesian inference was carried out in MrBayes v3.1.2 (MB; Ronquist & Huelsenbeck, 2003). The GTR + Γ model was implemented for the matK partition, and the GTR + Γ + I model for the combined noncoding regions. Two simultaneous analyses, each with eight Markov chain Monte Carlo (MCMC) chains and incremental heating of 0.1, were run for 2 × 10⁷ generations and sampled every 1000 generations. Convergence of the MCMC was assessed using the effective sample size criterion for each parameter as implemented in Tracer v1.4 (Rambaut & Drummond, 2007), as well as diagnostic plots of cumulative posterior probabilities (PPs) and among-run variability of split frequencies in AWTY (Nylander et al., 2008b). The first 4000 sampled trees were discarded as burn-in. MrBayes analyses were also conducted on the data set including the indel matrix, applying the binary model to this data partition.
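Convergence diagnostics such as the effective sample size (ESS) discount the number of MCMC samples by their autocorrelation. The minimal sketch below estimates ESS as N/(1 + 2Σρ_k) and truncates the sum at the first non-positive autocorrelation. This truncation rule is an assumption and may differ from the estimator used by Tracer. Burn-in samples are dropped before the calculation, mirroring the discarding of early samples described above. The trace values are hypothetical.

```python
def effective_sample_size(trace):
    """Approximate ESS = N / (1 + 2 * sum of lag-k autocorrelations).

    The sum is truncated at the first non-positive autocorrelation, a simple
    rule that is not necessarily the one used by Tracer.
    """
    n = len(trace)
    mu = sum(trace) / n
    var = sum((x - mu) ** 2 for x in trace) / n
    if var == 0:
        raise ValueError("trace has zero variance")
    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, n):
        cov = sum((trace[i] - mu) * (trace[i + lag] - mu) for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0:
            break
        tau += 2.0 * rho
    return n / tau

# Hypothetical trace; discard the first 20% as burn-in before computing ESS.
samples = [0.0, 1.0] * 50
burn_in = len(samples) // 5
print(effective_sample_size(samples[burn_in:]))
```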
Contrary to expectations based on the geographic distribution of taxa, our results showed that neither the European species nor the representatives from Central and North Asia formed monophyletic groups. We ran two additional ML analyses using RAxML, each constrained to recover one of these clades, and tested the resulting topologies using the Shimodaira–Hasegawa test (SH; Shimodaira & Hasegawa, 1999) and the approximately unbiased test (AU; Shimodaira, 2002) in CONSEL (Shimodaira & Hasegawa, 2001), with significance assessed at the 0.05 level.
To dissect haplotype diversity among the European species, genetic structure analyses were performed using PCO-MC (Reeves & Richards, 2009). Genetic distances were calculated in PAUP* 4.0b10 with the best-fit model parameters for this data set as assessed in ModelTest, and the clustering procedure followed the authors' recommendations (http://lamar.colostate.edu/~reevesp/PCOMC/PCOMC.html).
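PCO-MC couples principal coordinates analysis (PCO) with a clustering step performed on the ordination. The sketch below shows only the PCO step (classical multidimensional scaling of a distance matrix) using NumPy. The clustering and cluster-stability assessment are not shown, and the distance matrix is a hypothetical toy example rather than the PAUP* distances.

```python
import numpy as np

def principal_coordinates(dist, n_axes=2):
    """Classical principal coordinates analysis (PCoA) of a distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering  # double-centred Gower matrix
    eigvals, eigvecs = np.linalg.eigh(b)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_axes]    # keep the largest ones
    vals = np.clip(eigvals[order], 0.0, None)     # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(vals)

# Hypothetical distances among three haplotypes lying on a line (0, 1, 3).
dist = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
coords = principal_coordinates(dist, n_axes=1)
print(coords.ravel())
```

Because the toy points are collinear, the first axis recovers their spacing exactly (up to sign).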
Inference of calibration points and dating of the Aquilegia divergence
As no reliable fossil evidence is available for Aquilegia and its closely related genera (Hodges & Arnold, 1994; Bastida et al., 2010), we inferred calibration points by estimating divergence times for the main clades of the Ranunculaceae based on the rbcL, matK and 26S sequence data produced by Wang et al. (2009b). The aligned matrix is available at TreeBASE (ref. 13716). The best-fit substitution model GTR + Γ + I was selected for rbcL and 26S, and TVM + Γ + I for matK. ML and MB analyses were conducted on a partitioned data set as described earlier. Analyses conducted in PAML 4.4d (Yang, 2007) rejected the strict clock model (P < 0.0001), and relative branching times were therefore estimated using BEAST v1.6 (Drummond & Rambaut, 2007). We used partition-specific models, a randomly generated starting tree, a Yule speciation process as the tree prior, and an uncorrelated lognormal relaxed molecular clock. We also used three calibration points to obtain absolute divergence times (see next paragraph). Five independent analyses were run for 10⁸ generations each, sampling every 1000th generation, with a burn-in of 10⁷ generations. The effective sample size of the combined runs was well above 500 for all estimated parameters, and a maximum clade credibility tree was computed using TreeAnnotator.
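A strict-clock test of this kind is usually a likelihood-ratio test. Twice the difference in log-likelihood between the unconstrained and the clock-constrained tree is compared with a χ² distribution with n − 2 degrees of freedom for n taxa. The sketch below assumes that standard test; the exact PAML settings are not reported here. It computes the χ² tail probability with the standard library only, and the log-likelihoods and taxon count are hypothetical.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df.

    Uses closed forms of the regularized upper incomplete gamma Q(df/2, x/2):
    a Poisson-type sum for even df, and erfc plus a finite sum for odd df.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        terms = [i * math.log(y) - y - math.lgamma(i + 1) for i in range(df // 2)]
        return sum(math.exp(t) for t in terms)
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp((i - 0.5) * math.log(y) - y - math.lgamma(i + 0.5))
    return total

def clock_lrt(lnl_no_clock: float, lnl_clock: float, n_taxa: int):
    """Likelihood-ratio test of a strict molecular clock.

    An unrooted tree has 2n - 3 branch lengths and a clock tree n - 1 node
    ages, so the test has n - 2 degrees of freedom.
    """
    stat = 2.0 * (lnl_no_clock - lnl_clock)
    df = n_taxa - 2
    return stat, df, chi2_sf(stat, df)

# Hypothetical log-likelihoods for a 60-taxon data set.
print(clock_lrt(lnl_no_clock=-25000.0, lnl_clock=-25150.0, n_taxa=60))
```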
We chose to calibrate the age of the Ranunculaceae using the values reported by Anderson et al. (2005), as that study includes a considerably larger taxon sampling than other studies yielding divergence times within the Ranunculales (Wikström et al., 2001; Bell et al., 2010). However, to account for more recent results obtained with improved dating methods, we set a normal prior (mean = 73 Ma, SD = 4) whose 97.5% credible interval (CI) encompasses more recent age estimates (i.e. 66 Ma; Bell et al., 2010). The unique seeds of Eocaltha zoophila confidently place this taxon on the lineage leading to the extant aquatic genus Caltha (Rodriguez-de la Rosa et al., 1998), and we therefore used the Maastrichtian/Campanian boundary (i.e. 70 Ma) as the minimum age of the split of this genus. Additionally, reliable records of Myosurus achenes have been confidently dated to the Oligocene (Mai & Walter, 1978), and we thus set an exponential prior with an offset at the Oligocene/Miocene boundary (i.e. 23 Ma) and a mean of 1 for the split of Myosurus.
Dating analyses for Aquilegia were conducted on the partitioned nucleotide data set as described earlier. BEAST search parameters included 12.8 × 10⁸ generations, sampling every 8000th generation, and a burn-in of 12.8 × 10⁷ generations. We chose two secondary calibration points inferred from the Ranunculaceae chronogram, with normal distributions centered at the mean age and CIs encompassing the respective highest posterior densities (HPD; see Results): one applied to the root (crown age of subfamily Thalictroideae; mean = 26.16 Ma, SD = 2.88), and the other to the stem age of Aquilegia (mean = 5.6 Ma, SD = 1.18).
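The calibration priors can be checked by computing the interval each one spans. The sketch below uses Python's statistics.NormalDist to obtain central 95% intervals of the normal priors stated in the two preceding paragraphs. It also computes the 95% upper bound of the offset exponential prior on the Myosurus split. Reading each SD as the standard deviation of the normal prior in Ma is our interpretation of the text, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def normal_interval(mean: float, sd: float, mass: float = 0.95):
    """Central interval holding `mass` of a normal prior on a node age (Ma)."""
    dist = NormalDist(mean, sd)
    tail = (1.0 - mass) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

def offset_exponential_quantile(offset: float, mean: float, q: float) -> float:
    """Quantile q of an exponential prior shifted to start at `offset` (Ma)."""
    return offset - mean * math.log(1.0 - q)

# Priors as stated in the text (ages in Ma).
priors = {
    "Ranunculaceae crown": normal_interval(73.0, 4.0),
    "Thalictroideae crown (root)": normal_interval(26.16, 2.88),
    "Aquilegia stem": normal_interval(5.6, 1.18),
}
for node, (lo, hi) in priors.items():
    print(f"{node}: {lo:.2f}-{hi:.2f} Ma")
upper = offset_exponential_quantile(23.0, 1.0, 0.95)
print(f"Myosurus split, 95% upper bound: {upper:.2f} Ma")
```

For the Ranunculaceae crown prior, the lower 2.5% quantile falls just below 66 Ma, so the interval includes the estimate of Bell et al. (2010).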
Operational areas for biogeographic analyses were defined as geographic ranges that reflected the main centers of diversity of Aquilegia: (A) Eastern Asia, including eastern Siberia, Manchuria, Korea and Japan; (B) North America; (C) Central China, including regions from the province of Gansu southwards to Yunnan and eastwards to the borders with Manchuria; (D) Central Asia, including the Himalayas and Tien Shan westwards to the Caucasus; (E) North Asia, including regions spanning from southwest Siberia and the Altai mountains eastwards across Mongolia to the borders with Manchuria, and northwards to the Arctic circle; and (F) Europe. Distribution data were compiled considering the main distribution ranges for each species as inferred from Munz (1946) and floristic treatments, including Flora of China (www.efloras.org), Flora Iranica (Rechinger, 1992), Flora of the USSR (Bulavkina, 1937), and Flora Europaea (Akeroyd, 1993).
Biogeographic reconstructions were performed using three methods. Fitch optimization as implemented in Mesquite v. 2.75 (Maddison & Maddison, 2011) and dispersal–vicariance analysis (Ronquist, 1997) as implemented in DIVA v.1.2 (Ronquist, 2001) are parsimony-based methods providing a dispersalist and a vicariance-based approach, respectively. To account for phylogenetic uncertainty in biogeographic reconstructions, we applied the Bayes-DIVA approach (Nylander et al., 2008a) and Mesquite to 1000 trees randomly sampled from the stationary distribution of the MB MCMC analysis. Frequencies of ancestral areas for the major clades were computed for both methods. As a third approach, we applied the ML method based on the dispersal–extinction–cladogenesis model (DEC; Ree et al., 2005) as implemented in Lagrange (Ree & Smith, 2008) to determine the most likely reconstruction of biogeographic evolution. Lagrange was applied to the BEAST chronogram, with ancestral ranges constrained to a maximum of two areas, reflecting the maximum number of areas occupied by any of the sampled species. This strategy was adopted to limit the uncertainty of ancestral areas at the earliest nodes relative to the DIVA analysis, and thus to provide a further hypothesis of biogeographic evolution.
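Fitch optimization reconstructs ancestral areas by treating each area as an unordered state. The minimal sketch below implements only the downward (postorder) pass on a fully bifurcating tree written as nested tuples. It returns the state set at the root and the minimum number of area changes. It is not Mesquite's implementation, and the tree and leaf areas are hypothetical; the letters follow the operational areas defined above. Species occurring in two areas would enter here as ambiguous state sets, which is one simple convention among several.

```python
def fitch_downpass(tree):
    """Return (state set at this node, number of state changes below it).

    A leaf is a frozenset of area codes; an internal node is a (left, right) tuple.
    """
    if isinstance(tree, frozenset):
        return tree, 0
    left, right = tree
    left_set, left_cost = fitch_downpass(left)
    right_set, right_cost = fitch_downpass(right)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost              # intersection: no change
    return left_set | right_set, left_cost + right_cost + 1  # union: one change

A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
tree = ((A, B), (A, C))
print(fitch_downpass(tree))
```

A second, upward pass would be needed to assign final areas to internal nodes other than the root.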
454 sequence alignments and quality
The analyses conducted on the 454 assembled reads revealed that < 1% of the amplicons were not represented or were covered for < 95% of their length, mainly as a result of suboptimal amplification of the outgroups. The 454 sequencing generated a total of 497 711 reads with a mean length of 590.2 bp (SD = 59.4). The subsequent assembly phase yielded a total of 1739 assemblies, each containing an average of 285.6 reads (SD = 308.5; median = 203) with a mean length of 381.3 bp (SD = 122.5). Mean dC across assemblies was 106.5 reads per site (SD = 115.9; median = 74.2; Fig. S2). Despite efforts to normalize the quantity of starting amplicon DNA, there was considerable variability in dC among assemblies (Methods S2, Fig. S3), and both accession and region strongly contributed to dC median and mean values (ANOVA, all P < 10⁻¹⁵). Depending on the region- and type-specific error rate, f_M(region; t), and on dC, the probability of consensus miscalling, P_false, of the variants at the informative sites ranged from 5 × 10⁻⁶ to ≪ 1 × 10⁻¹⁰⁰, indicating very robust data for the informative sites (Table S3, Fig. S4).
Phylogeny of Aquilegia
The 21 plastid regions varied in length from 504 (matK) to 2683 (rps4–trnF(GAA)) aligned nucleotides and varied in their potential phylogenetic information (Fig. 2a). Consistent with previous studies (Mort et al., 2007; Shaw et al., 2007), PICs proved to be a reliable measure of hierarchical information content, as their number was significantly correlated with the number of parsimony-informative characters (R² = 0.81, P < 0.001, ρ = 0.9). The coefficient of determination between the number of PICs and region length indicates that the latter accounted for only 13% (P < 0.05, ρ = 0.48) of the variation among PICs, suggesting that most of this variability depends on differences in mutation rates.
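For a simple linear regression, the coefficient of determination R² equals the square of Pearson's r. The ρ values reported alongside are presumably Spearman rank correlations, which is our assumption. The sketch below computes both statistics in plain Python, with tied values receiving their mean rank. The per-region counts are hypothetical.

```python
def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical per-region counts: PICs vs parsimony-informative characters.
pics = [12, 30, 25, 8, 40]
informative = [5, 14, 11, 4, 16]
r = pearson_r(pics, informative)
print(f"R^2 = {r ** 2:.2f}, rho = {spearman_rho(pics, informative):.2f}")
```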
The combined nucleotide data matrix comprised 24 534 characters, of which 447 were variable within the ingroup, including 172 parsimony-informative sites. The indel matrix provided an additional 90 characters, of which 73 were parsimony informative. All three phylogenetic reconstruction methods yielded consistent tree topologies from the nucleotide data set, with no contradiction among supported groups (i.e. BP > 50%, PP > 0.95; Figs 3, S5). We found strong support for the main lineages, which provides robust evidence for the early divergence events of the Aquilegia radiation, as also confirmed by the tree-like structure of the main edges in the bipartition network (Fig. S5). Analyses including indel coding resulted in identical topologies, differing only in the support values for some of the nodes in the MP framework (Fig. S6). Notable differences (i.e. > 5% BP) show no general trend across the tree: depending on the node, including indels either increases or decreases support. Similarly, the degree of homoplasy, as indicated by the retention index (RI), does not differ between the data sets with and without indel coding (RI = 0.90 and 0.89, respectively). In the following, for ease of description, we refer to the operational areas described for the biogeographic analyses.
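The retention index summarizes homoplasy over a data matrix as RI = (ΣG − ΣS)/(ΣG − ΣM). For each character, M is the minimum possible number of steps, G is the maximum (the steps it would need on a star tree), and S is the number of steps observed on the tree. The sketch below assumes unordered characters without missing data and derives M and G from each column. S is taken as input because computing it requires a tree, for example by Fitch optimization. The matrix and step counts are hypothetical.

```python
from collections import Counter

def retention_index(columns, observed_steps):
    """RI = (sum G - sum S) / (sum G - sum M) for unordered characters."""
    sum_g = sum_m = 0
    for column in columns:
        counts = Counter(column)
        sum_m += len(counts) - 1                      # minimum steps on any tree
        sum_g += len(column) - max(counts.values())   # maximum steps (star tree)
    denominator = sum_g - sum_m
    if denominator == 0:
        raise ValueError("no informative variation: RI undefined")
    return (sum_g - sum(observed_steps)) / denominator

# Two binary characters scored for four taxa, needing 1 and 2 steps on a tree.
print(retention_index(["0011", "0101"], observed_steps=[1, 2]))  # 0.5
```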
Relationships among taxa show two large and highly supported clades: one groups the Eastern Asian species Aquilegia amurensis, Aquilegia japonica and Aquilegia parviflora with the North American taxa, while the other includes all other Eurasian representatives. In the former clade, the Eastern Asian species (Group I) are sister to the North American taxa (Group II). Within the large clade of Eurasian taxa, a first split separates representatives of Central China (Group III) from all other species. Aquilegia oxysepala was found to be polyphyletic, as A. o. var. oxysepala, distributed in Eastern Asia, was resolved in a separate clade. Within the sister group to Group III, a split separates a clade comprising Central Asian representatives (Group IV) from all remaining species. Among these, a first split separates the Caucasian Aquilegia olympica from the remaining taxa, and a subsequent split separates a clade including the Italian alpine A. bertolonii and Aquilegia einseleana from the rest. Notably, the short node separating A. olympica either receives decreased support or collapses when only one accession of A. bertolonii is included in the analyses (Fig. S7), indicating that the retrieved topology depends on the distinct phylogenetic signal contained in the sequences of this taxon. The sister group to A. einseleana and A. bertolonii is a clade in which a group including North and Eastern Asian taxa (Group V) is sister to the remainder of the European representatives (Group VI). The hypothesis of a monophyletic origin of Group V with the geographically close representatives of Central Asia (i.e. Group IV) was rejected by both SH and AU tests (data not shown).
Regarding the remaining large European clade, little resolution was achieved, as lack of phylogenetic information left most representatives unresolved in a basal polytomy. Notably, the Corso-Sardinian endemics (i.e. Aquilegia barbaricina, Aquilegia bernardii, Aquilegia litardierei, Aquilegia nugorensis and Aquilegia nuragica) are recognized as monophyletic. Analyses conducted on a data set including only one randomly chosen accession for each European taxon yielded consistent resolution and nodal support for the main nodes of the tree compared with the complete analysis (Fig. S7). Enforcing monophyly of the European representatives produced a tree rejected by the AU test, but not by the more conservative SH test (data not shown).
PCO-MC analyses (Fig. 4) show that most of the European accessions fall within a single cluster, except for A. einseleana, A. barbaricina, A. bertolonii, and A. reuterii. Aquilegia bertolonii and A. reuterii had multiple accessions and these formed distinct clusters. All retrieved clusters had high stability values, suggesting that they likely represent genetically distinct groups; however, clusters for A. bertolonii and A. reuterii lacked statistical significance, perhaps as a result of small sample size (Reeves & Richards, 2011). A PCO-MC analysis including the closest Asian accessions also identified Group VI as a stable cluster (Fig. S8).
Inference of calibration points from the Ranunculaceae data set
The ML and MB trees (Fig. S9) are consistent with the results by Wang et al. (2009b), with no conflict among supported nodes (i.e. BP > 50%; PP > 0.95). The chronogram obtained by the BEAST analysis (Supporting Information Fig. S10) shows the crown age of the family at 87.3 Ma (HPD = 82.4–92.1 Ma), consistent with the results obtained by Anderson et al. (2005) using nonparametric rate smoothing analyses. The origin of subfamily Thalictroideae sensu Raf. (Wang et al., 2009b) is estimated at 26.2 Ma (HPD = 20.3–32.3 Ma), consistent with the results by Bastida et al. (2010). Within the subfamily, the stem age of Thalictrum is estimated at 17.4 Ma (HPD = 11.3–24 Ma). This is in line with the fossil record of Thalictrum-type pollen from the Miocene (Muller, 1981), suggesting that divergence date estimates within the subfamily used as calibration points for the Aquilegia data set will be reliable.
Age estimates and ancestral area reconstruction of Aquilegia
Origin and diversification of Aquilegia (Figs 5, S11) are estimated to have occurred in the upper Miocene (6.85 Ma; HPD = 5.04–8.69 Ma) and lower Pliocene (4.76 Ma; HPD = 3.09–6.47 Ma), respectively. Reconstructions by unconstrained Bayes-DIVA are ambiguous about the origin of Aquilegia, as the ancestral area could include all regions where the genus is present today except for North Asia (E) and Europe (F); conversely, Fitch parsimony supports an ancestor of the genus distributed in Eastern Asia (A). Lagrange analyses with constrained ranges indicate an ancestral distribution of Aquilegia spanning Eastern Asia and Central China (C), followed by vicariance between these two areas coinciding with the diversification of the genus. Divergence of the New World species occurred during the middle Pliocene (3.84 Ma; HPD = 2.3–5.5 Ma), either from an Eastern Asian ancestor according to Fitch reconstruction or following vicariance of an Eastern Asian–North American (B) distribution according to Bayes-DIVA and Lagrange. The origin of the large Eurasian clade is ambiguous in the Fitch reconstruction, whereas Bayes-DIVA and Lagrange indicate an ancestral area including Central China and Central Asia (D). Diversification within the clade is estimated to have begun in the middle Pliocene (3.88 Ma; HPD = 2.42–5.44 Ma), followed by the colonization of Europe, apparently using the Caucasus (A. olympica) as a migration route. The estimated time of arrival of Aquilegia in Europe differs depending on the reconstruction method. Fitch parsimony suggests dispersal in the upper Pliocene (2.84 Ma; HPD = 1.66–4.13 Ma), whereas Bayes-DIVA and Lagrange indicate that a widespread distribution spanning Europe and Central Asia existed at an earlier age (3.37 Ma; HPD = 2.08–4.79 Ma).
Similarly, colonization of North Asia from European ancestors is suggested to have occurred during the early to middle Pleistocene according to the vicariance-based and dispersal-based reconstructions, respectively (2.51 Ma, HPD = 1.45–3.68 Ma; and 2.12 Ma, HPD = 1.18–3.13 Ma). This was followed by a return to Eastern Asia (A. oxysepala var. oxysepala and Aquilegia buergeriana) using North Asia as a migration route.
Phylogeny of Aquilegia
In this study, we sequenced c. 24 kb of cpDNA, constituting approximately half of the noncoding single-copy portion of the plastome (Shaw et al., 2007), plus a partial matK sequence, from 62 taxa of Aquilegia. Compared with previous studies (Hodges & Arnold, 1994, 1995; Bastida et al., 2010), our results present a substantial increase in the proportion of highly supported nodes, indicating that previous studies suffered from insufficient data (i.e. soft polytomies). The increase in resolution is perhaps not surprising in the case of Aquilegia: in rapid radiations, sampling error is likely to overwhelm the phylogenetic signal when only a small number of characters is available (Jeffroy et al., 2006), and increasing sequence length from one locus is expected to improve the accuracy of the phylogenetic reconstruction (Yang, 2002; Wortley et al., 2005). Hence, our data demonstrate that the plastome is both sufficiently complex and large to capture the phylogenetic signal underlying closely spaced cladogenetic events, and support the potential of using large amounts of cpDNA sequence to tackle the difficult problem of rapid radiations.
In contrast to the clear resolution at deeper branches, our data resulted in poorer resolution at shallower phylogenetic depths. A notable example is the lack of resolution for the large clade including the majority of the European accessions, where we retrieved a large number of short to zero-length branches, as well as conflicting phylogenetic signal, which may indicate an increased rate of speciation associated with very rapid cladogenesis. Achieving a reliable resolution for the shallower nodes of the Aquilegia phylogeny is a major goal for future investigations. We have already sampled a very large portion of the noncoding cpDNA where variation is most likely to accumulate, but c. 80 kb of the single-copy regions of the cpDNA remain to be sequenced (D.I. Huang & S.A. Hodges, pers. comm.). Although variation may be lower in coding regions, additional sequences from cpDNA not surveyed in this study may resolve some of the recalcitrant clades (Wortley et al., 2005; Parks et al., 2009, 2012). We suggest that recently developed techniques based on sequencing of total genomic DNA could be the preferred method for this purpose, as they provide a straightforward means of obtaining nearly complete plastomes from genomic DNA of both fresh and herbarium specimens (Straub et al., 2012). A further limitation of our approach is that it provides evidence from only one of the plant's genomes. Several processes, such as incomplete lineage sorting and introgression as a result of hybridization and organelle capture (e.g. Tsitrone et al., 2003; Cronn & Wendel, 2004), could in fact cause incongruence between the plastid phylogeny provided here and the organismal phylogeny (Wendel & Doyle, 1998). It is likely that these factors took part in shaping the evolution of Aquilegia (see later); thus, our phylogenetic and biogeographic hypotheses could be refined once a sufficient number of nuclear sequences are combined to enable species tree reconstruction.
In light of the biogeographic affinities of the major clades (see next section), long-distance dispersal seems to be rare, consistent with the limited seed dispersal capability of the genus (Hodges & Arnold, 1994; Strand et al., 1996). Instead, habitat partitioning appears to be the predominant pattern underlying lineage divergence. More geographically restricted allopatric speciation has likely played a primary role in shaping the radiation of the genus, especially within the European group (Bastida et al., 2010). Consider the high interfertility of Aquilegia spp. (Prazmo, 1965; Taylor, 1967) in the context of Quaternary climate change and the associated plant migrations. Seen this way, opportunities for hybridization and introgression among incipient lineages of the radiation appear to have been abundant. Gene tree/species tree incongruence is therefore a likely scenario that must be considered in further studies. Moreover, very recent origins, reflected in very shallow nodes, may further challenge reliable phylogenetic reconstruction. Incomplete lineage sorting is known to mainly affect recent species groups, whereas fixation of lineage-specific alleles becomes more likely with increasing time (Maddison, 1997; Wendel & Doyle, 1998). Recent improvements in species tree inference methods offer a powerful approach to delve further into the evolutionary history of Aquilegia. In addition, the growing ease of obtaining genome-scale data may allow the most rapidly diverging clades to be resolved. In this context, the availability of the A. coerulea Goldsmith genome sequence should facilitate the application of enrichment-based approaches (Faircloth et al., 2012; Lemmon et al., 2012). It should likewise facilitate the whole-genome resequencing approaches already in use in other model species (e.g. Lam et al., 2010; Turner et al., 2010).
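A standard three-taxon result of the neutral coalescent shows why incomplete lineage sorting chiefly affects recent, closely spaced splits. Assume one gene copy is sampled per species. The gene tree then matches the species tree with probability 1 - (2/3)exp(-T), where T is the length of the internal branch in coalescent units. For a diploid nuclear locus with effective population size N, T equals the number of generations divided by 2N. The plastome is haploid and usually uniparentally inherited, so its effective size is typically smaller, and the same interval in generations corresponds to a larger T.

The sketch below evaluates this expression for a few arbitrary values of T. It assumes panmictic ancestral populations and no gene flow, so it does not model the introgression discussed above.

```python
import math


def p_concordant(T: float) -> float:
    """Probability that a rooted three-taxon gene tree matches the species tree.

    Assumes a neutral coalescent, one gene copy per species, no gene flow,
    and an internal branch of T coalescent units.
    """
    return 1.0 - (2.0 / 3.0) * math.exp(-T)


def p_each_discordant(T: float) -> float:
    """Probability of each of the two discordant rooted topologies."""
    return (1.0 / 3.0) * math.exp(-T)


# Arbitrary internode lengths (coalescent units), for illustration only.
for T in (0.01, 0.1, 1.0, 3.0):
    print(f"T={T}: concordant={p_concordant(T):.3f}, "
          f"each discordant={p_each_discordant(T):.3f}")
```

For very short internodes, the concordance probability approaches 1/3, so all three rooted topologies become nearly equally likely. For T = 3, it exceeds 0.96.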
Spatiotemporal reconstruction of the Aquilegia radiation
Origin of the genus and migration to North America
Our results indicate an Oligocene origin for the most recent common ancestor (MRCA) of the clade including Aquilegia and Semiaquilegia, whereas the origin of Aquilegia is estimated to have occurred in the upper Miocene (c. 6.85 Ma). This origin is more recent than that proposed by Grant (1993, 1994), who suggested that the genus diverged as part of the Arcto-Tertiary flora that covered the northern continents before the gradual cooling of the mid-Miocene. The earliest diversification within the genus occurred in the lower Pliocene (c. 4.76 Ma). It separated a clade that gave rise to most current Eurasian taxa from another including all North American species and representatives from Eastern Asia. The ancestral range of the MRCA of the genus is reconstructed as Eastern Asia (A) by Fitch parsimony, and as a range spanning Eastern Asia and Central China (C) by Lagrange. In both scenarios, migration to North America (B; Group II) occurred between 3.84 and 2.99 Ma from a lineage shared with the Far East representatives A. amurensis, A. parviflora and A. japonica (A; Group I). The ambiguous DIVA reconstruction reflects uncertainty at the root of the Aquilegia divergence; indeed, if the maximum number of areas is constrained to two, results consistent with the Lagrange analysis are recovered. Neither of these scenarios can be excluded on the basis of paleogeographic reconstructions. Following the first opening of the Bering Strait (c. 5.32 Ma; Gladenkov et al., 2002), short-term land connections between Eurasia and North America existed at different times during the Pliocene (Miller et al., 2005). Within the North American group, a number of strongly supported clades are consistent with the AFLP phylogeny of Whittall & Hodges (2007).
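Fitch parsimony is one of the approaches used above for ancestral area reconstruction. It treats each area as a single unordered character state. In its bottom-up pass, it assigns each internal node the intersection of its children's state sets when that intersection is non-empty. Otherwise, it assigns their union and counts one change.

The sketch below implements only this first pass, on a hypothetical five-taxon tree with made-up area codes. It is not the software or data used in our analyses. The state sets it produces are alternative single-area reconstructions, not combined ranges. Fitch parsimony therefore cannot reconstruct a widespread ancestral range such as A + C, whereas the range-based DIVA and Lagrange analyses can. This is one reason why the methods may infer different root ranges.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf is a taxon name (str);
# an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, areas: Dict[str, str]) -> Tuple[Set[str], int]:
    """Bottom-up pass of Fitch parsimony.

    Returns the set of equally parsimonious single-area states for the node
    and the minimum number of area changes in the subtree below it.
    """
    if isinstance(tree, str):
        # Leaf: its observed area is the only possible state.
        return {areas[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, areas)
    right_set, right_cost = fitch(right, areas)
    shared = left_set & right_set
    if shared:
        # Children share at least one area: no extra change is needed.
        return shared, left_cost + right_cost
    # No shared area: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical tree and area codes, for illustration only.
example_tree = ((("t1", "t2"), "t3"), ("t4", "t5"))
example_areas = {"t1": "A", "t2": "B", "t3": "A", "t4": "A", "t5": "C"}
root_states, changes = fitch(example_tree, example_areas)
print(sorted(root_states), changes)
```

Here the root is reconstructed unambiguously as area A with two changes. When the state set at the root contains more than one area, the reconstruction is ambiguous.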
In particular, Aquilegia jonesii is sister to the rest of the species, and the hawkmoth-pollinated Aquilegia pubescens is sister to the hummingbird-pollinated Aquilegia eximia, Aquilegia flavescens and Aquilegia formosa; moreover, the sister relationship between Aquilegia brevistyla and Aquilegia canadensis is recovered. The morphological similarities between A. jonesii and the Eurasian taxa are consistent with its phylogenetic position in the North American radiation. They are also consistent with Grant's (1993) view that bee pollination (melittophily) is the ancestral condition. Grant (1994) hypothesized that the diversification of North American taxa associated with shifts in pollination syndrome took place in the mid-Pliocene (c. 4 Ma), causing the sorting of hummingbird-pollinated taxa from the ancestral flora of North America. However, according to our results, the crown age of the North American clade falls in the upper Pliocene (c. 2.99 Ma), when the bee-pollinated A. jonesii separated as sister to the rest of the taxa. Hummingbird pollination evolved only later, in the lower Pleistocene (c. 2.27 Ma; cf. Whittall & Hodges, 2007). These results are consistent with the coarser estimate for the evolution of this pollination syndrome obtained using a distribution of evolutionary rates of ITS sequences across angiosperms (Hodges et al., 2004; Kay et al., 2006). Interestingly, our data also support the previous finding of two origins of hummingbird pollination (Whittall & Hodges, 2007), once in A. canadensis and once in the ancestor of all other hummingbird-pollinated taxa.
The Eurasian clade
Within the Eurasian clade, a first split separates a group including the species that occur in the forests and meadows of Central China (C; Group III), whose morphological similarity is reflected in their intricate taxonomic history (Munz, 1946). The next split separates a clade including the species that occur in the Himalayas, Tien Shan and neighboring regions of Central Asia (D; Group IV). DIVA and Lagrange analyses reconstructed the ancestral area of the Eurasian clade as a continuous distribution spanning these two regions, before their separation by a vicariance event. This scenario is consistent with lineage separation following the uplift of the Qinghai-Tibetan Plateau (QTP) since the mid-Miocene. That uplift was a primary cause of vicariant differentiation in areas around this extensive mountain range (Qiu et al., 2011 and references therein). Accordingly, the timescale of the divergence between the Chinese and Central Asian species corresponds roughly to the time of the major uplift of the QTP during the Plio-Pleistocene (i.e. c. 3.5–1.6 Ma; Liu et al., 2002).
The sister group to the Central Asian species includes the lineages that led to the colonization of the Caucasus, Europe and Northern Asia. The crown age of this group is estimated in the upper Pliocene, when the European lineage (F) diverged from the Central Asian group. The vast majority of the European species of Aquilegia occur in mountainous areas belonging to the Alpine system sensu Ozenda (2009), with few representatives in the continental vegetation (i.e. A. vulgaris and A. atrata). Strong biogeographic connections between the Alpine system and the Himalayas have been emphasized by several authors (e.g. Weber, 2003; Kadereit et al., 2008; Ozenda, 2009). It is generally accepted that the continuity of mountain ranges between the two regions through the Caucasus (where A. olympica occurs) favored the incursion into Europe of several elements of Chinese origin (Comes & Kadereit, 2003; Kadereit et al., 2008; Ozenda, 2009). Following the establishment of Aquilegia in Europe, the divergence of two lineages may be explained by survival in different refugia during the paleogeographic and climatic events of the early Pleistocene (Médail & Diadema, 2009). The first of these lineages is currently represented by the narrowly distributed alpine A. bertolonii and A. einseleana, while the second includes the lineage leading to the colonization of North Asia (Group V). The bootstrap percentages (BPs) obtained for this result call for further confirmation. Nevertheless, the AU test rejected an alternative topology in which the European taxa are constrained to be monophyletic, which would move Group V out of its nested position within the European clades. Moreover, even under this alternative topology, Group V is placed in a nested position between the Caucasian and European lineages (data not shown). This suggests that the North Asian taxa would still have arisen through migration from a western distribution.
Monophyly of the Central Asian species plus Group V was rejected by both the SK and AU tests. This leaves shared ancestry between the North Asian and at least part of the European taxa as the only plausible scenario supported by our data. Previous authors have emphasized a strong floristic affinity between the Altai (Aquilegia glandulosa, Aquilegia sibirica and Aquilegia viridiflora) and the Alpine system (Neuffer et al., 2003; Weber, 2003; Kadereit et al., 2008; Ozenda, 2009). Indeed, c. 22% of the species currently present in the Swiss Alps show an arctic-alpine-altaic distribution pattern (Neuffer et al., 2003). Both the Alpine system and the Altai mountains show clear biogeographic links to Central Europe (Kuneš et al., 2008; Ozenda, 2009). Traces of a once continuous Euro-Siberian vegetation may be seen in the current distributions of the lowland A. vulgaris and A. sibirica. The separation of the ranges of these morphologically similar taxa coincides with a floristic discontinuity between eastern Europe and western Siberia, probably linked to glacier advance during the early Pleistocene (Franzke et al., 2004). Within Group V, a return to the Far East and Japan (A. oxysepala var. oxysepala and A. buergeriana) occurred from an ancestor shared with A. viridiflora. A vicariant distribution may have been established from the mid-Pleistocene onward (c. 1.19 Ma), concomitant with the first Pleistocene connection of Japan to the Asian continent (Dobson & Kawamura, 1998).
The European complex
The polytomy at the base of the large European clade reflects a complex evolutionary situation that cannot be resolved with our data. Recent diversification may partly account for the lack of phylogenetic signal. In addition, climatic oscillations during the Quaternary may have produced multiple rounds of differentiation through range fragmentation followed by secondary contact and introgression (Petit et al., 2003). Such processes would slow the differentiation of taxa. Consistent with this, only one of the species with multiple accessions was monophyletic and could be recognized under the genotypic clustering criterion (Mallet, 1995). That species is A. reuteri, which occurs in the glacial refugium of the Maritime Alps (Grassi et al., 2009). Similarly, with regard to interspecific relationships, the only clade that received significant support included the Corso-Sardinian taxa. This suggests that more extreme geographic isolation played a fundamental role in the genetic divergence of these taxa from the European stock.
Overall, the complexity uncovered in the European radiation provides an exciting opportunity to study speciation in the group. It points to a very recent and ongoing process of diversification, in which the properties that characterize diverging lineages are still developing (De Queiroz, 2007).
We thank M. Lega, P.O. Karis, B. Surina, M. Niketić, I. Dakskobler, A. Podobnik, Z. Nikolov, C. Bonomi, F. Prosser, A. Bertolli, D. Marchetti and R. Bernardello for valuable help in collecting field samples. We are indebted to J. Klackenberg, C. Persson, D. Jeanmonod and P. Cuccuini for access to herbarium material at S, GB, G and FI, respectively. We are grateful to S. Pignatti for helpful discussion. P. Fontana is acknowledged for providing access to computational resources at Fondazione Edmund Mach. We thank two anonymous reviewers for comments that improved the manuscript. This work was supported by the Autonomous Province of Trento (Italy) within the ACE-SAP project (regulation number 23, 12 June 2008, of the Servizio Università e Ricerca Scientifica).