  • angiosperm phylogeny;
  • basal angiosperms;
  • diversification time of angiosperms;
  • low-copy nuclear genes;
  • malvids;
  • ortholog identification;
  • COM clade


Summary

  • Organismal phylogeny provides a crucial evolutionary framework for many studies, and the angiosperm phylogeny has recently been greatly improved, largely on the basis of organellar and rDNA genes. However, low-copy protein-coding nuclear genes have not been widely used on a large scale, despite their biparental inheritance and the vast number of genes available.
  • Here, we identified 1083 highly conserved low-copy nuclear genes by genome comparison. We then used five of these nuclear genes from 91 angiosperms representing 46 orders (73% of orders), with three gymnosperms as outgroups, to reconstruct a highly resolved phylogeny.
  • These nuclear genes are easy to clone and align, and are more phylogenetically informative than widely used organellar genes. The angiosperm phylogeny reconstructed from these genes was largely congruent with previous phylogenies inferred mainly from organellar genes. Intriguingly, several new placements were uncovered, including some among the rosids, among the asterids, and between the eudicots and several basal angiosperm groups.
  • These conserved, universal nuclear genes have several inherent qualities that make them good markers for reconstructing angiosperm phylogeny, and potentially eukaryotic relationships more broadly, and they provide new insights into the evolutionary history of angiosperms.


Introduction

Phylogenetic relationships among organisms are the foundation of evolutionary biology and of many other disciplines, such as biodiversity and biogeography, the evolution of developmental and physiological functions, comparative genomics, and crop breeding. The characters used in phylogenetic studies have shifted from morphology and development, to physiology and biochemistry, and more recently to protein and DNA sequences. Sequences provide a nearly unlimited number of characters for comparison and have become increasingly accessible with the rapid development of sequencing technology (Judd et al., 1999; Soltis et al., 2005). In recent decades, organellar and ribosomal RNA genes have been widely used to resolve organismal relationships. However, these markers have limitations; for example, rDNA genes have high copy numbers and undergo concerted evolution (Letsch & Kjer, 2011). In particular, the extent of rDNA sequence homogenization may differ both between gene regions and among loci, which can increase the uncertainty of organismal phylogenies inferred from these sequences (Buckler et al., 1997).

The value of low-copy nuclear genes for phylogenetic analysis has long been recognized (Strand et al., 1997; Fulton et al., 2002; Sang, 2002; Mort & Crawford, 2004; Hughes et al., 2006; Whittall et al., 2006; Wu et al., 2006; Yuan et al., 2009; Duarte et al., 2010). The nuclear genome contains the overwhelming majority of a cell's protein-coding genes, which serve many diverse functions, and these genes track organismal evolution through biparental Mendelian inheritance. For example, protein-coding nuclear genes have been used to reconstruct the relationships among nearly 200 fungal species, illustrating their effectiveness (James et al., 2006). Phylogenomic studies using many nuclear genes have further demonstrated their power for molecular evolutionary analyses, especially for fungal and animal relationships, partly owing to the availability of many large genomic and EST datasets (Rokas et al., 2003; Moreau et al., 2006; Regier et al., 2010; Kocot et al., 2011; Smith et al., 2011; Struck et al., 2011). Until now, however, only a few nuclear genes have been used to resolve the backbone of flowering plant phylogeny (Soltis et al., 1997; Mathews & Donoghue, 1999; Finet et al., 2010; Burleigh et al., 2011; Lee et al., 2011).

Flowering plants (angiosperms) are one of the most successful groups of organisms on earth, with c. 300 000 species, and provide humans and animals with food, fibres, medicines and other materials (Judd et al., 1999). The angiosperm phylogeny describes the evolutionary history of, and the relationships among, species and groups, facilitating comparative analyses between model plants and crops (Soltis & Soltis, 2003). The recent angiosperm phylogeny, which revolutionized the earlier, largely morphology-based view of angiosperm taxonomy, provides strong support for the monophyly of major groups such as the eudicots, monocots and magnoliids (Bremer et al., 2002; Jansen et al., 2007; Moore et al., 2007, 2010; APG III, 2009; Soltis et al., 2011). There is also increasing consensus that Amborella is sister to all other extant flowering plants (Zanis et al., 2002; Moore et al., 2007, 2010; Soltis et al., 2011), although alternative hypotheses remain (Goremykin et al., 2003, 2009). Nevertheless, several enigmatic relationships remain unresolved, such as those among eudicots, monocots, magnoliids, Chloranthales and Ceratophyllales; those among Dilleniaceae, Saxifragales, Caryophyllales, rosids and asterids; and those among the major clades within rosids. These relationships have been proposed to be difficult to resolve because of rapid radiation (Hilu et al., 2003; Moore et al., 2007, 2010; Wang et al., 2009).

To date, reconstructions of angiosperm phylogeny have relied largely on plastid and mitochondrial genes (Chase et al., 1993; Savolainen et al., 2000; Hilu et al., 2003; Zhu et al., 2007; Qiu et al., 2010) and sometimes on entire plastid genomes (Jansen et al., 2007; Moore et al., 2007, 2010), whereas analyses based solely on nuclear genes have been rare (Soltis et al., 1997; Mathews & Donoghue, 1999; Finet et al., 2010; Lee et al., 2011). Widespread gene duplication is a major challenge in selecting effective nuclear genes as phylogenetic markers (Zhang, 2003; Kellis et al., 2004; Dehal & Boore, 2005; Soltis et al., 2009; Zhou et al., 2010); recent genomic studies have shown that all extant angiosperms have experienced polyploidization events during their evolution (Jiao et al., 2011). Gene duplications, in turn, make it difficult to distinguish orthologs from paralogs (Philippe et al., 2005). Worse, in some cases single-copy paralogs resulting from gene duplication followed by lineage-specific losses can be mistaken for orthologs, leading to incorrect inferences of organismal relationships (Nei & Kumar, 2000; Koonin, 2005; Scannell et al., 2006). Another difficulty in using nuclear genes for phylogenetic analysis is their relatively complex gene structure, which makes them hard to clone and to align with confidence. With advances in sequencing technologies, more and more plant genomic and transcriptomic datasets are being generated, making it feasible to select suitable low-copy nuclear genes as phylogenetic markers.

In this work, to facilitate the use of nuclear genes in phylogenetic analyses of angiosperms, we compared the genomes of one moss and seven angiosperm species and identified over 1000 genes as potential phylogenetic markers. To test their usefulness, we also obtained sequences of five genes (SMC1, SMC2, MSH1, MLH1 and MCM5), belonging to four gene families, from 91 angiosperm species in 46 orders (73% of all orders). These genes are easy to clone and align and are phylogenetically much more informative than widely used organellar genes. The resulting phylogenetic trees, which are generally concordant with those of previous studies, suggest that these nuclear genes are excellent candidates for reconstructing angiosperm phylogeny both above and below the order level.

Materials and Methods


Ortholog identification

To identify low-copy nuclear genes common to angiosperms, the annotated genomes of seven angiosperm species (Arabidopsis thaliana, Populus trichocarpa, Prunus persica, Vitis vinifera, Mimulus guttatus, Oryza sativa, Sorghum bicolor) and one moss (Physcomitrella patens) were retrieved from Phytozome v7.0. Putative orthologous genes were identified with OrthoMCL v1.4 (Li et al., 2003) under default parameters, retaining clusters that contained 7–9 gene sequences with at least one gene from each of the seven angiosperms. These low-copy nuclear genes were then searched against the moss genome with HaMStR v8 to determine the set of genes conserved among land plants (Ebersberger et al., 2009). To facilitate gene cloning and phylogenetic analyses, only genes encoding proteins of at least 300 amino acids (based on the A. thaliana protein) and with sequence identities of at least 60% were retained for further analysis. The predicted functions of these genes were then analysed with the TAIR10 Gene Ontology annotation (see Supporting Information Fig. S1).
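The filtering criteria above can be sketched as a predicate over OrthoMCL clusters. The data structure, species labels and function name are hypothetical; the mean identity is taken as a precomputed fraction, the identity threshold follows the "at least 60%" criterion, and the HaMStR screen against the moss genome is not modelled.

```python
from dataclasses import dataclass

# Labels for the seven angiosperm genomes named in the text (hypothetical IDs).
ANGIOSPERMS = {"Athaliana", "Ptrichocarpa", "Ppersica", "Vvinifera",
               "Mguttatus", "Osativa", "Sbicolor"}


@dataclass
class Orthogroup:
    name: str
    members: list[tuple[str, str]]   # (species, gene_id) pairs in one OrthoMCL cluster
    athaliana_length_aa: int         # length of the A. thaliana protein
    mean_identity: float             # average pairwise identity, as a fraction


def is_candidate(og: Orthogroup, min_len: int = 300, min_identity: float = 0.60) -> bool:
    """True if the cluster passes the low-copy, length and identity filters."""
    if not 7 <= len(og.members) <= 9:              # 7-9 sequences per cluster
        return False
    species = {sp for sp, _ in og.members}
    if not ANGIOSPERMS <= species:                 # at least one gene from every angiosperm
        return False
    return og.athaliana_length_aa >= min_len and og.mean_identity >= min_identity
```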

Public sequence retrieval and cDNA cloning

The identified low-copy nuclear genes were statistically overrepresented in predicted functions related to DNA/RNA metabolism, relative to the proportion of genes with such functions in the whole genome. In addition, extensive gene family phylogenetic studies in this laboratory and others indicated that the SMC, MCM, MSH, MLH, RFC, RDRP and RAD51 families, whose functions in DNA/RNA metabolism are highly conserved, have remained low-copy over long periods of eukaryotic evolution despite genome duplications (Forsburg, 2004; Lin et al., 2006, 2007; Surcel et al., 2008; Zong et al., 2009). We therefore inspected the copy numbers of 28 low-copy genes from these seven gene families in 15 angiosperm genomes, using the Arabidopsis genes as queries. To test whether low-copy nuclear genes with other functions, sizes or sequence identities can also serve as phylogenetic markers, the copy numbers of 20 randomly selected genes (Figs S1–S3) were inspected in the same way.

To test the usefulness of low-copy nuclear genes for resolving angiosperm phylogeny, we obtained sequences of five genes (SMC1, SMC2, MSH1, MLH1 and MCM5) from 91 angiosperm species in 46 orders and from three gymnosperm species. The sequences of 18 angiosperm species (15 species with fully sequenced genomes and three species with large EST datasets) and of the three gymnosperm species were retrieved from public databases (Table S1), 107 sequences in total. The five genes from the remaining 73 angiosperm species were amplified by PCR from cDNA (Fig. S1); detailed procedures for cDNA cloning and sequencing are described in Methods S1, and the primers used in this study are listed in Tables S2 and S3.

Phylogenetic analyses and Approximately Unbiased (AU) test

To reconstruct the angiosperm phylogeny from the five genes (SMC1, SMC2, MCM5, MSH1 and MLH1), sequences from two species of the same genus were combined for a few taxa to maximize sequence coverage: Chloranthus elatior with C. spicatus, and Schisandra propinqua with S. sphenanthera. Protein sequences were first aligned with MUSCLE v3.6 (Edgar, 2004), adjusted manually in GeneDoc (Nicholas et al., 1997), and converted into the corresponding nucleotide (codon) alignments with PAL2NAL (Suyama et al., 2006). Concatenated nucleotide and amino acid matrices were generated with SeaView v4.2.12 (Gouy et al., 2010).
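PAL2NAL converts a protein alignment into the matching codon alignment by threading each coding sequence through its aligned protein. The core idea can be sketched as below; unlike PAL2NAL itself, this hypothetical helper does not check that the codons actually translate to the aligned residues, and it ignores any trailing codons such as a stop codon.

```python
def protein_to_codon_alignment(aligned_protein: str, cds: str) -> str:
    """Thread an unaligned, in-frame CDS through its aligned protein sequence.

    Each residue consumes the next codon of the CDS; each gap becomes '---'.
    Codons left over after the last residue (e.g. a stop codon) are dropped.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    out, k = [], 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
        else:
            if k >= len(codons):
                raise ValueError("CDS is shorter than the protein sequence")
            out.append(codons[k])
            k += 1
    return "".join(out)
```

For example, the aligned protein "M-K" with the CDS "ATGAAATAA" yields the codon alignment "ATG---AAA".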

First, five single-gene trees were generated from the nucleotide matrices with PhyML v3.0 (Guindon et al., 2010), to determine the copy number of each gene in each species with a fully sequenced genome and to test for possible contamination. Where a species had two or more copies resulting from a lineage-specific duplication, the copy with the slowest evolutionary rate was retained for further analyses (see Tables S3–S8, Fig. S7a–e). The GTR + I + Γ model, selected with ModelTest v3.7 (Posada & Crandall, 1998), was specified. Branch support was estimated with the fast and accurate nonparametric SH-like approximate likelihood-ratio test (Guindon et al., 2010).
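ModelTest ranks candidate substitution models with information criteria such as the Akaike Information Criterion, AIC = 2k − 2 ln L, where k is the number of free parameters. A sketch of that ranking with hypothetical log-likelihoods; the parameter counts cover substitution-model parameters only (ModelTest can optionally also count branch lengths), so treat them as an assumption.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion, AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Free substitution-model parameters, branch lengths excluded (assumption):
# JC: none; HKY: 1 ts/tv ratio + 3 base frequencies;
# GTR: 5 exchangeabilities + 3 base frequencies; +I and +G add one each.
N_PARAMS = {"JC": 0, "HKY+I+G": 6, "GTR+I+G": 10}


def best_model(log_likelihoods: dict[str, float]) -> str:
    """Name of the model with the lowest AIC."""
    return min(log_likelihoods, key=lambda m: aic(log_likelihoods[m], N_PARAMS[m]))
```

A richer model wins only if its gain in ln L outweighs its extra parameters: with these counts, GTR + I + G must improve ln L over HKY + I + G by more than 4 log units.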

Concatenated 5-gene trees were reconstructed from both the nucleotide and the amino acid matrices using maximum likelihood and Bayesian methods. Maximum likelihood analyses were performed with PhyML v3.0 and RAxML v7.0.4 (Stamatakis, 2006), and Bayesian analyses with MrBayes v3.1.2 (Ronquist & Huelsenbeck, 2003) and PhyloBayes v3.2e (Lartillot & Philippe, 2004). For the nucleotide matrix, the best-fitting evolutionary model was selected with the Akaike Information Criterion (AIC) in ModelTest v3.7. In PhyML, the GTR + I + Γ model was specified and branch support was estimated with 100 bootstrap replicates. In MrBayes, one cold and three incrementally heated Markov chain Monte Carlo (MCMC) chains were run simultaneously, and convergence was monitored with AWTY (Nylander et al., 2008). Trees were sampled every 100 generations; the first 25% of sampled trees were discarded as burn-in, and the remaining trees were used to build the consensus tree. In RAxML, the GTRCAT model was specified and support values were estimated from 500 rapid bootstrap replicates. In PhyloBayes, two independent chains were run simultaneously until maxdiff was < 0.1. For the amino acid matrix, the models specified were JTT in PhyML, mixed in MrBayes, PROTGAMMAJTT in RAxML and CAT in PhyloBayes.
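Two of the convergence steps above can be made concrete: discarding the first 25% of sampled trees as burn-in, and comparing bipartition (split) frequencies between independent chains. PhyloBayes' bpcomp reports the largest such difference as maxdiff. The sketch below assumes each sampled tree has already been reduced to a set of splits, each stored as the frozenset of taxa on one canonical side; it is an illustration, not PhyloBayes code.

```python
from collections import Counter


def split_frequencies(trees: list[set[frozenset]], burnin: float = 0.25) -> dict[frozenset, float]:
    """Drop the first `burnin` fraction of sampled trees, then return the fraction
    of the remaining trees that contain each split."""
    kept = trees[int(len(trees) * burnin):]
    counts = Counter(split for tree in kept for split in tree)
    return {split: n / len(kept) for split, n in counts.items()}


def maxdiff(chain_a: list[set[frozenset]], chain_b: list[set[frozenset]],
            burnin: float = 0.25) -> float:
    """Largest absolute difference in split frequency between two chains."""
    fa = split_frequencies(chain_a, burnin)
    fb = split_frequencies(chain_b, burnin)
    return max(abs(fa.get(s, 0.0) - fb.get(s, 0.0)) for s in set(fa) | set(fb))
```

A run would be continued until maxdiff drops below the chosen threshold (0.1 here).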

Third codon positions may be saturated by multiple substitutions and thus add noise to phylogenetic analyses (Nei & Kumar, 2000). We therefore generated a nucleotide matrix excluding third codon positions and inferred trees from it with RAxML v7.0.4 and MrBayes v3.1.2. Taxon sampling, especially the inclusion of sequences with long branches, has also been considered to affect the inferred topology, so four nucleotide matrices with different samplings were generated and the corresponding trees were inferred with MrBayes v3.1.2 and RAxML v7.0.4.
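Excluding third codon positions from an in-frame codon alignment keeps positions 1 and 2 of every codon. A minimal sketch, assuming the alignment is a taxon-to-sequence mapping in which every sequence starts at codon position 1:

```python
def drop_third_positions(alignment: dict[str, str]) -> dict[str, str]:
    """Keep codon positions 1 and 2 (0-based column index i with i % 3 != 2)."""
    return {taxon: "".join(c for i, c in enumerate(seq) if i % 3 != 2)
            for taxon, seq in alignment.items()}
```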

To assess statistical support for alternative relationships among major angiosperm groups, 69 alternative topologies proposed in previous studies were tested against the best ML tree. Per-site log-likelihoods for each topology were first computed in RAxML under the GTR + Γ model, and AU and WSH tests were then conducted with CONSEL v0.1j (Shimodaira & Hasegawa, 2001) (details are given in Table S9).
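CONSEL's AU test relies on multiscale bootstrap resampling of per-site log-likelihoods and is too involved for a short example. As a simpler illustration of the underlying resampling, the sketch below computes the naive bootstrap probability (BP) of each topology with the RELL method, one of the statistics CONSEL also reports; it is not the AU or WSH test, and the function name and inputs are hypothetical.

```python
import random


def rell_bootstrap_probabilities(site_lnl: dict[str, list[float]],
                                 n_rep: int = 10_000, seed: int = 1) -> dict[str, float]:
    """Naive bootstrap probability that each topology has the highest total
    log-likelihood, resampling per-site log-likelihoods (RELL) instead of
    re-optimizing each tree on every bootstrap replicate."""
    rng = random.Random(seed)
    topologies = list(site_lnl)
    n_sites = len(site_lnl[topologies[0]])
    wins = dict.fromkeys(topologies, 0)
    for _ in range(n_rep):
        sites = [rng.randrange(n_sites) for _ in range(n_sites)]  # resample with replacement
        totals = {t: sum(site_lnl[t][i] for i in sites) for t in topologies}
        wins[max(totals, key=totals.get)] += 1
    return {t: w / n_rep for t, w in wins.items()}
```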

To test how the number of marker genes affects resolution of the angiosperm phylogeny, concatenated trees of four (SMC1, MLH1, MSH1 and MCM5), three (MLH1, MSH1 and MCM5) and two (MLH1 and MSH1) genes were also reconstructed from nucleotide sequences with RAxML under the same settings as above.
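Building the 4-, 3- and 2-gene matrices amounts to selecting per-gene alignments and concatenating them. A sketch of supermatrix concatenation (the step SeaView performed here), assuming each gene is held as a taxon-to-aligned-sequence mapping and that taxa missing from a gene are padded with '?'; the function name and data are hypothetical.

```python
def concatenate(genes: dict[str, dict[str, str]], missing: str = "?") -> dict[str, str]:
    """Build a supermatrix from per-gene alignments (gene -> taxon -> sequence),
    in the order the genes are given. Taxa absent from a gene are padded."""
    taxa = sorted({t for aln in genes.values() for t in aln})
    matrix = {t: [] for t in taxa}
    for gene, aln in genes.items():
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not all the same length")
        width = lengths.pop()
        for t in taxa:
            matrix[t].append(aln.get(t, missing * width))
    return {t: "".join(parts) for t, parts in matrix.items()}
```

A 2-gene matrix would then be `concatenate({g: alignments[g] for g in ("MLH1", "MSH1")})`, where `alignments` is a hypothetical dictionary of the per-gene alignments.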

To test the usefulness of the 20 randomly selected low-copy genes, single-gene trees were reconstructed from amino acid sequences with PhyML v3.0 under the JTT model (Fig. S3). Furthermore, to test whether other sets of five genes yield congruent species trees, two concatenated 5-gene trees were generated under the same settings as above, one from so-called ‘good genes’ (whose single-gene trees are mostly congruent with the species tree) and one from ‘bad genes’ (whose single-gene trees share fewer nodes with it) (Fig. S4).

Divergence time estimation

We tested whether the five genes fit the molecular clock hypothesis with MEGA v5.0 under the GTR + I + Γ model (Tamura et al., 2011). For divergence time estimation, species with > 50% missing data were excluded: all three gymnosperm species and three angiosperm species (Aristolochia fimbriata, Cabomba caroliniana and Alisma plantago-aquatica). To facilitate comparison with similar previous studies (Moore et al., 2007, 2010; Bell et al., 2010), four time constraints were used: the eudicot crown was set to a mean age of 125 Ma (million years ago); the most recent common ancestor (MRCA) of Fagales and Fabales was set to a minimum of 85 Ma; and the MRCAs of Caryophyllales and of Sapindales were set to minima of 83.5 and 65 Ma, respectively. The prior on the eudicot crown was a normal distribution with a mean of 125 and a standard deviation of 1; the prior on the MRCA of Fabales and Fagales was a uniform distribution between 85 and 100; and the priors on the MRCAs of Caryophyllales and of Sapindales were lognormal distributions with a standard deviation of 1.5 and offsets of 83.5 and 65, respectively. A Yule process was specified as the tree prior. Divergence times were estimated in two independent runs of BEAST v1.6.1 (Drummond & Rambaut, 2007) under the uncorrelated lognormal relaxed clock model, with the five genes assumed to evolve independently, each under the model selected by ModelTest v3.7. Each MCMC chain was run for 50 000 000 generations and sampled every 5000 generations, and convergence was assessed from effective sample sizes (ESS) in Tracer v1.5. Finally, divergence times were summarized with TreeAnnotator v1.6.1, treating half of the trees as burn-in, and visualized with FigTree v1.3.1.
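The calibration priors above can be sketched by computing their central 95% intervals. This assumes that the lognormal standard deviation of 1.5 is on the log scale and uses a hypothetical log-scale mean of 0, since the text does not state one (BEAST can also parameterize the lognormal mean in real space); the normal prior uses the stated mean of 125 and standard deviation of 1.

```python
import math
from statistics import NormalDist


def central_quantile(mass: float = 0.95) -> float:
    """Standard-normal quantile that bounds the central `mass` of the distribution."""
    return NormalDist().inv_cdf(0.5 + mass / 2)


def normal_prior_interval(mean: float, sd: float, mass: float = 0.95):
    q = central_quantile(mass)
    return mean - q * sd, mean + q * sd


def offset_lognormal_interval(offset: float, log_mean: float, log_sd: float, mass: float = 0.95):
    """Age = offset + exp(X), X ~ Normal(log_mean, log_sd): never younger than offset."""
    q = central_quantile(mass)
    return offset + math.exp(log_mean - q * log_sd), offset + math.exp(log_mean + q * log_sd)


# Eudicot crown: normal, mean 125, sd 1 (as stated in the text).
eudicot = normal_prior_interval(125, 1)
# Fabales-Fagales MRCA: uniform between 85 and 100, so every age in that range is equally likely.
fab_fag = (85, 100)
# Caryophyllales MRCA: offset 83.5, log-scale sd 1.5; log_mean = 0 is HYPOTHETICAL.
caryophyllales = offset_lognormal_interval(83.5, 0.0, 1.5)
```

For the eudicot crown this gives roughly 123.0–127.0 Ma; for comparison, Table 2 reports 126 (123–127) Ma for Eudicotyledoneae.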


Results

Highly conserved single-copy genes provide excellent candidate phylogenetic markers

To identify low-copy nuclear genes, we obtained orthogroups containing 7–9 gene sequences from the seven angiosperms (with at least one gene from each species); to make gene cloning from taxa without a sequenced genome more efficient, genes with coding regions of fewer than 300 aa were excluded from further analyses. Under these criteria, 1402, 1087 and 699 orthogroups with 7, 8 and 9 gene sequences, respectively, were identified. After screening against the moss genome, 1030 (77.2%), 773 (71.1%) and 489 (70%) groups remained, respectively (Table S12). The encoded proteins ranged from 300 to 5098 aa in length, with 66.5% of them ≤ 600 aa (Fig. 1a). The average identity of each gene among the seven angiosperms varied from 28.9% to 94.2%, and the identities were approximately normally distributed (Fig. 1b). Compared with non-low-copy genes (> 9 genes in the seven taxa, with at least one gene from each genome), the distributions of gene sizes and sequence identities largely overlapped (Fig. 1a,b). Among these low-copy nuclear genes, 1083 highly conserved genes with identities > 60% were selected as the most promising candidate markers (see Table S12). GO analyses and chi-square tests indicated that these low-copy genes are overrepresented in the categories cell organization and biogenesis, developmental processes, other cellular processes, other metabolic processes, transport and, especially, DNA/RNA metabolism, but underrepresented in the signal transduction and transcription factor categories (Fig. 1c).
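The GO over- and underrepresentation test can be sketched as a Pearson chi-square test on a 2 × 2 contingency table. The layout below (low-copy genes versus all other genes, inside versus outside one GO category) is an assumption, as the text does not give the exact table. For 1 degree of freedom the upper-tail p-value equals erfc(√(χ²/2)), so only the Python standard library is needed; the counts in the test are hypothetical.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) on
         [[a, b],    a: low-copy genes in the GO category, b: low-copy genes outside it
          [c, d]]    c: other genes in the category,       d: other genes outside it
    Returns (statistic, upper-tail p-value)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2))  # P(chi-square with 1 df > stat)
    return stat, p_value
```

The test itself is two-sided in direction; the sign of a·d − b·c indicates whether the category is over- (positive) or underrepresented (negative) among low-copy genes.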


Figure 1. Characteristics of low-copy nuclear genes. Gene sizes (a), sequence identities (b) and functional categories (c) of low-copy genes are shown in green; sizes and sequence identities of non-low-copy nuclear genes, and functional categories of the whole Arabidopsis genome, are shown in yellow.


It has been reported that genes related to DNA/RNA metabolism tend to lose their duplicates after gene or genome duplication and thus remain single-copy (Blanc & Wolfe, 2004). Such genes might therefore be suitable candidate markers for reconstructing angiosperm phylogeny. In addition, extensive phylogenetic analyses have shown that members of the SMC, MCM, MSH, MLH, RAD51 and RDRP families, which function in DNA/RNA synthesis and repair, are maintained as a single copy in most species despite their ancient origin before the divergence of plants and animals (Forsburg, 2004; Lin et al., 2006, 2007; Surcel et al., 2008; Zong et al., 2009). This long-term maintenance of orthology, even after whole-genome duplications in both plant and animal lineages, suggests that these genes are good candidates for phylogenetic analysis. To test this idea, we inspected their copy numbers in 15 angiosperms with fully sequenced genomes and found that most species had only one copy of most genes (Fig. 2); a notable exception is Glycine max, which probably experienced two recent genome duplications (Schmutz et al., 2010). To test whether other low-copy nuclear genes are also suitable for angiosperm phylogeny, we inspected the copy numbers of 20 randomly selected genes; most of them have only one copy in each species (Fig. S2). However, only one of their single-gene trees (AT5G52910) was completely congruent with the reference species tree, suggesting that some low-copy genes might not be suitable for phylogenetic analysis (Fig. S3).


Figure 2. Gene copy numbers of candidate marker genes in angiosperm species with sequenced genomes. The organismal relationships, modified from our results, are shown on the left. Gene names are given at the bottom; copy numbers are denoted by colour: green, orange and red indicate one, two, and three or more copies, respectively.


We then used SMC1, SMC2, MCM5, MSH1 and MLH1 as representatives to study the angiosperm phylogeny and obtained their sequences from an additional 73 species by degenerate PCR. The PCR success rates for SMC1, SMC2, MCM5, MSH1 and MLH1 were 93.7%, 87%, 94.4%, 91.5% and 90.1%, respectively (see Tables S4–S8). Together with sequences retrieved from public databases, the dataset comprised 92 SMC1, 94 SMC2, 89 MCM5, 89 MSH1 and 83 MLH1 sequences from 91 angiosperm species belonging to 75 families in 46 representative orders (73% of the orders recognized by APG III, 2009).

These five genes ranged from 654 to 3201 bp in length and contained both highly conserved and relatively divergent regions (Figs 3a–e, S5). For instance, the amino acid sequence identities of SMC1 homologs ranged from 50% to 90% depending on the position of the sliding window, providing phylogenetically informative sites for both shallow and deep relationships (Fig. 3). Furthermore, the highly conserved regions greatly facilitate both amplification with universal primers, as mentioned above (Tables S4–S8), and sequence alignment, so that only a few gaps are needed to obtain high-quality alignments.
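Fig. 3 profiles sequence identity along each alignment with SWAPP, using a window size of 100 and a step size of 10. A minimal sketch of such a profile, assuming that identity is averaged over all sequence pairs and computed only over columns where neither sequence has a gap (SWAPP's exact definition may differ); the function names are hypothetical.

```python
from itertools import combinations
from typing import Optional


def pairwise_identity(s1: str, s2: str) -> Optional[float]:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    if not pairs:
        return None
    return sum(x == y for x, y in pairs) / len(pairs)


def identity_profile(alignment: list[str], window: int = 100, step: int = 10):
    """Mean pairwise identity per window, as a list of (window start, identity)."""
    length = len(alignment[0])
    profile = []
    for start in range(0, length - window + 1, step):
        chunks = [seq[start:start + window] for seq in alignment]
        values = [v for a, b in combinations(chunks, 2)
                  if (v := pairwise_identity(a, b)) is not None]
        if values:  # skip windows where every pair is entirely gapped
            profile.append((start, sum(values) / len(values)))
    return profile
```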


Figure 3. Domains and amino acid sequence conservation of SMC1 (a), SMC2 (b), MCM5 (c), MSH1 (d) and MLH1 (e). Left, percentage sequence identity among angiosperms calculated with SWAPP using a window size of 100 and a step size of 10; the x-axis gives positions in the sequence alignment. Right, domains predicted by SMART are shown as rectangles, primers used in this study are marked by arrows, and representative conserved and divergent regions are shown as WebLogo plots.


Regions of sequence divergence can also be used to resolve relationships among relatively closely related taxa. To test this, we compared Arabidopsis thaliana and A. lyrata, two sibling species that diverged c. 10 Ma (Beilstein et al., 2010), focusing on introns because they evolve more rapidly. For these five genes, the nucleotide sequence identities of individual introns ranged from 32% to 96.9%, and some indels were also observed, providing additional resolving power even between closely related species (Fig. S6a–e). By comparison, the sequence identity between these two species for ITS (internal transcribed spacers), the most widely used molecular marker for resolving relationships at low taxonomic levels, is 95%.

As a test of their suitability as phylogenetic markers, single-gene trees were constructed for each of the five genes; they were largely consistent with well-established organismal relationships, suggesting that these genes are likely orthologous (Fig. S7a–e). Although multiple sequences were occasionally obtained from a species, they always formed adjacent terminal branches, indicating that they resulted from recent lineage-specific duplications and should not affect the relationships among the more distantly related groups in this study. ModelTest analyses showed that GTR + I + Γ was the best-fitting model for four of the five genes; for MLH1, the best model was TVM + I + Γ. This result indicates that the genes used in this study may have evolved under very similar patterns (Table S9). Further analyses suggest that these genes are phylogenetically highly informative: their average frequency of parsimony-informative (Pi) sites is 70%, much higher than those of widely used organellar genes such as rbcL and atpB (52% and 54%, respectively; Table 1; Savolainen et al., 2000).
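Parsimony-informative sites follow the standard definition: a site is informative if at least two different character states each occur in at least two sequences. A minimal sketch, treating gaps, '?' and 'N' as missing data (an assumption about how the counts in Table 1 were made). Pi (%) in Table 1 corresponds to informative sites divided by alignment length, for example 2321/3222 ≈ 72.0% for SMC1.

```python
from collections import Counter

MISSING = set("-?N")  # gaps, missing data and N treated as uninformative (assumption)


def is_parsimony_informative(column: str) -> bool:
    """At least two character states, each present in at least two sequences."""
    counts = Counter(c for c in column.upper() if c not in MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def informative_fraction(alignment: list[str]) -> float:
    """Fraction of alignment columns that are parsimony-informative."""
    columns = ["".join(col) for col in zip(*alignment)]
    return sum(map(is_parsimony_informative, columns)) / len(columns)
```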

Table 1. Sequence and tree statistics for five selected low-copy genes

Gene     Length (bp)  Constant  Pi characters  Pi (%)  CI     RI     RC     Tree length  Variable sites  Rates of change
SMC1     3222         598       2321           72.0    0.176  0.427  0.075  32 266       2624            12.30
SMC2     3171         698       2128           67.1    0.198  0.408  0.081  26 208       2473            10.60
MCM5     675          208       411            60.9    0.169  0.480  0.081  5615         467             12.02
MLH1     1098         189       823            75.0    0.195  0.426  0.083  10 142       909             11.16
MSH1     1854         345       1355           73.1    0.180  0.395  0.071  18 818       1509            12.47
5 genes  10 020       2038      7038           70.2    0.184  0.416  0.076  93 538       7982            11.97

Pi, parsimony informative; CI, consistency index; RI, retention index; RC, rescaled consistency index; Rates of change, steps/variable characters.

Strong support for most angiosperm phylogenetic relationships using five nuclear genes

To use these nuclear genes for phylogenetic analysis, we aligned the amino acid sequences of the SMC1, SMC2, MCM5, MSH1 and MLH1 proteins and generated the corresponding nucleotide matrices. The concatenated 5-gene nucleotide matrix was 10 020 bp long, with < 15% missing data. Different methods recovered essentially the same topology from the same matrix, and c. 83% of groupings agreed between the 5-gene nucleotide tree (NUtree hereafter) and the corresponding amino acid tree (AAtree hereafter) when the same phylogenetic method was used (Figs 4, S10). The nodes that differed usually had very low support values and have been treated as unresolved in previous studies (Jansen et al., 2007; Moore et al., 2007). In the NUtree generated with MrBayes, 94% of nodes had posterior probabilities (PP) of 0.95 or greater; in the NUtree generated with RAxML, 87.5% of nodes had bootstrap values (BP) of 70% or higher. Comparing the five single-gene trees with the NUtree, we found that SMC1 performed best: the SMC1, SMC2, MCM5, MLH1 and MSH1 single-gene trees contained 72, 65, 48, 45 and 61 nodes, respectively, that were congruent with the NUtree and had support values > 50% (Fig. S8). The latter three genes had relatively short cloned regions in this study. We also reconstructed concatenated trees from four (SMC1, MCM5, MSH1 and MLH1), three (MLH1, MSH1 and MCM5) and two (MLH1 and MSH1) genes. Using the NUtree as a reference, adding genes increased support values for most nodes, but not always for controversial relationships (Fig. S9).
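Counting congruent nodes, as done for Fig. S8, amounts to comparing bipartitions (splits): a gene-tree node counts as congruent if the same split occurs in the reference NUtree. A sketch, assuming the trees have already been parsed into splits and that each split is stored canonically as the side that excludes one fixed anchor taxon; the split representation and function names are illustrative.

```python
def canonical(side, taxa: frozenset, anchor: str) -> frozenset:
    """Store a bipartition as the side that does not contain the anchor taxon."""
    side = frozenset(side)
    return taxa - side if anchor in side else side


def congruent_supported_nodes(gene_splits: dict, reference_splits: set,
                              min_support: float = 50.0) -> int:
    """Number of reference-tree splits recovered by a gene tree with support > min_support.
    gene_splits maps each canonical split of the gene tree to its support value."""
    return sum(1 for split, support in gene_splits.items()
               if split in reference_splits and support > min_support)
```

Canonicalizing splits matters because the same branch can be written as either of its two sides; fixing an anchor taxon makes identical branches compare equal.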


Figure 4. Phylogram of the best ML tree inferred with RAxML from the concatenated 5-gene nucleotide sequences (ln L = −382 176.038). Asterisks indicate posterior probabilities (PP) = 1 and bootstrap values (BP) > 95%. Numbers associated with internal branches are also support values (PP/BP). For backbone nodes, additional support values estimated from the amino acid matrix are also shown, in the following order: PPs from the nucleotide and amino acid matrices with MrBayes, then BPs from the nucleotide and amino acid matrices with RAxML. Zamia, Pinus and Picea were specified as outgroups. The scale bar indicates the number of changes per site.


Our results fully support the monophyly of many previously defined groups, including all the orders analysed here and several larger groups: eudicots, monocots, magnoliids, Mesangiospermae and the major subgroups of eudicots (such as core eudicots, rosids and asterids), irrespective of taxon sampling (Fig. S11a–d) or the inclusion or exclusion of third codon positions (Fig. S12). Ranunculales, Gunneraceae and Acorales were also well supported as the earliest-diverging lineages within eudicots, core eudicots and monocots, respectively. Amborellales, Nymphaeales and Austrobaileyales were strongly supported as successive sisters to all other extant angiosperms, especially in AAtrees (Figs 4, S10). Compared with the latest comprehensive phylogenetic study, which used 83 plastid genes from 86 species, c. 78.3% (54/69) of groupings were the same. Of the differing nodes, 57.1% (8/14) had BP values of at least 70% and 21.4% (3/14) had BP values of 100% in our analyses.

The phylogeny here strongly supports a few relationships that differ from those described in previous studies. Specifically, among rosids, all eight analyses (Figs 4, S10) and AU tests using the five concatenated genes (Table S10) supported a close relationship between the malvids (including Brassicales, Malvales and Sapindales) and the COM clade (Celastrales, Oxalidales, Malpighiales), whereas the nitrogen-fixing clade of fabids (Cucurbitales, Fagales, Fabales, Rosales) was only distantly related to both. The same pattern was also recovered in all five single-gene trees (Fig. S7a–d), and this placement was very robust to taxon sampling and to the inclusion or exclusion of codon positions (Figs S11, S12). This placement, also observed in recent independent studies albeit with low support values or sparse sampling (Zhu et al., 2007; Finet et al., 2010; Qiu et al., 2010; Shulaev et al., 2010; Burleigh et al., 2011; Lee et al., 2011), differs from the previous grouping of fabids with the COM clade as eurosids I, and of malvids as eurosids II. In addition, Ericales and Cornales together formed the sister group of the core asterids (Figs 4, S10), instead of being successive sisters to the other asterids, as reported in previous studies (APG III, 2009). All 5-gene trees except the one based on the matrix excluding third codon positions also placed Caryophyllales as sister to rosids (Figs 4, S10, S12), whereas previous studies placed it as sister to asterids (APG III, 2009). Another major difference is the sister relationship between Chloranthaceae and Ceratophyllaceae, supported by almost all analyses (Fig. S10). Our results also support a possible CCMM clade comprising Chloranthaceae, Ceratophyllaceae, magnoliids and monocots, in contrast to results based on plastid genomes, which placed Ceratophyllaceae and monocots as successive sisters of eudicots (Moore et al., 2007, 2010).

Divergence times for major groups were similar to those from previous studies

Molecular phylogenies and related sequence analyses can be used to estimate divergence times among lineages (Nei & Kumar, 2000). As with rbcL, atpB, rps4 and 18S rDNA (Soltis et al., 2002), the five nuclear genes in this study have evolved at unequal rates across different lineages of the tree. Nevertheless, with a relaxed molecular clock model, divergence times estimated from these five genes (Fig. 5; Tables 2, S11) were similar to previous estimates (Wikström et al., 2001; Moore et al., 2007, 2010; Bell et al., 2010; Zhang et al., 2011), except where grouping patterns differ. Specifically, the estimated ages of major angiosperm groups were: Angiospermae 198 Ma, Mesangiospermae 145 Ma, Monocotyledoneae 124 Ma, Magnoliidae 116 Ma, Pentapetalae 109 Ma, rosids 99 Ma and asterids 93 Ma (Fig. 5). Thus, in addition to phylogenetic inference, these low-copy nuclear genes can also be used for molecular clock analyses.
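To illustrate what an uncorrelated lognormal relaxed clock assumes, a minimal simulation sketch: each branch draws its own rate independently from a lognormal distribution, and its expected length in substitutions per site is rate × duration. The mean rate, log-scale spread and branch durations are hypothetical inputs; BEAST estimates such quantities jointly with the tree rather than simulating them.

```python
import math
import random


def ucln_branch_lengths(durations: list[float], mean_rate: float, sd_log: float,
                        seed: int = 0) -> list[float]:
    """Uncorrelated lognormal relaxed clock: each branch gets an independent rate
    drawn from a lognormal with arithmetic mean `mean_rate`; length = rate * duration."""
    rng = random.Random(seed)
    mu = math.log(mean_rate) - sd_log ** 2 / 2  # log-scale mean giving E[rate] = mean_rate
    return [rng.lognormvariate(mu, sd_log) * t for t in durations]
```

With `sd_log = 0` every branch shares the same rate and the model reduces to a strict clock; larger values allow more rate heterogeneity among lineages.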


Figure 5. Chronogram of angiosperm divergence times estimated with BEAST from the five concatenated genes. The four calibration points used in this study are marked with solid circles. Divergence times are given in detail in Supporting Information Table S11.


Table 2.  Divergence times and ranges (95% HPD) for major angiosperm groups as estimated by BEAST

Clade | Age, Ma (95% HPD)
Angiospermae | 198 (163–256)
Nymphaeales + Austrobaileyales + Mesangiospermae | 179 (158–211)
Austrobaileyales + Mesangiospermae | 161 (146–185)
Mesangiospermae | 145 (133–163)
Chloranthus + Ceratophyllum + Magnoliidae + monocots | 137 (124–154)
Chloranthus + Ceratophyllum + Magnoliidae | 131 (113–153)
Chloranthus + Ceratophyllum | 119 (90–147)
Monocotyledoneae | 124 (108–142)
Eudicotyledoneae | 126 (123–127)
Magnoliidae | 116 (89–144)
Gunnera + Pentapetalae | 112 (107–116)
Pentapetalae | 109 (104–114)
Asterids | 93 (81–103)
Rosids | 99 (94–103)

Ma, million years ago; HPD, highest posterior density.
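
The ranges in Table 2 are 95% highest posterior density (HPD) intervals, i.e. the narrowest interval containing 95% of the sampled posterior values of a node age. As an illustrative sketch only (not BEAST's own implementation), one common way to compute such an interval from MCMC samples is to sort them and keep the narrowest window that covers 95% of the samples. This assumes a unimodal posterior, and the simulated ages below are synthetic, not data from this study.

```python
import math
import random


def hpd_interval(samples: list[float], mass: float = 0.95) -> tuple[float, float]:
    """Narrowest interval containing `mass` of the samples (assumes a unimodal posterior)."""
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")
    xs = sorted(samples)
    n = len(xs)
    k = math.ceil(mass * n)  # number of samples the interval must cover
    if k < 1:
        raise ValueError("need at least one sample")
    # Compare every window of k consecutive sorted samples and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


# Illustrative only: synthetic node ages (Ma) drawn from a normal distribution.
rng = random.Random(1)
ages = [rng.gauss(145.0, 8.0) for _ in range(10_000)]
low, high = hpd_interval(ages)
print(f"mean {sum(ages) / len(ages):.0f} Ma, 95% HPD {low:.0f}-{high:.0f} Ma")
```

For a skewed posterior, the HPD interval is shifted towards the mode and is narrower than the equal-tailed interval, which is why it is commonly reported for node ages.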



Conserved low-copy nuclear genes are excellent markers for angiosperm phylogeny

Current hypotheses of relationships among angiosperm groups have been inferred largely from organellar genes, mainly because their orthology is nearly certain and their sequences are easy to obtain. However, these markers have several limitations. First, compared with the low-copy nuclear genes used in this study, plastid and mitochondrial genes are so conserved that they do not provide enough phylogenetically informative sites to resolve relationships at middle and low taxonomic ranks (Clegg et al., 1994; Knoop, 2004). Second, organellar genomes are much smaller than nuclear genomes, so some relationships remain hard to resolve even when entire plastid genome sequences are used (Jansen et al., 2007; Moore et al., 2007, 2010). Third, some chloroplast genes have been lost from parasitic plants (Palmer, 1990; Keeling & Palmer, 2008), so these markers cannot be used universally. Finally, unlike nuclear genes, organellar genes are mostly inherited uniparentally and therefore trace only part of the evolutionary history. This matters because hybrid speciation has occurred frequently in nature, and only nuclear genes provide a biparental record of the evolutionary process (Birky, 2001; Hansen et al., 2007; Ness et al., 2011). For these reasons, plant systematists working on specific groups have recently argued that more nuclear genes with more variable regions should be used to uncover the complex evolutionary history of angiosperms (Cruz-Mazo et al., 2009; Lu et al., 2010; Li et al., 2011b).

Low-copy nuclear genes have long been considered important for reconstructing angiosperm phylogeny (Strand et al., 1997; Sang, 2002; Small et al., 2004; Wu et al., 2006). However, because orthologs are difficult to identify and gene sequences difficult to obtain, only a few nuclear genes have been used, mainly to resolve relationships among low-rank taxonomic groups (Baldwin, 1992; Sang et al., 1997; Galloway et al., 1998; Zhu & Ge, 2005; Yuan et al., 2009; Ness et al., 2011). Among nuclear markers, ITS has been the most widely used: a recent search indicated that Baldwin's paper, which first proposed ITS as a phylogenetic marker, has been cited ∼1300 times (Baldwin et al., 1995). However, as discussed above, ITS is part of the rDNA repeats, which are subject to incomplete homogenization in several species (Álvarez & Wendel, 2003; Ayalew et al., 2011), possibly confounding phylogenetic inference. In addition, the marker is relatively short: ITS-1 and ITS-2 are each shorter than 300 bp, which limits its resolving power for most groups.

In this work, more than 1000 putative low-copy orthologous genes were identified as candidate phylogenetic markers by comparing the genomes of a moss and seven representative angiosperms. Their amino acid sequence identities ranged from 28.9% to 94.2%, suggesting variable evolutionary rates; the set therefore includes both slowly and rapidly evolving genes, suitable for resolving relationships at both high and low taxonomic ranks. We then examined in detail a set of conserved low-copy nuclear genes involved in DNA/RNA metabolism and demonstrated their suitability for reconstructing angiosperm phylogeny. Because these genes perform conserved functions necessary for all organisms, they are much less likely than genes for signalling or regulatory processes to be involved in adaptive fitness or to be under environmentally driven selective pressure. We also showed that they were easy to amplify from highly divergent angiosperm species; this is the first report of using low-copy nuclear genes to reconstruct angiosperm phylogeny on a large scale. As a proof of principle, our phylogenetic analyses demonstrated that these nuclear genes carry sufficient phylogenetic information to resolve both deep and shallow relationships, including several that have been difficult to determine. In addition, the introns described here, which differ markedly even between two closely related species, are easily amplified from genomic DNA by PCR and are candidates for resolving relationships at low taxonomic ranks and for DNA barcoding (Hebert et al., 2003; Kress et al., 2005; Lahaye et al., 2008; Li et al., 2011a). As sequencing technology improves and more genome and transcriptome data become available, the low-copy nuclear genes identified here can complement organellar genes as markers for plant systematists in future studies.
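
Percent identity values such as the 28.9–94.2% range above depend on how identity is computed from an alignment, and the convention used here is not stated. The sketch below assumes one common convention, identical residues divided by the number of columns in which neither sequence has a gap; other conventions divide by the alignment length or by the length of the shorter sequence. The sequences are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences, ignoring columns with a gap in either.

    Assumed convention: identical residues / ungapped columns * 100.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned protein fragments: 5 identical residues in 7 ungapped columns.
print(round(percent_identity("MKV-LLAG", "MRVQLIAG"), 1))  # 71.4
```

Because conventions differ, values computed this way are comparable only with values computed under the same convention.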

An angiosperm phylogeny with support for alternative hypotheses

Early angiosperm phylogenies were each built from a single gene; later, more genes or even whole plastid genomes were combined to improve them. After decades of effort, the recently proposed APG III classification contains many consistently supported relationships (Chase et al., 1993; Soltis et al., 1997; Jansen et al., 2007; APG III, 2009; Moore et al., 2010). However, the placements of some groups remain uncertain, with low support values. In the phylogeny reported here, 78.3% of nodes at the rank of order or above are congruent with the latest comprehensive analysis using 83 plastid genes (Moore et al., 2010), indicating that these relationships are strongly supported by both plastid and nuclear genes.
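
A congruence figure such as 78.3% is a proportion of nodes shared between two trees. The exact matching rules used in this comparison (for example, how differences in taxon sampling were handled) are not described here, so the following is only a minimal sketch under stated assumptions: both trees are rooted, share the same taxa, and each node is represented as the set of taxa (clade) below it. The clade sets are hypothetical toy data.

```python
def shared_clade_fraction(query: set[frozenset[str]], reference: set[frozenset[str]]) -> float:
    """Fraction of clades in `query` that also occur in `reference`.

    Assumes rooted trees over the same taxa, each given as a set of clades.
    """
    if not query:
        raise ValueError("query tree has no clades")
    return len(query & reference) / len(query)


# Hypothetical toy trees: ((A,B),(C,D)),(E,F) versus (((A,B),C),D),(E,F).
ours = {frozenset(c) for c in [("A", "B"), ("C", "D"), ("A", "B", "C", "D"), ("E", "F")]}
theirs = {frozenset(c) for c in [("A", "B"), ("A", "B", "C"), ("A", "B", "C", "D"), ("E", "F")]}
print(f"{100 * shared_clade_fraction(ours, theirs):.1f}% of nodes congruent")  # 75.0% of nodes congruent
```

Using frozensets makes each clade hashable, so shared nodes reduce to a set intersection regardless of how the trees were drawn or rotated.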

At the same time, our results support several hypotheses that differ from previous ones (Jansen et al., 2007; Moore et al., 2007, 2010). Within rosids, our analyses strongly support a sister relationship between malvids and the COM clade, consistent with some recent independent results obtained with variable degrees of confidence from mitochondrial or nuclear genes (Zhu et al., 2007; Finet et al., 2010; Qiu et al., 2010; Shulaev et al., 2010; Lee et al., 2011). The sisterhood of the COM clade and malvids also receives moderate support from floral structural features (Endress & Matthews, 2006). Another obvious difference involves Cornales and Ericales, which together formed a sister group to euasterids here, unlike the previous hypothesis in which Cornales and Ericales are successive sisters to the remaining asterids (Albach et al., 2001; Bremer et al., 2002; Moore et al., 2010). In addition, Vitaceae and Saxifragales were sisters with high support in all analyses except the NUtree excluding the 3rd codon positions. This sister relationship was further supported by statistical tests (Table S4) and by recent analyses using plastid genomes and nuclear ribosomal protein genes (Finet et al., 2010; Moore et al., 2010). The different topologies recovered from genes in different genomes might reflect different evolutionary histories, which in turn may result from their different modes of inheritance. These differences reinforce the need for evidence from both organellar and nuclear genes.

Some relationships recovered in this study with moderate support also differ from previous topologies. For example, in monocots, all analyses moderately supported Dioscoreales as the sister group of commelinids (a widely accepted monophyletic group composed of Poales, Commelinales, Arecales and Zingiberales; Chase et al., 2006), followed by Asparagales and Liliales as successive sisters, although AU tests could not exclude five alternative hypotheses in which Asparagales, Liliales and Dioscoreales are successive sisters of commelinids in different orders (Table S10). By contrast, Asparagales was placed as sister to commelinids in analyses using plastid genes (Davis et al., 2004; Chase et al., 2006; Graham et al., 2006; Moore et al., 2010; Soltis et al., 2011), and Liliales in analyses using mitochondrial genes (Qiu et al., 2010), with variable degrees of confidence, indicating that the relationships among these four major groups remain uncertain. Similarly, Caryophyllales has usually been placed as sister to asterids using plastid and/or mitochondrial genes (Hilu et al., 2003; APG III, 2009; Moore et al., 2010; Soltis et al., 2011); however, our results supported its sisterhood to rosids in all matrices and methods (Figs 4, S11b,c) except the NUtree excluding the 3rd codon positions (Fig. S12), similar to a recent topology recovered using 77 nuclear ribosomal genes (Finet et al., 2010). Among the major lineages of core eudicots, our NUtrees placed the clade of Vitaceae and Saxifragales within a group called super-rosids, as in a recent study with plastid genomes (Moore et al., 2010), whereas our AAtrees placed this clade in a basal position within the core eudicots, in agreement with the study using nuclear ribosomal genes (Finet et al., 2010). The placement of the Vitaceae and Saxifragales clade therefore remains uncertain.
Within Lamiidae, our finding that Lamiales and Solanales cluster together, with Gentianales as their sister, is similar to the results of several previous analyses (Bremer et al., 2002; Hilu et al., 2003). However, other studies using different genes and methods have yielded two alternative topologies (Albach et al., 2001; Finet et al., 2010; Moore et al., 2010), neither of which could be rejected by AU tests (Table S10). The difficulty in resolving these relationships suggests that more genes from all three genomes, especially nuclear genes, and denser sampling are required to address these issues.

In addition, support was weak for the positions of several major groups, notably those among the Mesangiospermae: the eudicots, monocots, magnoliids, Ceratophyllaceae and Chloranthaceae. Resolving these relationships is one of the most difficult problems in plant phylogenetics (Qiu et al., 2006), and there are various hypotheses for the relationships within Mesangiospermae (Qiu et al., 1999, 2006; Hilu et al., 2003; Moore et al., 2007, 2010). Recently, Ceratophyllaceae was placed as sister of eudicots and Chloranthaceae as sister of magnoliids, with only moderate support even using entire plastid genomes (Moore et al., 2010). Here, almost all analyses weakly supported a sister relationship between Ceratophyllaceae and Chloranthaceae, as also proposed in a recent study using four mitochondrial genes (Qiu et al., 2010); floral morphological evidence also supports this grouping (Endress & Doyle, 2009). Furthermore, the monophyletic CCMM clade, although speculative, represents a distinct hypothesis that warrants further investigation. Resolving these difficult relationships will therefore require additional gene markers and/or more taxa, as well as careful morphological study of living and fossil material.


We describe here a set of highly conserved single- or low-copy nuclear genes as excellent candidate phylogenetic markers. Detailed sequence analyses of five representative genes revealed that they include both highly conserved and more divergent regions; the former allow easy alignment and primer design for amplification, whereas the latter provide informative sites for phylogenetic analysis and for DNA barcoding. Indeed, we obtained their homologs from dozens of flowering plants covering the entire spectrum of angiosperms and representing most orders, and the five single-gene phylogenies were largely consistent with well-supported organismal relationships. These highly conserved nuclear genes are present in all eukaryotes, allowing plant phylogenies to be integrated into the eukaryotic tree of life. Because only a small number of highly informative genes is needed, and these genes are easy to amplify and economical to analyse computationally, many more organisms can be included in a single study.
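
Informative sites are often quantified as parsimony-informative sites: alignment columns with at least two character states, each of which occurs in at least two sequences. The following is a minimal sketch of that count, not the procedure used in this study; it assumes that gaps, "?" and the ambiguity code N are treated as missing data, which is one convention among several. The sequences are hypothetical.

```python
from collections import Counter


def parsimony_informative_sites(alignment: list[str], missing: str = "-?N") -> int:
    """Count columns with at least two states that each occur in at least two sequences.

    Assumption: characters listed in `missing` are ignored as missing data.
    """
    if not alignment or len({len(s) for s in alignment}) != 1:
        raise ValueError("need a non-empty set of equal-length aligned sequences")
    informative = 0
    for column in zip(*alignment):
        counts = Counter(c.upper() for c in column if c.upper() not in missing)
        # Informative when two or more states are each shared by two or more taxa.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return informative


# Hypothetical alignment: columns 2 (T/C) and 4 (C/T) are informative.
seqs = ["ATGCA", "ATGTA", "ACGTA", "ACGCN"]
print(parsimony_informative_sites(seqs))  # 2
```

Constant columns and columns with a single variant (singletons) are not counted, because they cannot favour one tree topology over another under parsimony.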

The angiosperm phylogeny we obtained showed maximum support for most clades, largely consistent with previous hypotheses, indicating that both nuclear and organellar genes support well-established angiosperm relationships. The strongly supported differences in placements for some groups suggest different evolutionary histories for nuclear and organellar genes; therefore, both kinds of markers are necessary for reconstruction of angiosperm phylogeny. The highly informative and easily cloned nuclear genes will facilitate future investigation of the angiosperm phylogeny with expanded taxa and promote understanding of structural and functional evolution of flowering plants.



We thank Yonghong Hu, Yuqin Wang, Chunce Guo and Ming Ding for their help in plant sample collection. We also thank Ji Yang, Lu Lu and Liangsheng Zhang for helpful discussions and Fan Lu for specimen identification. We are very grateful to Yaqiong Wang and Haifeng Wang for developing Perl scripts for this work, to Hongxing Yang and Qiang Zhang for help with software, and to Ji Qi for improving the appearance of the figures. This study was supported by grants from the Natural Science Foundation of China (grant no. 31100156), the Postdoctoral Science Foundation of China (grant nos 20100480549 and 201003241), the Ministry of Science and Technology of China (2011CB944600) and funds from Fudan University (including the 211 and 985 programs).


  • Albach DC, Soltis PS, Soltis DE, Olmstead RG. 2001. Phylogenetic analysis of asterids based on sequences of four genes. Annals of the Missouri Botanical Garden 88: 163–212.
  • Álvarez I, Wendel JF. 2003. Ribosomal ITS sequences and plant phylogenetic inference. Molecular Phylogenetics and Evolution 29: 417–434.
  • APG III. 2009. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Botanical Journal of the Linnean Society 161: 105–121.
  • Ayalew MB, Megan JJ, Rebekah FA. 2011. Incomplete homogenization of 18 S ribosomal DNA coding regions in Arabidopsis thaliana. BMC Research Notes 4: 93.
  • Baldwin BG. 1992. Phylogenetic utility of the internal transcribed spacers of nuclear ribosomal DNA in plants: an example from the Compositae. Molecular Phylogenetics and Evolution 1: 3–16.
  • Baldwin BG, Sanderson MJ, Porter JM, Wojciechowski MF, Campbell CS, Donoghue MJ. 1995. The ITS region of nuclear ribosomal DNA: a valuable source of evidence on angiosperm phylogeny. Annals of the Missouri Botanical Garden 82: 247–277.
  • Beilstein MA, Nagalingum NS, Clements MD, Manchester SR, Mathews S. 2010. Dated molecular phylogenies indicate a Miocene origin for Arabidopsis thaliana. Proceedings of the National Academy of Sciences, USA 107: 18724–18728.
  • Bell CD, Soltis DE, Soltis PS. 2010. The age and diversification of the angiosperms re-revisited. American Journal of Botany 97: 1296–1303.
  • Birky CW Jr. 2001. The inheritance of genes in mitochondria and chloroplasts: laws, mechanisms, and models. Annual Review of Genetics 35: 125–148.
  • Blanc G, Wolfe KH. 2004. Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell 16: 1679–1691.
  • Bremer B, Bremer K, Heidari N, Erixon P, Olmstead RG, Anderberg AA, Källersjö M, Barkhordarian E. 2002. Phylogenetics of asterids based on 3 coding and 3 non-coding chloroplast DNA markers and the utility of non-coding DNA at higher taxonomic levels. Molecular Phylogenetics and Evolution 24: 274–301.
  • Buckler ES IV, Ippolito A, Holtsford TP. 1997. The evolution of ribosomal DNA: divergent paralogues and phylogenetic implications. Genetics 145: 821–832.
  • Burleigh JG, Bansal MS, Eulenstein O, Hartmann S, Wehe A, Vision TJ. 2011. Genome-scale phylogenetics: inferring the plant tree of life from 18,896 gene trees. Systematic Biology 60: 117–125.
  • Chase MW, Fay MF, Devey DS, Maurin O, Rønsted N, Davies TJ, Pillon Y, Petersen G, Seberg O, Tamura MN et al. 2006. Multigene analyses of monocot relationships: a summary. Aliso 22: 63–75.
  • Chase MW, Soltis DE, Olmstead RG, Morgan D, Les DH, Mishler BD, Duvall MR, Price RA, Hills HG, Qiu YL et al. 1993. Phylogenetics of seed plants: an analysis of nucleotide sequences from the plastid gene rbcL. Annals of the Missouri Botanical Garden 80: 528–580.
  • Clegg MT, Gaut BS, Learn GH, Morton BR. 1994. Rates and patterns of chloroplast DNA evolution. Proceedings of the National Academy of Sciences, USA 91: 6795–6801.
  • Cruz-Mazo G, Buide M, Samuel R, Narbona E. 2009. Molecular phylogeny of Scorzoneroides (Asteraceae): evolution of heterocarpy and annual habit in unpredictable environments. Molecular Phylogenetics and Evolution 53: 835–847.
  • Davis JI, Stevenson DW, Petersen G, Seberg O, Campbell LM, Freudenstein JV, Goldman DH, Hardy CR, Michelangeli FA, Simmons MP et al. 2004. A phylogeny of the monocots, as inferred from rbcL and atpA sequence variation, and a comparison of methods for calculating jackknife and bootstrap values. Systematic Botany 29: 467–510.
  • Dehal P, Boore JL. 2005. Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biology 3: e314.
  • Drummond A, Rambaut A. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology 7: 214.
  • Duarte J, Wall PK, Edger P, Landherr L, Ma H, Pires JC, Leebens-Mack J. 2010. Identification of shared single copy nuclear genes in Arabidopsis, Populus, Vitis and Oryza and their phylogenetic utility across various taxonomic levels. BMC Evolutionary Biology 10: 61.
  • Ebersberger I, Strauss S, Von Haeseler A. 2009. HaMStR: Profile hidden markov model based search for orthologs in ESTs. BMC Evolutionary Biology 9: 157.
  • Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32: 1792–1797.
  • Endress PK, Doyle JA. 2009. Reconstructing the ancestral angiosperm flower and its initial specializations. American Journal of Botany 96: 22–66.
  • Endress PK, Matthews ML. 2006. First steps towards a floral structural characterization of the major rosid subclades. Plant Systematics and Evolution 260: 223–251.
  • Finet C, Timme RE, Delwiche CF, Marlétaz F. 2010. Multigene phylogeny of the green lineage reveals the origin and diversification of land plants. Current Biology 21: 2217–2222.
  • Forsburg SL. 2004. Eukaryotic MCM proteins: beyond replication initiation. Microbiology and Molecular Biology Reviews 68: 109–131.
  • Fulton TM, Van der Hoeven R, Eannetta NT, Tanksley SD. 2002. Identification, analysis, and utilization of conserved ortholog set markers for comparative genomics in higher plants. Plant Cell 14: 1457–1467.
  • Galloway GL, Malmberg RL, Price RA. 1998. Phylogenetic utility of the nuclear gene arginine decarboxylase: an example from Brassicaceae. Molecular Biology and Evolution 15: 1312–1320.
  • Goremykin VV, Hirsch-Ernst KI, Wölfl S, Hellwig FH. 2003. Analysis of the Amborella trichopoda chloroplast genome sequence suggests that Amborella is not a basal angiosperm. Molecular Biology and Evolution 20: 1499–1505.
  • Goremykin VV, Viola R, Hellwig FH. 2009. Removal of noisy characters from chloroplast genome-scale data suggests revision of phylogenetic placements of Amborella and Ceratophyllum. Journal of Molecular Evolution 68: 197–204.
  • Gouy M, Guindon S, Gascuel O. 2010. SeaView version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Molecular Biology and Evolution 27: 221–224.
  • Graham SW, Zgurski JM, McPherson MA, Cherniawsky DM, Saarela JM, Horne EFC, Smith SY, Wong WA, O’Brien HE, Biron VL et al. 2006. Robust inference of monocot deep phylogeny using an expanded multigene plastid data set. Aliso 22: 3–21.
  • Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Systematic Biology 59: 307–321.
  • Hansen AK, Escobar LK, Gilbert LE, Jansen RK. 2007. Paternal, maternal, and biparental inheritance of the chloroplast genome in Passiflora (Passifloraceae): implications for phylogenetic studies. American Journal of Botany 94: 42–46.
  • Hebert PDN, Cywinska A, Ball SL, DeWaard JR. 2003. Biological identifications through DNA barcodes. Proceedings of the Royal Society of London Series B: Biological Sciences 270: 313–321.
  • Hilu KW, Borsch T, Müller K, Soltis DE, Soltis PS, Savolainen V, Chase MW, Powell MP, Alice LA, Evans R et al. 2003. Angiosperm phylogeny based on matK sequence information. American Journal of Botany 90: 1758–1776.
  • Hughes J, Longhorn SJ, Papadopoulou A, Theodorides K, De Riva A, Mejia-Chang M, Foster PG, Vogler AP. 2006. Dense taxonomic EST sampling and its applications for molecular systematics of the Coleoptera (beetles). Molecular Biology and Evolution 23: 268–278.
  • James TY, Kauff F, Schoch CL, Matheny PB, Hofstetter V, Cox CJ, Celio G, Gueidan C, Fraker E, Miadlikowska J et al. 2006. Reconstructing the early evolution of Fungi using a six-gene phylogeny. Nature 443: 818–822.
  • Jansen RK, Cai Z, Raubeson LA, Daniell H, Depamphilis CW, Leebens-Mack J, Müller KF, Guisinger-Bellian M, Haberle RC, Hansen AK. 2007. Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. Proceedings of the National Academy of Sciences, USA 104: 19369–19374.
  • Jiao YN, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, Tomsho LP, Hu Y, Liang H, Soltis PS et al. 2011. Ancestral polyploidy in seed plants and angiosperms. Nature 473: 97–100.
  • Judd WS, Campbell CS, Kellogg EA, Stevens PF, Donoghue MJ. 1999. Plant systematics: a phylogenetic approach. Sunderland, MA, USA: Sinauer Associates.
  • Keeling PJ, Palmer JD. 2008. Horizontal gene transfer in eukaryotic evolution. Nature Reviews Genetics 9: 605–618.
  • Kellis M, Birren BW, Lander ES. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428: 617–624.
  • Knoop V. 2004. The mitochondrial DNA of land plants: peculiarities in phylogenetic perspective. Current Genetics 46: 123–139.
  • Kocot KM, Cannon JT, Todt C, Citarella MR, Kohn AB, Meyer A, Santos SR, Schander C, Moroz LL, Lieb B et al. 2011. Phylogenomics reveals deep molluscan relationships. Nature 477: 452–456.
  • Koonin EV. 2005. Orthologs, paralogs, and evolutionary genomics. Annual Review of Genetics 39: 309–338.
  • Kress WJ , Wurdack KJ , Zimmer EA , Weigt LA , Janzen DH . 2005 . Use of DNA barcodes to identify flowering plants . Proceedings of the National Academy of Sciences, USA 102 : 83698374 .
  • Lahaye R , Van Der Bank M , Bogarin D , Warner J , Pupulin F , Gigot G , Maurin O , Duthoit S , Barraclough TG , Savolainen V . 2008 . DNA barcoding the floras of biodiversity hotspots . Proceedings of the National Academy of Sciences, USA 105 : 29232928 .
  • Lartillot N , Philippe H . 2004 . A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process . Molecular Biology and Evolution 21 : 10951109 .
  • Lee EK , Cibrian-Jaramillo A , Kolokotronis SO , Katari MS , Stamatakis A , Ott M , Chiu JC , Little DP , Stevenson DW , McCombie WR et al. 2011 . A functional phylogenomic view of the seed plants . PLoS Genetics 7 : e1002411 .
  • Letsch H , Kjer K . 2011 . Potential pitfalls of modelling ribosomal RNA data in phylogenetic tree reconstruction: evidence from case studies in the Metazoa . BMC Evolutionary Biology 11 : 146 .
  • Li L , Stoeckert CJ , Roos DS . 2003 . OrthoMCL: identification of ortholog groups for eukaryotic genomes . Genome Research 13 : 21782189 .
  • Li DZ , Gao LM , Li HT , Wang H , Ge XJ , Liu JQ , Chen ZD , Zhou SL , Chen SL , Yang JB et al. 2011a . Comparative analysis of a large dataset indicates that internal transcribed spacer (ITS) should be incorporated into the core barcode for seed plants . Proceedings of the National Academy of Sciences, USA 108 : 19 64119 646 .
  • Li JH , Liu ZJ , Salazar GA , Bernhardt P , Perner H , Tomohisa Y , Jin XH , Chung SW , Luo YB . 2011b . Molecular phylogeny of Cypripedium (Orchidaceae: Cypripedioideae) inferred from multiple nuclear and chloroplast regions . Molecular Phylogenetics and Evolution 61 : 8320 .
  • Lin ZG , Kong HZ , Nei M , Ma H . 2006 . Origins and evolution of the recA/RAD51 gene family: evidence for ancient gene duplication and endosymbiotic gene transfer . Proceedings of the National Academy of Sciences, USA 103 : 10 32810 333 .
  • Lin ZG , Nei M , Ma H . 2007 . The origins and early evolution of DNA mismatch repair genes-multiple horizontal gene transfers and co-evolution . Nucleic Acids Research 35 : 75917603 .
  • Lu L , Fritsch PW , Cruz BC , Wang H , Li DZ . 2010 . Reticulate evolution, cryptic species, and character convergence in the core East Asian clade of Gaultheria (Ericaceae) . Molecular Phylogenetics and Evolution 57 : 364379 .
  • Mathews S , Donoghue MJ . 1999 . The root of angiosperm phylogeny inferred from duplicate phytochrome genes . Science 286 : 947950 .
  • Moore MJ , Bell CD , Soltis PS , Soltis DE . 2007 . Using plastid genome-scale data to resolve enigmatic relationships among basal angiosperms . Proceedings of the National Academy of Sciences, USA 104 : 19 36319 368 .
  • Moore MJ , Soltis PS , Bell CD , Burleigh JG , Soltis DE . 2010 . Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots . Proceedings of the National Academy of Sciences, USA 107 : 46234628 .
  • Moreau CS , Bell CD , Vila R , Archibald SB , Pierce NE . 2006 . Phylogeny of the ants: diversification in the age of angiosperms . Science 312 : 101104 .
  • Mort ME , Crawford DJ . 2004 . The continuing search: low-copy nuclear sequences for lower-level plant molecular phylogenetic studies . Taxon 53 : 257261 .
  • Nei M , Kumar S . 2000 . Molecular evolution and phylogenetics . New York, NY, USA : Oxford University Press .
  • Ness RW , Graham SW , Barrett SCH . 2011 . Reconciling gene and genome duplication events: using multiple nuclear gene families to infer the phylogeny of the aquatic plant family Pontederiaceae . Molecular Biology and Evolution 28 : 30093018 .
  • Nicholas KB , Nicholas H , Deerfield D . 1997 . GeneDoc: analysis and visualization of genetic variation . Embnew News 4 : 14 .
  • Nylander JAA , Wilgenbusch JC , Warren DL , Swofford DL . 2008 . AWTY (are we there yet?): a system for graphical exploration of MCMC convergence in Bayesian phylogenetics . Bioinformatics 24 : 581583 .
  • Palmer JD . 1990 . Loss of photosynthetic and chlororespiratory genes from the plastid genome of a parasitic flowering plant . Nature 348 : 337339 .
  • Philippe H , Delsuc F , Brinkmann H , Lartillot N . 2005 . Phylogenomics . Annual Review of Ecology, Evolution, and Systematics 36 : 541562 .
  • Posada D , Crandall KA . 1998 . Modeltest: testing the model of DNA substitution . Bioinformatics 14 : 817818 .
  • Qiu YL , Lee J , Bernasconi-Quadroni F , Soltis DE , Soltis PS , Zanis M , Zimmer EA , Chen ZD , Savolainen V , Chase MW . 1999 . The earliest angiosperms: evidence from mitochondrial, plastid and nuclear genomes . Nature 402 : 404407 .
  • Qiu YL , Li L , Hendry TA , Li R , Taylor DW , Issa MJ , Ronen AJ , Vekaria ML , White AM . 2006 . Reconstructing the basal angiosperm phylogeny: evaluating information content of mitochondrial genes . Taxon 55 : 837856 .
  • Qiu YL , Li L , Wang B , Xue JY , Hendry TA , Li RQ , Brown JW , Liu Y , Hudson GT , Chen ZD . 2010 . Angiosperm phylogeny inferred from sequences of four mitochondrial genes . Journal of Systematics and Evolution 48 : 391425 .
  • Regier JC , Shultz JW , Zwick A , Hussey A , Ball B , Wetzer R , Martin JW , Cunningham CW . 2010 . Arthropod relationships revealed by phylogenomic analysis of nuclear protein-coding sequences . Nature 463 : 10791083 .
  • Rokas A , Williams BL , King N , Carroll SB . 2003 . Genome-scale approaches to resolving incongruence in molecular phylogenies . Nature 425 : 798804 .
  • Ronquist F , Huelsenbeck JP . 2003 . MrBayes 3: Bayesian phylogenetic inference under mixed models . Bioinformatics 19 : 15721574 .
  • Sang T . 2002 . Utility of low-copy nuclear gene sequences in plant phylogenetics . Critical Reviews in Biochemistry and Molecular Biology 37 : 121147 .
  • Sang T , Donoghue MJ , Zhang DM . 1997 . Evolution of alcohol dehydrogenase genes in peonies (Paeonia): phylogenetic relationships of putative nonhybrid species . Molecular Biology and Evolution 14 : 9941007 .
  • Savolainen V , Chase MW , Hoot SB , Morton CM , Soltis DE , Bayer C , Fay MF , De Bruijn AY , Sullivan S , Qiu YL . 2000 . Phylogenetics of flowering plants based on combined analysis of plastid atpB and rbcL gene sequences . Systematic Biology 49 : 306362 .
  • Scannell DR , Byrne KP , Gordon JL , Wong S , Wolfe KH . 2006 . Multiple rounds of speciation associated with reciprocal gene loss in polyploid yeasts . Nature 440 : 341345 .
  • Schmutz J , Cannon SB , Schlueter J , Ma J , Mitros T , Nelson W , Hyten DL , Song Q , Thelen JJ , Cheng J et al. 2010 . Genome sequence of the palaeopolyploid soybean . Nature 463 : 178183 .
  • Shimodaira H , Hasegawa M . 2001 . CONSEL: for assessing the confidence of phylogenetic tree selection . Bioinformatics 17 : 12461247 .
  • Shulaev V , Sargent DJ , Crowhurst RN , Mockler TC , Folkerts O , Delcher AL , Jaiswal P , Mockaitis K , Liston A , Mane SP et al. 2010 . The genome of woodland strawberry (Fragaria vesca) . Nature Genetics 43 : 109116 .
  • Small RL , Cronn RC , Wendel JF . 2004 . Use of nuclear genes for phylogeny reconstruction in plants . Australian Systematic Botany 17 : 145170 .
  • Smith SA , Wilson NG , Goetz FE , Feehery C , Andrade SCS , Rouse GW , Giribet G , Dunn CW . 2011 . Resolving the evolutionary relationships of molluscs with phylogenomic tools . Nature 480 : 364367 .
  • Soltis DE , Albert VA , Leebens-Mack J , Bell CD , Paterson AH , Zheng C , Sankoff D , Depamphilis CW , Wall PK , Soltis PS . 2009 . Polyploidy and angiosperm diversification . American Journal of Botany 96 : 336348 .
  • Soltis DE, Smith SA, Cellinese N, Wurdack KJ, Tank DC, Brockington SF, Refulio-Rodriguez NF, Walker JB, Moore MJ, Carlsward BS et al. 2011. Angiosperm phylogeny: 17 genes, 640 taxa. American Journal of Botany 98: 704–730.
  • Soltis DE, Soltis PS. 2003. The role of phylogenetics in comparative genetics. Plant Physiology 132: 1790–1800.
  • Soltis DE, Soltis PS, Endress PK, Chase MW. 2005. Phylogeny and evolution of angiosperms. Sunderland, MA, USA: Sinauer Associates.
  • Soltis DE, Soltis PS, Nickrent DL, Johnson LA, Hahn WJ, Hoot SB, Sweere JA, Kuzoff RK, Kron KA, Chase MW et al. 1997. Angiosperm phylogeny inferred from 18S ribosomal DNA sequences. Annals of the Missouri Botanical Garden 84: 1–49.
  • Soltis PS, Soltis DE, Savolainen V, Crane PR, Barraclough TG. 2002. Rate heterogeneity among lineages of tracheophytes: integration of molecular and fossil data and evidence for molecular living fossils. Proceedings of the National Academy of Sciences, USA 99: 4430–4435.
  • Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22: 2688–2690.
  • Strand A, Leebens-Mack J, Milligan B. 1997. Nuclear DNA-based markers for plant evolutionary biology. Molecular Ecology 6: 113–118.
  • Struck TH, Paul C, Hill N, Hartmann S, Hösel C, Kube M, Lieb B, Meyer A, Tiedemann R, Purschke G et al. 2011. Phylogenomic analyses unravel annelid evolution. Nature 471: 95–98.
  • Surcel A, Zhou XF, Quan L, Ma H. 2008. Long-term maintenance of stable copy number in the eukaryotic SMC family: origin of a vertebrate meiotic SMC1 and fate of recent segmental duplicates. Journal of Systematics and Evolution 46: 405–423.
  • Suyama M, Torrents D, Bork P. 2006. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Research 34: W609–W612.
  • Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Molecular Biology and Evolution 28: 2731–2739.
  • Wang HC, Moore MJ, Soltis PS, Bell CD, Brockington SF, Alexandre R, Davis CC, Latvis M, Manchester SR, Soltis DE. 2009. Rosid radiation and the rapid rise of angiosperm-dominated forests. Proceedings of the National Academy of Sciences, USA 106: 3853–3858.
  • Whittall JB, Medina-Marino A, Zimmer EA, Hodges SA. 2006. Generating single-copy nuclear gene data for a recent adaptive radiation. Molecular Phylogenetics and Evolution 39: 124–134.
  • Wikström N, Savolainen V, Chase MW. 2001. Evolution of the angiosperms: calibrating the family tree. Proceedings of the Royal Society of London Series B: Biological Sciences 268: 2211–2220.
  • Wu F, Mueller LA, Crouzillat D, Pétiard V, Tanksley SD. 2006. Combining bioinformatics and phylogenetics to identify large sets of single-copy orthologous genes (COSII) for comparative, evolutionary and systematic studies: a test case in the euasterid plant clade. Genetics 174: 1407–1420.
  • Yuan YW, Liu C, Marx HE, Olmstead RG. 2009. The pentatricopeptide repeat (PPR) gene family, a tremendous resource for plant phylogenetic studies. New Phytologist 182: 272–283.
  • Zanis MJ, Soltis DE, Soltis PS, Mathews S, Donoghue MJ. 2002. The root of the angiosperms revisited. Proceedings of the National Academy of Sciences, USA 99: 6848–6853.
  • Zhang Q, Antonelli A, Field TS, Kong HZ. 2011. Revisiting taxonomy, morphological evolution, and fossil calibration strategies in Chloranthaceae. Journal of Systematics and Evolution 49: 315–329.
  • Zhang JZ. 2003. Evolution by gene duplication: an update. Trends in Ecology & Evolution 18: 292–298.
  • Zhou XF, Lin ZG, Ma H. 2010. Phylogenetic detection of numerous gene duplications shared by animals, fungi and plants. Genome Biology 11: R38.
  • Zhu QH, Ge S. 2005. Phylogenetic relationships among A-genome species of the genus Oryza revealed by intron sequences of four nuclear genes. New Phytologist 167: 249–265.
  • Zhu XY, Chase M, Qiu YL, Kong HZ, Dilcher D, Li JH, Chen ZD. 2007. Mitochondrial matR sequences help to resolve deep phylogenetic relationships in rosids. BMC Evolutionary Biology 7: 217.
  • Zong J, Yao X, Yin JY, Zhang DB, Ma H. 2009. Evolution of the RNA-dependent RNA polymerase (RdRP) genes: duplications and possible losses before and after the divergence of major eukaryotic groups. Gene 447: 29–39.

Supporting Information


Fig. S1 The workflow applied in this paper.

Fig. S2 Gene copy number of 20 randomly selected low-copy nuclear genes in 15 angiosperms with sequenced genome.

Fig. S3 Single gene tree of 15 species with sequenced genome and information of 20 randomly selected low-copy genes.

Fig. S4 Comparison between the tree from the five concatenated 'good' genes and the tree from the five concatenated 'bad' genes.

Fig. S5 Size of SMC1 proteins and sequence identities between species with sequenced genomes.

Fig. S6 Comparison of the introns of (a) SMC1, (b) SMC2, (c) MCM5, (d) MLH1 and (e) MSH1 between Arabidopsis thaliana and A. lyrata.

Fig. S7 ML tree inferred by PhyML 3.0 using the nucleotide matrix of (a) SMC1, (b) SMC2, (c) MCM5, (d) MLH1 and (e) MSH1.

Fig. S8 Comparison of the five single-gene trees with the best ML tree inferred by RAxML from the five concatenated genes.

Fig. S9 Comparison of five trees reconstructed from one gene and from two to five concatenated genes.

Fig. S10 Cladogram of the best ML tree inferred by RAxML from the concatenated nucleotide sequences of the five genes.

Fig. S11 Phylogram inferred by MrBayes 3.0 from the concatenated nucleotide sequences of the five genes for (a) 42 species, (b) eudicot species, (c) eudicot species excluding Saxifragales and Vitaceae species and (d) eudicot species excluding Caryophyllales.

Fig. S12 Phylogram of the best ML tree inferred by RAxML from the concatenated nucleotide matrix of the five genes, excluding third codon positions.

Methods S1 Supplemental methods.

Table S1 Taxon sampling

Table S2 Degenerate primers used in this study

Table S3 Specific primers used in this study

Table S4 Information on SMC1; the majority of the gene sequence was obtained as four separate regions

Table S5 Information on SMC2; the majority of the gene sequence was obtained as four separate regions

Table S6 Information on MCM5

Table S7 Information on MLH1

Table S8 Information on MSH1; the gene sequence was obtained as two separate regions

Table S9 Evolutionary models inferred by ModelTest

Table S10 AU test results; P values > 0.05 are in bold

Table S11 Divergence times of major groups estimated by BEAST analysis; ranges correspond to the 95% highest posterior density (HPD)

Table S12 Orthogroups identified by genome comparison with lengths of at least 300 amino acids

Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.

NPH_4212_sm_FigS1-S12.ppt (4186K)
NPH_4212_sm_MethodsS1-TableS1-S11.pdf (1908K)
NPH_4212_sm_TableS12.xls (334K)