Phylogenetic relationships among organisms are the foundation of evolutionary biology and of many other disciplines, such as biodiversity and biogeography, functional evolution of development and physiology, comparative genomics, and crop breeding. The characters used in phylogenetic studies have shifted from morphology and development, to physiology and biochemistry, and more recently to protein and DNA sequences. Sequence data provide a nearly unlimited number of characters for comparison and have become increasingly accessible with the rapid development of sequencing technology (Judd et al., 1999; Soltis et al., 2005). During recent decades, organellar and ribosomal RNA genes have been widely used for resolving organismal relationships. However, these markers have some limitations; for example, rDNA is present in high copy number and undergoes concerted evolution (Letsch & Kjer, 2011). In particular, the extent of rDNA sequence homogenization might differ both between gene regions and among different loci. Such variation might increase the uncertainty of organismal phylogenies inferred from these sequences (Buckler et al., 1997).
The importance of using low-copy nuclear genes for phylogenetic analysis has long been recognized (Strand et al., 1997; Fulton et al., 2002; Sang, 2002; Mort & Crawford, 2004; Hughes et al., 2006; Whittall et al., 2006; Wu et al., 2006; Yuan et al., 2009; Duarte et al., 2010). Nuclear protein-coding genes make up the overwhelming majority of the genes in a cell and are important for many diverse functions, providing markers to track organismal evolution through biparental Mendelian inheritance. For example, protein-coding nuclear genes have been used to reconstruct phylogenetic relationships among nearly 200 fungal species, illustrating their effectiveness (James et al., 2006). Phylogenomic studies using many nuclear genes have further demonstrated their power for molecular evolutionary analyses, especially in studies of fungal and animal relationships, partly because many large genomic and EST sequence datasets are available (Rokas et al., 2003; Moreau et al., 2006; Regier et al., 2010; Kocot et al., 2011; Smith et al., 2011; Struck et al., 2011). However, until now, only a few nuclear genes have been used for resolving the backbone of flowering plant phylogeny (Soltis et al., 1997; Mathews & Donoghue, 1999; Finet et al., 2010; Burleigh et al., 2011; Lee et al., 2011).
Flowering plants (angiosperms) are one of the most successful groups of organisms on earth, with c. 300 000 species, and provide humans and animals with foods, fibres, medicines and other materials (Judd et al., 1999). The angiosperm phylogeny establishes the evolutionary history and depicts the phylogenetic relationships among various species and groups, facilitating comparative analysis between model plants and crops (Soltis & Soltis, 2003). The recent angiosperm phylogeny, which revolutionized a view of angiosperm taxonomy that had been based mainly on morphological evidence, provides strong support for the monophyly of major groups such as the eudicots, monocots and magnoliids (Bremer et al., 2002; Jansen et al., 2007; Moore et al., 2007, 2010; APG III, 2009; Soltis et al., 2011). There is also an increasing consensus that Amborella is sister to all other extant flowering plants (Zanis et al., 2002; Moore et al., 2007, 2010; Soltis et al., 2011), although alternative hypotheses remain (Goremykin et al., 2003, 2009). Nevertheless, several enigmatic relationships are still to be resolved, such as those among eudicots, monocots, magnoliids, Chloranthales and Ceratophyllales, those among Dilleniaceae, Saxifragales, Caryophyllales, rosids and asterids, and those among the major clades within rosids. It has been proposed that these relationships are difficult to resolve because of rapid radiation (Hilu et al., 2003; Moore et al., 2007, 2010; Wang et al., 2009).
Until now, reconstructions of angiosperm phylogeny have relied largely on plastid and mitochondrial genes (Chase et al., 1993; Savolainen et al., 2000; Hilu et al., 2003; Zhu et al., 2007; Qiu et al., 2010) and sometimes entire plastid genomes (Jansen et al., 2007; Moore et al., 2007, 2010), while the use of nuclear genes alone has been rare (Soltis et al., 1997; Mathews & Donoghue, 1999; Finet et al., 2010; Lee et al., 2011). Widespread gene duplication events represent a major challenge in selecting effective nuclear genes as phylogenetic markers (Zhang, 2003; Kellis et al., 2004; Dehal & Boore, 2005; Soltis et al., 2009; Zhou et al., 2010); recent genomic studies have shown that all extant angiosperms have experienced polyploidization events during their evolution (Jiao et al., 2011). Gene duplications, in turn, make it difficult to distinguish orthologs from paralogs (Philippe et al., 2005). Even worse, in some cases, single-copy paralogs resulting from gene duplication and subsequent lineage-specific losses could be mistaken for orthologs, contributing to incorrect inference of organismal relationships (Nei & Kumar, 2000; Koonin, 2005; Scannell et al., 2006). Another difficulty in using nuclear genes for phylogenetic analysis is their relatively complex gene structure, which makes them hard to clone and to align with confidence. With the advance of sequencing technologies, more and more plant genomic and transcriptomic datasets are being generated, so selecting suitable low-copy nuclear genes as phylogenetic markers is becoming feasible.
In this work, to facilitate the use of nuclear genes in phylogenetic analyses of angiosperms, we compared the genomes of one moss and seven angiosperm species and identified over 1000 genes as potential phylogenetic markers. To test their usability, sequences of five genes (SMC1, SMC2, MSH1, MLH1 and MCM5) belonging to four gene families were obtained from 91 angiosperm species of 46 orders (73% of all orders). These genes are easy to clone and align and, compared with widely used organellar genes, they are phylogenetically much more informative. The resultant phylogenetic trees, which are generally concordant with those of previous studies, suggest that these nuclear genes are excellent candidates for reconstruction of angiosperm phylogeny at both above- and below-order levels.
Fig. S1 The workflow applied in this paper.
Fig. S2 Gene copy number of 20 randomly selected low-copy nuclear genes in 15 angiosperms with sequenced genomes.
Fig. S3 Single gene trees of 15 species with sequenced genomes and information on 20 randomly selected low-copy genes.
Fig. S4 Comparison between the concatenated five good gene tree and the concatenated five bad gene tree.
Fig. S5 Size of SMC1 proteins and sequence identities between species with sequenced genomes.
Fig. S6 Comparison between Arabidopsis thaliana and A. lyrata of introns of (a) SMC1, (b) SMC2, (c) MCM5, (d) MLH1 and (e) MSH1.
Fig. S7 ML tree inferred by PhyML 3.0 using the nucleotide matrix of (a) SMC1, (b) SMC2, (c) MCM5, (d) MLH1 and (e) MSH1.
Fig. S8 Comparison of five single gene trees with the best ML tree inferred by RAxML using the concatenated 5 genes.
Fig. S9 Comparison of five trees reconstructed by one gene and the concatenated 2–5 genes.
Fig. S10 Cladogram of the best ML tree conducted by RAxML based on the concatenated 5 gene nucleotide sequences.
Fig. S11 Phylogram inferred by MrBayes 3.0 based on the concatenated 5 gene nucleotide sequences of (a) 42 species, (b) eudicot species, (c) eudicot species excluding Saxifragales and Vitaceae species and (d) eudicot species excluding Caryophyllales.
Fig. S12 Phylogram of the best ML tree conducted by RAxML based on the concatenated 5 gene nucleotide matrix excluding the 3rd codon positions.
Methods S1 Supplemental methods.
Table S2 Degenerate primers used in this study
Table S3 Specific primers used in this study
Table S4 Information on SMC1; four regions were separated to obtain the majority of the gene sequence
Table S5 Information on SMC2; four regions were separated to obtain the majority of the gene sequence
Table S6 Information of MCM5
Table S7 Information of MLH1
Table S8 Information on MSH1; two regions were separated to obtain the gene sequences
Table S9 Evolutionary models inferred by ModelTest
Table S10 AU test results; P values > 0.05 are in bold
Table S11 Divergence times of major groups estimated by BEAST analysis; ranges correspond to the 95% highest posterior density (HPD)
Table S12 Orthogroups identified by genome comparison with a length of no less than 300 amino acids
Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.