DEEP DIVISION IN THE CHLOROPHYCEAE (CHLOROPHYTA) REVEALED BY CHLOROPLAST PHYLOGENOMIC ANALYSES1

1 Received 11 June 2007. Accepted 15 October 2007.

Abstract

The Chlorophyceae (sensu Mattox and Stewart) is a morphologically diverse class of the Chlorophyta displaying biflagellate and quadriflagellate motile cells with varying configurations of the flagellar apparatus. Phylogenetic analyses of 18S rDNA data and combined 18S and 26S rDNA data from a broad range of chlorophycean taxa uncovered five major monophyletic groups (Chlamydomonadales, Sphaeropleales, Oedogoniales, Chaetophorales, and Chaetopeltidales) but could not resolve their branching order. To gain insight into the interrelationships of these groups, we analyzed multiple genes encoded by the chloroplast genomes of Chlamydomonas reinhardtii P. A. Dang. and Chlamydomonas moewusii Gerloff (Chlamydomonadales), Scenedesmus obliquus (Turpin) Kütz. (Sphaeropleales), Oedogonium cardiacum Wittr. (Oedogoniales), Stigeoclonium helveticum Vischer (Chaetophorales), and Floydiella terrestris (Groover et Hofstetter) Friedl et O’Kelly (Chaetopeltidales). The C. moewusii, Oedogonium, and Floydiella chloroplast DNAs were partly sequenced using a random strategy. Trees were reconstructed from nucleotide and amino acid data sets derived from 44 protein-coding genes of 11 chlorophytes and nine streptophytes as well as from 57 protein-coding genes of the six chlorophycean taxa. All best trees identified two robustly supported major lineages within the Chlorophyceae: a clade uniting the Chlamydomonadales and Sphaeropleales, and a clade uniting the Oedogoniales, Chaetophorales, and Chaetopeltidales (OCC clade). This dichotomy is independently supported by molecular signatures in chloroplast genes, such as insertions/deletions and the distribution of trans-spliced group II introns. Within the OCC clade, the sister relationship observed for the Chaetophorales and Chaetopeltidales is also strengthened by independent data. Character state reconstruction of basal body orientation allowed us to refine hypotheses regarding the evolution of the flagellar apparatus.

Abbreviations: AU, approximately unbiased; BV, bootstrap values; CCW, counterclockwise; cpDNA, chloroplast DNA; CS, Chlamydomonadales + Sphaeropleales; CW, clockwise; DO, directly opposed; GTR, general time reversible; indel, insertion/deletion; ML, maximum likelihood; MP, maximum parsimony; OCC, Oedogoniales + Chaetophorales + Chaetopeltidales; ORF, open reading frame; RELL, resampling of the estimated log likelihood

Of the five classes recognized for the green algae, the Chlorophyceae [sensu Mattox and Stewart (1984)] exhibits the most variability in terms of the arrangement of the flagellar apparatus. This class is thought to have emerged relatively late during the evolutionary history of the Chlorophyta (Lewis and McCourt 2004), a major division comprising four green algal classes and occupying a sister position relative to the Streptophyta (land plants and green algae belonging to the Charophyceae). Recent analyses based on whole mitochondrial and chloroplast genome data favor the hypothesis that the Chlorophyceae is sister to the Ulvophyceae and that the Trebouxiophyceae is sister to the Chlorophyceae + Ulvophyceae clade, with the Prasinophyceae representing the deepest divergence (Pombert et al. 2004, 2005). Members of the Chlorophyceae are morphologically diverse, ranging from coccoid to swimming unicells, colonies, and simple flattened thalli to unbranched and branched filaments (Lewis and McCourt 2004). Motile cells are vegetative cells, zoospores (asexual), or gametes with usually two or four flagella. Within the cell, the flagellar basal bodies are generally directly opposed (DO) or displaced in a clockwise (CW) direction (O’Kelly and Floyd 1984). In both the Ulvophyceae and Trebouxiophyceae, the flagellar apparatus features a counterclockwise (CCW) arrangement of the basal bodies, supporting the notion that the DO and CW arrangements are derived conditions that arose with the emergence of the Chlorophyceae (Mattox and Stewart 1984, O’Kelly and Floyd 1984).

Analyses of nuclear-encoded 18S rDNA data uncovered five major monophyletic groups within the Chlorophyceae: the Chlamydomonadales, Sphaeropleales, Chaetophorales, Chaetopeltidales, and Oedogoniales (Lewis and McCourt 2004). Within each group, vegetative morphology is variable, with the coccoid form being nearly ubiquitous. The Chlamydomonadales include biflagellate unicells with the CW configuration of basal bodies as well as quadriflagellate unicells with various basal body arrangements (Nozaki et al. 2003, Lewis and McCourt 2004, Watanabe et al. 2006). Using three chloroplast protein-coding genes, Nozaki et al. (2003) determined that three distinct quadriflagellate lineages differing in the ultrastructure of their flagellar apparatus are basal to all biflagellate lineages of the Chlamydomonadales. The phylogenies inferred by these authors support the idea that the CW orientation might have evolved from the CCW orientation in ancestral quadriflagellate chlamydomonadalean green algae, giving rise ultimately to the biflagellate cells characterizing most genera of this algal group. In the Sphaeropleales, all zoospores produced by the vegetatively nonmotile unicellular or colonial taxa display two flagella with the DO configuration (Lewis and McCourt 2004). Motile cells in the Chaetophorales and Chaetopeltidales consist of quadriflagellates. The motile members of the Chaetophorales are polymorphic for flagellar orientation (DO + CW) (Manton 1964, Melkonian 1975, Floyd et al. 1980, Bakker and Lokhorst 1984, Watanabe and Floyd 1989, Buchheim et al. 2001), whereas the taxa belonging to the Chaetopeltidales have a perfectly cruciate DO flagellar orientation (DO + DO) (O’Kelly and Floyd 1984, O’Kelly et al. 1994). On the other hand, the motile members of the Oedogoniales display a stephanokont arrangement of flagella (i.e., an anterior ring of flagella) (Pickett-Heaps 1975). 
The strikingly unusual ultrastructure of the flagellar apparatus in the latter green algae precludes homology assessment with flagellar characters occurring in the other groups.

A well-resolved phylogeny for the Chlorophyceae is necessary to understand the suite of evolutionary events that led to the marked changes seen in cellular ultrastructure and morphology. At present, the divergence order of the five main groups of chlorophycean green algae remains ambiguous. Although most internal nodes in trees inferred from the nuclear-encoded 18S and 26S rRNA genes are poorly supported, there is good support for the Chlamydomonadales and Sphaeropleales sharing a sister relationship, and it appears that the Chaetophorales, Oedogoniales, and Chaetopeltidales diverged earlier than the Chlamydomonadales and Sphaeropleales (Buchheim et al. 2001, Shoup and Lewis 2003, Müller et al. 2004, Alberghina et al. 2006). The interrelationships of the Chaetophorales, Oedogoniales, and Chaetopeltidales could not be resolved with any reasonable degree of confidence in these analyses with a broad taxon sampling, suggesting that the phylogenetic signal provided by the 18S and 26S rRNA genes is too limited for resolution of these groups.

To unravel the branching order of the major chlorophycean groups, we undertook the sequencing and comparative sequence analysis of whole chloroplast genomes from representative taxa. In recent reports, we described the chloroplast genomes of Sc. obliquus (Sphaeropleales) (de Cambiaire et al. 2006) and St. helveticum (Chaetophorales) (Bélanger et al. 2006) and compared them to their homologue in C. reinhardtii (Chlamydomonadales) (Maul et al. 2002). More recently, we have partly sequenced the chloroplast DNAs (cpDNAs) of C. moewusii (Chlamydomonadales), O. cardiacum (Oedogoniales), and F. terrestris (Chaetopeltidales). The present study used sequence data from multiple proteins and genes derived from these six chloroplast genomes to gain insight into the phylogenetic relationships between the chlorophycean lineages. In addition to phylogenomic inferences, we performed a detailed structural analysis of the genes examined to validate the branching order observed for chlorophycean lineages. The results that emerged from these analyses offer new insights into the evolution of the Chlorophyceae.

Materials and methods

Strains and culture conditions.  The mating-type plus (mt+) wildtype strain of C. moewusii was obtained from the Culture Collection of Algae at the University of Texas at Austin (UTEX 97) and grown in the minimal medium of Gowans (1960). F. terrestris (UTEX 1709) was grown in modified Volvox medium (McCracken et al. 1980). O. cardiacum originated from the Sammlung von Algenkulturen Göttingen, Germany (SAG 575-1b), and was cultured in medium C (Andersen et al. 2005). All three cultures were subjected to alternating 12:12 light:dark (L:D) periods.

Sequencing of cpDNAs and gene analyses.  A + T-rich organelle DNA was separated from nuclear DNA by CsCl-bisbenzimide isopycnic centrifugation (Turmel et al. 1999) and used to prepare a random plasmid clone library (Turmel et al. 2003). DNA templates were prepared from selected clones with the QIAprep 96 Miniprep kit (Qiagen Inc., Mississauga, Canada) and sequenced as described previously (Turmel et al. 2005). PCR fragments spanning cloned regions were also subjected to sequencing. Sequences were edited and assembled using SEQUENCHER 4.6 (GeneCodes, Ann Arbor, MI, USA). The sequence assemblies available on 3 April 2007 were analyzed in the present study. Genes were identified by BLAST homology searches (Altschul et al. 1990) against the nonredundant database of the National Center for Biotechnology Information (NCBI; Bethesda, MD, USA) server. Boundaries of protein-coding genes and open reading frames (ORFs) were determined as described previously (Bélanger et al. 2006). Intron boundaries were determined by modeling intron secondary structures (Michel et al. 1989, Michel and Westhof 1990) and by comparing intron-containing genes with intronless homologues using FRAMEALIGN of the Genetics Computer Group (Madison, WI, USA) software (version 10.3) package.

Phylogenomic analyses.  We analyzed nucleotide and amino acid data sets derived from 44 chloroplast protein-coding genes of 11 chlorophytes and nine streptophytes as well as nucleotide and amino acid data sets derived from 57 chloroplast protein-coding genes of six chlorophycean green algae. Table S1 (in the supplementary material) reports the accession numbers of the C. moewusii, Floydiella, and Oedogonium gene sequences used for these phylogenetic inferences. The corresponding gene sequences of the 17 following green algae/land plants were retrieved from GenBank files: Anthoceros formosae (NC_004543), Chaetosphaeridium globosum (NC_004115), Chara vulgaris (NC_008097), Chlamydomonas reinhardtii (NC_005353), Chlorella vulgaris (NC_001865), Chlorokybus atmophyticus (NC_008822), Marchantia polymorpha (NC_001319), Mesostigma viride (NC_002186), Nephroselmis olivacea (NC_000927), Oltmannsiellopsis viridis (NC_008099), Ostreococcus tauri (NC_008289), Pseudendoclonium akinetum (NC_008114), Physcomitrella patens (NC_005087), Scenedesmus obliquus (NC_008101), Staurastrum punctulatum (NC_008116), Stigeoclonium helveticum (NC_008372), and Zygnema circumcarinatum (NC_008117).

The two data sets of concatenated protein sequences were prepared as described previously (Turmel et al. 2003) and analyzed using maximum likelihood (ML), maximum parsimony (MP), and LogDet-distance methods. ML trees were computed with PHYML 2.4.5 (Guindon and Gascuel 2003) under the cpREV45 model of amino acid substitutions (Adachi et al. 2000) + proportion of invariable sites (I) + gamma rate heterogeneity (Γ), with parameters estimated in PHYML. Bootstrap values (BV) for the various nodes were assessed using 100 bootstrapping replicates. MP trees were inferred using PROTPARS in PHYLIP 3.65 (Felsenstein 1995), and confidence of branch points was assessed by bootstrap percentages after 100 replications. For the LogDet-distance analyses, distances were calculated with LDDist (Thollesson 2004), and the proportion of invariant sites was estimated using the capture-recapture method of Steel et al. (2000). LogDet-distance trees were computed using PAUP* 4.0b10 (Swofford 2003) with the neighbor-joining search setting, and confidence of branch points was estimated by 100 bootstrap replications.
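The bootstrap step shared by the three methods can be illustrated with a minimal Python sketch. It shows only how alignment columns are resampled with replacement to build pseudoreplicates; the toy alignment, taxon labels, and function name are hypothetical, and the tree inference performed on each replicate in the study (PHYML, PROTPARS, or PAUP*) is not reproduced here.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one pseudoreplicate: columns drawn with replacement."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    # Draw as many column indices as there are sites, with replacement.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(alignment[name][c] for c in cols) for name in names}

# Hypothetical toy alignment (three taxa, five amino acid sites).
toy = {"taxonA": "MKVLA", "taxonB": "MRVLA", "taxonC": "MKILS"}
rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
print(len(replicates), replicates[0])
```

Each pseudoreplicate would then be passed to the tree-building program, and the bootstrap value of a node is the percentage of replicate trees that contain it.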

The two nucleotide data sets, which contain the gene sequences (first and second codon positions only) represented in the amino acid data sets, were prepared as described previously (Turmel et al. 2006) and analyzed using ML, MP, and LogDet-distance methods. ML trees were constructed using PHYML 2.4.5 under the general time reversible (GTR) model +I +Γ, with parameters estimated in PHYML. This model was selected by MODELTEST 3.6 (Posada and Crandall 1998) as the one best fitting our nucleotide data. MP trees were inferred using PAUP* 4.0b10; the full heuristic option was used for tree searches, and optimization was performed by branch-swapping using tree bisection and reconnection. In both ML and MP analyses, confidence of branch points was estimated by 100 bootstrap replications. LogDet-distance trees were computed using PAUP* 4.0b10 with the neighbor-joining search setting. The LogDet-distances were calculated with LDDist, and the proportion of invariant sites was estimated using the capture-recapture method of Steel et al. (2000). Confidence of branch points was estimated by 100 bootstrap replications.
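Restricting a coding sequence to its first and second codon positions can be sketched as follows. This is an illustrative helper only (the function name is hypothetical, and the sequence is assumed to be in frame with a length that is a multiple of three); it is not the procedure of Turmel et al. (2006).

```python
def first_second_positions(cds):
    """Drop every third nucleotide from an in-frame coding sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    # Keep positions 1 and 2 of each codon, skip position 3.
    return "".join(cds[i:i + 2] for i in range(0, len(cds), 3))

print(first_second_positions("ATGGCTAAA"))  # ATGCAA
```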

Approximately unbiased (AU) tests (Shimodaira 2002) were performed with CONSEL 0.1i (Shimodaira and Hasegawa 2001) on the amino acid and nucleotide data sets derived from the 57 chloroplast protein-coding genes to evaluate the three possible topologies regarding the branching order of the Oedogoniales, Chaetophorales, and Chaetopeltidales. Site-wise log-likelihoods for each test tree were computed with TREE-PUZZLE 5.2 (Schmidt et al. 2002) using the –wsl options. Bootstrap support of individual genes for the alternative topologies was determined under the GTR+Γ model by resampling of the estimated log likelihood (RELL) using BASEML (option MGENE=1) in PAML 3.15 (Yang 1997). Specific rates of evolution for the individual genes were also estimated under the GTR+Γ model using BASEML (option MGENE=0).
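The principle behind RELL bootstrap support can be sketched in a few lines of Python: site-wise log-likelihoods are resampled with replacement, summed for each topology, and the topology with the highest total is counted as the winner of that replicate. The log-likelihood values below are invented for illustration, and the function is a simplified stand-in rather than the BASEML implementation.

```python
import random

def rell_support(site_lnl, n_reps=1000, seed=0):
    """Fraction of resampled replicates in which each topology scores best."""
    rng = random.Random(seed)
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    wins = {name: 0 for name in names}
    for _ in range(n_reps):
        # Resample site indices; the same indices are used for every topology.
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {name: sum(site_lnl[name][i] for i in idx) for name in names}
        best = max(names, key=lambda n: totals[n])
        wins[best] += 1
    return {name: wins[name] / n_reps for name in names}

# Hypothetical site-wise log-likelihoods for three competing topologies.
site_lnl = {
    "T1": [-2.1, -3.0, -1.2, -4.5, -2.2, -3.1],
    "T2": [-2.3, -2.9, -1.5, -4.9, -2.2, -3.4],
    "T3": [-2.6, -3.2, -1.4, -4.8, -2.5, -3.3],
}
print(rell_support(site_lnl))
```

Because the likelihoods are not re-optimized for each replicate, this resampling is far cheaper than a full bootstrap, which is why it is practical for gene-by-gene comparisons.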

Results

Chloroplast sequence data sampled in this study.  The chloroplast genomes of three additional chlorophycean taxa were examined in this study, bringing to six the total number of chlorophycean cpDNAs scrutinized to date. The partial genome sequences we generated for C. moewusii and Oedogonium include the full sequences of all 64 protein-coding genes shared by the completely sequenced C. reinhardtii, Scenedesmus and Stigeoclonium cpDNAs (Bélanger et al. 2006). However, the sequence data we gathered for Floydiella revealed partial sampling of 10 of these protein-coding genes and no evidence for the presence of five protein-coding genes (atpA, rps7, rps11, rps18, and rps19). Missing sequence data were excluded from the phylogenetic and structural analyses of protein-coding genes reported in the present study. As indicated in Table S1, we used the full coding sequences of 49 genes and the partial sequences of eight genes to assemble the amino acid and nucleotide data sets derived from the six chlorophycean green algae. This amino acid data set contains a total of 11,375 sites, 1,717 of which are phylogenetically informative, whereas the nucleotide data set (first and second codon positions) contains 24,716 sites, 4,053 of which are phylogenetically informative. In addition, amino acid and nucleotide data sets with a broader taxon sampling were constructed from the full sequences of 38 genes and the partial sequences of six genes from 11 chlorophytes and nine streptophytes. Of the 7,825 amino acid sites that were sampled for these 20 green algal/land plant taxa, 2,913 are phylogenetically informative, whereas 5,745 of the 16,626 sites in the nucleotide data set (first and second codon positions) are phylogenetically informative.
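How informative sites are counted can be sketched as follows, assuming that "phylogenetically informative" follows the usual parsimony criterion (at least two character states, each present in at least two taxa); the text does not spell out the criterion, and the toy alignment and function names are hypothetical.

```python
from collections import Counter

def is_informative(column, missing="-?"):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(ch for ch in column if ch not in missing)
    return sum(1 for c in counts.values() if c >= 2) >= 2

def count_informative(alignment):
    """Count informative columns in a dict of equal-length sequences."""
    return sum(is_informative(col) for col in zip(*alignment.values()))

# Hypothetical four-taxon alignment; only columns 2 and 3 qualify.
toy = {"A": "AAGT", "B": "AAGC", "C": "ACTT", "D": "ACTA"}
print(count_informative(toy))  # 2
```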

Phylogenomic analyses.  ML, MP, and LogDet-distance analyses of the amino acid and nucleotide data sets derived from 44 chloroplast genes of 20 taxa were conducted to decipher the relationships between the five monophyletic groups of the Chlorophyceae and to examine the phylogenetic position of this class within the Chlorophyta (Fig. 1). The best trees inferred with all three methods identified two major lineages within the Chlorophyceae: a clade uniting the Chlamydomonadales and Sphaeropleales (CS clade), and a clade uniting the Oedogoniales, Chaetophorales, and Chaetopeltidales (OCC clade). Each of these clades received 100% bootstrap support in all analyses. The branching order observed in the OCC clade was poorly resolved and differed depending on the method of analysis or the data set. Both ML and MP analyses of the amino acid data set favored the hypothesis that Oedogonium is sister to Stigeoclonium and Floydiella; however, LogDet analysis of the same data set identified Floydiella as sister to Oedogonium and Stigeoclonium (55% BV) (Fig. 1A). In ML analysis of the nucleotide data set, Floydiella also shared a sister relationship with Oedogonium and Stigeoclonium (45% BV), but MP analysis placed Stigeoclonium as sister to Oedogonium and Floydiella (92% BV), and LogDet analysis identified Oedogonium as sister to Stigeoclonium and Floydiella (53% BV) (Fig. 1B). In agreement with previously reported phylogenetic studies of chlorophytes based on chloroplast genomic data (Pombert et al. 2005, Lemieux et al. 2007), the amino acid and nucleotide data recovered conflicting topologies regarding the position of the Chlorophyceae within the Chlorophyta (Fig. 1). The amino acid data placed the Chlorophyceae as sister to the Ulvophyceae and Trebouxiophyceae (Fig. 1A), whereas the nucleotide sequence data resolved the Chlorophyceae as sister to the Ulvophyceae (Fig. 1B). 
With regard to the divergence order of the main streptophyte lineages, the trees reported here are also consistent with recent chloroplast phylogenomic studies (Turmel et al. 2006, Lemieux et al. 2007).

Figure 1.

 Relationships among the five monophyletic groups of the Chlorophyceae as inferred from 44 chloroplast protein-coding genes of 11 chlorophytes and nine streptophytes. (A) Best maximum-likelihood (ML) tree based on amino acid sequences. (B) Best ML tree based on nucleotide sequences (first and second codon positions). The chlorophyte phylogeny was rooted using streptophytes belonging to the Charophyceae and Embryophyta as outgroup. The nodes that received 100% bootstrap support in ML, maximum-parsimony (MP), and LogDet-distance analyses are denoted by asterisks. For the other nodes, the values obtained in ML, MP, and LogDet-distance analyses are listed in this order from left to right. The 44 genes analyzed are indicated in Table S1 (See the Supplementary material).

To gain deeper insights into the relationships between the Chaetopeltidales, Chaetophorales, and Oedogoniales, ML, MP, and LogDet-distance phylogenies were inferred from the amino acid and nucleotide data sets derived from the 57 protein-coding genes of the six chlorophycean green algae. As shown in Figure 2, both the amino acid and nucleotide data strongly support the hypothesis that the Oedogoniales diverged before the Chaetophorales and Chaetopeltidales (T1 topology). In AU tests based on the amino acid data, the alternative topologies showing Stigeoclonium (T2) or Floydiella (T3) as the deepest branch of the OCC clade were rejected at the 5% confidence level (T2, P = 0.003; T3, P = 0.0004). Similarly, T3 was determined as significantly worse than T1 at the 5% confidence level (P = 0.009) in AU tests based on the nucleotide data; however, T2 was not rejected (P = 0.056).

Figure 2.

 Branching order of the Chaetopeltidales, Chaetophorales, and Oedogoniales as inferred from 57 chloroplast protein-coding genes of six chlorophycean green algae. (A) Best maximum-likelihood (ML) tree based on amino acid sequences. (B) Best ML tree based on nucleotide sequences (first and second codon positions). The nodes denoted by asterisks received 100% bootstrap support in ML, maximum-parsimony (MP), and LogDet-distance analyses. For the remaining node, the bootstrap values obtained in ML, MP, and LogDet-distance analyses are listed in this order from left to right. The 57 genes analyzed are listed in Table 1.

The phylogenies reported in both Figures 1B and 2B were inferred from nucleotide data sets including the first and second codon positions only. The inclusion of third codon positions in nucleotide data sets is well known to cause potential problems for phylogenetic inferences because changes at this codon position occur frequently and can result in saturation of substitutions. Furthermore, if taxa vary significantly in base composition, as we observed for the two data sets analyzed in this study (data not shown), heterogeneity at this position can violate the basic assumption that base composition over the sequences being studied is at equilibrium. Although the use of amino acid sequence characters has been advocated to circumvent these problems, systematic bias in amino acid composition or convergence of amino acid characters can also result in misleading phylogenies (Lockhart et al. 1999, Simmons 2000). When sequences differ markedly in nucleotide or amino acid compositions, LogDet distances allow the recovery of the correct tree for cases where substitution processes are otherwise uniform across the underlying tree (Lockhart et al. 1994). In the present investigation, we inferred trees from nucleotide data sets incorporating all three codon positions but found no significant changes in branching order or degree of resolution compared to the corresponding trees inferred from data sets displaying the first two codon positions (data not shown). Furthermore, the LogDet trees inferred from the data sets containing all three codon positions were not significantly different from ML and MP trees (data not shown).

To gain insight into the phylogenetic contributions of individual genes, we assessed the level of support that each of the 57 genes in the nucleotide data set provides for the T1, T2, and T3 topologies. These analyses revealed no significant conflict among the genes. Most genes yielded a phylogenetic signal too weak to discriminate with confidence among the three topologies, implying that concatenation of the gene sequences was essential to obtain a robust phylogeny. As shown in Table 1, each of the 57 genes favors one of the three topologies, but only two genes, rpoB and psaC, reject one or both of the alternative topologies at the 5% confidence level. The rpoB gene supports T1 and rejects T2, whereas psaC favors T2 and rejects both T1 and T3. Although fewer genes support T1 than T2, the genes supporting T1 feature a larger number of phylogenetically informative sites (Table 1). This last observation explains why the concatenated data set recovered T1 as the best tree.

Table 1.   Individual gene support for the T1, T2, and T3 topologies in the maximum-likelihood analysis presented in Figure 2B.

Gene       Sites [a]         Rate of        RELL bootstrap support [b]
           Total   Variable  evolution [c]  T1       T2       T3

Supporting T1 (18 genes; 3,127 variable sites)
atpE       270     77        1.52           0.565    0.053    0.382
atpF       322     122       2.29           0.638    0.182    0.179
cemA       514     143       2.07           0.544    0.318    0.138
chlL       538     73        0.72           0.801    0.132    0.067
clpP       772     253       3.22           0.650    0.255    0.095
petB       428     47        0.57           0.724    0.127    0.149
psaB       1,470   162       0.81           0.667    0.011    0.322
psbE       162     26        0.60           0.853    0.049    0.098
psbK       90      29        1.87           0.689    0.205    0.106
rpl5       358     87        1.43           0.559    0.146    0.295
rpl16      266     74        1.50           0.541    0.108    0.351
rpl20      216     81        2.67           0.539    0.278    0.183
rpl36      72      20        1.05           0.746    0.063    0.190
rpoB [d]   1,814   470       2.97           0.927    0.004*   0.070
rpoC1      1,056   358       3.78           0.759    0.189    0.051
rpoC2      2,112   804       7.32           0.851    0.120    0.029
rps2 [d]   86      52        3.27           0.640    0.044    0.316
rps3       558     249       4.22           0.744    0.024    0.233

Supporting T2 (23 genes; 1,662 variable sites)
atpB       952     142       1.00           0.131    0.858    0.011
atpH       164     24        0.60           0.299    0.625    0.076
atpI       472     96        1.15           0.038    0.555    0.407
ccsA       494     142       2.70           0.156    0.555    0.288
chlN [d]   486     94        1.16           0.341    0.530    0.129
petD       318     56        0.87           0.076    0.612    0.313
psaC       160     26        0.65           0.006*   0.991    0.004*
psaJ       82      22        0.99           0.182    0.554    0.263
psbA [d]   158     29        0.82           0.319    0.492    0.189
psbB       1,010   120       0.79           0.080    0.856    0.065
psbD       702     64        0.54           0.330    0.388    0.282
psbF       86      27        1.54           0.120    0.619    0.262
psbH       156     53        1.85           0.039    0.640    0.321
psbJ       82      31        1.92           0.152    0.630    0.218
psbT       62      12        0.35           0.417    0.446    0.137
psbZ       124     34        1.70           0.059    0.687    0.255
rbcL       948     80        0.52           0.458    0.484    0.058
rpl14      232     52        1.07           0.420    0.570    0.010
rpl23      176     77        2.65           0.153    0.655    0.192
rpoA       412     186       4.05           0.362    0.523    0.115
rps4 [d]   380     128       2.26           0.411    0.583    0.006
tufA       834     133       1.05           0.117    0.846    0.037
ycf12      58      34        3.81           0.109    0.580    0.311

Supporting T3 (16 genes; 1,001 variable sites)
chlB [d]   770     130       0.98           0.424    0.107    0.469
petG       72      13        0.43           0.287    0.190    0.524
petL       62      24        2.13           0.094    0.379    0.527
psaA [d]   1,240   133       0.81           0.032    0.075    0.893
psbC [d]   478     67        0.73           0.042    0.144    0.814
psbI       68      13        0.58           0.073    0.213    0.715
psbL       74      17        0.65           0.060    0.187    0.753
psbM       66      15        0.98           0.073    0.208    0.719
psbN       88      20        0.76           0.367    0.235    0.398
rpl2       550     133       1.56           0.144    0.154    0.702
rps8       242     88        2.25           0.225    0.348    0.426
rps9       238     90        2.25           0.193    0.264    0.543
rps12      256     55        0.98           0.027    0.035    0.938
rps14      198     72        2.30           0.491    0.005    0.504
ycf3       330     38        0.42           0.353    0.091    0.555
ycf4       332     93        1.67           0.237    0.245    0.518

[a] Number of sites corresponding to each gene in the concatenated nucleotide data set.
[b] For each gene, RELL (resampling of the estimated log likelihood) bootstrap probabilities are provided for the three topologies. The asterisks denote the topologies that proved to be significantly worse than the best tree in the Shimodaira and Hasegawa (1999) test. T1, {(Floydiella, Stigeoclonium), Oedogonium}; T2, {(Floydiella, Oedogonium), Stigeoclonium}; T3, {(Stigeoclonium, Oedogonium), Floydiella}.
[c] The gene-specific rate of atpB has been arbitrarily set to 1.00 in this BASEML analysis.
[d] Partial gene sequences.
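The subtotals given in the Table 1 group headings can be checked with a short Python tally. The variable-site counts below are transcribed from the table; the dictionary layout and variable names are merely one convenient arrangement.

```python
# Variable sites per gene, transcribed from Table 1 and grouped by favored topology.
variable_sites = {
    "T1": {"atpE": 77, "atpF": 122, "cemA": 143, "chlL": 73, "clpP": 253,
           "petB": 47, "psaB": 162, "psbE": 26, "psbK": 29, "rpl5": 87,
           "rpl16": 74, "rpl20": 81, "rpl36": 20, "rpoB": 470, "rpoC1": 358,
           "rpoC2": 804, "rps2": 52, "rps3": 249},
    "T2": {"atpB": 142, "atpH": 24, "atpI": 96, "ccsA": 142, "chlN": 94,
           "petD": 56, "psaC": 26, "psaJ": 22, "psbA": 29, "psbB": 120,
           "psbD": 64, "psbF": 27, "psbH": 53, "psbJ": 31, "psbT": 12,
           "psbZ": 34, "rbcL": 80, "rpl14": 52, "rpl23": 77, "rpoA": 186,
           "rps4": 128, "tufA": 133, "ycf12": 34},
    "T3": {"chlB": 130, "petG": 13, "petL": 24, "psaA": 133, "psbC": 67,
           "psbI": 13, "psbL": 17, "psbM": 15, "psbN": 20, "rpl2": 133,
           "rps8": 88, "rps9": 90, "rps12": 55, "rps14": 72, "ycf3": 38,
           "ycf4": 93},
}

# Number of genes and summed variable sites for each topology group.
summary = {t: (len(g), sum(g.values())) for t, g in variable_sites.items()}
for topology, (n_genes, n_variable) in summary.items():
    print(topology, n_genes, n_variable)
# T1 18 3127
# T2 23 1662
# T3 16 1001
```

The tally reproduces the pattern noted in the text: T1 is favored by fewer genes than T2 but by almost twice as many variable sites.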

Structural features of individual genes supporting the phylogenetic inferences.  The sequence alignments of the proteins encoded by the 57 genes examined were searched for unambiguous insertions/deletions (indels) supporting the branching order observed for the chlorophycean lineages in phylogenetic trees. As shown in Table 2, a total of eight indels present in six genes support the basal split of the Chlorophyceae into the CS clade and the OCC clade. The indel of >800 codons in rps4 is conspicuous not only because of its exceptionally large size but also because it is correlated with the presence/absence of a highly conserved region corresponding to the last 40 codons of the gene. This conserved region and the associated insertion are missing from the Scenedesmus, C. reinhardtii, and C. moewusii cpDNAs. In Oedogonium and Stigeoclonium, insertions of 2,562 and 2,682 codons, respectively, immediately precede the 3′ conserved region of rps4. A prominent insertion is also found at this locus in Floydiella, but its sequence was only partly recovered in the sequence assembly examined in this study. No indel unambiguously supports one of the three possible hypotheses for the branching order in the OCC clade.
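The logic of scoring an indel as support for a split can be sketched as a simple presence/absence check: an indel supports a clade when all members of that clade share one state and all remaining taxa share the other. The rps4 pattern below follows the description in the text (insertion present in Oedogonium, Stigeoclonium, and Floydiella; absent from Scenedesmus, C. reinhardtii, and C. moewusii); the second pattern and the function name are hypothetical.

```python
def supports_split(presence, clade):
    """True if the clade is uniform, the rest is uniform, and the two differ."""
    inside = {presence[t] for t in clade}
    outside = {presence[t] for t in presence if t not in clade}
    return len(inside) == 1 and len(outside) == 1 and inside != outside

OCC = {"Oc", "Sh", "Ft"}

# Large rps4 insertion as described in the text.
rps4_insertion = {"Oc": True, "Sh": True, "Ft": True,
                  "So": False, "Cr": False, "Cm": False}
# Hypothetical character with a patchy distribution, for contrast.
patchy = {"Oc": False, "Sh": True, "Ft": False,
          "So": True, "Cr": True, "Cm": False}

print(supports_split(rps4_insertion, OCC))  # True
print(supports_split(patchy, OCC))          # False
```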

Table 2.   Indels in coding regions of chloroplast genes supporting the basal split of chlorophycean lineages.
Gene site [a]   Size (codons)   Oc   Sh   Ft   So   Cr   Cm

Plus and minus signs denote the presence and absence of the insertion, respectively. Oc, Oedogonium cardiacum; Sh, Stigeoclonium helveticum; Ft, Floydiella terrestris; So, Scenedesmus obliquus; Cr, Chlamydomonas reinhardtii; Cm, Chlamydomonas moewusii.
[a] The position of each indel is indicated after the gene name and is given relative to the corresponding gene in Mesostigma cpDNA; the position of the codon immediately preceding the insertion is reported.
[b] The Scenedesmus, C. reinhardtii, and C. moewusii rps4 genes lack the highly conserved 3′ terminal region occurring in virtually all green plants. In Oedogonium and Stigeoclonium, this conserved region is preceded by an insertion of 2,562 and 2,682 amino acids, respectively. The exact size of the corresponding Floydiella insertion remains unknown because this region has not been completely sequenced.

atpH_821+++
psbB_901+++
psbL_115–21+++
rps3_252+++
rps3_591+++
rps4_321+++
rps4_163>800b+++
ycf3_1673–8+++

We also examined the distributions of introns, inteins, and fragmented genes among the 57 gene sequences to identify any possible correlations between the presence/absence of these genomic features and the branching order inferred from sequence data (Fig. 3 and Table 3). For the introns of the group II family, particularly those that are putatively trans-spliced at the RNA level, the observed distributions revealed congruence with our phylogenetic results (Fig. 3). The trans-spliced group II introns inserted at common positions in the petD and psaC genes of Oedogonium, Floydiella and Stigeoclonium represent synapomorphies that distinguish the major clade uniting these taxa from the CS clade. In turn, the synapomorphic characters represented by the trans-spliced introns found at two distinct positions in rbcL of Stigeoclonium and Floydiella support the shared ancestry of the Chaetophorales and Chaetopeltidales and the emergence of these lineages after the divergence of the Oedogoniales. The irregular distribution displayed by cis-spliced introns of the group I and group II families is most probably accounted for by horizontal DNA transfers and events of intron loss.

Figure 3.

 Distributions of introns in the chlorophycean chloroplast gene sequences analyzed in this study. Circles denote the presence of group I introns, and squares denote the presence of group II introns. Divided squares represent trans-spliced group II introns. Open symbols denote the absence of intron ORFs, whereas filled symbols denote their presence. Intron insertion sites are given relative to the corresponding genes in Mesostigma cpDNA; for each site, the position of the nucleotide immediately preceding the intron is reported.

Table 3.   Distribution of fragmented genes in the six chlorophycean chloroplast genomes examined in this study.
Gene   Oc   Sh   Ft   So   Cr   Cm

Plus and minus signs denote the presence and absence of a fragmented gene, respectively. Oc, Oedogonium cardiacum; Sh, Stigeoclonium helveticum; Ft, Floydiella terrestris; So, Scenedesmus obliquus; Cr, Chlamydomonas reinhardtii; Cm, Chlamydomonas moewusii.
[a] It could not be determined whether the Floydiella rps2 is fragmented, because this gene has been only partially sequenced.

rpoB++++++
rpoC1+
rpoC2+
rps2nda++

As observed for the cis-spliced introns, we determined that the distribution of inteins in chlorophycean chloroplast genomes is not consistent with our phylogenomic analyses. Only the C. eugametos clpP and Stigeoclonium rpoB were previously known to contain intein sequences (Huang et al. 1994, Wang and Liu 1997, de Cambiaire et al. 2007); these inteins are designated as Ceu ClpP and She RPB2, respectively, in InBase (Perler 2002). Analyses of the sequences determined in the present investigation revealed that the C. moewusii clpP product contains an intein sequence (Cmo ClpP) at the same site as Ceu ClpP, and that no intein is encoded by the clpP genes of the five other chlorophyceans examined. In addition, we observed that the C. moewusii and Floydiella rpoB genes encode inteins (Cmo RPB2 and Fte RPB2, respectively) that are positionally homologous to the She RPB2 intein. These three chlorophycean rpoB inteins share sequence similarity not only among themselves but also with the Ceu ClpP and Cmo ClpP inteins. Detailed information on the domains carried by chlorophycean chloroplast inteins can be found in InBase (http://tools.neb.com/inbase/).

Table 3 shows the distributions of genes known to be fragmented into two distinct ORFs in one or more of the six chlorophycean green algae examined. In contrast to the genes carrying trans-spliced introns, the distinct pieces containing the coding sequences of fragmented genes are usually contiguous in the genome. Of the four genes that were identified in this category (rpoB, rpoC1, rpoC2, and rps2), two (rpoC1 and rpoC2) are lineage-specific and occur in different lineages of the Chlamydomonadales. The rpoB gene has a fragmented structure in all six chlorophyceans, whereas rps2 is fragmented in the sphaeroplealean Scenedesmus and in the chlamydomonadalean C. reinhardtii, but not in C. moewusii. From these structural data, we conclude that there exists no correlation between the presence/absence of gene fragmentation and the branching order of chlorophycean lineages. For rps2, our observations rather suggest that this gene was fragmented convergently in independent lineages of the CS clade.

Discussion

By providing unequivocal support for the division of the Chlorophyceae into two major lineages (the CS and OCC clades), our study sheds new light on the evolution of this group of green algae. The dichotomy of the Chlorophyceae is strongly supported not only by phylogenetic analyses of conserved coding regions from multiple chloroplast genes (Figs. 1 and 2), but also by independent molecular signatures in chloroplast genes, such as indels (Table 2) and the presence/absence of trans-spliced introns (Fig. 3). Previously reported phylogenies of the Chlorophyceae favored the notion that the Chlamydomonadales and Sphaeropleales are sister lineages but could not resolve the branching order of the Oedogoniales, Chaetophorales, and Chaetopeltidales, which were generally identified as earlier divergences relative to the CS clade (Buchheim et al. 2001, Shoup and Lewis 2003, Müller et al. 2004, Alberghina et al. 2006). The topologies observed for the latter three lineages varied depending on the taxa sampled and the method of phylogenetic inference used. In this context, it is interesting to mention that a phylogenetic study of combined 18S and 26S rDNA sequences by Buchheim et al. (2001) using the substitution rate calibration method disclosed the basal dichotomy reported here for the Chlorophyceae; however, statistical support for the nodes associated with the OCC clade was very weak (<50% BV).

The robust support received by the CS and OCC clades demonstrates that comparative analysis of chloroplast genomes offers a powerful approach for resolving chlorophyte phylogenies at the order and class levels. Our phylogenomic analyses of >40 chloroplast genes from a limited number of taxa were able to identify with confidence the branching order of chlorophycean lineages that could not be resolved in trees inferred from the nuclear-encoded 18S and/or 26S rRNA genes from a broad range of taxa (Buchheim et al. 2001, Wolf et al. 2002, Shoup and Lewis 2003, Müller et al. 2004, Alberghina et al. 2006). Multiple gene sequences derived from the chloroplast genome were recently used to gain valuable insights into problematic nodes (Leebens-Mack et al. 2005, Pombert et al. 2005, Cai et al. 2006, Qiu et al. 2006, Turmel et al. 2006, 2007, Lemieux et al. 2007, Rogers et al. 2007), and, as reported here, phylogenetic conclusions were strengthened by the analysis of structural genomic features (gene content, gene order, gene structure, or intron content) in a number of cases (Pombert et al. 2005, Qiu et al. 2006, Turmel et al. 2006, Lemieux et al. 2007).

On the other hand, genome-scale phylogenetic studies with sparse taxon sampling have been criticized because they can yield well-supported trees that do not reflect true organismal relationships (Soltis et al. 2004). This explains why phylogenetic hypotheses derived from such studies need to be supported by independent data. Of course, the more diversified and numerous these independent data, the greater the confidence in the trees inferred. When independent data consist of structural genomic characters, as is the case in the present study, there is the possibility that some of these characters will be interpreted incorrectly due to poor knowledge about their evolution. Finally, lateral DNA transfer is another potential source of error that may confound phylogenomic studies. Even if a given phylogenetic hypothesis is strongly reinforced by structural data derived from the same genome that was used for phylogenetic reconstruction, the possibility still remains that it does not reflect the true organismal relationships. This exceptional situation can be encountered if, in a specific lineage, large genomic segments or even the whole genome analyzed were once subjected to lateral DNA transfer. The latter scenario has been proposed to explain the inconsistent positions observed for the Charales in streptophyte trees inferred from chloroplast and mitochondrial phylogenomic data (Turmel et al. 2003, 2006, 2007). In the study reported here, we assumed that the chloroplast genome was vertically inherited as a single unit in all lineages of the Chlorophyceae, but the validity of this assumption will need to be tested by inferring robust chlorophycean phylogenies based on nuclear or mitochondrial genes.

Our phylogenomic results do not support the hypothesis that the Oedogoniales represent the earliest offshoot of the Chlorophyceae (Buchheim et al. 2001, Wolf et al. 2002, Shoup and Lewis 2003, Müller et al. 2004, Alberghina et al. 2006) and therefore refute the proposal that the Oedogoniales be considered as a separate class (Booton et al. 1998), which is sister to the Chlorophyceae. Considering the early diversification of the Chlorophyceae into two separate clades, the possibility of raising each clade to the class level would seem a more reasonable proposal. To assess whether there are grounds for such a taxonomic change, future efforts should be deployed to identify ultrastructural and morphological characters distinguishing the CS and OCC clades.

The branching order of the three lineages within the OCC clade could not be resolved without any ambiguity, because the hypothesis that the Chaetophorales are sister to the Oedogoniales and Chaetopeltidales (T2) could not be eliminated in the AU test. Nevertheless, we are confident that the T1 topology reflects the true organismal relationships for the following three reasons. First, the sister relationship of the Chaetophorales and Chaetopeltidales is well supported in trees inferred from the 18S rRNA gene (Wolf et al. 2002) and from the combined 18S and 26S rRNA genes (Buchheim et al. 2001, Müller et al. 2004). Second, this sister relationship is consistent with the similarities in zoospore structure observed between the Chaetophorales and Chaetopeltidales (O’Kelly et al. 1994). More specifically, these similarities include the DO or near-DO orientation of the basal bodies, the absence of proximal fibers, the presence of proximal sheaths subtending the basal bodies, the attachment points and positions of rhizoplasts within the cell, and an array of microtubule-associated components at the proximal end of the “d” rootlets. Third and most importantly, trans-spliced group II introns are present at two identical positions in the rbcL genes of Floydiella and Stigeoclonium but are absent from the corresponding Oedogonium gene, arguing for a close alliance between the Chaetophorales and Chaetopeltidales. The fragmentation of a cis-spliced intron into a bipartite trans-spliced intron is an infrequent event, and when it occurs, the trans-spliced intron is predicted to be highly stable. Indeed, deletion of a trans-spliced intron through the mechanism proposed for cis-spliced introns (i.e., through homologous recombination of a reverse-transcribed mature mRNA; Bonen and Vogel 2001) would appear to be very unlikely, as the two intron pieces and their associated exons map to distant genomic loci. 
Given the rarity of the events leading to the creation and disappearance of a trans-spliced intron, the finding that the Floydiella and Stigeoclonium rbcL genes share not only one but two trans-spliced introns at the same sites is compelling evidence for the sister relationship of the Chaetophorales and Chaetopeltidales. This sister relationship and the early divergence of the Oedogoniales will need to be confirmed in future studies by analyzing chloroplast genes from additional taxa of the OCC group and by comparing the fine structures and organizations of the Oedogonium, Floydiella, and Stigeoclonium chloroplast genomes. In addition to supplementary gene sequences for phylogenetic analyses, the latter approach would provide other structural genomic features to test phylogenetic analyses. Among the structural features to be compared, gene order is expected to represent a powerful marker to track phylogenetic relationships, provided that gene synteny has been conserved to a sufficient extent in chloroplast genomes of the OCC clade, as is the case for the CS clade (de Cambiaire et al. 2006).

The relationships we inferred for the Chlorophyceae allow us to refine current hypotheses regarding the evolution of the flagellar apparatus. Character state reconstruction of basal body orientation and flagella number using topology T1 as a phylogenetic framework clearly predicts that the last common ancestor of all chlorophycean green algae featured quadriflagellate motile cells with the DO + DO orientation (Fig. 4). Changes from the DO to CW condition occurred convergently in the CS and OCC clades and marked the evolution of the Chlamydomonadales and the Chaetophorales. Buchheim et al. (2001) could not unequivocally infer the evolutionary patterns of flagellar orientation following their mapping of character states on a consensus chlorophycean 18S and 26S topology with polytomies at the unresolved OCC and CS nodes; however, they concluded that the CW condition probably arose independently in the Chlamydomonadales and the Chaetophorales, and that the DO condition is likely plesiomorphic within the Chlorophyceae. The evolutionary history reconstructed from our phylogenetic data also suggests that the stephanokont flagellar apparatus displayed by the Oedogoniales arose from the DO condition, and more importantly, it supports the prediction of O’Kelly and Floyd (1984) that character states changed in the order CCW→DO→CW.

Figure 4. Evolution of the absolute orientation of the flagellar apparatus in the Chlorophyceae. The ancestral states of this character were reconstructed using topology T1 and MacClade 4.08 (Maddison and Maddison 2000). The most parsimonious reconstruction of character states is shown. The boxes at the terminal tips of the tree denote the orientation patterns observed for biflagellate and quadriflagellate cells within the CS and OCC clades. In the case of the Oedogoniales stephanokonts, the orientation pattern is ambiguous. CCW, counterclockwise; CS, Chlamydomonadales + Sphaeropleales; CW, clockwise; DO, directly opposed; OCC, Oedogoniales + Chaetophorales + Chaetopeltidales.
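The logic of a parsimony reconstruction on topology T1 can be illustrated with a minimal sketch of the Sankoff small-parsimony algorithm. This is not a reproduction of the MacClade analysis. Several elements are assumptions made only for illustration: the coding of one orientation state per order (the analysis summarized in Figure 4 scores patterns for biflagellate and quadriflagellate cells), the hypothetical `outgroup` tip coded as CCW, and both cost schemes (unit costs, and costs ordered along the CCW, DO, CW series).

```python
from math import inf

STATES = ("CCW", "DO", "CW")

# Topology T1, rooted at the chlorophycean ancestor (a trifurcation).
# "outgroup" is a hypothetical placeholder tip, not a taxon from the study.
TREE = ("outgroup",
        ("Chlamydomonadales", "Sphaeropleales"),
        ("Oedogoniales", ("Chaetophorales", "Chaetopeltidales")))

# Assumed, simplified tip codings (one state set per order).
TIPS = {
    "outgroup": {"CCW"},
    "Chlamydomonadales": {"CW"},
    "Sphaeropleales": {"DO"},
    "Oedogoniales": set(STATES),   # ambiguous: any state allowed
    "Chaetophorales": {"CW"},
    "Chaetopeltidales": {"DO"},
}

def unordered(a, b):
    """Unit cost for any change between two different states."""
    return 0 if a == b else 1

def ordered(a, b):
    """Cost equals the number of steps along the series CCW, DO, CW."""
    return abs(STATES.index(a) - STATES.index(b))

def sankoff(node, cost):
    """Return {state: minimum number of steps in the subtree below node}."""
    if isinstance(node, str):  # terminal taxon
        return {s: (0 if s in TIPS[node] else inf) for s in STATES}
    tables = [sankoff(child, cost) for child in node]
    return {s: sum(min(t[x] + cost(s, x) for x in STATES) for t in tables)
            for s in STATES}

def best_states(table):
    """Return the minimum tree length and the states that achieve it."""
    shortest = min(table.values())
    return shortest, [s for s in STATES if table[s] == shortest]

ordered_result = best_states(sankoff(TREE, ordered))
unordered_result = best_states(sankoff(TREE, unordered))
print("ordered:", ordered_result)
print("unordered:", unordered_result)
```

Under these assumed codings, the ordered scheme returns DO as the single most parsimonious state for the chlorophycean ancestor, whereas unit costs leave DO and CW tied. The difference shows how the outcome of such a reconstruction can depend on the way the character and its transformation costs are defined.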

Acknowledgments

We thank Harold Anglehart for his assistance in determining the Oedogonium chloroplast genome sequence. This work was supported by the Natural Sciences and Engineering Research Council of Canada (to C. L. and M. T.).
