Chloroplast genomes of photosynthetic eukaryotes




Chloroplast genomes have retained a core set of genes from their cyanobacterial ancestor, most of them required for the light reactions of photosynthesis or functions connected with transcription and translation. Other genes have been transferred to the nucleus or were lost in a lineage-specific manner. The genomes are distinguished by the selection of genes retained, whether or not transcripts are edited, presence/absence of introns and small repeats and their physical organization. Plants and green algae have kept fewer plastid genes than either the red algae or the chromistan algae, which obtained their plastids from red algae by secondary endosymbiosis. Photosynthetic dinoflagellates have the fewest (fewer than 20), but still grow photoautotrophically. All chloroplast genomes map as a circle, but there have been extensive rearrangements of gene order even between related species. Genome sizes vary much more than gene content, depending on the extent of gene duplication and small repeats and the size of intergenic spacers.


The evolution of oxygenic photosynthesis changed the world. The build-up of atmospheric oxygen that started around 2.7 billion years ago was largely due to the ability of cyanobacteria to extract electrons from water (releasing molecular oxygen) with the help of two photosystems linked in series (Holland, 2006; Farquhar et al., 2010). Roughly a billion years later there was a second significant development: an endosymbiotic relationship between a cyanobacterium and a heterotrophic eukaryote whose ancestors had already acquired a mitochondrion via endosymbiosis of a proteobacterium. Over the following billion years, this partnership gave rise to the wide variety of photosynthetic eukaryotes on Earth today. Many lines of evidence support the idea that the genomes of both chloroplasts and mitochondria are the remains of the genomes of their prokaryotic endosymbionts. This article is an overview of the chloroplast genomes of modern eukaryotes. More details about specific aspects of organelle genome evolution can be found in a number of recent reviews (Reyes-Prieto et al., 2007; Gould et al., 2008; Archibald, 2009; Kleine et al., 2009; Keeling, 2010).

I will use the terms ‘chloroplast’ and ‘plastid’ interchangeably. The latter term properly includes non-photosynthetic plastids lacking chlorophyll, such as those of roots and seeds as well as those of symbionts and parasites. In the interests of simplicity, algae will be referred to by their generic name only, unless there is more than one species of the genus under discussion.

The Tree of Life is undergoing a major shake-up

The concepts of endosymbiosis and gene transfer from endosymbiont to host are central to our studies of organelle genomes. But in order to understand the similarities and differences among photosynthetic eukaryotes and their chloroplast genomes, we need to start with a quick look at current ideas about the Tree of Life (Figure 1).

Figure 1.

 The eukaryotic Tree of Life showing the five eukaryotic supergroups. Thin lines with arrowheads indicate the establishment of endosymbiotic relationships between a photosynthetic endosymbiont and a non-photosynthetic eukaryotic host. Figure modified from Neilson and Durnford (2010).

Thanks to whole genome sequencing and the proliferation of expressed sequence tag (EST) projects, the Tree of Life is undergoing major revisions. Being able to calculate a phylogenetic tree using 100 or more shared protein sequences provides a much more robust result, especially when there are large evolutionary distances between taxa (Burki et al., 2007, 2008; Rodriguez-Ezpeleta et al., 2007; Hampl et al., 2009). This area is the new field of ‘phylogenomics’. As a result, branches of the Tree have formed and coalesced and moved around over the last 10 years, so that the current picture in Figure 1 bears little resemblance to the usual textbook diagram of even a few years ago, and may be quite a surprise to some readers.

The current consensus is that all eukaryotes belong to one of just five major lineages (supergroups), although there is still considerable debate about the details (Hackett et al., 2007; Burki et al., 2008; Hampl et al., 2009; Nozaki et al., 2009; Green, 2011). The only group with no photosynthetic members, sometimes referred to as the Unikonta, is made up of the animals plus the fungi (together called the Opisthokonta) plus the Amoebozoa. That group is reduced to its proper position in the order of things in Figure 1 and will be ignored for the rest of this article.

The group of most interest to readers of this review is the Plantae, all of which have a plastid or lost it relatively recently. This includes the green plants, several branches of green algae, the red algae and the glaucophyte algae. They all have primary plastids, i.e. plastids surrounded by only two membranes. Where the plastids differ most noticeably is in their light-harvesting machinery. The green algae and plants have membrane-intrinsic antenna proteins [members of the light-harvesting complex (LHC) superfamily] that bind both chlorophylls a and b and are associated with both photosystems, while the red algae have members of the same family that bind only Chl a and are associated only with Photosystem I (Gantt et al., 2003; Green, 2003, 2007). The glaucophytes do not have this type of antenna, but along with the red algae they have extrinsic antennas called phycobilisomes, which bind linear tetrapyrroles (Neilson and Durnford, 2010). The membrane-intrinsic antenna proteins are all nucleus-encoded and therefore not properly part of this review, but they are introduced here because their pigments were the original basis for classifying photosynthetic eukaryotes and they still provide a unifying framework for many groups (Graham and Wilcox, 2000).

The chromalveolates are a large and complex supergroup that includes all the algae with LHCs binding chlorophylls a and c: the cryptophytes (cryptomonads), the haptophytes, and a very diverse group called heterokonts or stramenopiles (Cavalier-Smith, 2002). The latter include many of the better-known marine algae such as the kelps and the diatoms. These three will be collectively referred to as ‘chromists’ in this review because of their plastid similarities, although there is some controversy about how they are related (reviewed in Archibald, 2009; Green, 2011; Keeling, 2010).

The ‘alveolate’ branch of the chromalveolates includes a fourth group of algae with Chl c, the dinoflagellates. The dinoflagellates are unique in many respects (Hackett et al., 2004; Archibald, 2009), particularly in their minicircular chloroplast genes (Green, 2004; Howe et al., 2008; Zhang et al., 1999). Their closest relatives are the apicomplexan parasites, which have a tiny relict plastid with a small genome, even though none of them are photosynthetic. The apicomplexans include Plasmodium, an obligate parasite that is the causal agent of malaria, and other medically important pathogens such as Toxoplasma, Babesia and Cryptosporidium (Lim and McFadden, 2010). Two photosynthetic protists sharing some characteristics of both dinoflagellates and apicomplexans have recently been reported (Janouškovec et al., 2010; Moore et al., 2008).

However, the most interesting thing about the chromalveolates is the evidence that they acquired their plastids via one or more secondary endosymbioses, i.e. endosymbioses that involved two eukaryotes: a non-photosynthetic host and a red algal endosymbiont (Figure 2). Similarly, the Excavates and the Rhizaria (which are otherwise non-photosynthetic) each include a small clade with plastids acquired by secondary endosymbiosis, but in this case the plastids were acquired from green algae (green circles in Figure 1). As endosymbiosis of various sorts has played such a central role in the evolution of photosynthetic eukaryotes and their organelles, we need to look in more detail at endosymbioses and their functional consequences.

Figure 2.

 The process of secondary endosymbiogenesis. A heterotrophic eukaryotic host engulfs (or is invaded by) a red or green alga with a primary plastid. Eventually, a stable endosymbiotic relationship is established, after many genes from the endosymbiont nucleus (N1) have been transferred to the host nucleus (N2) or have been lost. The endosymbiont’s mitochondria (small grey circles) and most of its cytoplasmic compartment have also been lost. At the intermediate stage, the endosymbiont’s nucleus has been reduced to a nucleomorph (Nm) with three small chromosomes and 400–600 genes. The cryptophytes and chlorarachniophytes have not (yet) progressed beyond this stage. In most groups with secondary plastids, the nucleomorph has completely disappeared with the transfer of more genes to the host nucleus; the periplastid space is very small. The small black circles represent the host mitochondria. ER, endoplasmic reticulum.

Endosymbiosis and organelle genome evolution

The concept of endosymbiosis is central to understanding cellular evolution. Both the mitochondrion and the chloroplast are believed to have originated from eubacterial cells that became intracellular symbionts of a host cell. Over time, genes were transferred from the endosymbiont genome to the host nucleus, and some acquired targeting sequences that enabled their products (now synthesized on cytoplasmic ribosomes) to be retargeted back to the organelle. In addition, some host genes acquired targeting sequences that enabled their products to be imported into the organelle (reviewed in Andersson et al., 2003; Archibald, 2009; Kleine et al., 2009). Complete integration of a proto-organelle would have been lengthy even on an evolutionary time-scale, because it would have required the evolution of complex membrane transport mechanisms for organic and inorganic nutrients, lipids, proteins and other macromolecules. One important consequence of mixing nuclear and organellar genes was the evolution of many new or modified metabolic pathways and control networks (Gould et al., 2008; Lim and McFadden, 2010).

The standard dogma is that the mitochondrion represents the remains of an alpha-proteobacterium that established an endosymbiotic relationship with a primitive anaerobic eukaryotic cell, which must have possessed the rudiments of an endomembrane system and a nuclear envelope (Andersson et al., 2003). Although various scenarios for the origin of the eukaryotic cell have been proposed (Martin and Müller, 1998; Rivera and Lake, 2004), the molecular evidence strongly supports the relationship of many mitochondrial genes to those of proteobacteria (Fitzpatrick et al., 2005). Over a very long period of evolutionary time, mitochondrial genomes lost all but a handful of genes to the nucleus (reviewed by Adams and Palmer, 2003), and a few single-cell organisms lost their mitochondrial genomes altogether, along with the ability to perform oxidative phosphorylation (van der Giezen and Tovar, 2005; Embley and Martin, 2006). However, the genes of mitochondrial origin in the nuclear genomes of the amitochondriates show that their ancestors did have mitochondria (Embley and Martin, 2006). All the photosynthetic eukaryotes have functional mitochondria, even though some of them have transferred more genes to the nucleus than others (Box 1).

Box 1.   Basic principles of organelle genomics (both chloroplasts and mitochondria)
How much genetic information remains in the organelle genome?
 • In general, mitochondrial and chloroplast genomes have a basic set of genes related to the function of the organelle and its reproduction.
 • Differences are mainly due to gene transfer to the nucleus or outright loss, particularly of tRNA and ribosomal protein genes.
 • In contrast, the physical sizes of organelle genomes vary enormously, due to duplications, repeats, introns, non-coding sequence and acquisitions from elsewhere.
How is the genome organized?
 • Most organelle genomes can be assembled into a circular map, except for a few short linear genomes.
 • However, the actual physical organization can involve long linear concatemers, branched forms and very few circular molecules.
 • Rearrangement of gene order has been extensive, even between closely related taxa.
What are the most variable characteristics?
 • Repeats (of many types) distinguish different genomes.
 • Introns (mobile or not) can vary even between ecotypes.
 • Editing of gene transcripts.
Can organelle genes be used to determine evolutionary relationships?
 • Yes, for taxa in the same major clade.
 • Only with caution for distantly related taxa, because rates of evolution have not been constant over evolutionary time.

Most readers will be familiar with the idea that chloroplasts are the result of another endosymbiosis, involving a cyanobacterial endosymbiont and a eukaryotic host cell that already had a mitochondrion. This event is usually referred to as primary endosymbiosis (ignoring the mitochondrion) and the plastids are referred to as primary plastids. Primary plastids bounded by two envelope membranes are what define the Plantae (Figure 1). Many cyanobacterial genes were transferred to the host nucleus and their products targeted back to the chloroplast. Many others were lost, as their products were no longer needed, and some nuclear host genes were recruited for chloroplast duty with the addition of the appropriate targeting sequences (reviewed by Kleine et al., 2009). It is estimated that at least 2000 of the Arabidopsis nuclear genes are of cyanobacterial origin (Martin et al., 2002).

At some point, more than a billion years ago by some molecular clock estimates (Zimmer et al., 2007), the Plantae diverged into three separate lineages: the glaucophytes, the rhodophytes (red algae) and the ‘green lineage’, which includes all the plants and green algae. It is generally believed that the glaucophytes branched off first because their plastids have retained some cyanobacterial characteristics like the peptidoglycan layer between the two envelope membranes (Löffelhardt et al., 1997), and they lack the three-helix LHCs (Green, 2003; Neilson and Durnford, 2010). However, the evolutionary distances are so large that phylogenetic trees are unable to resolve the branching order convincingly (Nozaki et al., 2009; Deschamps and Moreira, 2009; Baurain et al., 2010; Rodriguez-Ezpeleta et al., 2005).

The Plantae are just the beginning of the endosymbiosis story. The photosynthetic members of the Chromalveolata acquired their plastids by engulfing or being invaded by a red alga that became an intracellular symbiont in its turn (Figure 2). The red nucleus and mitochondrion were eventually lost, but only after many red algal genes were transferred to the host nucleus. Only the red plastid remained, now surrounded by four membranes: the two inner ones belonging to the red plastid, the next one from the red algal plasma membrane, and the outermost one an extension of the host endomembrane system (Gibbs, 1981; Gillott and Gibbs, 1980; Cavalier-Smith, 2002; Gould et al., 2008; Archibald, 2009). This event is referred to as secondary endosymbiosis.

One of the strongest pieces of evidence for this scenario comes from the cryptophyte algae (cryptomonads), which retain a relict of the red algal nucleus called the nucleomorph (Nm) (Gibbs, 1981; Ludwig and Gibbs, 1985). All the nucleomorphs examined to date have three tiny chromosomes with 300–450 genes (Archibald, 2007). Similarities among the plastid genomes of the three chromist groups (heterokonts, haptophytes, cryptophytes) suggest that they may be the result of a single secondary endosymbiotic event, but the phylogenetic evidence is not conclusive (Baurain et al., 2010; reviewed in Green, 2011). The dinoflagellate plastids are so unusual that it has been suggested that they may be the product of an independent endosymbiosis, or that they acquired their plastids from an algal group that already had a secondary plastid, i.e. via tertiary endosymbiosis (Sanchez-Puerta et al., 2007).

Green algal members of the Plantae have also become secondary endosymbionts, and this has occurred independently at least twice (Figure 1). Most Excavates are heterotrophic protists (e.g. kinetoplastids) except for the Euglenophytes, which appear to have acquired a plastid secondarily from a prasinophyte alga (Turmel et al., 2008). The Rhizaria are a diverse group of non-photosynthetic protists such as foraminiferans, cercozoans and filose amoebae. Within this clade are the Chlorarachniophytes, a small group of amoebo-flagellates with secondary plastids derived from a different type of green alga, probably a core chlorophyte (Rogers et al., 2006; Turmel et al., 2008). The fascinating thing about the Chlorarachniophytes is that they also have nucleomorphs with three miniature chromosomes (Ludwig and Gibbs, 1989; Archibald, 2007). However, these nucleomorphs have retained a different collection of nuclear genes than the cryptophyte nucleomorphs (Gilson et al., 2006; Lane et al., 2007).

Gene transfer to the nucleus hasn’t stopped

Gene transfer to the nucleus is still ongoing (reviewed in Kleine et al., 2009). Large chunks of chloroplast DNA have been found in plant nuclear genomes, and the rice genome even has an intact mitochondrial genome integrated into one of its chromosomes. There seems to be a constant flood of organellar DNA bombarding the nucleus and integrating into the nuclear genome (Martin, 2003; Kleine et al., 2009). Nuclear transformation wasn’t invented by scientists working on tobacco and Arabidopsis! Most of this DNA eventually degrades, but occasionally some little bit lands in a spot where it can be transcribed, or may form a hybrid gene with a pre-existing nuclear gene. As an additional complication, some mitochondria have taken up chloroplast DNA and integrated tRNA genes into their genomes to replace mitochondrial tRNAs that have been lost. Furthermore, there is evidence of functional genes in the process of being copied to the nucleus even after secondary endosymbiosis (see below).

What chloroplast genomes share

The standard picture of a plastid genome is a circular DNA molecule 100–200 kbp in size with a ‘quadripartite’ structure consisting of two large inverted repeats (IRs) dividing the circle into a large and a small single copy region. The IRs always include the ribosomal RNA genes and usually a number of other genes. In general, the genome has 16S, 23S and 5S rRNA genes and 27–31 tRNA genes, enough for translation of all amino acids, and at least three of the four subunits of a prokaryotic-type RNA polymerase (rpoB, C1, C2). It also has the majority of genes for the polypeptides of Photosystem I, Photosystem II, the cytochrome b6f complex and ATP synthase (Table 1). There is a variable number of ribosomal protein genes, but they often retain some of the same order as in the cyanobacterial ribosomal operons (Stoebe and Kowallik, 1999). The characters generally shared by all chloroplast genomes support a common ancestor for the three branches of the Plantae. They also suggest that most of the cyanobacterial genes were lost or transferred to the nucleus before the lineages diverged. However, with the exception of Rubisco, all the genes for the carbon fixation reactions of photosynthesis have been transferred to the nucleus, or replaced by genes of nuclear or bacterial origin (Martin and Schnarrenberger, 1997).
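The quadripartite layout described above has a simple consequence for genome size: each inverted repeat is present twice on the circle, so any lengthening of the IR counts double. The sketch below is a toy model under stated assumptions; the class, its field names and the region lengths are hypothetical and illustrative, not measurements from any sequenced species.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class QuadripartitePlastome:
    """Toy model of a circular plastid genome laid out as LSC-IRa-SSC-IRb.

    Lengths are in kbp; ir_kbp is the length of ONE inverted-repeat copy.
    """
    lsc_kbp: float  # large single-copy region
    ssc_kbp: float  # small single-copy region
    ir_kbp: float   # one inverted repeat (the circle carries two copies)

    def segments(self) -> List[Tuple[str, float]]:
        # Order of regions around the circle; IRb is the reverse complement of IRa.
        return [("LSC", self.lsc_kbp), ("IRa", self.ir_kbp),
                ("SSC", self.ssc_kbp), ("IRb", self.ir_kbp)]

    def total_length_kbp(self) -> float:
        # The IR appears twice, so it contributes twice to the circle length.
        return sum(length for _, length in self.segments())

    def ir_fraction(self) -> float:
        # Share of the circle occupied by the two IR copies together.
        return 2 * self.ir_kbp / self.total_length_kbp()


def gene_copies(single_copy_genes: int, ir_genes: int) -> int:
    # Genes inside the IR (always including the rRNA genes) are present once per IR copy.
    return single_copy_genes + 2 * ir_genes


# Hypothetical region lengths chosen only for illustration.
typical = QuadripartitePlastome(lsc_kbp=86, ssc_kbp=18, ir_kbp=25)
expanded_ir = QuadripartitePlastome(lsc_kbp=86, ssc_kbp=18, ir_kbp=75)
print(typical.total_length_kbp(), round(typical.ir_fraction(), 3))
print(expanded_ir.total_length_kbp(), round(expanded_ir.ir_fraction(), 3))
```

In this toy example, tripling the IR while leaving both single-copy regions untouched grows the circle from 154 to 254 kbp without adding a single new unique gene, which mirrors the IR-driven inflation described below for Pelargonium.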

Table 1.   Basic plastid gene list
Gene class | Genes | Notes
Ribosomal RNA | rns, rnl, rrn5 | 4.5S rRNA in plants only
Transfer RNA | trnA(ugc), trnC(gca), trnD(guc), trnE(uuc), trnF(gaa), trnG(gcc), trnG(ucc), trnH(gug), trnI(cau), trnI(gau), trnK(uuu), trnL(caa), trnL(uaa), trnM(cau), trnN(guu), trnP(ugg), trnQ(uug), trnR(acg), trnR(ccg), trnR(ucu), trnS(gcu), trnS(uga), trnT(ugu), trnV(uac), trnW(cca), trnY(gua) |
Other RNAs | rnpB (ribonuclease P), ffs RNA (SRP), ssrA (tmRNA) |
Transcription | cbbX, rbcR, rpoA, rpoB, rpoC1, rpoC2, matK | matK in green line; cbbX and rbcR in red line
Ribosomal proteins
 Small subunit | rps2, rps3, rps4, rps5, rps6, rps7, rps8, rps9, rps10, rps11, rps12, rps13, rps14, rps16, rps17, rps18, rps19, rps20 | All plastid-encoded in red line
 Large subunit | rpl1, rpl2, rpl3, rpl4, rpl5, rpl6, rpl11, rpl12, rpl13, rpl14, rpl16, rpl18, rpl19, rpl20, rpl21, rpl22, rpl23, rpl24, rpl27, rpl29, rpl31, rpl32, rpl33, rpl34, rpl35, rpl36 | All plastid-encoded in red line and in some greens
ATP synthase | atpA, atpB, atpD, atpE, atpF, atpG, atpH, atpI | All plastid-encoded in red line
Photosystem I | psaA, psaB, psaC, psaD, psaE, psaF, psaI, psaJ, psaL, psaM | psaD, E, F not in plants
Photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbN, psbT, psbV, psbX, psbY, psbZ, psb28 |
Cytochrome complex | petA, petB, petD, petF, petG, petL (ycf7), petM (ycf31), petN (ycf6) | petF nuclear in plants and many algae
NADH dehydrogenase | ndhA, ndhB, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK | Plants only, except some gymnosperms
Metabolism | accD, acpP, chlB, chlI, chlL, chlN, rbcL*, rbcS*, thiG, thiS, cysA | accD plants only; *rbcL and rbcS have different origins in red and green lines (see text)
Protein quality control | clpC, clpP, dnaB, dnaK, ftsH (ycf25), groEL |
Assembly, membrane insertion | ccs1, ccsA, secA, secG, secY, sufB, sufC, tatC | Red line only
  1. Genes generally found in plastid genomes: those in bold are almost universal except in reduced genomes (dinoflagellates, non-photosynthetic plastids). The conserved hypotheticals (ycfs) are not included, even though some are widely shared, because many were not identified when the genomes were published and deposited in GenBank. Some are taxon-specific (e.g. diatom-specific ycf88, ycf89, ycf90; Oudot-Le Secq et al., 2007). Many of the older plastid genomes in GenBank need to be updated. For the additional genes found only in red algae, see Hagopian et al. (2004).

The big differences that distinguish the plastid genomes of different lineages are the selection of genes retained, the presence/absence of introns and repeats, editing of transcripts, and the physical organization of the genome including intergenic spacers and rearrangements (Box 1).

Genome size and gene content in the green lineage

All members of the green lineage (land plants and green algae) have retained fewer genes in their chloroplast genomes than have the red or glaucophyte algae, or even the chromists (Table 1). Most land plants, including the non-vascular ones, have genome sizes between 120 and 160 kb (Table 2). Much of the difference in genome size is due to the number of genes included in the IRs (and therefore duplicated), while the numbers of identified protein-coding genes and tRNAs are very similar (reviewed in Chumley et al., 2006; Gao et al., 2010; Palmer, 1990; Raubeson and Jansen, 2005; Ravi et al., 2008). At one extreme is the highly inflated genome of the angiosperm Pelargonium, which is due to a three-fold increase in the length of the IRs as well as a large number of repeats and some pseudogenes (Chumley et al., 2006). However, it carries a normal number of protein-coding genes and the usual 29 tRNAs. At the other end of the scale are the early-diverging Gnetales, which have smaller genomes due to loss of protein-coding genes as well as smaller introns and intergenic sequences (Wu et al., 2009). It is unusual for a plant to have fewer than the standard set of 27–31 tRNA genes, but the lycopod Selaginella is unique in having lost 12 of its tRNA genes (Tsuji et al., 2007).

Table 2.   Plastid genomes across the Tree of Life
Taxonomic group | Genome length (kbp)ᵃ | Total no. genesᵇ | Special features, exceptions
Rhodophyta | 150–192 | 232–251 | Compact, no IR, most gene-rich
Glaucophyta (Cyanophora) | 136 | 190 | Compact
Streptophyta |  | 90–100 | More than 100 species sequenced. Gnetales and some conifers are smaller; Pelargonium (218 kbp) inflated by large IRs and repeats
Land plants: mosses, ferns, lycopods, seed plants | 120–166 |  |
UTC clade | 124–521 | 94–99 | Inflated by repeats in Chlorophyceae, e.g. Floydiella (521 kbp), Dunaliella (269 kbp), Chlamydomonas (204 kbp)
Prasinophyceae | 42–201 |  | Polyphyletic
Excavata (Euglena) | 143 | 96 | 149 introns including twintrons
Rhizaria (Bigelowiella) | 69 | 87 | Compact
Stramenopiles | 90–160 | 137–197 | No introns, few repeats
Haptophytes | 105 | 144 | No introns
Cryptophytes | 122–136 | 177–180 | 2 introns in Rhodomonas, compact
Dinoflagellates (peridinin type) | ? | 17? | DNA minicircles (2.3–6 kbp) carrying 1–5 genes. Very small gene set.
Photosynthetic apicomplexans | 86–120 |  | Newly discovered algae connecting dinoflagellates and apicomplexans (Chromera velia, CCMP3155)
Heterotrophic apicomplexans | 35–40 |  | Intracellular parasites with non-photosynthetic plastids
  ᵃ From GenBank or literature.
  ᵇ Conserved genes (found in more than one genome) include ycfs and may include ORFs. Number is approximate because of differences in counting genes duplicated in the IR. Some older annotations need to be updated to take into account newly identified genes and ycfs (see Oudot-Le Secq et al., 2007).

A phenomenon that is restricted to land plants, with one exception, is substitutional editing of plastid transcripts, in which one base is changed to another so that the mRNA sequence differs from the genomic sequence. This change usually has a beneficial result, e.g. the removal of a stop codon, the introduction of a conventional start codon, or the restoration of a conserved amino acid (Tillich et al., 2006; Stern et al., 2010). In most plants, editing consists mainly of C-to-U conversions, with some U-to-C conversions. There is an exceptionally high level of editing in non-seed plants such as ferns, mosses and hornworts (Kugita et al., 2003; Wolf et al., 2004). Whether this evolved to compensate for inefficient DNA repair mechanisms is not yet clear. There is no evidence for editing in the green algae or in any of the red lineage except the dinoflagellates, where it appears to be a case of convergent evolution (Dang et al., 2009; Wang and Morse, 2006; Zauner et al., 2004).
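How a single base change rescues a reading frame can be sketched as plain string operations on an RNA sequence. This is a minimal illustration under assumptions: the transcript is invented, and the function names are hypothetical. A C-to-U edit turns an ACG codon into the conventional AUG start codon, and a U-to-C edit turns a premature UAA stop into CAA (glutamine).

```python
from __future__ import annotations

STOP_CODONS = {"UAA", "UAG", "UGA"}


def edit_transcript(mrna: str, edits: dict[int, tuple[str, str]]) -> str:
    """Apply substitutional edits given as {0-based position: (expected, new)}."""
    bases = list(mrna)
    for pos, (expected, new) in sorted(edits.items()):
        # Refuse to edit if the genomic base is not the one the edit assumes.
        if bases[pos] != expected:
            raise ValueError(f"position {pos}: expected {expected}, found {bases[pos]}")
        bases[pos] = new
    return "".join(bases)


def codons(mrna: str) -> list[str]:
    # Complete triplets read from position 0; incomplete trailing bases are dropped.
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]


def first_stop(mrna: str) -> int | None:
    # Codon index of the first in-frame stop codon, or None if there is none.
    for i, codon in enumerate(codons(mrna)):
        if codon in STOP_CODONS:
            return i
    return None


# Hypothetical transcript as encoded in the genome: ACG GCU UAA GGC UGA
precursor = "ACGGCUUAAGGCUGA"
# One C-to-U edit (ACG -> AUG start) and one U-to-C edit (UAA -> CAA).
mature = edit_transcript(precursor, {1: ("C", "U"), 6: ("U", "C")})
print(codons(precursor), first_stop(precursor))
print(codons(mature), first_stop(mature))
```

Before editing, this toy frame has no AUG start and stops after two codons; after editing, it starts with AUG and reads through to the terminal UGA.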

Before discussing green algal chloroplasts, a bit of explanation about their relationships is needed. Seed plants, bryophytes and some green algae (labelled as Mesostigmatophyceae in Figure 1) are classified as Streptophyta. They have very similar plastid genome sizes and gene content to the land plants (Turmel et al., 2006). The other green algae form a separate lineage, the Chlorophyta, which consists of four classes: three often grouped together as the ‘UTC clade’ (Chlorophyceae–Trebouxiophyceae–Ulvophyceae) and the more primitive Prasinophyceae (Graham and Wilcox, 2000). In general, species of the UTC clade have the same number of genes but their genome sizes are wildly different (Table 2). Some members of the class Chlorophyceae, which includes the well-known model algae Chlamydomonas, Volvox and Dunaliella, have extremely inflated genomes. The record for size currently belongs to Floydiella terrestris at 521 kbp (Brouard et al., 2010), but it may be surpassed if the Volvox carteri genome (>420 kb) can be completely assembled (Smith and Lee, 2009). This expansion is not due to differences in gene content (94–99 genes) but mainly to the presence of large intergenic regions filled with a great variety of repeats (Maul et al., 2002; Smith and Lee, 2009; Brouard et al., 2010; Smith et al., 2010).

The prasinophytes are a heterogeneous class that gives taxonomists a collective headache (summarized in Graham and Wilcox, 2000 and in Turmel et al., 2008). The most interesting ones are the picoplanktonic Mamiellales, particularly Ostreococcus tauri. Ostreococcus tauri is compact in every way, not only in cell size but also in the sizes of organellar and nuclear genomes (Robbens et al., 2007). As all three genomes of this alga have been sequenced, it has been verified that some of the genes missing from the plastid genome are now located in the nuclear genome. However, there is considerable variety in the group as a whole (Table 2) with Nephroselmis at 201 kbp and 128 genes compared with 72 kbp and 88 genes for O. tauri (Turmel et al., 2008).

The two algae with secondary green plastids have little in common, which is hardly surprising as the hosts were not related and neither were the endosymbionts. Bigelowiella natans retained a nucleomorph (Gilson et al., 2006), and acquired some other nuclear genes from a variety of food sources (Archibald et al., 2003). It has a very compact plastid genome, partly due to the loss of some of the genes found in the UTC plastids, which are its closest relatives, and partly to very small intergenic spacers and the absence of introns (Rogers et al., 2006).

In complete contrast, Euglena gracilis has a larger plastid genome than its closest potential chloroplast donor, the prasinophyte Pyramimonas (143 versus 102 kbp) (Hallick et al., 1993; Turmel et al., 2008). Euglena has a tandem array of three and a half rRNA operons rather than an IR, but the main reason it is larger is that it is overloaded with at least 149 introns! Many of them are nested inside other introns, i.e. they are ‘twintrons’ (Hallick et al., 1993).

Genome size and gene content – the red lineage

The red algae and the chromists that got their plastids from red algae have all retained a larger number of chloroplast genes than members of the green lineage (Table 2). However, they lack the ndh genes that encode components of the plastid NADH dehydrogenase, which is involved in cyclic electron flow in many members of the green line. Their genomes also differ in being more compact, with smaller intergenic spacers, few introns, and few repeats. They lack editing of chloroplast transcripts, a process that appears to be prevalent in land plants but unknown elsewhere except in dinoflagellates (see below).

One other big difference is that the genes for the large and small subunits of Rubisco (ribulose-1,5-bisphosphate carboxylase/oxygenase) were acquired from a proteobacterium rather than from the cyanobacterial ancestor of the plastid. This was apparently a very ancient lateral gene transfer, a classic case of gene replacement (Assali et al., 1991; Delwiche and Palmer, 1996). Both subunits of the ‘red’ Rubisco are encoded on chloroplast genomes, whereas throughout the green lineage the small subunit gene has been transferred to the nucleus and is very much under nuclear control.

The red algal chloroplast genomes have retained the largest number of genes of any group of photosynthetic eukaryotes (232–252), but their genomes are relatively compact in organization, with no IRs, no introns and modest intergenic spacers (Hagopian et al., 2004). Porphyra does have two rRNA operons separated by a single-copy region, but they are organized as direct repeats rather than inverted (Reith and Munholland, 1995). The red algal plastids have the genes for almost all the ribosomal proteins (45–47) and 29–37 tRNAs. In addition to the protein-coding genes shared with the chromists (Table 1), the red algae have a number of additional genes involved in biosynthesis of amino acids, fatty acids, menaquinone, thiamine and components of the phycobilisome, as well as sulfate transport and nitrogen uptake. Not all of these genes are shared among all four of the red algal genomes available, indicating that gene loss occurred independently after the lineages diverged (Glöckner et al., 2000; Ohta et al., 2003).

The glaucophytes are a very small taxonomic group with only a few genera (Löffelhardt et al., 1997). The number of genes in the plastid genome of Cyanophora paradoxa is intermediate between those of the red algae and the chromists, and considerably higher than in the green lineage (Stirewalt et al., 1995). Sequencing of the C. paradoxa nuclear genome is in progress and will tell us whether the same or different genes have survived in its nuclear genome compared with the other lineages.

The chromists that acquired red algal plastids by secondary endosymbiosis have lost a number of the red algal genes, but they still have more genes than the primary plastids of the green lineage. The heterokont algae form a very large group and play a major role in the marine environment. Plastid genomes have been sequenced for the diatoms Odontella, Thalassiosira and Phaeodactylum (Kowallik et al., 1995; Oudot-Le Secq et al., 2007), the macrophytic brown algae Fucus and Ectocarpus (Le Corguillé et al., 2009), two picoplanktonic pelagophytes implicated in disastrous ‘brown tides’ (Ong et al., 2010), as well as Heterosigma (Cattolico et al., 2008), Vaucheria (Rumpho et al., 2008) and two diatom endosymbionts of dinoflagellates (Imanian et al., 2010). Two cryptophyte genomes, Guillardia theta (Douglas and Penny, 1999) and Rhodomonas salina (Khan et al., 2007), and one haptophyte genome, Emiliania huxleyi (Sanchez Puerta et al., 2005), have also been sequenced.

Most of these plastid genomes have IRs, no introns and a moderate number of small repeats. As pointed out by Le Corguillé et al. (2009), the gene set can be viewed as a common core of 86 genes shared with the entire red lineage (except dinoflagellates), plus another 34 that are found in most of them (Table 1). The picoplankton species Aureococcus and Aureoumbra not only have very small cells but also have the smallest genomes, and they have lost some genes found in all other heterokonts (Ong et al., 2010). Each species has one or two genes unique to it, e.g. genes for a putative tyrosine recombinase and a two-component His–Asp regulator in Heterosigma (Cattolico et al., 2008) and a bacterial-type DNA polymerase gene (dnaX) in Rhodomonas (Khan et al., 2007).

Thanks to the advent of whole-genome sequencing, we also have draft nuclear genome sequences for some of these algae, and others are in the pipeline. Comparisons of nuclear and chloroplast genomes have made it clear that gene transfer to the nucleus did not stop after secondary endosymbiosis. Several diatom genes have been ‘caught in the act’, with both a nuclear and a chloroplast copy (Oudot-Le Secq et al., 2007). In the case of the psb28 gene, Phaeodactylum has only the chloroplast copy, but Thalassiosira has both copies, each of which is expressed, and the product of the nuclear gene is successfully imported into the chloroplast (Jiroutová et al., 2010). One transfer has an ecological context: T. oceanica, which can grow under severe Fe limitation in the open ocean, has relocated the petF gene (encoding ferredoxin) to the nuclear genome, where it would be under more direct regulatory control (Lommer et al., 2010). The gene is still on the chloroplast genome in the coastal species T. pseudonana. The psaF gene is on the plastid genome in Aureoumbra and most other chromists, but it is in the nuclear genome in the related Aureococcus (Ong et al., 2010). In E. huxleyi, ESTs for a number of ribosomal protein genes missing from the plastid genome show that these genes have not only been transferred to the nucleus but are also expressed there (Oudot-Le Secq et al., 2007).

Genes can also be lost entirely. For example, the genes for the light-independent protochlorophyllide oxidoreductase (chlB, chlL, chlN) have been lost outright, rather than transferred to the nucleus, in a number of species. They are pseudogenes in some cryptophytes, probably representing an intermediate stage on the way to loss (Fong and Archibald, 2008). These losses are probably tolerated because an unrelated, light-dependent protochlorophyllide oxidoreductase is present in all of these organisms.

Dinoflagellates break all the rules

Dinoflagellates are more closely related to the apicomplexan parasites than to the chromists (Figure 1). About half of them have lost their plastids, while some others have acquired new plastids from other algae by tertiary endosymbiosis (Archibald, 2009; Keeling, 2010). But the most astonishing thing about the dinoflagellates with regular ‘peridinin-type’ plastids is that their plastid genomes have been broken up into minicircles, each carrying from 1 to 5 genes (Zhang et al., 1999; Nelson et al., 2007). Many genes have been transferred to the nucleus (e.g. Bachvaroff et al., 2004). Summing up results from the few species that have been studied, no more than 17–20 genes have been detected in the plastid genomes, all of them encoding rRNAs (2), tRNAs (3), or core proteins of the photosynthetic electron transport chain (Green, 2004, 2011; Howe et al., 2008).

Another peculiarity is that dinoflagellates are the only chromalveolate group known to edit chloroplast transcripts. In the green lineage most editing is C-to-U substitution, whereas in dinoflagellates A-to-G is the most frequent, although many types of transitions and transversions occur (Zauner et al., 2004; Wang and Morse, 2006; Dang and Green, 2009). The best-known other examples of A-to-G editing (chemically A-to-I, with inosine read as G) are found in animal nuclear transcripts, particularly those of genes highly expressed in nervous tissue (Bass, 2002). Because these two systems arose independently, this is a clear case of convergent evolution.


No discussion of plastid genomes would be complete without mentioning the non-photosynthetic plastids of apicomplexan parasites such as Plasmodium. They have not been broken up into minicircles but map as single circles (Wilson et al., 1996; Lim and McFadden, 2010). They have lost all the genes for photosynthetic proteins but retained genes for rRNA, RNA polymerase and some very divergent ribosomal protein genes. Current thinking is that the plastid is useful for compartmentalizing biosynthetic functions such as fatty acid and isoprenoid synthesis, even though all the enzymes involved are imported from the cytosol (Lim and McFadden, 2010).

Given these differences, it is hard to understand how the apicomplexans could be the nearest neighbours of the dinoflagellates. However, two photosynthetic relatives of the apicomplexans, among them Chromera velia, which appear to lie near the branching point of the two lineages, have recently been discovered (Moore et al., 2008; Janouškovec et al., 2010). In their photosynthetic function they appear more similar to the dinoflagellates. Their plastid genomes, however, are much more conventional, consisting of a single circular-mapping genome, although they have lost many of the plastid genes found in the chromists (Janouškovec et al., 2010).

Genome organization

Rearrangement of gene order has been extensive in all taxonomic groups and has been discussed at length in the literature. However, certain gene clusters tend to be widely conserved, particularly those that appear to be relics of the eubacterial ribosomal protein operons (Stoebe and Kowallik, 1999). Inversions of gene blocks and expansion or contraction of IRs have been used as taxonomic characters in evolutionary studies of both higher plants and algae (Palmer, 1990; Raubeson and Jansen, 2005; Ravi et al., 2008; Gao et al., 2010). Rearrangement breakpoints appear to be associated with tRNA genes and repeats. Whatever the underlying causes, rearrangements are clearly frequent, even between closely related species. As for introns, there have been many gains and losses in the green lineage, whereas introns are rare in the red lineage.

One aspect that is rarely discussed in the literature, and that has not been investigated seriously in the algae, is the evidence that a substantial fraction of the plastid DNA in plants consists not of circles but of linear molecules several genomes in length, with branched structures that probably represent replication intermediates (Bendich, 2004). There is evidence for both linear and circular dimers in Chlamydomonas (Maul et al., 2002), and optical mapping of the whole Thalassiosira genome suggested tandem copies of the plastid genome (Armbrust et al., 2004). Perhaps because of technical difficulties, this question tends to get ‘swept under the rug’. Because it bears on how organelle genomes replicate, and how they do so without getting tangled, it deserves more attention than it has received.

Looking across the Tree of Life, we can conclude that chloroplast genomes are dynamic in structure, even though they are fairly conservative in the amount of genetic information they contain. Moreover, the small number of genes on the plastid genomes of dinoflagellates and Chromera suggests that photosynthetic function is not impaired by having most of the relevant genes in the nucleus. On the evolutionary time-scale, plastids appear to be very tolerant of where their genes are located and of how their genomes are arranged.


Many thanks to Dion Durnford for Figure 1, and to the Natural Sciences and Engineering Research Council of Canada (NSERC) for long-term support of my research on chloroplasts.