Next-generation sequencing data suggest that certain nonphotosynthetic green plants have lost their plastid genomes



Genomes are the agents of life; they are present, in one form or another, in all living things, and the latter cannot exist without the former. Sometimes, however, genomes exist long after relinquishing control of the ‘life’ that they once yielded. Take, for instance, the mitochondrion and plastid of eukaryotic cells. Both descend from once free-living bacteria, which, over millions of years, slowly surrendered their autonomy to the host cell that engulfed them. Nonetheless, mitochondria and plastids still contain a genome and gene expression system, even though almost all of the proteins required for these two organelles to function are nuclear encoded – a consequence of massive, recurring waves of organelle-to-nucleus gene migration (Kleine et al., 2009). Why, then, have not mitochondria and plastids outsourced all of their genes to the nucleus? Why do they retain a genome and a gene expression infrastructure? Satisfactory answers to these questions, particularly for anaerobic or nonphotosynthetic species, are for the most part lacking, but accumulating data from diverse lineages suggest that some organelles have, in fact, jettisoned their genomes.

The most convincing evidence for organelle genome loss comes from the mitochondrial-derived organelles of various microbes living in anoxic environments (Hjort et al., 2010). In disparate groups across the eukaryotic tree, mitochondria have been subverted into anaerobic organelles – called hydrogenosomes or mitosomes – which no longer perform oxidative phosphorylation, but continue to carry out crucial cellular processes, such as hydrogen production (de Graaf & Hackstein, 2012). Although the vestiges of a mitochondrial chromosome (mtDNA) exist in some hydrogenosomes (Pérez-Brocal & Clark, 2008; de Graaf et al., 2011), many, and perhaps most, anaerobic mitochondrial-derived organelles have disposed of their DNA entirely and rely solely on nuclear-encoded proteins to function (Hjort et al., 2010). The search for plastids without DNA, however, has been less fruitful.

The preservation of plastid DNA

Plastids are found in almost every ecosystem on the planet. Since their archaeplastidal origin through the primary endosymbiosis of a cyanobacterium, plastids have subsequently spread, through eukaryotic–eukaryotic endosymbioses, to remote eukaryotic groups (Archibald, 2009). Consequently, a significant proportion of the known eukaryotic diversity contains a plastid, and with a few potential exceptions (see later), wherever a plastid exists, a genome persists (Keeling, 2010). Even plastids that have lost photosynthetic capabilities, such as those of malaria parasites and heterotrophic plants and algae, are consistently shown to have a genome, albeit one that is highly reduced (c. 30–100 kb) with a much smaller gene content than their counterparts in closely related photosynthetic taxa (Wilson et al., 1996; de Koning & Keeling, 2006; Wicke et al., 2013). Given the ubiquity of plastid DNA (ptDNA) across plastid-bearing taxa, it has been argued that plastids are irreversibly tied to their genomes (Barbrook et al., 2006a; Nair & Striepen, 2011). Nevertheless, certain eukaryotic lineages are believed to have lost their plastids outright (Keeling, 2010), implying that on the way to a plastid-less state there was a transitional stage in which there existed plastid-containing species without ptDNA. Some believe that it is just a matter of time until someone stumbles upon such species (Nickrent et al., 1997; Palmer, 1997).

Well, that time may have come. Two recent studies, published within one month of each other, provided evidence for plastid genome loss in distinct nonphotosynthetic green plants. One is of the parasitic flowering plant Rafflesia (Molina et al., 2014) and the other is of the colorless green alga Polytomella (Smith & Lee, 2014). Both investigations used next-generation sequencing (NGS) results to argue for the absence of ptDNA.

Next-generation organelle-genome sequencing

High-throughput sequencing methods have transformed the field of organelle genomics, making it fast, easy, and efficient to sequence mtDNA and ptDNA (Smith, 2012). A single run of whole genomic DNA from a plant or alga on an NGS platform typically yields enough organelle-derived sequences to assemble complete mitochondrial and plastid genomes with > 100-fold coverage (Nock et al., 2011; McPherson et al., 2013). In many cases, 5–25% of the reads generated from high-throughput sequencing of total eukaryotic DNA (or RNA) come from organelles (Smith, 2012), with plastid-derived reads often outnumbering mitochondrial ones (Molina et al., 2014). This is also true for nonphotosynthetic plants and algae, many of which have had their ptDNAs sequenced using next-generation techniques (Arisue et al., 2012; Wicke et al., 2013; Imura et al., 2014). Time and again, NGS of nonphotosynthetic, plastid-bearing species has returned prodigious amounts of ptDNA data, so when researchers carried out intensive Illumina sequencing of Rafflesia and Polytomella, one would have expected them to uncover an abundance of plastid-derived reads. But they found the opposite.

Rafflesia: big flower, but no ptDNA?

Rafflesia is a southeast Asian genus of angiosperms, situated within the rosids, and sometimes called ‘corpse flower’. Credited with having the largest single flower of any plant, the genus comprises putrid-smelling, nonphotosynthetic parasites, which lack stems, roots, and leaves, and rely solely on their host, the vine Tetrastigma, for survival (Nais, 2001). Rafflesia is just one of many land plant genera that harbor nonphotosynthetic, parasitic species; others include Cuscuta, Epifagus, and Orobanche, to name but a few – see Westwood et al. (2010) for a review on the topic.

Over the past 25 yr, biologists have identified and sequenced ptDNA from a wide range of holoparasitic plants (Krause, 2011), such as Epifagus virginiana and Orobanche gracilis (Wolfe et al., 1992; Wicke et al., 2013), strengthening the idea that nonphotosynthetic plastids require a genome. However, PCR and Southern blot experiments failed to identify ptDNA in members of the Rafflesia genus, hinting that this nonphotosynthetic lineage may lack a plastid genome (Nickrent et al., 1997; Davis et al., 2007). Rafflesia mtDNA, on the other hand, has been highly amenable to sequencing and is well characterized (Xi et al., 2013).

Scientists are now stepping up the search for Rafflesia ptDNA. Molina et al. (2014) isolated whole genomic DNA from a Rafflesia lagascae floral bud, collected in Cagayan province, Philippines, and subjected it to Illumina sequencing. The resulting 440 million paired-end reads were teeming with mtDNA but contained very few ptDNA-like sequences, none of which appeared to come from the R. lagascae plastid. Of the c. 1.5 million R. lagascae Illumina contigs, only c. 45 (0.003%; 11.5 kb) showed similarity to genic and intergenic sequences typically found in land plant plastid genomes. But not one of these plastid contigs contained a complete gene or an intact open reading frame, and they all had low read coverage (c. 1.5×), in contrast to the >300-fold coverage observed for the mtDNA-derived contigs. Moreover, none of the plastid sequences were found to be phylogenetically associated with close relatives of Rafflesia, such as Ricinus or Hevea, and many were affiliated with species closely related to Tetrastigma – the plant that R. lagascae parasitizes (Fig. 1). Based on these findings, the authors argue that plastid sequences recovered from R. lagascae Illumina sequencing are nuclear-located (and in a few cases mitochondrial-located) ptDNA-like sequences, which have been horizontally transferred to R. lagascae from the plastid genome of Tetrastigma. Host-to-parasite horizontal gene transfer is well documented in angiosperms (Davis & Wurdack, 2004), and more than a quarter of the mtDNA-encoded genes in Rafflesia species, including R. lagascae, appear to originate from Tetrastigma (Xi et al., 2013; Molina et al., 2014). If R. lagascae does have a plastid genome, it is likely in a cryptic form, has a highly divergent sequence, and/or is found at very low levels. Or perhaps, as the authors suggested, it has vanished altogether. If so, it may not be alone. Similar experiments have indicated that another lineage from the Viridiplantae may have also discarded its ptDNA.
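The scale of the contrast in the R. lagascae figures quoted above can be checked with a little arithmetic. The short Python sketch below only re-derives the proportions from the approximate values reported in the text (c. 45 of c. 1.5 million contigs; c. 1.5× vs >300-fold coverage). The variable and function names are illustrative, and the mtDNA coverage is treated as a lower bound.

```python
# Approximate figures quoted in the text from Molina et al. (2014).
total_contigs = 1_500_000     # c. 1.5 million R. lagascae Illumina contigs
plastid_like_contigs = 45     # c. 45 contigs with similarity to plastid sequences
plastid_like_coverage = 1.5   # c. 1.5x read coverage of plastid-like contigs
mtdna_coverage = 300.0        # >300-fold, so treated here as a lower bound


def percent_of_total(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


def fold_difference(high, low):
    """Ratio of the higher coverage to the lower coverage."""
    return high / low


plastid_percent = percent_of_total(plastid_like_contigs, total_contigs)
coverage_ratio = fold_difference(mtdna_coverage, plastid_like_coverage)

print(f"Plastid-like contigs: {plastid_percent:.3f}% of all contigs")
print(f"mtDNA coverage is at least {coverage_ratio:.0f} times higher")
```

On these numbers, the mtDNA-derived contigs are covered at least 200 times more deeply than the plastid-like contigs. A gap of that size is hard to reconcile with a typical, high-copy plastid genome.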

Figure 1.

Tree of chlorophycean and trebouxiophycean green algae and angiosperms showing examples of species that have lost photosynthetic capabilities. Photosynthetic species, green text; nonphotosynthetic, red text. Branching order based on published phylogenetic analyses (Nedelcu, 2001; Smith et al., 2013; Xi et al., 2013, and references cited therein).

Potential plastid genome loss in Polytomella green algae

First described over a century ago, Polytomella is a monophyletic green algal genus of free-living, freshwater unicells, closely related to the model photosynthetic species Chlamydomonas reinhardtii and Volvox carteri (Pringsheim, 1955; Smith et al., 2013) (Fig. 1). Although nonphotosynthetic, Polytomella algae have a plastid (Moore et al., 1970), but early attempts to identify a gene expression system within it were unsuccessful (Nedelcu et al., 1996; Nedelcu, 2001), even though similar techniques identified one in the plastids of other colorless green algae, such as Prototheca wickerhamii and Polytoma uvella, both of which lost photosynthesis independently of Polytomella (Fig. 1).

Recently, Smith & Lee (2014) used high-throughput sequencing to search for ptDNA and plastid gene expression in Polytomella. Illumina sequencing and assemblies of total DNA isolated from each of the four known Polytomella lineages (P. parva, P. piriformis, P. capuana, and P. magna) (Fig. 1) gave > 225 million paired-end reads and 200 Mb of contig sequences. These data were scanned using BLAST- and mapping-based methods for putative Polytomella ptDNA sequences, but none were found. The same methods, however, easily detected Polytomella mtDNA-derived reads and contigs, despite the fact that Polytomella mitochondrial genomes have highly reduced gene contents, elevated rates of nucleotide substitution, fragmented architectures, and/or extreme nucleotide compositions (Smith et al., 2010a).

Illumina RNA sequencing (RNA-seq) and transcriptomic analysis of P. parva also provided no signs of a plastid genome or associated gene expression system (Smith & Lee, 2014). Assembly of c. 50 million RNA-seq reads and annotation of c. 31 000 contigs uncovered thousands of P. parva nuclear transcripts, hundreds of which code for putative plastid-targeted proteins. Close inspection of these presumed plastid proteins indicated that the P. parva plastid performs a diversity of functions, similar to those observed in the plastids of other nonphotosynthetic algae (Borza et al., 2005). Conspicuously absent, however, were any potential plastid-targeted proteins involved in the expression, replication, or repair of ptDNA, such as plastid-like ribosomal proteins. These data, along with the inability to detect ptDNA-derived sequencing reads, ultimately led Smith & Lee (2014) to conclude that the Polytomella plastid genome is nonexistent.

Proof of ptDNA absence or absence of ptDNA proof?

It is possible that a cryptic plastid genome is hiding within R. lagascae and Polytomella algae and that it somehow escaped detection by high-throughput sequencing. Illumina sequencing has its drawbacks: it has been shown to give uneven and poor read coverage across genomic regions with extremely biased base compositions (Oyola et al., 2012), which could impede the identification of a possibly small plastid genome. Moreover, plastid genomes can sometimes have peculiar architectures, such as fragmented chromosomes (Barbrook et al., 2006b), large numbers of introns and repetitive DNA (Smith et al., 2010b), and/or high levels of post-transcriptional editing (Tillich et al., 2006) – features that could hinder the identification of ptDNA via NGS methods.

But even when considering the potential issues associated with NGS and plastid genome architecture, nuclear transcriptome sequencing should still provide evidence of ptDNA expression, replication, and repair. In the case of P. parva, transcriptomic analyses revealed no nuclear-encoded, plastid-targeted proteins with ptDNA-related functions, which is consistent with plastid genome loss. For R. lagascae, unfortunately, there are currently no published data on nuclear transcripts for plastid-targeted proteins. But within the National Center for Biotechnology Information Sequence Read Archive there are 4.4 Gb of paired-end Illumina RNA-seq data for R. cantleyi (accession number SRX157681), which is a close relative of R. lagascae. Searching these RNA-seq reads for nuclear transcripts encoding plastid proteins should be straightforward, and is a critical step in the pursuit of a Rafflesia plastid genome. If Rafflesia has ptDNA there should be dozens of nuclear-encoded proteins with ptDNA-related functions.
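As a rough illustration of what such a search might involve, the Python sketch below screens BLAST results for hits to proteins with ptDNA-related functions. It is not the procedure used in either study. It assumes that the RNA-seq reads have already been assembled and compared against a protein database, with results saved in the standard 12-column BLAST tabular format. The keyword list, the E-value cutoff, the function name, and the example records are all hypothetical placeholders.

```python
import csv
import io

# Hypothetical keywords for ptDNA-related functions (an assumption for illustration).
PTDNA_KEYWORDS = ("ribosomal protein", "rna polymerase", "dna polymerase", "gyrase")


def screen_hits(tabular_text, descriptions, max_evalue=1e-10):
    """Return (query, subject, evalue) for significant hits to ptDNA-related proteins.

    tabular_text: BLAST tabular output (12 standard columns, tab-separated).
    descriptions: dict mapping subject IDs to free-text protein descriptions.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue = float(row[10])  # column 11 of the standard format is the E-value
        if evalue > max_evalue:
            continue  # discard weak matches
        description = descriptions.get(subject, "").lower()
        if any(keyword in description for keyword in PTDNA_KEYWORDS):
            hits.append((query, subject, evalue))
    return hits


# Made-up example records, not real data.
example = (
    "contig_1\tprot_A\t62.5\t120\t45\t0\t1\t360\t5\t124\t1e-40\t150\n"
    "contig_2\tprot_B\t30.1\t80\t56\t2\t10\t249\t3\t82\t0.5\t32.0\n"
    "contig_3\tprot_C\t55.0\t200\t90\t1\t1\t600\t1\t200\t3e-60\t210\n"
)
example_descriptions = {
    "prot_A": "plastid 50S ribosomal protein L2 (placeholder)",
    "prot_B": "plastid DNA gyrase subunit (placeholder)",
    "prot_C": "fatty acid synthase component (placeholder)",
}

hits = screen_hits(example, example_descriptions)
print(hits)
```

In this toy input only contig_1 is retained: contig_2 fails the E-value cutoff and contig_3 matches a protein with no ptDNA-related keyword. A real screen would also need to confirm plastid targeting and rule out mitochondrial or bacterial homologs, which simple keyword matching cannot do.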

Presently, the evidence for plastid genome loss in R. lagascae and Polytomella is based solely on the results of NGS experiments (Molina et al., 2014; Smith & Lee, 2014) as well as some preliminary PCR and/or nucleotide hybridization work (Nedelcu et al., 1996; Nickrent et al., 1997; Nedelcu, 2001; Davis et al., 2007). Further explorations for a potential plastid genome in these lineages could come from fluorescent microscopy using DNA-binding dyes, such as DAPI or SYBR Green, as well as from additional analyses of the available Rafflesia and Polytomella NGS data – specifically, analyses using different assembly, BLAST, and mapping approaches than those employed by Molina et al. (2014) and Smith & Lee (2014). If ptDNA does exist in Rafflesia and/or Polytomella species, fluorescent microscopy should reveal nucleoids within their plastids. However, an attempt to do such an experiment in P. parva was complicated by highly reticulated mitochondrial structures, which layered over and obscured potential nucleoid signals from the plastid (Smith & Lee, 2014). Arguably the strongest evidence for plastid genome loss will come from complete nuclear genome sequencing of various Rafflesia and Polytomella taxa, which should provide a complete suite of nuclear genes for plastid-targeted proteins and consequently a better understanding of plastid function within these species.

If plastid genome loss has occurred in the Rafflesia and Polytomella lineages, one will need to explain how they are synthesizing heme. In most plastid-bearing species heme biosynthesis begins in the plastid, via the C5 pathway, and employs a plastid-encoded tRNA glutamate (Beale, 1999). It has been hypothesized that nonphotosynthetic plants and algae retain a plastid genome, and its tRNAGlu, to maintain a functional heme pathway (Barbrook et al., 2006a). In some plastid-bearing species, including the malaria parasite Plasmodium falciparum, the initial steps of heme biosynthesis occur in the mitochondrion through the Shemin pathway (Oborník & Green, 2005), and they are therefore not reliant on a plastid tRNAGlu. It is not known how Rafflesia or Polytomella are synthesizing heme, but the latter, at least, does not appear to be using the Shemin pathway (Smith & Lee, 2014).

More proposed cases of plastid genome loss on the way?

There are a large number of poorly studied eukaryotic microbial groups, many of which harbor species with nonphotosynthetic plastids or potentially with unidentified ‘cryptic’ plastids (Keeling, 2010), including various lineages within the eukaryotic superphylum Alveolata, such as colpodellids (Gile & Slamovits, 2014), perkinsids (Robledo et al., 2011), and Oxyrrhis (Slamovits & Keeling, 2008). As researchers explore these groups they will likely discover more possible examples of plastid genome loss. It is surprising that currently the two best cases for ptDNA loss – Rafflesia and Polytomella – come from lineages whose plastids descend directly from a primary endosymbiosis of a cyanobacterium and not from those whose plastids derive from eukaryote–eukaryote endosymbioses. It is the latter category of plastids that are thought to have been lost completely (genome and all) in certain protist groups, such as Cryptosporidium (Keeling, 2010), whereas there are no purported examples of outright plastid loss in any primary plastid-bearing lineages. There is mounting evidence that the oyster parasite Perkinsus marinus (Alveolata) has a relic, red-algal-derived plastid without a genome (Robledo et al., 2011). And there may well be other protists with undiscovered relic plastids without genomes. However, the next case for plastid genome loss could come from another land plant. A survey of ptDNA gene content across the parasitic plant genus Cuscuta identified, through nucleotide-hybridization work, some species that may have lost their plastid genomes (Braukmann et al., 2013).

It is perplexing to imagine the steps involved in acquiring a photosynthetic organelle – from the endosymbiosis of a free-living photosynthetic organism to the integration of that symbiont into the host cell to the amalgamation of symbiont and host genomes. It is equally perplexing to envision the reverse process – the forfeiting of photosynthetic capabilities, the deterioration of a plastid genome, and the eventual loss of the organelle itself. Either way, both of these processes have a lot to teach us about the evolution and diversity of life.


The authors wish to thank Susann Wicke and two anonymous reviewers for their critical reading of the manuscript and helpful feedback. D.R.S. is supported by a Discovery Grant from the Natural Sciences and Engineering Research Council (NSERC) of Canada.