Ecological and evolutionary genomics of marine photosynthetic organisms


  • Susana M. Coelho,

    Corresponding author
    1. CNRS UMR 7139, The Marine Plants and Biomolecules Laboratory, Roscoff, France
    • UPMC-Université Paris 06, Roscoff, France
  • Nathalie Simon,

    1. UPMC-Université Paris 06, Roscoff, France
    2. CNRS UMR 7144, Adaptation and Diversity in the Marine Environment, Oceanic Plankton Group, Roscoff, France
  • Sophia Ahmed,

    1. UPMC-Université Paris 06, Roscoff, France
    2. CNRS UMR 7139, The Marine Plants and Biomolecules Laboratory, Roscoff, France
  • J. Mark Cock,

    1. UPMC-Université Paris 06, Roscoff, France
    2. CNRS UMR 7139, The Marine Plants and Biomolecules Laboratory, Roscoff, France
  • Frédéric Partensky

    Corresponding author
    1. CNRS UMR 7144, Adaptation and Diversity in the Marine Environment, Oceanic Plankton Group, Roscoff, France
    • UPMC-Université Paris 06, Roscoff, France

Correspondence: Susana M. Coelho; Frédéric Partensky, Fax: 00 33 298 29 23 85; E-mail:;


Environmental (ecological) genomics aims to understand the genetic basis of relationships between organisms and their abiotic and biotic environments. It is a rapidly progressing field of research largely due to recent advances in the speed and volume of genomic data being produced by next generation sequencing (NGS) technologies. Building on information generated by NGS-based approaches, functional genomic methodologies are being applied to identify and characterize genes and gene systems of both environmental and evolutionary relevance. Marine photosynthetic organisms (MPOs) were poorly represented amongst the early genomic models, but this situation is changing rapidly. Here we provide an overview of the recent advances in the application of ecological genomic approaches to both prokaryotic and eukaryotic MPOs. We describe how these approaches are being used to explore the biology and ecology of marine cyanobacteria and algae, particularly with regard to their functions in a broad range of marine ecosystems. Specifically, we review the ecological and evolutionary insights gained from whole genome and transcriptome sequencing projects applied to MPOs and illustrate how their genomes are yielding information on the specific features of these organisms.


In recent years, genomic approaches have been increasingly used to study questions of ecological importance ranging from the adaptation of organisms in changing environments to the evolution of complex phenotypes (e.g. Wagner 2011; Harrison et al. 2012). Environmental (ecological) genomics aims to understand the relationships between an organism and its biotic and abiotic environment by studying the structure, function and evolution of its genome. NGS technologies have increased sequencing throughput by several orders of magnitude compared with traditional methods. Consequently, genomic approaches are now being used by ecologists to study genetic adaptation in rapidly evolving systems, and by molecular biologists to perform expression analyses and functional assays that examine variation among populations and species. The vast reduction in the cost per sequenced base has not only allowed these approaches to be applied to a much broader range of species (including a vast array of nonmodel organisms) but has also opened up new fields of genomics such as metagenomics and metatranscriptomics, thereby providing access to a wealth of uncultivated organisms and/or complex biological communities.

Apart from a few completely sequenced marine cyanobacterial genomes (Dufresne et al. 2003; Palenik et al. 2003; Rocap et al. 2003), marine photosynthetic organisms (MPOs) were poorly represented amongst the early genomic models (Cock & Coelho 2011). However, this situation has recently changed, and new insights are being gained into the genetics and evolution of a broad range of MPOs through whole genome sequencing, transcriptomics and targeted metagenomic approaches. Metagenomics is useful not only to describe the genetic diversity of complex communities of MPOs, but also to provide insights into the underlying metabolic capacities of each component of these communities.

In this review, we provide an overview of the way environmental genomics is being used to investigate how photosynthetic species in marine ecosystems function in their environment. The term ‘MPO’ encompasses a diverse collection of taxonomic groups, including both photosynthetic eukaryotes (i.e. micro- and macroalgae) and prokaryotes (cyanobacteria), that share one key feature: the ability to carry out oxygenic photosynthesis. MPOs play an important role in the planet's geochemical cycles, and they are a rich source of novel biomolecules and bioprocesses. Phytoplankton generate approximately half of global primary productivity and regulate global biogeochemical cycles (Falkowski et al. 2008). Multicellular algae (macroalgae) have significant ecological and economic importance: they carry out the bulk of photosynthesis in coastal regions, provide shelter for juvenile fish and invertebrates, and bind together the shifting sea bed to form the basis of many coastal ecosystems.

MPOs play a crucial role in the functioning of all marine ecosystems that receive solar radiation, but they may also affect the dark ocean, because photosynthetic cells are important contributors to the export of organic carbon to deep water masses, a process critical to the survival of deep biological communities (Richardson & Jackson 2007). They also participate in symbiotic associations, supplying heterotrophic host organisms with various forms of fixed carbon. In the long term, climate change is predicted to alter phytoplankton populations, which in turn may affect global carbon fluxes and the transfer of carbon within food chains. The effect of climate change on evolution at the genotypic and phenotypic levels is therefore the subject of a considerable number of studies (see e.g. Piquet et al. 2008; Huertas et al. 2011).

Here we describe some recent advances in the genomics of MPOs and their implications. Prokaryotic and eukaryotic organisms are treated separately, despite being intimately linked both ecologically, in that they share the same environment where they compete for light and nutrients, and evolutionarily, as algal plastids were derived from one or several cyanobacteria. We recommend the following selected reading on related topics: Rynearson & Palenik (2011) briefly describe some insights revealed by recent genome projects; the evolution of photosynthesis has been reviewed by Finazzi et al. (2010) and Sage et al. (2012); and Parker et al. (2008) discussed the forces that shaped the genomes of marine microalgae and the metabolic consequences of such a complex evolutionary history, focusing on carbon, nitrogen and iron.


Ecology and genetic diversity of marine cyanobacteria

The question of whether cyanobacteria, the oldest oxygenic phototrophs on Earth, arose first in marine or freshwater ecosystems has long been debated. A recent analysis combining paleobiological data and phylogenetic comparisons of a wide range of cyanobacteria favoured the second hypothesis. Indeed, Blank & Sanchez-Baracaldo (2010) suggested that ancestral cyanobacteria, which were presumably unicellular with small cell sizes (<2.5 μm), appeared about 2.7 Gyr ago in freshwater and/or endolithic environments, whereas the colonization of coastal, brackish and marine environments occurred only about 2.4 Gyr ago (Fig. 1). This diversification of habitats was nonetheless a critical event for the evolution of life on Earth, as it is thought to have triggered the sudden rise in atmospheric oxygen that occurred 2.3 Gyr ago (Bekker et al. 2004). Furthermore, the observations that marine groups (i) do not form a monophyletic clade but are interspersed with freshwater species within the cyanobacterial radiation (Fig. 1) and (ii) display a wide variety of morphologies, ecologies and habitats (Blank & Sanchez-Baracaldo 2010; Larsson et al. 2011) provide compelling evidence that several independent colonizations of saline waters by different cyanobacterial lineages have occurred during evolution.

Figure 1.

Ancestral state reconstruction-relaxed molecular clock chronogram using a 16S rRNA-RpoC tree, showing that early cyanobacteria were nonmarine. Branch lengths, calculated by penalized maximum likelihood using nucleotide sequence alignments of SSU and rpoC, are proportional to age (the age scale in Gyr, at right-hand side, shows geologic eras). The approximate date of global oxygen rise (GOR) is denoted by a dotted line. Reconstructed ancestral states using maximum parsimony are indicated by circles at nodes, the colour of which corresponds to habitat (as specified in the inset at bottom left). SPM, clade containing Synechocystis, Pleurocapsa and Microcystis; PNT, clade containing Pseudanabaena, Nostocales and Trichodesmium; LPT, clade containing Leptolyngbya, Plectonema, Phormidium and Synechococcus sp. PCC7335; SynPro, clade containing Synechococcus, Prochlorococcus and Cyanobium. Sequenced genomes are indicated by stars. Redrawn and slightly modified from Fig. 5C in Blank & Sanchez-Baracaldo (2010), with permission from authors and publisher.
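The caption notes that ancestral habitats were reconstructed by maximum parsimony. As a hedged illustration of the principle only, the sketch below implements the bottom-up pass of Fitch's small-parsimony algorithm for one discrete character (habitat) on a rooted binary tree. It yields the most parsimonious state set at the root and the minimum number of habitat changes. The `Node` class, the toy tree and the habitat labels are hypothetical; this is not the software, tree or model used by Blank & Sanchez-Baracaldo (2010).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Node of a rooted, strictly binary tree; leaves carry an observed habitat."""
    name: str
    state: Optional[str] = None
    children: list[Node] = field(default_factory=list)


def fitch(node: Node) -> tuple[set[str], int]:
    """Bottom-up (first) pass of Fitch parsimony for one discrete character.

    Returns the most parsimonious candidate state set at `node` and the
    minimum number of state changes needed in the subtree rooted there.
    """
    if not node.children:
        if node.state is None:
            raise ValueError(f"leaf {node.name!r} has no observed state")
        return {node.state}, 0
    if len(node.children) != 2:
        raise ValueError("this sketch assumes a strictly binary tree")
    left, right = node.children
    left_set, left_cost = fitch(left)
    right_set, right_cost = fitch(right)
    shared = left_set & right_set
    if shared:
        # Children can agree on a state: no extra change is needed here.
        return shared, left_cost + right_cost
    # No shared state: keep all candidates and count one habitat change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical four-taxon tree with one marine tip.
tree = Node("root", children=[
    Node("n1", children=[Node("A", "freshwater"), Node("B", "freshwater")]),
    Node("n2", children=[Node("C", "marine"), Node("D", "freshwater")]),
])
print(fitch(tree))  # ({'freshwater'}, 1)
```

A full reconstruction would add Fitch's top-down pass to assign a single state to every internal node; the bottom-up pass alone already shows why a nonmarine root is the cheaper explanation in this toy case.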

The taxonomic diversity of free-living, planktonic cyanobacteria in present-day marine waters is surprisingly low: only four major genera are known so far, the N2-fixers Trichodesmium and Crocosphaera and the nondiazotrophs Prochlorococcus and Synechococcus, and cultivated representatives of each of these genera have been sequenced (Table 1). Although it is less ubiquitous, the planktonic cyanobacterium Nodularia spumigena, another N2-fixing species that forms toxic surface blooms in brackish coastal waters such as the Baltic Sea (Sivonen et al. 1989), has also been sequenced recently, but its genome has yet to be described.

Table 1. Characteristics of sequenced marine photosynthetic organisms

Lineage | Species | Strain (a.k.a.) | Genome size (Mbp) | G+C (%) | Sequencing centre | Genome status | GenBank accession no. | References
Cyanobacteria | Acaryochloris marina | MBIC11017 (AM1) | 8.36 | 47.0 | TGen | Complete | CP000828 | Swingley et al. (2008)
Cyanobacteria | Acaryochloris sp. | CCMEE 5410 | 7.88 | 47.0 | JCVI | WGS | AFEJ01000511 | Miller et al. (2011)
Cyanobacteria | Acaryochloris sp. | HICR111A | 8.37 | 47.0 | UF GSC | WGS | JN585763 (partial) | Mohr et al. (2010a); Pfreundt et al. (2012)
Cyanobacteria | Calothrix rhizosoleniae | SC01 | 11.50 | 44.0 | JCVI | WGS | N/A | Unpublished
Cyanobacteria | Crocosphaera watsonii | WH0003 | 5.89 | 37.7 | UCSC GSC | WGS | AESD00000000 | Bench et al. (2011)
Cyanobacteria | Crocosphaera watsonii | WH8501 | 6.24 | 37.1 | JGI | WGS | AADV00000000 | Bench et al. (2011)
Cyanobacteria | UCYN-A | Flow-sorted cells | 1.44 | 31.0 | 454 Life Sciences | Complete | CP001842 | Tripp et al. (2010)
Cyanobacteria | Cyanobium sp. | PCC 7001 | 2.83 | 68.7 | JCVI | WGS | ABSE00000000 | Unpublished
Cyanobacteria | Cyanothece sp. | ATCC 51142 | 5.46 | 37.9 | WUSL GSC | Complete | CP000806 | Welsh et al. (2008)
Cyanobacteria | Cyanothece sp. | ATCC 51472 | 5.40 | 37.9 | JGI | WGS | AGJC01000000 | Unpublished
Cyanobacteria | Cyanothece sp. | CCY0110 | 5.88 | 36.7 | JCVI | WGS | AAXW00000000 | Unpublished
Cyanobacteria | Leptolyngbya sp. | PCC 7375 | 8.90, 3.93 | 47.8 | JGI | Complete | N/A | Unpublished
Cyanobacteria | Lyngbya majuscula | 3L | 8.5 | 44.0 | UCSD GSC | WGS | AEPQ00000000 | Jones et al. (2011)
Cyanobacteria | Lyngbya sp. | PCC 8106 (CCY9616) | 7.04 | 41.1 | JCVI | WGS | AAVU00000000 | Unpublished
Cyanobacteria | Microcoleus chthonoplastes | PCC 7420 | 8.65 | 45.4 | JCVI | WGS | ABRS00000000 | Unpublished
Cyanobacteria | Nodularia spumigena | CCY9414 | 5.32 | 41.3 | JCVI | WGS | AAVW00000000 | Unpublished
Cyanobacteria | Prochlorococcus marinus | AS9601 | 1.67 | 31.3 | JCVI | Complete | CP000551 | Kettler et al. (2007)
Cyanobacteria | Prochlorococcus marinus | MIT 9202 | 1.69 | 31.1 | JGI | WGS | ACDW00000000 | Unpublished
Cyanobacteria | Prochlorococcus marinus | MIT 9211 | 1.70 | 38.0 | JCVI | Complete | CP000878 | Kettler et al. (2007)
Cyanobacteria | Prochlorococcus marinus | MIT 9215 | 1.74 | 31.1 | JGI | Complete | CP000825 | Kettler et al. (2007)
Cyanobacteria | Prochlorococcus marinus | MIT 9301 | 1.64 | 31.3 | JCVI | Complete | CP000576 | Kettler et al. (2007)
Cyanobacteria | Prochlorococcus sp. | MIT 9303 | 2.70 | 50.0 | JCVI | Complete | CP000554 | Kettler et al. (2007)
Cyanobacteria | Prochlorococcus marinus | MIT 9312 | 1.71 | 31.2 | JGI | Complete | CP000111 | Coleman et al. (2006); Kettler et al. (2007)
Cyanobacteria | Prochlorococcus sp. | MIT 9313 | 2.40 | 50.7 | JGI | Complete | BX548175 | Rocap et al. (2003); Kettler et al. (2007)
Cyanobacteria | Prochlorococcus marinus | MIT 9515 | 1.70 | 30.8 | JCVI | Complete | CP000552 | Kettler et al. (2007)
Cyanobacteria | Prochlorococcus marinus | NATL1A | 1.86 | 35.0 | JGI | Complete | CP000553 | Kettler et al. (2007)
Cyanobacteria | Prochlorococcus marinus | NATL2A | 1.84 | 35.1 | JGI | Complete | CP000095 | Kettler et al. (2007)
Cyanobacteria | Prochlorococcus marinus | SS120 (CCMP1375) | 1.75 | 36.4 | JGI | Complete | AE017126 | Dufresne et al. (2003); Kettler et al. (2007)
Cyanobacteria | Prochlorococcus marinus | MED4 (CCMP1986) | 1.66 | 30.8 | JGI | Complete | BX548174 | Rocap et al. (2003); Kettler et al. (2007)
Cyanobacteria | Prochlorococcus marinus | UH18301 | 1.65 | 31.0 | JCVI | WGS | N/A | Unpublished
Cyanobacteria | Prochloron didemni (a) | Cell sample P1: Palau | 6.33 | 42.0 | TIGR and IGS | WGS | AGRF00000000 | Donia et al. (2011a,b)
Cyanobacteria | Prochloron didemni (a) | Cell sample P2: Fiji | 7.55 | 41.5 | IGS | WGS | AFSJ00000000 | Donia et al. (2011a,b)
Cyanobacteria | Prochloron didemni (a) | Cell sample P3: Solomon | 5.89 | 41.9 | IGS | WGS | AFSK00000000 | Donia et al. (2011a,b)
Cyanobacteria | Prochloron didemni (a) | Cell sample P4: Papua New Guinea | 5.69 | 41.9 | IGS | WGS | AGGA00000000 | Donia et al. (2011a)
Cyanobacteria | Synechococcus sp. | BL107 | 2.28 | 54.3 | JCVI | WGS | AATZ00000000 | Dufresne et al. (2008)
Cyanobacteria | Synechococcus sp. | CB0101 | 2.67 | 64.1 | JCVI | WGS | ADXL00000000 | Unpublished
Cyanobacteria | Synechococcus sp. | CB0205 | 2.43 | 62.3 | JCVI | WGS | ADXM00000000 | Unpublished
Cyanobacteria | Synechococcus sp. | CC9311 | 2.61 | 52.4 | JGI | Complete | CP000435 | Palenik et al. (2006); Dufresne et al. (2008)
Cyanobacteria | Synechococcus sp. | CC9605 | 2.51 | 59.2 | JGI | Complete | CP000110 | Dufresne et al. (2008)
Cyanobacteria | Synechococcus sp. | CC9902 | 2.23 | 54.2 | JGI | Complete | CP000097 | Dufresne et al. (2008)
Cyanobacteria | Synechococcus sp. | PCC 7002 | 3.40 | 49.2 | PSU GSC | WGS | CP000951 | Unpublished
Cyanobacteria | Synechococcus sp. | PCC 7335 | 5.96 | 48.2 | – | WGS | ABRV00000000 | Unpublished
Cyanobacteria | Synechococcus sp. | RCC307 | 2.22 | 60.8 | Genoscope | Complete | CT978603 | Dufresne et al. (2008)
Cyanobacteria | Synechococcus sp. | RS9916 | 2.66 | 59.8 | JCVI | WGS | AAUA00000000 | Dufresne et al. (2008)
Cyanobacteria | Synechococcus sp. | RS9917 | 2.58 | 64.5 | JCVI | WGS | AANP00000000 | Dufresne et al. (2008)
Cyanobacteria | Synechococcus sp. | WH5701 | 3.04 | 65.4 | JCVI | WGS | AANO00000000 | Dufresne et al. (2008)
Cyanobacteria | Synechococcus sp. | WH7803 | 2.37 | 60.2 | Genoscope | Complete | CT971583 | Dufresne et al. (2008)
Cyanobacteria | Synechococcus sp. | WH7805 | 2.62 | 57.6 | JCVI | WGS | AAOK00000000 | Dufresne et al. (2008)
Cyanobacteria | Synechococcus sp. | WH8016 | 2.69 | 54.1 | JGI | WGS | N/A | Unpublished
Cyanobacteria | Synechococcus sp. | WH8102 | 2.43 | 59.4 | JGI | Complete | BX548020 | Palenik et al. (2003); Dufresne et al. (2008)
Cyanobacteria | Synechococcus sp. | WH8109 | 2.12 | 60.1 | JCVI | WGS | ACNY00000000 | Unpublished
Cyanobacteria | Trichodesmium erythraeum | IMS101 | 7.75 | 34.1 | JGI | Complete | CP000393 | Unpublished
Prasinophytes | Ostreococcus lucimarinus | CCE9901 | 13.2 | 60 | JGI | WGS | CP000581-CP000601 | Palenik et al. (2007)
Prasinophytes | Ostreococcus tauri | OTH95 | 12.6 | 59 | Montpellier, France | WGS | CR954201-CR954220 | Derelle et al. (2006)
Prasinophytes | Ostreococcus sp. | RCC809 | 12 | – | JGI | – | – | Unpublished
Prasinophytes | Micromonas sp. | RCC299 | 20.90 | 64 | JGI | WGS | ACCO00000000 | Worden et al. (2009)
Prasinophytes | Micromonas pusilla | CCMP1545 | 21.90 | 65 | JGI | WGS | ACCP00000000 | Worden et al. (2009)
Prasinophytes | Bathycoccus prasinos | BBAN7 | 18 | – | Genoscope | WGS | – | Unpublished
Stramenopiles | Phaeodactylum tricornutum | CCAP1055/1 | 27 | 53.7 | JGI | WGS | ABQD01000000 | Bowler et al. (2008)
Stramenopiles | Thalassiosira pseudonana | CCMP1335 | 32 | 47 | JGI | WGS | AAFD02000000 | Armbrust et al. (2004)
Stramenopiles | Fragilariopsis cylindrus | CCMP1102 | 81 | – | JGI | WGS | – | –
Stramenopiles | Pseudo-nitzschia multiseries | CLN-47 | 250 | – | JGI | Ongoing | – | –
Stramenopiles | Ectocarpus siliculosus | Ec32 | 214 | 53.6 | Genoscope | WGS | CABU01000001-CABU01013533, FN647682-FN649242, FN649726-FN649760 | Cock et al. (2010a,b)
Stramenopiles | Aureococcus anophagefferens | CCMP1984 | 56.70 | – | JGI | WGS | ACJI00000000 | –
Stramenopiles | Nannochloropsis gaditana | CCMP526 | 29 | – | – | WGS | AGNI00000000 | Radakovits et al. (2012)
Rhodophytes | Chondrus crispus | – | 105 | – | Genoscope | WGS | – | Unpublished
Rhodophytes | Porphyra umbilicalis | – | 300–400 | – | JGI | Ongoing | – | –
Haptophytes | Emiliania huxleyi | CCMP1516 | 168 | – | JGI | WGS | – | Unpublished

IGS, Institute for Genome Sciences; PSU GSC, Penn State University Genome Sequencing Center; TIGR, The Institute for Genomic Research; UCSC GSC, University of California at Santa Cruz Genome Sequencing Center; WUSL GSC, Washington University in Saint Louis Genome Sequencing Center; UCSD GSC, University of California at San Diego Genome Sequencing Center; JGI, Joint Genome Institute; WGS, whole genome shotgun sequencing.
(a) Cells extracted from wild sea squirts (Lissoclinum patella).

Trichodesmium can form large colonies (typically 1–5 mm in diameter) composed of tens to hundreds of aggregated filamentous cells (5–20 μm in length; Capone et al. 1997; Post et al. 2002). This genus is also known to form dense, widespread blooms; remote sensing observations have shown that these are most frequent in the northern Arabian Sea, the western Indian Ocean and the southeastern Pacific, whereas in other tropical and subtropical waters such blooms are present less than 5–10% of the year (Westberry & Siegel 2006). Moreover, molecular studies have demonstrated that Trichodesmium is the dominant N2-fixing cyanobacterium in several areas, such as the Atlantic Ocean (Langlois et al. 2008; Goebel et al. 2010) and the South China Sea (Moisander et al. 2008), but it may be outnumbered by other diazotrophs in the North and South Pacific Ocean gyres (Church et al. 2005; Halm et al. 2012). Phylogenetically, this genus encompasses at least four distinct clusters, one of which comprises the noncolonial filamentous taxon Katagnymene, which was erroneously classified as a different genus on the basis of a number of phenotypic differences from Trichodesmium (Lundgren et al. 2005; Hynes et al. 2012). Only one strain of this genus, T. erythraeum IMS101, has been sequenced thus far (Table 1), and no formal description of its genome is available to date.

Crocosphaera can also form small colonies, with individual cells ranging from 2 to 8 μm in size. It is usually found in warm (>27°C) oligotrophic subsurface waters (Mazard et al. 2004; Campbell et al. 2005), and its abundance, as estimated from the number of taxon-specific nifH gene copies, is generally low (e.g. 61–460 cells/L in the tropical Atlantic; Goebel et al. 2010), although a record concentration of 8 × 10⁶ cells/L was reported in the South Pacific Ocean (Moisander et al. 2010). Interestingly, phylogenetic studies of natural populations and of strains of Crocosphaera isolated from diverse areas revealed a very low level of genetic divergence despite significant variability at the phenotypic level (Zehr et al. 2007; Webb et al. 2009), so this taxon seemingly encompasses a single, ocean-wide species, C. watsonii. Two strains with distinct cell sizes, growth temperature ranges and N2 fixation rates, WH8501 and WH0003, isolated from the South Atlantic and the North Pacific oceans, respectively, have been sequenced thus far (Table 1). Comparison of their genomes confirmed a remarkably high genome-wide similarity at the nucleotide level (over 80% of each genome was >98% identical to the other strain), despite a large number of genome rearrangements, insertions and deletions (Bench et al. 2011).
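The WH8501/WH0003 comparison summarizes similarity as the fraction of each genome that aligns to the other strain at more than 98% nucleotide identity. Assuming one already has non-overlapping aligned blocks on one genome as hypothetical (length, percent identity) pairs, for example parsed from a whole-genome aligner, that statistic reduces to a length-weighted filter. The function name and the block values below are illustrative assumptions, not data or code from Bench et al. (2011).

```python
def fraction_above_identity(blocks, genome_length, threshold):
    """Fraction of a genome covered by aligned blocks with identity > threshold.

    blocks: iterable of (aligned_length_bp, percent_identity) pairs, assumed
    to be non-overlapping on this genome.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = sum(length for length, identity in blocks if identity > threshold)
    if covered > genome_length:
        raise ValueError("blocks cover more than the genome; are they overlapping?")
    return covered / genome_length


# Hypothetical alignment blocks on a 5 Mbp genome.
blocks = [(3_000_000, 99.2), (1_000_000, 98.5), (500_000, 95.0)]
print(fraction_above_identity(blocks, 5_000_000, 98.0))  # 0.8
```

The same function, called with a 97% threshold, would express the kind of figure quoted below for Prochloron and UCYN-A genomes; real analyses must also decide how to treat repeats and rearranged regions, which this sketch ignores.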

Prochlorococcus and Synechococcus co-occur between 45°N and 45°S and are by far the most abundant cyanobacteria (and phytoplanktonic organisms in general) in the ocean. Prochlorococcus, the smallest known free-living phototroph (0.6–1.0 μm), dominates in warm, oligotrophic areas, with typical concentrations of 1–3 × 10⁸ cells/L in the subsurface (Chisholm et al. 1988; Zubkov et al. 1998; Johnson et al. 2006). In contrast, Synechococcus is most abundant in near-coastal waters and in areas enriched by local upwelling, often reaching concentrations as high as or higher than those of Prochlorococcus there, whereas its cell concentrations decrease dramatically in nutrient-poor waters. These two picocyanobacteria are closely related phylogenetically (Scanlan et al. 2009). Several lines of evidence suggest that the monophyletic Prochlorococcus group arose fairly recently (possibly only 150 Myr ago; Dufresne et al. 2005) from a Synechococcus-like ancestor, the main phenotypic traits distinguishing these two groups being their strikingly different light-harvesting antenna systems and pigmentation (Partensky et al. 1999; Ting et al. 2002). The intrageneric diversity within each of these two taxa is wide (Rocap et al. 2002; Fuller et al. 2003; Mazard et al. 2012), and genome sequences have been obtained for most of the major clades/ecotypes identified so far in both groups (Dufresne et al. 2003, 2008; Palenik et al. 2003, 2006, 2009; Rocap et al. 2003; Kettler et al. 2007; Scanlan et al. 2009). Prochlorococcus comprises distinct ecotypes that are physiologically and genetically adapted to either high light (HL) or low light (LL) and occupy different light niches within the euphotic layer of stratified, open ocean waters. Furthermore, each light niche may shelter several distinct lineages, namely HLI (a.k.a. eMED4) and HLII (a.k.a. eMIT9312) in the upper mixed layer, LLII, LLIII and LLIV (a.k.a. eSS120, eMIT9211 and eMIT9313, respectively) at the bottom of the euphotic zone, and the LLI (a.k.a. eNATL) ecotype at intermediate depths (Johnson et al. 2006; Malmstrom et al. 2010). The mechanisms that maintain the LLII–LLIV ecotypes in seemingly identical niches are still unclear. For the HL clades, however, it has been shown that they exhibit distinct growth temperature ranges and geographic distributions, with HLII preferentially thriving in warm tropical and subtropical waters and HLI preferring cooler waters and extending to higher latitudes.

Marine Synechococcus spp. also display wide genetic diversity, with three deeply branching groups (called subclusters 5.1–5.3) subdivided into a number of clades (Dufresne et al. 2008; Scanlan et al. 2009), most of which have at least one sequenced representative (Table 1). Molecular studies have shown that the most abundant groups in tropical and temperate waters of the Atlantic and Indian Oceans belong to subcluster 5.1, with four dominant clades (I–IV; Zwirglmaier et al. 2008; Mella-Flores et al. 2011). Clades I and IV co-occur in temperate waters at high latitudes (>30°N/S), often predominating in coastal waters, while clade II is found in warm waters at low latitudes (<35°N/S) and clade III in open ocean waters, with seemingly no latitudinal restriction. Interestingly, members of subcluster 5.2 were recently found to dominate in subpolar waters of the North Pacific Ocean (Huang et al. 2011b). Several other Synechococcus clades also occur in the field, generally as minor components of the Synechococcus community, although they sometimes reach high concentrations at particular sites, suggesting that they are adapted to specific niches (Zwirglmaier et al. 2008; Choi et al. 2011; Huang et al. 2011b; Mella-Flores et al. 2011). However, field and culture data are often too scarce to define their geographical distributions and ecological preferenda precisely.

Compared with their planktonic counterparts, benthic cyanobacteria are much more diverse phylogenetically, owing to the large variety of niches available in coastal environments, including intertidal or infralittoral areas that can be rocky, sandy or muddy. For instance, the much-studied Cyanothece sp. ATCC 51142 was isolated from intertidal sands of the Texas Gulf coast (Reddy et al. 1993), Cyanothece sp. CCY0110 from the bottom of shallow waters around Zanzibar (L. J. Stal, pers. comm.) and Synechococcus sp. PCC7335 from a snail shell collected in the intertidal zone of the Gulf of California (Mexico). Many benthic cyanobacteria form dense microbial mats, while others live on the surface of macroalgae, seagrasses or mangrove roots (epiphytes) or even inside limestone rocks (endoliths). Intertidal mats are generally composed of filamentous, N2-fixing cyanobacteria that either possess cells specialized for this function (heterocysts, e.g. Calothrix or Scytonema) or are nonheterocystous (e.g. Lyngbya, Microcoleus, Phormidium or Schizothrix; Hoffmann 1999). Some species, such as Lyngbya aestuarii and Microcoleus chthonoplastes, whose genomes were recently sequenced but are as yet unpublished (Table 1), have a ubiquitous distribution, while most others are found over a narrower latitudinal range. Because of their large diversity, benthic cyanobacteria have particularly good potential as sources of novel secondary metabolites (cyclic and linear peptides, guanidines, phosphonates, purines, lipids, macrolides, etc.) of industrial (e.g. biofuels) or pharmacological (e.g. cytotoxic or protease-inhibiting) interest.

Several marine cyanobacteria are also involved in symbiotic associations, often with benthic invertebrates. The endosymbiotic association between Prochloron and either ascidians or sponges has been particularly well studied (Munchhöff et al. 2007; Usher 2008). Phylogenetic analyses revealed low specificity of this cyanobacterium for its hosts and low genetic variation between individuals retrieved from different hosts, suggesting lateral transmission of Prochloron cells between hosts (Munchhöff et al. 2007). Recently, several near-complete Prochloron didemni genomes (Table 1), obtained by squeezing cells out of Lissoclinum patella didemnids collected from four remote islands of the South Pacific, were shown to display a remarkable level of synteny and more than 97% DNA sequence identity across 90% of the genome length (Donia et al. 2011b). This strongly suggests that the Prochloron life cycle includes a free-living stage during which cells are transported over long distances, a phenomenon that would contribute to the genetic homogenization of the population (Donia et al. 2011a). Interestingly, different cyanobacterial species may inhabit the same invertebrate host species and have evolved complementary pigmentations, a strategy that is likely to reduce competition for light (Hirose et al. 2009). Indeed, Synechocystis trididemni, a close relative of Prochloron, contains large amounts of phycoerythrin and phycocyanin, which absorb green and red-orange light, respectively, whilst the main pigments in Prochloron are chlorophylls (Chls) a and b, which absorb blue and red light. Acaryochloris, another atypical cyanobacterium frequently found in association with Prochloron, contains yet another major pigment, Chl d, a unique chromophore that absorbs near-infrared light (Miyashita et al. 1997). It is worth noting that Acaryochloris was initially thought to be an endosymbiont of ascidians (Miyashita et al. 2003), but a more recent analysis showed that it is in fact a free-living epiphyte of these invertebrates (Kühl et al. 2005), and it has subsequently been retrieved from a variety of benthic environments, including from underneath the crust of coralline algae living in coral reefs (Behrendt et al. 2010; Mohr et al. 2010a). While Prochloron has never been cultivated, several Acaryochloris strains have been successfully brought into culture, and three have been sequenced to date (Swingley et al. 2008; Mohr et al. 2010a; Pfreundt et al. 2012).

Besides the genomes of cyanobacterial isolates, a number of genomes of uncultivated cyanobacteria have recently been obtained using NGS technologies, including that of the above-mentioned endosymbiont Prochloron, the largest metagenome assembled so far (Donia et al. 2011a,b). Also noteworthy are the sequences of two novel Prochlorococcus HL subclades (HNLC1 and HNLC2), characterized by a reduced set of genes encoding Fe-containing proteins. These lineages were found to thrive specifically in iron-limited equatorial and tropical oceanic waters (Rusch et al. 2010; West et al. 2010). Only a ‘consensus genome’ was obtained for each of these lineages, by assembling metagenomic data collected during the Global Ocean Sampling (GOS) expedition of the Sorcerer II (Rusch et al. 2010). A more sophisticated approach, combining flow cytometric cell sorting, whole genome amplification and massively parallel pyrosequencing of paired-end reads, was used to characterize the genome of an atypical, uncultivated, nitrogen-fixing planktonic cyanobacterium called UCYN-A (i.e. unicellular cyanobacterial N2-fixer group A; Zehr et al. 2008; Tripp et al. 2010). Although free UCYN-A cells can be observed in seawater by flow cytometry, evidence suggests that this organism in fact lives in symbiotic (or epiphytic) association with a protist (Tripp et al. 2010; Larsson et al. 2011). As for Crocosphaera, the genome of UCYN-A seems to be highly conserved (>97% nucleotide identity) across ocean basins (Tripp et al. 2010). Other associations between cyanobacteria and protists are known in the marine plankton, such as that of the filamentous, heterocystous species Richelia intracellularis with different diatoms, again either as an endosymbiont or as an epiphyte (Gomez et al. 2005), but so far neither a genome nor a metagenome has been reported for this taxon.
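The HNLC ‘consensus genomes’ were built from pooled metagenomic reads. The basic idea of a consensus sequence can be illustrated with a column-wise majority rule over reads that are already aligned to a common coordinate system. This is a deliberately simplified sketch, not the assembly pipeline used by Rusch et al. (2010): the function name, the gap symbol, the depth cutoff and the toy reads are all hypothetical, and real metagenomic assembly must first place reads, which is the hard part.

```python
from collections import Counter


def majority_consensus(aligned_reads, gap="-", min_depth=1):
    """Column-wise majority-rule consensus of equal-length aligned reads.

    Positions not covered by a read are given as `gap`. Columns with fewer
    than `min_depth` informative bases are reported as 'N'.
    """
    if min_depth < 1:
        raise ValueError("min_depth must be at least 1")
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")
    consensus = []
    for column in zip(*aligned_reads):
        counts = Counter(base for base in column if base != gap)
        if sum(counts.values()) < min_depth:
            consensus.append("N")
        else:
            # Ties are broken by first-seen order within the column.
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


reads = ["ACGT--", "ACGTTA", "-CCTTA"]  # hypothetical aligned reads
print(majority_consensus(reads))               # ACGTTA
print(majority_consensus(reads, min_depth=3))  # NCGTNN
```

Because such a consensus blends reads from many cells, it represents a population-level genome rather than any single cell, which is one reason the single-cell route described for UCYN-A is considered more sophisticated.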

Structure and evolution of marine cyanobacterial genomes

Genomes of free-living marine cyanobacteria vary greatly in size, from 1.64 Mbp for Prochlorococcus marinus MIT9301 to 8.65 Mbp for Microcoleus chthonoplastes PCC7420 (Table 1). The latter is only slightly smaller than the largest nonmarine cyanobacterial genome sequenced so far, i.e. 9.05 Mbp for Nostoc punctiforme PCC73102, a symbiont of cycads (Larsson et al. 2011). Furthermore, no free-living cyanobacteria with genomes smaller than 2.5 Mbp have been reported to date from terrestrial or freshwater habitats. Marine cyanobacteria therefore exhibit almost the full range of genome sizes observed in the phylum Cyanobacteria as a whole. It is worth noting that the uncultivated group UCYN-A has an even smaller genome (1.44 Mbp) than Prochlorococcus but is apparently unable to sustain a free-living lifestyle (Tripp et al. 2010). Metabolic reconstructions suggest that UCYN-A depends on other organisms for essential compounds, such as amino acids and purines. Also, even though UCYN-A is phylogenetically affiliated to cyanobacteria, it lacks essential components of the photosynthetic machinery, including photosystem II, carboxysomes (the cyanobacterial microcompartments where carbon fixation by RuBisCO takes place) and enzymes of the Calvin-Benson cycle, which paradoxically makes it unable to perform oxygenic photosynthesis, a unique trait within this phylum (Zehr et al. 2008). Nonetheless, having retained a complete photosystem I and ATP synthase, UCYN-A is capable of capturing solar energy, which is probably used to generate ATP and reducing power (Tripp et al. 2010). From an evolutionary viewpoint, comparison of the UCYN-A genome with the 5.46 Mbp genome of Cyanothece sp. ATCC 51142 suggests that it shares a recent common ancestor with members of this genus, from which UCYN-A seemingly evolved by drastic genome reduction (Welsh et al. 2008; Tripp et al. 2010).
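The UCYN-A argument is a presence/absence inference: if every gene of a functional module is absent, that function is assumed to be lost. A minimal sketch of such a check is shown below. The marker sets are hypothetical and far from complete (psbA/psbD for photosystem II reaction-centre proteins, psaA/psaB for photosystem I, atpA/atpB for ATP synthase, rbcL/rbcS/prk for the Calvin-Benson cycle), and the decision rule is a crude assumption; published metabolic reconstructions rely on full pathway databases and manual curation.

```python
# Hypothetical, deliberately minimal marker genes per functional module.
MODULE_MARKERS = {
    "photosystem II": {"psbA", "psbD"},
    "photosystem I": {"psaA", "psaB"},
    "ATP synthase": {"atpA", "atpB"},
    "Calvin-Benson cycle": {"rbcL", "rbcS", "prk"},
}


def module_status(genes, markers=MODULE_MARKERS):
    """Map each module to True if all of its marker genes are in `genes`."""
    present = set(genes)
    return {module: required <= present for module, required in markers.items()}


def oxygenic_photoautotroph(status):
    """Crude rule: both photosystems plus CO2 fixation must be present."""
    needed = ("photosystem II", "photosystem I", "Calvin-Benson cycle")
    return all(status[module] for module in needed)


# A hypothetical gene set retaining only PSI and ATP synthase markers.
status = module_status({"psaA", "psaB", "atpA", "atpB"})
print(status)
print(oxygenic_photoautotroph(status))  # False
```

A gene set shaped like this one would be read as able to harvest light for ATP but not to split water or fix CO2, mirroring the interpretation above, although absence from a draft or amplified genome can also reflect incomplete sequencing.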

Most Prochlorococcus lineages also have a streamlined genome (size range: 1.64–1.86 Mbp) with a low GC content (31–38 G+C%). The only exception is the LLIV (or eMIT9313) lineage, which lies at the base of the Prochlorococcus radiation and has retained a genome size (2.41–2.68 Mbp) and GC content (50–51 G+C%) similar to those of the closely related marine Synechococcus group (2.22–2.62 Mbp; 52–66 G+C%; Table 1). The progressive decrease in genome size and GC content during the evolution of these lineages was seemingly associated with an acceleration of the mutation rate of protein-coding genes (Dufresne et al. 2005). Comparative genomics studies have shown that different clades possess distinct gene complements and that one of the functional categories most differentiated between ecotypes is DNA replication, recombination and repair (Kettler et al. 2007; Partensky & Garczarek 2010). It has been suggested that the loss of genes involved in the repair of GC to AT transversions in some genotypes may have turned them into ‘mutators’, i.e. cells with an increased mutation rate (Marais et al. 2008). Such a mutator phase appears to have occurred at least twice during evolution, once before the differentiation of the LLI-III ecotypes and a second time before the differentiation of the HL ecotypes (Partensky & Garczarek 2010). Each time, it must have been followed by the restoration of a normal mutation rate, as recently verified in the HLI strain MED4 (Osburne et al. 2011). Differential genome streamlining among lineages, only partially compensated for by the acquisition of novel genes through horizontal transfer, has created genetically distinct ecotypes, each with a minimalist genome optimized for life in a specific ecological niche (Johnson et al. 2006; Kettler et al. 2007; Partensky & Garczarek 2010). These niches are nonetheless far from marginal: the upper, nutrient-poor layer of oceanic waters occupied by HL ecotypes, in particular, is among the largest and most stable sunlit ecosystems on Earth.
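The G+C% values quoted above are simply the proportion of G and C among all A, C, G and T bases in a genome. A minimal Python sketch of that calculation follows, assuming the genome is supplied as FASTA-formatted text; the function names are hypothetical and the records in the usage lines are toy sequences, not real cyanobacterial data. Counts are pooled across records, so a genome split into a chromosome and plasmids yields a single genome-wide value.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {record_id: sequence} dict (hypothetical helper)."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Use the first word of the header line as the record ID
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def gc_percent(sequences):
    """G+C% pooled over all sequences; ambiguous symbols such as N are ignored."""
    gc = at = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no A, C, G or T found")
    return 100.0 * gc / (gc + at)


# Toy example: two short records standing in for a chromosome and a plasmid
toy = ">chrom toy chromosome\nATGCGC\nAT\n>plasmid\nAAAATTTT\n"
records = parse_fasta(toy)
print(gc_percent(records.values()))  # 4 G/C out of 16 bases -> 25.0
```

Pooling the base counts (rather than averaging per-record percentages) keeps small plasmids from having a disproportionate weight in the genome-wide figure.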

At the other end of the genome size spectrum, the opposite trend, i.e. genome expansion, is thought to have caused a progressive increase in metabolic complexity in some cyanobacterial lineages (Larsson et al. 2011). This phenomenon was crucial for organisms that needed to acquire the novel metabolic traits required to colonize and survive in variable environments such as microbial mats or intertidal zones. The two main mechanisms of genome size increase are gene duplication and horizontal gene transfer (HGT). Gene duplication may either create redundancy, which is useful for increasing transcript levels of highly expressed genes, or be followed by genetic divergence of one of the copies. This derived copy often encodes a novel protein or enzyme with a slightly different function, location or activity from the original molecule and thus increases the physiological plasticity of the organism. A recent comparative study of 58 cyanobacterial genomes showed that both the proportion of paralogs and the total number of paralogous gene copies are correlated with genome size (Larsson et al. 2011). The 8.36 Mbp genome of the marine, unicellular cyanobacterium Acaryochloris marina has the highest values in both categories, but Crocosphaera watsonii and Microcoleus chthonoplastes also have many paralogs, mostly belonging to the functional categories ‘DNA replication, recombination and repair’ and ‘signal transduction’. Members of the former category are mostly transposases (Swingley et al. 2008). These are particularly abundant in the two C. watsonii genomes sequenced so far (strains WH8501 and WH0003): WH8501, for example, has a total of 1,211 transposases, including 292 copies of a single transposase sequence (Bench et al. 2011; Larsson et al. 2011). Consequently, in this species, most of the genetic diversity is seemingly generated by transposition of genes or genome fragments (Zehr et al. 2007; Bench et al. 2011).
Genotypic variation nevertheless exists among C. watsonii strains: for instance, some strains can synthesize exopolysaccharides while others cannot, a difference directly related to the presence or absence of the genes involved in this process. These accessory genes are most often isolated or located in small gene islets scattered across genomes (Bench et al. 2011). This contrasts with the large genomic islands observed in Prochlorococcus and Synechococcus, which constitute privileged insertion sites for laterally transferred genes (Coleman et al. 2006; Kettler et al. 2007; Dufresne et al. 2008). These hypervariable genomic regions are generally thought to be critical for adaptation to local niches (see below).
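The paralog statistics mentioned above can be illustrated with a minimal Python sketch. It assumes that each gene of a genome has already been assigned to a gene family by some clustering step (not shown), and it counts a gene as paralogous if its family has at least two members in the same genome. This definition, the function names and the toy family assignments are illustrative assumptions, not necessarily the procedure used by Larsson et al. (2011). A Pearson coefficient can then be computed across genomes between genome size and paralog counts.

```python
import math
from collections import Counter


def paralog_stats(gene_to_family):
    """Return (number of paralogous gene copies, fraction of genes that are paralogous).

    Assumption: a gene is paralogous if its family has >= 2 members in this genome.
    """
    family_sizes = Counter(gene_to_family.values())
    paralogous = sum(size for size in family_sizes.values() if size >= 2)
    return paralogous, paralogous / len(gene_to_family)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy genome: family "fA" has 2 copies, "fB" 1 copy, "fC" 3 copies
toy_genome = {"g1": "fA", "g2": "fA", "g3": "fB", "g4": "fC", "g5": "fC", "g6": "fC"}
print(paralog_stats(toy_genome))  # (5, 0.8333333333333334)
```

To test a correlation such as the one reported across 58 genomes, one would call `pearson_r(genome_sizes, paralog_counts)` with one value per genome; with few genomes or skewed counts, a rank-based (Spearman) coefficient may be a safer choice.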

The ability of some cyanobacterial species to undergo cellular differentiation in response to environmental stress may seem the ultimate degree of sophistication for prokaryotic organisms, yet it is surprisingly not restricted to the largest genomes: it has been observed in strains with genomes as small as 3.2 Mbp for the freshwater, filamentous strain Raphidiopsis brookii D9 or 5.3 Mbp for the brackish, heterocystous strain Nodularia spumigena CCY9414 (Stucken et al. 2010; Larsson et al. 2011). Cell differentiation includes the transformation of vegetative cells into hormogonia (short filaments used for dispersal), akinetes (resistant forms) or heterocysts (cells specialized in dinitrogen fixation; Herrero et al. 2004). A less sophisticated form of differentiation also occurs in the marine cyanobacterium Trichodesmium, in which about 15% of the cells are specialized for nitrogen fixation. These so-called diazocytes are located in the central part of the filaments (or trichomes) and differ from vegetative cells mainly by their less granular appearance and the presence of nitrogenase (El-Shehawy et al. 2003). Consistent with this, the Trichodesmium erythraeum IMS101 genome lacks the genes involved in the synthesis of the thick cell envelope surrounding heterocysts, which is composed of polysaccharides and glycolipids. However, it possesses hetR, the key regulatory gene of heterocyst differentiation, and a few early heterocyst differentiation genes that seem to be critical for diazocyte differentiation.

Genome organization varies greatly depending on species and genome size. The smallest genomes, in particular those of all Prochlorococcus and marine Synechococcus isolates sequenced thus far, consist of a single chromosome with no plasmids. Metagenomic analyses have, however, suggested that cells from natural Synechococcus populations in coastal waters of California might in fact possess one or several small plasmids (Palenik et al. 2009). Plasmids are the rule for larger cyanobacterial genomes. For instance, the marine Cyanothece sp. ATCC51142 possesses six separate DNA elements: a 4.93 Mbp circular chromosome, a 0.43 Mbp linear chromosome and four plasmids ranging in size from 10 to 40 kbp (Welsh et al. 2008), while Acaryochloris marina MBIC11017 has one circular 6.50 Mbp chromosome and nine plasmids (2–374 kbp in size; Swingley et al. 2008). By allowing rapid lateral transfer of large DNA fragments via conjugation with organisms of the same or other species, plasmids, combined with an efficient transposition machinery, likely give these strains a much higher capacity to acquire new, sophisticated functions than their picocyanobacterial counterparts, which depend mainly on cyanophages for the acquisition of new genes (Clokie et al. 2003; Lindell et al. 2004; Zeng & Chisholm 2012). One of the most striking examples of this phenomenon is likely the recent discovery of a diazotrophic Acaryochloris strain (HICR111A) that appears to have acquired the capacity to fix N2 through lateral transfer of a 60-gene cluster, including 22 nif genes necessary for this process (Pfreundt et al. 2012). In this context, it is worth noting that in cyanobacteria, genes dedicated to a given function (e.g. ATP synthase, carbon fixation, light harvesting) are often clustered in operons or gene regions, an organization that not only allows efficient coordinated control of genes participating in the same process but may also permit the lateral transfer of several genes at a time in a single step.

Adaptation of marine cyanobacterial genomes to a dynamic environment

The availability of complete or near-complete genomes for a number of marine cyanobacteria makes it possible not only to obtain a global overview of the genetic potential of these organisms but also, using postgenomic approaches, to study genome (or metagenome) dynamics in response to natural fluctuations of physico-chemical parameters or to biotic and abiotic stresses.

One of the most regular phenomena that natural populations of cyanobacteria have to cope with is the alternation of day and night. For these photosynthetic cells, it entails strong and rapid variations in incident visible irradiance, which in the uppermost layer of the ocean are accompanied by fluctuations in UV radiation fluxes and, to some extent, in water temperature. Several laboratory studies have examined the effect of light-dark (L/D) cycles on the transcriptome of marine cyanobacteria, but have mostly neglected the concomitant effects of UV and temperature variations. An interesting example of how cyanobacterial cells deal with L/D cycles is provided by the diazotrophs Cyanothece sp. and Crocosphaera watsonii, which perform oxygenic photosynthesis and dinitrogen fixation in the same cell, two processes that are mutually exclusive given the high sensitivity of nitrogenase to oxygen. To cope with this incompatibility, both organisms fix nitrogen at night, while photosynthesis is restricted to the light period (Toepel et al. 2008; Mohr et al. 2010b; Shi et al. 2010; Aryal et al. 2011; Stockel et al. 2011). This is made possible by a tight synchronization of the whole metabolism driven by the circadian clock, a molecular mechanism that can maintain a robust diel rhythmicity of the whole transcriptome even under continuous light (Toepel et al. 2008; Pennebaker et al. 2010). Elvitigala et al. (2009) suggested that the majority of diurnally regulated genes, i.e. those maximally expressed in the middle of the light or dark periods, are light responsive, while genes that are up-regulated at the beginning of the dark (or subjective dark) period are under circadian control. Another notable example with regard to L/D cycles is the nondiazotroph Prochlorococcus, which lacks kaiA, one of the three genes encoding the core circadian clock. It was shown in P. marinus PCC 9511 that the remaining two genes (kaiBC) are sufficient to make up a minimalist clock that needs to be reset every morning (Holtzendorff et al. 2008; Axmann et al. 2009). This so-called ‘hourglass’ is seemingly robust enough to drive a fine-tuned orchestration of the whole transcriptome over a 24-h period when cells are grown under a L/D cycle (Zinser et al. 2009), but oscillations of the transcriptome and of the whole metabolism disappear within a few hours when cells are shifted to continuous light (Holtzendorff et al. 2008). When cyclic visible light is supplemented with UV radiation, the timing of the cell cycle is affected: the peak of cells in the DNA synthesis (S) phase shifts by 2 h into the dark period, a strategy likely to decrease the risk of UV-induced DNA mutations (Kolowrat et al. 2010). A comparable shift was also observed in WH7803, a strain representative of the closely related genus Synechococcus (Mella-Flores et al. 2012). However, growth under a modulated L/D cycle with high photon fluxes of visible light (supplemented or not with UV radiation) induced very different diel expression patterns in WH7803 and PCC 9511 for most genes involved in photosynthesis, in responses to UV and oxidative stress and in several other metabolic processes, a difference likely due to distinct light-controlled regulation systems. For instance, genes involved in the biosynthesis of antenna complexes or ATP synthase, or in CO2 fixation, showed maximal expression during the day in Synechococcus and the opposite pattern in Prochlorococcus (Mella-Flores et al. 2012). All these studies on the effect of L/D cycles on marine cyanobacteria clearly indicate that (i) transcript levels of most protein-coding genes vary significantly over the day and (ii) diel patterns differ from one pathway, and sometimes from one gene, to another.
Consequently, analyses of field metatranscriptomes comprising only one or two data points per day must be interpreted with great care, especially for cyanobacteria such as Crocosphaera, which make up only a small fraction of total cell abundance and may therefore also have poor metagenomic coverage (see e.g. Hewson et al. 2009).
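Why sparse sampling is risky can be illustrated with a deliberately simplified Python sketch. It assumes, purely for illustration, that each transcript follows a cosine curve over a 24-h cycle; the two genes, their amplitudes and their peak times are hypothetical and are not taken from the studies cited above. Although both genes have the same daily mean, the ratio between them measured from a single sample depends entirely on when that sample is taken.

```python
import math


def diel_expression(t_hours, mean, amplitude, peak_hour):
    """Hypothetical cosine model of transcript abundance over a 24-h L/D cycle."""
    return mean + amplitude * math.cos(2 * math.pi * (t_hours - peak_hour) / 24.0)


def daily_mean(mean, amplitude, peak_hour, n_samples=24):
    """Average of the model over n evenly spaced samples spanning one day."""
    step = 24.0 / n_samples
    values = [diel_expression(i * step, mean, amplitude, peak_hour) for i in range(n_samples)]
    return sum(values) / n_samples


# Two hypothetical genes with identical daily means but opposite phases
gene_day = {"mean": 100.0, "amplitude": 80.0, "peak_hour": 12.0}   # peaks at midday
gene_night = {"mean": 100.0, "amplitude": 80.0, "peak_hour": 0.0}  # peaks at midnight

for t in (0.0, 6.0, 12.0):
    ratio = diel_expression(t, **gene_day) / diel_expression(t, **gene_night)
    print(f"sampled at {t:4.1f} h: day/night gene ratio = {ratio:.2f}")
# Ratios: 0.11 at midnight, 1.00 at 06:00 and 9.00 at midday,
# whereas daily_mean() gives about 100 for both genes (true ratio of 1).
```

Under this toy model, a single field sample could suggest a ninefold difference in either direction between two genes whose daily averages are identical, which is the kind of confounding that time-resolved sampling is meant to avoid.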

Several studies have dealt with the effects of nutrient stresses on the transcriptome of marine cyanobacteria. Although nitrogen is the main limiting factor in most oceanic areas, several regions, such as the Mediterranean Sea or the Sargasso Sea, are known to be limited by the availability of phosphorus (hereafter P; Martiny et al. 2009), while others, such as the equatorial Pacific Ocean, are iron-depleted (Cavender-Bares et al. 1999). Natural populations of marine cyanobacteria therefore face a variety of nutrient stresses. P is necessary for the synthesis of ATP, nucleotides and phospholipids and is also required in all regulatory processes involving phosphorylation. To cope with low P availability in open ocean waters, Prochlorococcus cells, which are often the dominant organisms in these waters, have developed a variety of strategies, including the preferential synthesis of sulfolipids and glycolipids over phospholipids (Van Mooy et al. 2006). Furthermore, both natural Prochlorococcus populations thriving in P-depleted waters and strains isolated from these areas were shown to possess a much larger set of genes involved in P uptake and assimilation than populations/strains from P-replete areas (Martiny et al. 2006, 2009; Coleman & Chisholm 2010). This variability, which is independent of the phylogenetic distance between strains, translates into a wide diversity of P stress responses within the Prochlorococcus genus. For instance, P depletion provoked the upregulation of 30 genes in the HL-adapted MED4 vs. 176 in the LL-adapted MIT9313, but only seven were common to both strains (Martiny et al. 2006). This common set encodes proteins involved in P metabolism, including the response regulator PhoB, a transport system for orthophosphate (PstABCS) and PhoE, which is involved in the transport of orthophosphate across the outer membrane. Surprisingly, MIT9313 lacks an ortholog of the alkaline phosphatase gene phoA, the most highly up-regulated gene in MED4. 
Furthermore, MIT9313 lacks both a functional PtrA (a transcription factor of the cyclic AMP receptor family) and the sensor kinase PhoR, two regulators of P metabolism (Scanlan et al. 2009), although compensation mechanisms might exist (Martiny et al. 2006). Altogether, MIT9313, which was isolated from the Gulf Stream at 135 m, i.e. in the vicinity of the phosphacline, seems less well adapted to P starvation than MED4, which was isolated from surface waters of the Mediterranean Sea, a P-limited area. The effects of P depletion were also studied in the marine Synechococcus strain WH8102, where it caused a strong upregulation (>2-fold) of 36 genes and the downregulation of 24 others (Tetu et al. 2009). Several transiently upregulated genes were involved in transport (outer membrane porins, P-specific ABC transporters and solute-binding proteins), P uptake (alkaline phosphatases) or regulation of P metabolism (see also Ostrowski et al. 2010). Interestingly, two upregulated genes (swmA and swmB) code for outer membrane proteins potentially involved in swimming motility, suggesting that P stress may result either in a reorganization of the cell envelope, perhaps to accommodate P-specific porins, or in increased cell motility that may help Synechococcus cells to scavenge P more efficiently. Both Prochlorococcus and Synechococcus possess genes for the uptake of phosphonates as an alternative P source (Ilikchyan et al. 2009), and at least one Prochlorococcus strain (MIT9301) can also use phosphite (Martinez et al. 2011). Like these two picocyanobacteria, Crocosphaera watsonii possesses a high-affinity phosphate transport system (PstSCAB), but unlike them it can also hydrolyze phosphomonoesters (Dyhrman & Haley 2006). In contrast, no clear homologs of genes for phosphonate uptake and hydrolysis could be identified in the WH8501 genome, consistent with the inability of this strain to grow on this P source.
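The ‘>2-fold’ criterion used above can be made concrete with a minimal Python sketch that labels genes as up-regulated, down-regulated or unchanged from a log2 fold-change cutoff. The expression values, gene names and the pseudocount (added to avoid division by zero) are hypothetical; actual transcriptomic studies such as those cited here also rely on biological replicates and statistical testing, which this sketch omits.

```python
import math


def classify_fold_change(control, treatment, threshold=2.0, pseudocount=1.0):
    """Label genes 'up', 'down' or 'unchanged' using a fold-change cutoff.

    control, treatment: dicts mapping gene ID -> expression level (same keys).
    threshold: minimum fold change (2.0 corresponds to |log2 FC| > 1).
    pseudocount: added to both values to avoid division by zero (an assumption).
    """
    log_cut = math.log2(threshold)
    labels = {}
    for gene, c in control.items():
        t = treatment[gene]
        log2_fc = math.log2((t + pseudocount) / (c + pseudocount))
        if log2_fc > log_cut:
            labels[gene] = "up"
        elif log2_fc < -log_cut:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


# Hypothetical expression values (arbitrary units), not data from the studies cited
control = {"gene_1": 99.0, "gene_2": 99.0, "gene_3": 99.0}
p_depleted = {"gene_1": 499.0, "gene_2": 19.0, "gene_3": 149.0}
print(classify_fold_change(control, p_depleted))
# {'gene_1': 'up', 'gene_2': 'down', 'gene_3': 'unchanged'}
```

Working on the log2 scale makes the cutoff symmetric, so a fivefold increase and a fivefold decrease are treated alike.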

As with P stress, iron depletion is thought to have deeply influenced the structure of the Prochlorococcus genome. For instance, the divinyl-chlorophyll a/b-binding antenna protein Pcb is closely related to IsiA, a chlorophyll a-binding protein induced during iron deficiency in typical cyanobacteria (La Roche et al. 1996). Both Pcb and IsiA can form an 18-subunit antenna ring around photosystem I (Bibby et al. 2003). Furthermore, the only gene encoding this PSI antenna in Prochlorococcus sp. MIT9313 and one of the two such genes in P. marinus SS120 are also iron-induced, while this gene has been lost in MED4. The constitutively expressed gene(s) encoding the 10-subunit Pcb antenna surrounding photosystem II (present in 1–6 copies, depending on the strain) most likely arose by duplication and subsequent divergence of this isiA-like gene (Garczarek et al. 2001). The effect of low-Fe stress on the whole transcriptome was compared between P. marinus MED4 and MIT9313 (Thompson et al. 2011). Surprisingly, the latter strain could grow at 10-fold lower dissolved inorganic Fe concentrations than the former, possibly owing to a more efficient iron transport system, better protection against the deleterious effects of iron depletion (such as oxidative stress) and/or a lower cellular iron requirement. Again, only a handful of the 1159 orthologs common to MED4 and MIT9313 were differentially expressed in both strains, including the downregulated petF gene (encoding ferredoxin) and the upregulated isiB (encoding flavodoxin), idiA (coding for an iron-deficiency-induced protein) and one of the many hli genes coding for high light-inducible proteins (Thompson et al. 2011). In each strain, the expression of over a hundred additional genes also changed in response to Fe stress, but these genes differed between the two strains (Thompson et al. 2011). This highlights the considerable variability in the response to iron stress (and, more generally, to nutrient stress) that may exist within a single genus, a variability that may again complicate the interpretation of metatranscriptomic data.
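Comparisons like these only make sense for genes that have orthologs in both strains (1159 in the MED4 and MIT9313 case), so differentially expressed genes must first be mapped through an ortholog table. The minimal Python sketch below assumes a precomputed list of one-to-one ortholog pairs; the gene identifiers are hypothetical placeholders. It also computes the shared fraction of each strain's response, shown here with the P-depletion counts reported earlier (30 genes up-regulated in MED4, 176 in MIT9313, seven in common; Martiny et al. 2006), i.e. about 23% of the MED4 response but only about 4% of the MIT9313 response.

```python
def shared_responsive_orthologs(de_a, de_b, ortholog_pairs):
    """Ortholog pairs whose members are differentially expressed in both strains.

    de_a, de_b: sets of gene IDs called differentially expressed in strains A and B.
    ortholog_pairs: iterable of (gene_in_A, gene_in_B) one-to-one ortholog pairs.
    Strain-specific genes (no ortholog) can never appear in the result.
    """
    return {(a, b) for a, b in ortholog_pairs if a in de_a and b in de_b}


def shared_fractions(n_a, n_b, n_shared):
    """Fraction of each strain's responsive genes that belong to the shared set."""
    return n_shared / n_a, n_shared / n_b


# Hypothetical IDs: only A_0001/B_0105 respond in both strains
pairs = [("A_0001", "B_0105"), ("A_0002", "B_0230"), ("A_0003", "B_0077")]
de_a = {"A_0001", "A_0003", "A_0900"}
de_b = {"B_0105", "B_0230", "B_1200"}
print(shared_responsive_orthologs(de_a, de_b, pairs))  # {('A_0001', 'B_0105')}

frac_med4, frac_mit9313 = shared_fractions(30, 176, 7)
print(f"{frac_med4:.0%} of MED4 vs {frac_mit9313:.0%} of MIT9313")  # 23% of MED4 vs 4% of MIT9313
```

Because the shared set is bounded by the smaller response, a strain with a much broader response (here MIT9313) will always show a low shared fraction, which is one reason both fractions are worth reporting.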

Despite its fairly recent discovery (<25 years ago; Chisholm et al. 1988), Prochlorococcus is clearly one of the marine cyanobacteria that has benefited most from the advent of genomics, both in culture and in the field. Besides the availability of many genomes in public databases (14 accessible to date, with about a hundred more genomes from single wild Prochlorococcus cells currently in progress; Kelly et al. 2012), a wide set of Prochlorococcus-specific phages has been isolated and sequenced, and a number of studies have shed light on the intimate relationships that link them to their hosts (Lindell et al. 2005, 2007; Avrani et al. 2011; Zeng & Chisholm 2012). This, combined with its high natural abundance in tropical and temperate oceanic waters (Partensky et al. 1999), makes Prochlorococcus one of the few model organisms that can be studied at all scales of organization, using a so-called ‘cross-scale systems biology’ approach (Coleman & Chisholm 2007). Indeed, with this organism it is possible to link gene content and genome dynamics not only to the local biotic or abiotic environment of the cell (or population of cells), but also to the ecosystem and even to the global ocean. Although the lack of a reliable genetic system somewhat hinders the determination of gene function in this genus, it must be noted that Prochlorococcus possesses only a handful of genus-specific genes (Partensky & Garczarek 2010), and heterologous expression approaches have already been used successfully to characterize some of them (Stickforth et al. 2003; Satoh & Tanaka 2006; Wiethaus et al. 2010). Furthermore, for the other genes, inactivation is feasible in the closely related genus Synechococcus (Brahamsha 1996). The latter cyanobacterium is also a good candidate for cross-scale approaches, given its abundance, its ubiquity and its ecophysiological complementarity with Prochlorococcus (Partensky et al. 1999; Scanlan et al. 2009). As in Prochlorococcus, cyanophages also play a key role in the genome evolution of Synechococcus spp., and many phages with various degrees of host specificity are available in culture (Sullivan et al. 2003; Millard et al. 2009).


Algal diversity

Diversity of marine photosynthetic eukaryotic algae

Extant eukaryotic marine algae have representatives in almost all eukaryotic super-groups (Fig. 2). The main distinctive features of these groups are summarized in Table 2. One important feature that unifies these diverse organisms is the presence of a cyanobacterium-derived plastid, which has been acquired through several different endosymbiotic processes, depending on the algal group (Box 1). These endosymbiotic events have had a major impact on the genomes of photosynthetic eukaryotes, because they involved the transfer of large numbers of genes from the endosymbiont to the host nuclear genome.

Figure 2.

Simplified scheme representing the eukaryotic tree of life [adapted from (Baldauf 2008) and (Cock & Coelho 2011) with permission from the editorial offices of the Journal of Systematics and Evolution and the Journal of Experimental Botany]. Lineages that include marine photosynthetic algae are indicated in bold type.

Table 2. Main groups of marine photosynthetic organisms. The main distinctive features are provided. The putative primitive organisms involved in the endosymbiotic events that gave rise to plastids are indicated (this topic is still debated for some groups, see text for details). Nucleomorphs are remnants of the endosymbionts' nuclei.

Group | Thallus organization | Carbon reserve and location | General distinctive features | Chl | PBS | No. of plastid membranes | Putative origin of plastids
PBS-containing cyanobacteria | Unicellular, colonial or filamentous | Glycogen[a] | No flagella | a[b] | Yes[b] | – | –
Green oxyphotobacteria (prochlorophytes) | Unicellular, filamentous | Glycogen | No flagella; stacked thylakoids | a, b | No | – | –
Rhodophyta (red algae) | Most species multicellular | Starch in cytoplasm | No flagella | a | Yes | 2 | Primitive cyanobacteria
Chlorophyta (green algae) | Multicellular, unicellular or colonial | Starch in plastid stroma | Stellate structure linking microtubules in flagellar base | a, b | No | 2 | Primitive cyanobacteria
Heterokontophyta (incl. diatoms, pheophytes and other classes) | Pheophytes multicellular, other groups unicellular or colonial | Laminaran or chrysolaminaran (branched β-1,3-glucans) in cytoplasm | Typically 2 flagella: one bearing tripartite tubular flagellar hairs (mastigonemes) and a trailing hairless (smooth) flagellum | a, c[c] | No | 4 | Primitive red alga
Dinophyta | Unicellular or colonial | Starch in cytoplasm | Typically 2 dissimilar flagella, insertion variable; flattened vesicles beneath plasma membrane (sometimes containing cellulose plates forming a theca); peculiar nucleus with permanently condensed chromosomes | a, c[d] | No | 3 in most species[d] | Primitive red alga in most species; primitive green alga, haptophytes or heterokontophytes in some species
Chlorarachniophyta | Unicellular or colonial | Chrysolaminaran (branched β-1,3-glucans) in cytoplasm | Amoeboid cells have filopodia; flagellated cells have a single flagellum | a, b | No | 4 | Primitive green alga (nucleomorph present)
Euglenophyta | Unicellular | Paramylon (linear β-1,3-glucan) in cytoplasm | Flagella inserted into a reservoir; peculiar type of closed mitosis | a, b | No | 3 | Primitive green alga
Cryptophyta | Unicellular | Starch in periplastid space | 2 flagella with bipartite tubular mastigonemes; ventral invagination with a vestibulum covered with ejectosomes; cell wall composed of proteinaceous plaques | a, c | Yes | 4 | Primitive red alga (nucleomorph present)
Haptophyta | Unicellular | Chrysolaminaran (branched β-1,3-glucans) in cytoplasm | Typically 2 flagella and a haptonema (thread-like structure situated between the 2 flagella); cell body covered by scales (which can be calcified) | a, c | No | 4 | Primitive red alga

Chl, main chlorophyll form; PBS, phycobilisomes. Features of the groups of marine photosynthetic organisms were compiled from Lee (2008); Michel et al. (2010b); Myklestad & Granum (2009).
[a] Starch has only been found in Group V strain Clg1 (Deschamps et al. 2008).
[b] Acaryochloris possesses chl d as the main chlorophyll, as well as peculiar phycobilisomes.
[c] Chl c is absent in the class Eustigmatophyceae.
[d] Pigmentation (as well as the number of membranes surrounding plastids) may vary depending on the endosymbiotic origin of plastids.

Box 1. The endosymbiotic origin of plastids

Most of the major eukaryotic lineages include at least some photosynthetic members that possess plastids as one of their cellular organelles. The vast majority of these plastids are thought to have their origin, either directly or indirectly, in a single primary endosymbiosis event that occurred early during the emergence of the Archaeplastida (or Plantae) lineage, i.e. around 1.25–1.60 Gyr ago (Yoon et al. 2006; Hackett et al. 2007; Reyes-Prieto et al. 2007; Simon et al. 2009). This primary endosymbiosis event involved the capture of a cyanobacterium by a heterotrophic eukaryotic host cell, followed by extensive physiological and genetic modifications that transformed the engulfed cell into an organelle (Gray & Doolittle 1982; Delwiche et al. 1995; McFadden & van Dooren 2004). The primary plastid was then inherited by each of the major Archaeplastida groups, i.e. the glaucophytes and the red and green algae. Two processes were particularly important for the enslavement of the cyanobacterium and the integration of the emerging organelle into the host cell physiology: 1) a large-scale transfer of genes from the cyanobacterial genome to the host cell nucleus and 2) the evolution of a sophisticated protein import system that allowed nuclear-encoded proteins to be transported across the double membrane of the plastid to carry out their functions within the organelle. It is also important to note that a large number of the endosymbiont's genes were probably simply lost during the endosymbiosis process.

The plastids of algae from groups other than the Archaeplastida were derived indirectly from this primary endosymbiosis via events in which a eukaryotic host cell captured another, plastid-containing eukaryotic cell and repeated the enslavement process (Keeling 2004; Yoon et al. 2006). This process of indirect plastid acquisition is called secondary endosymbiosis. In some cases, this ‘Russian doll-like’ process of plastid acquisition went one step further, with the capture of an algal cell resulting from a secondary endosymbiosis by another heterotrophic host, i.e. via a tertiary endosymbiotic event. One of the consequences of the serial captures involved in secondary endosymbiosis is that plastids derived from this process possess either one or two additional surrounding membranes (presumably corresponding originally to the endosymbiont's plasma membrane and/or the host-derived vesicle in which the symbiont was captured). The presence of three or four plastid-surrounding membranes added a further complication for the import of nuclear-encoded proteins into the organelle.

Like the primary endosymbiosis, secondary and tertiary endosymbiotic events also appear to have involved large-scale transfer of genetic material from the endosymbiont to the host nucleus (together with large-scale loss of endosymbiont genes), although the process was more complex because it involved the transfer of genes both from the plastid (organellar) genome and from the endosymbiont's nuclear genome. In some lineages, gene transfer and gene loss have led to complete loss of the endosymbiont's nuclear genome, but in others a remnant of the genome (the nucleomorph) is still present (Archibald 2009).

Interestingly, primary endosymbiosis has occurred more than once during evolution. Indeed, the filose amoeba Paulinella appears to have acquired its nascent plastid (the chromatophore) by independently capturing a cyanobacterium, which has since undergone a process of genome reduction (Nowack et al. 2008; Yoon et al. 2009; Reyes-Prieto et al. 2010).

Marine green and red algae of the super-group Archaeplastida

The green lineage (green algae and embryophytes) and red algae, together with the exclusively freshwater glaucophytes, belong to the Archaeplastida (also called Plantae). Plastids in this group were acquired by what appears to have been a single primary endosymbiosis event (see Box 1 and below).

The green algae and embryophytes share a combination of unique biochemical and ultrastructural features (Table 2). Consensus higher-level molecular phylogenies, as well as biochemical and ultrastructural evidence, suggest an ancient divergence separating green algae into two major monophyletic lineages, the Chlorophyta and the Streptophyta (Leliaert et al. 2011). Only two green algal groups, both belonging to the phylum Chlorophyta, are well represented in marine environments: the Ulvophyceae and the prasinophytes. The Ulvophyceae belong to the core chlorophytes and include multicellular green seaweeds that abound in coastal habitats. In temperate waters, they proliferate in spring and summer and can create green tides in eutrophic waters. In tropical habitats, they form a perennial canopy. The prasinophytes are a paraphyletic group of early diverging lineages that occur mainly in pelagic habitats. Multi-gene phylogenetic analyses and careful examination of biochemical and ultrastructural data are beginning to shed light on these early divergences (Viprey et al. 2008; Leliaert et al. 2011) and have allowed the description of several new classes. Members of the recently described class Mamiellophyceae (Marin & Melkonian 2010), such as the picoplanktonic genera Micromonas and Ostreococcus, are especially abundant in mesotrophic to eutrophic near-shore waters, where they can form blooms (O'Kelly et al. 2003; Not et al. 2004). The other prasinophyte lineages appear less diversified; some of them were identified by environmental sequencing of natural communities (Viprey et al. 2008; Lepère et al. 2009; Shi et al. 2009). Microalgal species belonging to the class Trebouxiophyceae, a sister taxon to the Ulvophyceae, are less prominent in marine habitats. This class occurs mostly in freshwater or terrestrial habitats (e.g. in lichen symbiotic associations), but some genera, such as the multicellular Prasiola, and some species of the coccoid genus Chlorella can penetrate into brackish and marine waters. Sequences belonging to this group have been identified in environmental DNA sequence libraries from marine environments (Medlin et al. 2006).

To date, four complete marine green algal genomes have been published (Table 1): those of the Mamiellophyceae Ostreococcus tauri (Derelle et al. 2006), O. lucimarinus (Palenik et al. 2007) and two isolates of Micromonas pusilla (Worden et al. 2009). Several other genome sequencing projects, involving additional strains of the genus Ostreococcus and another member of the Mamiellophyceae, Bathycoccus prasinos, are currently in progress (Coelho et al. 2010; Pagani et al. 2012).

Red algae [700 genera, 4000–6000 species according to de Reviers (2003)] include mostly multicellular seaweeds and form a homogeneous group sharing several ultrastructural characters (Table 2). Almost all red algae are marine and benthic. Sequencing of the genomes of the marine red algae Chondrus crispus and Porphyra umbilicalis is ongoing (Table 1). In addition to these nuclear genomes, 26 complete green algal plastid genomes have been sequenced and assembled. These data, together with the genome sequence of a nonmarine red alga (Matsuzaki et al. 2004), provide an invaluable resource for in-depth analysis of genome organization and of the processes of eukaryotic genome evolution, as well as for inferring the evolutionary events that shaped the modern Plantae.

Golden-brown algae belonging to the Heterokontophytes (plastid-bearing stramenopiles)

Ultrastructural, biochemical, molecular phylogenetic and comparative genomic approaches have revealed that diatoms, together with multicellular brown seaweeds and other golden-brown algae, form the heterokontophytes, a clade within the phylum Heterokonta (Cavalier-Smith 1986). Heterokonta (or Stramenopiles) comprise all eukaryotes that possess a forward-directed flagellum with mastigonemes and a trailing hairless flagellum, as well as all derived species that have lost one or both flagella (Cavalier-Smith 1986; Graham & Wilcox 2000) (Table 2). Heterokontophytes possess secondary endosymbiotic plastids (Box 1). The phylogenetic relationships among the 12 classes of plastid-bearing heterokonts are still debated (Riisberg et al. 2009). Heterokontophytes are present in freshwater, marine and terrestrial habitats (Graham & Wilcox 2000). In marine habitats, the most diverse and ecologically important groups are the diatoms (class Bacillariophyceae) and the brown seaweeds (class Phaeophyceae). Whole genome sequences are currently available for members of the diatoms, brown seaweeds and pelagophytes (Table 1).

Diatoms: the Bacillariophyceae are characterized by distinctive frustules (siliceous cell walls) that surround the cell membrane. This group includes an enormous diversity of both benthic and planktonic species [over 10⁵ species according to Mann (1999)]. Planktonic diatoms appear to be extremely well adapted to growth in mixed, turbulent waters and can form resting stages to survive periods of adverse conditions. The diversity and evolution of this group have been reviewed (Kooistra et al. 2007; Medlin 2011). Diatoms have long been classified into two groups, centrics and pennates, based on valve symmetry and ornamentation, mode of sexual reproduction, and plastid number and structure. Today, three main groups are distinguished on the basis of molecular phylogenies: radial centric diatoms, multipolar centric diatoms (including the Thalassiosirales) and pennates (further separated into raphid and araphid forms); this classification is also supported by morphological and cytological data (Medlin & Kaczmarska 2004).

Whole genomes have been sequenced for the diatoms Thalassiosira pseudonana (Armbrust et al. 2004) and Phaeodactylum tricornutum (Bowler et al. 2008). Additional ongoing diatom genome sequencing projects include: T. oceanica, T. rotula, Fragilariopsis cylindrus and Pseudo-nitzschia multiseries (Coelho et al. 2010; Pagani et al. 2012).

Pheophytes: members of the class Phaeophyceae form a homogeneous group of about 265 genera and 900–2000 species (van den Hoek et al. 1995; de Reviers 2003; Cock et al. 2011), all of which are multicellular. Pheophytes are characterized by unique reproductive structures, the uni- and plurilocular sporangia, and are the only heterokontophytes that possess plasmodesmata. Their morphology is extremely diverse. They are almost exclusively marine and are found from the equator to the poles, generally attached to a substratum in intertidal and subtidal habitats (e.g. the Fucales and kelps). Exceptions include Sargassum and Pylaiella, which can be found in the pelagic environment. The only Phaeophyceae genome available so far is that of the filamentous brown alga Ectocarpus (Cock et al. 2010b, 2012).

Other heterokontophytes: within this group, 13 classes of golden-brown algae have been described in addition to the diatoms and pheophytes: the Chrysophyceae, Dictyochophyceae, Eustigmatophyceae, Pelagophyceae, Phaeothamniophyceae, Pinguiophyceae, Raphidophyceae, Xanthophyceae, Bolidophyceae, Chrysomerophyceae, Synchromophyceae, Aurearenophyceae and Schizocladiophyceae (Fig. 2). In recent years, detailed studies of the diversity of these groups have led to taxonomic rearrangements as well as to descriptions of numerous new genera and species, and even of several new classes (Kawachi et al. 2002; Moriya et al. 2002; Kawai et al. 2003; Kai et al. 2008). These discoveries highlight the limits of our current knowledge of the higher taxonomic levels and of the level of diversity present within the heterokontophytes. Among these groups, the dictyochophytes, raphidophytes, pelagophytes and chrysophytes appear to be the most ecologically important. Dictyochophytes and raphidophytes are perennial members of coastal phytoplankton assemblages. Harmful blooms of raphidophytes (e.g. Chattonella, Fibrocapsa, Heterosigma) regularly cause mass mortality in European and Asian fish farms (Edvardsen & Imai 2006). Aureococcus and Aureoumbra, two picoplanktonic pelagophytes, can also cause massive blooms (brown tides) in eutrophic coastal environments (Gobler & Sunda 2006). Most described photosynthetic chrysophytes are freshwater taxa, and little information is available on their diversity and ecology in marine environments. However, marine chrysophytes may be more widespread than previously thought: plastid 16S rRNA gene sequences assigned to this group have been retrieved from various environments within the last few years (Fuller et al. 2006; McDonald et al. 2007). Other groups remain poorly studied.
Molecular analyses of several marker genes in complex environmental samples indicate that further novel taxa can be expected to be discovered within the heterokontophytes (Massana et al. 2002). The only whole genome available for this group of taxa is that of the pelagophyte Aureococcus anophagefferens, a species that can form harmful blooms (Gobler & Sunda 2006; Gobler et al. 2011). Whole genome sequencing of several other heterokontophytes is in progress (species of the genera Ochromonas, Bolidomonas, Chattonella, Pinguiococcus and Nannochloropsis, belonging respectively to the Chrysophyceae, Bolidophyceae, Raphidophyceae, Pinguiophyceae and Eustigmatophyceae; Pagani et al. 2012).

Dinoflagellates (super-group Alveolata)

Dinoflagellates, together with apicomplexans and ciliates, form the well-supported superphylum Alveolata, named for the continuous layer of flattened vesicles that lie beneath the plasma membrane. Dinoflagellates constitute one of the main groups of marine protists. In marine environments they are important primary producers, both as free-living phytoplankton and as symbionts of reef-forming corals (Symbiodinium). Many species are heterotrophic or mixotrophic.

Dinoflagellates display tremendous morphological diversity. Some members of this group (e.g. Dinophysis, Alexandrium, Prorocentrum) produce toxins that have negative impacts on marine ecosystems and fisheries and can cause poisoning in humans. Based on molecular phylogenies, three main groups of dinoflagellates can be defined: the Oxyrrhinales, the Syndiniales and the core dinoflagellates, which comprise most of the described species (Wisecaver & Hackett 2011). A large, uncultivable diversity of marine dinoflagellates, termed Marine Alveolates Group I (ALV1), has been revealed by 18S rDNA environmental surveys and appears to be composed primarily of parasitic species (Guillou et al. 2008). Oxyrrhinales, Syndiniales and ALV1 taxa occupy basal positions in the dinoflagellate tree and are heterotrophic. Only about half of the core dinoflagellates have plastids. Within the core dinoflagellates, traditional orders are distinguished on the basis of distinctive morphological features, such as thecal plate tabulation patterns or the complete absence of a theca (Graham & Wilcox 2000). Molecular phylogenetics has shown that many of these orders are polyphyletic or paraphyletic and that thecal characteristics have evolved multiple times within the core dinoflagellates.

No complete dinoflagellate genomes are available yet because of the technical problems related to the very large genome sizes in this group (see below). However, a genome sequencing project is ongoing for the coral symbiont genus Symbiodinium and EST sequencing has been carried out or is ongoing for several taxa, particularly toxic species such as Alexandrium tamarense (Pagani et al. 2012).

Haptophytes and cryptophytes

The position of the two remaining groups, the haptophytes and cryptophytes, in the tree of life remains contentious. Haptophytes and cryptophytes have been proposed to branch together with several heterotrophic groups in the Hacrobia, but recent phylogenomic analyses suggest that they have separate origins (Burki et al. 2012). These analyses support haptophytes as a sister taxon to the stramenopiles, alveolates and rhizaria (the so-called SAR group; Burki et al. 2008), while they suggest that cryptophytes may branch with the super-group Archaeplastida (which includes the green and red algae).

Haptophytes are an almost exclusively photosynthetic lineage that is widespread in the ocean. They are characterized by the presence of a thread-like structure, the haptonema, situated between the two flagella (Table 2). This appendage, which is unique to the group, is thought to play a role in prey capture in some species (Kawachi et al. 1991). Nonflagellated stages (either solitary or colonial) have been observed in some species (e.g. the nuisance-causing Phaeocystis). Currently described haptophyte species are nanoplanktonic (2–20 μm), but recent genetic analyses of natural samples have identified a wide diversity of previously undescribed picoplanktonic (0.2–2 μm) haptophytes (McDonald et al. 2007; Liu et al. 2009). Both phylogenetic and morphological studies support the division of the haptophytes into two major groups: the Pavlovophyceae, which possess two unequal flagella and can be covered by small dense bodies called knob scales, and the Prymnesiophyceae, which comprise the taxa bearing plate scales. The latter group includes the coccolithophores, which are characterized by calcified scales (coccoliths) and form massive blooms in temperate waters.

The complete genome sequence of an Emiliania huxleyi strain is available (Table 1), while the sequencing of Phaeocystis globosa and P. antarctica strains is in progress (Pagani et al. 2012).

Cryptophytes are unicellular organisms with two flagella that bear bipartite tubular mastigonemes (Table 2). They are found in phytoplankton assemblages in coastal marine habitats. No cryptophyte nuclear genome sequences are available to date, but the nucleomorph genomes of the plastid-bearing marine species Guillardia theta (Douglas et al. 2001) and Hemiselmis andersenii (Lane et al. 2007) have been published. The nucleomorph genome of the nonphotosynthetic but plastid-bearing freshwater species Cryptomonas paramecium has also been published (Tanifuji et al. 2011).

Euglenophytes (super-group Excavates)

The Euglenoida belong to the Euglenozoa, a very diverse group of protists, and include primary heterotrophs, phototrophs and secondary heterotrophs. Euglenozoa are characterized by flagella inserted into a reservoir (Table 2). Recent phylogenetic analyses of nuclear-encoded SSU rDNA have shown that all plastid-containing euglenoids, together with secondary heterotrophs, form a monophyletic lineage, allowing the definition of the class Euglenophyceae (Marin et al. 2003; Marin 2004). This class comprises two orders, the Euglenales and the Eutreptiales, the latter being predominantly marine. In the phototrophic Euglenozoa, chloroplasts are bounded by three envelope membranes (Marin 2004).

Whole genome sequences of marine euglenophytes have not been published yet, but a sequencing project is underway for a strain of the freshwater species Euglena gracilis (Pagani et al. 2012). Whole genomes of several strains of the parasitic genera Trypanosoma and Leishmania, which also belong to the super-group Excavata, have been published (Pagani et al. 2012).

Chlorarachniophytes (super-group Rhizaria)

Chlorarachniophytes form a small group of marine algae (10 genera described to date) that are widely distributed from temperate to tropical coastal environments, as well as in the open ocean. Cells can be amoeboid, coccoid or flagellate. Typical amoeboid cells have several filopodia (pseudopodia), and flagellate cells have a single flagellum (Hibberd & Norris 1984). Each cell possesses a nucleus and several green plastids that are bounded by four membranes and contain chlorophylls a and b (Hibberd & Norris 1984). The plastid complex retains the vestigial nucleus (the nucleomorph) of the green algal endosymbiont that was engulfed by the ancestral cercozoan protist host and integrated as a plastid (McFadden et al. 1994; Gilson & McFadden 1999; Ishida et al. 1999).

The sequencing of whole genomes of the chlorarachniophytes Bigelowiella natans and Chlorarachnion reptans is in progress (Pagani et al. 2012). A complete nucleomorph genome sequence has been published for Bigelowiella natans (Gilson et al. 2006).

Diversity: the tip of the iceberg?

Our understanding of biodiversity patterns and of the ecological roles of algae in natural communities has been deepened both by field and laboratory studies, involving microscopic identification and quantification of the dominant taxa, and by physiological analyses of cultured isolates. For unicellular algae, and in particular for the smallest size classes (pico- and nanophototrophic protists), these approaches have, however, failed to decipher the complexity of natural assemblages. The use of DNA sequence-based approaches and NGS is rapidly changing this situation. The main target of sequencing efforts has been the SSU rRNA gene (although other markers, such as the plastid 16S rRNA gene, have also been used; Fuller et al. 2006), and the general picture emerging from these studies is illustrated by rank abundance curves, which show the relative abundance of Operational Taxonomic Units (OTUs; clusters of sequences grouped on the basis of sequence similarity). These approaches have revealed an enormous diversity of small eukaryotes, highlighting the presence of many uncultivated lineages at different taxonomic levels (Díez et al. 2001; López-García et al. 2001; Moon-van der Staay et al. 2001; Fuller et al. 2006; Not et al. 2007; Liu et al. 2009). Flow cytometric sorting of photosynthetic populations based on cell size and pigment composition, followed by amplification and cloning of the nuclear 18S rRNA gene (Shi et al. 2009; Yoshida et al. 2009; Cuvelier et al. 2010; Marie et al. 2010) or of the plastid 16S rRNA gene (Jardillier et al. 2010; Shi et al. 2011), has confirmed the importance of uncultivated microorganisms within the photosynthetic pico- and nanoplankton. DNA-based approaches have also shown that natural communities include an exceptionally large number of 'rare taxa' (the long tail of rank abundance curves).
The ecological roles and the physiological and biochemical properties of these uncultivated taxa remain to be determined (reviewed in Massana & Pedrós-Alió 2008).
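A rank abundance curve of the kind described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that sequences have already been clustered into OTUs and that read counts per OTU are available; the OTU names, the counts and the 0.1% relative-abundance threshold used to flag 'rare' OTUs are hypothetical choices for illustration, not values taken from the studies cited.

```python
from typing import Dict, List, Tuple

Curve = List[Tuple[int, str, float]]  # (rank, OTU, relative abundance)

def rank_abundance(otu_counts: Dict[str, int]) -> Curve:
    """Rank OTUs from most to least abundant and convert counts to relative abundance."""
    total = sum(otu_counts.values())
    ordered = sorted(otu_counts.items(), key=lambda item: item[1], reverse=True)
    return [(rank, otu, count / total) for rank, (otu, count) in enumerate(ordered, start=1)]

def rare_otus(curve: Curve, threshold: float = 0.001) -> List[str]:
    """OTUs below an (arbitrary, assumed) relative-abundance threshold: the 'long tail'."""
    return [otu for _, otu, rel in curve if rel < threshold]

# Hypothetical read counts for one environmental sample (10 000 reads in total)
counts = {"OTU_1": 6000, "OTU_2": 2500, "OTU_3": 1200, "OTU_4": 290,
          "OTU_5": 5, "OTU_6": 3, "OTU_7": 2}

curve = rank_abundance(counts)
for rank, otu, rel in curve:
    print(f"{rank}\t{otu}\t{rel:.4f}")
print("rare OTUs:", rare_otus(curve))
```

Plotting relative abundance (usually on a log scale) against rank gives the curve: a few dominant OTUs form its head and the many rare OTUs its long tail. The cutoff used to call a taxon 'rare' varies between studies.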

Targeted metagenomic approaches have recently provided insights into the ecology and physiology of uncultivated MPOs. This strategy allowed Yoon et al. (2011) to obtain genome data from single cells of the recently described candidate algal phylum Picobiliphyta (Not et al. 2007). Genome analyses suggested that these cells are heterotrophic. A similar approach, applied to complex marine samples (using cell sorting followed by whole genome amplification), has also been used to study uncultivated haptophyte communities (Cuvelier et al. 2010). Such approaches should provide combined access to the diversity, ecology and physiology of uncultivated groups.

Dynamic diversity: hints from the genomes of bloom-forming algae

Algal blooms are an example of the dynamic nature of planktonic ecosystems. Algal blooms in the strict sense are entirely natural phenomena that have occurred throughout recorded history. However, it has been hypothesized that bloom development will increase under global change scenarios (Edwards & Richardson 2004). It has been proposed that the seasonal cycles of phytoplankton blooms are related to biological traits controlled by the genetic diversity of individual species or ecotypes. But what factors and regulatory mechanisms control these events? The roles of biotic and abiotic factors in the initiation and termination of blooms remain controversial. Several factors have been proposed, including light quality and intensity, nutrient availability, pathogens and grazers (reviewed in Bowler et al. 2010). For instance, the harmful alga Aureococcus anophagefferens outcompetes co-occurring phytoplankton in estuaries with elevated levels of dissolved organic matter and turbidity and low levels of dissolved inorganic nitrogen. Analysis of its complete genome sequence allowed comparisons with the gene complements of six competing phytoplankton species identified through metaproteomics (Gobler et al. 2011). This eco-genomic approach identified gene sets that may be favourable under the environmental conditions present during blooms. Collectively, these findings suggest that anthropogenic activities resulting in elevated levels of turbidity, organic matter and metals have created a niche within coastal ecosystems that ideally suits the unique genetic capacity of A. anophagefferens, and have thus facilitated the proliferation of this (and potentially other) bloom-forming species.
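At its simplest, comparing a focal genome with the gene complements of its competitors amounts to set operations on gene-family presence or absence. The sketch below is a hypothetical illustration of that idea only: it assumes genes have already been assigned to families, the family and species labels are placeholders, and it is not the pipeline used by Gobler et al. (2011).

```python
from typing import Dict, Set

def focal_specific(focal: Set[str], competitors: Dict[str, Set[str]]) -> Set[str]:
    """Gene families present in the focal genome but absent from every competitor."""
    in_any_competitor = set().union(*competitors.values())
    return focal - in_any_competitor

def missing_from(focal: Set[str], competitors: Dict[str, Set[str]]) -> Dict[str, int]:
    """For each competitor, the number of the focal genome's families it lacks."""
    return {name: len(focal - families) for name, families in competitors.items()}

# Placeholder gene-family labels; not real annotations
focal_genome = {"fam_A", "fam_B", "fam_C", "fam_D"}
competitor_genomes = {
    "competitor_1": {"fam_A", "fam_E"},
    "competitor_2": {"fam_B", "fam_E"},
}

print(sorted(focal_specific(focal_genome, competitor_genomes)))
print(missing_from(focal_genome, competitor_genomes))
```

Families that only the focal species carries are candidates for traits that could be advantageous under bloom conditions; establishing that they actually are requires expression or physiological data.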

The recent development of molecular tools to examine genetic diversity has revealed differences in phytoplankton taxa across geographic scales and is providing insights into the physiology and ecology of blooms (Erdner et al. 2011). Genotypic analyses of the dinoflagellate Alexandrium fundyense have provided clear evidence of the succession of subpopulations during a bloom and showed that selection can act on a timescale of weeks to significantly alter the representation of genotypes within a population. Anderson et al. (2012) reviewed these and other aspects of the biology of algal blooms, including the role of bacteria and grazing by zooplankton, as well as approaches to study the ecological and genetic bases of toxin and allelochemical production. Meta- and eco-genomic approaches, and NGS in general, should help to explore not only the dynamic diversity of bloom-forming algae but also the underlying metabolic and cellular processes involved in bloom formation.


Macroevolution of algae: endosymbiosis as a major evolutionary driver

There is now strong evidence that eukaryotic genomes harbour genes of mixed ancestry, a proportion of which were acquired through endosymbiotic gene transfer (EGT) and HGT. Endosymbioses spread oxygenic phototrophy to a diverse array of eukaryotic lineages. Deciphering the footprints of these events in host nuclei in order to propose evolutionary scenarios is one of the most fascinating but controversial topics in eukaryote evolution (Archibald 2009; Elias & Archibald 2009; Kim & Archibald 2009).

Plastid acquisition via a cyanobacterium–eukaryote endosymbiosis is a rare evolutionary event (Box 1). This process requires drastic modifications of both the host and endosymbiont genomes and of host cellular mechanisms (protein-targeting machineries) to incorporate the endosymbiont as an organelle. A single endosymbiotic event in the common ancestor of the green algae, red algae and glaucophytes is therefore a parsimonious scenario; it has proved difficult to confirm using phylogenies of single nuclear-encoded genes (Kim & Graham 2008) but is recovered in the most recent phylogenomic analyses (Inagaki et al. 2009). However, additional genomic studies are required to understand the early endosymbiotic events and the precise nature of the ancestral host.

The exact mechanism of the acquisition of plastids by the Plantae is still debated, but the details of the processes by which the other eukaryotic super-groups acquired their plastids are even more controversial. Recent analyses of genomic data have not resolved these controversies, but instead, have brought new questions and challenges.

It is generally accepted that plastids of the phototrophic stramenopiles, haptophytes, cryptophytes, euglenophytes and chlorarachniophytes evolved from members of the Plantae (green or red algae) that were engulfed by nonphotosynthetic protists. This process would have involved a second round of EGT, this time from the endosymbiont nucleus to that of the secondary host, as well as the acquisition of a second protein-import mechanism in addition to that used by primary plastids (Elias & Archibald 2009). In these cases, the host genome acquired not only cyanobacterial genes via EGT and endosymbiotic gene replacement (Stiller et al. 2009) but also eukaryotic sequences from the nucleus of the endosymbiont.

Definitive evidence for the occurrence of these ‘secondary endosymbiosis’ events came from the study of algae with two nuclei, the host nucleus and a eukaryote endosymbiont-derived nucleus, the nucleomorph (Greenwood et al. 1977; McFadden et al. 1994). Nucleomorphs have been detected in the chlorarachniophytes as well as in the cryptophytes. Members of these groups consequently harbour four genomes (nuclear, nucleomorph, mitochondrial and plastid). Nucleomorphs of chlorarachniophytes and cryptophytes are strikingly similar both in size (around 1 Mbp) and functional distribution of genes. In both groups, nucleomorph genomes consist of three chromosomes with sub-telomeric rDNA (Archibald 2011). However, cryptophyte plastids were derived from a red alga, whereas chlorarachniophytes acquired their plastid from a primitive green alga. The evolutionary pressures responsible for similarities in nucleomorph genome organization and structure in these distantly related groups, and the biological significance of these similarities are not yet understood.

The number of secondary (and tertiary) endosymbioses that gave rise to the plastids found in the heterokontophytes, dinoflagellates and haptophytes is unknown (Archibald & Lane 2009). In these groups, the nucleus of the endosymbiotic algal cell has apparently disappeared, its genes having been eliminated or transferred to the host nucleus.

Plastid genes of euglenophytes, like those of chlorarachniophytes, group with green algal sequences in phylogenies, but genome comparisons suggest that the two plastids evolved from distinct lineages of green algae (Turmel et al. 2009). This observation, together with the fact that the host components belong to two different super-groups (the Excavates and the Rhizaria), indicates that euglenophytes and chlorarachniophytes acquired their plastids through independent endosymbiotic events.

In phylogenies, plastids of haptophytes, heterokontophytes and peridinin-containing dinoflagellates group with red algal plastids (Yoon et al. 2002; Le Corguillé et al. 2009). These plastids are thought to derive from secondary endosymbiosis involving primitive eukaryotic hosts and red algae. The number of secondary endosymbiotic events that gave rise to these groups is highly controversial (Archibald & Lane 2009). According to Cavalier-Smith's 'chromalveolate' hypothesis, a single endosymbiotic event gave rise to both the 'chromists' (plastid-bearing cryptophytes, stramenopiles and haptophytes) and the alveolates (including dinoflagellates) (Cavalier-Smith 1999). If this hypothesis holds true, then secondary plastids must have been lost in many heterotrophic lineages within the alveolates and stramenopiles. The presence of 'algal' genes in the genomes of nonphotosynthetic protists (e.g. oomycetes) has been cited as evidence for the chromalveolate hypothesis (Keeling 2009). However, single-locus phylogenetic analyses and recent phylogenomic analyses of nuclear-encoded genes from representatives of red plastid-bearing algae have not consistently supported the chromalveolate hypothesis (Keeling 2009); instead, they have identified new relationships between subsets of algal groups: the SAR group (stramenopiles, alveolates and rhizaria; Burki et al. 2008) and the Hacrobia (haptophytes and cryptophytes; Burki et al. 2009; Okamoto et al. 2009). Moreover, the fact that stramenopiles and alveolates possess strikingly different carbon storage pathways suggests that they may have acquired their plastids through different secondary endosymbiotic events (Michel et al. 2010b), rather than through the single event proposed by the chromalveolate hypothesis. Today, despite a growing body of genomic data (Table 1), the acquisition of red algal secondary plastids remains enigmatic.
The picture has become more complex with the discovery that the genomes of the diatoms Thalassiosira and Phaeodactylum, which harbour a red algal-derived plastid, contain genes that may have been derived from an endosymbiosis involving a green alga (Moustafa et al. 2009). In both genomes, genes of apparent green algal ancestry outnumber red algal genes by more than three to one. A cryptic green algal endosymbiosis has also been proposed as a possible explanation for the strong green lineage affiliation of nuclear-encoded genes in metagenomes of uncultivated haptophytes (Cuvelier et al. 2010). Genome mosaicism had already been described for a chlorarachniophyte (Archibald et al. 2003): in this green-plastid-bearing organism, nucleus-encoded plastid-targeted proteins were mostly of green algal ancestry, but genes of red algal and bacterial origin were also identified.

Some dinoflagellates have undergone additional, presumably tertiary, rounds of endosymbiosis by taking up algae that already possessed secondary plastids (a diatom, a haptophyte or a cryptophyte; Chesnick et al. 1997; Patron et al. 2006). In the case of diatom endosymbionts, the endosymbiont nucleus and mitochondria have been retained, and the resulting complex organism has been termed a dinotom. Analyses of ESTs from Karlodinium micrum, which harbours a haptophyte-derived plastid, together with EST libraries from the free-living haptophytes Isochrysis galbana and Pavlova lutheri, have suggested that the plastid-targeted proteins encoded in the K. micrum genome derive from the tertiary endosymbiont and also, potentially, from a previous secondary endosymbiont (Patron et al. 2006).

To better understand endosymbiotic events, one important challenge is to distinguish between HGT and EGT. Although many studies assume that genes of cyanobacterial origin in eukaryotic genomes were acquired by EGT, it is in fact difficult to tell whether any given 'foreign' gene is the product of EGT or was acquired in an endosymbiosis-independent fashion before, during or after plastid acquisition (Archibald 2011). HGT is now a well-established factor in the evolution of eukaryotic genomes (Keeling & Palmer 2008). This mechanism of gene transfer was probably involved in the acquisition of some of the plastid genes located on minicircles in dinoflagellates (Moszczynski et al. 2012). More genomes and phenotypic characters from organisms belonging to under-sampled groups of algae, together with improved procedures for testing evolutionary scenarios, will be needed to fully understand the origins of the mosaic genomes of algae.

Macroevolution of algae: genome structure

Algae have long been known to exhibit a remarkable diversity of genome sizes, ranging over four orders of magnitude (from 12 Mbp in Ostreococcus tauri up to 215 Gbp in some dinoflagellate species; Derelle et al. 2006; Hackett et al. 2004a). Whole genome sequencing has allowed the structures of representative genomes from several algal groups to be analysed in detail, confirming previous observations and revealing novel structural features. Ostreococcus, Micromonas and Bathycoccus are microscopic (less than 2 μm) planktonic green algae of the class Mamiellophyceae that appear to have a cellular machinery close to the minimum required for a functional, free-living, photosynthetic cell (with just one mitochondrion and one chloroplast per cell; e.g. Derelle et al. 2006; Worden et al. 2009). The simplicity of these cells is reflected in their genome sequences. The five Mamiellophyceae genome sequences published to date have all revealed very compact genomes of between 12 and 22 Mbp. In all cases, about two-thirds of the genome is coding sequence, with a very small proportion of intronic and intergenic DNA. Given this minimal cellular structure, and as in the case of the Prochlorococcus cyanobacteria (see above), it seems likely that the compact genomes of these algae are the result of a selective 'streamlining' process that has produced cells optimally adapted to a specific ecological niche. This is somewhat at odds with the cosmopolitan distribution of these algae, but may indicate that they exploit specific planktonic niches that occur ubiquitously across the world's oceans.
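As a quick arithmetic check of the range quoted above, using only the two endpoint values given in the text (12 Mbp and 215 Gbp), the span in orders of magnitude is the base-10 logarithm of their ratio:

```latex
\log_{10}\!\left(\frac{215\ \text{Gbp}}{12\ \text{Mbp}}\right)
  = \log_{10}\!\left(\frac{215 \times 10^{9}}{12 \times 10^{6}}\right)
  \approx \log_{10}\!\left(1.8 \times 10^{4}\right)
  \approx 4.25
```

The smallest and largest values therefore differ by a factor of roughly 18 000, i.e. slightly more than four orders of magnitude.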

In contrast with the exceptionally compact genomes of the Mamiellophyceae, macroalgal genomes tend to be significantly larger, ranging from 100 Mbp to 19 Gbp (Kapraun 2005). The recently published genome of the filamentous brown alga Ectocarpus, for example, has an estimated size of 214 Mbp (Cock et al. 2010a). One factor that may contribute to this difference in size is the different ecological strategies of the two groups of organisms. Brown algae are all multicellular organisms with several differentiated cell types, and the deployment of these body plans probably requires a more complex genome in terms of gene content. Moreover, with very few exceptions, brown algae are sedentary organisms whose niches lie in coastal ecosystems. A more complex genome may also be necessary to complete their relatively long life cycles in what are often highly variable and harsh environments: a more diverse gene complement could facilitate adaptation to both short- and long-term fluctuations in environmental conditions. Differences in developmental and life cycle strategies may also affect genome structure and size via nonselective mechanisms. For example, large multicellular organisms tend to have smaller effective population sizes, and one consequence of this is that selection is less efficient. This factor has been proposed to contribute to genome expansion, because the proliferation of selfish DNA elements and other events, such as duplications at different genomic scales, are not efficiently selected against in smaller populations (Lynch & Conery 2003).
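The population-size argument can be made concrete with Kimura's (1962) diffusion approximation for the probability that a single new mutation with selection coefficient s eventually becomes fixed in a diploid population. The sketch below assumes that effective and census population sizes are equal (N) and uses an arbitrary, hypothetical s = −10⁻⁵ for a slightly deleterious insertion (for example, a new transposable element copy); the numbers are illustrative and are not estimates for any alga.

```python
import math

def fixation_probability(s: float, n: int) -> float:
    """Kimura (1962): fixation probability of one new mutant allele (initial
    frequency 1/(2N)) in a diploid population whose effective size equals N."""
    if s == 0:
        return 1.0 / (2 * n)  # neutral case: equal to the initial frequency
    # (1 - exp(-2s)) / (1 - exp(-4Ns)), written with expm1 for numerical accuracy
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)

def relative_to_neutral(s: float, n: int) -> float:
    """Fixation probability scaled by the neutral expectation 1/(2N)."""
    return fixation_probability(s, n) * 2 * n

s = -1e-5  # hypothetical, slightly deleterious mutation
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"N = {n:>9,}: P_fix / P_neutral = {relative_to_neutral(s, n):.3g}")
```

Under these assumptions, such a mutation fixes almost as often as a neutral one when N is around 10³ but is effectively purged when N is around 10⁶, which illustrates why selection against genome expansion is expected to be weaker in small populations.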

While it is likely that ecological niche and life cycle strategy can influence genome structure (and vice versa), these are clearly not the only factors involved. The presence of unicellular species with very large genomes, such as dinoflagellates, within phytoplankton populations is an indication of this. Understanding the relationship between the ecology of an organism and the structure and size of its genome remains an important challenge in ecological genomics.

As mentioned above, high-throughput sequencing has allowed the structures of a number of algal genomes to be examined in detail. One remarkable feature observed in all of the Mamiellophyceae genomes sequenced to date is the presence of at least two chromosomes that exhibit unusual features compared with the rest of the genome. These 'outlier' chromosomes generally have a lower GC content (Derelle et al. 2006; Palenik et al. 2007; Worden et al. 2009). In Ostreococcus, these chromosomes are also rich in transposons; the genes located on them evolve rapidly and have smaller introns that are more AT-rich and have divergent splicing signals (Derelle et al. 2006). Despite these differences, phylogenetic analyses indicate that the genes on these chromosomes are ancestral and were not acquired via recent HGT. One possible explanation for the structural differences between these chromosomes and the rest of the genome is that they are involved in sexual differentiation, since sex chromosomes are known to exhibit unusual structural features compared with autosomes (Ming & Moore 2007). Sexual reproduction has not been observed in Ostreococcus strains in culture, but evidence for sexual recombination has been detected by analysis of genome sequence data (Grimsley et al. 2010), and it is possible that the production of new genetic combinations via recombination plays an important role in generating strains adapted to variations in oceanic ecosystems.
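Outlier chromosomes of this kind are typically first spotted by comparing base composition across chromosomes. The following is a minimal sketch, assuming chromosome sequences are already loaded as strings (e.g. from a FASTA file); the toy sequences and the cutoff of 5 percentage points below the median are hypothetical, and real analyses usually also examine GC content in sliding windows.

```python
from statistics import median
from typing import Dict, List

def gc_percent(seq: str) -> float:
    """GC content (%) computed over unambiguous bases only (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt

def low_gc_outliers(chromosomes: Dict[str, str], drop: float = 5.0) -> List[str]:
    """Chromosomes whose GC content is more than `drop` points below the median."""
    gc = {name: gc_percent(seq) for name, seq in chromosomes.items()}
    baseline = median(gc.values())
    return [name for name, value in gc.items() if value < baseline - drop]

# Toy chromosomes (hypothetical); chr4 is deliberately AT-rich
chromosomes = {
    "chr1": "GCGCAT" * 10,
    "chr2": "GGCCAT" * 10,
    "chr3": "GCCGTA" * 10,
    "chr4": "GCATAT" * 10,
}
for name, seq in chromosomes.items():
    print(name, round(gc_percent(seq), 1))
print("low-GC outliers:", low_gc_outliers(chromosomes))
```

Using the median rather than the mean as the baseline keeps a single strongly deviating chromosome from shifting the reference value.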

The sequenced Micromonas genomes generally contain fewer transposable elements than the Ostreococcus genomes, but, interestingly, one of the Micromonas genomes appears to have been invaded by unusual intronic repeat elements called introners (Worden et al. 2009).

The genome of the brown alga Ectocarpus also exhibits a number of unusual structural features (Cock et al. 2010a). Genes tend to be split into many small exons (6.8 per gene on average) separated by large introns, and the distances between neighbouring loci are often very small. One consequence of this organization is that a very large proportion (40%) of the Ectocarpus genome consists of intron sequence. The diatom genomes sequenced so far (Thalassiosira pseudonana and Phaeodactylum tricornutum; Armbrust et al. 2004; Bowler et al. 2008) have structures very different from that of Ectocarpus; for example, their genes are much poorer in introns (0.8 and 1.3 introns per gene on average, respectively). Hence, it appears that many of the structural features of the Ectocarpus genome were acquired since the brown algal lineage diverged from that of the diatoms. It is therefore tempting to link these recently evolved features with ecological features characteristic of the brown algae, such as multicellular development or adaptation to sedentary growth in coastal environments. However, recent analysis of the genome sequence of the multicellular red alga Chondrus crispus has shown that its genome structure is very different from that of Ectocarpus, with introns being very rare and genes clustered into 'gene islands' separated by clusters of repeated elements (Jonas Collén, personal communication). Again, this observation indicates that factors other than an organism's developmental biology and ecology need to be taken into account to understand genome structure. In particular, the evolutionary history of each organism has presumably played an important role in determining present-day genome structures.
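Summary statistics such as introns per gene and the intronic fraction of a genome can be derived directly from gene models. The sketch below assumes a deliberately simple, hypothetical representation (each gene as a list of non-overlapping exon coordinates, one transcript per gene, no overlapping genes); real annotations in GFF/GTF format need more careful parsing. Note that a gene with n exons has n − 1 introns, so an average of 6.8 exons per gene corresponds to about 5.8 introns per gene, the unit used for the diatom figures.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive coordinates (assumed convention)

def intron_lengths(exons: List[Exon]) -> List[int]:
    """Lengths of the gaps between consecutive exons of one gene model."""
    ordered = sorted(exons)
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(ordered, ordered[1:])]

def intron_summary(genes: List[List[Exon]], genome_size: int) -> Tuple[float, float]:
    """Return (mean introns per gene, fraction of the genome that is intronic)."""
    introns = [intron_lengths(exons) for exons in genes]
    mean_per_gene = sum(len(lengths) for lengths in introns) / len(genes)
    intronic_bp = sum(sum(lengths) for lengths in introns)
    return mean_per_gene, intronic_bp / genome_size

# Toy annotation: a three-exon gene and a single-exon gene on a 2 kb "genome"
genes = [
    [(1, 100), (201, 300), (401, 500)],
    [(1001, 1200)],
]
per_gene, fraction = intron_summary(genes, genome_size=2000)
print(f"introns per gene: {per_gene:.2f}, intronic fraction: {fraction:.1%}")
```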

Comparative genome analyses of different phytoplankton species have revealed higher genetic divergence between related species than expected. These analyses have highlighted the extent to which phytoplankton diversity has been underestimated and the probable existence of many cryptic species, especially among picoeukaryotes. For example, O. tauri and O. lucimarinus are morphologically identical but highly divergent, with an average amino acid identity of only 70%. Despite their morphological similarity, this makes them the most divergent species within the same genus among the sequenced eukaryotes (Palenik et al. 2007).
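Average amino acid identity is computed from pairwise alignments of orthologous proteins. Definitions differ between studies, for example in how gaps are treated and which length is used as the denominator. The sketch below ignores gapped columns, which is only one assumed convention and not necessarily that of Palenik et al. (2007). The aligned sequences are invented.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity between two aligned sequences, ignoring columns with a gap ('-')."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aln_a, aln_b):
        if a == "-" or b == "-":
            continue  # gapped columns are excluded from both numerator and denominator
        compared += 1
        identical += a == b
    return 100.0 * identical / compared if compared else 0.0


def average_identity(aligned_pairs):
    """Mean percent identity over a set of aligned ortholog pairs."""
    values = [percent_identity(a, b) for a, b in aligned_pairs]
    return sum(values) / len(values)


# Invented alignments of two ortholog pairs.
pairs = [("MKTAYIAK", "MKSAYLAK"), ("MK-AY", "MKLAF")]
print(average_identity(pairs))  # 75.0
```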

Dinoflagellate genomes are particularly interesting because of their large sizes, ranging from <1.5 to 245 Gbp (equivalent to between 0.5 and 80 times the size of the human haploid genome). The DNA is heavily methylated and contains a large proportion of hydroxymethyluracil. The majority of the genome appears to be in a liquid crystal state, suggesting that most of the DNA has a structural role (Wisecaver & Hackett 2011). The organellar genomes are also unusual: both are very compact, and the plastid genome is organized as a series of mini-circles, each containing one or very few genes. The large sizes of dinoflagellate genomes have precluded complete genome sequencing. However, some information about genome composition has been obtained from EST sequencing (Bachvaroff et al. 2004; Hackett et al. 2004b; Patron et al. 2005, 2006; Nosenko & Bhattacharya 2007) and from sequencing of samples of genomic DNA (Bachvaroff & Place 2008; McEwan et al. 2008). These analyses indicate that dinoflagellate genomes may contain high numbers of genes owing to the presence of tandemly duplicated arrays of highly expressed genes (Bachvaroff & Place 2008). However, these tandem duplications do not appear to be the major factor accounting for the large genome sizes, which appear to be principally due to the presence of large amounts of surprisingly complex noncoding DNA (McEwan et al. 2008). Based on EST analyses, it has been estimated that small dinoflagellate genomes (<10 pg, or about 9 700 Mbp, of DNA), such as that of K. veneficum, contain at least 12 000 unique genes (Patron et al. 2006). Larger genomes, such as the 200 pg genome of A. tamarense, contain about 40 000 (Moustafa et al. 2010).
These values, which are high compared with those of other eukaryotes, may nonetheless be underestimates (Hou & Lin 2009). This is partly because an EST dataset does not include genes that are not expressed at the time of sampling, and partly because the number of ESTs generated for these two species was not sufficient to retrieve all expressed genes. Current advances in DNA sequencing technologies are expected to facilitate the analysis of large genomes in the years to come, and a complete dinoflagellate genome sequence would represent a major step towards understanding many of the complex features of these organisms.
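Two pieces of arithmetic underlie the figures in this paragraph. The first converts DNA mass to base pairs. The second is the expectation that a finite EST sample will miss rarely expressed genes. The sketch assumes the commonly used conversion of about 978 Mbp per picogram and a human haploid genome of about 3.1 Gbp. The expression profile is invented, and the sampling model is a deliberate simplification rather than the method of the cited studies: each EST is assumed to be drawn independently in proportion to transcript abundance.

```python
MBP_PER_PG = 978.0        # assumed conversion: 1 pg of DNA is about 978 Mbp
HUMAN_HAPLOID_GBP = 3.1   # assumed approximate size of the human haploid genome


def pg_to_mbp(pg):
    """Convert a DNA mass in picograms to megabase pairs."""
    return pg * MBP_PER_PG


def relative_to_human(genome_gbp):
    """Genome size expressed as a multiple of the human haploid genome."""
    return genome_gbp / HUMAN_HAPLOID_GBP


def expected_genes_detected(expression, n_ests):
    """Expected number of distinct genes hit at least once by n_ests random ESTs,
    if each EST comes from gene i with probability proportional to expression[i]."""
    total = sum(expression)
    return sum(1.0 - (1.0 - w / total) ** n_ests for w in expression)


print(pg_to_mbp(10))                  # 9780.0 (Mbp for 10 pg)
print(round(relative_to_human(245)))  # 79, close to the "80 times" quoted above

# Invented transcriptome: 10 highly expressed genes and 990 weakly expressed ones.
expression = [100.0] * 10 + [1.0] * 990
for n in (1_000, 10_000, 100_000):
    print(n, round(expected_genes_detected(expression, n)))
```

With 1,000 ESTs, the toy model recovers fewer than half of its 1,000 genes because most weakly expressed genes are missed. This is the sampling effect invoked above for EST-based gene counts.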

Speciation in the ocean

One of the major challenges in evolutionary ecology is to understand the complex evolutionary and ecological mechanisms leading to reproductive isolation and speciation. To gain insights into these mechanisms, it is essential (i) to be able to define species and (ii) to identify the factors that affect population structure at the different spatial and temporal scales at which gene flow may operate. In the case of eukaryotic algae, identifying genetic markers linked to selected traits that could shed light on the mechanism of speciation is very difficult. Cryptic species that display subtle variations in morphology associated with reproductive isolation have been described in all major phylogenetic lineages of eukaryotic marine algae (Saez et al. 2003; Amato et al. 2006; Peters et al. 2010). This is despite the expectation that large population sizes and ocean mixing would facilitate gene flow and homogenize species distinctions. Population genetic surveys using high-resolution markers or whole-genome analyses that allow differentiation among individuals should greatly improve our understanding of the mechanisms that generate genetic diversity within populations and that govern gene flow between them.
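Population structure is often summarized with F_ST, the proportion of total genetic diversity that lies between subpopulations. The sketch below implements the textbook heterozygosity-based form, F_ST = (H_T - H_S) / H_T, for a single locus. Published surveys usually rely on estimators that correct for sample size, such as Weir and Cockerham's θ. The allele frequencies used here are invented.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def fst(subpop_freqs, weights=None):
    """Wright's F_ST = (H_T - H_S) / H_T for one locus.

    subpop_freqs: one list of allele frequencies per subpopulation (same allele order).
    weights: relative subpopulation sizes; equal weights if omitted.
    """
    k = len(subpop_freqs)
    if weights is None:
        weights = [1.0] * k
    total = float(sum(weights))
    weights = [w / total for w in weights]
    # H_S: mean within-subpopulation heterozygosity
    h_s = sum(w * expected_heterozygosity(f) for w, f in zip(weights, subpop_freqs))
    # H_T: heterozygosity of the pooled population
    n_alleles = len(subpop_freqs[0])
    pooled = [sum(w * f[a] for w, f in zip(weights, subpop_freqs)) for a in range(n_alleles)]
    h_t = expected_heterozygosity(pooled)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


print(fst([[0.5, 0.5], [0.5, 0.5]]))             # 0.0: identical populations
print(fst([[1.0, 0.0], [0.0, 1.0]]))             # 1.0: fixed for different alleles
print(round(fst([[0.8, 0.2], [0.2, 0.8]]), 2))   # 0.36
```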

Allopatric vs. sympatric speciation

Allopatric speciation (i.e. speciation that occurs due to extrinsic barriers to gene flow) has been considered the dominant form of speciation. Geographic mechanisms are often argued to be the main drivers of speciation (for discussions, see Butlin et al. 2008), particularly for benthic sessile algae. The algal flora of the intertidal and shallow subtidal regions of northern hemisphere temperate coasts frequently includes members of the brown algal genus Fucus. On a given shore, the genus is typically represented by three to four species zoned across the intertidal-shallow subtidal gradient. Fucus is a model for speciation in coastal macroalgae. Interestingly, the steep vertical selective gradient spanning the intertidal zone is strong enough to drive small-scale local adaptation and consequently to maintain morphological and physiological species traits, even in the face of extensive (neutral) gene flow (Zardi et al. 2011). Gene flow across Fucus species boundaries has not been sufficient to disrupt adaptive physiological traits associated with resilience to emersion stress.

Remarkably, recent and rapid speciation has been detected in Fucus in the Baltic Sea, a species-poor, newly opened postglacial marine environment (Pereyra et al. 2009). This finding runs counter to the general belief that, in marine environments, genetic divergence between populations is expected to evolve relatively slowly because recruits and propagules are readily transported by ocean currents (Palumbi 1994).

The role of allopatric speciation in the diversification of planktonic microbes is more controversial. This is because of the high dispersal potential and large population sizes of these organisms, as well as the apparent lack of strong dispersal barriers in the pelagic environment. A growing body of evidence supports the idea that sympatric speciation, in which new species arise from a parent species without physical isolation, is more common than previously thought. It may in fact be the prevalent mode of speciation in planktonic marine microbial organisms.

Population genetic structuring of marine planktonic eukaryotic microorganisms is only beginning to be explored, and the first results show that several types of speciation mechanism may coexist in the environment. There is evidence that ubiquitous pelagic diatom taxa originated by allopatric speciation. This suggests that isolation by distance allows genetic divergence even in organisms with high dispersal capacities, such as planktonic diatoms. A recent study showed that gene flow between distant populations of the ubiquitous, bloom-forming diatom Pseudo-nitzschia pungens var. pungens is limited and follows a strong isolation-by-distance pattern. This variety is one of the three lineages identified within the planktonic superspecies P. pungens (Casteleyn et al. 2008). Within P. pungens clade I, significant population differentiation exists at macrogeographic scales. Dispersal limitation by geographic distance may therefore be an important factor in genetic differentiation, even in highly dispersive marine microorganisms (Casteleyn et al. 2010). Isolation by distance is probably not the only factor restricting gene flow between populations. The oceans are not as uniform as is commonly believed, but are made up of regional water masses with distinct temperatures and salinities. Such differences may restrict gene flow by preventing the long-distance movement of plankton, thereby causing population subdivision, a precursor to allopatry. Given the constrained geographic distributions of several cryptic species, isolation by physical and/or ecological barriers could be an important driver of allopatric processes, even in highly dispersive marine microbes.
Interestingly, differentiation of subpopulations of marine planktonic dinoflagellates has been detected during a single bloom period. This shows that selection can act on a timescale of weeks to alter the genetic structure of the community significantly and potentially drive speciation (Lily et al. 2007).

Mechanisms of speciation

Understanding the mode of speciation (allopatry vs. sympatry) is important because it has implications both for the potential for gene flow and for the action of selection. Another important aspect, however, is the molecular basis of speciation, and genomic and genetic approaches can potentially provide valuable information to address this question (for comprehensive reviews see Safran & Nosil 2012; Schluter 2009). Two types of mechanism can be distinguished. The first is ‘ecological speciation’, a process in which divergent natural selection drives reproductive isolation between taxa (Mayr 1942). The second is ‘mutation-order speciation’, a process in which reproductive isolation arises because different advantageous mutations become fixed in separate populations that are under similar selection pressures (Mani & Clarke 1990). Few studies have investigated mechanisms of speciation in algae. However, early insights into genomic variation among strains of the same species suggest that genomic modifications potentially leading to speciation (such as polyploidization, gene loss or acquisition and high mutation rates) can occur remarkably rapidly (Palenik et al. 2007; Koester et al. 2010).

Polyploidization accounts for 2–4% of speciation events in flowering plants and up to 7% of speciation events in ferns (Otto & Whitton 2000), and has been identified as a potential driver of speciation in some diatoms. Within the cosmopolitan morphological species Ditylum brightwellii, two main populations with identical 18S rDNA sequences can be distinguished. Population 1 is found along US coasts throughout temperate latitudes, whereas population 2 appears to be restricted to the Pacific Northwest coast of the USA. The two populations co-occur within the Puget Sound estuary, although their peak abundances differ depending on local conditions. The two-fold difference in genome size observed between the D. brightwellii populations suggests that whole-genome duplication occurred within cells of population 1, ultimately giving rise to population 2. The apparent regional localization of population 2 is consistent with a recent divergence between the populations, which are likely to be cryptic species (Koester et al. 2010). Genome size variation is known to occur in other diatom genera, and duplication may be an active and important mechanism of genetic and physiological diversification in diatoms. Polyploidization may also be an important factor in the future adaptation of diatoms to changing ocean conditions.

Increased mutation rates and/or relaxed constraints on portions of the genome, as well as gene loss and gene acquisition through horizontal transfer, have also been identified as potential factors in the speciation of microalgae. The recent analysis of two complete genomes belonging to highly divergent species of the mamiellophyte genus Ostreococcus (O. tauri, Derelle et al. 2006; O. lucimarinus, Palenik et al. 2007) provided insights into how these species diverged. The two atypical chromosomes potentially involved in sexual differentiation (see above) are present in both species. These chromosomes show lower levels of synteny and differ from the other chromosomes in base composition and gene density. Their genes appear to be fast evolving and encode an unusually high proportion of membrane-localized proteins. An increased mutation rate (or relaxed constraint) on these genes could explain the high proportion of species-specific genes. Genomic and proteomic studies also showed that several species-specific functions were involved in various cellular processes that could be critical for environmental adaptation (Jancek et al. 2008). For example, key genes involved in iron metabolism found in the O. tauri genome are absent from the O. lucimarinus genome (Palenik et al. 2007; Jancek et al. 2008). This suggests that nutrient availability may be an important selective force in the evolution of phytoplankton species. Similar comparative genomic studies were conducted by Worden et al. (2009) on two strains of Micromonas, another picoplanktonic mamiellophyte. Many genes that occur in one Micromonas genome but not the other are very similar to genes found in organisms as evolutionarily distant as animals, fungi and bacteria. Such genes could be the products of HGT.
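Once genes have been grouped into families or ortholog groups, identifying species-specific genes (as in the Ostreococcus and Micromonas comparisons) reduces to set operations. In practice, the hard part is the clustering step, which this sketch takes as given. The ortholog-group identifiers below are hypothetical.

```python
def compare_gene_content(families_a, families_b):
    """Split ortholog-group IDs into groups shared by both genomes and those unique to each."""
    a, b = set(families_a), set(families_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


def specific_fraction(partition, key):
    """Fraction of one genome's families that are absent from the other genome."""
    own = len(partition[key])
    return own / (own + len(partition["shared"]))


# Hypothetical ortholog groups; "OG_fe*" stand in for iron-metabolism genes found in one genome only.
genome_a = ["OG1", "OG2", "OG3", "OG_fe1", "OG_fe2"]
genome_b = ["OG1", "OG2", "OG3", "OG_b1"]
parts = compare_gene_content(genome_a, genome_b)
print(sorted(parts["only_a"]), specific_fraction(parts, "only_a"))  # ['OG_fe1', 'OG_fe2'] 0.4
```

Apparent absences should be checked against the raw assembly, because a missing gene model is not always a missing gene.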

These examples suggest that speciation mechanisms may, at least in part, explain the diversification observed in eukaryotic microbial species, much as in the marine cyanobacteria Prochlorococcus (Kettler et al. 2007) and Synechococcus (Dufresne et al. 2008). These mechanisms involve ‘genomic islands’ with increased mutation rates, gene acquisition or loss, and higher rates of HGT. Such mechanisms could prevail in eukaryotic taxa that exhibit infrequent or no sexual reproduction. This is probably the case for Ostreococcus, Micromonas and other mamiellophytes, for which sexual stages have rarely been reported, although genes predicted to encode meiosis-specific functions have been found in their genomes (Derelle et al. 2006; Worden et al. 2009). Understanding the importance of strain-specific differences in gene content and genome structure within species (or ecotypes) will probably contribute significantly to our understanding of micro-evolutionary patterns and mechanisms.

Function, acclimation and adaptation

Gene content

Not surprisingly, most of the effort aimed at understanding the relationships between algal genome sequences and the ecology of the corresponding species has concentrated on protein-coding genes. Analyses of the genomes of planktonic algae, sometimes coupled with experimental work, have provided clues as to how these organisms cope with fluctuating light intensities in the marine environment. The genome of the diatom P. tricornutum, for example, encodes a light-induced light-harvesting complex stress-related protein (LHCSR) designated LHCX1. This protein has been shown to be involved in modulating nonphotochemical quenching (NPQ) capacity not only under stress conditions (as in the freshwater alga Chlamydomonas; Peers et al. 2009) but also during the normal diurnal light cycle (Bailleul et al. 2010). Moreover, genome analyses of both mamiellophytes and diatoms have suggested that these organisms may be capable of carrying out C4 photosynthesis (Worden et al. 2009). Carbonic anhydrase genes have also been found in several marine algal genomes, suggesting an alternative carbon-concentrating mechanism (Kroth et al. 2008). The various groups of marine algae store carbon in different forms: starch in members of the green lineage, and laminarin and chrysolaminarin in two stramenopile groups, the brown algae and diatoms, respectively (Table 2). Genome analysis has provided some insights into the genes involved in the production of these storage molecules and into the evolutionary history of these genes (Michel et al. 2010b).
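NPQ capacity of the kind discussed for LHCX1 is usually quantified from chlorophyll fluorescence using the Stern-Volmer expression NPQ = (Fm - Fm') / Fm'. Here Fm is the maximal fluorescence of dark-acclimated cells and Fm' is the maximal fluorescence under actinic light. The fluorescence readings below are invented, and the sketch ignores instrument-specific corrections.

```python
def npq(fm_dark: float, fm_light: float) -> float:
    """Stern-Volmer nonphotochemical quenching, NPQ = (Fm - Fm') / Fm'."""
    if fm_light <= 0:
        raise ValueError("Fm' must be positive")
    return (fm_dark - fm_light) / fm_light


# Invented readings (arbitrary units) over a light cycle; Fm from dark-acclimated cells.
fm = 1.0
fm_prime_by_hour = {6: 0.9, 12: 0.5, 18: 0.8}
for hour, fm_prime in fm_prime_by_hour.items():
    print(hour, round(npq(fm, fm_prime), 2))
```

Higher NPQ at midday in such a series would indicate more excitation energy being dissipated as heat when light is strongest.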

A common feature of the marine algal genomes analysed so far is the presence of systems for the uptake and assimilation of several different nitrogen sources, including urea (Armbrust et al. 2004; Derelle et al. 2006; Cock et al. 2010a). The ability of an alga to acquire nutrients from the surrounding environment is probably an important factor in complex planktonic ecosystems, which tend to be populated by multiple interacting species. Nutrient assimilation is likely to be particularly important in oligotrophic waters, where organisms compete for scarce resources. There is some evidence that isolates from this type of environment possess a more diverse range of transporter molecules (Worden et al. 2009), as is also the case for the cyanobacterium Prochlorococcus (Martiny et al. 2006, 2009; Coleman & Chisholm 2010).

Extracellular molecules also play an important role in the interactions of marine algae with their environment. Diatoms have a silica exoskeleton, the frustule, which is thought to protect them from predation. Genome analysis has identified many components of the molecular machinery that builds this structure (Armbrust et al. 2004; Bowler et al. 2008). The cell walls of brown algae contain unusual carbohydrates, such as alginates and sulphated fucans. These molecules probably also serve a defensive role. In addition, they are important structural components, allowing the construction of a robust multicellular body plan that can resist the physical stresses of the coastal environment.

Large-scale phylogenetic analysis of the genes in Phaeodactylum indicated that a large number had been acquired from bacterial genomes via HGT, which may be a key process in the adaptation of marine algae to changes in their environment (Bowler et al. 2008). Similarly, in brown algae, HGT may have enabled the emergence of key metabolic pathways, for example for carbon storage or cell wall biosynthesis (Michel et al. 2010a,b).
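The studies cited above relied on phylogenetic analyses. A much cruder pre-screen sometimes used in HGT surveys is a BLAST-based "alien index" (AI). The AI compares the best E-value a gene obtains against an ingroup (e.g. other eukaryotes) with its best E-value against an outgroup (e.g. bacteria). The sketch below swaps in this simpler heuristic for illustration only. The cut-off of 45 is an assumed, tunable threshold, the E-values are hypothetical, and any candidate would still require phylogenetic confirmation.

```python
import math

EVALUE_FLOOR = 1e-200  # avoids log(0) when a search tool reports an E-value of 0


def alien_index(best_e_ingroup: float, best_e_outgroup: float) -> float:
    """AI = ln(E_ingroup + 1e-200) - ln(E_outgroup + 1e-200).

    Positive values mean the gene matches outgroup sequences (e.g. bacteria) much better
    than ingroup sequences (e.g. other eukaryotes). Use E = 1.0 when a group has no hit.
    """
    return math.log(best_e_ingroup + EVALUE_FLOOR) - math.log(best_e_outgroup + EVALUE_FLOOR)


def hgt_candidates(best_hits, threshold=45.0):
    """Gene IDs whose alien index exceeds `threshold` (an assumed, tunable cut-off)."""
    return sorted(gene for gene, (e_in, e_out) in best_hits.items()
                  if alien_index(e_in, e_out) > threshold)


# Hypothetical best E-values: (best eukaryotic hit, best bacterial hit).
best_hits = {
    "geneA": (1.0, 1e-60),    # no eukaryotic hit, strong bacterial hit
    "geneB": (1e-50, 1e-52),  # similar support in both groups
    "geneC": (1e-80, 1e-10),  # clearly eukaryotic
}
print(hgt_candidates(best_hits))  # ['geneA']
```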

Analysis of individual genome sequences can provide important information about how an organism functions within its ecosystem. This type of analysis can be significantly enhanced by comparative genomic and phylogenomic approaches, which place genome information in an evolutionary context. For example, the complete sets of protein-coding genes from the two available Ostreococcus genome sequences were compared in an attempt to identify genes involved in adaptation to different ecological niches (Jancek et al. 2008). This study showed that genes encoding membrane or secreted proteins tended to evolve more rapidly than genes encoding proteins located in other cellular compartments. This observation highlights the importance of the interface between the algal cell and its environment in adaptation and speciation processes (see above).
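A claim that one class of genes evolves faster than another can be checked with a simple permutation test on per-gene divergence values. In this sketch, divergence is assumed to be 1 minus the amino acid identity between orthologs. This is a stand-in, and the original study's rate measure may differ. The values are invented, and group labels are shuffled to build the null distribution of the difference in means.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_perm=9999, seed=1):
    """One-sided permutation test that mean(group_a) > mean(group_b).

    Returns (observed difference in means, p-value). statistics.mean sums exactly,
    so its result does not depend on the order in which values appear.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 1  # include the observed labelling
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[:n_a]) - mean(pooled[n_a:]) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (n_perm + 1)


# Invented per-gene divergence values (1 - amino acid identity between orthologs).
membrane_or_secreted = [0.42, 0.38, 0.45, 0.40, 0.47, 0.44]
other_compartments = [0.25, 0.30, 0.28, 0.22, 0.27, 0.31, 0.26, 0.29]
diff, p = permutation_test(membrane_or_secreted, other_compartments)
print(f"difference in mean divergence = {diff:.3f}, one-sided p = {p:.4f}")
```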

A more global comparative approach was used by Martens et al. (2008). They compared the two published diatom genomes with ten additional chromalveolate genomes and used Dollo logic to model genome-wide gain and loss of gene families during the evolution of this group; gene family expansions were also analysed. This analysis identified 1,782 gene families that were predicted to have arisen as the diatom lineage diverged from that of the oomycetes. It also identified a number of gene families that had expanded significantly in diatom genomes compared with other chromalveolates. Both sets of genes are presumably enriched in genes with specific roles in adapting diatoms to their planktonic lifestyle.
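Under Dollo parsimony, a gene family can be gained only once on a tree but may be lost any number of times. Its single gain is therefore placed on the branch leading to the smallest clade containing every species that has the family. Each largest sub-clade within that clade that lacks the family then counts as one loss. The sketch below applies this rule to one family at a time on a hypothetical five-species tree written as nested tuples. The topology is simplified for illustration and is not the species tree used by Martens et al. (2008).

```python
def leaves(node):
    """Set of species names below a node; leaves are strings, internal nodes are tuples."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def gain_clade(node, present):
    """Smallest clade whose leaves include every species that has the family."""
    if not isinstance(node, str):
        for child in node:
            if present <= leaves(child):
                return gain_clade(child, present)
    return node


def losses(node, present):
    """Largest sub-clades of `node` in which no species has the family (one loss each)."""
    if not leaves(node) & present:
        return [node]
    if isinstance(node, str):
        return []
    found = []
    for child in node:
        found.extend(losses(child, present))
    return found


def dollo(tree, present):
    """Return (clade where the family was gained, list of clades where it was lost)."""
    present = set(present)
    if not present:
        raise ValueError("family must be present in at least one species")
    gained_at = gain_clade(tree, present)
    return gained_at, losses(gained_at, present)


# Hypothetical, simplified chromalveolate tree.
tree = ((("Thalassiosira", "Phaeodactylum"), "Phytophthora"), ("Plasmodium", "Tetrahymena"))

# A family found only in the two diatoms: gained on the diatom branch, never lost.
print(dollo(tree, {"Thalassiosira", "Phaeodactylum"}))
# A patchily distributed family: gained at the root, then lost twice.
print(dollo(tree, {"Thalassiosira", "Phytophthora", "Plasmodium"}))
```

Families whose gain maps to the diatom clade correspond to the "diatom-specific" category described above. Counting gains and losses over all families gives the genome-wide picture.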

Small RNA regulation

One important advance in biology over the past two decades has been the demonstration that noncoding RNAs, and particularly small RNA molecules, play important roles in the regulation of cellular function. The work that led to this discovery was carried out mainly in green plant and animal models. Until recently, very little information was available about the roles of noncoding RNAs in organisms outside these two major eukaryotic groups. Over the last few years, however, analysis of genome and transcriptome sequence data has led to the identification of genes involved in small RNA signalling in almost all of the major eukaryotic lineages. These genes include Dicer, Argonaute and RNA-dependent RNA polymerase (RdRP), and their distribution indicates that these proteins are part of an ancient regulatory system (Cerutti et al. 2011). Often, however, the Dicer-like and Argonaute-like sequences are highly diverged compared with plant and animal sequences, suggesting that small RNA systems may have novel features in these groups.

In addition to searches for Dicer and Argonaute genes, high-throughput sequencing approaches have been applied to identify and characterize the small RNA molecules themselves. These approaches have been used in several organisms, including the marine algae Ectocarpus siliculosus, Phaeodactylum tricornutum and Thalassiosira pseudonana (Cock et al. 2010a; Huang et al. 2011a; Norden-Krichmar et al. 2011). Analysis of these small RNA data has allowed putative miRNAs to be identified. In Ectocarpus, for example, an analysis based on stringent selection rules (Meyers et al. 2008) identified 26 loci that were strongly predicted to encode miRNAs (Cock et al. 2010a). miRNA candidates were subsequently also identified in P. tricornutum and T. pseudonana (Huang et al. 2011a; Norden-Krichmar et al. 2011). The miRNAs identified in these stramenopile algae appear to have evolved independently of miRNAs in other eukaryotic groups, as they share no significant similarity with them at the sequence level (Cock et al. 2010a; Huang et al. 2011a; Norden-Krichmar et al. 2011).

Although it is now becoming clear that small RNA regulatory systems are employed across the eukaryotic tree, the precise functions of these systems in groups outside the green plants, animals and fungi are still not clearly understood. One possible function may be to protect the genome against invasion by repeated elements. Small RNAs mapped preferentially to transposable elements in the Ectocarpus genome, suggesting that they may be involved in silencing these elements (Cock et al. 2010a). Silencing of transposable elements by small RNAs usually involves DNA methylation (Malone & Hannon 2009), but an alternative mechanism is likely to operate in Ectocarpus, which appears to lack a DNA methylation system (Cock et al. 2010a).
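Saying that small RNAs map "preferentially" to transposable elements means that the fraction of reads overlapping TEs exceeds the fraction of the genome that TEs occupy. The sketch below computes that ratio for reads and TE annotations given as half-open intervals on a single hypothetical sequence. It ignores multi-mapping reads and strand, which real analyses need to handle, and all coordinates are invented.

```python
def merge_intervals(intervals):
    """Merge overlapping half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def te_enrichment(reads, te_intervals, genome_size):
    """(fraction of reads overlapping a TE) / (fraction of the genome covered by TEs).

    Values above 1 indicate that reads fall in TEs more often than expected from
    TE genome coverage alone.
    """
    tes = merge_intervals(te_intervals)
    te_fraction = sum(end - start for start, end in tes) / genome_size

    def overlaps_te(read):
        r_start, r_end = read
        return any(r_start < end and start < r_end for start, end in tes)

    read_fraction = sum(overlaps_te(r) for r in reads) / len(reads)
    return read_fraction / te_fraction


# Invented 10 kb sequence with two 1 kb TEs (20% of the sequence) and ten 21-nt reads.
tes = [(0, 1000), (5000, 6000)]
reads = [(10, 31), (100, 121), (500, 521), (900, 921),
         (5000, 5021), (5500, 5521), (5900, 5921), (5990, 6011),
         (2000, 2021), (8000, 8021)]
print(te_enrichment(reads, tes, 10_000))  # 4.0
```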

Prediction of miRNA target genes, based on partial or complete matches of miRNA sequences to other regions of the genome, can also provide some clues about potential function. In Ectocarpus, for example, three quarters of the predicted miRNA target genes encode proteins that contain leucine rich repeat (LRR) domains (including nine members of a large ROCO GTPase family) and are predicted to have evolved as the brown algal and diatom lineages diverged (Cock et al. 2010a). LRR proteins are known to play a role in recognition and signalling during immune responses in both plants and animals (Meyers et al. 2005; Kumar et al. 2011), and in plants miRNAs regulate part of this process (Zhai et al. 2011). The putative Ectocarpus miRNAs and their LRR targets may have similar roles (Zambounis et al. 2012).
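Target prediction of the kind described above can be approximated by searching transcripts for sites complementary to a miRNA with only a few mismatches, a criterion modelled on the near-perfect pairing typical of plants. The sketch below is deliberately crude. It allows no gaps, counts G:U wobble pairs as mismatches and applies no seed or position weighting. It does not reproduce the rules used by Cock et al. (2010a), and the miRNA and transcript sequences are invented.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq):
    """Upper-case a sequence and convert DNA (T) to RNA (U)."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna):
    """Reverse complement of an RNA sequence."""
    return to_rna(rna).translate(COMPLEMENT)[::-1]


def find_target_sites(mirna, transcript, max_mismatches=2):
    """Return (position, mismatches) for each window of `transcript` that is
    complementary to `mirna` with at most `max_mismatches` mismatches."""
    site = reverse_complement(mirna)  # sequence a perfectly paired target would carry
    target = to_rna(transcript)
    k = len(site)
    hits = []
    for i in range(len(target) - k + 1):
        mismatches = sum(1 for a, b in zip(site, target[i:i + k]) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


# Invented 22-nt miRNA and a transcript (given as DNA) carrying a perfect site at
# position 5 and a site with two mismatches at position 37.
mirna = "GACCUAGGCAUUCGAGUAACGU"
transcript = ("GGGGG" + "ACGTTACTCGAATGCCTAGGTC" + "TTTTTTTTTT"
              + "ACATTACTCGAGTGCCTAGGTC" + "GGGGG")
print(find_target_sites(mirna, transcript))
```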

Genomic approaches will continue to provide information about the possible roles of small RNAs in marine algae, but it will also be necessary to develop experimental approaches to test proposed functions directly. In this respect, two studies in which RNA interference (RNAi) has been used to test gene function in marine algae represent important advances. In the first, injection of double-stranded RNA was used to knock down the function of an aureochrome blue light receptor in the yellow-green alga Vaucheria frigida (Takahashi et al. 2007). The second used a transgene strategy to direct the RNAi machinery against phytochrome and cryptochrome transcripts in the diatom P. tricornutum (De Riso et al. 2009). The latter study provided some mechanistic insights into the RNAi system in P. tricornutum, indicating, for example, that the system may affect both transcript stability and translation. It also provided evidence that RNAi silencing of a transgene was associated with methylation of the transgene DNA. Moreover, this methylation spread to parts of the gene outside the region targeted by the RNAi constructs (De Riso et al. 2009).

Although small RNA regulatory systems based on Dicer and Argonaute proteins appear to be a common feature across the major eukaryotic groups, there is strong evidence that a number of species have lost the system. These include marine algae such as Ostreococcus spp., Micromonas sp. strain CCMP1545 and probably also Aureococcus anophagefferens. Analysis of the complete genome sequences of these organisms failed to detect either Dicer-like genes or the more strongly conserved Argonaute-like genes (Derelle et al. 2006; Palenik et al. 2007; Cerutti et al. 2011; Gobler et al. 2011). This suggests that Dicer/Argonaute-based small RNA regulatory systems are not absolutely essential for cell function. These algae belong to diverse algal lineages, and Dicer/Argonaute genes are found in almost all major eukaryotic groups. It is therefore likely that these systems were present in the ancestors of these organisms and were subsequently lost independently in different lineages.

Interactions with the environment

The interaction between organisms and their surrounding environment is crucial in evolution. Environmental stress, in particular, may cause extinction when organisms fail to adapt to constantly varying abiotic and biotic challenges. However, these challenges may also act as forces that promote evolutionary change and the formation of new species adapted to new environments (reviewed in Nevo 2011). We treat sessile algae and plankton separately in the following subsections because the environmental constraints on these two types of organisms are fundamentally different.

Sessile algae in the stressful coastal environment

Many coastal habitats represent a challenging environment, with steep abiotic gradients and drastic environmental fluctuations over narrow spatial and temporal scales. In particular, MPOs living in intertidal or shallow subtidal habitats are regularly exposed to strong water motion and extreme fluctuations in temperature, pH, irradiance, salinity and/or nutrient availability. In addition to these ‘natural’ sources of stress, the coastal ecosystems are subjected to a broad range of anthropogenic influences, including both the indirect effects of climate change and the direct effects of pollution.

Brown macroalgae are commonly the most important organisms of the intertidal zone in terms of biomass and play a crucial role in ecosystem functioning by providing habitats for a wide range of other species. As sessile organisms living in this harsh environment, they need to be highly tolerant to the broad range of stressors described above. On many intertidal shores, several coexisting species have overlapping vertical ranges, each range being determined by factors such as the ability to tolerate emersion during low tide (for a review, see Davison & Pearson 1996). The intertidal zone therefore provides an interesting environment for exploring questions concerning response, acclimation and adaptation to stress.

The brown macroalga Fucus has been used for many years as a cell biology model to study responses and resilience to stress (e.g. Coelho et al. 2002; Nielsen et al. 2003a). Only recently, however, have genomic tools been employed to understand the mechanisms underlying the capacity of this genus to persist in the intertidal zone. The genus is quickly becoming a model system for studies of local adaptation, ecological divergence and speciation. Fucus appears to have undergone a recent radiation (Pereyra et al. 2009), leading to the establishment of several closely related species, each adapted to a specific ecological niche. Tolerance to abiotic stress, particularly desiccation, seems to have played an important role in this diversification. The existence of physiological differences between Fucus species provides a means to analyse the regulatory and evolutionary basis of desiccation tolerance in this genus. The genome of Fucus has not been sequenced. However, EST libraries have provided both gene sequences and a means to develop molecular markers that can be used to look for evidence of selection using genome scans (Storz 2005). Moreover, comparative studies based on EST collections have provided insights into the changes in physiology and cellular metabolism associated with acclimation and genetic adaptation to stress. Responses to changes in salinity and temperature have recently been investigated in Fucus spp., and six genes, including a putative mannitol transporter, showed possible signatures of selection. Because mannitol is a key osmotic regulator in saline environments (Iwamoto & Shiraiwa 2005 and references therein), there may be ongoing selection on at least part of the mannitol pathway in this brown alga (Coyer et al. 2011).
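Genome scans look for loci whose differentiation between populations is unusually high compared with the rest of the genome, which may indicate divergent selection. The sketch below flags loci in the upper tail of the empirical F_ST distribution. This is a crude simplification of model-based approaches that compare each locus with a simulated neutral expectation, and it is not the method of Coyer et al. (2011). The locus names and F_ST values are invented.

```python
def empirical_outliers(fst_by_locus, quantile=0.95):
    """Loci whose F_ST exceeds the given empirical quantile of all loci.

    Uses a simple lower-index quantile: the value at position int(q * (n - 1))
    of the sorted F_ST values.
    """
    values = sorted(fst_by_locus.values())
    cutoff = values[int(quantile * (len(values) - 1))]
    outliers = sorted(locus for locus, v in fst_by_locus.items() if v > cutoff)
    return outliers, cutoff


# Invented F_ST values for 19 EST-derived markers plus one highly differentiated locus.
fst_values = {f"est_marker_{i:02d}": 0.01 * i for i in range(1, 20)}
fst_values["putative_transporter"] = 0.45
outliers, cutoff = empirical_outliers(fst_values)
print(outliers, round(cutoff, 2))
```

A locus flagged this way is only a candidate. Demographic history alone can produce high F_ST, which is why model-based scans and functional follow-up are needed.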

Strong selection on physiological traits across intertidal exposure gradients has been shown to maintain distinct genetic and morphological Fucus taxa within their preferred vertical distribution ranges (Zardi et al. 2011). On a larger geographical scale, Pearson et al. (2009) compared central and southern (rear) edge populations of F. serratus and showed that fitness and adaptive potential are reduced at the edge of the distributional range. Edge populations of F. serratus were less resilient to desiccation and heat shock than central populations. At the same temperatures, they also accumulated higher levels of heat-shock gene transcripts, suggesting that these populations were more stressed (Pearson et al. 2009). Changes in the Earth's climate may therefore threaten small, fragmented marginal populations, because they are less fit and have a lower adaptive capacity than larger central populations.

The lack of a complete genome sequence for a Fucus species is a limiting factor for the implementation of genome-wide approaches to study the ecology of this genus. An alternative system, for which a whole genome sequence is available, is the model brown alga Ectocarpus. Moreover, in addition to the genome sequence, a number of genomic tools are available for Ectocarpus, offering the means to produce comprehensive transcriptomic, proteomic and metabolomic datasets (Gravot et al. 2010; Ritter et al. 2010; Dittami et al. 2011). For example, large scale reprogramming of the transcriptome of Ectocarpus was observed in response to several different abiotic stresses using a microarray approach (Dittami et al. 2009). This study identified several new genes and pathways with potential functions in the stress response. Future work will be aimed at extending this type of analysis to a broad range of strains isolated from populations adapted to different environmental conditions. The Ectocarpus genus is particularly suitable for this sort of approach because a large number of strains are available from locations covering a wide range of environments across the globe (Coelho et al. 2012). Preliminary analyses indicate that one clade within the genus Ectocarpus exhibits particularly high tolerance to (abiotic) stresses and that strains that belong to this clade are consistently found in harsh environments across the globe (Akira Peters, personal communication). Comparative genome analyses would be a useful approach to correlate genomic features with stress resistance.
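Large-scale transcriptome reprogramming of this kind is usually summarized as per-gene log2 fold changes between stressed and control conditions. The sketch below computes these from mean expression values and applies a two-fold threshold. It omits the replicate-based statistics that real microarray analyses rely on, for example moderated t-tests as implemented in the limma package. The gene names and values are hypothetical.

```python
import math


def log2_fold_changes(control, treated, pseudocount=1.0):
    """Per-gene log2(treated / control); the pseudocount avoids division by zero."""
    return {gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
            for gene in control}


def responsive_genes(lfc, threshold=1.0):
    """Genes changing by at least `threshold` log2 units (1.0 = two-fold) in either direction."""
    return sorted(gene for gene, value in lfc.items() if abs(value) >= threshold)


# Hypothetical mean expression values under control and stress conditions.
control = {"hsp70_like": 99.0, "photosystem_gene": 49.0, "housekeeping": 200.0}
stressed = {"hsp70_like": 399.0, "photosystem_gene": 24.0, "housekeeping": 210.0}
lfc = log2_fold_changes(control, stressed)
print(responsive_genes(lfc))  # ['hsp70_like', 'photosystem_gene']
```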

The evidence for global climate change has now become unequivocal, with records of significant increases in global average air and ocean temperatures, widespread melting of ice caps and rising average sea level. Climate change is expected to alter patterns of natural selection in many species, and ecological genetics tools will provide a means to understand the factors that influence the persistence of populations in response to climate change.

Coastal environments are also particularly susceptible to direct anthropogenic influences. Heavy metals such as copper are released by a wide range of human activities. Copper release remains a major threat to marine ecosystems, given its impact on benthic flora and fauna assemblages (see e.g. Contreras et al. 2009). Copper is essential for all forms of life, acting as a cofactor for many enzymatic systems and participating in crucial physiological processes including photosynthesis and respiration. However, this metal is extremely toxic at high concentrations because it catalyses the formation of the highly reactive hydroxyl radical, causing oxidative stress. Some marine areas are chronically polluted by wastes containing heavy metals, such as mine waste. The stress caused by this pollution can act as a selective force leading, in some cases, to the emergence of genetically adapted tolerant strains (Pauwels et al. 2008). Recently, local adaptation in the brown alga Ectocarpus has been studied using a proteomic approach. A copper-tolerant strain from a copper-polluted rocky beach in northern Chile was compared with a nontolerant strain from an uncontaminated coast in southern Peru. This comparison allowed the identification of a number of proteins with potential roles in copper resistance, including proteins involved in glutathione metabolism, as well as accumulation of heat shock proteins (Ritter et al. 2010). These results reinforced earlier cellular and physiological studies of Fucus, which had shown significant, inherited differences in Cu2+ tolerance during the early developmental stages. These differences were found between Fucus derived from adults at Cu2+-contaminated locations and those derived from adults at uncontaminated locations (Nielsen et al. 2003a,b). These studies focused primarily on the cellular and physiological mechanisms involved in the response to copper in different populations, and they have highlighted the importance of multi-level approaches to understanding adaptation.
In the past, acclimation and plasticity were studied by molecular biologists and physiologists in the absence of an evolutionary and genomic context. Those in the field of ecological genomics studied adaptation and selection in different environments, but lacked molecular and physiological approaches to understand how the organisms are responding plastically. Merging of these two approaches will certainly help to understand and predict the evolutionary response of organisms to changes in their environment.

Adaptation of microalgae to life in the pelagic system: diatoms as a paradigm

About half of global primary production occurs in the oceans, within the thin sunlit upper layer (the photic zone). Most of this production is due to planktonic algae and cyanobacteria, which typically live in vast populations, form highly interdependent communities and have short turnover times (between 1 and 6 days depending on species, compared with years for land plants). These features make MPOs highly responsive to environmental and climatic changes.

Diatoms are particularly successful in upwelling environments, where they are able to respond rapidly to nitrate influx and to outcompete other marine phytoplankton when both this nutrient and silicate are abundant (Estrada & Blasco 1979). Diatoms have an evolutionary history that is distinct from that of plants and green algae and that has brought together a unique combination of genes. Their ability to adapt and survive in a highly fluctuating environment appears to be key to their ecological success, yet the underlying molecular mechanisms have only recently begun to be unravelled. Analyses of complete diatom genomes and associated post-genomic studies are now providing clues to the success of this group (see e.g. Armbrust et al. 2004; Bowler et al. 2008; Nunn et al. 2009), in particular concerning their adaptive responses to dynamic environmental conditions. For instance, the proteome of the diatom Thalassiosira pseudonana (CCMP 1335) at the onset of nitrogen starvation was compared with that of nitrogen-replete cells to gain insight into the global regulation of metabolic pathways in response to nitrogen starvation (Hockin et al. 2012). Interestingly, the response of diatom carbon metabolism to nitrogen starvation differed from that of other photosynthetic eukaryotes and more closely resembled that of cyanobacteria. The relationship between central carbon metabolism and catabolic processes in the cell is now thought to have been an important factor in the adaptation of diatoms to the ocean environment, where nitrogen availability is highly dynamic.

As mentioned above for cyanobacteria, natural populations of diatoms also face a variety of nutrient stresses. In particular, iron limitation is a major factor controlling the growth of phytoplankton, including cyanobacteria, in the vast, perennially high-nutrient, low-chlorophyll (HNLC) regions. Interestingly, marine diatom species differ in their tolerance to iron limitation (Kustka et al. 2007). The recently completed genome sequences of two diatoms (Table 1) have highlighted some important differences that may account for their respective thresholds for iron limitation. A combination of nontargeted transcriptomic and metabolomic approaches revealed the metabolic reconfiguration strategy used by P. tricornutum to acclimate to low iron levels, which includes downregulation of processes that depend on iron-rich components (e.g. photosynthesis, mitochondrial electron transport and nitrate assimilation; Allen et al. 2008).

A distinctive feature of diatoms is their requirement for silicon, which they use to build their cell walls. Despite the availability of whole genome sequences, the molecular basis for the elaborate species-specific diatom silica structures has remained obscure, largely because of the lack of homology to proteins involved in silicon manipulation in other organisms. Whole-genome expression profiling has identified gene products involved in silica processing in diatoms and, unexpectedly, revealed shared mechanisms between iron and silica pathways (Mock et al. 2008).

A functional-genomics approach in the marine diatom Phaeodactylum tricornutum was used to characterize a novel protein belonging to the widely conserved YqeH subfamily of GTP-binding proteins thought to play a role in ribosome biogenesis, sporulation and nitric oxide (NO) generation (Vardi et al. 2008). The conservation of proteins containing YqeH domains across kingdoms and their prevalence in oceanic samples is an example of how studies in distantly related organisms can offer a phylogenetic perspective going beyond the classical model groups.

Diatoms provide a comparative basis for studying mechanisms that exist in more classical model systems, but, as a consequence of their evolutionary history, they are also a source of novelty. Genome-wide analysis of cell cycle genes has revealed that one of the most remarkable gene family expansions in both diatom genomes available so far concerns the cyclins, key regulators of eukaryotic cell division. The discovery of new and highly conserved cell cycle regulators in diatoms suggests that distinctive mechanisms controlling cell division have evolved in these organisms, which probably contribute to their ability to adapt and survive in constantly changing environments (Huysman et al. 2010).

In addition to their enormous genetic diversity (Alverson 2008), some diatom species such as P. tricornutum show remarkable phenotypic plasticity, including the ability to form chains or colonies and to adopt several shapes and sizes. The reasons for this behaviour have long remained elusive. Recent work combining global transcriptomics and cellular imaging has provided insights into the relevance of these phenomena to the organism's acclimation to changing environmental conditions: morphotype changes were shown to be regulated by environmental conditions, with a trend towards increased abundance of oval cells in response to stress (De Martino et al. 2011).

Ecological and evolutionary significance of complex life cycles in the ocean: clues from genomic approaches

A wide range of life cycles are found among eukaryotes, and a considerable amount of theoretical work has gone into explaining this diversity and modelling the potential advantages of each type of life cycle (reviewed in Coelho et al. 2007). Our knowledge of life cycles and the occurrence of sexual reproduction in free-living planktonic eukaryotes is mainly limited to certain diploid groups, such as diatoms, in which meiosis is naturally induced either when the size of their frustule, which decreases at each round of mitotic cell division, reaches a critically small size (Chepurnov et al. 2008) or when cells are in the presence of ciliates (Coleman 2005; Snoke et al. 2006; Catania et al. 2009). In certain haptophytes, life cycles are characterized by independent haploid and diploid phases displaying radically different morphologies (Billard 1994). In the green microalgae, complete life cycles have been described for only a few species. In contrast, the life cycles of multicellular brown and red algae are relatively well studied. Among brown algae, haploid-diploid life cycles alternating between a gametophyte and a sporophyte generation are very common. These generations can be isomorphic or heteromorphic; in the latter case, gametophytes and sporophytes are often very different morphologically (in extreme cases such as Laminaria, the gametophyte is microscopic while the sporophyte can reach several metres in length). Importantly, in many macroalgae the gametophyte and sporophyte are independent free-living organisms, probably occupying different ecological niches. In the Plantae, haploid life cycles are common among aquatic taxa, whereas diploidy tends to be correlated with the transition to drier terrestrial environments and with an increase in developmental complexity. More generally, there is a tendency across the eukaryotic tree for developmental complexity to be associated with diploid life cycles.

The reasons for the evolutionary stability of haploid-diploid life cycles are still not fully understood. One hypothesis is that such cycles allow a species to exploit two different ecological niches, each generation of the life cycle being adapted to a particular habitat. Alternatively, haploid-diploid life cycles may increase resistance to pathogens if the two generations differ in their disease resistance (Frada et al. 2008). The sequential development of two multicellular generations in each complete life cycle may also be beneficial by reducing the cost of sex. Many hypotheses have been put forward to explain the range of life cycles found in nature but, in contrast, relatively few experimental studies have been conducted to test them (Coelho et al. 2007). Moreover, in most organisms the life cycle remains poorly characterized, particularly under field conditions.

The brown algae represent an ideal group both for understanding the adaptive benefits of particular life cycle structures and for investigating the general role played by life cycle modifications in evolutionary processes. Recent genomic studies using Ectocarpus are starting to provide clues to fundamental questions such as: Are the sporophyte and gametophyte generations of equal importance in terms of persistence and prevalence in the field? Do the sporophyte and the gametophyte exploit different ecological niches? Microarray analysis of global gene expression in gametophytes and sporophytes revealed a set of genes specifically expressed in each generation, and the differential expression of metabolic pathways between sporophytes and gametophytes may reflect differences in their ecological niches and reproductive strategies (Coelho et al. 2011).

A high-throughput transcriptomic approach was used in the coccolithophore Emiliania huxleyi to unravel the molecular basis of the morphological and functional differences between the two phases of its haploid-diploid life cycle (von Dassow et al. 2009). The E. huxleyi life cycle involves alternation between calcified, nonmotile, diploid (2N) cells and noncalcified, motile, haploid (1N) cells, both phases being capable of unlimited asexual cell division. E. huxleyi 1N cells have recently been shown to be resistant to the EhV viruses that are lethal to 2N cells (Frada et al. 2008; see also below). This suggests that haploid cells might play a crucial role in the long-term maintenance of E. huxleyi populations by acting as a survival stage during the ‘boom and bust’ successions of 2N blooms. The pronounced differences between the haploid and diploid phases suggested that gene expression would differ substantially between them. Comparison of their transcriptomes identified phase-specific expression of genes involved in important cellular processes known to be specific to one phase or the other (e.g. motility in 1N cells and calcification in 2N cells; von Dassow et al. 2009).
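The kind of comparison described above, in which genes are sorted according to whether they are expressed preferentially in one life-cycle phase or generation, can be illustrated with a minimal Python sketch based on mean expression values and a log2 fold-change filter. The thresholds (`MIN_EXPR`, `MIN_LOG2FC`, `PSEUDOCOUNT`), the function name `classify_phase_specific` and the toy gene names and values are hypothetical and are not taken from von Dassow et al. (2009) or Coelho et al. (2011); a real analysis would normally add biological replication and statistical testing, which this simple filter omits.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical thresholds, chosen for illustration only.
MIN_EXPR = 10.0    # minimum normalised expression for a gene to be classified
MIN_LOG2FC = 2.0   # minimum |log2 fold change| (i.e. at least a 4-fold difference)
PSEUDOCOUNT = 1.0  # avoids log(0) for genes silent in one phase


def classify_phase_specific(
    expr: Dict[str, Tuple[float, float]],
) -> Dict[str, List[str]]:
    """Split genes into haploid-biased, diploid-biased and shared sets.

    `expr` maps a gene ID to (mean expression in 1N cells, mean expression in 2N cells).
    Genes below MIN_EXPR in both phases are left unclassified.
    """
    result: Dict[str, List[str]] = {"1N": [], "2N": [], "shared": []}
    for gene, (haploid, diploid) in sorted(expr.items()):
        if max(haploid, diploid) < MIN_EXPR:
            continue  # too weakly expressed in both phases to classify
        log2fc = math.log2((haploid + PSEUDOCOUNT) / (diploid + PSEUDOCOUNT))
        if log2fc >= MIN_LOG2FC:
            result["1N"].append(gene)
        elif log2fc <= -MIN_LOG2FC:
            result["2N"].append(gene)
        else:
            result["shared"].append(gene)
    return result


if __name__ == "__main__":
    # Invented example values, not published data.
    toy = {
        "motility_gene": (120.0, 3.0),
        "calcification_gene": (2.0, 95.0),
        "housekeeping_gene": (50.0, 55.0),
        "silent_gene": (1.0, 2.0),
    }
    print(classify_phase_specific(toy))
```

With these toy values, the first gene is called 1N-specific, the second 2N-specific, the third shared, and the fourth is dropped as too weakly expressed. A pseudocount is used so that genes completely silent in one phase still yield a finite fold change.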

Interaction with other organisms

The biodiversity of marine pathogens and parasites and their impact on the physiology of their hosts have been relatively well studied (reviewed by Gachon et al. 2010), and modern ‘omic’ techniques are now starting to unravel the molecular basis of these relationships. When plants are attacked by pathogens or herbivores, they can induce a set of defence mechanisms that restrict pathogen growth or herbivore damage. The genomic basis of plant–herbivore interactions is intensively studied in land plants (see e.g. Porth et al. 2011), but remains largely unexplored in marine organisms. An ecological and functional genomics approach has been used to gain insights into the mechanisms of defence against grazers in Fucus vesiculosus (Weinberger et al., unpublished). A cDNA microarray was developed from libraries prepared from grazed and nongrazed F. vesiculosus with the aim of exploring the genetic basis of inducible anti-herbivore defence in this ecologically important brown alga. Both chemical elicitors (oligoalginates and methyl jasmonate) and isopod grazers altered gene expression, although, remarkably, grazing provoked relatively few changes.

The extensive application of metagenomic and metatranscriptomic sequencing has provided an abundance of novel genetic information on marine viruses (reviewed in Breitbart et al. 2007). However, understanding the evolutionary impact of interactions between viruses and their hosts remains a challenge. Recently sequenced genomes of giant viruses suggest a long evolutionary history of interactions between Mamiellophyceae and their viruses, with several horizontal gene transfer (HGT) events (Moreau et al. 2010). Emiliania huxleyi is the most abundant and ubiquitous coccolithophore in the oceans and forms huge seasonal blooms (Brown & Yoder 1994). Specific viruses increase in abundance during these blooms and are closely linked to their sudden termination. Manipulation of the host's lipid metabolism plays a fundamental role in this host–virus interaction (Pagarete et al. 2011).

Eurychasma dicksonii is an obligate oomycete pathogen of marine brown algae. Analysis of transcripts from Ectocarpus infected with E. dicksonii revealed a high abundance of transcripts corresponding to Ectocarpus transposable elements, suggesting environmental, most probably stress-dependent, transcriptional regulation of transposable elements in the host (Grenville-Briggs et al. 2011). The Ectocarpus genome has also been used to identify candidate genes involved in defence against pathogens: homologues of genes involved in defence in other organisms were identified and the evolutionary pressures acting on them assessed. Interestingly, these candidate defence genes were found to exhibit high birth and death rates, and diversifying selection was found to act on their key pathogen recognition specificity domains (Zambounis et al. 2012).


MPOs represent a vastly under-explored compartment of the biosphere, despite their tremendous importance in the functioning of marine ecosystems, notably due to their position at the base of food chains. As oxygenic phototrophs, they also play a key role in gas exchange with the atmosphere and in carbon sequestration, two properties of particular interest in the context of global warming and its most probable link to the ongoing exponential increase in atmospheric carbon dioxide, a potent greenhouse gas. Understanding the genetic bases of the ability of organisms to adapt to a changing environment is a major challenge, and MPOs are ideal organisms with which to tackle this question, because the ocean is the largest ecosystem on Earth and because they face environmental constraints profoundly different from those of land ecosystems. By considerably easing the acquisition of genomes and transcriptomes from both cultures and natural populations of MPOs, the advent of NGS technologies has revolutionized the fields of marine microbiology and phycology and will probably continue to do so in the coming years, as the price per sequenced base continues to decrease. Indeed, marine biologists can now obtain extensive genetic information on MPO groups selected for their ecological and/or phylogenetic interest, rather than solely for their potential interest in biotechnology or human/animal health, or because they are model organisms. These ecologically and/or phylogenetically relevant organisms may of course prove a posteriori able to synthesize molecules of industrial or medical interest, but this is no longer a prerequisite for launching massive sequencing projects, as was often the case in the past.

For MPOs with small genomes, such as the picocyanobacteria Prochlorococcus and Synechococcus, NGS technology has allowed a large number of genomes to be sequenced (public databases currently contain at least 14 genomes for each genus; Table 1), covering a large part of the diversity within these groups at the clade level (Kettler et al. 2007; Dufresne et al. 2008; Scanlan et al. 2009). Nevertheless, the addition of further genome sequences (projects covering almost an order of magnitude more marine picocyanobacterial genomes are under way) will provide the means to address fundamental questions such as defining a ‘species concept’ based on whole-genome content rather than just a few markers, or exploring the extent of gene variability within a genetically homogeneous group of organisms (population genomics). Comparison of a large number of genomes should also allow refined evolutionary analyses and exploration of the deep nodes of phylogenetic radiations, which are often poorly resolved. Finally, one can expect a much better understanding of the ecology of these groups through elucidation of the genetic bases of their occupation of particular niches in the environment.
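Comparing strains on the basis of whole-genome content, rather than a few markers, usually starts from a presence/absence table of gene families. The Python sketch below illustrates this under the assumption that genes have already been clustered into orthologous families in an upstream step (not shown): it computes the core genome (families present in every strain), the pan genome (families present in at least one strain) and a pairwise Jaccard index of shared gene content. The strain names, family IDs, function names and the choice of the Jaccard index as a similarity measure are illustrative assumptions, not the methods of the studies cited above.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def core_and_pan(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, pan) gene-family sets for a non-empty collection of genomes."""
    family_sets = list(genomes.values())
    core = set.intersection(*family_sets)  # families present in every genome
    pan = set.union(*family_sets)          # families present in at least one genome
    return core, pan


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared families divided by all families found in either genome."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_similarity(genomes: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Jaccard index of gene content for every unordered pair of genomes."""
    return {
        (x, y): jaccard(genomes[x], genomes[y])
        for x, y in combinations(sorted(genomes), 2)
    }


if __name__ == "__main__":
    # Hypothetical strains and gene-family IDs, for illustration only.
    toy = {
        "strain_A": {"fam1", "fam2", "fam3", "fam4"},
        "strain_B": {"fam1", "fam2", "fam3", "fam5"},
        "strain_C": {"fam1", "fam2", "fam6"},
    }
    core, pan = core_and_pan(toy)
    print(sorted(core), len(pan))
    for pair, score in pairwise_similarity(toy).items():
        print(pair, round(score, 2))
```

With these toy inputs, the core genome contains two families and the pan genome six, and strains A and B share 60% of their combined gene content. As more genomes are added, the core typically shrinks and the pan genome grows, which is one simple way of quantifying gene variability within a group.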

Large genomes still represent a considerable challenge, despite the advent of NGS technologies, but current progress in the development of new sequencing methodologies and improved genome assembly strategies is expected to improve this situation in the coming years.

NGS has also allowed marine biologists to overcome the barrier of noncultivability. Thanks to whole-genome amplification using low-error polymerases (Binga et al. 2008; Rodrigue et al. 2009), it is now possible to amplify genomes from single field-collected cells, or from a small number of cells selected, for example, by flow cytometric cell sorting (Stepanauskas & Sieracki 2007; Lepère et al. 2011). Not only can this help to explain why some organisms are uncultivable and to bring them into culture, but it also allows genetic information to be obtained for organisms that are infrequently encountered or of low abundance. Such organisms may harbour key metabolic properties for the ecosystem.

Last but not least, NGS approaches will, in the foreseeable future, allow near-exhaustive surveys of genomic diversity to be carried out, not only for MPOs but also for the organisms that prey on or infect them, such as zooplankton, protists and viruses (Rusch et al. 2007; Karsenti et al. 2011). It will also be possible to analyse the dynamics of microbial communities at the genomic level across different temporal and spatial scales. One remarkable illustration of this concept is a recent study of antagonistic coevolution over a 6-month period between the ecologically important marine cyanobacterium Synechococcus and a lytic phage, during which the two organisms underwent multiple coevolutionary cycles that led to the diversification of both the host and its virus (Marston et al. 2012). The integration of biogeography, field experimentation and long-term life history studies with genomic tools should greatly advance our understanding of the adaptation of MPOs to their environment.


SMC, SA, JMC are supported by the Centre National de la Recherche Scientifique, the Université Pierre et Marie Curie, Groupement d'Intérêt Scientifique Génomique Marine, the Interreg program France (Channel)-England (project Marinexus), Agence Nationale de la Recherche (Project Bi-cycle) and the Emergence programme (UPMC). FP was funded by the French ANR program PELICAN (PCS-09-GENM-030) and the European Union program MicroB3 (EU-contract-287589) and NS by the Interreg project Marinexus.

S.M.C. is interested in brown algal reproductive biology, particularly the genetic mechanisms involved in the regulation of haploid-diploid life cycles and the evolution of sex determination. S.A. is interested in population genetics and evolutionary genomics in relation to population demography and currently works on the evolution of sex chromosomes in brown algae. N.S. has a broad interest in the ecology, taxonomy and evolution of marine algae. J.M.C. is interested in macroalgal developmental biology, particularly the genetic mechanisms regulating life cycle progression, and was coordinator of the Ectocarpus genome project. F.P. is a senior scientist at CNRS interested in the ecology, photophysiology and evolutionary history of marine picocyanobacteria.