Metagenomic investigation of the geologically unique Hellenic Volcanic Arc reveals a distinctive ecosystem with unexpected physiology

Hydrothermal vents represent a deep, hot, aphotic biosphere where chemosynthetic primary producers, fuelled by chemicals from Earth's subsurface, form the basis of life. In this study, we examined microbial mats from two distinct volcanic sites within the Hellenic Volcanic Arc (HVA). The HVA is geologically and ecologically unique, with reported emissions of CO2-saturated fluids at temperatures up to 220°C and a notable absence of macrofauna. Metagenomic data reveal highly complex prokaryotic communities composed of chemolithoautotrophs, some methanotrophs and, to our surprise, heterotrophs capable of anaerobic degradation of aromatic hydrocarbons. Our data suggest that aromatic hydrocarbons may indeed be a significant source of carbon at these sites, and motivate further research into the nature and origin of these compounds in the HVA. Novel physiology was assigned to several uncultured prokaryotic lineages; most notably, a SAR406 representative is attributed a role in anaerobic hydrocarbon degradation. This dataset, the largest to date from submarine volcanic ecosystems, constitutes a significant resource of novel genes and pathways with potential biotechnological applications.


Introduction
Hydrothermal vents (HVs) constitute biogeochemical environments characterized by emission of reactive gases, dissolved elements and sharp thermal and chemical gradients (Schrenk et al., 2003;Kristall et al., 2006). Microbes have adapted to survive these extreme conditions, often withstanding extremely high temperatures (e.g. in chimneys reaching maximum temperature bursts of 464°C), salinity, heavy metals and acidic pH (Brock and Freeze, 1969;Pakchung et al., 2006;Perner et al., 2014). HVs provide a wide range of microhabitats for chemolithoautotrophic microorganisms. In these environments, microorganisms are the primary producers by fixing inorganic carbon using chemical energy obtained through the oxidation of reduced inorganic compounds thus converting the geothermally derived energy into microbial biomass (Reysenbach and Shock, 2002).
Despite the vital role of microorganisms in these ecosystems and their potential biotechnological applications, key questions about their diversity and metabolic capabilities remain understudied simply because many hydrothermal vent environments remain largely unexplored (Dando et al., 2000;Reysenbach and Shock, 2002;Campbell and Cary, 2004;Mehta and Baross, 2006;Proskurowski et al., 2008;Brazelton and Baross, 2009;Byrne et al., 2009;Querellou et al., 2010;Xie et al., 2011;Bourbonnais et al., 2012;Kawaichi et al., 2013;Kilias et al., 2013;Cao et al., 2014;Perner et al., 2014;Stokke et al., 2015). Geologically unique and poorly studied HVs exist in the Hellenic Volcanic Arc (HVA), eastern Mediterranean, where volcanism and hydrothermal activity occur through thinned continental crust associated with the African-Eurasian subduction zone (Dando et al., 2000;Kilias et al., 2013). Major hydrothermal systems are found along the HVA at Methana, Milos, Santorini, Nisiros and Kos (Dando et al., 2000). Venting gases in these areas contain substantial amounts of CO2 (95%), H2 (9%), H2S (3%) and CH4 (1.8%) providing an ideal chemical environment for chemolithoautotrophic primary production (Dando et al., 1995;2000). The Santorini volcano is world famous because of its recent explosive eruption (∼3600 years ago), which was one of the largest known volcanic events in historical time. The Santorini volcanic field extends for 20 km as a line of more than 20 submarine cones. The largest of them is Kolumbo, a 3 km diameter cone with a 1500 m wide crater situated 505 m below sea level.
In 2006, chimneys emitting fluids with varying temperatures of up to 220°C, as well as lower temperature chimneys and vents (with fluids up to 70°C), were discovered in the northern part of the Kolumbo crater (Sigurdsson et al., 2006). This volcanic discharge of CO2 contributes to local reductions in pH and serves as a natural experiment in ocean acidification (Hall-Spencer et al., 2008;Tunnicliffe et al., 2008). Additionally, the enclosed geometry of the crater impedes vertical mixing and results in a stably stratified density gradient and accumulation of acidic water (Carey et al., 2013). As a probable consequence, there is an absence of macrofauna (primary consumers and so on) that are typically associated with many HV sites. Indeed, the only thriving biological entities are of bacterial and archaeal origin (Kilias et al., 2013).
The exterior of most chimneys and large areas of the seabed around Kolumbo were covered with white/grey and reddish/orange microbial mats. This newly discovered hydrothermal vent field is gaining increasing interest for potential biotechnological exploitation and, recently, a detailed bio-geochemical description of this environment was undertaken (Kilias et al., 2013). 16S rRNA gene pyrosequencing analysis demonstrated the presence of highly diverse microbial communities and revealed the dominance of archaeal sequences closely related to Nitrosopumilus maritimus strongly suggesting that nitrification should play a key role in this environment (Kilias et al., 2013).
In the present study, we used shotgun metagenomic sequencing of microbial communities from two different sites located within the same volcanic complex (HVA; Fig. 1) in order to decipher their metabolic potential and environmental adaptation strategies. This comparison reveals the magnitude of phylogenetic and metabolic diversity that can be found at the Santorini volcanic complex. The first sample (hereafter white mat) was taken from the CO2-rich environment of the Kolumbo volcano with temperatures varying from 60 to 70°C (Fig. 1C), whereas the second sample (hereafter red mat) was taken from the CO2-poor environment of the Santorini caldera with temperatures close to 22°C (Fig. 1B). We present a comparative metagenomic analysis of the two samples, as well as comparison with other publicly available datasets from related environments. Insights from taxonomic and functional observations from these microbial communities suggest unexpected physiological diversity including, notably, the use of aromatic hydrocarbons as an important carbon and energy source.

Results and discussion
White mat and red mat samples were collected from the HVA approximately 10 km apart, proximal to the Kolumbo and Santorini volcanic chambers, respectively, and processed as discussed in Supplementary Methods. Assembly and annotation yielded 2 928 214 and 6 508 507 contigs totalling 1445 and 3305 Mb of sequence encoding 3 874 122 and 8 540 156 coding sequences (CDS) for white mat and red mat, respectively (Table S1).

Community composition and binning
The taxonomic composition of assembled metagenomic sequences was assessed by phylogenetic analysis of 16S rDNA sequences and protein-coding marker genes (Supplementary Methods). Five hundred forty-two contigs containing 16S rRNA genes (Fig. S1) and 427 contigs containing phylogenetic marker proteins (Table S2) were used. 16S rRNA gene analysis of the two samples identified representatives of 58 archaeal and bacterial phyla and candidate divisions, whereas 30 phyla were detectable by protein phylogeny (Table S3). The results of phylogenetic analysis based on 16S rRNA genes and protein markers (Figs 2 and 3, Datasets S1, S2, S3, Table S3) were largely concordant and indicated that both samples were dominated by representatives of four bacterial phyla, Proteobacteria, Bacteroidetes, Chloroflexi and Planctomycetes. A clear difference in community structure between the two mats could be seen in the dominant taxa. The most abundant lineages in the white mat were Proteobacteria, including methylotrophic γ-proteobacteria from Methylococcales order and ε-proteobacteria from Campylobacterales order (Figs 2 and 3, Datasets S1, S2, S3, Table S3). Both lineages were nearly undetectable in the red mat. Among the most abundant individual ε-proteobacterial taxa in the white mat were some typical inhabitants of hydrothermal vents, such as the nitrate-reducing chemoautotroph Nitratiruptor, the sulfur-oxidizing chemoautotroph Sulfurimonas and the CO-oxidizing heterotroph Sulfurospirillum (Nakagawa et al., 2005;Callac et al., 2015) (Datasets S2, S3). In contrast to well-characterized ε-proteobacterial lineages, little is known about thermophilic γ-proteobacterial methylotrophs living in hydrothermal vent systems (Hirayama et al., 2014). The functional implications of their presence are discussed below. As expected, other thermophilic lineages, such as Thermotoga, Caldiserica and Acetothermia (formerly OP1) were found only in the white mat living at 60-70°C.
Several bacterial lineages were detected in both samples, albeit at different abundances. These include iron-oxidizing ζ-proteobacteria, sulfate-reducing δ-proteobacteria of Desulfobacterales order and representatives of phyla Planctomycetes, Verrucomicrobia, Ignavibacteria, Bacteroidetes (order Flavobacteriales), as well as candidate phylum Parcubacteria (previously known as OD1). The representatives of these shared lineages from both the white and red mats were quite divergent (Figs 2 and 3, Datasets S1, S2, S3, Table S3), underscoring the difference in community composition between the two samples. Archaeal lineages were not abundant in either sample (Figs 2 and 3). Among the Archaea, 16S rRNA genes from thermophilic Euryarchaea (mainly Thermoplasmata and Thermococci) were identified only in the white mat, which is consistent with the higher temperatures reported for this sample. Sequences similar to the ammonia-oxidizing Nitrosopumilus (Thaumarchaeota Marine Group I) were detected in both mats but at low abundances. Contrarily, a high proportion of sequences akin to Nitrosopumilus has been previously reported in polymetallic spires located in the same area (Kilias et al., 2013). This can be attributed to differences in sample type (i.e. loosely floating seafloor white mat versus firmly attached mat on gas chimney surface) and adopted sequencing methods (i.e. PCR amplicon versus metagenomic).
Figure caption. Phylogenetic tree of 16S assemblies and protein bins from red and white mats. Phyla, phylum-level candidate divisions, and major subdivisions of large phyla form the leaves of this 'near maximum likelihood' tree of 16S rRNA assemblies from each mat. Nodes with >80% bootstrap are indicated by a small black circle. Archaeal phyla are coloured green; Bacterial ones are blue. Branches coloured in brown mean that the associated taxon has at least one cultured representative. The four concentric rings of coloured circles on the outside of the tree indicate the normalized number of 16S assemblies and protein bins recovered for the adjacent taxon. Pink, red mat 16S assemblies; light grey, white/grey mat 16S assemblies; dark red, red mat protein bins; dark grey, white/grey mat protein bins.
The contigs from the most abundant and well-assembled lineages were used as seeds to recruit additional sequences by analysis of their tetranucleotide composition using ESOM (Ultsch and Moerchen, 2005). A depiction of these results can be viewed in Fig. 3 and Dataset S3, while general binning statistics for white mat and red mat bins are provided in Table S4. Sizable bins were generated for several candidate lineages with no cultured representatives [e.g., Marinimicrobia, Aminicenantes (formerly OP8) and Parcubacteria (formerly OD1)], and it was possible to assign novel physiology to some of these, as discussed below.
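As an illustration of the kind of compositional signal used for binning, the following minimal sketch computes a tetranucleotide frequency vector for a single contig. It is not the pipeline used in this study: the function name, the handling of ambiguous bases and the absence of reverse-complement merging are assumptions made for the example, and the ESOM training step itself is not shown.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides in a fixed order, so that every contig
# yields a vector with the same layout.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(sequence):
    """Return the relative frequencies of the 256 tetranucleotides.

    Hypothetical helper for illustration only. Windows containing characters
    other than A, C, G or T (for example N) are ignored (an assumption).
    """
    seq = sequence.upper()
    # Count every overlapping window of length 4.
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    valid = {k: counts.get(k, 0) for k in TETRAMERS}
    total = sum(valid.values())
    if total == 0:
        # Sequence too short or made only of ambiguous bases.
        return [0.0] * len(TETRAMERS)
    return [valid[k] / total for k in TETRAMERS]


vector = tetranucleotide_frequencies("ACGTACGTTTGACCA")
print(len(vector))
```

Vectors of this kind, one per contig or contig fragment, are the sort of input that a self-organizing map can cluster; contigs with similar composition tend to fall close together on the map.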

Functional composition
Functional properties of the white mat and red mat communities were investigated and contrasted using KEGG Orthology (KO) or Pfam assignments. Significant differences in abundance were observed between the two samples for broad categories like cell motility and chemotaxis, replication and repair, signal transduction and more (data not shown). More specifically, the white mat sample displayed higher abundance of genes associated with mismatch repair and homologous recombination, which is consistent with previous findings (Xie et al., 2011) showing that microbial communities inhabiting extreme environments encode extensive DNA repair mechanisms to cope with environmental stress like high temperature, or high concentrations of H2S and other sulfur compounds, trace metals, etc. (Pruski and Dixon, 2003). The white mat is also characterized by greater abundance of genes associated with signal transduction (two-component systems), chemotaxis and motility, akin to previous findings for hydrothermal vent systems (Xie et al., 2011). The relative abundance of these functions in the white mat could be the result of adaptation to steep chemical and thermal gradients due to diffuse venting from the Kolumbo volcano chimneys.
As submarine hydrothermal vents are typically dominated by chemolithoautotrophic communities that fix inorganic carbon using energy derived from the oxidation of reduced inorganic compounds that are plentiful in hydrothermal vent fluids (e.g. reduced sulfur compounds, hydrogen, methane) (Corliss et al., 1979;Alain et al., 2002), we examined these processes by analyzing marker genes for key enzymatic steps of CO2 fixation and of sulfur, nitrogen and methane metabolism. Where possible, metabolic processes were attributed to specific taxa or lineages in the red mat or white mat samples, based on detection of marker genes in Emergent Self-Organizing Maps (ESOM)-delineated bins. Furthermore, our datasets were contrasted with publicly available metagenomic datasets from related environments, such as other hydrothermal vents, hydrothermal vent plumes or deep subsurface samples (Table S5). Normalization was performed as described in Materials and Methods.

Carbon source
Autotrophic carbon fixation. Carbon fixation is a keystone biosynthetic process used by hydrothermal vent microbes to assimilate inorganic carbon (i.e. carbon dioxide) into biomass. In this study, we analyzed all six pathways known to perform CO2 fixation (Cody et al., 2001;Berg, 2011). Of these, evidence for the four major pathways was detected in both samples: (i) reductive tricarboxylic acid (rTCA) cycle, (ii) reductive acetyl coenzyme A (CoA) pathway (or Wood-Ljungdahl pathway), (iii) reductive pentose phosphate or Calvin-Benson-Bassham (CBB) cycle, and (iv) 3-hydroxypropionate/malyl-CoA bicycle. Two additional pathways, which are only described in hyperthermophilic archaea to date, were not identified (Huber et al., 2008;Berg et al., 2010). These six pathways differ in terms of their energy and reducing equivalent requirements, oxygen sensitivity and more (Berg, 2011).
The rTCA cycle is known to play an important role in CO2 fixation in hydrothermal vent environments (Campbell and Cary, 2004;Hugler et al., 2005;Nakagawa and Takai, 2008). Markers of the rTCA cycle [ATP citrate lyase (acl) (EC:2.3.3.8) and pyruvate:ferredoxin oxidoreductase (EC:1.2.7.1)] were detected in both metagenome samples but overrepresented in the white mat (Fig. 4A, Fig. S3b). This likely reflects the dominance of sulfur- and hydrogen-oxidizing ε-proteobacterial members in the white mat, where diffuse venting results in > 99 weight % CO2 (Fig. 4A). Hydrothermal vent ε-proteobacteria are known to employ the rTCA cycle for CO2 fixation (Hugler et al., 2005). Correspondingly, a putative ATP-citrate lyase gene was detected in a white mat bin for an unknown ε-proteobacterium from the order Campylobacterales. The rTCA cycle requires significantly less ATP and fewer reducing equivalents than other cycles, which is a potential advantage in an energy-limited environment.
Markers of the reductive acetyl-CoA (Wood-Ljungdahl) pathway were also detected in both samples, including carbon monoxide dehydrogenase (CODH; EC:1.2.99.2) and acetyl-CoA synthase (acsB; EC:2.3.1.169). These markers were found in higher proportions (>2 fold) in the red mat sample compared with the white mat (Fig. 4A, Fig. S3c). Carbon monoxide dehydrogenase is the key enzyme of the reductive acetyl-CoA pathway, which may couple CO oxidation to sulfate reduction, or reduce CO2 to either acetate (acetogenesis) or methane (methanogenesis) (Wu et al., 2005;King and Weber, 2007). This pathway has previously been reported in anaerobic ammonia-oxidizing Planctomycetes, sulfate-reducing bacteria and in autotrophic Archaeoglobales (Berg, 2011). In this study, markers of the Wood-Ljungdahl pathway were identified in a red mat bin for candidate phylum OP8 (Aminicenantes), as well as in a white mat bin for unclassified δ-proteobacteria (NaphS2-like) (Fig. 4B). Due to the high demand for metals and coenzymes (tetrahydropterin and cobalamin) and the strict anaerobiosis requirements of key enzymes (CO dehydrogenase/acetyl-CoA synthase), occurrence of this pathway is restricted to specific niches despite its high energetic efficiency (Berg, 2011). In the expanded comparison, this pathway is clearly absent from all the plume samples (likely due to oxygen sensitivity) and may be a dominant autotrophic CO2 fixation pathway in the subsurface samples (Fig. S4). The reductive acetyl-CoA pathway may also serve in energy conservation by generating an electrochemical gradient (Berg, 2011).
Although the CBB pathway was not the most prevalent mode of CO2 fixation in the studied environment, marker genes of this pathway, such as ribulose-bisphosphate carboxylase (RubisCO; EC:4.1.1.39) and phosphoribulokinase (PRK; EC:2.7.1.19), were detected in both samples (Fig. S3a). Some of these markers were captured in ζ-(Mariprofundus-like) and γ-proteobacterial (Halorhodospira-like) bins from red mat (Fig. 4), although these taxonomic assignments are tentative as they are not colocalized with cognate genes. Details of specific RubisCO genes examined using high scoring BLASTp hits can be found in Supplementary Materials. The CBB cycle is known to be ubiquitous and energetically expensive with an upper temperature limit of ∼70 to 75°C due to heat instability of intermediate products. Comparing the expanded set of metagenomic samples, the CBB cycle is seen as the predominant autotrophic CO2 fixation pathway in the vent samples (Spillway iron-oxidizing mat and Lost City) and the plume samples (Fig. 5, Fig. S4), but less so in the subsurface samples. The occurrence of this pathway in the higher temperature Lost City hydrothermal vent samples (with a reported temperature of ∼90°C) is a curiosity; however, unknown specifics about the sample site might provide a suitable explanation.
Methanotrophy. In addition to autotrophy discussed above, evidence of methanotrophy (methane as sole carbon and energy source) is presented by functional marker data (Fig. 4B), primarily in the white mat sample, and possibly restricted to a few γ-proteobacterial Methylococcaceae lineages. Specifically, the marker gene, methanol dehydrogenase (Mdh1) (Lau et al., 2013), was found predominantly in the Methylococcaceae bins from the white mat (Fig. 4B). Correspondingly, a small amount of methane (0.26%) was detected in gas samples from the Kolumbo hydrothermal vents (Carey et al., 2013). Although γ-proteobacterial methanotrophs are known to be aerobic and the sampled environment is primarily anaerobic, anaerobic oxidation of methane may be proposed using oxygen produced from the dismutation of nitric oxide into nitrogen and oxygen gas during denitrification, as previously described for Methylomirabilis oxyfera (Ettwig et al., 2010). Evidence for another type of anaerobic methanotrophy (using reverse methanogenesis) has been found in the Guaymas Basin hydrothermal vents as represented by two archaeal methane oxidizers (ANME-1 and ANME-2) (Teske et al., 2003). However, 16S rRNA gene data do not support their presence in the Kolumbo/Santorini samples. Marker genes for methanogenesis (McrABG) were negligible (concurring with absence of acetoclastic and hydrogenotrophic methanogens in either sample) as methanogens are known to be inhibited by sulfate (Winfrey and Zeikus, 1977).

Anaerobic degradation of aromatic hydrocarbons.
Unexpectedly, both samples showed evidence for chemoheterotrophy based on the presence of markers for anaerobic degradation of low molecular weight aromatic hydrocarbons. These include enzymes for anaerobic transformation of aromatic compounds by radical addition of fumarate to methyl or ethyl groups on the aromatic ring (e.g. benzylsuccinate synthase), aromatic hydroxylation (e.g. ethylbenzene dehydrogenase) and conversion of the central intermediate, benzoyl-coenzyme A, to acetyl-CoA, by reductive ring cleavage (e.g. benzoyl-CoA reductase) (Heider, 2007;An et al., 2013a,b). Examining their occurrence in all available metagenomes within the IMG/M database (Markowitz et al., 2014), these anaerobic hydrocarbon degradation marker genes were detected primarily in oil- or coal-impacted environments, natural oil seeps, peatlands and contaminated sites. Besides these highly impacted areas, some marker genes were also detected in the deep marine and terrestrial subsurface samples in our subset, but not in the other hydrothermal vent or plume environments (Fig. 5, Fig. S5). Interestingly, the occurrence of these markers in the red mat was four times higher compared with the white mat (Fig. 4, Fig. S5). A complete operon for the anaerobic degradation of benzoyl-CoA was identified in the red mat and attributed to the candidate phylum SAR406 [Marinimicrobia or Marine Group A; (Rinke et al., 2013)] (Fig. S6), for which little information is available. SAR406 representatives are prevalent in oxygen-deficient marine systems (e.g. oxygen minimum zones, anoxic basins) (Gordon and Giovannoni, 1996;Allers et al., 2013), and they were recently proposed to be involved in the marine sulfur cycle (Wright et al., 2014). Our findings suggest a previously undescribed ecological role for members of this candidate phylum.
Although no markers were identified, the δ-proteobacterial lineage NaphS2 detected in the white mat sample (Dataset S3) is a known anaerobic degrader of naphthalene (DiDonato et al., 2010).
Additional complementary evidence supporting the presence of anaerobic hydrocarbon degradation markers comes from the observation of shared CRISPR-Cas system spacers between the HVA samples and engineered environments. Specifically, we observed that a significant percentage of the spacer hits from both red and white mat (6.1% and 11.8%, respectively) were against hydrocarbon-impacted environments including oil seeps, subsurface oil reservoirs and anthropogenically contaminated sites (Dataset S7). At this stage no phage genomes were reconstructed and, because CRISPR arrays assemble poorly, we were restricted to very short scaffolds carrying limited taxonomic information. Thus, we were unable to clearly define the source of the spacers identified in this study.
Given these observations, discovering a potential source of these compounds in the Santorini-Kolumbo volcanic field is important. In a previous study at a Guaymas Basin hydrothermal vent, seabed hydrocarbons were attributed to the hydrothermal alteration of organic sediment originating from highly productive surface waters [i.e. organic matter is pyrolyzed to petroleum-like aliphatic and aromatic hydrocarbons, short-chain fatty acids, ammonia, and methane; (Teske et al., 2014)]. Other studies suggest an abiotic origin of hydrocarbons from thermogenesis occurring deep in the earth's crust (using CO2 and mantle carbon) that could subsequently be dissolved and vented out with hydrothermal fluids (Proskurowski et al., 2008;Konn, 2009;Konn et al., 2011;Li et al., 2012).
Both of these theories could explain the source of monoaromatics in the Santorini hydrothermal area. However, our post-hoc attempts to detect alkylated monoaromatic hydrocarbons in mat samples after 6 years of cold storage were unsuccessful, though the absence of such volatile compounds could be due to evaporative loss during sampling and/or storage. It should also be stressed that the source of monoaromatics cannot be attributed to anthropogenic pollution, as this should also result in high levels of polycyclic aromatic hydrocarbons (PAHs). Our measurements in the red mat samples revealed total PAH concentrations (sum of 16 PAH members) to be lower than 100 ng g−1 dry weight (unpublished results), suggesting that the impact of pollution was rather limited. Clearly, a comprehensive, accurate sampling and analysis of monoaromatic compounds and their metabolites, along with transcriptomic characterization of the marker genes, would be the most judicious approach to assess whether their presence and activities are germane in the Santorini-Kolumbo volcanic field.

Energy source
Sulfur cycle. Sulfur compound oxidation and reduction are dominant strategies for energy production by hydrothermal vent microbes, and presumably hydrogen sulfide is a major electron donor used by resident chemolithotrophs. Active sulfide-sulfate structures have been mapped at the north part of the Kolumbo hydrothermal field where the streams of white mat are located (Kilias et al., 2013). Sulfur oxidation (Sox) genes indicative of neutrophilic sulfur oxidizing chemoautotrophs, capable of oxidizing various reduced sulfur compounds to sulfate (Kelly et al., 1997;Friedrich et al., 2001), were at least four times more prevalent in the ε-proteobacteria-dominated white mat (Fig. 4A, Fig. S7), where they can likely make quick use of electron donors supplied by reduced vent fluids or from the chemical oxidation of metal sulfides (Schippers and Sand, 1999). Sox genes were identified in white mat bins for Campylobacteraceae and Methylococcaceae, as well as in a red mat bin for an unknown γ-proteobacterium (Fig. 4B, Supplementary Results and Conclusions, Fig. S7).
Marker genes for dissimilatory anaerobic sulfate reduction to sulfide were found in similar proportions in both samples. These genes were captured in red mat bins for Bacteroidetes and Halorhodospira and in a white mat bin for a NaphS2-like δ-proteobacterium (Fig. 4). Overall, the presence of marker genes for sulfur oxidation and dissimilatory sulfate reduction suggests that such processes may be occurring in all the metagenomic samples contrasted in our analyses (Fig. 5).
Nitrogen cycle. Previous reports from the Kolumbo crater hydrothermal vent field have shown that the nitrogen cycle plays a crucial role in the establishment of the geochemical environment of the field (Kilias et al., 2013). High ammonium and nitrate concentrations were previously detected in the Kolumbo hydrothermal vent field (Kilias et al., 2013), and have also been reported in the Guaymas Basin, the Okinawa Trough Backarc Basin and the Endeavor segment of the Juan de Fuca Ridge (Ishibashi et al., 1995;Mehta et al., 2003;Bourbonnais et al., 2012). These inorganic nitrogen compounds are important terminal electron acceptors for chemolithoautotrophic growth in hydrothermal vents (Ishibashi et al., 1995;Mehta et al., 2003;Nakagawa and Takai, 2008). The rapid consumption of ammonium by chemoautotrophs has been demonstrated in the hydrothermal vent plumes of the Juan de Fuca Ridge (Lam et al., 2004) as well as in the Kolumbo geothermal gradient (Kilias et al., 2013). Evidence of nitrification, denitrification and nitrogen fixation in hydrothermal vent systems has been previously reported (Mehta et al., 2003;Bourbonnais et al., 2012;Voss et al., 2013).
Both red and white mat samples are replete with genes required for denitrification (Figs 4,5,S8,S9), suggesting that nitrate could be an important electron acceptor, possibly coupled to sulfur oxidation, as a significant energy-generating pathway for chemolithoautotrophic growth in these environments. The dominant fate of nitrate in the red and white mat is likely denitrification (based on proportions of the NirKS markers), although some minimal evidence of dissimilatory nitrate reduction to ammonium (DNRA) (NrfAH markers) is also captured in both mat samples (Fig. 4, Fig. S9). NirKS genes were found in red mat bins for Bacteroidales, Gemmatimonadetes and SAR406, and in white mat bins for Bacteroidales, Campylobacterales and SUP05-like γ-proteobacteria (Figs 4B, S9) (see Supplementary Results and Conclusions for gene details).
Other electron donors. In addition to the oxidation of reduced sulfur and nitrogen compounds, phylogenetic markers identified taxa known to be involved in hydrogen oxidation (by way of Ni-Fe hydrogenases), iron oxidation (protein marker-based evidence of potentially iron-oxidizing members within Mariprofundales and Nitrospira spp.; Fig. 3) and potential manganese or other donor oxidation (by multicopper oxidases) in both red and white mat samples (data not shown). Notably, potential iron respiration genes were found in the Aminicenantes (OP8) bin in the red mat (Supplementary Results and Conclusions). Therefore, a large and diverse range of electron donors is likely used in these samples.
Overall, this investigation provides new insights into the spectrum of possible physiological roles of hydrothermal vent microorganisms. Evidence from the analysis of our HV samples indicates that autotrophic carbon fixation proceeds mainly through the rTCA and CBB cycles, which appear to be driven primarily by the oxidation of reduced sulfur compounds through the SOX-dependent pathway, using either oxygen or nitrate (derived from ammonia oxidation) as terminal electron acceptors. Thus, the availability of reduced sulfur compounds, oxygen and nitrate appears to be a key factor structuring the microbial community, as reflected by the community composition of the 16S and binning results presented here. On the basis of the genomic organization of the key genes of the carbon fixation, sulfur oxidation and ammonia oxidation pathways contained in the metagenomic sequences, both obligate and facultative autotrophs appear to be present and contributing to biomass production. Moreover, the observation of a complete operon for the anaerobic degradation of benzoyl-CoA in the red mat, attributed to the candidate phylum SAR406, not only suggests a previously undescribed ecological role for members of this candidate phylum, but also opens up new avenues of exploration into the origin of aromatic hydrocarbons in the HVA ecosystem and their role in shaping microbial community structure and function.

Sample collection
Microbial mat samples were collected during the sampling expedition EN419 undertaken in

DNA isolation and library construction
Approximately 50 μg of high molecular weight DNA (>23 kb) was isolated from each microbial mat following the detailed protocol described by Brady (Brady, 2007). DNA samples were shipped on dry ice to Joint Genome Institute, Department of Energy, USA, for library construction (see Supplementary Information 1.3).

Sequencing, assembly and annotation
Sequencing was performed using two platforms, Roche 454 and Illumina (GAIIX generating 114 bp reads and HiSeq2000 generating 151 bp reads). Roche 454 pyrosequencing generated 1 429 091 reads (507 Mb) for the red mat and 299 975 reads (114 Mb) for the white/grey mat. The Illumina platform produced 45 337 178 GAIIX reads and 304 325 806 HiSeq2000 reads for the red mat with a total of 51 Gb. For the white mat 49 227 146 GAIIX reads and 304 325 806 HiSeq2000 reads totaling 53 Gb were generated.
The assembly of the 454 data was performed with Newbler v2.4, whereas the Illumina reads were assembled with IDBA_UD v1.1.1. Contigs from both assemblies were merged together using MeGAMerge (Scholz et al., 2014) (see Supplementary Information 1.4). Assembled sequences were annotated using the IMG/M metagenome analysis system (Markowitz et al., 2014) (see Supplementary Information 1.5 - Table S1). Both datasets are available in the IMG/M system (accessions 3300002231 and 3300002242 for red and white mat, respectively).

Tree construction for 16S rRNA gene sequences found on contigs
Full-length 16S rRNA gene sequences (>1250 bp) were merged with the representative set of all 16S rRNA gene sequences used by Rinke and colleagues (2013) and treed using the SILVA-aligner and FastTree programs; shorter 16S rRNA gene sequences (>500 bp) were 'Quick-Added' using ARB (see Supplementary Information 1.6).
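The two length classes described above can be sketched as a simple partition step. This is a minimal illustration only: the function name, the dictionary input and the treatment of sequences of exactly 1250 bp as 'shorter' are assumptions, and the actual alignment and tree building were done with the SILVA aligner, FastTree and ARB as stated.

```python
# Minimal sketch (assumed names): split assembled 16S rRNA gene sequences
# into the full-length set used for tree building and the shorter set
# that is added to the tree afterwards.
FULL_LENGTH_MIN_BP = 1250  # threshold for "full-length" sequences (from the text)
SHORT_MIN_BP = 500         # threshold for the shorter, later-added sequences


def partition_by_length(sequences):
    """Return (full_length, shorter) dictionaries keyed by sequence ID."""
    full_length, shorter = {}, {}
    for seq_id, seq in sequences.items():
        n = len(seq)
        if n > FULL_LENGTH_MIN_BP:
            full_length[seq_id] = seq
        elif n > SHORT_MIN_BP:
            shorter[seq_id] = seq
        # sequences of 500 bp or less are not used
    return full_length, shorter


# Hypothetical toy input; real sequences would be read from the assembly.
example = {
    "contig_a": "A" * 1300,
    "contig_b": "A" * 800,
    "contig_c": "A" * 300,
    "contig_d": "A" * 1250,
}
full_set, short_set = partition_by_length(example)
```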

Tree construction for concatenated protein marker genes
A reference set of 56 single-copy protein marker genes from 689 isolate and single-cell genomes with known taxonomy was used to create Hidden Markov Models (HMMs) that enabled extraction, alignment and phylogenetic tree construction of the protein marker genes found on metagenomic contigs (see Supplementary Information 1.6).

Functional abundance comparisons
Gene counts for individual KEGG Orthology terms and Pfam domains for each metagenome were retrieved from the IMG/M v4.0 management system (Markowitz et al., 2014). To enable comparison, these counts were normalized by dividing by the mean count of 10 universal ribosomal proteins (conserved in all three domains of life; Table S2), taken as an estimate of the number of populations in each sample. We also performed normalization using individual ribosomal protein markers, as well as length-normalized counts (obtained by dividing the sum of the lengths of the proteins by the expected length of the marker protein), to account for any possible inflation in counts due to fragmentation of ORFs. All approaches yielded identical results and inferences.
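The normalization described above amounts to a simple ratio, sketched below. The function names, dictionary layout and all numbers are hypothetical placeholders chosen for illustration; they are not values from this study or calls to the IMG/M system.

```python
from statistics import mean


def normalize_counts(function_counts, marker_counts):
    """Divide each function count by the mean count of the marker proteins.

    The mean marker count serves as an estimate of the number of
    populations in the sample.
    """
    if not marker_counts:
        raise ValueError("at least one marker count is required")
    population_estimate = mean(marker_counts.values())
    if population_estimate == 0:
        raise ValueError("mean marker count must be non-zero")
    return {term: count / population_estimate
            for term, count in function_counts.items()}


def length_normalized_count(fragment_lengths, expected_length):
    """Length-normalized count of one marker: summed lengths of the detected
    proteins divided by the expected marker length, so that several ORF
    fragments of one gene are not counted as several genes."""
    return sum(fragment_lengths) / expected_length


# Hypothetical example values (not data from the metagenomes).
example_functions = {"function_1": 120, "function_2": 30}
example_markers = {"marker_1": 58, "marker_2": 62, "marker_3": 60}
normalized = normalize_counts(example_functions, example_markers)
fragmented = length_normalized_count([100, 150, 50], expected_length=150)
```

Under this scheme a normalized value of 2.0 would be read as roughly two copies of the function per estimated population.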

CRISPR-Cas system spacer information
We used a combination of three CRISPR-Cas spacer detection tools: piler-CR (Edgar, 2007), CRT (Bland et al., 2007) and CRASS (Skennerton et al., 2013). Piler-CR and CRT were run against the assembled metagenomic data, whereas CRASS was run against the unassembled reads. Using these tools, we generated spacer databases of 44 402 and 252 238 spacers for the red mat and the white mat metagenomes, respectively. After removing redundant spacers, we used BLASTn to map the spacers to the red and white mat samples as well as to an additional 3695 metagenomes found in IMG/M (see Supplementary Information 1.8, 1.9).
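The redundancy-removal step can be sketched as follows. The exact criterion used is not stated here, so this sketch assumes that spacers are redundant when they are identical (ignoring case) or are reverse complements of one another; the function names are illustrative.

```python
# Minimal sketch (assumed redundancy criterion): collapse spacers that are
# identical or reverse-complementary before mapping them with BLASTn.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_form(seq):
    """Pick one strand-independent representative of a spacer."""
    forward = seq.upper()
    return min(forward, reverse_complement(forward))


def deduplicate_spacers(spacers):
    """Keep the first occurrence of each spacer, in input order."""
    seen = set()
    unique = []
    for spacer in spacers:
        key = canonical_form(spacer)
        if key not in seen:
            seen.add(key)
            unique.append(spacer.upper())
    return unique


# Hypothetical toy spacers; "AAACGT" is the reverse complement of "ACGTTT".
toy_spacers = ["ACGTTT", "acgttt", "AAACGT", "GGGCCA"]
unique_spacers = deduplicate_spacers(toy_spacers)
```

Pooling the output of the three detection tools before this step would also collapse spacers reported by more than one tool.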

Fig. S1. Workflow for analysis of assembled 16S rRNA gene sequences.
Fig. S2. Viral contig assignments based on gene composition for (a) red mat and (b) white mat.
Fig. S3. Enrichment of genes normalized using IMG estimation utility for major carbon fixation pathways: (a) CBB cycle, (b) rTCA cycle, and (c) reductive acetyl-CoA pathway. Diagrams are based on KEGG pathway maps. Enzyme (EC) or COG classification numbers for each step are included in boxes. Box color indicates the total gene count of each enzyme, with darker red displaying higher gene counts and green representing lower gene counts according to the legend provided. The red mat metagenome is always shown on the left and the white mat metagenome on the right.
Fig. S6. Schematic depiction of the complete benzoyl-CoA degradation operon discovered in the red mat bin for SAR406.
Fig. S7. Sox-dependent and Sox-independent sulfur oxidation pathways identified in the present study. Genes are normalized using IMG estimation utility. SO4 2-: sulfate; SO3 2-: sulfite; S 0: sulfur; S2O3 2-: thiosulfate; S 2-: sulfide; APS: adenylylsulfate; SOX: sulfur oxidation multi-enzyme complex. Box color indicates the total gene counts for each enzyme, with color coding as in Fig. S5. White mat gene counts are shown on the left and red mat gene counts on the right. Genes encoding the putative sulfide quinone oxidoreductase (sqr) and the flavocytochrome c/sulfide dehydrogenase (FccAB) were not identified. These enzymes catalyze the oxidation of sulfide to sulfur, with FccAB often associated with low sulfide concentrations (Mussmann et al., 2007; Mussmann et al., 2011). On the other hand, sulfite reductase (EC:1.8.1.2) was found at high frequencies in both the white and red mat metagenomes. This enzyme is known to catalyze the oxidation of sulfide to sulfite. This may reflect the adaptation of different microbial mat communities to varying sulfide concentrations. The elemental sulfur formation reaction catalyzed by sulfide quinone oxidoreductase or FccAB can be bypassed and direct oxidation to sulfite can be performed. There was further evidence for sulfur oxidation through the reverse dissimilatory sulfate reductase reaction, owing to the presence of key enzymes involved in this pathway [dissimilatory sulfate reductase (EC:1.8.99.3), adenosine-5′-phosphosulfate reductase (EC:1.8.99.2) and ATP sulfurylase (EC:2.7.7.4)].
Fig. S8. Nitrogen metabolism pathway. Components of the nitrogen metabolism pathways identified in this study. Box color indicates the gene counts for each enzyme, with color coding as previously shown in Fig. S7.
Fig. S9.
Fig. S10. Schematic showing the presence of the minimal gene set for nitrogen fixation in white mat bins for Campylobacteraceae, Coraliomargarita and Methylococcaceae_38GC.
Fig. S11. Principal component analysis (PC1 vs. PC2) plot of the metagenomes chosen for comparison in this study, using normalized KO counts. Grey circles are hydrothermal vent plume samples, red circles are the red and white mats from the Hellenic Volcanic Arc, orange circles are serpentinite rock and fluid samples, yellow circles are deep oceanic subsurface samples, the pink circle is the Lost City hydrothermal vent, and the purple circle is a hydrothermal vent from Loihi Seamount, Hawaii (Spillway mat).
Table S1. Assembly and annotation statistics for white and red mat samples.
Table S2. Protein markers used for inference of contig taxonomy and function count normalization.
Table S3. Assemblies of rRNA by taxon for red and white mats. Depth of coverage of assembled 16S rRNA gene sequences and counts of binned protein-containing contigs are reported by taxon in the table below. White mat counts were normalized to the raw red mat counts by multiplying by 2.29, which is the ratio of the red mat to white mat metagenome size.
Table S4. Bin statistics: general statistics for bins in red and white mat metagenomes.
Table S5. Sample metadata for publicly available metagenomes from IMG.
Table S6. Pfams related to iron respiration pathways and their counts in white and red mat samples.
Dataset S1. ARB tree of 542 16S assemblies from the red and white mats. Sequences in red were assembled from the red mat; sequences in grey from the white/grey mat. Sequences in black are reference sequences from the ARB database used by Rinke et al. (2013). The data elements shown for the black sequences are: phylum/division assignment, this study; phylum/division assignment, Rinke et al.; GenBank nucleotide accession; sequence length; % identity to Least Common Ancestor (LCA) in SILVA; taxonomy of LCA in SILVA. The data elements shown for the red and white/grey sequences are: phylum/division assignment, this study; IMG scaffold ID; sequence length; literal 'no trim' (final sequences were not trimmed during load to ARB); % identity to Least Common Ancestor (LCA) in SILVA; taxonomy of LCA in SILVA. Groups of sequences having no 16S assemblies and which were collapsed for this figure are suffixed with 'noKV.' All other groupings are shown in Fig. 2 of the main text.
Dataset S2. Phylogenetic analysis of metagenomic contigs containing protein marker genes for red and white mat samples.
Dataset S3. Taxonomic assignment and abundance estimation of red mat and white mat contigs. (a) Taxonomic assignment of contigs containing one or more protein marker genes was inferred from their placement in the phylogenetic tree. (b) The total sequence of the contigs assigned to a lineage was determined by multiplying contig length by the respective coverage (Excel file Supplementary_Dataset_S3.xlsx).
Dataset S4. Viral contigs detected in the red mat.
Dataset S5. Viral contigs detected in the white mat. The host contig connection is indicated where possible.
Dataset S6. Viral contig assignments based on gene composition detected in both mats.
Dataset S7. Metagenomes containing CRISPR spacers from the red mat and the white mat. We used the Genomes On Line Database (GOLD) sample project ID from IMG/M to categorize the ecosystems. (*) Spacer counts refer to the total number of hits that a spacer from the red mat (R_) or white mat (W_) has in any given metagenome.
Dataset S8. List of spacer sequences.