High‐contiguity genome assembly of the chemosynthetic gammaproteobacterial endosymbiont of the cold seep tubeworm Lamellibrachia barhami

Abstract Symbiotic relationships between vestimentiferan tubeworms and chemosynthetic Gammaproteobacteria build the foundations of many hydrothermal vent and hydrocarbon seep ecosystems in the deep sea. The association between the vent tubeworm Riftia pachyptila and its endosymbiont Candidatus Endoriftia persephone has become a model system for symbiosis research in deep‐sea vestimentiferans, while markedly fewer studies have investigated symbiotic relationships in other tubeworm species, especially at cold seeps. Here we sequenced the endosymbiont genome of the tubeworm Lamellibrachia barhami from a cold seep in the Gulf of California, using short‐ and long‐read sequencing technologies in combination with Hi‐C and Dovetail Chicago libraries. Our final assembly had a size of ~4.17 MB, a GC content of 54.54%, 137X coverage, 4153 coding sequences, and a checkm completeness score of 97.19%. A single scaffold contained 99.51% of the genome. Comparative genomic analyses indicated that the L. barhami symbiont shares a set of core genes and many metabolic pathways with other vestimentiferan symbionts, while containing 433 unique gene clusters that comprised a variety of transposases, defence‐related genes and a lineage‐specific CRISPR/Cas3 system. This assembly represents the most contiguous tubeworm symbiont genome resource to date and will be particularly valuable for future comparative genomic studies investigating structural genome evolution, physiological adaptations and host‐symbiont communication in chemosynthetic animal‐microbe symbioses.


| INTRODUC TI ON
Mutualistic symbioses between chemoautotrophic bacteria and invertebrate animals sustain deep-sea hydrothermal vent and cold seep ecosystems worldwide (Dubilier, Bergin, & Lott, 2008). Among the key fauna are vestimentiferan tubeworms (Polychaeta; Siboglinidae), which act as foundation species by creating biomass-rich aggregations that provide habitat space and ecological niches for a variety of other co-occurring animal taxa (Bright & Lallier, 2010). Adult tubeworms do not possess a functional digestive tract and are nutritionally dependent on their gammaproteobacterial endosymbionts that are housed in a specialized organ within the coelomic cavity (trophosome). Through the oxidation of sulphide or hydrogen these symbionts gain chemical energy to convert inorganic carbon to organic matter, which serves as food for the host (Petersen et al., 2011;Thiel et al., 2012).
The ecophysiology, environmental transmission mode, population structure and genomics of the vent-dwelling symbiont Candidatus Endoriftia persephone, especially in association with its host Riftia pachyptila, have been investigated extensively (e.g., Nussbaumer, Fisher, & Bright, 2006;Markert et al., 2007;Robidart et al., 2008;Robidart, Roque, Song, & Girguis, 2011;Gardebrecht et al., 2012;Klose et al., 2015;Perez & Juniper, 2016;Hinzke et al., 2019). By contrast, comparatively little is known about the biology of symbionts associated with other tubeworm species, in particular those that are usually found at cold seeps. Aspects of the evolution, physiology and ecology of these symbionts can be expected to be different from that of vent-associated symbionts, given that seeps are sedimented habitats that differ in physicochemical conditions and environmental stability from hydrothermal vents (Bright & Lallier, 2010). In addition, seep-associated host species exhibit slower growth rates and longer lifespans than their vent relatives and develop extensive subsurface root systems that play important roles in the energy cycles of both host and symbiont (Cordes, Arthur, Shea, Arvidson, & Fisher, 2005;Boetius, 2005).
To date, four seep tubeworm symbiont genomes from the Gulf of Mexico and the South China Sea (Escarpia spicata, Lamellibrachia luymesi, Seepiophila jonesi, Paraescarpia echinospica symbionts) and three vent tubeworm symbiont genomes from the East Pacific Rise and the Juan de Fuca Ridge (Riftia pachyptila, Ridgeia piscesae, Tevnia jerichonana symbionts) have been published (Gardebrecht et al., 2012;Perez & Juniper, 2016;Li, Liles, & Halanych, 2018;Yang et al., 2019). Recent comparative genomic analyses based on these genomes suggest that seep-associated tubeworm symbionts use the same carbon fixation and sulphur oxidation pathways as vent-associated tubeworm symbionts, but might have a higher potential to acquire foreign genetic material, contain a larger amount of virulence factors for modulating host-symbiont interactions and utilize a more diverse repertoire of energy sources for their metabolism Yang et al., 2019). However, since seep tubeworms have broad geographic distributions and can occur at other types of chemosynthetic habitats (e.g., McMullin, Hourdez, Schaeffer, & Fisher, 2003; Reveillaud, Anderson, Reves-Sohn, Cavanaugh, & Huber, 2018), genomic analyses on their symbionts from a variety of biogeographic regions are needed to better understand the links between symbiont metabolic capacities, diversity, evolution and host niche utilization.
In addition, due to the difficulties associated with metagenomic data analysis, the currently available symbiont genome assemblies have varying degrees of fragmentation, which complicates comparative genomic investigations of structural rearrangements, such as gene duplications, translocations or inversions.
To address these limitations and improve assembly contiguity, we sequenced and scaffolded the symbiont genome of the tubeworm species Lamellibrachia barhami ( Figure 1) from a hydrocarbon seep that was recently discovered at the Pescadero Transform Fault in the southern Gulf of California (Goffredi et al., 2017;Paduan et al., 2018;Clague et al., 2018). Our approach combined Illumina shotgun, Nanopore and Hi-C/Chicago data to generate a chromosome-level assembly, which we compared against previously published tubeworm endosymbiont genomes (Table 1). CA, USA). To obtain high-molecular weight (HMW) DNA we also performed extractions using a CHAOS buffer protocol (Supporting Information). The tubeworm host species were identified based on their mitochondrial cytochrome-c-oxidase I (COI) sequences, which we amplified and sequenced with the primer pairs jgLCO1490 and jgHCO2198 (Geller, Meyer, Parker, & Hawk, 2013) following previously published PCR protocols (Breusing, Johnson, Tunnicliffe, & Vrijenhoek, 2015). Sequences were edited in Geneious v9.1.8 (http:// www.genei ous.com/) and annotated via Blast v2.9.0+ searches (Camacho et al., 2009). investigated for quality with FastQC v0.11.5 (Andrews, 2010) and then adapter-clipped and quality-trimmed with TriMMoMatiC v0.36 (Bolger et al., 2014) using a custom adapter file and the following options:

| Nanopore long-read sequencing and read preparation
To assist the resolution of repeat regions in the symbiont genome, we sequenced ~1 million long reads of D756-A12-LB11 using Oxford Nanopore

| Hybrid metagenome assembly and binning
We used idBa-ud v1.1.3 (Peng, Leung, Yiu, & Chin, 2012) to initially create a combined draft metagenomic assembly from all sequenced individuals, choosing k-mers between 21 and 121 at a step size of 10 and a minimum support of 2. Reads were corrected before assembly with the --pre_correction option. Binning of the assembly was performed with GBtools v2.6.0 (Seah & Gruber-Vodicka, 2015 and 111 at an increment of 10. Binning was done as described above.

| Genome scaffolding and annotation
Our assembly approach resulted in 328 prescaffolds. To increase the contiguity of the genome, we concentrated GC-rich symbiont DNA via CsCl-bisbenzimide gradient centrifugation (Tran-Nguyen & Schneider, 2013; Supporting Information ), and then prepared four Hi-C and two Chicago libraries for three L. barhami individuals using the FatI and DpnII restriction enzymes (Belton et al., 2012;Putnam et al., 2016). After assessing library performance on a MiSeq system, four successful libraries were sequenced on an Illumina HiSeq4000 system with a 2 × 150 bp paired-end protocol (Table 3).
The JuiCer v1.5 and 3d-dna v180922 pipelines (Durand et al., 2016;Dudchenko et al., 2017) were subsequently applied for scaffolding using long-range linkage information provided by the Hi-C and Chicago data.
To reduce assembly errors, we re-assembled and -scaffolded the genome for D756-A12-LB11 as described above, using only reads that mapped to the Hi-C/Chicago scaffolds. The final assembly was re-binned and polished with Pilon v1.22 (Walker et al., 2014).
Input NEXUS files were prepared by aligning the symbiont 16S rRNA sequences against the global Silva 16S rRNA alignment with Sina v1.2.11 (Pruesse, Peplies, & Glöckner, 2012). Two runs of four chains (three heated plus one cold) were run for 1,100,000 generations at a sampling interval of 100 generations and a burnin of 100,000 generations.

| Genome assembly
Our sequencing approach in the candidate individual D756-A12-LB11 yielded ~3.9 million symbiont Illumina sequences and ~180,000 symbiont Nanopore sequences for assembly, which were joined into 19 scaffolds with the help of ~13 million Hi-C and Chicago reads (

| Comparative genomics and phylogenomics
Phylogenetic analyses of the 16S rRNA gene and 1,290 orthologous single-copy gene clusters indicated that the L. barhami symbiont belongs to the Seep Group of vestimentiferan endosymbionts and is closely related to the recently sequenced symbiont of the tubeworm species Paraescarpia echinospica from the South China Sea (Figures 3a,b). The genome of the L. barhami symbiont was characterized by a markedly higher contiguity than other assemblies and contained 433 unique gene clusters (consisting of 443 genes) (Figures 3a,b; Table S3). The majority of these gene clusters had unknown functions, while several others were involved in viral defence mechanisms and genetic transposition ( Figure 4a; Table S3). Notably, we found different type I and type II restriction-modification systems as well as a lineage-specific CRISPR-Cas3 system (type I-E) associated with two distinctive CRISPR arrays, one consisting of 27 repeats and 26 spacers and another one consisting of 15 repeats and 14 spacers (Table S1, S2, S3). All genomes shared 1,749 core gene clusters (consisting of 15,149 genes), but differed in the presence or absence of 4,710 gene clusters (consisting of 14,792 genes) (Figure 3b; Table S4). The core genome is abundant in genes involved in energy production and conversion, translation, signal transduction, post-translational modification, amino acid metabolism,
These include: the SoxXYZAB-multienzyme complex (without

| Nitrogen metabolism
The L. barhami symbiont genome encodes genes for both assimilatory and dissimilatory nitrate reduction (

Tevnia jerichonana symbiont
Ridgeia piscesae symbiont octaheme tetrathionate reductase that was previously shown to reduce nitrite to ammonia in vent-associated tubeworm symbionts (Robidart et al., 2011;Gardebrecht et al., 2012). In contrast to other genome reports Reveillaud et al., 2018), we found no evidence for the presence of a membrane-bound respiratory nitrate reductase (Nar).

| D ISCUSS I ON
While several tubeworm symbiont genomes have been sequenced Yang et al., 2019), the present assemblies remain unfinished due to challenges inherent to metagenomic data sets, including strain-level heterogeneity, interspecies repeat regions and low or uneven read coverage (e.g., Nurk, Meleshko, Korobeynikov, & Pevzner, 2017). We tried to remedy these challenges by using a combination of Illumina shotgun sequencing, ONT long-read sequencing and chromosome conformation capture techniques to reconstruct the metagenome of the gammaproteobacterial endosymbiont of the cold seep tubeworm L. barhami from the Gulf of California. Our approach resulted in the first near chromosome-level assembly for these symbionts. While this work will be a helpful guideline for other researchers working on metagenomic data, it provides an especially useful resource for further studies on the molecular adaptations, genetic variation and genome evolution of chemosynthetic bacteria.
Comparative genomic analyses indicated that the genome of the L. barhami symbiont is distinct from other tubeworm symbiont genomes in terms of both nucleotide sequence and structure. ANIs relative to other symbiont genomes were typically less than 93%, with the exception of the P. echinospica symbiont genome, which showed >97% identity. ANIs between the other seep-associated symbionts (harboured by L. luymesi, Escarpia spicata and Seepiophila jonesi) as well as between the vent-associated symbionts (harbored by Riftia pachyptila, Ridgeia piscesae and Tevnia jerichonana) were greater than 94%. Based on the currently recommended species-level ANI cutoff of 94%-96% (Konstantinidis & Tiedje, 2005;Goris et al., 2007;Richter & Rosselló-Móra, 2009;Meier-Kolthoff, Auch, Klenk, & Göker, 2013), these findings suggest that the symbionts analysed in this study belong to at least three different bacterial species (and distinct bacterial strains), including the vent-specific symbiont species Ca. Endoriftia persephone (confirming conclusions by Perez & Juniper, 2016) and two undescribed seep-specific symbiont species.
While our results imply a potential sister relationship between the symbionts of L. barhami and P. echinospica, future comparative genomic studies will need to reassess how representative these phylogenetic placements are once taxonomic sampling and genome information becomes available for more tubeworm symbiont species and strains.
Despite sequence similarities in homologous regions all symbiont genomes showed a marked amount of structural rearrangements, such as translocations, indels and inversions, as well as differences in the presence of several gene clusters. This level of variation contrasts markedly with analyses based on the 16S rRNA gene, which have so far implied that tubeworm symbionts are genetically very similar, at least within vent and seep groups (McMullin et al., 2003).
In contrast to prevailing beliefs, recent studies have shown that tubeworm symbionts can have varying degrees of intrahost diversity (Zimmermann et al., 2014;Reveillaud et al., 2018;Polzin, Arevalo, Nussbaumer, Polz, & Bright, 2019;Breusing, Franke, & Young, 2020) -a finding that is supported by our analyses. Despite potential inter-strain competition, symbiont heterogeneity is predicted to be maintained if symbiont strains are functionally distinct and thereby promote adaptation of their hosts to fluctuating environmental conditions (Zimmermann et al., 2014;Perez & Juniper, 2016;Reveillaud et al., 2018;Polzin et al., 2019). The level of polymorphism detected in this study appears to be markedly higher than previously reported values (6.84 SNVs/kbp as opposed to 0.5-2.24 SNVs/kbp) (Reveillaud et al., 2018;Polzin et al., 2019), which could be related to differences in the diversity of free-living symbiont populations at the time of host infection and/or differences in transmission modes and timing among symbiont and host species. The symbiont infection process has only been clearly documented in the Riftia-Endorifta association, although the molecular mechanisms that establish and maintain symbioses in vestimentiferan tubeworms are still poorly understood (Nussbaumer et al., 2006;Klose et al., 2015). Nussbaumer et al. (2006) showed that Riftia tubeworms acquire their symbionts transdermally during the larval stage from a free-living Endoriftia population that is present in the hydrothermal fluids. The symbionts subsequently initiate a profound metamorphosis of the mesoderm into trophosomal tissue. How symbionts are transmitted in seep-associated host species is largely unknown, although the infection process probably differs from that of vent-associated symbionts, considering the complex plant-like anatomical and ecological adaptations of seep tubeworms to their sediment-based environment (Cordes et al., 2005;Boetius, 2005). The L. barhami symbiont genome encodes a variety of nodulation factors, auxin biosynthesis proteins, chemoreceptors and outer membrane components, such as lipopolysaccharides, capsular polysaccharides, exopolysaccharides and β-glucans. All of these compounds are essential for host-symbiont encounter, bacterial adhesion and penetration of host cells in the Rhizobia-legume symbiosis (e.g., Fraysse, Couderc, & Poinsot, 2003;Downie, 2010;Marczak, Mazur, Koper, Żebracki, & Skorupska, 2017). Although we currently lack histological evidence to support the following assumptions, it is possible that the host colonization mechanism in seep-associated tubeworm symbionts is similar and involves invasion of preinfectious stages from the seep sediment through the tubeworm "root" system.
The L. barhami symbiont shows broad similarity in metabolic gene content to other tubeworm symbionts, having the potential to use both sulphide and hydrogen as energy sources for chemosynthesis as well as organic compounds for heterotrophic growth, including formate (as electron donor) and DMSO (as electron acceptor).
Previous studies indicated that vestimentiferan symbionts are metabolically highly flexible and might alternate between different carbon assimilation strategies depending on the availability of electron donors and acceptors as well as micronutrients (Thiel et al., 2012;Li et al., 2018;Reveillaud et al., 2018;Yang et al., 2019 (Hochstrasser & Doudna, 2015;Rath, Amlinger, Rath, & Lundgren, 2015). Perez & Juniper (2016) suggested that variation in CRISPR array sequences between symbiont species and populations is probably an adaptation to geographic or habitat-specific differences in

DATA AVA I L A B I L I T Y S TAT E M E N T
The final genome assembly including associated raw reads from