High‐quality draft genome sequence of Gaiella occulta isolated from a 150 meter deep mineral water borehole and comparison with the genome sequences of other deep‐branching lineages of the phylum Actinobacteria

Abstract Gaiella occulta strain F2‐233T (=CECT 7815 = LMG 26412), isolated from a 150 meter deep mineral water aquifer, was deemed a candidate for high‐quality draft genome sequencing because of the rare environment from which it was isolated. The draft genome sequence (QQZY00000000) of strain F2‐233T comprises approximately 3 Mb and 3,119 predicted protein‐coding genes, of which 2,545 were assigned putative functions. The genome was analyzed by comparison with those of the other deep‐branching Actinobacteria neighbors Rubrobacter radiotolerans, Solirubrobacter soli, and Thermoleophilum album. Genes for the tricarboxylic acid cycle, gluconeogenesis, and the pentose phosphate pathway were identified in the G. occulta, R. radiotolerans, S. soli, and T. album genomes. Genes of the Embden–Meyerhof–Parnas pathway and nitrate reduction were identified in G. occulta, R. radiotolerans, and S. soli, but not in the T. album genome. Genome analysis precludes alkane degradation in G. occulta. Genes involved in myo‐inositol metabolism were found in both the S. soli and G. occulta genomes. A Calvin–Benson–Bassham (CBB) cycle with a type I RuBisCO was also identified in the G. occulta genome. However, growth experiments under several conditions were negative, and CO2 fixation could not be demonstrated in G. occulta.

counts are also monitored to record alterations in the number of colony-forming units (CFUs).
The majority of studies on the microbial diversity of bottled water have been performed on still natural mineral waters using culture-dependent approaches (Guillot & Leclerc, 1993; Morais & da Costa, 1990; Vachée, Mossel, & Leclerc, 1997). Microbial abundances estimated by CFUs indicate that the number of heterotrophic bacteria is low at source (around 10 CFU/ml) but increases to about 10^4-10^5 CFU/ml during storage at room temperature (Croville, Cantet, & Saby, 2011; Morais & da Costa, 1990; Warburton, 1993). More recently, culture-dependent and culture-independent techniques were used together to determine the microbial diversity and abundances present at the source of one mineral water, to assess the microbial stability of the source over a 1-year period, and to examine the microbial dynamics after bottling throughout 6 months of storage of the mineral water in factory-produced plastic bottles (França, Lopéz-Lopéz, Rosselló-Móra, & da Costa, 2015). In all cases, communities were largely dominated by Bacteria affiliated with the Alphaproteobacteria, Betaproteobacteria, and Gammaproteobacteria. Several isolates representing new species were also characterized and described from the aquifer and the bottled mineral water (Albuquerque et al., 2011; França, Albuquerque, & da Costa, 2015; França, Albuquerque, Sanchez, Farelaira, & da Costa, 2017; Leandro, França, Nobre, Rainey, & da Costa, 2013; Leandro et al., 2012).
Among them, Gaiella occulta is the sole representative of the family Gaiellaceae of the order Gaiellales within the deep-branching lineages of the phylum Actinobacteria and was deemed a candidate for high-quality genome sequencing given the rare environment from which it was isolated. The phylum Actinobacteria comprises several deeply branching lineages that consist of species of the orders Rubrobacterales, Solirubrobacterales, Thermoleophilales, and Gaiellales (Foesel, Geppert, Rohde, & Overmann, 2016). The order Gaiellales and the family Gaiellaceae comprise only the species G. occulta strain F2-233T (Albuquerque et al., 2011).

| Organism information
Gaiella occulta is nonpigmented, stains Gram-negative, and is nonmotile, aerobic, and chemoorganotrophic, with an optimal growth temperature of 35-37°C and an optimum pH for growth between 6.5 and 7.5. It was isolated from a 150 meter deep water aquifer in Portugal (Albuquerque et al., 2011; Foesel et al., 2016). The strain forms short rod-shaped cells 1.0-3.0 µm in length by 0.3-0.5 µm in width (by transmission electron microscopy) (Figure 1). Strain F2-233T can assimilate carbohydrates, organic acids, and amino acids. Nitrate is reduced to nitrite; long-chain n-alkanes are not used as carbon and energy sources.

| Growth conditions and genomic DNA preparation
Strain F2-233T was grown in 1 L Erlenmeyer flasks containing 300 ml of R2A medium (http://www.dsmz.de/microorganisms/medium/pdf/DSMZ_Medium830.pdf) at 37°C in a rotary water bath shaker until late exponential phase of growth for DNA extraction.
To ascertain CO2 fixation and the enzymatic activity of RuBisCO, G. occulta was grown under aerobic and anaerobic conditions in sealed 50 ml serum ampules at 37°C containing 40 ml of a minimal medium previously described, supplemented with 0.02% yeast extract (Hahnke, Moosmann, Erb, & Strous, 2014). The medium contained (per liter) 0.5 g (NH4)2SO4, 0.5 g MgSO4·7H2O, 0.1 g CaCl2·2H2O, 6 g HEPES, 0.12 g K2HPO4, 0.04 g KH2PO4, and 1 ml  as electron acceptor, approximately 3 ml H2 was also added to anaerobic and aerobic ampules after sterilization and prior to inoculation. Cell density was measured at 610 nm.
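As a worked illustration of the per-liter recipe above, the following minimal sketch scales the listed salt and buffer amounts to the 40 ml of medium held by one ampule. The dictionary and the function name `scale_recipe` are hypothetical helpers introduced here; the component whose name is missing from the text and the yeast extract supplement are deliberately left out, and in practice the medium would more likely be prepared in bulk and dispensed.

```python
# Per-liter amounts taken from the text, converted to milligrams.
# The incompletely named 1 ml component and the yeast extract are omitted.
MEDIUM_MG_PER_L = {
    "(NH4)2SO4": 500,
    "MgSO4.7H2O": 500,
    "CaCl2.2H2O": 100,
    "HEPES": 6000,
    "K2HPO4": 120,
    "KH2PO4": 40,
}


def scale_recipe(mg_per_liter, volume_ml):
    """Return the mass (mg) of each component needed for volume_ml of medium."""
    return {name: mg * volume_ml / 1000 for name, mg in mg_per_liter.items()}


# 40 ml is the medium volume per ampule stated in the text.
per_ampule = scale_recipe(MEDIUM_MG_PER_L, 40)
for name, mg in per_ampule.items():
    print(f"{name}: {mg:g} mg")
```

For example, 6 g/L HEPES corresponds to 240 mg in 40 ml.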
Total genomic DNA was extracted following the method of Nielsen et al. (Nielsen, Fritze, & Priest, 1995). Briefly, cells were lysed with a solution of lysozyme, guanidinium thiocyanate, and sodium

| Genome sequencing and assembly
Genomic DNA was prepared with the Nextera XT DNA Library

| Genome properties
Sequencing of strain F2-233T DNA generated 5,261,564 paired-end reads, of which 3,362,091 high-quality reads remained after quality filtering. The average read length was 197 bp. The de novo read assembly produced 34 contigs with an N50 size of 401,372 bp.
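For readers less familiar with assembly statistics, the sketch below shows how an N50 value and a G + C percentage are conventionally computed from a set of contigs. It is a minimal illustration under stated assumptions, not the pipeline used in this study: the function names `n50` and `gc_percent` are hypothetical, and the input values are toy data rather than the G. occulta contigs.

```python
def n50(contig_lengths):
    """Length of the contig at which the cumulative sum of contigs, sorted
    from longest to shortest, first reaches half of the total assembly size."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs supplied")


def gc_percent(sequences):
    """G + C content (%) over all sequences, ignoring ambiguous bases such as N."""
    gc = at = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    return 100.0 * gc / (gc + at)


# Toy inputs (hypothetical, NOT the actual assembly).
toy_lengths = [50, 40, 30, 20, 10]
toy_seqs = ["GGCC", "GCAT", "ATGC"]

print(n50(toy_lengths))
print(round(gc_percent(toy_seqs), 2))
```

With the toy lengths, the two longest contigs (50 + 40 = 90) are the first to cover at least half of the 150 bp total, so the N50 is 40.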
The high-quality draft assembled genome sequence consisted of 3,028,529 bp, with a sequencing depth of coverage of 520-fold and a DNA G + C content of 71.65% (

| Genome annotation
The draft genome comprised 2,545 genes with putative functions (~82% of total protein-coding genes) and 1,718 genes allocated to the Clusters of Orthologous Groups (COG) functional categories (55% of total protein-coding genes). The most abundant COG category was "Amino acid transport and metabolism" followed by "General function prediction only" and "Energy production and conversion" (Table 2). Gaiella occulta and S. soli utilize myo-inositol as single carbon and energy source for growth (Albuquerque et al., 2011; Kim et al., 2007). Several genes coding for enzymes involved in the metabolism of myo-inositol, namely iolABCDEG, were identified in the G. occulta and S. soli genomes (Yoshida et al., 2008). However, the gene iolJ that codes for 6-phospho-5-dehydro-2-deoxy-D-glu-

| Central metabolism
The succinate dehydrogenase complexes of G. occulta, S. soli, and T. album all seem to lack the gene sdhD, which codes for the anchor subunit D, although the absence of this gene is not unusual (Horsefield, Iwata, & Byrne, 2004). In the genome of R. radiotolerans, we could not identify homologs for the following genes: NADH dehydrogenase nuoE, nuoF, and nuoG, succinate dehydrogenase sdhC, and cytochrome bc1 cytochrome-c subunit. The ATPases of the type strains examined here are of the common bacterial F-type.
Genes coding for the uptake and reduction of nitrate were identified in the genomes of G. occulta, R. radiotolerans, and S. soli, namely the MFS-type nitrate/nitrite transporter (narK/nasA, Gocc_2854) and the respiratory narGHIJ nitrate reductase complex (EC 1.7.5.1, Gocc_2855 to Gocc_2859), although the reduction of nitrate to nitrite was not observed in S. soli (Kim et al., 2007). Genes coding for the two-component system transduction pathway NarX/NarL identified in G. occulta (Gocc_1932 and Gocc_1933) were not identified in the S. soli genome, but nitrate reductase expression does not seem to be altered by the absence of narX (Laub & Goulian, 2007; Sohaskey & Wayne, 2003). The sox genes, as well as other genes involved in sulfite oxidation/sulfate reduction, namely adenylylsulfate reductase (EC 1.8.99.2), dissimilatory sulfite reductase (EC 1.8.99.5), or sulfite dehydrogenase (cytochrome) (EC 1.8.2.1), were not identified in the genome sequence of G. occulta or of the other deeply branching Actinobacteria whose genomes have been sequenced, precluding the utilization of reduced sulfur compounds as electron donors.

| Stress response
The Gaiella occulta genome sequence encodes the key enzymes for the main DNA repair mechanisms, except for the mismatch repair pathway.
The genes mutS and mutL were not found, as is the case in many Actinobacteria and Archaea (Castañeda-García et al., 2017). A gene coding for the endonuclease NucS (Gocc_1770) was identified, suggesting that G. occulta may use the noncanonical mismatch repair pathway described recently for Mycobacterium smegmatis and Streptomyces coelicolor (Castañeda-García et al., 2017). Thermoleophilum album and S. soli may also use this alternative mismatch repair pathway, as they also lack homologs of mutS and mutL and have a nucS homolog.
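The inference above (no mutS/mutL homologs plus a nucS homolog suggests the NucS-dependent route) can be written as a simple presence/absence rule. The sketch below is only an illustration of that reasoning: the function name `mismatch_repair_route` and the data structure are hypothetical, the gene sets list only the three genes discussed here, and the Escherichia coli entry is an outside contrast case that is not part of this study.

```python
# Presence of the mismatch-repair genes discussed in the text.
# Only mutS, mutL, and nucS are tracked; nothing else is implied.
GENE_PRESENCE = {
    "Gaiella occulta": {"nucS"},
    "Thermoleophilum album": {"nucS"},
    "Solirubrobacter soli": {"nucS"},
    # Contrast case, NOT from this study: E. coli has canonical MutS/MutL.
    "Escherichia coli (contrast)": {"mutS", "mutL"},
}


def mismatch_repair_route(genes):
    """Classify the likely mismatch repair route from gene presence alone."""
    if {"mutS", "mutL"} <= genes:
        return "canonical (MutS/MutL)"
    if "nucS" in genes:
        return "candidate noncanonical (NucS)"
    return "undetermined"


for organism, genes in GENE_PRESENCE.items():
    print(f"{organism}: {mismatch_repair_route(genes)}")
```

Such a rule only flags candidates; as the text notes, use of the alternative pathway in these organisms remains a suggestion based on homology.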
Rubrobacter xylanophilus and R. radiotolerans accumulate the compatible solutes mannosylglycerate, trehalose, and low levels of di-myo-inositol-phosphate, which are generally involved in osmotic adaptation in (hyper)thermophilic organisms but are constitutively accumulated in Rubrobacter spp. (Empadinhas et al., 2007; Nobre, Alarico, Fernandes, Empadinhas, & da Costa, 2008). Genes coding for enzymes involved in the synthesis of mannosylglycerate or di-myo-inositol-phosphate were not identified in G. occulta. In R. radiotolerans, trehalose synthesis can proceed via four pathways, namely TpS/TpP, TreS, TreT, and TreY/TreZ (Egas et al., 2014; Nobre et al., 2008).
The type strain of T. album has genes that predict the hydrolysis of n-alkanes and is known to be able to use only these substrates for growth. S. soli also possesses genes for the degradation of n-alkanes, but growth on these substrates was not examined in this organism.

| CONCLUSIONS
Gaiella occulta does not possess the homologs found in T. album and does not grow on n-alkanes. The results obtained in this study indicate that all organisms appear to be strict chemoorganotrophs and, for the most part, corroborate the phenotypes of these strains.

ACKNOWLEDGEMENTS
We would like to thank Fred Rainey, University of Alaska Anchorage USA, for the phylogenetic analysis. This work was supported by Competitiveness and Internationalisation (POCI), Lisboa, Portugal Regional Operational Programme (Lisboa2020), Algarve Portugal Regional Operational Programme (CRESC Algarve2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF), and by Fundação para a Ciência e a Tecnologia (FCT).

CONFLICT OF INTERESTS
None declared.

ETHICS STATEMENT
None required.

DATA ACCESSIBILITY
The genome sequence of G. occulta F2-233T is publicly available in the SRA under the accession number SRR7537062 and the genome assembly under the accession number QQZY00000000.