Novel nitrite reductase domain structure suggests a chimeric denitrification repertoire in the phylum Chloroflexi

Abstract Denitrification plays a central role in the global nitrogen cycle, reducing and removing nitrogen from marine and terrestrial ecosystems. The flux of nitrogen species through this pathway has a widespread impact, affecting ecological carrying capacity, agriculture, and climate. Nitrite reductase (Nir) and nitric oxide reductase (NOR) are the two central enzymes in this pathway. Here we present a previously unreported Nir domain architecture in members of phylum Chloroflexi. Phylogenetic analyses of protein domains within Nir indicate that an ancestral horizontal transfer and fusion event produced this chimeric domain architecture. We also identify an expanded genomic diversity of a rarely reported NOR subtype, eNOR. Together, these results suggest that a greater diversity of denitrification enzyme arrangements exists than has been previously reported.

found in two distinct enzymes-the copper-based nitrite reductase NirK, and the cytochrome-type reductase NirS (Braker et al., 2000;Decleyre et al., 2016;Priemé et al., 2002). NirS reduces nitrite via cytochrome cd1, a dimer of subunits each containing heme c and heme d1; in canonical denitrification, as observed in Pseudomonas aeruginosa, cytochrome cd1 catalyzes the oxidation of a colocalized cytochrome c551 to reduce nitrite to NO at the heme d1 site (Philippot, 2002;Zumft, 1997). The nirS gene has been reported as a constituent of a larger gene cluster containing genes such as nirM and nirF, which encode biosynthetic proteins for the cytochrome c551 and heme d1, respectively, and the nitrite transporter nirC (Kawasaki et al., 1997;Philippot, 2002). The next step in the pathway-the reduction of NO to nitrous oxide-is catalyzed by nitric oxide reductases (NORs). Most bacterial NORs are homologous and closely related to one another, and to oxygen reductases in the heme-copper oxygen reductase superfamily (Hemp & Gennis, 2008). The most widely studied NOR enzymes are cytochrome-type nitric oxide reductases (cNOR) and quinol-dependent nitric oxide reductases (qNOR) (Graf et al., 2014;Hemp & Gennis, 2008;Hendriks et al., 2000), distinguished by their respective electron donors. Enzymes in the cNOR subfamily have two subunits-one catalytic site and one heme-containing electron shuttle that accepts electrons from cytochrome c-while qNOR family enzymes' single, fused subunit accepts electrons from membrane-bound quinol groups (Hemp & Gennis, 2008). Rarer, alternative NOR enzymes, including sNOR, gNOR, and eNOR, have been more recently identified and characterized in limited members of the Proteobacteria, Firmicutes, Archaea, and Chloroflexi (Hemp & Gennis, 2008;Hemp et al., 2015;Sievert et al., 2008;Stein et al., 2007).
Like cNOR, these enzymes are predicted to have a two-subunit structure, but the second subunit in these NORs contains a cupredoxin instead of heme c fold (Hemp & Gennis, 2008).
Many bacteria contain genes encoding only one or a partial subset of the four denitrification steps. Such organisms may perform partial denitrification, while others may use one of these enzymes for nondenitrifying functions (Graf et al., 2014;Hendriks et al., 2000;Roco et al., 2017;Sanford et al., 2012). In partial denitrifiers, the cooccurrence of denitrification pathway genes appears to vary across different taxa and environments (Graf et al., 2014). Some of this variation may be constrained by the chemistry of certain intermediates. For example, nitric oxide (NO), the product of NirS and NirK, is highly cytotoxic. Both Nir types are periplasmic, and so cells require a means of effluxing or detoxifying NO before it accumulates to lethal levels. Denitrifiers are thought to immediately reduce NO to nitrous oxide (N2O) to avoid injury, using membrane-bound NOR enzymes (Hendriks et al., 2000). Perhaps, for this reason, it is rare to find genomes that contain nir but not nor, while organisms showing the inverse-the presence of a nor gene but not a nir gene-are far more common (Graf et al., 2014;Hendriks et al., 2000). While cNORs are only found in denitrifying microbes, other types of NOR-for example, quinol-dependent qNOR-are found in nondenitrifiers and can presumably detoxify environmental NO (Hendriks et al., 2000).
While denitrification has been most widely studied and observed in Proteobacteria, the process has also been identified in other phyla, including Chloroflexi. Chloroflexi are ecologically and physiologically diverse, and often key players in oxygen-, nutrient-, and light-limited environments, including anaerobic sludge and subsurface sediments (Hug et al., 2013;Ward, Hemp et al., 2018). Previous surveys have indicated the presence of diverse nitrite reductases in Chloroflexi; members of order Anaerolineales and classes Chloroflexia and Thermomicrobia may have the capacity for nitrite reduction via the copper-type NirK (Decleyre et al., 2016;Hug et al., 2013;Wei et al., 2015). However, recent studies indicate that certain Chloroflexi-including members of Anaerolineales-may possess nirS instead of nirK (Hemp et al., 2015), and may also harbor a divergent variant of nor previously reported in members of Archaea (Hemp & Gennis, 2008;Hemp et al., 2015).
These findings suggest that the evolution and/or biochemistry of denitrification may be unusual for this subset of bacteria, and informative for a broader understanding of microbial denitrification metabolisms and their origin.

| Genome sampling and assembly
Collection of all fluid samples and total genomic DNA extractions from those fluids, as well as corresponding physical and geochemical data, have been described previously (Heard et al., 2017;Lau et al., 2014, 2016;Magnabosco et al., 2016;Momper & Jungbluth, 2017;Momper & Kiel Reese, 2017;Osburn et al., 2014). All metagenome-assembled genomes (MAGs) from North America and Africa were reconstructed according to the methods used in Momper and Jungbluth (2017). MAG identifiers and sources are listed in Table A5. Completeness was calculated using the composite values from five widely accepted core essential gene metrics. Duplicate copies of any of these single-copy marker genes were interpreted as a measure of contamination (Alneberg et al., 2014;Campbell et al., 2013;Creevey et al., 2011;Dupont et al., 2012;Wu & Scott, 2012).
FIGURE 1 Denitrification. Complete denitrification transforms nitrate into dinitrogen gas. Respiratory nitrate reductase gene (nar) and periplasmic nitrate reductase (nap) genes are distributed in non-denitrifying organisms. Nitrite reductase (Nir) is considered the canonical first enzyme of denitrification (Graf et al., 2014), followed by nitric oxide reductase (NOR) and nitrous oxide reductase (Nos).
Individual genomes were then submitted for gene calling and annotations through the DOE Joint Genome Institute IMG-ER (Integrated Microbial Genomes expert review) pipeline (Huntemann et al., 2015;Markowitz et al., 2008). For quality control purposes, the genes flanking every denitrification gene presented in this study were individually searched on the National Center for Biotechnology Information's (NCBI) RefSeq database using the BLASTp algorithm, confirming that top hits for all flanking genes were also to Chloroflexi. This step ensured that the nitrogen transforming genes of interest presented here were not simply on scaffolds that were incorrectly binned into a putative Chloroflexi genome.
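The flanking-gene check described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes the phylum of each flanking gene's top BLASTp hit has already been extracted, and the function names and scaffold labels are hypothetical.

```python
def flanking_hits_consistent(top_hit_phyla, expected="Chloroflexi"):
    """Return True when every flanking gene's top hit is to the expected phylum.

    An empty list is treated as a failure, since no evidence is available.
    """
    return len(top_hit_phyla) > 0 and all(p == expected for p in top_hit_phyla)


def screen_scaffolds(scaffold_hits, expected="Chloroflexi"):
    """Split scaffolds into those that pass and those flagged for possible misbinning."""
    passed, flagged = [], []
    for scaffold, phyla in scaffold_hits.items():
        if flanking_hits_consistent(phyla, expected):
            passed.append(scaffold)
        else:
            flagged.append(scaffold)
    return passed, flagged


# Hypothetical input: phylum of the top hit for each gene flanking a denitrification gene.
example_hits = {
    "scaffold_A": ["Chloroflexi", "Chloroflexi", "Chloroflexi"],
    "scaffold_B": ["Chloroflexi", "Proteobacteria"],
}
passed, flagged = screen_scaffolds(example_hits)
```

A scaffold lands in `flagged` as soon as one flanking gene has a top hit outside the expected phylum, which mirrors the conservative intent of the quality-control step.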

| Genetic database construction and sequence sampling
Sequences for nirS and eNOR genes from Sanford Underground Research Facility (SURF) MAG 42 (see Table A5) were used as queries to BLAST (Camacho et al., 2009) three genomic repositories: 1. Genome databases constructed for 21 Chloroflexi genomes assembled from deep-subsurface MAG data (Momper & Jungbluth, 2017; Table A5).
Additionally, putative environmental homologs were evaluated using protein sequence data from SURF MAG 42 to query NCBI's nonredundant environmental metagenomic sequence database (env-nr, as of June 2020; Agarwala et al., 2018). Hits from all databases were combined and assessed for quality; hits with E ≤ 1 × 10⁻¹⁰ were included for initial analyses. To capture diversity while limiting imprecision and biased sampling of overrepresented groups (e.g., Proteobacteria), hits were subsampled to the genus level, except for members of the Chloroflexi (to fully capture the taxonomic distribution of the novel gene variant).
One additional, divergent multispecies hit was allowed per genus.
The genus-level filter was also removed for C1, where non-Chloroflexi hits were severely limited (see below).
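The E-value cutoff and genus-level subsampling described in this section can be illustrated with a short Python sketch. It rests on assumptions that the text does not state: the representative kept for each genus is taken to be the hit with the lowest E-value, the allowance for one additional divergent multispecies hit per genus is omitted, and the `Hit` record and accessions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    accession: str  # hypothetical identifier
    phylum: str
    genus: str
    evalue: float


def subsample_hits(hits, max_evalue=1e-10, exempt_phylum="Chloroflexi"):
    """Keep hits passing the E-value cutoff, one per genus outside the exempt phylum.

    Assumption: the lowest-E-value hit represents each genus.
    """
    kept = []
    seen_genera = set()
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.evalue > max_evalue:
            continue  # fails the quality threshold
        if hit.phylum == exempt_phylum:
            kept.append(hit)  # no genus-level filter for Chloroflexi
        elif hit.genus not in seen_genera:
            seen_genera.add(hit.genus)
            kept.append(hit)
    return kept


example_hits = [
    Hit("h1", "Proteobacteria", "Pseudomonas", 1e-50),
    Hit("h2", "Proteobacteria", "Pseudomonas", 1e-40),
    Hit("h3", "Chloroflexi", "Anaerolinea", 1e-30),
    Hit("h4", "Chloroflexi", "Anaerolinea", 1e-20),
    Hit("h5", "Nitrospirae", "Nitrospira", 1e-5),
]
kept_accessions = [h.accession for h in subsample_hits(example_hits)]
```

Passing `exempt_phylum=None` together with a per-phylum loop would be one way to lift the genus filter for all phyla, as was done for the C1 domain; that variant is not shown here.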

| Sequence alignment
Putative homologous protein sequences were aligned with MAFFT, using autoparameterization (Nakamura et al., 2018), and visualized in Jalview (A. M. Waterhouse et al., 2009). Alignments were manually curated; partial sequences with substantial missing regions or anomalous insertions in conserved regions of the protein were removed to avoid confounding phylogenetic analyses and evolutionary model selection. Protein sequence alignments were trimmed to the length of individual domains identified by NCBI's Conserved Domains Database (CDD). Each domain was then realigned.
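The curation and domain-trimming steps can be sketched as follows. This is only an approximation under stated assumptions: the manual curation described above is replaced by a simple gap-fraction threshold (`max_gap_fraction`, a hypothetical parameter), the domain boundaries are given as zero-based alignment columns rather than taken from a CDD report, and the sequences are invented for illustration.

```python
def gap_fraction(seq, start, end):
    """Fraction of gap characters within alignment columns [start, end)."""
    region = seq[start:end]
    return region.count("-") / len(region)


def trim_to_domain(alignment, start, end, max_gap_fraction=0.5):
    """Trim aligned sequences to one domain and drop sequences that are mostly gaps.

    Returns ungapped domain sequences, ready to be realigned on their own.
    """
    if not 0 <= start < end:
        raise ValueError("domain boundaries must satisfy 0 <= start < end")
    trimmed = {}
    for name, seq in alignment.items():
        if end > len(seq):
            raise ValueError(f"domain end exceeds alignment length for {name}")
        if gap_fraction(seq, start, end) <= max_gap_fraction:
            trimmed[name] = seq[start:end].replace("-", "")
    return trimmed


# Hypothetical three-sequence alignment; the domain spans columns 5 to 10.
example_alignment = {
    "seqA": "MKT-LLAVGCH--",
    "seqB": "MKTALLA------",
    "seqC": "--TALLAVGCHWW",
}
domain_seqs = trim_to_domain(example_alignment, 5, 11)
```

Here `seqB` is dropped because most of its domain region is missing, which stands in for the removal of partial sequences with substantial missing regions.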

| eNOR
A preliminary alignment for the eNOR gene showed a poorly conserved region near the C-terminal end of the open reading frame (ORF); to improve accuracy and avoid misalignment, this region was manually removed, and the remaining sequences were realigned before tree construction. Two sequences (Actinobacteria bacterium RBG_16_68_12, OFW73639.1, and Thermus WP_015717644.1) with missing N-terminal regions and three sequences (Chloroflexi bacterium, RME47896.1; Rhodocyclaceae bacterium UTPRO2, OQY7467.1; and Rhodothermus profundi, WP_072715415.1) with missing C-terminal regions were included in the final alignment; the placement of these sequences is therefore based upon fewer alignment sites than other taxa. All retain key active site residues and show no clear evidence of long-branch attraction artifacts in the tree.

| C1
An initial alignment for the C1 domain showed a poorly conserved N-terminal region. To improve accuracy, this region was manually removed, and the remaining sequences realigned before tree construction.  site that plays a role in cofactor crosslinking (Hemp & Gennis, 2008;

| C1
Due to a paucity of initial hits (17 total genera), the genus-level filter was removed for all phyla to increase the resolution of the domain phylogeny. A preliminary tree (Figure A8) was expanded to identify outgroup sequences by including hits with E ≤ 10⁻⁴. The resulting tree was rooted using minimal ancestor deviation (MAD) rooting (Tria et al., 2017).

| RESULTS
To investigate divergent denitrification genes in Chloroflexi, we performed a comprehensive analysis of denitrification homologs in over 100 recently sequenced Chloroflexi MAGs, as well as previously available genomes and metagenomes from the NCBI protein databases. Domains of interest were initially identified in SURF MAG 42, an Anaerolineales bacterium sampled in the SURF, a former gold mine in South Dakota (Momper & Jungbluth, 2017). Table A1. Analyses of additional selected C1-C2-NirS ORF neighborhoods within other Chloroflexi reveal diverse genetic assemblages also dissimilar to the canonical Pseudomonas operon (Figure A1, Table A2).
Gene neighborhood analyses of the 20,000 base pair region surrounding nirS differ markedly between P. aeruginosa PAO1 (GenBank reference sequence NP_249210.1, top) and two Chloroflexi genomes containing the C1-C2-NirS gene: SURF MAG 42 (center) and Anaerolinea thermolimosa (bottom). While the P. aeruginosa nitrite reductase occurs as part of a larger nir operon, and in close proximity to nitric oxide reductase genes, the nirS ORF neighborhoods in SURF MAG 42 and A. thermolimosa do not appear to contain other denitrification-specific genes. Detailed descriptions of ORF/gene families and functions can be found in Table A1.
The inferred phylogeny for C1 shows a much different evolutionary history than the other two domains (Figure 5). In contrast to the domain trees for C2 or NirS, the C1 tree shows sequences from Nitrospirae and Nitrospinae grouping together within a large clade of Chloroflexi C1 domains. Additional Chloroflexi sequences group with a small number of more distantly related Proteobacteria. However, the placement and taxonomic representation of Proteobacteria in the C1 tree differ from those seen in the other domain trees.

| Apparent chimeric fusion in Chloroflexi NirS
The majority of ORFs represented in the C1 domain tree contain the C1 domain homolog either as a free cytochrome or as one of multiple cytochrome-type or cytochrome superfamily domains.
In rare or isolated cases, C1 homologs co-occur in ORFs with membrane or structural protein domains (Table A3). The ORF containing the C1 homolog was annotated as a NOR in several members of the Nitrospirae and one Geobacteraceae genome (Table A3); domain analysis of these genes yielded limited additional data, but representative sequences showed detectable sequence similarity to Pseudomonas norC genes.
The occurrence of C1 domain homologs within predicted nitrite reductase genes is restricted to the Chloroflexi. The majority of these C1-containing nir genes have the cytochrome-type NirS domain; however, a small number of Chloroflexi MAGs contain an ORF pairing the C1 cytochrome with the copper-type NirK domain instead. This NirK ORF also included an N-terminal cupredoxin/plastocyanin domain. As this fusion is only apparent within a small number of MAGs, which are identical across the length of the analyzed ORF, this may represent an assembly artifact. However, several of the nir genes that contained a C1 homolog and a cytochrome-type NirS

| C1-C2-NirS domain architecture is unique to Chloroflexi
FIGURE 4 Phylogenetic trees for C2 and NirS domains. Phylogenetic analysis of C2 domain homologs (above) places the Chloroflexi within a diverse clade including Epsilonproteobacteria, Aquificae, Bacteroidetes, and Planctomycetes; this clade is sister to a broad radiation of Alpha-, Beta-, and Gammaproteobacteria. Analysis of NirS domain homologs (below) places the Chloroflexi within the clade dominated by Alpha-, Beta-, and Gammaproteobacteria, which also contains members of the Bacteroidetes and Aquificales. The overall taxonomic representation for the domains is similar. Support values for selected bipartitions are labeled (aLRT/bb). Support for other nodes is indicated with the following color scheme: Strong support with both values ≥90 (black); weak support with both values ≤50 (white); intermediate support with one or both values between 50 and 90 (gray); conflicting support, with one value ≤50 and the other ≥90 (gray).
Though putative homologs exist independently for the constituent C1 and C2-NirS regions, respectively, these hits reflect different cytochrome or cytochrome-type nitrite reductases (largely in Proteobacteria, Nitrospirae, and Nitrospinae). The full C1-C2-NirS architecture appears unique to Chloroflexi and is not observed in other groups. Querying NCBI's nonredundant environmental database (env-nr) with the full ORF from SURF MAG 42 did not identify additional examples of the full gene construct. While several hits were identified that reflected putative homology to the joint C2-NirS domains, none of these included the C1 domain as well. An independent search of the env-nr database using the C1 domain as a query returned few overall hits. While some of these putative C1 homologs were identified in ORFs containing additional cytochrome-type enzyme superfamily domains or subunits, none co-occurred with NirS or NirK domains.
These data suggest that there is little to no missing diversity of the Chloroflexi-type chimeric nitrite reductase in existing metagenomes.
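The node-support categories used in the captions for Figures 4 and 5 can be written out as a small classifier. This is a minimal sketch in Python: it assumes both support values are percentages, that "bb" denotes a bootstrap-type value as labeled in the captions, and that any pair not matching the strong, weak, or conflicting definitions counts as intermediate. The function name is hypothetical.

```python
def classify_support(alrt, bb):
    """Map a pair of branch support values (0-100) to the caption's categories."""
    values = (alrt, bb)
    if all(v >= 90 for v in values):
        return "strong"
    if all(v <= 50 for v in values):
        return "weak"
    if min(values) <= 50 and max(values) >= 90:
        return "conflicting"
    # Remaining cases: at least one value lies strictly between 50 and 90.
    return "intermediate"


examples = {
    (95, 98): classify_support(95, 98),
    (40, 30): classify_support(40, 30),
    (45, 92): classify_support(45, 92),
    (70, 95): classify_support(70, 95),
}
```

Checking the categories in this order keeps them mutually exclusive, so every pair of values receives exactly one label.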
Attempts to visualize the full enzyme structure using homology modeling (Bienert et al., 2017;A. Waterhouse et al., 2018) were unsuccessful; structural models were only able to predict a close match for the conserved C2-NirS region of the putative gene. Efforts to independently model the C1 structure could not recover predicted QMEAN scores above −4.50 (Benkert et al., 2011). The poor scores may reflect the relatively short length of the cytochrome coding region. However, the Chloroflexi nirS gene sequence does retain several conserved residues present in the crystal structure of P.

| Expansion of eNOR diversity
Notably, the majority of genomes with the unique C1-C2-NirS structure do not appear to contain a NOR gene (nor). Though the absence of the nor gene in genomes with nirS is not unprecedented, previous genomic surveys suggest it is relatively uncommon, and the toxicity of the product of Nir (NO) makes this absence counterintuitive (Graf et al., 2014;Hendriks et al., 2000). However, analysis of the SURF MAG 42 metagenome-the originally assembled genome in which the novel nirS ORF was observed-did reveal the presence of an unusual nor homolog. Previous studies have identified the established cNOR and qNOR family enzymes (which contain cytochrome c or quinols as electron donors, respectively) in Chloroflexi, as well as a broad distribution of Proteobacteria (Hemp & Gennis, 2008;Hendriks et al., 2000;Zumft, 2005).
However, the predicted NOR in SURF MAG 42 included an active site glutamine substitution characteristic of eNOR (Hemp & Gennis, 2008;Hemp et al., 2015; Table A4). SURF MAG 42 contains both proposed subunits (Hemp & Gennis, 2008) of eNOR; the diagnostic subfamily substitution is within the heme-copper cytochrome-containing subunit I, and the cupredoxin-containing subunit II is immediately upstream in the MAG (Figure A7). eNOR has been previously described in Archaea (Hemp & Gennis, 2008) and at least one isolated Anaerolineales bacterium (Hemp et al., 2015).
FIGURE 5 Phylogenetic tree for C1 domain. A phylogenetic tree for the C1 domain-with no genus-level filter and inclusion of more distant hits (see Methods)-indicates a limited taxonomic distribution of the domain. The largest group of sequences in Chloroflexi places sister to domains found in Nitrospirae, Nitrospinae, and Deltaproteobacteria. Within this clade, the branch along which C1 is inferred to have fused into nitrite reductase genes in Chloroflexi is labeled. C1 homologs that co-occur in ORFs with nitrite reductase are indicated with magenta diamonds (NirS) or yellow diamonds (NirK). Support values for selected bipartitions are labeled (aLRT/bb). Support for other nodes is indicated with the following color scheme: Strong support with both values ≥90 (black); weak support with both values ≤50 (white); intermediate support with one or both values between 50 and 90 (gray); conflicting support, with one value ≤50 and the other ≥90 (gray). ORF, open reading frame.
Although the C2 and NirS domains do not have identical evolutionary histories or distributions, the taxonomic representation of these groups is very similar, and the presence of the paired C2-NirS domains in cytochrome-type nitrite reductases appears broadly throughout the Proteobacteria. In contrast, the taxonomic distribution and phylogeny of the C1 domain tree are strikingly different than that of the other domains in the nitrite reductase ORF. Combined with the apparent absence of a full C1-C2-NirS ORF in any taxonomic group other than Chloroflexi, these data suggest that the C1 cytochrome was likely incorporated into nirS in a gene fusion event within Chloroflexi, following HGT. As there is no evidence of the C2-NirS ORF in Chloroflexi without the fused C1 domain present, the fusion probably occurred very soon after the acquisition of the C2-NirS region and may be necessary for the function of the gene in Chloroflexi.
In P. aeruginosa, cytochrome c551-encoded by nirM-is the electron donor for the adjacent cytochrome cd1 nirS (Philippot, 2002;Zumft, 1997). Homology searches do not identify any regions within SURF MAG 42 with significant similarity to Pseudomonas nirM, but C1 is identified as a putative member of the cytochrome c551/552 family. It is, therefore, possible that C1, though divergent from Proteobacterial nirM, serves a similar redox role for the cytochrome cd1 now within the same ORF.
Interestingly, putative homologs of C1 cytochrome domains were found in some Chloroflexi genomes in ORFs containing nirK, not nirS (Figures 5 and A6). Though NirS and NirK are functionally equivalent, the two enzymes do not show a shared evolutionary origin and are often-though not always-mutually exclusive among known denitrifier genomes (Graf et al., 2014;Jones et al., 2008).
Unlike the cytochrome-containing NirS, NirK is a copper-type enzyme. The co-occurrence of cytochrome c domains in ORFs with the copper-type nirK has been identified in rare instances in Proteobacteria, and noted as surprising, given the cupredoxin-like fold of the NirK enzyme (Bertini et al., 2006). Similarly surprising is the inverse relationship revealed in the C1 domain tree: Several Chloroflexi ORFs contain a cupredoxin or similar copper-containing domain N-terminal to the C1-C2-NirS architecture (Figure A6, Table A3). The co-occurrence of C1 with both cytochrome- and copper-dependent Nir domains suggests a general evolutionary trend within Chloroflexi to incorporate this cytochrome into denitrification ORFs. This distribution pattern raises the possibility that the C1-type cytochrome may serve an important but generalized role in nitrite reduction, regardless of the evolutionary history or genetic profile of the nitrite reduction domain itself.
The apparent absence of a nor homolog in the majority of genomes with the C1-nirS fusion is unexpected. Beyond providing downstream redox capacity, NOR provides an efficient means of reducing and detoxifying NO, the highly cytotoxic product of NirS.
It is not unprecedented for bacterial genomes to harbor a nir gene without a nor gene, particularly for organisms with nirK (Graf et al., 2014;Heylen et al., 2007). This nir-nor mismatch is much rarer for putative denitrifiers with nirS, representing fewer than 4% of genomes in a recent survey-but a small number of surveyed bacteria do, interestingly, appear to harbor nirS without also harboring cNOR or qNOR (Graf et al., 2014;Heylen et al., 2007). To our knowledge, however, eNOR has not been included in such analyses of the genomic correlation between nitrite reductases and NORs.
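The effect of leaving eNOR out of such co-occurrence surveys can be illustrated with a toy tally in Python. Everything in this sketch is hypothetical: the genome names, their gene contents, and the helper function are invented for illustration and do not reproduce any survey data.

```python
def nirs_without_nor(genomes, nor_types=("cNOR", "qNOR")):
    """List nirS-containing genomes lacking every listed NOR type.

    Returns the mismatched genome names and their fraction among nirS genomes.
    """
    with_nirs = [name for name, genes in genomes.items() if "nirS" in genes]
    mismatched = [
        name for name in with_nirs
        if not any(nor in genomes[name] for nor in nor_types)
    ]
    fraction = len(mismatched) / len(with_nirs) if with_nirs else 0.0
    return mismatched, fraction


# Hypothetical gene inventories.
example_genomes = {
    "genome_1": {"nirS", "cNOR"},
    "genome_2": {"nirS", "eNOR"},
    "genome_3": {"nirS"},
    "genome_4": {"nirK", "qNOR"},
}
canonical_only = nirs_without_nor(example_genomes)
with_enor = nirs_without_nor(example_genomes, nor_types=("cNOR", "qNOR", "eNOR"))
```

In this toy data set, `genome_2` counts as a nir-nor mismatch only when eNOR is excluded from the list of recognized NOR types, which is the kind of shift the preceding paragraph points to.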
The phylogenetic evidence for diverse eNOR homologs suggests likely undocumented or underexplored diversity for divergent NORs.
The diversity and function of cNOR and qNOR are fairly well established. However, divergent enzymes such as eNOR and sNOR are less extensively documented and may not be accurately distinguished from broader oxygen reductase superfamily members in genomic or metagenomic analyses.
Cytochrome c proteins function as electron transfer proteins in anaerobic respiration and are often fused to redox enzymes to allow electron passage (Bertini et al., 2006). It is not surprising, therefore, to find cytochrome c-containing subunits in frame with nitrite reductase. NirS itself is cytochrome-dependent (Bertini et al., 2006). (Tables A1 and A2). However, without expanded homology analyses and experimental validation, it is impossible to infer whether apparent similarities or differences reflect biologically meaningful patterns of inheritance or function, or are simply artifactual.
A divergent or generalized role is also possible for the eNOR homologs. Models indicate that eNOR (unlike related NORs cNOR, qNOR, sNOR, and gNOR) has a proton channel, and therefore the capacity for proton pumping (Hemp & Gennis, 2008); therefore, this gene product may serve a key electrogenic role, whether reducing NO or alternative substrates. Experimental validation would be necessary to determine if the novel Chloroflexi-associated NirS performs differently than canonical NirS in vivo, and if NirS and eNOR perform targeted denitrification or nonspecific detoxification or proton pumping. This study, therefore, suggests a promising direction for future investigations. The divergent denitrification enzymes described above may or may not reflect different metabolic strategies in situ. But the identification of both a novel nirS ORF and an expanded diversity of eNOR enzymes suggests that the existing understanding of denitrification may underestimate the genetic diversity and ecological distribution of constituent enzymes. This may be especially true in deep subsurface biomes, such as those from which several Chloroflexi analyzed in this study were isolated. These systems have garnered increasing attention in recent years; extensive evidence supports the existence of dynamic, diverse microbial subsurface ecosystems with the metabolic potential to influence global biogeochemical cycles (Hug et al., 2013;Magnabosco et al., 2018;Momper & Jungbluth, 2017;Osburn et al., 2014, 2019). Chloroflexi are frequently cited as well-represented members of deep sediment and aquifer systems, where they play key roles in carbon cycling dynamics (Hug et al., 2013;Kadnikov et al., 2020;Momper & Jungbluth, 2017;Momper & Kiel Reese, 2017).
But Chloroflexi are known to also harbor diverse nitrogen metabolisms (Denef et al., 2016;Hemp et al., 2015;Spieck et al., 2020), and previous studies have linked subsurface Chloroflexi to denitrification pathway genes such as nitrous oxide reductase (nos) (Hug et al., 2016;Momper & Jungbluth, 2017;Sanford et al., 2012). The role of Chloroflexi in subsurface nitrogen cycling-as well as the scope of subsurface microbial nitrogen dynamics at large-requires further investigation.

ACKNOWLEDGMENTS
The authors thank Dr. Ranjani Murali for advice on domain and residue identification and analysis. This study was supported by a National Defense Science and Engineering Graduate Fellowship to Sarah L. Schwartz, and Simons Collaboration on Origins of Life award #339603 to Gregory P. Fournier.

CONFLICT OF INTEREST
None declared.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available