Reconstructed ancient nitrogenases suggest Mo-specific ancestry

The nitrogenase metalloenzyme family, essential for supplying fixed nitrogen to the biosphere, is one of life’s key biogeochemical innovations. The three isozymes of nitrogenase differ in their metal dependence, each binding either a FeMo-, FeV-, or FeFe-cofactor for the reduction of nitrogen. The history of nitrogenase metal dependence has been of particular interest due to the possible implication that ancient marine metal availabilities have significantly constrained nitrogenase evolution over geologic time. Here, we combine phylogenetics and ancestral sequence reconstruction — a method by which inferred, historical protein sequence information can be linked to functional molecular properties — to reconstruct the metal dependence of ancient nitrogenases. Inferred ancestral nitrogenase sequences at the deepest nodes of the phylogeny suggest that ancient nitrogenases were Mo-dependent. We find that active-site sequence identity can reliably distinguish extant Mo-nitrogenases from V- and Fe-nitrogenases, as opposed to modeled active-site structural features that cannot be used to reliably classify nitrogenases of unknown metal dependence. Taxa represented by early-branching nitrogenase lineages lack one or more biosynthetic nifE and nifN genes that are necessary for assembly of the FeMo-cofactor, suggesting that early Mo-dependent nitrogenases may have utilized an alternate pathway for Mo- usage predating the FeMo-cofactor. Our results underscore the profound impacts that protein-level innovations likely had on shaping global biogeochemical cycles throughout Precambrian, in contrast to organism-level innovations which characterize Phanerozoic eon.

suggest that ancient nitrogenases were Mo-dependent. We find that active-site sequence identity 25 can reliably distinguish extant Mo-nitrogenases from V-and Fe-nitrogenases, as opposed to 26 modeled active-site structural features that cannot be used to reliably classify nitrogenases of 27 unknown metal dependence. Taxa represented by early-branching nitrogenase lineages lack one 28 or more biosynthetic nifE and nifN genes that are necessary for assembly of the 29 suggesting that early Mo-dependent nitrogenases may have utilized an alternate pathway for  usage predating the FeMo-cofactor. Our results underscore the profound impacts that protein-level 31 innovations likely had on shaping global biogeochemical cycles throughout Precambrian, in 32 contrast to organism-level innovations which characterize Phanerozoic eon. 33 34 KEYWORDS 35 ancestral sequence reconstruction, cofactor pocket, metalloenzyme, nitrogenase, nitrogen fixation 36

INTRODUCTION 38
All known life requires nitrogen for the synthesis of essential biomolecules, including nucleotides 39 and amino acids. Though the atmosphere contains nearly 80% N2 by volume, most organisms are 40 not able to assimilate N2 due to the enormous energetic cost of breaking the N≡N bond (MacKay 41 & Fryzuk, 2004). Select bacteria and archaea called diazotrophs accomplish biological nitrogen 42 fixation by nitrogenase metalloenzymes (E.C. 1.18.6.1), which catalyze the reduction of N2 to 43 bioavailable NH3. Nitrogenases are an ancient family of enzymes; oldest isotopic biosignatures 44 interpreted as evidence of nitrogenase activity date to ~3.2 billion years (Ga) (Stueken, Buick, 45 Guy, & Koehler, 2015). Because nitrogen has been suggested to be an important limiting nutrient 46 on geologic timescales (Falkowski, 1997), nitrogenases have likely played a key role in the 47 expansion of the biosphere since the Archean. 48

144
Here, we explored the indicators of nitrogenase metal usage history by a combinatorial method 145 relying on ancestral sequence reconstruction, a method by which inferred, historical protein 146 sequence information can be linked to functional molecular properties evidenced by computed 147 structures or laboratory experiments (Aadland, Pugh, & Kolaczkowski, 2019;Benner, Sassi, & 148 Gaucher, 2007;Thornton, 2004). These paleogenetic approaches have been increasingly applied 149 in biogeochemically relevant molecular studies to offer insights into the coevolution of life and 150 Earth (Garcia & Kacar, 2019;Gomez-Fernandez et al., 2018;Kacar, Hanson-Smith, Adam, & 151 Boekelheide, 2017;Shih et al., 2016). We reconstructed the phylogenetic history of Mo-, V-, and 152 Fe-nitrogenases in order to resurrect ancestral nitrogenases in silico, as well as to map the 153 taxonomic distribution of cofactor biosynthetic components considered necessary for Mo-154 dependence (Curatti et al., 2007;Shah et al., 1994;Shah et al., 1986;St John et al., 1975;Tal et 155 al., 1991). By this combined approach, we find phylogenetic and ancestral sequence features 156 suggestive of Mo-dependence, potentially by an alternate pathway predating the origin of the 157 FeMo-cofactor. We speculate that this unknown and possibly transient pathway may today be 158 present in basal nitrogenase lineages. Integration of protein evolution and paleobiology is a unique 159 melding of disparate data sets and may allow construct-and-build hypotheses that address 160 interactions ranging from the external environment to the cellular environment, and from the 161 cellular environment to the that maintained around the interacting protein. Avin_01380, NifD: Avin_01390, NifK: Avin_01400) and an Expect value cutoff of <1e-5. The 175 dataset was then manually curated to remove partial and distantly related sequences. Additional 176 nitrogenase sequences were manually retrieved from the Joint Genome Institute Integrated 177 Microbial Genomes and Microbiomes database, accessed September 2018 (Chen et al., 2019). The 178 nitrogenase sequence dataset was finalized to include NifHDK sequences from 256 taxa, AnfHDK 179 sequences from 14 taxa, VnfHDK sequences from 14 taxa, and outgroup light-independent 180 protochlorophyllide oxidoreductase (Bch/ChlLNB) sequences -sharing distant homology with 181 nitrogenases (Boyd, Anbar, et al., 2011;Y. Hu & Ribbe, 2015;Raymond et al., 2004) -from 10 182 taxa (Appendix S1; additional analyses were performed with an expanded outgroup, Appendix 183 S2). Only one Nif/Anf/VnfHDK sequence set was retained per genus to broaden taxonomic 184 sampling. Equal sequence sampling for Anf and Vnf was made to remove the potential for 185 oversampling bias in ancestral sequence inference. H-, D-, and K-subunit sequences corresponding 186 to each taxon were manually checked for synteny of their encoding genes. AnfHDK and VnfHDK 187 sequences were identified by the proximity of each gene locus to anfG or vnfG, which encodes the 188 additional G-subunit present in the VeFe or FeFe protein, respectively, but not present in the MoFe 189 protein (Eady, 1996). Finally, the presence of the cofactor biosynthetic nifBEN genes was 190 investigated for all taxa represented in our dataset by BLASTp, as well as by manually inspecting 191 the nif genome region. 192 193 Reconstruction of ancestral nitrogenase sequences was performed by PhyloBot  Johnson, 2016) (www.phylobot.com), which automates multiple sequence alignment, 195 phylogenetic reconstruction, and ancestral sequence inference methods. The concatenated  sequence dataset of Nif/Anf/VnfHDK homologs (including 10 Bch/ChlLNB outgroup sequences) 197 was aligned by MSAProbs v0.9 5r1 (Liu, Schmidt, &Maskell, 2010) andMUSCLE v3.8.31 198 (Edgar, 2004). Both alignment outputs were then used to perform phylogenetic reconstruction by 199 RAxML v8.1.15 (Stamatakis, 2014) under 6 different combinations of amino acid substitution and 200 rate heterogeneity models. Branch support was evaluated by the approximate likelihood ratio test 201 (aLRT) (Anisimova & Gascuel, 2006), which assesses the gain in overall likelihood against a null 202 hypothesis of branch length = 0. Additional phylogenetic reconstructions with an expanded 203 outgroup were performed outside of Phylobot to resolve root positioning, but were not used in 204 subsequent ancestral sequence inference (Appendix S2). Ancestral sequences were inferred by 205 joint maximum likelihood using CODEML v4.2 (Z. Yang, 2007) at all nodes within the 12 206 Phylobot-constructed phylogenies (Tree-1 -Tree-12), with gaps inferred by parsimony. To assess 207 ancestral sequence robustness to phylogenetic uncertainty (Hanson-Smith, Kolaczkowski, & 208 Thornton, 2010), ancestors inferred from the top five phylogenies ranked by log likelihood scores 209 were selected for further analysis (Table 1, Appendix S2). Finally, to evaluate the effects of 210 ambiguously reconstructed sites on subsequent structural analyses, Bayesian sampled ancestors 211 were inferred from the maximum likelihood site posterior probabilities calculated by CODEML 212 (Aadland et al., 2019). 100 random Bayesian sequence were generated for each of five ancestral 213 nodes of interest across the top five phylogenies. Thus, 25 maximum likelihood and 2,500 214 Bayesian-sampled ancestral sequences were analyzed in total. All maximum likelihood 215 reconstructed trees and ancestral sequences are available for view and download at 216 http://phylobot.com/613282215/. 217 References: (Le & Gascuel, 2008;Quang, Gascuel, & Lartillot, 2008;Whelan & Goldman, 2001;Z. Yang, 1993) 225

Structural homology modeling of extant and ancestral nitrogenase D-subunits 227
Structural homology modeling of 33 extant and ancestral (25 maximum likelihood and 2,500 228 Bayesian-sampled) nitrogenase D-subunit proteins was performed by Modeller v9.2 (Sali & 229 Blundell, 1993). Extant nitrogenase sequences, broadly sampled from the reconstructed 230 nitrogenase phylogeny, were modeled to provide comparisons with ancestral models. D-subunit 231 sequences from extant and ancestral nitrogenases were aligned to 38 NifD and 2 VnfD structural 232 templates retrieved from the Protein Data Bank (Berman et al., 2000), accessed November 2018 233 (Appendix S5; published AnfD models not available at time of analysis). Information from all 40 234 templates was used to model each structure. All models were generated by specifying the inclusion 235 of the FeMo-cofactor of the 3U7Q NifD structure (Spatzal et al., 2011), selected as the highest 236 resolution Mo-nitrogenase template. To assess the effect of the template cofactor type on the 237 generated structure, additional models were constructed by specifying the inclusion of the FeV- Protein Energy scores, as previously described (Aadland et al., 2019). The ten best modeling 242 replicates per extant sequence, ten best replicates per maximum likelihood ancestral sequence, and 243 the single best replicate per Bayesian-sampled variant sequence were selected for further analysis, 244 totaling 3,080 models with the FeMo-cofactor specified. 245 246

Active-site pocket volume calculation of extant and ancestral D-subunit models 247
Volumes of the modeled ancestral and extant D-subunit active-site cofactor pockets were 248 calculated by POVME v2.0 (Durrant, Votapka, Sorensen, & Amaro, 2014). Spatial coordinates 249 and the inclusion region for volume calculation were specified manually. Pocket volumes were 250 calculated with a grid spacing of 0.5 Å and a 1.09 Å distance cutoff from any receptor atom's van 251 der Waals radius. Volume outside of the modeled convex hull of the cofactor pocket as well as 252 noncontiguous volume were removed. Statistical analysis of ancestral and extant pocket volume 253 data was performed in R (R Core Team, 2014). 254 255 3 RESULTS 256

High statistical support for ancestral nitrogenase active-site residues 348
We inferred ancestral sequences for each of the H-, D-, and K-subunits that constitute the 349 nitrogenase enzyme complex (Figure 1) across five phylogenetic topologies (Tree-1-5; Appendix 350 S2). Ancestral nitrogenase sequences were inferred for five well-supported internal nodes along a 351 phylogenetic transect between Mo-(highlighted in blue) and V-/Fe-nitrogenases (highlighted in 352 red) (Figure 2) sequences are hereafter labeled with the tree likelihood rank from which they were inferred (e.g., 360 AncA from Tree-1 is labeled AncA-1). All tree and ancestral sequence information can be found 361 at http://phylobot.com/613282215/. 362 363 Mean site posterior probabilities for ancestral nitrogenase HDK sequences across all phylogenies 364 range between ~0.83 and 0.91, and for the highest-likelihood phylogeny (Tree-1), between ~0.84 365 and 0.90 (Appendix S3). Ancestral sequence support generally decreases with increasing 366 phylogenetic node age. For example, within the uncharacterized/V-/Fe-nitrogenase linage, AncA-367 1 has the highest mean posterior probability (0.90 ± 0.18) and AncD-1 has the lowest mean 368 posterior probability (0.84 ± 0.22). Mean ancestral sequence probability for each node also does 369 not deviate by more than ~0.02 across each of the five phylogenetic topologies (Tree-1-5; 370 Appendix S3). These observations suggest that sequence support for ancestral nitrogenases is more 371 sensitive to ancestral node position than to topological differences between the analyzed trees. 372

373
In addition to surveying total ancestral HDK sequence support, we analyzed support for 30 active-374 site residues, defined as those residing within 5 Å of any atom in either the FeMo-cofactor of the 375 A. vinelandii NifD protein (PDB 3U7Q (Spatzal et al., 2011)) or the FeV-cofactor of the A.  386 His-442. These conserved residues are thus reconstructed in all ancestral nitrogenases 387 unambiguously (site posterior probability = 1.00). Statistical support for ancestral active-site 388 residues (greater than 0.92) underpins subsequent analyses of ancestral active-site properties that 389 may inform inferences of nitrogenase metal dependence. 390

Oldest ancestral nitrogenase active-site sequences resemble those of Mo-nitrogenases 455
We analyzed sequence features of both ancestral and extant nitrogenases to identify those 456 correlated with metal dependence. In particular, we focused on nitrogenase active-site sequences 457 for three reasons: (1) active-site residues are known to affect catalytic efficiency and substrate 458 specificity (Brigle et al., 1987;Christiansen, Cash, Seefeldt, & Dean, 2000;Fixen et al., 2016;459 Kim, Newton, & Dean, 1995;Sarma et al., 2010 We first assessed the sensitivity of ancestral sequence variation to phylogenetic uncertainty and 466 ancestral statistical support. Overall, mean identities for ancestral sequences compared across 467 different nodes range from ~55 to 90%. Ancestral sequences inferred from the same node across 468 alternate phylogenies (Tree-1-5) have relatively high mean identities, ranging from ~93 to 95% 469 across the total HDK sequence and from ~96 to 100% within the active site. These high mean 470 identities suggest that topological differences among the alternate phylogenies used for ancestral 471 sequence inference do not contribute to a high degree of ancestral sequence variation. Identities 472 among sequences inferred from the same node also do not appear to be correlated with statistical 473 support. For example, though full AncA HDK sequences are reconstructed with the highest mean 474 statistical support (~0.89-0.91), they exhibit neither the lowest nor the highest mean identities as 475 compared with sequences inferred from other nodes (Appendix S3). 476 477 We next identified specific residues within the nitrogenase active site that are unique to particular 478 isozymes of known metal dependence in order to survey their occurrence in ancestral sequences. 479 We found that three active-site residues are unique to Mo-nitrogenases, six are unique to V-480 nitrogenases, five are unique to Fe-nitrogenases, and six are unique to V-and Fe-nitrogenases 481 (Appendix S4). Surprisingly, most nitrogenase ancestors exhibit comparable numbers of residues 482 unique to either V-/Fe-nitrogenases or Mo-nitrogenases, and thus their occurrence does not appear 483 informative for inferring ancestral metal dependence (e.g., ancestral AncC sequences contain two 484 residues unique to V-/Fe-nitrogenases and two residues unique to Mo-nitrogenases). An exception 485 is AncA sequences, which contain a preponderance of active-site residues that are unique to extant 486 V-and Fe-nitrogenases. One of the residues unique to V-nitrogenases, Thr-355, has recently been 487 suggested to interact directly with a proposed carbonate ligand only present in the FeV-cofactor 488 (D. Sippel & Einsle, 2017). This carbonate ligand lies within a protein loop that also contains 490 Pro-358, and Leu-360 residues are observed across all AncA sequences, reflecting specific 491 sequence resemblances that may be associated with V-dependence. 492

493
In addition to analyzing specific active-site residues, we compared the total active-site sequence 494 composition of 25 ancestral nitrogenases (inferred across Tree-1-5) and all 284 extant nitrogenase 495 sequences used for phylogenetic reconstruction. Our analysis shows clear distinctions between the 496 active-site compositions of extant Mo-versus V-/Fe-nitrogenases (Figure 6a). Specifically, the 497 mean identity between Vnf/Anf and Nif-I/Nif-II is ~50%, as compared with the mean identity 498 between Vnf and Anf (~71%) and between Nif-I and Nif-II (~77%). The difference between 499 Vnf/Anf and Nif-I/Nif-II sequences is not seen as distinctly when comparing total HDK sequences 500 (Figure 6b), suggesting that the active-site sequence differences in particular may be better 501 indicators of metal dependence than whole sequence differences due to greater catalytic tuning 502 toward the cofactor. 503 with V-/Fe-only-nitrogenases (Figure 6a). Mean active-site sequence identities between AncB-518 AncE and Anf/Vnf nitrogenases range between ~50 and 63%, whereas those between AncB-AncE 519 and Nif-I/Nif-II nitrogenases range between ~69 and 87%. An exception is AncA (ancestral to Vnf and Anf), which has higher mean identity to Anf/Vnf nitrogenases (~85%) than to Nif-I/Nif-II 521 nitrogenases (~51%). Because active-site sequence identity can reliably differentiate extant Mo-522 from V-/Fe-nitrogenases, the resemblance of most ancestral active sites to those of Mo-523 nitrogenases is suggestive of Mo-dependence. 524 525

545
We used both phylogenetic reconstruction and ancestral sequence inference to explore these 546 outstanding questions of early nitrogenase evolution and metal dependence. Though the tree 547 reconstructions presented here are largely in congruence with previous phylogenetic analyses, 548 certain topological differences, particularly with regard to basal uncharacterized nitrogenases 549 lacking associated nifE and/or nifN genes, suggest important deviations from previous narratives 550 of early metal dependence (Boyd, Anbar, et al., 2011;Boyd & Peters, 551 2013;Raymond et al., 2004). The reconstruction of ancestral nitrogenase sequences in silico 552 provides the means to directly infer ancient metal-binding properties from molecular information. 553 554

Inferred metal dependence of ancestral nitrogenases 555
The active-site protein environment is known, in addition to the metal content of the cofactor, to 556 contribute to the variable catalytic properties of different nitrogenase isozymes (Brigle et al., 1987;557 Christiansen et al., 2000;Fixen et al., 2016;Kim et al., 1995;Sarma et al., 2010;Z. Y. Yang et al., 558 2012). We therefore explored active-site features of ancestral nitrogenases for correlations with 559 metal dependence. 560 561 We find that modeled active-site structural features are not informative for the inference of 562 ancestral metal dependence. Pocket volumes of modeled extant nitrogenases do not appear to be 563 strongly correlated with metal cofactor content. Problematically, the volume ranges of oldest 564 ancestors (as well as those of uncharacterized nitrogenases) overlap with both V-and Mo-565 nitrogenases (Figure 5). We acknowledge that these homology modeling results may not precisely 566 reflect true biological differences (for which a comprehensive analysis is not yet possible due the 567 limited availability of V-and Fe-nitrogenase structures). This is illustrated by the difference 568 between pocket volumes of published structures and those of homology models (e.g., A. vinelandii 569 PDB 3U7Q NifD structure pocket volume ≈ 994.750 and mean A. vinelandii NifD modeled 570 structure pocket volume ≈ 1186.412). However, it is not surprising that active-site pocket volume 571 does not predict metal dependence, given the probable similar sizes and structures of the FeMo-, 572 FeV-, and FeFe-cofactors (Eady, 1996;Krahn et al., 2002;Daniel Sippel et al., 2018;Spatzal et 573 al., 2011). These findings contrast with a previous study by McGlynn and coworkers (2012) in 574 which pocket volume comparisons were used to classify the Mo-dependence of uncharacterized 575 nitrogenases. In this previous study, the volume means for Mo-nitrogenases were sufficiently 576 distinct from V-and Fe-nitrogenases as to provide unambiguous classification of Mo-dependence. 577 Our analyses differ from this previous study in several ways: (1) we incorporated V-nitrogenase 578 structural templates (D. Sippel & Einsle, 2017;Daniel Sippel et al., 2018) for homology modeling, 579 not available at the time of the previous study (2) we modeled 20 extant sequences of known metal 580 dependence rather than 12 (3) we used the same explicit parameters for both homology modeling 581 and pocket volume calculation across all sequences (4) we included modeling replicates for pocket 582 volume calculation. Thus, an expanded modeling analysis appears to reduce the efficacy of the 583 pocket volume parameter for inference of ancestral (and uncharacterized) nitrogenase metal 584 dependence. 585 586 Unlike structure-based analyses, we find that active-site sequence features reliably differentiate 587 Mo-nitrogenases and V-/Fe-nitrogenases. Regarding AncA (ancestral to V-and Fe-nitrogenases), 588 we identified specific active-site residues that have previously been suggested to interact with a 589 proposed carbonate ligand unique to the FeV-cofactor. These residues form a 355 TGGPRL 360 loop 590 conserved only among V-nitrogenases and homologous to 355 IGGLRP 360 in A. vinelandii NifD (D. 591 Sippel & Einsle, 2017). This substitution of Thr-355 for Ile-355, as well as the exchange of Leu 592 and Pro positions may permit the inclusion of the FeV-cofactor carbonate ligand by VnfD that is 593 not possible by the NifD protein (D. Sippel & Einsle, 2017). Thr-355 and Pro-358 are unique to 594 V-nitrogenases, and Leu-360 is unique to V-and Fe-nitrogenases. All AncA sequences conserve 595 the 355 TGGPRL 360 residue loop capable of accommodating the FeV-cofactor. Furthermore, AncA 596 sequences generally exhibit greater numbers of residues unique to V-nitrogenases than those 597 unique to Fe-nitrogenases, and the mean identity of AncA active-site sequences is highest for V-598 nitrogenases (Figure 6a). Together, these observations suggest that AncA is V-dependent. 599 600 Our comparisons of ancestral and extant sequence features indicate that the active-sites of oldest 601 nitrogenase ancestors (AncB-AncE) resemble those of extant Mo-nitrogenases more than V-or 602 Fe-nitrogenases (Figure 6a). This observation is particularly significant given that these same 603 patterns are not observed across the total HDK sequence (Figure 6b). Specifically, this 604 discrepancy supports the notion that the nitrogenase active-site has been tuned to the catalytic 605 properties of each metal cofactor over its evolutionary history , and that 606 this tuning has manifested in active-site sequence differences that stand apart from baseline 607 phylogenetic distance. Though we are not able to identify specific residues that may functionally 608 relate to metal dependence (as with AncA), the resemblance of the early ancestral nitrogenase 609 active site to those of Mo-nitrogenases is highly suggestive of ancient Mo-dependence.  . A subsequently published topology is more similar to the tree 622 presented here, though lacking in Clfx sequences (Boyd, Costas, Hamilton, Mus, & Peters, 2015;623 Boyd & Peters, 2013). It is possible that the larger sequence dataset used here has refined the 624 placement of these uncharacterized clades, which is supported by our analyses with an expanded (1) It is challenging to envision a scenario in which the FeMo-cofactor biosynthetic pathway would 653 be lost in Clfx and F-Mc nitrogenases that appear to otherwise be capable of nitrogen fixation 654 (Dekas et al., 2009;Keppen et al., 1989;Kuznetsov et al., 2011;Mehta & Baross, 2006).  nitrogenases are far more efficient at reducing nitrogen than other isozymes (Eady, 1996;Harris 656 et al., 2019;Harris, Yang, et al., 2018), and the majority of all extant nitrogenases are Mo-657 dependent across both anoxic and oxic environments (Boyd et al., 2015;Mus, Colman, Peters, & 658 Boyd, 2019;Raymond et al., 2004). Even those organisms that have additional V-or Fe-659 nitrogenases still retain and preferentially express Mo-nitrogenases (Boyd, Anbar, et al., 2011;660 Boyd, Hamilton, et al., 2011;Dos Santos et al., 2012;Raymond et al., 2004). 661 Aside from Clfx and F-Mc clades, no other nitrogenases are known to lack associated nifEN genes. 662

663
(2) It has previously been proposed that early nitrogenases may have been capable of reducing 664 nitrogen prior to the origin of nifEN, and thus of the FeMo-cofactor 665 Boyd & Peters, 2013;Mus et al., 2019;Soboh, Boyd, Zhao, Peters, & Rubio, 2010). Prior to the 666 evolution of Mo-usage, an ancient Mo-independent nitrogenase may have been capable of -667 perhaps inefficiently -reducing nitrogen by a cofactor resembling the Fe-S-C cluster assembled 668 by NifB, which constitutes the biosynthetic precursor to the FeMo-cofactor (Boyd & Peters, 2013;669 Mus et al., 2019;Soboh et al., 2010). Though our sequence analyses cannot assess ancestral 670 nitrogenase dependence for a NifB-cofactor, it is likely that the NifB-cofactor resembles the 671 structure and composition of the FeFe-cofactor, excepting homocitrate (Harris, Lukoyanov, et al., 672 2018). The greater similarity of AncC-D active sites to those of Mo-nitrogenases than Fe-673 nitrogenases likely suggests ancestral dependence on a cofactor incorporating Mo rather than only 674 Fe. It is possible that, lacking NifEN, an alternative pathway for Mo-usage may have acted as a 675 transition state between Mo-independence and full FeMo-cofactor usage. It is thus reasonable to 676 speculate that this transition state of Mo-usage may be exhibited by AncC-D ancestors for which 677 the lack of one or both nifEN genes might be parsimoniously inferred. 678 (3) Given greater catalytic efficiency of Mo-nitrogenases, even an unrefined pathway for Mo-680 usage may have been of selective advantage if Mo was transiently available prior to ~2.3-2.5 Ga 681 Earth surface oxygenation (Anbar et al., 2007;Anbar & Knoll, 2002;Canfield et al., 2010;Lyons 682 et al., 2014;Raymond et al., 2004). An early generalist behavior with regard to metal usage has 683 previously been proposed (Raymond et al., 2004); such limited and possibly non-specific Mo-684 usage early in nitrogenase history would have provided an evolutionary pathway for increased  dependence via the addition of NifEN biosynthetic components. 686 687 (4) ~3.2-Ga nitrogen isotopic signatures have been interpreted to reflect ancient Mo-dependent 688 nitrogenase activity, due to similarities in isotopic fractionation by extant Mo-nitrogenases 689 (Stueken et al., 2015). However, this interpretation conflicts with age estimates of both nifEN (and 690 the FeMo-cofactor) (Boyd, Anbar, et al., 2011) and of earliest marine Mo availability (Anbar et 691 al., 2007;Lyons et al., 2014). It is possible that alternative Mo-usage in early nitrogenases may 692 have resulted in similar isotopic fractionations. Such possibilities may resolve this discrepancy and 693 remain to be tested. 694 695 The preferred model of an ancestral, alternative pathway for Mo-usage, mapped to our 696 phylogenetic reconstruction and inferred ancestors, is illustrated in Figure 7. In the first stage, 697 represented by AncD and extant Clfx nitrogenases, nitrogen fixation is achieved by a possible 698 alternative pathway for Mo-usage in the absence of NifEN, incorporating an unknown "proto-699 cofactor" (Figure 7a). Such unrefined Mo-usage is of advantage during transient Mo availabilities 700 prior to ~2.3-2.5 Ga Earth surface oxygenation (Anbar et al., 2007). In the second stage, 701 represented by AncC and extant F-Mc nitrogenases, a gene duplication event forms NifE, though 702 potentially not sufficient for FeMo-cofactor synthesis (Figure 7b). Alternative Mo-usage may still 703 be possible. In the third stage, represented by AncB, AncE, and extant Mb-Mc nitrogenases, a 704 subsequent gene duplication event forms NifN (Figure 7c). Together, NifE and NifN are capable 705 of synthesizing the FeMo-cofactor, resulting in canonical Mo-dependence. In the fourth stage, 706 gene duplication of nif results in, first, vnf, followed by anf (Figure 7d)  phylogenetic analyses of catalytic (i.e., NifHDK) and biosynthetic (i.e., NifBEN) nitrogenase 732 sequences (Boyd, Anbar, et al., 2011;Boyd & Peters, 2013;733 Raymond et al., 2004). By contrast, in our phylogenetic reconstruction, Clfx NifHDK sequences 734 diverge earlier than both F-Mc and Mb-Mc sequences, with at least one represented strain having 735 been observed to fix nitrogen (Keppen et al., 1989;Kuznetsov et al., 2011). An early association 736 of biological nitrogen fixation with anoxygenic phototrophy would not be unexpected given the 737 well-studied homology between nitrogenases and dark-operative protochlorophyllide 738 oxidoreductases (Bch/ChlLNB), enzymes that catalyze part of the bacteriochlorophyll biosynthetic 739 pathway(Y. Hu & Ribbe, 2015;Moser & Brocker, 2011;Raymond et al., 2004). Furthermore, 740 Chloroflexi strains that possess uncharacterized Clfx nitrogenases also harbor BchLNB genes. 741 Though the phylogeny presented here may still be consistent with the origin of the full FeMo-742 cofactor biosynthetic pathway (NifHDKBEN) in early methanogens (Boyd, Anbar, et al., 2011;743 Boyd, Hamilton, et al., 2011;Boyd & Peters, 2013;Raymond et al., 2004), the basal position of 744 Clfx sequences may evidence an earlier metabolic context for biological nitrogen fixation prior to 745 the origin of the FeMo-cofactor. Future investigations of nitrogen fixation by other phototrophic 746 Chloroflexi strains and additional taxa that harbor basal uncharacterized nitrogenases may resolve 747 these aspects of nitrogenase origins. 748

CONCLUSION 750
We reconstructed the phylogenetic history of nitrogenase proteins, as well as inferred ancestral 751 nitrogenase sequences, in order to explore the evolutionary trajectory of nitrogenase metal 752 dependence. We find that, whereas modeled structural features of ancestral nitrogenases do not 753 offer conclusive indications of ancient metal usage, active-site sequence features of ancestors most 754 resemble those of extant Mo-nitrogenases. The absence of associated cofactor biosynthesis 755 proteins, considered necessary for FeMo-cofactor assembly, in several early-branching 756 uncharacterized nitrogenase lineages evidences a possible alternative pathway for Mo-usage. We 757 speculate that this alternative pathway may have preceded the evolution of the FeMo-cofactor and 758 may today be present in extant uncharacterized nitrogenases, and we propose a model wherein 759 canonical Mo-usage evolved via the stepwise introduction of FeMo-cofactor biosynthetic 760 components following the divergence of more basal uncharacterized lineages. V-nitrogenases 761 subsequently diversified, followed by Fe-nitrogenases, in agreement with previous phylogenetic 762 inferences that Mo-dependence evolved first . This model helps to 763 reconcile phylogenetic and geobiological explanations of nitrogenase evolution (Anbar & Knoll, 764 2002;Boyd, Anbar, et al., 2011;. Future studies, particularly those 765 that integrate experimental assessments of laboratory-resurrected ancestral nitrogenases, may 766 continue to refine our understanding of nitrogenase and environmental coevolution. Origins of Life Initiative (Kacar, McShea), the National Science Foundation (Kacar, #1724090) 771 and the John Templeton Foundation (Kacar, #58562 and #61239). We thank Andrew Knoll for 772 constructive feedback. 773