The occurrence of 2‐methylhopanoids in modern bacteria and the geological record

The 2‐methylhopanes (2‐MeHops) are molecular fossils of 2‐methylbacteriohopanepolyols (2‐MeBHPs) and among the oldest biomarkers on Earth. However, the specific sources of these biomarkers remain unresolved, including whether they reflect an expansion of marine cyanobacteria. Here, we study the occurrence of 2‐MeBHPs and the genes involved in their synthesis in modern bacteria and review the occurrence of 2‐MeHops in the geological record. We find that the gene responsible for 2‐MeBHP synthesis (hpnP) is widespread in cyano‐ and ⍺‐proteobacteria, but absent or very rare in other classes/phyla of bacteria. This result is consistent with culture studies, in which 2‐MeBHPs are found predominantly in cyano‐ and ⍺‐proteobacteria. Our review of their geological occurrence indicates that 2‐MeHops are found from the Paleoproterozoic onwards, although some Precambrian samples might be biased by drilling contamination. During the Phanerozoic, high relative abundances of 2‐MeHops (index >15%) are associated with climatic and biogeochemical perturbations such as the Permian/Triassic boundary and the Oceanic Anoxic Events. We analyzed the modern habitat of all hpnP‐containing bacteria and found that the only species from an undisputed open marine habitat is an ⍺‐proteobacterium involved in the marine nitrogen cycle. Although organisms can change their habitat in response to environmental stress and evolutionary pressure, we speculate that the high sedimentary abundances of 2‐MeHops observed during the Phanerozoic reflect an expansion of ⍺‐proteobacteria and perturbations of the marine N cycle in response to climatic and environmental change.

BHPs methylated at the C3 position (3-MeBHPs) also exist and might indicate methanotrophic bacteria (Farrimond et al., 2004), but here we focus exclusively on the 2-MeBHPs/2-MeHops. Over time, 2-MeBHPs degrade and are preserved in the geological record as 2-methylhopanes (2-MeHops). First identified in oils (Seifert & Moldowan, 1978), 2-MeHops are stable on geological time scales and are among the oldest molecular fossils known on Earth. They are present in trace amounts, or below the detection limit, in most natural samples throughout Earth's history. However, elevated levels of 2-MeHops have been reported for some periods of Earth's history (Kuypers et al., 2004; Summons et al., 1999).
The sedimentary occurrence of 2-MeHops was initially used to trace (aerobic) cyanobacteria, and by extension oxygenic photosynthesis and an oxidized atmosphere, based on early culture data indicating that a diverse range of aerobic cyanobacteria produce 2-MeBHPs. Kuypers et al. (2004) subsequently argued that the high relative abundance of 2-MeHops compared with regular hopanes (the 2-MeHops' index) found during the Cretaceous Oceanic Anoxic Events (OAEs) reflected an expansion of marine N₂-fixing cyanobacteria in response to extensive denitrification in anoxic oceans. Following these landmark studies, 2-MeHops became a biomarker for marine (N₂-fixing) cyanobacteria and of widespread interest to the geobiology community (Schaefer et al., 2020; Sepúlveda et al., 2009; Xie et al., 2005).
However, the interpretation of 2-MeHops as reflecting an expansion of aerobic marine (N₂-fixing) cyanobacteria has been challenged. In the last decade, advances in molecular biology showed that the anoxygenic phototrophic ⍺-proteobacterium Rhodopseudomonas palustris can also produce 2-MeBHPs (Rashby et al., 2007). In addition, the specific gene responsible for the methylation of BHPs at the C2 position (hpnP; Figure 1) was identified (Welander et al., 2010).
Subsequent searches for the hpnP gene across bacterial phyla showed that it is not exclusive to cyanobacteria (Ricci et al., 2013; Welander et al., 2010). Recent studies have suggested that high relative abundances of 2-MeHops in certain intervals of the sedimentary record can be used to identify periods when bacteria experienced stress (Garby et al., 2017b; Wu et al., 2015), potentially under anoxic ferrous conditions (Eickhoff et al., 2013), and/or that they indicate an environmental niche characterized by low levels of oxygen and fixed nitrogen (Ricci et al., 2013).

2 | Methods

| Synthesis of 2-MeHops' occurrence in cultures and the geological record
First, we compiled studies that report the hopanoid distribution in bacterial cultures (Table 1). These span more than 35 years of research, from the classic study of Bisseret et al. (1985), which first identified 2-MeHops in cultures of a cyanobacterium and an ⍺-proteobacterium, to the recent discovery of a 2-MeHop-producing acidobacterium (Sinninghe Damsté et al., 2017). We then compiled the occurrence of 2-MeHops in the geological record (Table 2). To the best of our knowledge, this includes data from all peer-reviewed publications that report 2-MeHops from well-dated sediments or oils. For each location, data are presented as the maximum reported 2-MeHops' index, i.e., the highest abundance of 2-MeHops relative to regular hopanes. Note that studies used different methods to quantify 2-MeHops (e.g., total ion chromatogram (TIC), m/z 191 and 205, or m/z 369 and 383) and hopanes of different carbon numbers (e.g., C30 vs. C31), so reported indices are not strictly comparable between studies.
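The 2-MeHops' index can be computed in several ways. As a minimal sketch, assuming the commonly used form index = 100 × 2-MeHop / (2-MeHop + regular hopane) on integrated peak areas (which ions and carbon numbers are used varies between studies, as noted above), the following Python computes the index per sample and keeps the per-site maximum, as reported in Table 2. The site names and peak areas are hypothetical, and the 15% cut-off only mirrors the "index >15%" threshold used in the abstract.

```python
def methylhopane_index(me_hop_area: float, hopane_area: float) -> float:
    """2-MeHop index in percent: 100 * 2-MeHop / (2-MeHop + regular hopane)."""
    total = me_hop_area + hopane_area
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return 100.0 * me_hop_area / total


def max_index_per_site(samples):
    """Highest index per site from (site, 2-MeHop area, hopane area) tuples."""
    best = {}
    for site, me_hop_area, hopane_area in samples:
        index = methylhopane_index(me_hop_area, hopane_area)
        best[site] = max(index, best.get(site, index))
    return best


# Hypothetical peak areas for two sites (not data from this study).
samples = [("site_A", 12.0, 88.0), ("site_A", 20.0, 80.0), ("site_B", 3.0, 97.0)]
maxima = max_index_per_site(samples)
# Flag sites above the 15% level discussed for Phanerozoic perturbations.
elevated = {site: value > 15.0 for site, value in maxima.items()}
print(maxima)    # {'site_A': 20.0, 'site_B': 3.0}
print(elevated)  # {'site_A': True, 'site_B': False}
```

Keeping only the maximum per site matches how the compilation is presented, but it discards the within-site variability of the index.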

| Analysis of hpnF/hpnP genes in bacterial phyla
The distribution of 2-MeBHP synthesis genes across publicly available bacterial genomes was assessed by BLAST searches. We downloaded the genome and proteome files of 14,624 completely sequenced bacterial strains from GenBank (Benson et al., 2017) via the NCBI genome FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/; retrieved December 2017). Accession numbers for all genomes are available in the Supplementary Information (Table S1). We performed BLAST searches (Camacho et al., 2009) with blastp version 2.6.0+, using as queries the hpnP sequences (responsible for methylation at the C2 position; Welander et al., 2010) of Rhodopseudomonas palustris (accession number B3QHD1.1) and Gloeobacter violaceus (accession number WP_011142311.1).
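The e-value or identity cut-offs used to call hpnP homologs are not stated here. As a minimal sketch, assuming a hypothetical e-value cut-off of 1e-10 and BLAST+ tabular output (-outfmt 6, whose twelve default columns are listed below), the following Python builds a blastp command line for one query against one proteome and keeps the best-scoring hit per subject protein. It is not the authors' pipeline; the subject IDs protA and protB and all scores in the example are hypothetical.

```python
import csv
import io

# Default columns of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def blastp_command(query_fasta, proteome_fasta, evalue=1e-10):
    """argv for a blastp search of one query FASTA against one proteome FASTA (not run here)."""
    return ["blastp", "-query", query_fasta, "-subject", proteome_fasta,
            "-outfmt", "6", "-evalue", str(evalue)]


def best_hits(outfmt6_text, max_evalue=1e-10):
    """Best-scoring (query, bitscore, pident) per subject protein passing the e-value cut-off."""
    best = {}
    for row in csv.reader(io.StringIO(outfmt6_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(OUTFMT6_FIELDS, row))
        evalue, bits = float(hit["evalue"]), float(hit["bitscore"])
        if evalue > max_evalue:
            continue
        previous = best.get(hit["sseqid"])
        if previous is None or bits > previous[1]:
            best[hit["sseqid"]] = (hit["qseqid"], bits, float(hit["pident"]))
    return best


# Hypothetical hits of the two hpnP queries against one proteome.
rows = [
    ["B3QHD1.1", "protA", "45.2", "300", "160", "5", "1", "300", "1", "300", "1e-80", "250.0"],
    ["WP_011142311.1", "protA", "50.0", "310", "150", "4", "1", "310", "1", "310", "1e-90", "280.0"],
    ["B3QHD1.1", "protB", "25.0", "100", "72", "2", "1", "100", "1", "100", "0.5", "30.0"],
]
example = "\n".join("\t".join(r) for r in rows) + "\n"
best = best_hits(example)
print(best)  # {'protA': ('WP_011142311.1', 280.0, 50.0)}
```

Keeping one hit per subject protein avoids counting the same genome twice when both queries hit the same protein; distinguishing true orthologs from paralogs still requires the alignment and tree steps that follow.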
We then built another alignment including the true orthologs of hpnP and removed positions with more than 85% gaps using the AlignmentViewer online utility (http://sdsssdfd.altervista.org/arklumpus/AlignmentViewer/AlignmentViewer.html). Misaligned positions at the start and end of the alignment were also removed. Finally, this alignment was used to build a Bayesian gene tree (Figure 5) with MrBayes v3.2.7a (Ronquist & Huelsenbeck, 2003), using a mixed amino acid model prior and a gamma rate heterogeneity model with invariant sites. Two independent runs were executed in parallel for 20,000,000 generations each, and convergence was assessed using the average standard deviation of split frequencies computed by MrBayes, as well as with Tracer v1.7.1 (Rambaut et al., 2018).
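The gap-filtering step can be illustrated with a short Python sketch. It assumes the alignment is held as a dict of equal-length aligned sequences with "-" or "." as gap characters and drops every column in which more than 85% of the sequences are gaps; the manual trimming of misaligned ends is not modeled. This is an equivalent illustration of the criterion, not the AlignmentViewer implementation.

```python
GAP_CHARS = set("-.")


def drop_gappy_columns(alignment, max_gap_fraction=0.85):
    """Return a copy of the alignment without columns whose gap fraction exceeds the cut-off."""
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        i for i in range(length)
        if sum(s[i] in GAP_CHARS for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}


# Hypothetical four-sequence protein alignment: the second column is 100% gaps
# and is dropped; the third column is 50% gaps and is kept.
aln = {"a": "M-KV", "b": "M--V", "c": "M--I", "d": "M-KV"}
trimmed = drop_gappy_columns(aln)
print(trimmed)  # {'a': 'MKV', 'b': 'M-V', 'c': 'M-I', 'd': 'MKV'}
```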

| Analysis of SSU rRNA from bacterial genomes and their habitat
To put the occurrence of hpnP in extant bacteria into context, we first estimated a bacterial tree (Figure 2) using SSU rRNA sequences extracted from the genomes with RNAmmer v1.2 (Lagesen et al., 2007).
An approximate maximum-likelihood tree was built using FastTree v2.1.10 (Price et al., 2010) with options -gtr -nt -nosupport -spr 4 -gamma -fastest -no2nd. SSU % identity between every possible pair of strains was also computed from the alignment. We collapsed nodes in the tree based on the following 16S percentage identity thresholds: (i) strains that possessed neither hpnF/shc nor hpnP and had not been tested experimentally for 2-MeHop production were collapsed using an 82.5% threshold; (ii) strains that possessed hpnF/shc, but not hpnP, and had not been tested experimentally for 2-MeHop production were collapsed using a 90% threshold; and (iii) strains that possessed hpnP or had been tested experimentally for 2-MeHop production were collapsed using a 98.65% threshold. In Figure 2, the presence of a gene at a tip of the tree signifies that at least one of the strains underlying that tip possessed the gene. The habitat of strains possessing the hpnP gene was obtained from the literature and/or the sequenced genome information (see Supplementary Information for taxa, habitats, and literature sources).
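The three collapsing rules above can be written down directly. The Python sketch below is hedged in two ways: pairwise identity is computed only over alignment columns where neither sequence has a gap (one of several common definitions; the exact one used is not specified), and the hypothetical may_collapse helper assumes that two strains are merged only when their identity meets the stricter of their two thresholds.

```python
def pairwise_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in "-." or y in "-.":
            continue  # ignore gapped columns
        compared += 1
        matches += int(x == y)
    return 100.0 * matches / compared if compared else 0.0


def collapse_threshold(has_shc, has_hpnP, tested_in_culture):
    """SSU identity threshold (%) following rules (i)-(iii) in the text."""
    if has_hpnP or tested_in_culture:
        return 98.65  # rule (iii)
    if has_shc:
        return 90.0   # rule (ii)
    return 82.5       # rule (i)


def may_collapse(seq_a, seq_b, threshold_a, threshold_b):
    """Hypothetical pair rule: merge only if identity meets the stricter threshold."""
    return pairwise_identity(seq_a, seq_b) >= max(threshold_a, threshold_b)


print(pairwise_identity("ACGT-A", "ACGTTC"))  # 80.0
print(collapse_threshold(has_shc=True, has_hpnP=False, tested_in_culture=False))  # 90.0
```

Using the stricter threshold keeps hpnP-bearing or culture-tested strains resolved at near-species level even when they sit next to less relevant lineages.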

| Analysis of Synechococcus biomass for 2-MeHop
We analyzed biomass from Synechococcus elongatus PCC 7942 for its hopanoid content to verify specific culture data. The biomass was freeze-dried and extracted using a modified Bligh-Dyer protocol (Bligh & Dyer, 1959). […] added to the combined supernatants, after which the mixture was centrifuged at 2500 rpm (10 min) and the bottom (DCM) phase was collected. This process was repeated three times, and the total lipid extract (TLE) was dried using a rotary evaporator.
To convert any adenosyl(-type) hopanoids into a GC-amenable form, 2 ml of 5% HCl in MeOH was added to half of the TLE and the mixture was heated at 70℃ for 1 hr. After cooling, the pH was adjusted to 11 using double-distilled water and methanolic KOH. […] For this purpose, we used a Thermo Scientific Q Exactive Orbitrap MS coupled to a Trace 1310 GC system with an Agilent DB-1HT column (15 m × 0.320 mm, 0.10 µm film thickness). The GC oven was programmed as follows: 70℃ (1-min hold), increased to 210℃ at 3℃/min, then to 410℃ at 6℃/min (10-min hold).
The MS scanned continuously between m/z 53 and 800.
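The GC oven program above implies a total run time that is easy to check: 1 min + (210 − 70)/3 min + (410 − 210)/6 min + 10 min = 1 + 46.7 + 33.3 + 10 = 91 min. The sketch below encodes the program as hypothetical (target temperature, ramp rate, hold time) tuples and returns the programmed oven temperature at any time after injection as well as the total run time; it describes the nominal program, not the instrument's actual oven trace.

```python
# Oven program from the text: (target °C, ramp °C/min or None, hold min).
OVEN_PROGRAM = [(70.0, None, 1.0), (210.0, 3.0, 0.0), (410.0, 6.0, 10.0)]


def oven_temperature(t_min, program=OVEN_PROGRAM):
    """Programmed oven temperature (°C) at t_min minutes after injection."""
    temp, elapsed = program[0][0], 0.0
    for target, rate, hold in program:
        if rate is not None:
            ramp_time = (target - temp) / rate
            if t_min <= elapsed + ramp_time:
                return temp + rate * (t_min - elapsed)  # still ramping
            elapsed += ramp_time
        temp = target
        elapsed += hold
        if t_min <= elapsed:
            return temp  # isothermal hold
    return temp  # after the final hold the program has ended


def run_time(program=OVEN_PROGRAM):
    """Total programmed run time in minutes (ramps plus holds)."""
    temp, total = program[0][0], 0.0
    for target, rate, hold in program:
        if rate is not None:
            total += (target - temp) / rate
        temp = target
        total += hold
    return total


print(oven_temperature(1.5))  # 71.5
print(round(run_time(), 2))   # 91.0
```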

| Distribution of the hpnF and hpnP genes in bacterial genomes and 2-MeBHP presence in bacterial cultures
Our genomic analyses show that the hpnF/shc gene is relatively widespread among a range of bacterial phyla, while the hpnP gene is less common (Figure 2). The hpnP gene is predominantly found in cyano- and ⍺-proteobacteria.

| 2-MeHops in the rock record
Our synthesis of 2-MeHops' occurrence in the geological record […] (French et al., 2015) and not shown here. The occurrence of 2-MeHops in some of the Meso- and Paleoproterozoic samples might also reflect contamination, but at this point we cannot objectively assess this and therefore report the data as originally published.

| 2-MeHops and hpnP gene in modern bacteria
Expanding on earlier work (e.g., Welander et al., 2010), our analysis shows that the hpnF/shc gene, responsible for the cyclisation of squalene into the hopanoid skeleton (Figure 1), is relatively widespread among a number of bacterial phyla (Figure 2). This is consistent with the detection of hopanoids in a wide range of bacteria (Rohmer et al., 1984). However, the hpnP gene, which is responsible for methylation at the C2 position after the hopanoid skeleton is formed (Figure 1), is less widespread. It is predominantly found in cyano- and ⍺-proteobacteria (Figure 2), in accordance with other studies (Welander et al., 2010). One acidobacterium […] hpnF and hpnP genes, but the original culture work does not report the occurrence of 2-MeBHPs (Rohmer et al., 1984). These bacteria probably did not produce significant amounts of 2-MeBHPs under the specific culture conditions used, as the amount of 2-MeBHP produced varies with culture conditions (Doughty et al., 2009; Wu et al., 2015).

Figure caption (fragment): … (see Table 2 for data sources). Shown is the highest reported 2-methylhopane index for each site. *The high 2-MeHop index reported at one marginal site during the PETM is likely driven by the input of allochthonous organic matter from organic-rich black shales of Cenomanian age.
More problematic is the reported production of 2-MeHops by

| 2-MeHops in the geological record
It is important to note that the 2-MeHop index in the rock record might reflect local/regional perturbations and not necessarily a […] et al., 2019) and the 1100 Myr old Taoudeni Basin (Blumenberg et al., 2012; Gueneli et al., 2018), and these studies confirm the occurrence of 2-methylhopanoids, and hence bacterial life, during the Mesoproterozoic.

Figure caption (fragment): … (see Table S1 for details and sources). Posterior probability is shown, as well as whether a species produces 2-MeBHPs in culture or not (see Table 1).
For the Phanerozoic, while most reports come from marine sediments, 2-MeHops are also present in lacustrine sediments (e.g., Farrimond et al., 2004; French et al., 2020) and in marine oils and bitumens. They are more frequently reported from Mesozoic samples, likely due to a higher sampling frequency […]

Of the three marine strains, the filamentous cyanobacterium ESFC-1 has recently been identified as an aerobic nitrogen-fixing cyanobacterium (Everroad et al., 2016; Woebken et al., 2012). To our knowledge, this is the first and only marine N₂-fixing cyanobacterium known to carry the hpnP gene. An expansion of (ancestors of) this species could fulfill the original hypothesis (Kuypers et al., 2004) that […]

Various studies have proposed that ⍺-proteobacteria produce more 2-MeBHPs under hypoxic and acidic conditions (Kulkarni et al., 2015; Wu et al., 2015). This is consistent with the geological record, in which the highest relative abundances of 2-MeHops are associated with widespread (water-column) hypoxic/anoxic conditions, such as during the OAEs. However, all studies so far on the impact of cell stress on 2-MeBHP production have used plant symbionts from the terrestrial realm, and it is not clear whether these results can be extrapolated to the marine realm.
Other methylobacteria produce 2-MeBHPs in culture (Knani et al., 1994), and many have the hpnP gene […]

Nitrobacter Nb-311A is the second marine ⍺-proteobacterium found to contain the hpnP gene (Welander et al., 2010). This bacterium is strictly aerobic and oxidizes nitrite to nitrate during nitrification (the opposite process of denitrification), a crucial process in the marine N cycle. As with denitrification, biogeochemical modeling indicates that nitrification rates increased by one order of magnitude during OAEs (Naafs et al., 2019), which would also be consistent with the observed high 2-MeHops' index. The high 2-MeHops' index during some geological events (Figure 4) could therefore (partly) be explained by an expansion of Nitrobacter driving high rates of nitrification. A similar hypothesis was recently proposed by Elling et al. (2020), who used cultures of Nitrobacter vulgaris AB1 and argued that the high 2-MeHops' levels observed during OAEs reflect an expansion of Nitrobacter in response to high rates of nitrification.
However, Nitrobacter vulgaris AB1 has so far only been found in terrestrial settings (Mellbye et al., 2017; Vanparys et al., 2007). Moreover, the marine habitat of Nitrobacter Nb-311A is poorly constrained, resting solely on work from 1968 and a personal communication cited in Starkenburg et al. (2008). Starkenburg et al. (2008) also questioned the assumed marine habitat of Nitrobacter Nb-311A, because its SSU rRNA gene sequence is identical to that of Nitrobacter winogradskyi, an organism only found in soil (Poly et al., 2008). Thus, while there is some evidence that 2-MeHops' occurrence indicates high rates of