Dating phototrophic microbial lineages with reticulate gene histories

Abstract Phototrophic bacteria are among the most biogeochemically significant organisms on Earth and are physiologically related through the use of reaction centers to collect photons for energy metabolism. However, the major phototrophic lineages are not closely related to one another in bacterial phylogeny, and the origins of their respective photosynthetic machinery remain obscured by time and low sequence similarity. To better understand the co‐evolution of Cyanobacteria and other ancient anoxygenic phototrophic lineages with respect to geologic time, we designed and implemented a variety of molecular clocks that use horizontal gene transfer (HGT) as additional, relative constraints. These HGT constraints improve the precision of phototroph divergence date estimates and indicate that stem green non‐sulfur bacteria are likely the oldest phototrophic lineage. Concurrently, crown Cyanobacteria age estimates ranged from 2.2 Ga to 2.7 Ga, with stem Cyanobacteria diverging ~2.8 Ga. These estimates provide a several hundred Ma window for oxygenic photosynthesis to evolve prior to the Great Oxidation Event (GOE) ~2.3 Ga. In all models, crown green sulfur bacteria diversify after the loss of the banded iron formations from the sedimentary record (~1.8 Ga) and may indicate the expansion of the lineage into a new ecological niche following the GOE. Our date estimates also provide a timeline to investigate the temporal feasibility of different photosystem HGT events between phototrophic lineages. Using this approach, we infer that stem Cyanobacteria are unlikely to be the recipient of an HGT of photosystem I proteins from green sulfur bacteria but could still have been either the HGT donor or the recipient of photosystem II proteins with green non‐sulfur bacteria, prior to the GOE. 
Together, these results indicate that HGT‐constrained molecular clocks are useful tools for the evaluation of various geological and evolutionary hypotheses, using the evolutionary histories of both genes and organismal lineages.

Reaction centers (RCs) are divided into two classes: FeS-Type (Type I) and Quinone-Type (Type II). All anoxygenic phototrophs use either FeS-Type or Quinone-Type RCs; however, Cyanobacteria use both FeS-Type and Quinone-Type RCs, along with an oxygen-evolving complex, to strip electrons from water and produce O2 (Hohmann-Marriott & Blankenship, 2011). The ability to perform oxygenic photosynthesis and the origins of the coexistence of both types of RCs in Cyanobacteria are long-standing questions in the study of the evolution of phototrophs (for a review, see Hohmann-Marriott and Blankenship (2011)).
Multiple hypotheses have been proposed to account for the coexistence of both types of RCs in Cyanobacteria, including fusion (Blankenship, 1992; Mathis, 1990), selective loss (Olson, 1970; Olson & Pierson, 1987a,b), and duplication (Allen & Martin, 2007). The fusion hypothesis proposes that the FeS-Type and Quinone-Type RCs developed within different lineages, and the ancestor lineage of Cyanobacteria acquired one or both of the RCs through horizontal gene transfer (HGT). The selective-loss hypothesis proposes that an ancestral photosynthetic organism evolved both FeS-Type and Quinone-Type RCs and either the FeS-Type or Quinone-Type RC was lost in the respective ancestor of anoxygenic phototroph lineages. The duplication hypothesis proposes that a "protocyanobacterium" gave rise to the FeS-Type and Quinone-Type RCs through a gene duplication event and subsequently transferred the RCs to other lineages through HGT.
All three hypotheses necessitate the transfer of RCs. Previous work has attempted to constrain the directionality of these transfers via inferences from the complex evolutionary history of chlorophyll biosynthesis genes (Sousa, Shavit-Grievink, Allen, & Martin, 2013) supporting reaction center duplication, rather than fusion, early in the cyanobacterial stem lineage, and by a detailed analysis of reaction center subunit evolution, including molecular clock estimates on the divergence times for these protein families (Cardona, Sanchez-Baracaldo, Rutherford, & Larkum, 2017) supporting selective loss of photosystems rather than photosystem merger within stem Cyanobacteria. Each of these approaches draws inferences from mapping the evolutionary history of specific components of the photosynthetic machinery. However, any such hypothesis must also be consistent with the history and timing of the organismal lineages that presumably donated and/or acquired these genes and phenotypes.
As FeS-Type and Quinone-Type RCs are believed to pre-date the rise and expansion of the phototrophic lineages (Cardona, 2015), the ancestor of each phototrophic clade likely possessed an evolved FeS-Type, Quinone-Type, or (in the case of Cyanobacteria) FeS-Type and Quinone-Type RC. This also constrains early RC HGTs to have occurred between the stem groups of the phototrophic clades and requires the stem lineage of HGT donors to pre-date the crown groups of HGT recipients. Using molecular clocks, we are able to better explore these RC HGT scenarios in the context of (arguably) the three most ancient phototrophic groups, Cyanobacteria (FeS-Type and Quinone-Type RCs) (Allen & Martin, 2007), green sulfur bacteria (GSB; FeS-Type RC) (Cavalier-Smith, 2001), and green non-sulfur bacteria (GNS; Quinone-Type RC) (Woese, 1987).
Other phototrophic groups are excluded from this analysis as they occupy derived positions within bacterial phylogenies. These lineages are likely to be more recent recipients of photosynthetic machinery and include the polyphyletic purple bacteria distributed throughout the Alphaproteobacteria, Betaproteobacteria, and Gammaproteobacteria classes (Nagashima & Nagashima, 2013) and Heliobacteria within Clostridia (Sattley & Swingley, 2013). Therefore, we do not consider additional hypotheses in which these lineages could be donors of photosystems to the cyanobacterial stem lineage.
Analysis of the posterior age distributions allows us to evaluate the feasibility of each RC HGT in and out of stem Cyanobacteria as well as infer the geological context in which these phototrophic taxa evolved.
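One way to make this feasibility test concrete is given below, as a minimal sketch rather than the procedure used in this study. If each saved MCMC sample provides paired ages for a donor stem node and a recipient crown node, the posterior probability that the donor stem pre-dates the recipient crown can be approximated by the fraction of samples in which it does. The function name and the example ages are hypothetical.

```python
def prob_older(donor_stem_ages, recipient_crown_ages):
    """Fraction of paired posterior samples in which the donor stem node
    is older (larger age, in Ma) than the recipient crown node."""
    if len(donor_stem_ages) != len(recipient_crown_ages):
        raise ValueError("ages must be paired sample by sample")
    if not donor_stem_ages:
        raise ValueError("at least one sample is required")
    hits = sum(
        1
        for stem, crown in zip(donor_stem_ages, recipient_crown_ages)
        if stem > crown
    )
    return hits / len(donor_stem_ages)


# Hypothetical ages (Ma) for four saved samples, not values from this study.
example_stem = [3000.0, 2500.0, 2000.0, 2900.0]
example_crown = [2400.0, 2600.0, 2100.0, 2300.0]
example_probability = prob_older(example_stem, example_crown)
```

Pairing the ages sample by sample, rather than comparing two marginal distributions, keeps the correlation between node ages that a shared tree imposes.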
In recent years, several date estimates for the origins of Cyanobacteria (Blank & Sanchez-Baracaldo, 2010; Dvořák et al., 2014; Falcón, Magallón, & Castillo, 2010; Sánchez-Baracaldo, 2015; Schirrmeister, de Vos, Antonelli, & Bagheri, 2013; Schirrmeister, Gugger, & Donoghue, 2015; Shih, Hemp, Ward, Matzke, & Fischer, 2017) and for the Quinone-Type RCs of photosystem II (Cardona et al., 2017) have been obtained using molecular clock analyses. For Cyanobacteria, date estimates vary by up to 1 Ga due to the strong influence of the fossil calibrations and priors used in such analyses (Schirrmeister, Sanchez-Baracaldo, & Wacey, 2016). The recent discovery of two related, non-photosynthetic outgroups to Cyanobacteria, Melainabacteria (Di Rienzi et al., 2013) and Sericytochromatia (Soo, Hemp, Parks, Fischer, & Hugenholtz, 2017) (formerly ML635J-21), has greatly improved our ability to infer the evolutionary history of oxygenic photosynthesis and RCs. The absence of photosynthetic machinery in either of these outgroups is parsimonious with oxygenic photosynthesis evolving within the Cyanobacteria/Melainabacteria/Sericytochromatia (CMS) Group after the divergence of Melainabacteria, either through a fusion event involving HGT of photosystems from other groups (Soo et al., 2017) or through photosystem origin and duplication within this lineage itself (Allen & Martin, 2007). Due to the expansion of this lineage, the renaming of O2-producing Cyanobacteria to "Oxyphotobacteria" has been proposed to reflect the derived character of oxygenic photosynthesis acquired in the ancestor lineage of this group (Shih et al., 2017; Soo et al., 2017).
By including the non-photosynthetic lineages of the CMS Group, we are able to shorten the Cyanobacteria stem lineage along which RC HGT likely occurred and increase our ability to discriminate between specific HGT hypotheses.
It is now well understood that processes such as HGT generate conflicts between individual gene histories. These HGTs can be used as informative characters for the evolutionary history of organismal lineages (Bapteste et al., 2009; Huang & Gogarten, 2009; Soucy, Huang, & Gogarten, 2015). Traditionally, molecular clock methods model the evolutionary history of a genome within a microbial lineage as a strictly bifurcating tree; however, HGTs can also provide relative constraints between nodes (e.g., A>B) as they establish the relative timing of donor and recipient lineages. Previous analyses have incorporated this temporal dimension in the simultaneous reconstruction of supertree phylogenies and relative divergence times from individual gene trees, including reticulating branches (Szöllősi, Boussau, Abby, Tannier, & Daubin, 2012). When applied to a fixed species tree, HGT events can also propagate absolute temporal constraints under certain criteria: (i) the topology of the transfer event must be sufficiently resolved, with a well-supported branch within the ancestor of the donor lineage; (ii) the phylogeny of the donor and recipient clades must be sufficiently similar to their respective species trees, ensuring that the reticulation is not complicated by multiple parallel or subsequent HGT events; and (iii) there must be an absolute date calibration associated with either the donor or (ideally) the recipient lineage crown group (Wolfe & Fournier, 2017).
We refer to HGTs that meet these criteria as "index HGTs," roughly analogous to the concept of "index fossils" from traditional paleontological biostratigraphy. Using this approach, to model the evolution of phototrophs, we are able to better constrain the direction of RC HGTs as well as inform the relative ages of the crown and stem groups of the participant clades.
In instances where a random or neighbor-joining start tree was used for the estimation of a maximum-likelihood (ML) phylogeny, the best scoring ML tree was inconsistent with previous reports on cyanobacterial phylogeny (Data S2): Sericytochromatia and Melainabacteria were placed sister to each other, rather than in the previously reported topology in which Sericytochromatia is basal to Melainabacteria and Cyanobacteria. Sericytochromatia, however, was identified as basal to Melainabacteria and Cyanobacteria in 10 of the 100 bootstrap trees. Therefore, a guide tree (Data S3) was used to force Sericytochromatia basal to the Melainabacteria/Cyanobacteria split and generate a phylogeny (Figure S1; Data S4) in agreement with the 16S SSU ribosomal RNA phylogeny (Data S5) that places Sericytochromatia as the most basal CMS Group lineage.
This placement is further supported by the HGT of the gene encoding SahH from within Chloroflexi to the common ancestor of Melainabacteria and Cyanobacteria, excluding Sericytochromatia, as described in the methods section Horizontal Gene Transfer Constraints below. Further discussion on this topic and age distributions estimated under the best scoring ML phylogeny (Table S3) are provided in the supplement.

| Divergence time estimation
Divergence time analyses were performed using phylobayes v3.3 (-catfix C20 -ugam -nchain 2 100 0.3 50, all other parameters default) (Lartillot, Lepage, & Blanquart, 2009). A description of the models used in this analysis is provided in Table 1. Chronograms were generated using the "readdiv" command of phylobayes. For each model, the first 20% of the N saved points were excluded from the computation of the chronogram. The 95% highest posterior density (HPD) intervals were calculated from the "datedist" file (Data S6). Posterior date estimates were compared to those obtained under the prior using the "-prior" function in phylobayes (Supporting Discussion).
In an effort to compare our results to previously published literature, we designed two Cyanobacteria-centric models. These models are our "Gloeobacter Outgroup" Model and "Alphaproteobacteria Outgroup" Model (Table 1).
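The burn-in and HPD steps above can be sketched as follows. This is an illustrative sketch only: it assumes the sampled ages for one node have already been read into a plain Python list (the layout of the "datedist" file is not described here, so no parser is attempted), and the function names are hypothetical. The HPD interval is taken as the narrowest window that contains the requested share of the sorted samples.

```python
import math


def discard_burn_in(samples, fraction=0.2):
    """Drop the first `fraction` of the saved points (20% by default)."""
    n_skip = int(len(samples) * fraction)
    return samples[n_skip:]


def hpd_interval(samples, mass=0.95):
    """Return (low, high) for the narrowest interval holding `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one sample is required")
    # Number of samples the interval must contain; the small offset guards
    # against floating-point noise in mass * n.
    window = max(1, math.ceil(mass * n - 1e-9))
    best_low, best_high = ordered[0], ordered[window - 1]
    for start in range(1, n - window + 1):
        low, high = ordered[start], ordered[start + window - 1]
        if high - low < best_high - best_low:
            best_low, best_high = low, high
    return best_low, best_high


# Hypothetical node ages (Ma), not values from this study.
kept = discard_burn_in(list(range(10)))
half_mass_interval = hpd_interval([0.0, 10.0, 11.0, 30.0], mass=0.5)
```

Unlike an equal-tailed interval, the narrowest-window definition follows the bulk of a skewed posterior, which is common for node ages bounded by hard calibrations.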

| Date constraints
We constrain the timing of the last common bacterial ancestor (LCBA) by the earliest evidence of habitability (4.4 Ga zircons) (Wilde, Valley, Peck, & Graham, 2001) and the earliest direct evidence of diverse bacterial communities (3.5 Ga stromatolites in the Warrawoona Group of Australia) (Schopf & Packer, 1987). While the latter may represent phototrophic metabolisms and lineages themselves, their taxonomic affinities are unknown and thus are not suitable for use as a direct calibration on groups within our analysis.
This constraint was applied via a normally distributed gamma root prior (3900 Ma, SD 200 Ma). We selected this distribution for our analyses due to our decreasing belief that the LCBA (the root of our phylogenetic tree) originated near the 4.4 Ga and 3.5 Ga historical time points. Summaries of the internal constraints are provided in Table 2. These calibrations include constraints derived from aerobic respiration metabolisms postdating the rise of oxygen, the mitochondrial ancestor in Alphaproteobacteria (Gray, Burger, & Lang, 1999) pre-dating the cytoskeletal-containing protistan fossils of the Roper Group (Australia) (Javaux, Knoll, & Walter, 2001), Blattodea obligate symbionts in Bacteroidetes (Lo, Bandi, Watanabe, Nalepa, & Beninati, 2003;Wolfe, Daley, Legg, & Edgecombe, 2016), isorenieratane and chlorobactane biomarkers identified in the 1.64 Ga Barney Creek Formation (Brocks & Schaeffer, 2008), and fossil akinetes (rod-like resting cells) interpreted as cyanobacterial fossils (Horodyski & Donaldson, 1980;Sharma, 2006). As the age of fossil akinetes has been contested, all models were run with either a −1 to
All calibration files are included in Data S9.
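As a rough worked check on the root prior, the sketch below computes how much prior mass a normal distribution with mean 3900 Ma and SD 200 Ma places between the 3.5 Ga stromatolite and 4.4 Ga zircon bounds. It assumes a plain normal density; the exact parameterization of the root prior inside phylobayes may differ, and the function names are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def prior_mass_between(younger_ma, older_ma, mean=3900.0, sd=200.0):
    """Prior probability that the root age lies between two bounds (Ma)."""
    return normal_cdf(older_ma, mean, sd) - normal_cdf(younger_ma, mean, sd)


# Mass between the 3.5 Ga stromatolites and the 4.4 Ga zircons.
root_mass = prior_mass_between(3500.0, 4400.0)
```

Under this assumption, the bounds sit 2.0 and 2.5 standard deviations from the mean, so about 97% of the prior mass falls between them and the remainder tapers off toward and beyond the two time points.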

| Horizontal gene transfer constraints
For this study, we identified two index HGTs between the taxonomic groups surveyed. The first index HGT used was a transfer of a gene encoding Mg-chelatase (BchH) from within GNS to stem GSB (Data S10) that was originally identified by Sousa et al. (2013). This gene also underwent a clear duplication in the GSB stem lineage subsequent to transfer. We also identified a second index HGT of a gene encoding S-adenosyl-L-homocysteine hydrolase (SahH) from within Chloroflexi to the common ancestor of Cyanobacteria and Melainabacteria (Data S11). HGT information was incorporated into our models by post-sampling the posterior for steps in MCMC chains where the HGT conditional constraints were met (Data S6), permitting multiple HGT constraints to be included in a single analysis.
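The post-sampling step can be pictured as a filter over saved MCMC samples. The sketch below is a hypothetical illustration, not the scripts in Data S6. It assumes each sample is a mapping from node label to age in Ma, and that each HGT constraint is an (older node, younger node) pair; the node labels are invented for the example.

```python
def passes_constraints(node_ages, older_than):
    """True if every (older, younger) pair is satisfied in this sample."""
    return all(node_ages[older] > node_ages[younger] for older, younger in older_than)


def filter_posterior(samples, older_than):
    """Keep only the samples that satisfy all relative HGT constraints."""
    return [sample for sample in samples if passes_constraints(sample, older_than)]


# Hypothetical node labels and ages (Ma), not values from this study.
hgt_constraints = [
    ("donor_ancestor_1", "recipient_crown_1"),
    ("donor_ancestor_2", "recipient_crown_2"),
]
posterior = [
    {"donor_ancestor_1": 3000, "recipient_crown_1": 1800,
     "donor_ancestor_2": 3100, "recipient_crown_2": 2600},
    {"donor_ancestor_1": 1700, "recipient_crown_1": 1800,
     "donor_ancestor_2": 3100, "recipient_crown_2": 2600},
]
accepted = filter_posterior(posterior, hgt_constraints)
```

Because every constraint is checked against the same sample, any number of HGT constraints can be combined in one pass, at the cost of a smaller retained sample when the constraints are rarely met together.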

| Divergence date estimates for phototrophic taxa
The results of Phototroph Models A and D (see Table 1 for description) are shown in Table 3.
When the SahH and BchH HGT constraints are applied (Phototroph Model D), GNS and Cyanobacteria are the oldest phototrophic lineages, and date estimate precision is improved. In particular, the 95% HPD intervals of stem GNS divergence date estimates are reduced by
TABLE 1 Model summary. When applicable, models were run using either a 1.2 Ga or 1.6 Ga akinete constraint along with the calibrations indicated in Figure 1 and Table 2.

| Origin of phototrophs and their relationship to geologic time
The evolutionary history of phototrophy is innately tied to the biochemical history of our planet. It is generally accepted that the loss of the mass-independent sulfur isotope fractionations from the sedi-
FIGURE 1 Phylogenetic relationship of taxa included in this study. A phylogenetic tree derived from the alignment of 30 ribosomal proteins is displayed. Cyanobacteria (green), GNS (blue), and GSB (red), and major groups of taxa are labeled. Individual tips of the same phylogenetic tree are labeled in Figure S1. Illustrations of the calibrations used in the molecular clock analyses are placed to the left of the node that they constrain. Further description of these calibrations can be found in the legend and Table 2. Circles on the nodes indicate bootstrap values that are further described in the legend. Yellow squares 1-8 are placed to the right of nodes and are further described in
Cyanobacteria (Dvořák et al., 2014; Falcón et al., 2010; Sánchez-Baracaldo, 2015; Schirrmeister et al., 2013, 2015; Shih et al., 2017), large differences in outgroup, root prior, and internal calibration selection in these models have resulted in a wide variety of date estimates for the origin of oxygenic phototrophs, ranging from younger than 2.0 Ga (Shih et al., 2017) to older than 3.4 Ga (Schirrmeister et al., 2013). To compare our phototroph models to publications within the literature, we designed 3 "Gloeobacter Outgroup" and 4 "Alphaproteobacteria Outgroup" models with the same internal calibrations as our phototroph models (  Figure 2). A root prior this broad appears to be problematic, as the common ancestor of Melainabacteria and Cyanobacteria postdates the GOE, a prior assumption incompatible with oxygenic photosynthesis arising after the divergence of these lineages and within the cyanobacterial stem lineage.
In instances where a GOE con- 2) of Cyanobacteria, calibrated nodes were prevented from moving younger than the GOE (Figure 2) but necessitated an a priori assump-  (Luo et al., 2016) (Figure 2, Table S5).
To the best of our knowledge, our phototroph models provide the first divergence date estimates of Cyanobacteria independent of a direct GOE constraint. Using a 1.2 Ga akinete-calibrated Phototroph Model D, the phototrophic cyanobacterial lineages split from non-phototrophic lineages ~2.8 Ga (Cyanobacteria stem) and crown Cyanobacteria arise ~2.2-2.3 Ga (  (Table 3). These age intervals suggest that an increase in photosynthetic efficiency at the GOE could be due to physiological innovation within crown Cyanobacteria, namely the evolution of the thylakoid membrane. With the exception of Gloeobacter (the earliest branching lineage of Cyanobacteria) whose RCs are located along the plasma membrane, cyanobacterial RCs are located within thylakoid membranes (Vothknecht & Westhoff, 2001). This difference in cellular structure limits the photosynthetic capacity of Gloeobacter, which is far less than its thylakoid-containing relatives (Vothknecht & Westhoff, 2001 (Table 3). These similarly old stem Cyanobacteria date estimates are compatible with oxygenic interpretations of the Archean trace metal record, although this in itself does not provide additional evidence for these interpretations (Crowe et al., 2013).
Although the GOE marked a rise in the partial pressure of atmospheric oxygen, there is strong evidence that the deep oceans remained anoxic until the Gaskiers glaciation ~580 Ma (for a comprehensive review, see: Meyer & Kump, 2008). Rather than fully oxygenating Earth's oceans, the GOE marks a transition from a ferruginous state to a stratified ocean with the potential for sulfidic conditions (Meyer & Kump, 2008). This transition is highlighted by the loss of the banded iron formations (BIFs) from the sedimentary rock record ~1.8 Ga (Cloud, 1972; Holland, 1984). Although a more complex picture of oceanic Fe/S ratios throughout the Proterozoic is coming to light (Lyons et al., 2014), increased rates of anoxygenic photosynthesis and aerobic heterotrophy are believed to have maintained the low levels of O2 throughout this period (Johnston, Wolfe-Simon, Pearson, & Knoll, 2009). In modern anoxic waters, GSB have been found to account for as much as 83% of total productivity when H2S concentrations are high (Culver & Brunskill, 1969; Van Gemerden & Mas, 1995) and are likely to be important members of Proterozoic marine microbial communities (Johnston et al., 2009). In all phototroph models of this study, crown GSB appear ~1.7 to 1.8 Ga, just after the loss of BIFs from the sedimentary record (Table 3). Today, almost all known GSB couple anoxygenic photosynthesis to sulfide oxidation (Van Gemerden & Mas, 1995); however, one derived GSB, Chlorobium ferrooxidans, has been found to oxidize Fe(II) rather than sulfide (Heising, Richter, Ludwig, & Schink, 1999). As the most basal GSB are sulfide oxidizers, our date estimates suggest that modern sulfide-oxidizing GSB arose to fill a niche introduced by increased levels of H2S in the oceans. These sulfide-oxidizing GSB are likely to have outcompeted their Fe(II)-oxidizing relatives that lived in Fe(II)-rich waters prior to the GOE. Today, ferruginous ocean analogs are rare; however, a stratified lake in Indonesia (Crowe et al., 2008) provides the most compelling evidence for an Fe(II)-oxidizing GSB in an Archaean Fe ocean.
FIGURE 2 Comparison of cyanobacteria crown divergence date estimates under different models. Cyanobacteria crown date estimates for Gloeobacter Outgroup (cyan), Alphaproteobacteria Outgroup (orange), and Phototroph Models (pink) using a 1.6 Ga akinete constraint (where applicable) are shown. Gray bars represent the 95% HPD, and the solid lines indicate the median of the Cyanobacteria crown date estimates. Date estimates derived from this study (phylogenetic tree from 30 ribosomal proteins) are indicated by a *. These estimates vary from other published models (bars without stars) using similar constraints but different input phylogenies (e.g., 16S SSU ribosomal RNA). From left to right, the molecular clock results used to derive the date estimates are as follows: Schirrmeister et al. (2013, 2015),

| Direction of RC HGT
Further supporting Fe(II) or alternative modes of phototrophy are the 3.4 Ga fossilized mat structures restricted to the photic zone of the Buck Reef Chert (Barberton Greenstone Belt, South Africa) (Tice & Lowe, 2004, 2006) that pre-date the stem ages of all phototrophic lineages estimated by our models (  , and GSB (c) are indicated by the colored squares. The probabilities that stem GNS is older than crown Cyanobacteria (P(a>green circles)) and stem GSB is older than crown Cyanobacteria (P(c>green circles)) are provided in (i). The probabilities that stem Cyanobacteria is older than crown GNS (P(b>blue circle)) and stem Cyanobacteria is older than crown GSB (P(b>red circle)) are provided in (

| Inferring ancestry from reaction center diversity and evolution
An additional constraint on photosystem evolution that has previously been investigated is the molecular organization of the reaction center proteins themselves. RCII subunits show heterodimeric structure arising from multiple, independent gene duplication events that occurred early in their evolution, as evidenced by maximum-likelihood phylogenetic reconstructions (Cardona, 2015). Many evolutionary scenarios are consistent with the observed distribution of RCII across phototrophic groups, although the paraphyletic relationship between L and M types of RCII in Proteobacteria and GNS argues against RCII HGT between Cyanobacteria and either of these groups after gene duplications occurred (Cardona, 2015; Cardona et al., 2017). The evolutionary pattern of L and M types of RCII, however, is consistent with HGT of both heterodimer subunits between the stem lineages of GNS and PSB, after their divergence in a donor lineage. The phylogeny of these subunits does not constrain the lineage in which the paralog ancestor of RCII first emerged or the direction of any subsequent HGT. The structural differences between the cyanobacterial and L/M type RC proteins likely emerged after any putative HGT events, representing more derived states within their recipient lineages. Presumably, before their respective duplications, the RCII ancestors (in both cases) were homodimeric and likely much more similar to one another.
The very long branch lengths separating these RCII types and paralogs and the lack of information for root placement limit the value of directly estimating the relative or absolute timing of these events with a molecular clock approach. Regardless of the specific history of each RC subunit, structural studies support that extensive modifications have occurred within each RCII paralog lineage, especially within cyanobacterial heterodimeric paralogs as part of the evolution of the oxygen-evolving complex. The phylogeny of RCI proteins shows a similar evolutionary signal, although in this case, RCI complexes within GSB, Acidobacteria, and Heliobacteria are homodimeric while the cyanobacterial type is a heterodimer resulting from a gene duplication (Cardona, 2015; Cardona et al., 2017). Similar to RCII, the unrooted tree topology, limited sequence information, and long branch lengths imposed by very high evolutionary rates prevent placement of the root along the deep branches separating RCI types. As such, the organismal lineages traversed by these deep branches cannot be directly inferred, nor can the direction of any putative HGT between these groups.
It remains entirely possible that the origins of each RC pre-date the extant phototrophic diversity that can be sampled (Cardona et al., 2017), as these deep histories are sufficiently constrained by neither RC phylogenies nor species tree molecular clocks.
Hypotheses explaining the extant diversity of phototrophic microbial lineages as a product of these deep evolutionary processes therefore include (i) a phototrophic ancestor deep in the bacterial tree, with a pattern of RC diversification and extensive photosystem gene loss giving rise to the observed extant phototrophic diversity (Cardona et al., 2017)

| CONCLUSIONS
Using HGT-constrained molecular clocks, we are able to obtain more precise divergence date estimates of GNS and other phototrophic taxa. These molecular clocks indicate that GNS and Cyanobacteria are older than GSB and that the transfer of FeS-Type RCs from stem GSB to stem Cyanobacteria is less probable than the transfer of Quinone-Type RCs from stem GNS to stem Cyanobacteria. Depending on the akinete constraint used, crown