Early Archean origin of Photosystem II

Abstract Photosystem II is a photochemical reaction center that catalyzes the light‐driven oxidation of water to molecular oxygen. Water oxidation is the distinctive photochemical reaction that permitted the evolution of oxygenic photosynthesis and the eventual rise of eukaryotes. At what point during the history of life an ancestral photosystem evolved the capacity to oxidize water still remains unknown. Here, we study the evolution of the core reaction center proteins of Photosystem II using sequence and structural comparisons in combination with Bayesian relaxed molecular clocks. Our results indicate that a homodimeric photosystem with sufficient oxidizing power to split water had already appeared in the early Archean about a billion years before the most recent common ancestor of all described Cyanobacteria capable of oxygenic photosynthesis, and well before the diversification of some of the known groups of anoxygenic photosynthetic bacteria. Based on a structural and functional rationale, we hypothesize that this early Archean photosystem was capable of water oxidation to oxygen and had already evolved protection mechanisms against the formation of reactive oxygen species. This would place primordial forms of oxygenic photosynthesis at a very early stage in the evolutionary history of life.


| INTRODUC TI ON
The transition from anoxygenic to oxygenic photosynthesis initiated when an ancestral photochemical reaction center evolved the capacity to oxidize water to oxygen (Rutherford, 1989). Today, water oxidation is catalyzed in the Mn 4 CaO 5 oxygen-evolving cluster of Photosystem II (PSII) of Cyanobacteria and photosynthetic eukaryotes. How and when Type II reaction centers diversified, and how and when one of these reaction centers evolved the capacity to oxidize water are questions that still remain to be answered.
While there is agreement that by 3.5 Ga (billion years before the present) a form of anoxygenic photoautotrophy had already evolved (Butterfield, 2015;Nisbet & Fowler, 2014;Tice & Lowe, 2004), the sedimentological and isotopic evidence for the origin of oxygenic photosynthesis has been interpreted to range from 3.7 Ga (Frei et al., 2016;Rosing & Frei, 2004) to the Great Oxidation Event (GOE) at ~2.4 Ga . Molecular clock studies have generated a wider range of age estimates for the origin of Cyanobacteria spanning between 3.5 Ga (Falcon, Magallon, & Castillo, 2010) and <2.0 Ga (Betts et al., 2018;Shih, Hemp, Ward, Matzke, & Fischer, 2017). There is thus great uncertainty and no consensus. For this reason, determining when PSII evolved the capacity to oxidize water should greatly advance our understanding of the origin of oxygenic photosynthesis.
The evolution of Type II reaction center proteins has been described and discussed in some detail before (Beanland, 1990;Blankenship, 1992;Cardona, 2015Cardona, , 2016Nitschke & Rutherford, 1991;Rutherford & Nitschke, 1996;Sadekar, Raymond, & Blankenship, 2006) and it is presented and schematized in Figure 1. Type II reaction centers can be divided into two major families: the oxygenic and the anoxygenic Type II reaction centers. The oxygenic Type II reaction center is also known as PSII, and its electron transfer core is made of two homologous reaction center proteins, D1 and D2, exclusively found in Cyanobacteria and photosynthetic eukaryotes. The Mn 4 CaO 5 cluster is bound by D1 and the core antenna protein CP43 (Ferreira, Iverson, Maghlaoui, Barber, & Iwata, 2004). On the other hand, anoxygenic Type II reaction centers are found in phototrophic members of the phyla Proteobacteria, Chloroflexi, and Gemmatimonadetes, with the latter obtaining the reaction center via horizontal gene transfer (HGT) from a gammaproteobacterium (Zeng, Feng, Medova, Dean, & Koblizek, 2014). The core subunits of the anoxygenic Type II reaction centers are known as L and M and lack an oxygen-evolving cluster.
There is no doubt that D1, D2, L, and M share a common origin: Beanland (1990) was the first to record this but has been followed by many others (Cardona, 2015;Nitschke & Rutherford, 1991;Sadekar et al., 2006). That is to say that D1, D2, L and M, all descended from a single protein (denoted II in Figure 1). The earliest event in the evolution of Type II reaction centers can be described as the divergence of this ancestral protein into two new forms, one ancestral to D1 and D2, the oxygenic branch; and a second one ancestral to L and M, the anoxygenic branch ( Figure 1). Hence, D1 and D2 originated from a gene duplication event and together make a monophyletic clade of Type II reaction center proteins, distinct from that which gave rise to L and M (Cardona, 2015(Cardona, , 2016. The ancestral protein to D1 and D2 F I G U R E 1 Evolution of Type II reaction center proteins. (a) A maximum likelihood phylogeny of Type II reaction center proteins. (b) A schematic representation of the phylogeny shown in (a). All reaction centers have common ancestry and descended from a homodimeric reaction center, marked A. From A, two new reaction centers emerged, one ancestral to all Type II reaction centers (II) and a second ancestral to all Type I reaction centers. This is the earliest diversification event of reaction center proteins that can be inferred from sequence and structural data and it is marked 1. The ancestral Type II reaction center protein (II) gave rise to two new proteins, one ancestral to D1 and D2, named here D0 and a second ancestral to L and M named K. The ancestral L and M subunits further diversify into Chloroflexi-type and Proteobacteria-type L and M subunits (5).
Step 6 indicates that Type I reaction center proteins also diversified in parallel to Type II reaction center proteins. (c) Evolution of cyanobacterial D1 and D2, modified from Cardona et al. (2015). G0, G1, and G2 represent atypical D1 forms, and G3 and G4 standard D1 forms. ΔT marks the span of time between D0 and the appearance of the ancestral standard form of D1, which characterizes PSII and predates the most recent common ancestor of all known Cyanobacteria capable of oxygenic photosynthesis [Colour figure can be viewed at wileyonlinelibrary.com] will be referred to as D0 and the ancestral protein to L and M will be referred to as K.
As a result of the monophyletic relationship of D1 and D2 and the conserved structural and functional characteristics between these two proteins, it is possible to reconstruct traits of the ancestral photosystem. Some of the conserved traits, present in both D1 and D2, but absent in L and M, suggest that the ancestral homodimeric photosystem, made of a D0 dimer, was already unlike any of the known anoxygenic Type II reaction centers and had acquired characteristics associated with the highly oxidizing potential required for water oxidation (Cardona, 2016;Cardona, Sedoud, Cox, & Rutherford, 2012;Rutherford & Faller, 2003;Rutherford & Nitschke, 1996). One of these conserved traits is a redox tyrosine-histidine pair strictly conserved in both D1 and D2, Y Z -H190 and Y D -H189, respectively. The presence of these tyrosine-histidine pairs indicates that the midpoint potential (E m ) of the photochemical chlorophylls at the heart of the reaction center was oxidizing enough to generate the neutral tyrosyl radical on either side of the homodimeric reaction center (Rutherford & Faller, 2003;Rutherford & Nitschke, 1996). That is an E m of at least 1 V (DeFelippis, Murthy, Faraggi, & Klapper, 1989;DeFelippis et al., 1991), sufficient to drive the oxidation of water to oxygen, which has an E m of 0.82 V at pH 7 (Dau & Zaharieva, 2009;Tachibana, Vayssieres, & Durrant, 2012). Based on this and other arguments, Rutherford and Nitschke (1996) suggested that before the gene duplication that led to D1 and D2, this ancestral photosystem was well on its way toward the evolution of water oxidation, and may have been able to oxidize water, even if only inefficiently.
Several types of D1 can be distinguished phylogenetically (Cardona, Murray, & Rutherford, 2015) and their evolution is schematized in Figure 1c. The early evolving forms, referred to as atypical D1 forms (G0, G1, G2 in Figure 1), are characterized by the absence of some, but not all, of the ligands to the Mn 4 CaO 5 cluster and have been recently found to be involved in the synthesis of chlorophyll f, which supports oxygenic photosynthesis using low energy far-red light (Ho, Shen, Canniffe, Zhao, & Bryant, 2016;Nurnberg et al., 2018); or the inactivation of PSII when anaerobic processes are being carried out (Murray, 2012;Wegener, Nagarajan, & Pakrasi, 2015). The late evolving forms, referred to as the standard D1 forms, are characterized by a complete set of ligands to the Mn 4 CaO 5 cluster and are the main D1 used for water oxidation. Among the standard forms, there are also several types, which have been roughly subdivided into two groups: the microaerobic forms of D1 (G3) and the dominant form of D1 (G4). The microaerobic forms are suspected to be expressed only under low-oxygen conditions. The dominant form, G4, is the main D1 used for water oxidation by all Cyanobacteria and photosynthetic eukaryotes. Most Cyanobacteria carry in their genomes an array of different D1 types, yet every strain has at least one dominant form of D1 (G4). Therefore, all Cyanobacteria descended from a common ancestor that already had evolved efficient oxygenic photosynthesis, had a dominant form of D1, and was able to assemble a standard PSII virtually indistinguishable from that of later evolving strains. Furthermore, because the atypical D1 forms support or regulate oxygenic photosynthesis under specific environmental conditions it can be argued that when these branched out water oxidation to oxygen had already evolved.
Based on the phylogeny of reaction center proteins, several stages in the evolution of oxygenic photosynthesis can be envisaged: The earliest of these stages is the divergence of Type I and Type II reaction center proteins (1, Figure 1b); this is then followed by the divergence of the anoxygenic family (L/M) and the oxygenic family (D1/D2) of Type II reaction center proteins (2), then by the duplication event that led to the divergence of D1 and D2 (3), and the subsequent (7) gene duplication events and specializations that created the known diversity of D1 forms, which ultimately resulted in the emergence of the standard form of D1. Because a photosystem made of a D0 had already acquired some of the fundamental features required to oxidize water such as highly oxidizing chlorophyll cofactors and the capacity to generate the neutral tyrosyl radical at each side of the reaction center: Then, it can be suggested that some of the earliest stages specific to the evolution of PSII and oxygenic photosynthesis had occurred between stages 2 and 3 as depicted in Figure 1b. Therefore, the span of time between D0 and the ancestral standard form of D1 (marked 8 in Figure 1c) represents the duration of the evolutionary trajectory of PSII from a simpler homodimeric highly oxidizing reaction center to the more complex enzyme inherited by all organisms capable of oxygenic photosynthesis. We denote this span of time by ΔT. If ΔT is small, such as a few million years or less for example, then the evolution of oxygenic photosynthesis may be better described as a sudden and fast process only getting started shortly before the GOE as suggested by some recent analyses (Shih, Hemp et al., 2017;Ward, Kirschvink, & Fischer, 2016). On the other hand, if ΔT is large: Several hundred million years or more for example, then the earliest stages in the evolution of oxygenic photosynthesis could significantly predate the GOE as suggested by some geochemical (Mukhopadhyay et al., 2014;Planavsky et al., 2014;Satkoski, Beukes, Li, Beard, & Johnson, 2015) and phylogenetic data (Blank & Sanchez-Baracaldo, 2010;Schirrmeister, de Vos, Antonelli, & Bagheri, 2013).
Here, we report an in-depth evolutionary analysis of Type II reaction center proteins including Bayesian relaxed molecular clocks under various scenarios for the origin of photosynthesis. The data presented here indicate that a photosystem with the structural and functional requirements to support the oxidation of water to oxygen could have arisen in the early Archean and long before the most recent common ancestor of Cyanobacteria.

| Change in sequence identity as a function of time
A first approximation to the evolution of Type II reaction centers as a function of time can be derived from the level of sequence identity between D1 and D2 of different species with known or approximated divergence times as shown in Figure 2. For example, the D1 protein of the dicotyledon Arabidopsis thaliana shares 99.7% amino acid sequence identity with that of the dicotyledon Populus trichocarpa, and these are estimated to have diverged between 127.2 and 82.8 Ma (Clarke, Warnock, & Donoghue, 2011), see Figure 2 and Supporting information Table S1. On the other hand, A. thaliana's D1 shares 87.7% sequence identity with that of a unicellular red alga Cyanidioschyzon merolae. Complex multicellular red algae are known to have diverged at least 1.0 Ga ago (Butterfield, 2000;Gibson et al., 2017) and recently described fossils could push this date back to 1.6 Ga (Bengtson, Sallstedt, Belivanova, & Whitehouse, 2017;Sallstedt, Bengtson, Broman, Crill, & Canfield, 2018). At the other end of this evolutionary line, the three dominant forms of D1 from Gloeobacter violaceous (G4) share on average 79.2% sequence identity with that of C. merolae or 78.5% with that of A. thaliana.
If the percentage of sequence identity between pairs of species is plotted as a function of their divergence time, a linear decrease of identity is observed among reaction center proteins at a rate of less than 1% per 100 million years (Supporting information Table S2). The trend in Figure 2 indicates that the rate of evolution of D1 and D2 since the GOE and since the emergence of photosynthetic eukaryotes has remained very slow and stable until the present time, if con- Let us reiterate that all the evidence suggests that all reaction center proteins originated from a single ancestral protein that diversified as the multiple groups of photosynthetic bacteria arose.
As a result of this common ancestry, any standard D1 shares on average about 29% sequence identity with any D2 across their entire sequence. Any standard D1 or D2 shares on average 17% sequence identity with any L or M. The level of sequence identity falls well below 10% if any Type II reaction center protein is compared with any Type I reaction center protein (Cardona, 2015). As a result of this, the rate of evolution of D1 and D2 since the GOE, as esti- Taking into consideration that D1 and D2 share only about 29% sequence identity, two other observations can be made, as illustrated in Figure 2: (a) that the duplication that led to the divergence of D1 and D2 is more likely to have occurred closer to the origin of the primordial reaction center proteins at the origin of photosynthesis in the early Archean, than closer to or after F I G U R E 2 Decrease of sequence identity of D1 and D2 proteins as a function of divergence time. D1 subunits are shown in gray and D2 in orange. The divergence time between pairs of species is plotted against the level of sequence identity as tabulated in Supplementary Table S1. The red circle, placed at 79.2%, corresponds to the average sequence identity of the three distinct Group 4 D1 sequences of Gloeobacter violaceous in comparison with that of Cyanidioschyzon merolae. The light orange bar marks the GOE. The dashed line is fitted from a linear function and shows that over a period of at least 2.0 Ga, no dramatic changes in the rates of evolution of D1 and D2 are observed. The red dashed lines show an extrapolation of current rates of evolution throughout Earth's history. This line highlights that the rate is too slow for the divergence of D1 and D2 to have started right before the GOE. The gray dots around 3.5-3.8 Ga mark a speculative timing for the earliest events in the history of photosynthesis: the divergence of D1 and D2 (~29% sequence identity), the divergence of anoxygenic (L/M) and oxygenic (D1/D2) reaction center proteins (~17%), and the divergence of Type I and Type II reaction center proteins (≤10%). The curved blue line highlights that any scenario for the diversification of reaction centers after the origin of life requires faster rates of evolution at the earliest stages in the evolution of photosynthesis [Colour figure can be viewed at wileyonlinelibrary. com] F I G U R E 3 Relaxed molecular clock of Type II reaction center proteins. A log-normal autocorrelated relaxed clock is shown implementing a CAT + GTR + Γ model with flexible boundaries on the calibration points. Red dots are calibration points as described in Materials and Methods. The gray dot denoted II represents the ancestral Type II reaction center protein, as schematized in Figure 1. The orange dot (D0) marks the initial divergence of D1 and D2. The violet dot marks the divergence point between G2 atypical D1 sequences and standard D1. The green dot marks the divergence point between the microaerobic D1 forms (G3) and the dominant form of D1 (G4). This point represents the last common ancestral protein to all standard D1 forms predating crown group Cyanobacteria. The blue dot represents the origin of the dominant form of D1 inherited by all extant Cyanobacteria and photosynthetic eukaryotes. The gray bars represent the standard error of the estimated divergence times at the nodes. The orange bar shows the GOE [Colour figure can be viewed at wileyonlinelibrary.com] the GOE; and (b) that the MRCA of Cyanobacteria is more likely to have existed closer to the GOE than closer to the origin of photosynthesis.

| Bayesian relaxed molecular clock analysis
The simple approach used above indicates that the divergence of D1 and D2 is likely placed well before the GOE, to confirm this observation we applied a molecular clock to the phylogeny of Type II reaction center proteins. Figure 3 shows a Bayesian relaxed lognormal autocorrelated molecular clock built using the CAT + GTR + Γ model allowing for flexible boundaries on the calibration points (Lartillot, Lepage, & Blanquart, 2009). As an informed starting point, we first specified the age of the root (root prior) at 3.5 Ga with a standard deviation of 0.05 Ga. That is to say, that the most ancestral form of a Type II reaction center protein is assumed to have already  It follows then that the difference in time between D0 and the first standard form of D1, ΔT, is 1.02 Ga, with the level of uncertainty on the estimated ages resulting in a range for ΔT between 1.44 and 0.60 Ga (see Table 1 and Figure 4a and b). This large ΔT agrees with the predictions made from the comparisons of sequence identity plotted in Figure 2.
To test the effect of different root priors on our results, we varied the age of the root and the standard deviation over a broad range. Table 1 lists estimates of divergence times of key ancestral Type II reaction center proteins and the respective ΔT value using different root priors. For example, under the assumption that Type II reaction centers had already evolved by 3.8 Ga ago (Czaja et al., 2013;Nisbet & Fowler, 2014;Rosing, 1999), ΔT is found to be centered at 1.17 Ga. Similarly, if it is assumed to be a late event occurring at 3.2 Ga, though unlikely, ΔT is still 0.80 Ga. Furthermore, increasing the standard deviation on the root prior pushes the timing of the earliest events in the evolution of Type II reaction centers to even older ages rather than younger ages, see Table 2

| Rates of evolution
The inferences derived from Figure 2 revealed that the rates of evolution had to be faster in the initial stages during the Archean compared with the Proterozoic, even when ΔT is as large as one billion years. To gain a better understanding of the changes of the rate of evolution of Type II reaction center proteins, we plotted the rates as a function of divergence time. In Figure 5a, the rate of evolution (ν) of each node in the tree, expressed as amino acid substitutions per site per unit of time, is plotted against the estimated divergence time for each respective node. It can be seen that the rate at the earliest stage is much faster than the rates observed since the Proterozoic.
Thus, faster rates are necessary to explain the origin and evolution of Type II reaction centers at any point in the Archean and regardless of when exactly photosynthesis originated, as seen in Figure 5b.
The decrease in the rate of evolution is consistent with the observations derived from Figure 2 and can be roughly fitted with a firstorder exponential decay curve (fitting parameters are presented in Supporting information Table S3). Figure 5a additionally shows that L and M have been evolving at a faster rate than D1 and D2. From TA B L E 2 Effect of varying the standard deviation (SD) on the root prior at 3.5 Ga this slow-down of the rates, it can be calculated that since each respective duplication event (stages 2 and 4 in Figure 1) it took about 168 million years for D1 and D2 to fall to 50% sequence identity and about 115 million years for the same to occur for L and M.
The maximum rate of evolution in the D1 and D2 family of reaction center proteins is placed at the node that represents D0. We will refer to this rate as ν max . Figure 5a shows that the rate of evolution flattens out to comparatively slow rates during the Proterozoic.
These rates correspond to the rates of Group 4 D1 and D2. We will refer to the average rate of evolution during this zone of slow change as ν min and it is calculated as the average rate from each node in Group 4 D1 and D2. In Figure 5a, ν max is 5.03 ± 1.42 amino acid substitutions per site per Ga, while ν min is 0.12 ± 0.04 substitutions per site per Ga. Therefore, if Type II reaction centers had evolved by 3.5 Ga, to account for the divergence of D1 and D2 in one billion years, the initial rate of evolution had to be about 40 times larger than that observed since the MRCA of Cyanobacteria. Table 3 lists the rates of evolution of a diverse number of proteins reported in independent studies in a broad range of organisms. It is found that the rate of evolution of the core subunits of PSII (ν min ) is similar to the rate of other proteins that are billions of years old and highly conserved such as subunits of the ATP synthase, the cytochrome b 6 f complex, or the ribosome. Our estimated rates fall well within the expected range of other cyanobacterial proteins, thus validating our calibration choices and consistent with the expected level of sequence identity as derived from Figure 2. Furthermore, even ν max is found to be within plausible levels when ΔT is about a billion years: slower than known fast evolving proteins such as peptide toxins (Duda & Palumbi, 1999), the influenza virus (Carrat & Flahault, 2007), or proteins of the immune system (Hughes, 1997; Table 3).
A rate of evolution of 0.12 substitutions per site per Ga, as seen for standard D1 and D2, means that it would take about 8 billion years for each position of the sequence to have changed at least once, assuming-just for the sake of simplicity-that each position has a similar chance of mutating. This very slow rate of evolution is the reason why standard D1 and D2 have changed little in more than 2.0 Ga, as seen in Figure  From these comparisons in the rates, it can be concluded that the homodimeric stage (D0) was likely very short-lived, even when ΔT is in the order of a billion years.
F I G U R E 5 Rates of evolution as a function of time. (a) Change in the rate of evolution of oxygenic (gray) and anoxygenic (orange) Type II reaction center proteins. The rates correspond to the tree in Figure 3, assuming an origin of photosynthesis at about 3.5 Ga. The dashed lines represent a fit of a single-component exponential decay and the rates are given as amino acid substitutions per site per million years.
(b) Changes in the rate of evolution constraining the root to younger and younger ages. The red curve farthest to the right was calculated using a root prior of 4.2 ± 0.05 Ga, while the green curve farthest to the left was calculated using a root prior of 0.8 ± 0.05 Ga. Younger divergence times imply initial faster rates of evolution. (c) Change in the rate of evolution as a function of ΔT, the dashed lines represent a fit to a power law function. The curve in orange was calculated using ΔT values subtracting the mean average of the divergence times of D0 and the ancestral standard D1. The curve in gray was calculated using ΔT values subtracting the minimum age of D0 and the maximum age for ancestral standard D1 [Colour figure can be viewed at wileyonlinelibrary.com] Our Bayesian analyses show that the evolution of PSII is better described by a long span of time since the appearance of a homodimeric photosystem (with sufficient power to oxidize water) until the emergence of standard PSII (inherited by all known Cyanobacteria capable of photosynthesis). Notwithstanding, a fast rate of evolution at the earliest stage implies that ν max would increase if ΔT is considered to be smaller, as would be the case for an evolutionary scenario in which PSII evolves rapidly before the GOE after an event of gene transfer of a bacteriochlorophyll a-based anoxygenic Type II reaction center with low oxidizing power (i.e., before the evolution of tyrosine oxidation) like those found in phototrophic Proteobacteria and Chloroflexi (Shih, Hemp et al., 2017;Soo, Hemp, Parks, Fischer, & Hugenholtz, 2017). We illustrate this concept in Figure 5b and c.
These figures depict the change of the rate of evolution as a function of ΔT. This manipulation of the molecular clock can only by accomplished computationally by changing the root prior to younger and younger ages. The increase of ν max with decreasing ΔT can be fitted using a power law function (see Supporting information Table S4 for fitting parameters). This function can then be used to calculate ν max under varying ΔT. then ν max would need to be more than 400 thousand amino acid changes per site per Ga, which is orders of magnitude greater than the rate of evolution of any known protein (  (2007) a Estimated using a root prior of 3.5 Ga under a autocorrelated log-normal molecular clock as described in the text and Materials and Methods. b Proline-rich transcript overexpressed in the brain (PRTB). The human protein shares 99% sequence identity compared to that in mice. Rodents are estimated to have diverged about 74 Ma ago (Kay & Hoekstra, 2008). c The authors pointed out that the rate of evolution of this methyltransferase has remained unchanged from bacteria to humans.
TA B L E 3 Comparison of rates of protein evolution to occur. Even the "simplest" known homodimeric reaction center is made of at least four protein subunits binding 62 chlorophyll-derived pigments, two carotenoids, two lipids, four Ca 2+ ions, and a Fe 4 S 4 cluster (Gisriel et al., 2017). It is therefore likely that reaction centers have always been under strong purifying selection (Shi, Bibby, Jiang, Irwin, & Falkowski, 2005). In fact, even the scenario in which ΔT is a billion years (ν max = 5.03 ± 1.42) may be an overestimation and could potentially indicate that the age of the duplication event that led to D1 and D2 occurred immediately after the origin of the earliest reaction center. In consequence, the evolution of the core subunits of PSII is more consistent with a scenario in which oxygenic photosynthesis originated long before the GOE as supported by the geochemical record of inorganic redox proxies (Crowe et al., 2013;Havig, Hamilton, Bachan, & Kump, 2017;Planavsky et al., 2014;Wang et al., 2018).

| The D1/D2 duplication is older than the L/M duplication
In Figure 3, it can also be seen that the divergence of the L and M subunits occurs after the divergence of D1 and D2. The estimated time for the divergence of L and M is 2.87 ± 0.16 Ga, while the time for the divergence of D1 and D2, as we saw above, is 3.22 ± 0.19 Ga.
Because no calibration points were set on L and M, greater levels of uncertainty are observed in this part of the tree: Hence, we refrain from making strong inferences on the timing of specific diversification events within phototrophic Proteobacteria or Chloroflexi and only focus on the general trends. Still, we found the above result puzzling as it would place the roots of PSII before the roots of anoxygenic Type II reaction centers. After a closer inspection, we noted that this effect is caused by faster rates of evolution computed for L and M, relative to D1 and D2, across all time points (Figure 5a and Table 3). In consequence, at a faster rate of evolution, it would take less time for L and M to converge to node K than D1 and D2 to node D0. What is more, the phylogeny of Type II reaction centers, as seen in Figure 1, also shows that L and M branches are overall longer than D1 and D2 branches, which is suggestive of accelerated rates.
Longer branches can be caused by a relatively early diversification due to a slow rate of evolution, or alternatively by a late diversifica- while that for D1 and D2 is 0.12 ± 0.04 (see Table 3). Therefore, according to our clock L and M are evolving on average 4.7 times faster than D1 and D2. This result is nicely within the range suggested by the two independent studies referenced above and confirms that our approach using a single protein produced similar rates of evolution as those computed using a large set of highly conserved concatenated sequences (Magnabosco et al., 2018;Sanchez-Baracaldo, 2015;Shih, Hemp et al., 2017).
It is worth noting here that under every scenario tested in this study, the duplication leading to D1 and D2 was always found to be the oldest node after the root. The late divergence of L and M relative to D1 and D2 does not seem to be artifactual but a consequence of the apparent faster rates of evolution measured in anoxygenic phototrophs. A ramification of this is that the hypothesis that Cyanobacteria obtained a Type II reaction center via HGT from an anoxygenic phototroph, right before the GOE, becomes untenable because D1 and D2 would predate L and M.

| Sensitivity analysis
To test the reliability of the method, we explored a range of contrasting models. We compared the effect of the model of relative exchange rates on the estimated divergence times: Supporting information Figure S1 provides a comparison of estimated divergence times calculated with the CAT + GTR model (Yang & Rannala, 2006) Table 4.
To understand the effects of the oldest calibration point (point 11, Figure 3) on the estimated divergence time, we tested a second set of boundaries restricting this point to a minimum age of 2.7 Ga (Calibration 2) to consider the possibility that the record for oxygen several hundred million years before the GOE was produced by crown group Cyanobacteria (Havig et al., 2017;Planavsky et al., 2014). Supporting information Figure S2 Table S6 for linear regression).
Calibration 2 did not seem to have a very strong effect on the estimated divergence times nor ΔT. For example, under a root prior of 3.5 ± 0.05 Ga and employing Calibration 2, ΔT was centered at 0.97 Ga (Table 4). We also tested the effect of removing the oldest calibration point from the analysis (point 11, Figure 3), this has the effect of making many nodes younger, yet ΔT remained in the range of 0.8 to 1.3 Ga depending on the level of flexibility allowed (Table 1).
In contrast, the choice of model for the evolution of substitution rates had a strong impact on the estimated divergence times as shown in Supporting information Figure S3 and Table S7. Supporting information Figure S3 presents a comparison of divergence time estimates of a tree calculated using a relaxed log-normal autocorrelated molecular clock with a tree calculated using an uncorrelated gamma model on the rates of evolution. The autocorrelated model assumes that neighboring branches are more likely to evolve at a similar rate, while the uncorrelated model assumes that the rate of evolution of each branch can vary independently (Ho & Duchene, 2014;Lepage, Bryant, Philippe, & Lartillot, 2007). Under the uncorrelated model, the estimated divergence times of many nodes were aberrantly shifted to younger ages: for example, most cyanobacterial and eukaryotic D1 clustered in the range of 700 to 0 Ma, which is inconsistent with the fossil record. The molecular mechanism behind this difference could be related to the fact that photochemistry imposes a strong constraint on the evolution of reaction center proteins: as all of them must coordinate and maintain all redox cofactors, chlorophylls, quinones, carotenoids, and a non-heme iron, at a precise orientation and distance from each other to allow for control of electron transfer rates and redox potentials. These rates and potentials are crucial not only for function but also for protection against the formation of reactive oxygen species (Cardona et al., 2012;Rutherford, Osyczka, & Rappaport, 2012). It seems reasonable then, that the rates of evolution of all Type II reaction center proteins should be similar between closely related groups, thus corresponding to an autocorrelated model.

| Change in sequence identity as a function of time
As an approximation to the evolution of the core subunits of PSII, we

| Bayesian relaxed molecular clock analysis
The application of a Bayesian molecular clock analysis to the phylogeny of Type II reaction centers can be problematic because this was designed to deal with heterotachy of orthologs within and between lineages (Ho & Duchene, 2014) and thus they may not be able to perfectly model the variation in the rates of evolution across the long history of ancient paralogs. Therefore, the reader should interpret the reported age estimates as an approximation, as simulations  In this scenario, assuming that the earliest events in the history of photosynthesis started about 3.5 Ga, the divergence of D1 and D2 is set at about 3.2 Ga.
In the second scenario, we considered that the MRCA of Cyanobacteria occurs before the GOE as suggested by other molecular clock analyses (Falcon et al., 2010;Sanchez-Baracaldo, 2015;Schirrmeister et al., 2015). In the present work, this scenario can be fitted most satisfactorily with the application of a relaxed molecular clock using an empirical amino acid substitution model (LG + Γ). In this scenario, under a root prior of 3.5 Ga, the appearance of the ancestral standard form of D1 is set at about 2.6 Ga; and this has the consequence of pushing the divergence of D1 and D2 closer to the root, and thus D0 is set at about 3.4 Ga (Table 1 and 2, Figure 6). This effect is due to the fact that the phylogenetic distance between D1 and D2 is invariable, and thus under any scenario, the data are better explained by a long span of time separating D0 and the standard heterodimeric PSII. What can be concluded from this is that the older the MRCA of Cyanobacteria is, the more likely it is that the divergence of D1 and D2 started near the origin of photochemical reaction centers and thus, near the origin of photosynthesis. Our

| Rates of evolution
It should be observed that if a relatively constant rate of change were to be applied to the evolution of D1 and D2, the ancestral duplication would be placed long before the formation of the planet (dashed red line in Figure 2). Given the fact that the rate of evolution of D1 and D2 is constrained at slow rates from the Proterozoic until present F I G U R E 6 Scenarios for the evolution of Type II reaction centers. The rows of colored dots represent estimated divergence times of key nodes as highlighted in Figure 3 and calculated using the CAT + GTR + Γ and LG + Γ models and root priors of 3.8 or 3.5 ± 0.05 Ga. A highly oxidizing photosystem with enough power to split water is likely to have originated before the gene duplication event that led to D1 and D2 (orange dot). Making the MRCA of Cyanobacteria older (green arrow) pushes the earliest stages in the evolution of PSII and water oxidation closer to the origin of photosynthesis (orange arrow). The yellow vertical bar marks the GOE [Colour figure can be viewed at wileyonlinelibrary.com] times, the period of fast evolutionary change has to be located at the earliest stages to account for the evolution of photosynthesis within a reasonable timeframe. The rates calculated using the molecular clocks are in agreement with the observations based solely on a comparison of the level of sequence identity and revealed that even considering an origin of Type II reaction centers at 3.8 Ga the maximum rate during the evolution of D1 and D2, ν max , is more than thirty times greater than the measured rates since the Proterozoic.
The much faster rates required to place the early stages of reaction center evolution (including the D0 duplication) in the Paleoarchean are difficult to reconcile with the structural complexity inherited by all known reaction centers. This structural complexity should have subjected the rates of evolution to strong constraints from early on.
We hypothesize that such rates were only possible near the origin of reaction centers when life was still "figuring out" how to do photosynthesis for the first time.
There are other scenarios that can potentially account for the exponential decrease in the rates observed here, and these are further discussed in the Supporting information Discussion section "Rates of Evolution": briefly, (a) duplication followed by neofunctionalization (Innan & Kondrashov, 2010;Lynch & Conery, 2000), (b) a proposed exponential decrease in the temperature-dependent rate of deamination of cytosine on a warm early Earth (Lewis, Crayle, Zhou, Swanstrom, & Wolfenden, 2016), (c) greater exposure to UV radiation in the photic zone in the absence of an ozone layer (Cockell, 2000), and (d) a combination of the above.

| Was there water oxidation before D1 and D2?
Even before the gene duplication that allowed the divergence of D1 and D2, the ancestral homodimeric photosystem had enough oxidizing power to form the neutral tyrosyl radical: high enough to surpass the E m of water oxidation to oxygen. However, this does not necessarily imply that water oxidation was occurring at this time. Is there any evidence that would support or argue against an origin of water oxidation before the D1 and D2 duplication event?
Almost every major structural difference between anoxygenic Type II reaction center proteins and the core proteins of PSII can be explained in the context of water oxidation, protection against the formation of reactive oxygen species, and enhanced repair and assembly mechanisms due to oxidative damage from the formation of singlet oxygen around the photochemical pigments. A similar rationale has recently been proposed for the divergence of homodimeric Type I reaction centers of anoxygenic photosynthesis and heterodimeric Photosystem I of oxygenic photosynthesis (Orf, Gisriel, & Redding, 2018).
Five major structural differences distinguish D1 and D2 from L and M (Figure 7a and b). Starting from the N-terminus:

| The N-terminus itself (Figure 7c)
PSII is known to generate singlet oxygen, a very damaging form of reactive oxygen species that without control can lead to irreparable damage to the organism and death. Singlet oxygen is produced when molecular oxygen interacts with the excited triplet state of chlorophyll (Krieger-Liszkay, Fufezan, & Trebst, 2008;Rutherford et al., 2012;Vass & Cser, 2009). Triplet chlorophyll is in turn formed when excess harvested light energy cannot be efficiently dissipated or when the forward electron transfer reactions of PSII are blocked and instead a backflow of electrons occurs (back-reactions) (Krieger-Liszkay et al., 2008;Santabarbara, Bordignon, Jennings, & Carbonera, 2002). Thus, the unavoidable production of singlet oxygen by PSII results in rapid damage of the core proteins in such a way that the half-lifetime of D1 is about 30 min. D1 is known to be the protein with the fastest turnover in the photosynthetic membrane (Aro, Virgin, & Andersson, 1993). The half-lifetime of D2 is also relatively fast, measured at about 3 hours. In comparison, the halflifetime of Photosystem I core proteins is about 2 days (Yao, Brune, & Vermaas, 2012). Damaged D1 and D2 are degraded by dedicated FtsH proteases, which target and recognize the N-terminus of both subunits. Deletion of the N-terminus results in impairment of degradation and repair Krynicka, Shao, Nixon, & Komenda, 2015). The preserved sequence and structural identity at the N-terminus of both D1 and D2 suggests that the evolution of enhanced repair mechanisms had started to evolve before the duplication. Consistent with this, the evolution of all bacterial FtsH proteases confirms that the lineage of proteases specifically dedicated to the repair of PSII makes a monophyletic and deepbranching clade (Shao, Cardona, & Nixon, 2018). As is the case for the evolution of reaction center proteins, this deep-branching clade of PSII-FtsH proteases appeared to have diverged before the radiation of those found in all the other groups of phototrophs (Shao et al., 2018).

| A protein fold between the 1st and 2nd transmembrane helices (Figure 7d)
In D1 and D2, this region is made of 54 and 52 residues in comparison with 28 and 35 residues in L and M, respectively. This fold is enlarged in D1 and D2 to provide a site for protein-protein interactions with the small peripheral subunits and the extrinsic polypeptides (Cardona, 2015(Cardona, , 2016, none of which are present in anoxygenic Type II reaction centers. In D1, this site provides a connection to PsbI, M, T, and O; and in D2 to the cytochrome b 559 , PsbH, J, and X. The small subunits are necessary to support a more complex assembly and disassembly cycle due to much higher rates of repair (Komenda, Sobotka, & Nixon, 2012). They provide stability, help with photoprotective functions, assist with the photoassembly of the Mn 4 CaO 5 cluster (Dobakova, Tichy, & Komenda, 2007;Hamilton et al., 2014;Komenda, Lupinkova, & Kopecky, 2002;Popelkova & Yocum, 2011;Sugiura, Nakamura, Koyama, & Boussac, 2015), and even contribute to the highly oxidizing potential of PSII (Ishikita, Saenger, Biesiadka, Loll, & Knapp, 2006). The structural homology of CP43 and CP47 indicates that these originated from a gene duplication event making the homodimeric antenna to a homodimeric core (Cardona, 2016). The reason why CP47, and in particular CP43, interact with the donor side of PSII is an unsolved mystery given the fact that their main role is that of light harvesting. It can be rationalized however if water oxidation started in a homodimeric reaction center early during the evolution of photosynthesis (Cardona, 2017) [Colour figure can be viewed at wileyonlinelibrary.com] ChlZ D1 and ChlZ D2 . These peripheral pigments are absent in anoxygenic Type II reaction centers but are present in Type I reaction centers indicating that they existed before the divergence of D1 and D2 (Cardona, 2015). Both peripheral chlorophylls are required for photoautotrophic growth as mutations that impair their binding cannot assemble functional PSII (Lince & Vermaas, 1998;Ruffle et al., 2001).
ChlZ D1 and ChlZ D2 are each in direct contact with a beta-carotene molecule, known as Car D1 and Car D2 respectively, seen using crystallography first by Ferreira et al. (2004) and Loll, Kern, Saenger, Zouni, and Biesiadka (2005), but detected and characterized by spectroscopy well before that; see for example (Hanley, Deligiannakis, Pascal, Faller, & Rutherford, 1999;Kwa, Newell, van Grondelle, & Dekker, 1992;Noguchi, Mitsuka, & Inoue, 1994). The position of Car D1 and Car D2 differs in that the former is positioned perpendicular to the membrane plane while the latter is parallel to the membrane plane: However, one of the beta-rings of each carotenoid links to ChlZ D1 and ChlZ D2 via strictly conserved tryptophan residues (D1-W105 and D2-W104, respectively), located in the unique protein fold between the 1st and 2nd transmembrane helices described above, and therefore absent in L and M. Car D2 is tilted relative to Car D1 partly to give way to the exchangeable plastoquinone, Q B . The core carotenoids of PSII have been shown to contribute little to light harvesting and have dominantly a protective role (Stamatakis, Tsimilli-Michael, & Papageorgiou, 2014). The close association of ChlZ D1 and ChlZ D2 with carotenoids would suggest a role in protection, by quenching chlorophyll triplet states or directly scavenging singlet oxygen (Cogdell et al., 2000;Telfer, 2002). A role for the direct scavenging of singlet oxygen for both ChlZ D1 -Car D1 and ChlZ D2 -Car D2 has been suggested based on spectroscopy of isolated reaction centers (Telfer, Dhami, Bishop, Phillips, & Barber, 1994). Furthermore, ChlZ D2 -Car D2 have been demonstrated to be involved in protective electron transfer side pathways within PSII. For a detailed overview of these pathways see for example (Faller, Fufezan, & Rutherford, 2005). That ChlZ D1 and ChlZ D2 have been retained since before the divergence of Type I and Type II reaction centers indicates that they predate the D1 and D2 divergence. The acquisition of closely interacting carotenoids seems to have occurred therefore after the K and D0 divergence, but before the D1 and D2 split, in support of water oxidation before heterodimerization. However, carotenoids at a similar position to Car D1 and Car D2 have recently been identified in the homodimeric Type I reaction center of Heliobacteria (Gisriel et al., 2017) suggesting that these may predate the Type I/Type II split (Figure 8).

| An extended loop between the 4th and the 5th transmembrane helices (Figure 7f)
This is required for the coordination of bicarbonate, a ligand to the non-heme iron (Ferreira et al., 2004), which is a distinctive feature of PSII. In anoxygenic Type II reaction centers, the nonheme iron is coordinated asymmetrically by a glutamate from the M subunit. There is significant sequence and structural conservation of the bicarbonate binding site in D1 and D2. Two strictly conserved tyrosine residues D1-Y246 and D2-Y244 provide symmetric hydrogen bonds to bicarbonate (Ferreira et al., 2004). This indicates that bicarbonate binding was a feature existing before F I G U R E 8 Carotenoids in the reaction center core. (a) Overlap of the M subunit of the anoxygenic Type II reaction center from Thermochromatium tepidum (light blue) with the D2 subunit of PSII from Thermosynechococcus vulcanus (gray) and the PshA subunit of the homodimeric Type I reaction center of Heliobacterium modesticaldum (orange), PDB ID: 5v8k (Gisriel et al., 2017). The protein backbone is showed in transparent ribbons, some of the photochemical pigments are displayed as thin sticks, and the closest carotenoids to the core are shown as thick sticks. SX stands for spirilloxanthin and DN for 4,4′-diaponeurosporene. Car D2 is the core beta-carotene bound by D2. Both SX and Car D2 have been demonstrated to have photoprotective roles, while the role of DN in the homodimeric Type I reaction center has not been demonstrated yet. However, based on its structural position overlapping with half of Car D2 and in Van der Waals contact with several bacteriochlorophyll g molecules, it can be predicted that it also has a photoprotective role. P denotes the photochemical "special pair" pigments. (b) A rotated view of the position of the carotenoids in the overlapped structures. The protein backbone has been hidden for clarity. ChlZ D2 and the bacteriochlorophyll g molecule, GBF 1024 , occupy homologous positions [Colour figure can be viewed at wileyonlinelibrary.com] the divergence of D1 and D2. The role of bicarbonate had been a long-standing mystery, but recently it was shown that binding and unbinding of bicarbonate modulates the E m of the quinones, working as a switch from a productive state into a protective state that prevents chlorophyll triplet state and singlet oxygen formation (Brinkert, De Causmaecker, Krieger-Liszkay, Fantuzzi, & Rutherford, 2016). Previously, G. N. Johnson, Rutherford, and Krieger (1995) had shown that a shift in the E m of the fixed quinone, Q A , plays a key role in protection of PSII during assembly of the Mn 4 CaO 5 cluster, a light-driven process. It is understood now that such a shift is mediated by bicarbonate binding (Brinkert et al., 2016). A further conclusion from this is that the ancestral photosystem made of a D0 homodimer had already evolved to incorporate bicarbonate-mediated protective mechanisms as well, implying oxygen evolution, and by extension, the assembly of a primordial water-oxidizing complex.

| An extended C-terminus and the Mn 4 CaO 5 cluster binding site (Figure 7g)
D1 and D2 share an extended C-terminus made of about 50 residues and with a distinctive alpha-helix parallel to the membrane plane. The C-terminus is necessary for the coordination of the Mn 4 CaO 5 cluster, Clbinding, water channels, and proton pathways (Debus, 2001;Linke & Ho, 2013;Umena, Kawakami, Shen, & Kamiya, 2011). Remnants of this C-terminal extension may be found in some of the M and L subunits of phototrophic Proteobacteria and Chloroflexi (Cardona, 2015), but see Figure 7a.
In D1 H332, E333, D342, and A344 coordinate the Mn 4 CaO 5 cluster ( Figure 7h). H337 provide a hydrogen bond to one of the bridging oxygens. In addition, E354 from the CP43 antenna subunit coordinates two of the Mn atoms and R357 offers a hydrogen bond to another bridging oxygen. While there is not a cluster in D2, an examination of the donor side in the immediate vicinity of the redox active tyrosine shows that the site has been blocked by a number of phenylalanine residues (Svensson, Vass, & Styring, 1991). Every ligand to the cluster is matched by a phenylalanine residue in D2 (Figure 7h). In the CP47 antenna, the ligands found in the CP43 are also replaced by phenylalanine residues (Figure 7h and i). The only exception is a histidine in a position equivalent to H337, which perhaps not coincidentally, provides a hydrogen bond to a water molecule locked within the hydrophobic patch made by the phenylalanine residues. No such phenylalanine patch is found in any other reaction center protein, except D2 (Cardona, 2016). The presence of a redox tyrosine in D2 and what seems like a vestigial metal-binding site would be puzzling if the wateroxidizing cluster evolved after the divergence of D1 and D2 in a heterodimeric system, but it would make sense if a primordial water-oxidizing cluster appeared first on both sides of the reaction center in the ancestral homodimeric photosystem.
In conclusion, based on the above structural and functional evidence we find it highly likely that water oxidation originated before the divergence of D1 and D2. Hence, in a homodimeric Type II ancestor containing two identical exchangeable quinones charge separation would result in enhanced back-reactions caused by electron transfer to a quinone site when empty or when occupied by a reduced form of the quinone; and additionally, by shorter backreaction pathways (Cardona et al., 2012;Rutherford et al., 2012).
Back-reactions would give rise to chlorophyll triplet states in the heart of the reaction center (Dutton, Leight, & Seibert, 1972;Rutherford, Paterson, & Mullet, 1981). In the evolution of a watersplitting homodimeric ancestor with an increased oxidizing potential, as mentioned above, avoidance of photodamage could be a significant selective pressure for heterodimerization, as the enhanced back-reactions intrinsic to the homodimeric Type II reaction centers would have become a major problem only if oxygen were present. In addition, two primordial clusters on each side of the reaction center would double the chance of forming back-reacting intermediary states driving forward heterodimerization (Keren, Ohad, Rutherford, Drepper, & Krieger-Liszkay, 2000). An inefficient homodimeric water-splitting photosystem would have encountered this problem first and thus come under strong selection pressure toward heterodimerization at an early time. The results of the present work fit with this picture and indicate that this transitional water-oxidizing homodimeric state was very short-lived.
Our present finding that the duplication leading to L and M occurred significantly later in anoxygenic Type II reaction centers opens the possibility that oxygen could have also been a driving force in their heterodimerization process, since K would have encountered these selection pressures at a time when oxygen concentrations began to rise or fluctuate in localized environments during the late Archean (Bosak, Liang, Sim, & Petroff, 2009;Lyons, Reinhard, & Planavsky, 2014;Riding, Fralick, & Liang, 2014;Wang et al., 2018). Mirroring the evolution of Type II reaction centers, a molecular clock study on Type I reaction centers showed that the duplication event that led to the heterodimerization of the core of Photosystem I was also more likely to be the oldest node after the root (Cardona et al., 2012). This duplication event is widely accepted to have been an evolutionary adaptation to oxygenic photosynthesis (Ben-Shem, Frolow, & Nelson, 2004;Hohmann-Marriott & Blankenship, 2008;Rutherford et al., 2012) and was found to predate the earliest diversification event of anoxygenic Type I reaction centers (Cardona, 2018); namely, the divergence of the reaction center of Heliobacteria from that which gave rise to those in phototrophic Chlorobi and Acidobacteria.
It is rather remarkable that the anoxygenic Type II reaction center of phototrophic Proteobacteria contains an asymmetrically located carotenoid in contact with a core bacteriochlorophyll (Deisenhofer & Michel, 1989), see Figure 8. The role of this carotenoid is to quench bacteriochlorophyll triplet states to prevent the formation of singlet oxygen (Cogdell et al., 2000). A carotenoid with a similar position to Car D1 and Car D2 in PSII has been now discovered in the structure of a homodimeric Type I reaction center (Gisriel et al., 2017).

| FINAL REMARK S
The link to carbon fixation is of particular importance since CO 2 levels in the atmosphere were higher than now (Nutman, Bennett, & Friend, 2017;Sheldon, 2006  and Subtree Pruning and Regrafting method, with the tree topology optimized using five random starts. The ML tree using all sequences is shown in Figure 1 and it replicates earlier evolutionary studies of Type II reaction centers that used simpler methods and fewer sequences (Beanland, 1990;Cardona, 2015).

| Bayesian relaxed molecular clock and fossil calibrations
A total of 54 bacterial sequences spanning the entire diversity of Type II reaction centers were selected for Bayesian molecular clock analysis, including atypical and standard forms of D1, Alpha-, Beta-, Gammaproteobacteria, Chloroflexales, and Gemmatimonas phototrophica. Furthermore, 10 D1 and 10 D2 sequences from photosynthetic eukaryotes from taxa with a well characterized fossil record were added to allocate calibration points. The phylogeny of Type II reaction centers was cross-calibrated on D1 and D2 as listed in Table 5 and calibration points were assigned as presented in Figure 3, red dots.
Dates for the Arabidopsis/Populus divergence, the divergence of the angiosperms (Amborella), gymnosperms (Cycas), and land plants  Sims, Mann, and Medlin (2006). A minimum age of 600 Ma based on a Late Neoproterozoic Chinese multicellular red alga fossil (Xiao, Knoll, Yuan, & Pueschel, 2004) was assigned to the split between the diatom sequences and the sequences from Porphyra purpurea, as a conservative estimate for the divergence of complex red algae, which predates this time (Yang et al., 2016). The oldest calibration point in photosynthetic eukaryotes was assigned as a minimum age of 1.2 Ga to the divergence of the sequences from Cyanidium caldarium a unicellular early-branching red algae. This was used as a conservative estimate for the origin of photosynthetic eukaryotes.
Pleurocapsales are characterized by multiple fissions during cell division and fossils retaining this morphology have been described at 1.7 Ga (Golubic & Lee, 1999;Sergeev, Gerasimenko, & Zavarzin, 2002). The earliest well-assigned filamentous Cyanobacteria fossils of comparable size to those of the early-branching Pseudanabaena have been reported at 1.9 Ga (Golubic & Lee, 1999;Sanchez-Baracaldo et al., 2017;Schirrmeister et al., 2016;Sergeev et al., 2002), and this was assigned as a minimum age to the sequences from Pseudanabaena biceps.
The oldest calibration point, point 11, was selected to be the branching point of the D2 and the G4 D1 from Gloeobacter violaceous. This was set to be around the age for the GOE and was assigned as a minimum age of 2.45 Ga (Calibration 1) (Bekker et al., 2004). For comparison, a calibration of 2.7 Ga was also used (Calibration 2) to test the effect on the estimated divergence times of an older age for crown group Cyanobacteria. Geological evidence suggests that oxygen "whiffs" or "oases" could significantly predating the GOE (Havig et al., 2017;Lyons et al., 2014;Planavsky et al., 2014;Wang et al., 2018) so this scenario is not entirely implausible. The calibration points on D1 were allocated on Group 4 because this type of D1 is the only one retained in all Cyanobacteria with PSII, it is the only type of D1 inherited by photosynthetic eukaryotes, and it is the main D1 used to catalyze water oxidation under most growth conditions: see Cardona et al. (2015) for a detailed analysis of the evolution of D1 proteins. It should be noted therefore that the duplications leading to all other forms of D1 occurred before the most recent common ancestor of Cyanobacteria .
It is well accepted that a form photosynthesis had already evolved by 3.5 Ga, which is usually assumed to be anoxygenic. This is demonstrated by both sedimentological and isotopic evidence for photoautotrophic microbial communities recorded in Paleoarchean rocks (Butterfield, 2015;Nisbet & Fowler, 2014;Tice & Lowe, 2004). In addition, sedimentary rocks and banded iron formations from Isua, Greenland, hint at the presence of photosynthetic bacteria in the marine photic zone as early as 3.7-3.8 Ga (Czaja et al., 2013;Grassineau, Abell, Appel, Lowry, & Nisbet, 2006;Knoll, 2015;Rosing, 1999;Rosing & Frei, 2004;Schidlowski, 1988). Therefore, we used a root prior of 3.5 Ga as a conservative estimate for photoautotrophy based on photochemical reaction centers. Nevertheless, because it is not yet known exactly when photosynthesis originated for the first time we also tested the effect of varying the root prior from 3.2 to 4.1 Ga on the estimated divergence time under restrictive and flexible scenarios.
A Bayesian Markov chain Monte Carlo approach was used to calculate the estimated divergence times. We used Phylobayes 3.3f, which allows for the application of a relaxed log-normal autocorrelated molecular clock under the CAT + GTR + Γ model (Lartillot & Philippe, 2004;Lartillot et al., 2009) necessary for the implementation of flexible boundaries on the calibration points (Yang & Rannala, 2006). To understand the effect of different evolutionary models on the age estimates we compared the CAT + GTR + Γ model with (a) a LG + Γ model that sets less flexible boundaries on the calibration points, (b) a CAT + Γ model assuming a uniform (Poisson) distribution of amino acid equilibrium frequencies, or (c) an uncorrelated gamma model where the rates of substitution can vary independently. The flexible bounds on the CAT + GTR + Γ model were set to allow for 2.5% tail probability falling outside each calibration boundary or 5% in the case of a single minimum boundary. All molecular clocks were computed using four discrete categories for the gamma distribution and four chains were run in parallel until convergence.
In this work, we define the period of time between the duplication event that led to the divergence of D1 and D2 and the appearance of the ancestral standard D1 as ΔT. This value is calculated as the subtraction of the mean age of the latter node (Figure 3, green dot) from the former's mean node age (Figure 3, D0, orange dot). The instant rates of evolution, which are necessary for the computation of divergence time from the phylogeny, were retrieved from the output files of Phylobayes. These rates are calculated by the software as described by the developers elsewhere (Kishino, Thorne, & Bruno, 2001;Lepage et al., 2007) and are expressed as amino acid changes per site per unit of time. The rate at node D0 was termed ν max and a baseline rate of evolution during the Proterozoic was obtained as the average of all node rates in Group 4 D1 and D2 and denoted ν min . All sequence datasets and estimated divergence times for each node of each tree used in this analysis are provided in the Supporting information.