Primordial‐like enzymes from bacteria with reduced genomes

The first cells probably possessed rudimentary metabolic networks, built using a handful of multifunctional enzymes. The promiscuous activities of modern enzymes are often assumed to be relics of this primordial era; however, by definition these activities are no longer physiological. There are many fewer examples of enzymes using a single active site to catalyze multiple physiologically‐relevant reactions. Previously, we characterized the promiscuous alanine racemase (ALR) activity of Escherichia coli cystathionine β‐lyase (CBL). Now we have discovered that several bacteria with reduced genomes lack alr, but contain metC (encoding CBL). We characterized the CBL enzymes from three of these: Pelagibacter ubique, the Wolbachia endosymbiont of Drosophila melanogaster (wMel) and Thermotoga maritima. Each is a multifunctional CBL/ALR. However, we also show that CBL activity is no longer required in these bacteria. Instead, the wMel and T. maritima enzymes are physiologically bi‐functional alanine/glutamate racemases. They are not highly active, but they are clearly sufficient. Given the abundance of the microorganisms using them, we suggest that much of the planet's biochemistry is carried out by enzymes that are quite different from the highly‐active exemplars usually found in textbooks. Instead, primordial‐like enzymes may be an essential part of the adaptive strategy associated with streamlining.


Introduction
Over 40 years ago, Yčas and Jensen independently proposed similar scenarios for the evolution of metabolic pathways (Yčas, 1974; Jensen, 1976). They envisaged primordial organisms with small genomes and, therefore, small numbers of protein-encoding genes. They proposed that the only way such organisms could carry out the metabolic biochemistry required for life would be if many of the proteins were multifunctional. Both concluded that primordial enzymes were likely to have exhibited broad substrate specificities and/or to have catalyzed classes of related reactions. From this starting point, gene duplication and divergence could give rise to the larger genomes and highly specific enzymes that epitomize modern organisms.
This model has gained considerable support from the finding that many, probably most, and perhaps all modern enzymes are promiscuous (Khersonsky and Tawfik, 2010). That is, modern enzymes have the ability (often weak) to catalyze reactions or to act on substrates that differ from those required physiologically. While the multiple activities of primordial enzymes were all assumed to be required for the viability of primordial cells, the promiscuous activities of modern enzymes are, by definition, not physiologically relevant (Copley, 2015). Nevertheless, these promiscuous activities are often assumed to be relics of multifunctional ancestors, and the homologous members of enzyme superfamilies often share overlapping promiscuous activities (Glasner et al., 2006; Khersonsky and Tawfik, 2010; Copley, 2015). Beginning 16 years ago (Matsumura and Ellington, 2001), many practitioners of directed evolution have also observed ancestor-like intermediates, with broadened substrate specificities, on mutational trajectories between enzymatic functions. More recently, phylogenetic approaches have been used to resurrect ancestral enzymes with broad specificities (Voordeckers et al., 2012; Risso et al., 2013). These approaches have provided enormous insight into enzyme evolution. However, they share a limitation: each studies either an activity that is no longer physiologically relevant (i.e., promiscuity) or a single enzyme outside its metabolic context.
Our long-term goal is to assess the viability of the Yčas/Jensen model of primordial metabolism. Can a cell function with only a minimal set of multifunctional enzymes, and what would this set comprise? What are the structural, functional, dynamic and regulatory characteristics of the enzymes in such a cell? And can we recapitulate the evolutionary trajectory from this cell to something that more closely resembles modern microbial life? As a first step toward answering these questions, here we have sought to discover primordial-like, multifunctional enzymes from extant bacteria. The defining characteristic of these enzymes is that a single active site is responsible for catalyzing two or more reactions, each of which is required for the growth of the organism. Further, the evolutionary model of Yčas and Jensen predicts that the multiple activities of a primordial-like enzyme are likely to be carried out by separate, specialized enzymes in most modern-day organisms.
Only a small number of primordial-like enzymes have been reported to date. Perhaps the best characterized is PriA, a bi-functional isomerase originally found in Mycobacterium tuberculosis and Streptomyces coelicolor, which catalyzes the reactions carried out by two separate enzymes (HisA and TrpF) in most bacteria (Barona-Gómez and Hodgson, 2003; Due et al., 2011). An example from archaeal and deep-branching bacterial lineages is the bi-functional fructose 1,6-bisphosphate aldolase/phosphatase, which remodels its active site to catalyze two consecutive steps in gluconeogenesis (Say and Fuchs, 2010; Du et al., 2011). The TrpF enzyme from Chlamydia trachomatis has also been shown to play a second physiological role in folate biosynthesis (Adams et al., 2014).
Recently, we showed that the cystathionine β-lyase (CBL) from Escherichia coli has promiscuous alanine racemase (ALR) activity (Soo et al., 2016). CBL is encoded by the metC gene and catalyzes the penultimate step in methionine biosynthesis, in which β-elimination of cystathionine yields homocysteine (Fig. 1A). ALR catalyzes the interconversion of L-alanine and D-alanine (Fig. 1B), with the latter being required for peptidoglycan biosynthesis. Although both enzymes use the cofactor pyridoxal 5′-phosphate (PLP), they share no sequence or structural similarity. Of the seven protein folds known to bind PLP (Raboni et al., 2010), CBL adopts fold type I, whereas ALR adopts fold type III (Fig. 1). A parsimonious explanation of our previous results was that modern CBL enzymes are descended from a bi-functional CBL/ALR ancestor, but that the alanine racemase function has been rendered vestigial in lineages (such as the one leading to E. coli) that gained a non-homologous ALR specialist. Here, we identified several bacteria with reduced genomes in which metC was present but alr was absent. We have characterized the CBL enzymes from three of these species: Pelagibacter ubique, the Wolbachia endosymbiont of Drosophila melanogaster and Thermotoga maritima. In doing so, we have also shed new light on the pathways of methionine biosynthesis in these organisms, particularly T. maritima. Together, our results suggest that free-living bacteria with reduced genomes are the best models for studying primordial enzymology and metabolism.

Results
The search for multifunctional CBL/ALR enzymes
We began by searching for bacterial species that require peptidoglycan and possess a metC gene, but lack an alr gene. We reasoned that these criteria would maximize the likelihood of identifying a metC-encoded enzyme acting either as the physiological alanine racemase, or as both CBL and ALR in vivo.
The annotated lists of proteins (feature tables) for each bacterial genome in the NCBI database were downloaded and parsed for the presence of alr and metC homologues, based on annotations in the Clusters of Orthologous Groups (COG) database (Galperin et al., 2015). While we were particularly interested in metC genes, it is difficult to distinguish metC from metB (encoding cystathionine γ-synthase, CGS) by sequence alone (Ferla and Patrick, 2014); as a result, these two genes are grouped into a single COG. At the time of our survey (August 2010), there were 1023 fully sequenced and annotated genomes to analyze. The presence or absence of alr (COG0787) and metC/metB (COG0626) in each genome was visualized on a taxonomy-based tree, in which the branches of taxa sharing the same pattern were collapsed (Fig. 2). Only six taxa (colored yellow in Fig. 2) met our criteria of possessing metC/metB but not alr. Of these, the Planctomycetes were thought to lack peptidoglycan (Fuerst and Sagulenko, 2011), although this view has recently been revised (van Teeseling et al., 2015). Thermomicrobium roseum possesses a thin peptidoglycan layer with unusual features (Wu et al., 2009b), and the peptidoglycan status of the endosymbiotic gammaproteobacteria that lack alr but contain metC/metB is unclear.
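The presence/absence screen described above amounts to a simple classification of each genome's COG annotations. The Python sketch below is illustrative only: it assumes each genome has already been reduced to a set of COG identifiers, and the function names (`classify`, `candidates`, `collapse_taxon`) and the "mixed" label are hypothetical, not part of the original analysis. The color labels follow the Fig. 2 legend.

```python
"""Hypothetical sketch of the alr / metC-metB presence-absence screen.

Assumes each genome has already been parsed into a set of COG IDs; this input
format and all function names are illustrative, not the original pipeline.
"""

ALR_COG = "COG0787"        # alanine racemase (alr)
METC_METB_COG = "COG0626"  # cystathionine beta-lyase / gamma-synthase (metC/metB)

# (has metC/metB, has alr) -> color used in the Fig. 2 legend
COLORS = {
    (True, True): "green",
    (False, True): "blue",
    (True, False): "yellow",  # candidate hosts of a CBL/ALR enzyme
    (False, False): "red",
}


def classify(cogs: set[str]) -> str:
    """Return the Fig. 2 color for one genome's COG annotations."""
    return COLORS[(METC_METB_COG in cogs, ALR_COG in cogs)]


def candidates(genomes: dict[str, set[str]]) -> list[str]:
    """Names of genomes that have metC/metB but lack alr."""
    return sorted(name for name, cogs in genomes.items()
                  if classify(cogs) == "yellow")


def collapse_taxon(colors: list[str]) -> str:
    """Color for a collapsed clade of genomes.

    A uniform clade keeps its color; a clade in which every genome has alr
    but metC/metB is mixed is grey. Any other mixture is labelled "mixed",
    an assumption of this sketch that does not appear in Fig. 2.
    """
    unique = set(colors)
    if not unique:
        raise ValueError("a taxon needs at least one genome")
    if len(unique) == 1:
        return unique.pop()
    if unique <= {"green", "blue"}:
        return "grey"
    return "mixed"


if __name__ == "__main__":
    example = {
        "Pelagibacter ubique": {"COG0626"},
        "Escherichia coli": {"COG0626", "COG0787"},
    }
    print(candidates(example))  # ['Pelagibacter ubique']
```

Keeping the genome-level call (`classify`) separate from the clade-level call (`collapse_taxon`) mirrors the two steps in the text: annotate each genome, then collapse taxa that share a pattern.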
We narrowed our search to the three remaining taxa: the alphaproteobacterial orders Pelagibacterales and Rickettsiales (sensu Ferla et al., 2013), and the genus Thermotoga. Members of all three taxa have reduced genomes. The fine structure of the Thermotoga maritima peptidoglycan has been determined biochemically, and it is known to contain D-alanine (Boniface et al., 2009). A well-characterized member of the Pelagibacterales is Pelagibacter ubique, the genome sequence of which confirms that it synthesizes peptidoglycan (Giovannoni et al., 2005). Within the Rickettsiales, we chose to focus on the Wolbachia endosymbiont of D. melanogaster (wMel). Members of the genus Wolbachia are intracellular parasites of arthropods and nematodes. While they do not require a stress-bearing cell wall, it has recently been shown that they require the Lipid II component of peptidoglycan (containing D-alanine) for cell division (Vollmer et al., 2013). Most interestingly, the same authors also showed that the Wolbachia endosymbiont of Brugia malayi possesses a CBL that catalyzes alanine racemization, although no kinetic parameters were reported for this enzyme.
Fig. 2 (caption, partially recovered). The tree is a taxonomic tree based on NCBI annotation, and the clades are colored based on the presence or absence of metC/metB (COG0626) and alr (COG0787). Taxa in green have metC/metB and alr. Taxa in blue lack metC/metB, but have alr. Taxa in yellow lack alr, but have metC/metB. Taxa in red lack both genes. Taxa with alr, but with mixed presence/absence of metC/metB, are in grey. Taxa with metC/metB but without alr (yellow) are also labelled in larger type.

Cloning, expression and purification
The metC genes from P. ubique HTCC1062 (locus tag SAR11_RS04165) and wMel (locus tag WD_RS04170; Wu et al., 2004) were synthesized with their codons optimized for heterologous expression in E. coli. The corresponding gene in T. maritima MSB8 (locus tag TMARI_RS06470) is variously annotated as either metB (Latif et al., 2013) or metC (Pysz et al., 2004), so we amplified TMARI_RS06470 from genomic DNA and investigated both possible functions.
To begin, all three genes were cloned into the pBAD expression vector. The His6-tagged T. maritima enzyme (TmCBL) and Wolbachia enzyme (wMelCBL) were produced in soluble form, with yields routinely exceeding 20 mg of purified protein per liter of culture medium. However, numerous attempts to optimize expression of the P. ubique enzyme (PuCBL) in a soluble form were unsuccessful. Eventually, we sub-cloned P. ubique metC into the pMAL vector, for expression fused to maltose binding protein (MBP). This greatly improved solubility and resulted in yields of >10 mg of purified fusion protein per liter of culture. However, proteolytic cleavage of the MBP fusion partner from PuCBL immediately led to its precipitation in every buffer system that we tested. Therefore, we conducted all functional tests on the MBP-PuCBL fusion protein.

Complementation tests
In vivo complementation assays were performed as an initial test for the CBL and ALR activities of the three metC gene products. The expression vectors for TmCBL, wMelCBL and MBP-PuCBL were each used to transform the E. coli ΔmetC strain from the Keio collection (Baba et al., 2006), and also the D-alanine auxotroph E. coli MB2795 (Δalr ΔdadX) (Soo et al., 2016).
Given that TmCBL had been annotated as a cystathionine γ-synthase (CGS), we also tested each enzyme for its ability to complement the methionine auxotrophy of E. coli ΔmetB. Complementation tests were carried out at 28°C and 37°C, as the growth temperatures of wMel and P. ubique are below 30°C and we hypothesized that their enzymes may be thermolabile. However, we observed no differences in the rates of colony formation at the lower temperature.
At 37°C, expression of each enzyme rescued the alanine racemase knockout, E. coli MB2795, as quickly as expression of E. coli ALR itself (Table 1). None of the three enzymes, including TmCBL, was able to rescue E. coli ΔmetB. In contrast, all three enzymes complemented the methionine auxotrophy of E. coli ΔmetC, albeit slowly, taking 5-14 days to form colonies (Table 1). These data provided the first qualitative indication that T. maritima, P. ubique and wMel all possess metC genes that encode bi-functional CBL/ALR (but not CGS) enzymes.

Kinetic analysis of CBL and ALR activities
The three enzymes were purified and assayed for CBL and ALR activity (in the physiologically relevant L-Ala → D-Ala direction), as described previously for the promiscuous E. coli CBL enzyme (Soo et al., 2016). In the case of MBP-PuCBL, cystathionine β-elimination was readily detectable, with an overall catalytic efficiency (kcat/KM) of 470 s⁻¹ M⁻¹ (Table 2). This may underestimate the true activity, as the effect of the MBP fusion partner on the activity of the enzyme is unknown (although it is unlikely to be rate-enhancing). The ALR activity of MBP-PuCBL was 40-fold lower than its CBL activity, reflecting both a lower turnover number (kcat) and a higher Michaelis constant (KM, i.e., the substrate concentration at which the reaction rate is half its maximum) for the ALR reaction (Table 2). The wMelCBL enzyme showed the opposite pattern of activities to MBP-PuCBL. Consistent with the auxotroph complementation data (Table 1), its cystathionine β-elimination activity was barely detectable above the level of background noise. While extensive controls demonstrated that the activity was present, our estimates of the kcat and KM values for three batches of enzyme were consistently low, and somewhat variable (Table 2). In contrast, wMelCBL was more efficient as an alanine racemase (kcat/KM = 580 s⁻¹ M⁻¹), and this ALR activity was readily quantified.
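The two kinetic quantities used throughout this section are related by the Michaelis-Menten rate law, v = kcat[E]t[S]/(KM + [S]), whose low-substrate limit is v ≈ (kcat/KM)[E]t[S]. The Python sketch below (helper names hypothetical) shows how the catalytic efficiency follows from kcat and KM, using the MBP-PuCBL ALR values quoted later in the text (kcat = 0.15 s⁻¹, KM = 12 mM). KM must be converted to molar units for kcat/KM to come out in s⁻¹ M⁻¹; the 1 μM enzyme concentration is an arbitrary illustrative value.

```python
def mm_rate(kcat: float, km: float, s: float, e_total: float) -> float:
    """Michaelis-Menten initial rate (M s^-1); km, s, e_total in M, kcat in s^-1."""
    return kcat * e_total * s / (km + s)


def efficiency(kcat: float, km: float) -> float:
    """Catalytic efficiency kcat/KM in s^-1 M^-1 (kcat in s^-1, km in M)."""
    return kcat / km


# MBP-PuCBL, ALR reaction (values quoted in the text)
kcat_alr = 0.15  # s^-1
km_alr = 12e-3   # 12 mM expressed in M

eff_alr = efficiency(kcat_alr, km_alr)
print(round(eff_alr, 1))     # 12.5 (reported as 12 s^-1 M^-1)
print(round(470 / eff_alr))  # 38, i.e. roughly 40-fold below the CBL activity

# At [S] = KM the rate is half of kcat*[E]t, which is the definition of KM.
e_t = 1e-6  # hypothetical 1 uM enzyme
print(round(mm_rate(kcat_alr, km_alr, km_alr, e_t) / (kcat_alr * e_t), 6))  # 0.5
```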
Although TmCBL is from a thermophile, we began by assaying it under the same conditions as the other two enzymes (i.e., 37°C, Tris·HCl buffer, pH 8.8). Under these conditions, it was the most active CBL, with a higher kcat and a lower KM than MBP-PuCBL, and a catalytic efficiency of kcat/KM = 5800 s⁻¹ M⁻¹ (Table 2). Conversely, its ALR activity (kcat/KM = 5.2 s⁻¹ M⁻¹) was even weaker than that observed for MBP-PuCBL (Table 2).
Next, we sought to test the multifunctionality of TmCBL at a more physiologically relevant temperature. Because the pH of Tris-buffered solutions changes markedly with temperature, we switched to Bicine buffer (ΔpH = −0.018 per degree Celsius) and measured CBL specific activity over the temperature range 35-85°C, in solutions that were pH 8.0 at each temperature. Under these conditions, TmCBL had a temperature optimum of 70°C (Fig. 3). To allow robust comparisons between the two buffer conditions, we determined the steady-state kinetic constants in Bicine, pH 8.0, at both 37°C and 70°C (Table 2). At 37°C, TmCBL was marginally less active as a CBL in Bicine than in Tris. As expected, the enzyme was more active at 70°C, although a threefold increase in kcat was offset by a small increase in KM, so that the overall increase in kcat/KM was only 1.6-fold (Table 2). We also implemented a discontinuous assay that allowed us to estimate ALR activity in Bicine-buffered solutions at 37°C and 70°C. As before, this activity was weak, although a small decrease in kcat and a larger decrease in KM meant that catalytic efficiency increased eightfold, from 0.79 s⁻¹ M⁻¹ at 37°C to 6.3 s⁻¹ M⁻¹ at 70°C (Table 2).
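Preparing a buffer that is pH 8.0 at each assay temperature requires correcting for the temperature coefficient of the buffer. The Python sketch below assumes a linear dependence, pH(T) = pH(Tref) + (dpH/dT)(T − Tref), with the Bicine coefficient quoted above. The 25°C bench temperature, the function names and the Tris coefficient of −0.028 per °C (a commonly quoted approximate value, used here only for comparison) are all assumptions.

```python
BICINE_DPH_DT = -0.018  # pH units per degree Celsius (value quoted in the text)
TRIS_DPH_DT = -0.028    # approximate literature value; assumption for comparison


def ph_at(t: float, ph_ref: float, t_ref: float, dph_dt: float) -> float:
    """Linear estimate of buffer pH at temperature t (degrees Celsius)."""
    return ph_ref + dph_dt * (t - t_ref)


def ph_to_set(ph_target: float, t_target: float, t_ref: float, dph_dt: float) -> float:
    """pH to adjust to at t_ref so that the buffer reaches ph_target at t_target."""
    return ph_target - dph_dt * (t_target - t_ref)


BENCH_T = 25.0  # hypothetical temperature at which the pH is adjusted

for t in (37, 70, 85):
    print(t, round(ph_to_set(8.0, t, BENCH_T, BICINE_DPH_DT), 2))
# 37 8.22
# 70 8.81
# 85 9.08

# Drift of a Tris buffer set to pH 8.8 at 37 C when heated to 70 C:
print(round(ph_at(70, 8.8, 37, TRIS_DPH_DT), 2))  # 7.88
```

Under these assumptions a Tris buffer would drift by nearly one pH unit across the assay range, while the smaller Bicine coefficient keeps the required corrections modest.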

An unexpected third enzymatic activity
During our bioinformatic analyses, we noticed that murI, encoding glutamate racemase, was absent from the genomes of P. ubique, wMel and T. maritima. D-Glutamate is an essential component of peptidoglycan (Schleifer and Kandler, 1972; Vollmer et al., 2008). However, all characterized glutamate racemase (GLR) enzymes use two active-site cysteine residues, and not the PLP cofactor, to effect catalysis (Tanner, 2002). Canonical GLR enzymes are unrelated in sequence, structure or mechanism to either CBL or ALR. Nevertheless, because our bioinformatic analysis indicated that glutamate racemization must be catalyzed by an alternative enzyme, we tested the three CBL enzymes for GLR activity as well. MBP-PuCBL did not possess detectable GLR activity, but the other two enzymes did. Control reactions showed that E. coli CBL also has no GLR activity. To the best of our knowledge, wMelCBL and TmCBL are the first PLP-dependent enzymes shown to racemize glutamate. The GLR activity of wMelCBL was very weak: approximately threefold less efficient than cystathionine β-elimination, and 28-fold less efficient than alanine racemization (Table 2). Conversely, TmCBL was found to be a substantially better GLR than ALR (Table 2). As observed for its CBL activity, both kcat and KM for the GLR reaction increased with temperature, such that these parameters are comparable for the CBL and GLR reactions at 70°C, and the overall difference in kcat/KM between these two activities is only 11-fold at this temperature.
Fig. 3 (caption, partially recovered). … of TmCBL. Bicine-buffered solutions were prepared such that they were pH 8.0 at each temperature. TmCBL activity was measured with a single concentration of the substrate, cystathionine (1.5 mM). The data plotted are mean ± standard error for technical triplicates.

Methionine biosynthesis in T. maritima
We started this work with a simple evolutionary and biochemical hypothesis: that some organisms may use their CBLs as bi-functional CBL/ALR enzymes. The discovery that TmCBL has very weak ALR activity but much stronger GLR activity, together with the confusion over the gene encoding it (annotated as either metB or metC), led us to examine the likely physiological role of this enzyme in more detail.
In the standard trans-sulfurylation pathway for methionine biosynthesis, as used by E. coli, cystathionine γ-synthase (encoded by metB) produces cystathionine, which is then cleaved by CBL to produce homocysteine (Ferla and Patrick, 2014). Our complementation tests suggested that TmCBL is indeed a metC-encoded CBL, and not a metB-encoded CGS. Nevertheless, we set out to determine biochemically whether TmCBL possesses CGS activity in vitro.
A complication arose because the activated substrate of E. coli CGS is O-succinyl-L-homoserine (produced from succinyl-CoA by the protein product of its metA gene), whereas in most bacteria (Ferla and Patrick, 2014), including T. maritima (Goudarzi and Born, 2006), metA encodes an enzyme that produces O-acetyl-L-homoserine from acetyl-CoA. The T. maritima enzyme has 30-fold greater activity with acetyl-CoA than with succinyl-CoA (Goudarzi and Born, 2006), so O-acetyl-L-homoserine would be the preferred substrate for any T. maritima CGS enzyme. However, O-acetyl-L-homoserine is not commercially available, so we synthesized it according to a previous scheme (Nagai and Flavin, 1971). We note that we are the first to report a complete set of analytical chemical data for the compound synthesized by this route; these data are included in the relevant Methods sub-section (vide infra).
TmCBL was incubated with either O-acetylhomoserine or O-succinylhomoserine, plus L-cysteine (the second substrate used by CGS enzymes), and the reaction products were analyzed for the presence of cystathionine by mass spectrometry. None was detected (Fig. 4A and B), even after a greatly extended incubation period of 16 h. We determined that the lower limit of detection for a cystathionine standard in our mass spectrometry protocol was 0.5 nmol (Supporting Information Fig. S1). It follows that, had CGS activity been present, we would have detected it if TmCBL were catalyzing cystathionine formation at any rate greater than 1.7 × 10⁻⁴ turnovers per active site per second (≈0.6 turnovers per hour). This upper bound on the putative CGS activity of TmCBL is >50 000-fold lower than the turnover number associated with CBL activity under the same conditions (kcat = 9.2 s⁻¹; Table 2). Indeed, the fact that TmCBL did not produce any detectable cystathionine (using either O-acetylhomoserine or O-succinylhomoserine as a substrate) makes it extremely unlikely that the enzyme has any CGS activity at all, and rules out a physiologically relevant role for the enzyme in cystathionine biosynthesis.
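The detection-limit argument can be written out as a small calculation: if less than the limit of detection (LOD) of product accumulated, the turnover rate per active site is bounded by LOD/(active sites × incubation time). The Python sketch below assumes linear product accumulation over the 16 h incubation. The quantity of enzyme used in the assay is not stated in this section, so `enzyme_nmol` is a placeholder and the 0.1 nmol value is purely hypothetical; the last two lines check the unit conversion and fold comparison quoted above.

```python
SECONDS_PER_HOUR = 3600


def max_undetected_turnover(lod_nmol: float, enzyme_nmol: float, hours: float) -> float:
    """Largest turnover rate (s^-1 per active site) that could still yield
    less than lod_nmol of product, assuming linear accumulation over `hours`."""
    return lod_nmol / (enzyme_nmol * hours * SECONDS_PER_HOUR)


# General use: LOD = 0.5 nmol, 16 h incubation; 0.1 nmol enzyme is hypothetical.
example_bound = max_undetected_turnover(0.5, enzyme_nmol=0.1, hours=16)
print(f"{example_bound:.2e}")  # 8.68e-05

# Checks on the figures quoted in the text.
reported_bound = 1.7e-4  # turnovers per active site per second
print(round(reported_bound * SECONDS_PER_HOUR, 2))  # 0.61 turnovers per hour
print(round(9.2 / reported_bound))                  # 54118, i.e. >50 000-fold below kcat (CBL)
```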
Genome analysis suggests that T. maritima possesses a metY gene to compensate for the absence of metB. The enzyme encoded by metY, O-acetylhomoserine thiolase, usually catalyzes the direct production of homocysteine from O-acetylhomoserine and hydrogen sulfide (H2S), thus bypassing CGS and CBL completely (Ferla and Patrick, 2014). We expressed and purified the T. maritima O-acetylhomoserine thiolase to test whether it possessed the expected activity. We also tested whether it might catalyze the closely related reaction, observed in Bacillus subtilis (Auger et al., 2002), in which cystathionine is produced from O-acetylhomoserine and L-cysteine. Using mass spectrometry, we observed the enzyme-catalyzed formation of homocysteine from O-acetylhomoserine and H2S. However, we found no evidence for the formation of cystathionine from O-acetylhomoserine and L-cysteine (Fig. 4C and D).
The absence of metB, together with the inability of O-acetylhomoserine thiolase to use cysteine as a substrate, implies that cystathionine is never synthesized by T. maritima. In the absence of this substrate, there appears to be no physiological role for the CBL activity of TmCBL. Instead, methionine biosynthesis occurs via the one-step conversion of O-acetylhomoserine and H2S into homocysteine, catalyzed by O-acetylhomoserine thiolase.

Is TmCBL the physiological ALR?
Our results implied that metC is being maintained in the genome of T. maritima because of its role in peptidoglycan biosynthesis, not methionine biosynthesis. However, the weak alanine racemase activity of TmCBL (Table 2) led us to question whether such an inefficient enzyme was a reasonable candidate for performing such a critical physiological task. Specialist alanine racemases typically have kcat/KM values greater than 10⁴ s⁻¹ M⁻¹ (Patrick et al., 2002; Soo et al., 2016). Even the non-physiological, promiscuous ALR activity of E. coli CBL (kcat/KM = 65 s⁻¹ M⁻¹; Soo et al., 2016) is an order of magnitude greater than the ALR activity of TmCBL. We therefore returned to bioinformatics to identify alternative alanine racemase candidates.
Potential alanine racemase genes were identified by BLASTP searches of known racemases against the fully sequenced T. maritima genome (Latif et al., 2013). D-Alanine can also be formed from D-glutamate by the action of a D-amino acid transaminase, so we also searched for homologues of this enzyme. These searches identified yggS (locus tag TMARI_RS08860, a distant homologue of alanine racemase), ilvA (TMARI_RS01820, a threonine dehydratase that is a homologue of human serine racemase), dapF (TMARI_RS07765, a diaminopimelate epimerase that interconverts L,L- and meso-diaminopimelate), ilvE (TMARI_RS04255, a branched-chain amino acid aminotransferase that is a homologue of D-amino acid transaminases) and TMARI_RS02240 (encoding a homologue of the larA lactate racemase). Interestingly, we also discovered that a gene previously annotated as encoding a hypothetical protein, with the original locus identifier TM1597 (Nelson et al., 1999), had been re-annotated as an alanine racemase, locus tag TMARI_RS08180, in more recent genome assemblies (Latif et al., 2013).
Expression vectors for these six candidates were used to transform E. coli strain MAD2 (Δalr ΔdadX ΔmetC), which we found to be much less prone to reversion events than E. coli MB2795. Transformants were cultured on LB medium (containing methionine but not D-alanine) to test each candidate enzyme for its ability to catalyze the formation of D-alanine. None was able to complement the D-alanine auxotrophy of E. coli MAD2. Finally, we also purified the protein encoded by TMARI_RS08180. Despite its annotation as an alanine racemase, the purified protein was colorless (indicating the absence of bound PLP cofactor), and it had no detectable ALR activity at either 37°C or 70°C. Thus, TmCBL remains the most likely candidate for providing T. maritima with the D-alanine it requires for peptidoglycan biosynthesis.

Evolutionary history of CBL
Based on our previous knowledge of the promiscuous ALR activity of E. coli CBL (Soo et al., 2016), we started this study with the hypothesis that modern CBL enzymes are descended from a bi-functional CBL/ALR ancestor (PLP-dependent enzyme fold type I), but that the alanine racemase function has been replaced in most lineages by an alternative, non-homologous ALR specialist (PLP-dependent enzyme fold type III). However, our combination of biochemistry and bioinformatics provided evidence that CBL activity is not required for methionine biosynthesis in P. ubique, wMel or T. maritima. Instead, the physiological roles of wMelCBL and TmCBL appear to be as primordial-like, broad-specificity amino acid racemases.
To investigate the evolutionary history of CBL enzymes further, we inferred a phylogenetic tree of representative CBL sequences and compared it with a tree made with concatenated 16S and 23S rRNA sequences from the same species (Supporting Information Fig. S2). Simplified cladograms are shown in Fig. 5. The species and CBL trees are not congruent; instead, they are consistent with frequent loss of metC and its gain by horizontal transfer. For example, the Thermotogae CBL sequences cluster with those from the Bacteroidetes species Pontibacter roseus (Fig. 5B), but are only distantly related to sequences from Deinococcus-Thermus, a phylum that is more closely related to the Thermotogae (Fig. 5A; Hug et al., 2016). The CBL sequences from the Alphaproteobacteria are also found in different clades (Fig. 5B). Although rRNA sequences place P. ubique and wMel together in the subclass Rickettsidae (Fig. 5A; Ferla et al., 2013), wMelCBL shares a more recent common ancestor with E. coli CBL than with PuCBL (Fig. 5B). In contrast, a tree of alphaproteobacterial ALR sequences shows them to be vertically transmitted (Supporting Information Fig. S3). Thus, a parsimonious explanation is that gain of a multifunctional CBL by horizontal transfer led to the subsequent displacement of ALR in the lineages leading to P. ubique, wMel and T. maritima.

Primordial-like enzymes in non-canonical pathways
The goal of this study was to identify and characterize primordial-like enzymes, to shed light on primordial metabolism and processes of enzyme evolution. We discovered that the extant bacteria P. ubique, T. maritima and the Wolbachia endosymbiont of D. melanogaster have multifunctional CBL enzymes. However, our work has also highlighted the difficulties associated with assigning physiological functions to enzymes from non-model microorganisms with non-canonical metabolic pathways, in which gene knockouts are technically infeasible.
The simplest situation arises in wMel. This obligately intracellular, parasitic bacterium has a heavily reduced genome of only 1.27 Mbp, which encodes 1270 proteins (Wu et al., 2004). The only met gene in its genome is metC, demonstrating that it is a methionine auxotroph. Our kinetic data (Table 2) show that metC has been gained and retained because of the alanine racemase and glutamate racemase activities of the enzyme it encodes, wMelCBL. Indeed, the vestigial CBL activity of wMelCBL has eroded to the point where it is now 10-fold weaker than the ALR activity. While the enzyme retains a Michaelis constant for cystathionine (KM ≈ 20 μM; Table 2) that is comparable to that of E. coli CBL for the same substrate (KM = 39 μM; Soo et al., 2016), its ability to turn over the substrate has almost entirely disappeared (kcat ≈ 4 h⁻¹). Conversely, wMelCBL readily turns over L-alanine (kcat = 2.3 s⁻¹), and the enzyme appears sufficiently active to provide the cell with the small amount of D-alanine it requires during cell division (Vollmer et al., 2013). The kinetic parameters of wMelCBL for the GLR reaction are poor, particularly kcat, which is 135-fold lower than for the ALR reaction (Table 2). This lower turnover number is likely to be offset somewhat by the relative intracellular abundance of the substrate L-glutamate compared with L-alanine: in E. coli, the former is present at a 40-fold higher concentration than the latter (Bennett et al., 2009). Thus, it is highly likely that CBL is actually a primordial-like, bi-functional ALR/GLR enzyme in wMel, required for catalyzing two steps in the synthesis of the Lipid II component of peptidoglycan. Presumably the ancestor of wMelCBL was already bi-functional when it was gained via horizontal transfer, such that it released alr and murI from selection and led to their loss.
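The argument that substrate abundance partly compensates for the poor GLR kinetics of wMelCBL can be made semi-quantitative. If L-alanine and L-glutamate are both well below their respective KM values, each rate is approximately (kcat/KM)[E][S]; because the same enzyme catalyzes both reactions, [E] cancels, and the ratio of glutamate to alanine racemization flux is the efficiency ratio multiplied by the substrate concentration ratio. The Python sketch below applies this to the figures in the text (GLR 28-fold less efficient than ALR; L-glutamate about 40-fold more abundant than L-alanine in E. coli). Both the low-substrate approximation and the use of E. coli metabolite levels as a proxy for wMel are assumptions; because the GLR turnover number is 135-fold lower than that for ALR, the estimate would overstate the glutamate flux if L-glutamate approached saturating concentrations.

```python
def flux_ratio(efficiency_ratio: float, substrate_ratio: float) -> float:
    """Ratio of two first-order rates v = (kcat/KM) * [E] * [S].

    Valid only when [S] << KM for both reactions; [E] cancels because the
    same enzyme catalyzes both reactions.
    """
    return efficiency_ratio * substrate_ratio


glr_over_alr_efficiency = 1 / 28  # GLR is 28-fold less efficient than ALR
glu_over_ala_conc = 40            # L-Glu ~40-fold more abundant than L-Ala (E. coli)

print(round(flux_ratio(glr_over_alr_efficiency, glu_over_ala_conc), 2))  # 1.43
```

Under this crude approximation the two racemization fluxes would be of similar magnitude, consistent with the enzyme serving both roles.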
Our attempts to characterize PuCBL were hindered by our inability to purify the enzyme without a maltose binding protein fusion partner. Nevertheless, the MBP-PuCBL fusion was active as a CBL and an ALR (but not as a GLR). As in wMel, the CBL activity of PuCBL is likely to be vestigial. P. ubique has the smallest genome known for any free-living organism: its 1.31 Mbp chromosome encodes 1354 open reading frames (Giovannoni et al., 2005). A defining characteristic of this oligotrophic bacterium is its requirement for a reduced source of sulfur (either methionine or 3-dimethylsulfoniopropionate) for growth (Tripp et al., 2008; Carini et al., 2013). When exogenous methionine is not available, it uses enzymes that degrade 3-dimethylsulfoniopropionate to methanethiol (Reisch et al., 2011). P. ubique has an O-acetylhomoserine thiolase (encoded by metY), and these enzymes are known to catalyze the conversion of methanethiol plus O-acetylhomoserine directly into methionine (Ferla and Patrick, 2014). Therefore, P. ubique has no requirement for a CBL in methionine biosynthesis, and it is unsurprising that other unnecessary met genes, including metB and the methionine synthases (metE and metH), have been lost from its streamlined genome.
The implication is that P. ubique now retains CBL as its specialist alanine racemase. On one hand, the poor kinetic parameters of MBP-PuCBL for the ALR reaction (kcat/KM = 12 s⁻¹ M⁻¹) make this explanation appear unlikely. On the other hand, its Michaelis constant (KM = 12 mM) is comparable to those reported for the fold type III alanine racemases from species such as Pseudomonas fluorescens, Bacillus psychrosaccharolyticus and Shigella dysenteriae (Yokoigawa et al., 1993; Okubo et al., 1999; Yokoigawa et al., 2001). The relatively low turnover number of MBP-PuCBL (kcat = 0.15 s⁻¹) may in part reflect the fusion to MBP. Further, these kinetics may not be maladaptive given the slow growth rate (and, therefore, low peptidoglycan requirement) of P. ubique: even under optimized laboratory conditions, this bacterium completes less than one division per day (Tripp et al., 2008; Carini et al., 2013). In the absence of any more convincing candidates, we propose that the physiological role of PuCBL is to act as an alanine racemase. Whether this is its only physiological role remains an open question. PuCBL may catalyze other (as yet undiscovered) reactions, especially as fold type I PLP-dependent enzymes are known to catalyze transamination, β-replacement, γ-elimination, decarboxylation and side-chain cleavage reactions (Raboni et al., 2010), in addition to β-elimination and racemization.
The third CBL we have characterized is from the obligately anaerobic thermophile, T. maritima. Like wMel and P. ubique, this bacterium has a small genome (1.87 Mbp; 1872 protein-encoding genes; Latif et al., 2013). TmCBL retains CBL activity, but in spite of its genomic annotation as metB (Latif et al., 2013) it is not active as a CGS (Fig. 4A and B). T. maritima also has a metY gene, which we have shown encodes an active O-acetylhomoserine thiolase (Fig. 4C), enabling CBL to be bypassed in methionine biosynthesis.
Like wMel, T. maritima appears to have gained a CBL and retained it as a bi-functional alanine/glutamate racemase. Our efforts to identify alternative alanine racemases were unsuccessful, although it remains possible that some other T. maritima protein possesses this activity (albeit while lacking detectable sequence similarity to any previously described candidate). At the nonphysiological temperature of 37 °C, the three activities of TmCBL are present in the ratio 5900:120:1 (CBL:GLR:ALR; ratio of kcat/KM values in Bicine buffer in Table 2). At the physiological temperature of 70 °C, this ratio changes to 1200:110:1; that is, the enzyme is proportionately worse as a CBL, while the ratio of GLR activity to ALR activity remains essentially unchanged. In general, the kcat and KM of enzymes both increase with temperature (Somero, 1995), and this is observed for the CBL and GLR activities of TmCBL (Table 2). Unusually, kcat and KM for the alanine racemase reaction both decrease with increasing temperature, and the 22-fold decrease in KM gives TmCBL an overall catalytic efficiency (kcat/KM = 6.3 s⁻¹ M⁻¹) that is comparable to that of MBP-PuCBL. This dramatic change in KM is largely responsible for the shift in the ratio of the three activities, and suggests a structural rearrangement that alters the ability of the enzyme to discriminate between its substrates. Structural biology and protein dynamics experiments will be required to explore this hypothesis further.
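The temperature-dependent shift in the activity ratios can be made explicit by comparing each normalized ratio at 37 °C and 70 °C. The Python sketch below is illustrative; it uses only the normalized ratios quoted above (not the underlying Table 2 parameters), and the helper name `fold_decrease` is hypothetical. It shows that CBL efficiency relative to ALR falls roughly five-fold, whereas GLR efficiency relative to ALR changes by less than 10%.

```python
# kcat/KM ratios for TmCBL in Bicine buffer, normalized to ALR = 1 (quoted in the text)
RATIOS_37C = {"CBL": 5900, "GLR": 120, "ALR": 1}
RATIOS_70C = {"CBL": 1200, "GLR": 110, "ALR": 1}


def fold_decrease(before: dict, after: dict, activity: str) -> float:
    """Fold decrease in an activity's efficiency relative to ALR, from `before` to `after`."""
    return (before[activity] / before["ALR"]) / (after[activity] / after["ALR"])


for activity in ("CBL", "GLR"):
    print(activity, round(fold_decrease(RATIOS_37C, RATIOS_70C, activity), 2))
# CBL 4.92
# GLR 1.09
```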

Active site similarities and differences
As expected, homology models indicate that PuCBL, wMelCBL and TmCBL are all fold type I enzymes, akin to the promiscuous E. coli CBL (Fig. 1A). The PLP cofactor is anchored in the active site of the E. coli enzyme by Lys210 (Clausen et al., 1996). The incoming substrate (either cystathionine or L-alanine) displaces Lys210 and forms a Schiff base with the cofactor. Lys210 then acts as a catalytic base, and in the alanine racemase reaction Tyr111 acts as a catalytic acid (Soo et al., 2016). These two catalytic amino acids, as well as the key active site residues Ser339 and Tyr56′ (the prime indicates a residue from the neighboring subunit), are conserved in PuCBL, wMelCBL and TmCBL.
Another important active site residue is Arg58′, which modulates the pKa of Tyr111 via a hydrogen bonding interaction (Lodha et al., 2010; Soo et al., 2016). Arg58′ makes no interaction with Pro113 in wild type E. coli CBL, but when Pro113 is mutated to serine, a new hydrogen bond with Arg58′ is introduced and alanine racemase activity is enhanced (Soo et al., 2016). The enzymes with poor ALR activity, PuCBL and TmCBL, both have an arginine in the position equivalent to Arg58′; the former has a proline equivalent to Pro113, whereas the latter has a tryptophan. In contrast, wMelCBL, the only enzyme that is more active as an ALR than as a CBL, has asparagine and arginine, respectively, in place of Arg58′ and Pro113. These observations suggest that the evolutionary route to enhancing ALR activity at the expense of CBL activity lies in optimizing the hydrogen bonding network around the catalytic acid, Tyr111.
Our discovery that wMelCBL and TmCBL both possess glutamate racemase activity was unexpected, as comprehensive surveys have failed to ascribe this activity to any PLP-dependent enzyme (Percudani and Peracchi, 2009;Raboni et al., 2010). The PLP-dependent racemization of L-glutamate should proceed via the same mechanism as L-alanine racemization (deprotonation by the catalytic lysine and reprotonation by the catalytic tyrosine). On one hand, it is not surprising that glutamate can be accommodated in the CBL active site, given that it is intermediate in size between cystathionine and alanine (Fig. 1). On the other hand, neither the E. coli CBL nor MBP-PuCBL possesses GLR activity. Experimentally determined structures (rather than homology models) will be required to better understand the source of GLR activity; to this end crystals of the wMelCBL and TmCBL proteins have recently been grown in our laboratory.

Biochemical evolution in bacteria with reduced genomes
Our bioinformatics-based search for primordial-like CBL enzymes was not biased toward species with any particular life history trait or genome architecture. Nevertheless, it led us to wMel, P. ubique and T. maritima, all of which have atypically small genomes (< 2 Mbp). For endosymbionts such as wMel, rapid genome reduction is due to their ability to obtain metabolites from the host and, critically, to population structures that result in strong genetic drift and weak selection (Wu et al., 2004; McCutcheon and Moran, 2011). Similarly, the population characteristics of endosymbionts (low effective population size, frequent bottlenecks during transmission and a lack of recombination) lead to rapid sequence evolution (McCutcheon and Moran, 2011). In the case of wMelCBL, this appears to have manifested as a particularly rapid erosion of its ancestral CBL activity (Table 2).
While there is clearly much more to be learned about the enzymology of endosymbionts, our primary goal is to understand the origins and evolution of metabolic networks. Logically, the first cells could not have been endosymbionts. Therefore, we are particularly interested in free-living bacteria with primordial-like enzymes, such as P. ubique and T. maritima. In these species, unlike endosymbionts, selection may have driven genome reduction to favor cell architectures that minimize the resources required for growth (Wolf and Koonin, 2013; Giovannoni et al., 2014). The evidence we have presented suggests that the genes for several specialized enzymes (e.g., ALR and GLR) can indeed be lost from these bacteria, provided that a physiologically multifunctional (rather than promiscuous) enzyme, such as TmCBL, is gained. We have also provided evidence that options that shorten metabolic pathways (e.g., O-acetylhomoserine thiolase instead of CGS and CBL) may be favored in free-living organisms with reduced genomes.
Primordial-like enzymes have certainly been characterized from bacteria with larger genomes. For example, the bi-functional isomerase PriA was initially identified in M. tuberculosis (with a genome of 4.41 Mbp) and in S. coelicolor, which has one of the largest bacterial genomes known, at 8.67 Mbp (Barona-Gómez and Hodgson, 2003). Nevertheless, genome reduction is a pervasive mode of evolution (Wolf and Koonin, 2013), and two other examples of primordial-like enzymes are from endosymbionts: TrpF from Chlamydia trachomatis (Adams et al., 2014), which has a 1.04 Mbp genome; and IlvC from Buchnera (Price and Wilson, 2014), which has a 416 kbp genome. While the data on primordial-like enzymes from free-living bacteria are still limited, it is tantalizing to speculate that selection is reducing the genomes of many species, while simultaneously returning their metabolic networks to a state that is comparable to the one envisaged by Yčas and Jensen for primordial cells (Yčas, 1974; Jensen, 1976).
Primordial-like enzymes are under selection for multifunctionality. In the case of our CBLs, this has yielded enzymes with KM values that are in the range of more specialized enzymes. In their landmark survey of the enzyme parameters compiled in the comprehensive BRENDA database (Chang et al., 2015), Bar-Even et al. (2011) found that ~60% of enzymes have KM values in the range 0.01-1 mM. None of the KM values we determined (Table 2) lie in the top 5% or bottom 5% of the Bar-Even dataset. Thus, our multifunctional CBLs show substrate recognition properties that are comparable to those of specialized enzymes. In contrast, the primordial-like CBL enzymes have poor kcat values; that is, they are slow at converting substrate into product. Of the 6,530 kcat values compiled by Bar-Even et al., only 73 (1.1%) are lower than the value we measured for TmCBL acting as an alanine racemase at its physiological temperature (kcat = 0.0024 s⁻¹; Table 2). Similarly, the turnover number of wMelCBL acting as a glutamate racemase is in the lowest 3% of those examined by Bar-Even et al.
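Placing a measured value within a compiled dataset is a rank calculation. The Python sketch below is illustrative: `fraction_below` is a hypothetical helper that would apply to a list of compiled kcat values, while the final lines use only the summary counts quoted above (the Bar-Even et al. dataset itself is not reproduced here).

```python
from bisect import bisect_left
from typing import Sequence


def fraction_below(value: float, dataset: Sequence[float]) -> float:
    """Fraction of entries in `dataset` that are strictly lower than `value`."""
    ordered = sorted(dataset)
    return bisect_left(ordered, value) / len(ordered)


# Summary counts quoted in the text (Bar-Even et al., 2011 compilation)
TOTAL_KCAT_VALUES = 6530
LOWER_THAN_TMCBL_ALR = 73  # kcat values below 0.0024 s^-1

share = LOWER_THAN_TMCBL_ALR / TOTAL_KCAT_VALUES
print(f"{share:.1%} of compiled kcat values are lower")
```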
The general trend for enzymes from organisms without reduced genomes (i.e., >99% of the enzymes in BRENDA) is for those in central metabolism to be more efficient than those in intermediate or secondary metabolism (Bar-Even et al., 2011). In spite of their roles in central metabolism, the primordial-like CBL enzymes have kinetic parameters that place them among the least efficient enzymes inhabiting any part of the metabolic network. Nevertheless, bacteria with reduced genomes are the most abundant organisms in the biosphere (Wolf and Koonin, 2013;Giovannoni et al., 2014). As we learn more about the enzymes from these organisms, it seems likely we will conclude that most of the enzymes on the planet are substantially less efficient catalysts than the subset we have been studying for the past century.

Concluding remarks
This study provides evidence that the weakly active, multifunctional enzymes encoded by reduced genomes are excellent (but previously underappreciated) comparative models for studying primordial enzymes. Further, we suggest that the patchwork metabolic networks within free-living bacteria such as P. ubique and T. maritima are ideal starting points for understanding the ancient origins of metabolic biochemistry. However, our data also emphasize the difficulty of assigning functions to many of the genes in these bacteria based solely on sequence similarity. This is particularly true of PLP-dependent enzymes, for which many different functions can be found on a single fold, with short mutational routes between these functions (Eliot and Kirsch, 2004; Raboni et al., 2010; Soo et al., 2016). For simplicity, we have persisted in calling the subjects of this study metC genes and CBL enzymes. However, we have presented evidence that these names are misnomers, in spite of the sequence evidence to the contrary. We propose the new function-based annotation aar (amino acid racemase) for the metC genes of wMel and T. maritima, and alrX (alanine racemase, distinct from the alr genes that encode fold type III enzymes) for P. ubique metC.
A recent editorial highlighted the extraordinary richness and diversity of the 'esoteric, niche enzymology' that is largely absent from textbooks (Tawfik and van der Donk, 2016). We have gone further, and suggested that esoteric enzymes, such as the poorly active, multifunctional ones we have characterized in this study, represent the rule, and not the exception, in the biosphere. The emerging interest in esoteric enzymology highlights the need for an 'Enzymatic Encyclopaedia of Bacteria and Archaea', by analogy with the Genomic Encyclopaedia of Bacteria and Archaea (GEBA) project (Wu et al., 2009a). Continuing the analogy with phylogenomics, we suggest that the term 'phyloenzymology' encapsulates our approach of identifying enzymes to characterize based solely on their phylogenetic novelty.

Materials
Oligonucleotides were from Integrated DNA Technologies (Coralville, IA). Restriction enzymes were from New England Biolabs (Ipswich, MA). Chemicals were from Sigma Chemical (St Louis, MO) unless noted otherwise.

Search for candidate species
A search was conducted to identify bacterial genomes without an alr homologue but with a metC homologue. A Perl script (available at https://github.com/matteoferla/Perly-scripts/tree/master/COG%20genome%20tool) first downloaded the protein tables of each fully sequenced bacterial genome from the NCBI website (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi) and then parsed them based on the presence of the COG0787 (alanine racemase) and COG0626 (cystathionine β-lyase/γ-synthase) annotations. The species present were placed on a taxonomic tree generated by the iTOL server (itol.embl.de; Letunic and Bork, 2011) and annotated with the presence or absence of the two genes. This allowed groups of species with the same presence/absence pattern of alr and metC/metB to be collapsed into higher taxa, simplifying the tree. The list of species without alr was validated with BLASTP and TBLASTN searches in the NCBI database (Sayers et al., 2009).
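The presence/absence filter described above can be sketched as follows. This is a minimal Python illustration, not the authors' Perl script; it assumes that each genome's protein table has already been reduced to a collection of COG identifiers, and the species names in the example are hypothetical placeholders.

```python
from typing import Dict, Iterable, List, Tuple

ALR_COG = "COG0787"   # alanine racemase
METC_COG = "COG0626"  # cystathionine beta-lyase/gamma-synthase


def presence_absence(genomes: Dict[str, Iterable[str]]) -> Dict[str, Tuple[bool, bool]]:
    """Map each species to (has alr homologue, has metC/metB homologue)."""
    pattern = {}
    for species, cogs in genomes.items():
        cog_set = set(cogs)
        pattern[species] = (ALR_COG in cog_set, METC_COG in cog_set)
    return pattern


def candidate_species(genomes: Dict[str, Iterable[str]]) -> List[str]:
    """Species lacking COG0787 (alanine racemase) but carrying COG0626."""
    return sorted(
        species
        for species, (has_alr, has_metc) in presence_absence(genomes).items()
        if has_metc and not has_alr
    )


# Hypothetical example input: species -> COG annotations in its protein table
genomes = {
    "Species A": ["COG0626", "COG0001"],
    "Species B": ["COG0626", "COG0787"],
    "Species C": ["COG0787"],
}
print(candidate_species(genomes))  # ['Species A']
```

The `presence_absence` map corresponds to the kind of annotation used to label and collapse the taxonomic tree; candidates would still need the BLASTP/TBLASTN validation step.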

Cloning of metC genes
The wMel metC gene was synthesized with codon optimization for expression in E. coli by GeneArt (Life Technologies), and modified to possess an N-terminal hexahistidine tag and a linker that contained a KpnI site. It was sub-cloned into pBAD/myc-His(B) (Invitrogen) with NcoI and HindIII. T. maritima metC was amplified from genomic DNA using the custom primers Tma_metC_KpnI_F (CAG GTA CCG AGA ACC TGT ATT TCC AAG GAA ACA CAG ACG ACA TTC TGT TTT CTT ACG G) and Tma_metC_XbaI_R (CTT GTT CTA GAT TAA ATC TTT TTG AGT GCC TGA TCC AGA TCT TC). The product was cloned into the vector above with KpnI and XbaI. P. ubique metC was synthesized by DNA2.0, with codon optimization. The gene was amplified with primers Pub_metC_AvaI_F (AAC AAC CTC GGG ATC GAG GGA AGG ATG ACC AAA TCC TTT AAA ACC TTT C) and Pub_metC_BamHI_R (GAA TTC GGA TCC TTA TTT GAT ATA TTT CAG GCT TTT TTT CAG) and cloned into pMAL-c5X (New England Biolabs) with AvaI and BamHI-HF.
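Because the cloning strategy depends on each primer carrying its intended restriction site, a quick sequence check is useful. The Python sketch below is illustrative and not part of the original workflow; it searches the four primers listed above for the standard recognition sequences of KpnI (GGTACC), XbaI (TCTAGA), AvaI (CYCGRG, where Y = C/T and R = A/G) and BamHI (GGATCC). All four sites are palindromic, so searching one strand is sufficient.

```python
import re

# Standard recognition sequences; IUPAC codes Y = C/T and R = A/G
SITES = {"KpnI": "GGTACC", "XbaI": "TCTAGA", "AvaI": "CYCGRG", "BamHI": "GGATCC"}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}

# Primer sequences as given in the text, with the intended enzyme
PRIMERS = {
    "Tma_metC_KpnI_F": ("KpnI", "CAG GTA CCG AGA ACC TGT ATT TCC AAG GAA ACA CAG ACG ACA TTC TGT TTT CTT ACG G"),
    "Tma_metC_XbaI_R": ("XbaI", "CTT GTT CTA GAT TAA ATC TTT TTG AGT GCC TGA TCC AGA TCT TC"),
    "Pub_metC_AvaI_F": ("AvaI", "AAC AAC CTC GGG ATC GAG GGA AGG ATG ACC AAA TCC TTT AAA ACC TTT C"),
    "Pub_metC_BamHI_R": ("BamHI", "GAA TTC GGA TCC TTA TTT GAT ATA TTT CAG GCT TTT TTT CAG"),
}


def site_pattern(site: str):
    """Translate an IUPAC recognition sequence into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in site))


def find_site(primer: str, enzyme: str):
    """Return the 0-based start of the first recognition site, or None if absent."""
    seq = primer.replace(" ", "").upper()
    match = site_pattern(SITES[enzyme]).search(seq)
    return match.start() if match else None


for name, (enzyme, primer) in PRIMERS.items():
    print(name, enzyme, find_site(primer, enzyme))
```

Each primer places its site a few bases from the 5' end, leaving a short overhang that helps the enzyme cut near the end of the PCR product.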

In vivo complementation
Complementation tests employed the E. coli ΔmetC and ΔmetB strains from the Keio collection (Baba et al., 2006), and the alanine racemase knockout strain E. coli MB2795 (Δalr ΔdadX) (Soo et al., 2016). In addition to the CBL expression vectors, an empty pBAD plasmid was used as a negative control. Positive controls for rescuing each strain were the plasmids pCA24N-metC, pCA24N-alr and pCA24N-metB from the ASKA collection of E. coli overexpression vectors (Kitagawa et al., 2005). Cells harboring the relevant plasmid were grown overnight in rich medium, pelleted, washed and resuspended in 1× M9 salts. For testing complementation of the E. coli ΔmetC and ΔmetB strains, ~10⁵ colony-forming units were spread on M9 agar plates containing 0.4% (w/v) glucose, the relevant antibiotic for maintaining the plasmid (100 µg ml⁻¹ ampicillin or carbenicillin for pBAD and pMAL, or 34 µg ml⁻¹ chloramphenicol for pCA24N), and inducer (0.02% arabinose or 50 µM IPTG). For E. coli MB2795, LB medium was used instead of M9 medium, because LB lacks D-alanine and the strain has multiple uncharacterized auxotrophies. Plates were incubated in airtight containers at 28 °C or 37 °C for up to 4 weeks, and colony formation was monitored.
Each expression strain was cultured in Terrific Broth (Formedium; Hunstanton, UK) supplemented with the appropriate antibiotic, at 37 °C with shaking until an OD600 of 0.6 was reached. Cultures were transferred to 28 °C and overnight protein expression was induced by the addition of 0.02% (w/v) arabinose or 0.5 mM IPTG. The cells were harvested by centrifugation and then resuspended in lysis buffer containing 4 µl ml⁻¹ protease inhibitor cocktail (Sigma), 0.5 mg ml⁻¹ chicken egg white lysozyme (Sigma) and 5 U ml⁻¹ benzonase nuclease (Merck). Cells were lysed by sonication on ice and cellular debris was separated from soluble protein by further centrifugation. After clarification through a 0.45 µm syringe-driven filter, the soluble lysate was applied either to Talon resin (Clontech; Mountain View, CA) for IMAC or to amylose resin (New England Biolabs) for amylose-affinity chromatography. After incubation at 4 °C with rocking for at least 1 h, the resins were washed extensively with lysis buffer, packed into Bio-Spin gravity flow columns (Bio-Rad; Hercules, CA) and washed further. For IMAC, protein was eluted from the resin with lysis buffer supplemented with 100-500 mM imidazole. Fractions were pooled and exchanged into storage buffer (50 mM potassium phosphate buffer, 200 mM NaCl, 10% v/v glycerol, pH 7.0) using an Amicon centrifugal filter unit with a 50 kDa molecular weight cut-off (EMD Millipore; Billerica, MA). For amylose-affinity chromatography, the protein was eluted with 5 ml of lysis buffer containing 10 mM maltose. Fractions containing pure MBP-PuCBL were dialyzed extensively against storage buffer (50 mM Tris-HCl, 600 mM KCl, 10% v/v glycerol, pH 7.5) using a dialysis cassette with a 10 kDa molecular weight cut-off (Pierce Biotechnology; Waltham, MA). Enzyme concentrations were determined using molar extinction coefficients calculated as described previously (Pace et al., 1995).
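The extinction-coefficient calculation of Pace et al. (1995) can be sketched as follows: ε280 (M⁻¹ cm⁻¹) = 5500 × nTrp + 1490 × nTyr + 125 × nCystine, after which concentration follows from the Beer-Lambert law. This minimal Python sketch assumes no disulfide bonds (cystines) unless specified; the function names and the short example sequence are hypothetical.

```python
# Molar absorption coefficients at 280 nm (M^-1 cm^-1), after Pace et al. (1995)
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125


def extinction_coefficient_280(sequence: str, n_cystines: int = 0) -> int:
    """Estimate epsilon_280 from Trp and Tyr counts plus the number of disulfide bonds."""
    seq = sequence.upper()
    return EPS_TRP * seq.count("W") + EPS_TYR * seq.count("Y") + EPS_CYSTINE * n_cystines


def concentration_molar(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l), returned in M."""
    return a280 / (epsilon * path_cm)


eps = extinction_coefficient_280("MWYYK")  # hypothetical short sequence
print(eps)                                 # 8480
print(concentration_molar(0.848, eps))     # approximately 1e-4 M (100 uM)
```

For a fusion protein such as MBP-PuCBL, or a His-tagged construct, the full expressed sequence (including tag or fusion partner) would be used.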
All proteins were snap frozen in liquid nitrogen and stored at −80 °C until use; control assays verified that this led to no significant loss of activity.

Kinetic assays
Continuous assays were performed using a Cary 100 UV-Vis spectrophotometer with temperature controller (Agilent Technologies; Santa Clara, CA). Cuvettes containing each reaction mixture were incubated at assay temperature for 2 min prior to the start of the assay. The absorbance was monitored for 4 min to establish a stable baseline and then for a further 8 min following the addition of enzyme. Progress curves were plotted and the initial rate was calculated by subtracting the slope of the baseline from the slope of the enzyme-catalyzed reaction. Except as noted in Table 2, technical triplicates and at least two biological replicates were performed. Appropriate enzyme concentrations were determined separately for each protein and assay, depending on the rate of reaction. A minimum of 7 different substrate concentrations were measured to construct each kinetic curve. Initial rate data were plotted and fitted to the Michaelis-Menten equation using GraphPad Prism.
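The rate calculation and curve fitting described above can be sketched in pure Python. This is an illustrative reimplementation, not the GraphPad Prism procedure: initial rates are taken as the least-squares slope of the enzyme-catalyzed trace minus the baseline slope, and the Michaelis-Menten fit uses a one-dimensional golden-section search over log10(KM), exploiting the fact that for a fixed KM the best-fitting Vmax has a closed form. The function names and search bounds are assumptions.

```python
import math
from typing import Sequence, Tuple


def ls_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def initial_rate(base_t, base_a, rxn_t, rxn_a) -> float:
    """Enzyme-catalyzed rate: reaction slope minus baseline slope (absorbance per time unit)."""
    return ls_slope(rxn_t, rxn_a) - ls_slope(base_t, base_a)


def fit_michaelis_menten(s: Sequence[float], v: Sequence[float],
                         km_lo: float = 1e-6, km_hi: float = 1e3,
                         iterations: int = 200) -> Tuple[float, float]:
    """Least-squares fit of v = Vmax * S / (KM + S); returns (Vmax, KM)."""
    def vmax_and_sse(km: float) -> Tuple[float, float]:
        f = [si / (km + si) for si in s]
        vmax = sum(vi * fi for vi, fi in zip(v, f)) / sum(fi * fi for fi in f)
        sse = sum((vi - vmax * fi) ** 2 for vi, fi in zip(v, f))
        return vmax, sse

    g = (math.sqrt(5) - 1) / 2  # inverse golden ratio
    lo, hi = math.log10(km_lo), math.log10(km_hi)
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = vmax_and_sse(10 ** x1)[1], vmax_and_sse(10 ** x2)[1]
    for _ in range(iterations):
        if f1 < f2:  # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = vmax_and_sse(10 ** x1)[1]
        else:        # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = vmax_and_sse(10 ** x2)[1]
    km = 10 ** ((lo + hi) / 2)
    return vmax_and_sse(km)[0], km


# Hypothetical noise-free data at seven substrate concentrations (mM)
s = [0.1, 0.25, 0.5, 1, 2, 5, 10]
v = [2.0 * si / (0.5 + si) for si in s]
vmax, km = fit_michaelis_menten(s, v)
print(round(vmax, 3), round(km, 3))
```

On these synthetic data the fit should recover Vmax ≈ 2 and KM ≈ 0.5; kcat then follows as Vmax divided by the enzyme concentration, with units kept consistent throughout.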
A continuous assay (Esaki and Walsh, 1986) was used to measure alanine racemization (L-alanine → D-alanine) by wMelCBL and PuCBL. The assay used coupled reactions in which the product, D-alanine, is converted to pyruvate by D-amino acid oxidase, and the pyruvate is then reduced by lactate dehydrogenase with concomitant oxidation of NADH. Reaction mixtures contained 50 mM Tris-HCl pH 8.8, 10 µM PLP, 0.2 mM NADH, 1 U ml⁻¹ D-amino acid oxidase (from porcine kidney; Sigma), 120 U ml⁻¹ lactate dehydrogenase (from bovine heart; Sigma), up to 100 mM L-alanine and up to 5 µM CBL. Assays were performed at 37 °C, monitoring the disappearance of NADH at 340 nm (ε = 6220 M⁻¹ cm⁻¹).
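Converting the baseline-corrected A340 slope into a reaction rate uses the Beer-Lambert law with the NADH extinction coefficient given above. In this coupled assay, one NADH is oxidized for each pyruvate (and hence each D-alanine) formed, so the two rates are equal. The Python sketch below is illustrative; it assumes a 1 cm path length and slopes expressed per minute, and the function names and example values are hypothetical.

```python
EPSILON_NADH_340 = 6220.0  # M^-1 cm^-1


def rate_from_a340(delta_a_per_min: float, path_cm: float = 1.0) -> float:
    """Convert a baseline-corrected A340 slope (per min) to a rate in M s^-1.

    One NADH is oxidized per pyruvate (and hence per D-alanine) formed, so the
    magnitude of the NADH consumption rate equals the racemization rate.
    """
    return abs(delta_a_per_min) / (EPSILON_NADH_340 * path_cm) / 60.0


def apparent_turnover(rate_M_per_s: float, enzyme_M: float) -> float:
    """Rate per enzyme molecule (s^-1); approaches kcat at saturating substrate."""
    return rate_M_per_s / enzyme_M


rate = rate_from_a340(-0.0622)        # hypothetical slope: 1e-5 M NADH consumed per min
print(rate)                           # roughly 1.67e-7 M s^-1
print(apparent_turnover(rate, 1e-6))  # roughly 0.167 s^-1 with 1 uM enzyme (hypothetical)
```

Because `abs()` is applied to the slope, the same conversion would apply to the glutamate racemase assay, in which NADH appears rather than disappears.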
The alanine racemase activity of TmCBL (L-alanine → D-alanine) was measured using a discontinuous assay (Patrick et al., 2002). Reaction mixtures contained 10 µM PLP, up to 5 µM CBL and up to 40 mM L-alanine, buffered with either 50 mM Bicine (pH 8.0) or 50 mM Tris-HCl (pH 8.8). The reaction was incubated at 37 °C or 70 °C for up to 20 min. To stop the reaction, TmCBL was inactivated by incubation at 95 °C for 5 min. Next, 10 µl of reaction mixture was transferred to a flat-bottomed Costar 96-well plate (Corning; Corning, NY) and 90 µl of color development reagent (100 mM sodium phosphate pH 7.0, 1.8 U ml⁻¹ D-amino acid oxidase, 20 U ml⁻¹ horseradish peroxidase from Sigma and 2 mg ml⁻¹ o-phenylenediamine) was added. Color was allowed to develop by incubation at 37 °C for 45 min and then stopped by the addition of 100 µl of 3 M HCl. The absorbance was measured at 492 nm in a Multiskan plate reader (Thermo Scientific; Waltham, MA) and quantified by reference to a standard curve (0-10 nmol D-alanine, freshly made for each assay).
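Quantification against the D-alanine standard curve amounts to a linear fit followed by inversion. The Python sketch below is illustrative only: the standard-curve absorbances are hypothetical, the helper names are assumptions, and it assumes that the standard amounts (0-10 nmol) correspond to the 10 µl aliquot transferred to each well.

```python
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def nmol_from_a492(a492: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to obtain nmol D-alanine in the assayed aliquot."""
    return (a492 - intercept) / slope


def rate_uM_per_min(nmol: float, aliquot_ul: float, minutes: float) -> float:
    """Convert nmol in an aliquot to a production rate in uM min^-1 (1 nmol/ul = 1 mM)."""
    return nmol / aliquot_ul * 1000.0 / minutes


# Hypothetical standard curve (0-10 nmol D-alanine) with a perfectly linear response
standards_nmol = [0, 2, 4, 6, 8, 10]
standards_a492 = [0.05 + 0.1 * n for n in standards_nmol]
m, b = linear_fit(standards_nmol, standards_a492)
nmol = nmol_from_a492(0.45, m, b)  # about 4.0 nmol
print(round(nmol, 3), round(rate_uM_per_min(nmol, aliquot_ul=10.0, minutes=20.0), 3))
```

Preparing the standard curve fresh for each assay, as described above, means the fitted slope and intercept would be recomputed each time.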
Glutamate racemization (D-glutamate → L-glutamate) was measured using a continuous assay (Lundqvist et al., 2007) coupled to glutamate dehydrogenase, which converts L-glutamate to α-ketoglutarate with concomitant reduction of NAD⁺. The reaction mixture contained 10 µM PLP, 5 mM NAD⁺ and up to 5 mM D-glutamate, buffered with either 50 mM Bicine (pH 8.0) or 50 mM Tris-HCl (pH 8.8). For assays performed at 37 °C, 0.4 U ml⁻¹ of bovine liver glutamate dehydrogenase (Sigma-Aldrich) was used as the coupling enzyme. For assays at 70 °C, we expressed and purified the glutamate dehydrogenase from T. maritima, and used it at a concentration of 30 µM. CBL concentrations of 0.2-5 µM were used. The appearance of NADH was monitored at 340 nm.

Synthesis and analysis of O-acetyl-L-homoserine
O-acetyl-L-homoserine was synthesized with minor modifications to a previous protocol (Nagai and Flavin, 1971).

Mass spectrometry
To test whether TmCBL could act as a cystathionine γ-synthase and catalyze the formation of cystathionine, a reaction mixture was set up containing 10 mM Bicine pH 8.0, 5 mM L-cysteine, 10 µM PLP, 10 µM TmCBL, and either 5 mM O-acetylhomoserine or 5 mM O-succinylhomoserine. To determine whether the T. maritima O-acetylhomoserine thiolase could catalyze the production of homocysteine and/or cystathionine from O-acetylhomoserine and either H2S or cysteine, respectively, the following reaction mixture was set up: 10 mM Bicine pH 8.0, 5 mM O-acetylhomoserine, 10 µM PLP, 10 µM O-acetylhomoserine thiolase, and either 5 mM Na2S or 5 mM L-cysteine. The reactions were incubated for 16 h at 70 °C, and then for 5 min at 95 °C to inactivate the enzyme, before being stored at −20 °C prior to analysis by mass spectrometry.
For analysis, the samples were diluted 10-fold with 5% (v/v) acetonitrile and 0.2% (v/v) formic acid and directly injected at 800 nL min⁻¹ into an Ultimate 3000 nano-flow UHPLC (Dionex; Sunnyvale, CA) coupled to an LTQ-Orbitrap XL mass spectrometer (Thermo Scientific). MS1 spectra were generated in positive ion mode over 11 min in the mass range 119-300 m/z, with a resolution of 100 000. Species were identified based on the m/z of the [M+H]⁺ ions. To determine the lower limit of detection for L-cystathionine, a standard curve from 0.5 nmol to 50 nmol was produced under identical conditions (Supporting Information Fig. S1).
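For reference, the expected [M+H]⁺ values of the relevant species can be computed from their elemental formulas using monoisotopic masses. The Python sketch below is illustrative; the formulas are standard (for example, L-cystathionine is C7H14N2O4S and L-homocysteine is C4H9NO2S), the simple formula parser handles only formulas without brackets or charges, and all five species fall within the 119-300 m/z window used above.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes, and the proton mass
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "S": 31.97207100}
PROTON = 1.007276467


def monoisotopic_mass(formula: str) -> float:
    """Sum monoisotopic masses for a simple formula such as 'C7H14N2O4S'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def mz_protonated(formula: str) -> float:
    """m/z of the singly charged [M+H]+ ion."""
    return monoisotopic_mass(formula) + PROTON


COMPOUNDS = {
    "L-cysteine": "C3H7NO2S",
    "L-homocysteine": "C4H9NO2S",
    "O-acetyl-L-homoserine": "C6H11NO4",
    "O-succinyl-L-homoserine": "C8H13NO6",
    "L-cystathionine": "C7H14N2O4S",
}
for name, formula in COMPOUNDS.items():
    print(f"{name}: [M+H]+ = {mz_protonated(formula):.4f}")
```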

Experiments with additional T. maritima enzymes
Expression vectors for the metY, ilvA, dapF, ilvE and TMARI_RS02240 genes of T. maritima were purchased from the DNASU Plasmid Repository (https://dnasu.org). These vectors were from the pMH series, constructed by the Joint Center for Structural Genomics (Lesley et al., 2002), and they facilitated arabinose-induced expression of His6-tagged enzymes. The yggS gene was amplified from T. maritima MSB8 genomic DNA using primers yggS_fwd (TAC CGA GAA CCT GTA TTT CCA AGG AGG ATT GAA AGA AAA CCT CGA AAG GG) and yggS_rev (TGA GAT GAG TTT TTG TTC TAG AAG CTC ACT TCC CTC CTT CGA ATA TGG CG). TMARI_RS08180 was amplified from genomic DNA with primers TM1597_fwd (TAC CGA GAA CCT GTA TTT CCA AGG AGT GTA TCC CAG GCT TCT GAT AAA TC) and TM1597_rev (TGA GAT GAG TTT TTG TTC TAG AAG CTC AGA TTG AGG GTT CAT ACA CCT TC). The two amplified inserts were cloned into pBAD using Gibson Assembly, after the vector had been amplified with pBAD_fwd (GCT TCT AGA ACA AAA ACT CAT C) and pBAD_rev_His (TCC TTG GAA ATA CAG GTT C).
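The Gibson Assembly design above relies on each insert primer carrying a 5' tail that overlaps an end of the amplified pBAD vector. As an illustrative check (not part of the original protocol; helper names are hypothetical), the Python sketch below confirms that each forward insert primer contains the reverse complement of pBAD_rev_His, and each reverse insert primer contains the reverse complement of pBAD_fwd.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Remove spaces and upper-case a primer sequence."""
    return seq.replace(" ", "").upper()


def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return clean(seq).translate(COMPLEMENT)[::-1]


# Vector and insert primers as given in the text
PBAD_FWD = "GCT TCT AGA ACA AAA ACT CAT C"
PBAD_REV_HIS = "TCC TTG GAA ATA CAG GTT C"
INSERT_PRIMERS = {
    "yggS_fwd": "TAC CGA GAA CCT GTA TTT CCA AGG AGG ATT GAA AGA AAA CCT CGA AAG GG",
    "yggS_rev": "TGA GAT GAG TTT TTG TTC TAG AAG CTC ACT TCC CTC CTT CGA ATA TGG CG",
    "TM1597_fwd": "TAC CGA GAA CCT GTA TTT CCA AGG AGT GTA TCC CAG GCT TCT GAT AAA TC",
    "TM1597_rev": "TGA GAT GAG TTT TTG TTC TAG AAG CTC AGA TTG AGG GTT CAT ACA CCT TC",
}


def overlaps_vector(insert_primer: str, vector_primer: str) -> bool:
    """True if the insert primer contains the reverse complement of the vector primer."""
    return revcomp(vector_primer) in clean(insert_primer)


for name, primer in INSERT_PRIMERS.items():
    partner = PBAD_REV_HIS if name.endswith("_fwd") else PBAD_FWD
    print(name, overlaps_vector(primer, partner))
```

The overlaps are 19 nt (with pBAD_rev_His) and 22 nt (with pBAD_fwd).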
For activity and mass spectrometry experiments, the proteins encoded by metY (O-acetylhomoserine thiolase) and TMARI_RS08180 were expressed and purified using IMAC, as described above. To test for alanine racemase activity, the vectors for expressing the proteins encoded by ilvA, dapF, ilvE, yggS, TMARI_RS02240 and TMARI_RS08180 were used to transform E. coli MAD2 (Δalr ΔdadX ΔmetC).
Complementation tests were carried out as described above, using LB supplemented with ampicillin (100 µg ml⁻¹) and arabinose (0.1%). Plates were incubated in airtight containers at 37 °C for 4 weeks.

Phylogenies of CBL and ALR
The CBL tree was constructed as follows. Protein sequences were chosen from representatives of each major CBL-containing clade, as determined from a large reference tree that had in turn been assembled from >200 sequences found by concatenated BLASTP searches. The sequences were aligned with MUSCLE (Edgar, 2004) and the header names were changed with a Python3 script and nw_rename from the Newick utilities (Junier and Zdobnov, 2010). The tree was inferred with RAxML (Stamatakis, 2014) using fast bootstrap with 500 replicates under a WAG model with a Γ distribution, with NP_253712.1 (O-acetylhomoserine thiolase from Pseudomonas aeruginosa PAO1) as the outgroup. Six iterations were performed before a satisfactory tree was obtained. The rRNA tree for the species in the CBL tree was built using ARB SINA to obtain good-quality alignments of the 16S and 23S rRNA sequences, which were then concatenated with a Python3 script. RAxML was used to infer the tree of the concatenated sequences, with a Γ-distributed GTR model and 500 bootstraps. The Newick files of the CBL and rRNA trees were viewed with FigTree, and several clades were rotated to match the two trees as closely as possible. To determine the ALR protein tree, two methods were used to collect the sequences. First, a preliminary BLASTP search was done with Rickettsia prowazekii ALR (NP_220488) as the query, to test whether the most similar sequences were alphaproteobacterial. Second, ALR sequences for species across the Alphaproteobacteria were chosen manually. These two sets of sequences were combined, aligned with MUSCLE and used for tree inference, which was carried out with RAxML under a GTR model with a Γ distribution, with 500 replicates.
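The concatenation of the 16S and 23S rRNA alignments can be sketched as follows. This is a minimal Python illustration, not the authors' Python3 script; it assumes aligned FASTA input, keys records on the first word of each header, and pads any taxon missing from one alignment with gaps (a design choice the original analysis may or may not have made). The taxon names in the example are hypothetical.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse aligned FASTA text into {name: sequence}, keyed on the first header word."""
    records: Dict[str, str] = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def concatenate(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Concatenate alignments taxon by taxon, padding missing taxa with gaps."""
    taxa = sorted(set().union(*alignments))
    combined: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for alignment in alignments:
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError("sequences within an alignment must have equal length")
        width = widths.pop()
        for taxon in taxa:
            combined[taxon].append(alignment.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in combined.items()}


# Hypothetical toy alignments
rrna_16s = parse_fasta(">taxonA\nACG-\n>taxonB\nACGT\n")
rrna_23s = parse_fasta(">taxonA 23S\nGG\n>taxonC\nGA\n")
for taxon, seq in concatenate([rrna_16s, rrna_23s]).items():
    print(f">{taxon}\n{seq}")
```

The concatenated output could then be written as FASTA or PHYLIP for tree inference.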