Re-Programming and Optimization of an L-Proline cis-4-Hydroxylase for the cis-3-Halogenation of its Native Substrate

Non-heme iron/α-ketoglutarate-dependent halogenases acting on freestanding substrates catalyze the regio- and stereoselective halogenation of unactivated C(sp3)-H bonds. Yet, with only a handful of these halogenases characterized, the biosynthetic potential of enzymatic radical halogenation remains limited. Herein, we describe the remodeling of L-proline cis-4-hydroxylase from Sinorhizobium meliloti into a halogenase by introducing a single point mutation (D108G) into the enzyme's active site. The re-programmed halogenase displays a striking regio-divergent reaction chemistry: while halogenation of L-proline occurs exclusively at the C3 position, the retained hydroxylation activity derivatizes the C4 position, matching the regioselectivity of the wildtype enzyme. Through several rounds of directed evolution, we identified an optimized halogenase variant with a 98-fold improved apparent kcat/Km for chlorination of L-proline compared to the parental enzyme SmP4H (D108G). The development and optimization of this novel halogenation biocatalyst highlight the possibility of rationally harnessing the chemical versatility of non-heme Fe/αKG-dependent dioxygenases for C-H functionalization.

The introduction of a halogen into a molecule's scaffold can alter its bioactivity or serve as a useful synthetic handle for the diversification and modification of late-stage synthetic intermediates. [1,2] The recently discovered non-heme iron/α-ketoglutarate (Fe/αKG) halogenases acting on freestanding substrates, e.g., WelO5, [3] WelO5*, [4] Wi-WelO15, [5] AmbO5 [6] and BesD, [7] have broadened the biocatalytic potential of radical halogenation. They overcome the complexity barrier of previously known Fe/αKG halogenases, which require carrier-protein-tethered substrates (BarB1 and BarB2, [8] SyrB2, [9] CytC3, [10] CmaB, [11] HctB, [12] CurA, [13] KthP [14] ). Fe/αKG halogenases belong to the superfamily of α-ketoglutarate-dependent oxygenases. In their active site, these enzymes coordinate the Fe(II) cofactor via two conserved histidine residues, a carboxylate residue (replaced by a halide ligand in halogenases) and α-ketoglutarate. Mechanistically, Fe/αKG hydroxylases are the best-studied members of the Fe/αKG oxygenase family, and Fe/αKG halogenases have been proposed to follow a similar mechanism. [15] This catalytic mechanism involves the formation of an Fe(IV)-oxo species (known as the ferryl intermediate) during oxygen activation. The ferryl intermediate abstracts a hydrogen atom from the substrate, yielding a ferric hydroxo complex [Fe(III)-OH] and a substrate radical. In hydroxylases, hydroxyl radical rebound forms the hydroxylated product. In halogenases, the coordinated halogen instead reacts with the substrate radical to yield the halogenated product [16] (Scheme 1).
Over 5000 halogenated natural products have been reported to date, [17] many of which have important biological functions such as antibacterial, antitumor, antidiabetic and antioxidant activity. [18,19] Unsurprisingly, halogenation also plays an important role in the pharmaceutical industry: around 25 % of small-molecule drugs on the market are halogenated, including blockbuster drugs such as rivaroxaban, empagliflozin, sitagliptin and aripiprazole. [20][21][22] Owing to the crucial role of halogenated compounds in the modern pharmaceutical industry, much effort has been devoted to developing novel biohalogenation strategies that make halogen incorporation easier and more sustainable. However, the number of available Fe/αKG-dependent halogenases acting on freestanding substrates is small. Despite first enzyme engineering efforts, [5,23] the accessible substrate spectrum remains limited for the time being.
Most known Fe/αKG halogenases that accept freestanding substrates, such as WelO5, AmbO5 and BesD, have been identified through biosynthetic pathway elucidation. [24][25][26] Using these original hits for further bioinformatic mining led to the identification of additional halogenases, yet these candidates were shown to possess substrate scopes similar to those of the query enzymes. [7] Alternatively, new halogenases can be obtained by re-engineering Fe/αKG hydroxylases, given the high structural similarity between the two enzyme families. In the successful re-engineering study of Boal et al., the authors used the WelO5 structure as a reference model to identify a starting scaffold: the Fe/α-ketoglutarate-dependent hydroxylase SadA from Burkholderia ambifaria (PDB ID: 3W21). [27] By exchanging the aspartate in the Fe-coordinating HxD motif for a glycine, they converted the hydroxylase into a halogenase, despite it sharing only 19 % sequence identity with WelO5. This engineered SadA variant showed halogenation activity towards its natural substrate N-succinyl-L-leucine, but hydroxylation remained its predominant activity. To date, the study of Boal et al. represents the only successful example of repurposing a hydroxylase into a halogenase, [27] while many other attempts to re-engineer Fe/αKG hydroxylases have failed. [28,29] Intrigued by the catalytic promiscuity and broad substrate scope of Fe/αKG dioxygenases, we re-examined the possibility of re-programming the abundantly available Fe/αKG hydroxylases into halogenases. We then intended to use protein engineering to further increase the efficiency of any successfully re-programmed biocatalyst.
We began our enzyme engineering campaign with a bioinformatic analysis of 20 literature-known Fe/αKG-dependent hydroxylases (Table S1). The enzymes were ranked by their sequence similarity to the native halogenase WelO5 and to the re-programmed SadA. Also taking soluble expression levels into account, we selected five Fe/αKG-dependent hydroxylases:
- L-proline cis-4-hydroxylase from Sinorhizobium meliloti [30]
- L-isoleucine dioxygenase from Bacillus thuringiensis [31]
- L-pipecolic acid cis-5-hydroxylase (SruPH) from Segniliparus rugosus [32]
- leucine hydroxylase (griE) from Streptomyces muensis [33]
- L-lysine 4-hydroxylase from Flavobacterium johnsoniae [34]

In all five enzymes, the active-site carboxylate ligand of Fe(II) was substituted with glycine. For enzymes without an available crystal structure, the position of the Fe-chelating carboxylate residue was identified by a bioinformatic search for the conserved His-X1-(Asp/Glu)-X2-His motif (Table S2).
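The motif search used to locate the Fe-chelating carboxylate can be illustrated with a short regular-expression scan. The Python sketch below is a simplified, hypothetical helper, not the tool used in this work. It makes these assumptions:
- It only matches the proximal His-X-(Asp/Glu) triplet, with exactly one residue between the His and the carboxylate.
- It does not check the spacing to the distal His.
- It runs on a made-up toy sequence.

It then builds the glycine-variant label in the same notation as D108G.

```python
import re


def carboxylate_candidates(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position, residue) for every Asp/Glu found in an H-X-[DE] triplet.

    Simplified: only the proximal His-X-(Asp/Glu) part of the motif is matched;
    the spacing to the distal His is not checked. The zero-width lookahead lets
    overlapping triplets be reported as well.
    """
    seq = sequence.upper()
    hits = []
    for match in re.finditer(r"(?=H.[DE])", seq):
        index = match.start() + 2  # 0-based index of the Asp/Glu in the triplet
        hits.append((index + 1, seq[index]))
    return hits


def glycine_mutant(sequence: str, position: int) -> tuple[str, str]:
    """Swap the residue at a 1-based position for Gly; return (mutant sequence, label)."""
    seq = sequence.upper()
    wildtype = seq[position - 1]
    mutant = seq[: position - 1] + "G" + seq[position:]
    return mutant, f"{wildtype}{position}G"


# Hypothetical toy sequence for illustration only (not the SmP4H sequence)
toy = "MKTAYHLDRWGSHPE"
hits = carboxylate_candidates(toy)
print(hits)             # [(8, 'D'), (15, 'E')]
mutant, label = glycine_mutant(toy, hits[0][0])
print(label, mutant)    # D8G MKTAYHLGRWGSHPE
```

A sequence scan of this kind only yields candidate positions. Several H-X-(Asp/Glu) triplets can occur in one protein, so a real assignment would still need to be checked against the rest of the motif.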
After soluble expression of all enzyme variants in E. coli BL21(DE3), crude-lysate biocatalysis reactions were carried out in the presence of 250 mM NaCl, using the respective native substrates of the five wildtype enzymes (Figure S1). The biotransformation products were analyzed by liquid chromatography coupled to mass spectrometry (LC-MS). Of the five engineered enzyme variants, only the variant of L-proline cis-4-hydroxylase from Sinorhizobium meliloti yielded a new product. Its m/z ratio of 383 is consistent with the mass of dansyl-derivatized chloro-L-proline, and it exhibits the M : M + 2 = 3 : 1 isotope pattern characteristic of chlorinated compounds (Figures S3 and S4). Importantly, this new product peak was not observed with the wildtype enzyme, for which only the peak of dansylated hydroxyproline (m/z 365) was identified (Figures S2 and S3). The re-programmed halogenase exhibited promising stereo- and regioselectivity, but its reaction selectivity (halogenation vs. hydroxylation) was low: cis-4-hydroxy-L-proline was generated in significant amounts (24 : 1 hydroxylation to halogenation ratio). This low reaction selectivity of SmP4H D108G (SmP4H-0) was not unexpected. In the only previous example of a successful hydroxylase-to-halogenase re-programming, the engineered SadA also retained dominant hydroxylase activity towards its natural substrate N-succinyl-L-leucine (ca. 70 % of the total product pool). [27] Structural elucidation of the products by nuclear magnetic resonance (NMR) spectroscopy revealed that the hydroxy group and the halogen were both introduced in cis-configuration. However, depending on the reaction type, different carbon atoms of the pyrrolidine ring were derivatized (Scheme 2). SmP4H-0 hydroxylates L-proline at the C-4 position and therefore retains the regioselectivity reported by Hara and Kino for the wildtype hydroxylase. [30]
Interestingly, halogenation of L-proline by the engineered variant SmP4H-0 yields a chloro-substituted product at the C-3 position of the pyrrolidine ring (Table S4). This regio-divergent chemistry is particularly striking, as previously described halogenases typically halogenate and hydroxylate their substrates at the same position. [23,27,35,36] In the absence of a suitable crystal structure, we carried out molecular docking studies with a homology model. Our aim was to understand the factors governing the regio- and chemoselectivity of L-proline hydroxylation/halogenation by SmP4H-0. SmP4H shares 66 % sequence identity (91 % similarity) with L-proline cis-4-hydroxylase from Mesorhizobium loti. We therefore used a crystal structure of this enzyme (PDB ID: 4P7W) to build a homology model of our target enzyme with SWISS-MODEL. [37] L-Proline was docked into the active site of SmP4H-0 with AutoDock Vina. [38] In this way, we identified two L-proline binding modes that could account for the two reaction outcomes, namely hydroxylation at the C4 carbon and halogenation at the C3 carbon. In binding mode A, dubbed the "hydroxylation binding mode", the C4 atom of L-proline is closer to the ferryl intermediate, favoring hydrogen abstraction and hydroxyl rebound at this position. In binding mode B, the "halogenation binding mode", the C3 atom of the pyrrolidine ring is closer to the chloride anion than in binding mode A (Figure S5).
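The m/z assignments above can be reproduced from elemental formulas: 365 for dansylated hydroxyproline, 383 for dansylated chloroproline and 427 for the brominated product discussed below. Dansylation replaces the N-H of L-proline (C5H9NO2) with the dansyl group (C12H12NO2S), giving C17H20N2O4S. Hydroxylation adds one O, and halogenation replaces one H with Cl or Br. The sketch assumes that the observed peaks are singly protonated [M+H]+ ions of the monoisotopic (35Cl, 79Br) species. This is an interpretation that matches the reported integers, not a statement of the acquisition settings.

```python
# Monoisotopic masses (u) of the most abundant isotope of each element
MONO = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
    "Cl": 34.96885268,  # 35Cl
    "Br": 78.9183371,   # 79Br
}
PROTON = 1.00727646688  # proton mass, added for a singly protonated [M+H]+ ion


def mz_protonated(formula: dict[str, int]) -> float:
    """m/z of the monoisotopic [M+H]+ ion (charge 1) for an elemental formula."""
    return sum(MONO[element] * count for element, count in formula.items()) + PROTON


# Dansyl-L-proline is C17H20N2O4S; the derivatives differ by +O or by H -> halogen
SPECIES = {
    "dansyl-hydroxyproline": {"C": 17, "H": 20, "N": 2, "O": 5, "S": 1},
    "dansyl-chloroproline": {"C": 17, "H": 19, "N": 2, "O": 4, "S": 1, "Cl": 1},
    "dansyl-bromoproline": {"C": 17, "H": 19, "N": 2, "O": 4, "S": 1, "Br": 1},
}

for name, formula in SPECIES.items():
    print(f"{name}: m/z {mz_protonated(formula):.3f}")
# dansyl-hydroxyproline: m/z 365.117
# dansyl-chloroproline: m/z 383.083
# dansyl-bromoproline: m/z 427.032
```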
A closer inspection of the homology model moreover highlighted a hydrogen-bond network between Arg118, αKG and L-proline. Arg118's hydrogen-bonding network is altered in the halogenation binding mode (Figure S7). Based on this analysis, we hypothesize that Arg118 might act in one of two ways. It might influence the orientation of the oxygen in the Fe=O intermediate to favor halogen over hydroxyl rebound, similar to the roles proposed for Ser189 in WelO5 and Asn219 in BesD. [7,36] Alternatively, it might influence the charge state of the halogen, as proposed for Arg245 in HctB. [39] In support of this hypothesis, when Arg118 was subjected to full saturation mutagenesis (NNK codons) in variant SmP4H-0, only variants retaining arginine at this position exhibited chlorination activity. In the evolved variant SmP4H-7, the same saturation mutagenesis analysis revealed that, besides arginine, lysine at position 118 also supported minor chlorination activity, albeit with a 100-fold lower conversion yield (Figure S8).
Having obtained an active, re-programmed halogenase, we opted for a directed evolution approach to boost SmP4H-0's chlorination activity towards L-proline. In a first step, we used structure-guided mutagenesis to target residues lining the active site of the enzyme. To do so, we used our homology model of SmP4H with docked L-proline to select six residues close to the substrate. These were subjected to site-saturation mutagenesis using NNK codons (first-generation library) (Figure 1). A lysate-based screening protocol in a 96-well plate format was implemented to evaluate the activity of all library variants. For each library, we oversampled the expected library size 3-fold, corresponding to an expected coverage of 95 % of all possible library variants.
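The link between 3-fold oversampling and about 95 % coverage follows from the usual Poisson approximation for drawing clones at random from V equally likely variants. Screening kV clones covers an expected fraction 1 - e^(-k), which is 0.950 for k = 3. The Python sketch below assumes equal codon frequencies. NNK gives 4 × 4 × 2 = 32 codons per randomized position, so 3-fold oversampling of a single NNK site amounts to 96 clones. The function names are illustrative.

```python
import math


def expected_coverage(oversampling: float) -> float:
    """Expected fraction of variants seen after screening oversampling * V clones
    drawn uniformly at random from V equally likely variants (Poisson approximation)."""
    return 1.0 - math.exp(-oversampling)


def clones_needed(library_size: int, coverage: float = 0.95) -> int:
    """Clones to screen so that the expected coverage reaches the target fraction."""
    return math.ceil(-library_size * math.log(1.0 - coverage))


NNK_CODONS = 32  # N (4) x N (4) x K (G/T, 2) codons per randomized position

print(f"{expected_coverage(3):.3f}")  # 0.950
print(clones_needed(NNK_CODONS))      # 96
```

Codon-level coverage is a slightly conservative proxy for amino-acid coverage, because several NNK codons encode the same residue.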
Scheme 2. Proposed mechanism for the divergent reactivity of SmP4H wild type and the SmP4H D108G variant.
The LC-MS analysis of the single-site libraries revealed two promising hits, SmP4H-1 (V57L) and SmP4H-2 (S107T), with 1.9- and 1.5-fold improved chlorination activity compared to SmP4H-0, respectively (Table S5). In the second evolution round, positions 57 and 107 were randomized simultaneously to identify potential epistatic effects. The targeted library used a DKR degenerate codon at position 57 and an RST codon at position 107. It yielded SmP4H-3 (V57L/S107T), which showed 2.7-fold increased activity over SmP4H-0. This variant served as the parent for a third round of mutagenesis based on error-prone PCR. LC-MS screening of the chlorination activity of around 730 variants identified three improved variants:
- SmP4H-4 (V57L/S107T/D113E), 3.1-fold increased halogenation activity over SmP4H-0
- SmP4H-5 (A26T/V57L/V58E/S107T), 4.2-fold increase
- SmP4H-6 (V57L/S107T/P187L/R274H), 4.4-fold increase

To further optimize the enzyme's ability to chlorinate L-proline, SmP4H-4, SmP4H-5 and SmP4H-6 were used as templates for DNA shuffling. This yielded variant SmP4H-7 (V57L/S107T/D113E/T115P (off-target)/R274H), with an 18.7-fold increase in chlorination yield over SmP4H-0 in the lysate-based assay (Figure 2a). SmP4H-7 showed the highest chlorination activity of all investigated variants, but it also retained hydroxylation activity, with a 12 : 1 hydroxylation to halogenation ratio (Figures S2-S4). Notably, none of the engineered halogenases formed a product that was both hydroxylated and chlorinated. This suggests that neither cis-3-chloro- nor cis-4-hydroxy-L-proline is accepted as a substrate by the enzyme variants.
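The residue sets sampled at each randomized position follow from the IUPAC degenerate-codon codes. Expanding them against the standard genetic code (NCBI translation table 1) gives these sets:
- NNK spans all 20 amino acids plus one stop codon in 32 codons.
- DKR (12 codons) includes valine and leucine at position 57, along with one stop codon (TGA).
- RST (4 codons) includes serine and threonine at position 107.

The two-site library therefore spans 12 × 4 = 48 codon combinations. The helper names below are illustrative. This is a sketch, not the library-design tool used in this work.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(degenerate: str) -> list[str]:
    """All concrete codons encoded by a degenerate codon such as 'DKR'."""
    return ["".join(bases) for bases in product(*(IUPAC[n] for n in degenerate))]


def encoded_residues(degenerate: str) -> set[str]:
    """One-letter amino acids (and '*' for stop) encoded by a degenerate codon."""
    return {CODON_TABLE[codon] for codon in expand(degenerate)}


for code in ("NNK", "DKR", "RST"):
    print(code, len(expand(code)), "".join(sorted(encoded_residues(code))))
# NNK 32 *ACDEFGHIKLMNPQRSTVWY
# DKR 12 *GILMRVW
# RST 4 AGST
```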
We compared our engineered halogenase variant SmP4H-7 with the original re-programmed enzyme in in vitro biocatalysis reactions using purified enzymes (Figure S9). The mutations introduced in SmP4H-7 improved its apparent kcat up to 12-fold and its apparent kcat/Km by almost 100-fold compared to the parent enzyme SmP4H-0 (Table 1).
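Because kcat/Km is a ratio, the two fold changes can be combined to estimate how the apparent Km shifted. This takes the rounded values at face value: about 12-fold for apparent kcat and 98-fold for apparent kcat/Km, as given in the summary. It also assumes both values refer to the same assay conditions. A back-of-the-envelope estimate, with subscripts 0 and 7 denoting SmP4H-0 and SmP4H-7, is:

$$
\frac{(k_\mathrm{cat}/K_\mathrm{m})_{7}}{(k_\mathrm{cat}/K_\mathrm{m})_{0}}
= \frac{k_{\mathrm{cat},7}/k_{\mathrm{cat},0}}{K_{\mathrm{m},7}/K_{\mathrm{m},0}}
\quad\Longrightarrow\quad
\frac{K_{\mathrm{m},7}}{K_{\mathrm{m},0}} \approx \frac{12}{98} \approx 0.12
$$

This corresponds to roughly an 8-fold lower apparent Km. It is only arithmetic on rounded ratios; the measured constants in Table 1 take precedence.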
To determine the anion promiscuity of the most active variant (SmP4H-7), an alternative halide salt (250 mM NaBr) was added to the reaction mixture. Incubation yielded a product with an m/z ratio of 427, consistent with the calculated mass of the dansyl-derivatized brominated compound. It should be noted, however, that the evolved variant strongly preferred the smaller chloride anion even in the presence of excess NaBr (Figures S10 and S11). This is in accordance with previously reported halogenases. [23,27,40] To understand the basis of the activity improvement in SmP4H-7, we carried out homology modeling and docking studies using SWISS-MODEL [37] and AutoDock Vina. [38] The detailed mechanism behind the improved activity of the evolved variants remains elusive. Our docking studies suggest, however, that the distance between the C3 carbon of L-proline and Fe decreased from 5.58 Å (SmP4H-0) to 5.39 Å (SmP4H-7) during directed evolution. Along the same evolutionary trajectory, the distance between the C4 carbon of L-proline and Fe did not change significantly (5.57 Å vs 5.65 Å in SmP4H-0 and SmP4H-7, respectively) (Figures S5 and S6). An additional factor underlying the improved halogenation activity of SmP4H-7 could be a new hydrogen bond between L-proline and Glu111, which appears only in the halogenation binding mode of the best variant (Figure 3). This additional hydrogen bond might stabilize the substrate and orient the C3 carbon in an axial position to favor halogenation.
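Besides the mass shift, the halide can be distinguished by its isotope pattern. A monochlorinated ion shows the roughly 3 : 1 M : M+2 ratio noted earlier for the chlorinated product, whereas a monobrominated ion shows a ratio close to 1 : 1. The sketch below derives both ratios from IUPAC natural isotopic abundances. It assumes a single halogen atom and ignores the smaller M+2 contributions of 13C, 18O and 34S from the rest of the molecule; the dansyl sulfur in particular adds a few percent at M+2.

```python
# IUPAC natural isotopic abundances (atom fraction) of the stable halogen isotopes
ABUNDANCE = {
    "Cl": {35: 0.7576, 37: 0.2424},
    "Br": {79: 0.5069, 81: 0.4931},
}


def m_to_m_plus_2(halogen: str) -> float:
    """M : M+2 intensity ratio for an ion carrying one halogen atom,
    ignoring M+2 contributions from other elements in the molecule."""
    light, heavy = sorted(ABUNDANCE[halogen])
    return ABUNDANCE[halogen][light] / ABUNDANCE[halogen][heavy]


for hal in ("Cl", "Br"):
    print(f"{hal}: M : M+2 = {m_to_m_plus_2(hal):.2f} : 1")
# Cl: M : M+2 = 3.13 : 1
# Br: M : M+2 = 1.03 : 1
```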
Most of the introduced mutations accumulated near the protein surface (Figure 2b). The observed differences in distance between the relevant sp3 carbons and the reactive Fe(IV)=O species in SmP4H-0 and SmP4H-7 therefore do not seem to fully explain the improved halogenation efficiency. Using the software tool CaverDock, [41] we modeled the efficiency of substrate transport into the active site for both variants. To do so, we identified potential substrate trajectories and calculated the corresponding energy profiles of the transport process. Strikingly, the most plausible tunnel for SmP4H-0 featured multiple bottleneck residues (Val57, Glu111 and Arg274). These either correspond directly to mutated residues in SmP4H-7 or lie close to the mutated sites (Figure 4). Moreover, the increased bottleneck radius (1.1 Å in SmP4H-7 vs 0.9 Å in SmP4H-0) indicates that the accumulated mutations may facilitate substrate transport and thus increase the engineered enzyme's overall activity.
In summary, we describe the successful re-programming of an Fe/α-ketoglutarate-dependent hydroxylase into a halogenase. The halogenation activity of the re-programmed enzyme was improved by directed evolution, which increased the apparent kcat 12-fold and kcat/Km 98-fold compared to the initial halogenation biocatalyst. The evolved halogenase variant SmP4H-7 displays a striking regio-divergent reaction chemistry. To our knowledge, it is the first example of an enzyme that can regio- and stereoselectively halogenate L-proline at the C3 position. Current bioinformatic discovery rates for novel halogenases acting on freestanding substrates are low. [15] The re-programming of Fe/α-ketoglutarate-dependent hydroxylases will therefore help complement the biocatalytic toolbox of synthetically valuable aliphatic halogenases. Using bioinformatic analysis tools, we have shed first light on the factors governing the reaction outcome (hydroxylation vs. halogenation) in these enzymes. This work provides another example of tailoring biocatalysts to carry out alternative reactions. [42] Future efforts might profitably concentrate on the structural analysis of re-programmed halogenases, as this could lead to deeper insight into the structure-function relationships of Fe/α-ketoglutarate-dependent dioxygenases.
[a] SmP4H-0: D108G, SmP4H-7: D108G/V57L/S107T/D113E/T115P/R274H.
Figure 4. CaverDock analysis of SmP4H-0 (cyan) and SmP4H-7 (magenta) identified two substrate access tunnels. Strikingly, 3 out of 5 mutated residues in SmP4H-7 are bottleneck residues in the tunnel identified in SmP4H-0 (cyan), namely V57, D113 and R274 (shown in blue).