Engineered Enzymes Enable Selective N‐Alkylation of Pyrazoles With Simple Haloalkanes

Abstract Selective alkylation of pyrazoles could solve a challenge in chemistry and streamline synthesis of important molecules. Here we report catalyst‐controlled pyrazole alkylation by a cyclic two‐enzyme cascade. In this enzymatic system, a promiscuous enzyme uses haloalkanes as precursors to generate non‐natural analogs of the common cosubstrate S‐adenosyl‐l‐methionine. A second engineered enzyme transfers the alkyl group in highly selective C−N bond formations to the pyrazole substrate. The cosubstrate is recycled and only used in catalytic amounts. Key is a computational enzyme‐library design tool that converted a promiscuous methyltransferase into a small enzyme family of pyrazole‐alkylating enzymes in one round of mutagenesis and screening. With this enzymatic system, pyrazole alkylation (methylation, ethylation, propylation) was achieved with unprecedented regioselectivity (>99 %), regiodivergence, and in a first example on preparative scale.


Introduction
The regioselective alkylation and functionalization of molecules bearing multiple heteroatoms with similar properties is a particular challenge in synthesis. [1] This is especially true for N-heterocyclic compounds such as pyrazoles, triazoles or pyridones, where tautomerization leads to heteroatoms with comparable reactivity. Alkylation of such N-heterocyclic compounds is under substrate control and produces product mixtures that are often laborious to separate (Figure 1A and Figure S1). [2] Selective alkylation depends on protecting group strategies, [3] and a general catalyst-controlled alkylation does not exist. [4,5] Given the significant number of C−N alkylations conducted in medicinal chemistry [6,7] and the importance of such N-alkylated heterocycles in biologically active agents (Figure S2), [8] it becomes clear why selective alkylation methods are at the top of many wish lists for organic chemistry. [4,5,9] New methods that form such C−N bonds selectively could shorten current synthetic routes and even make accessible new molecules that were previously difficult to prepare.
In nature, selective heteroatom alkylation is carried out by various well-known enzymes with unprecedented precision and activity. [10,11] Prenyl-, methyl- and glycosyltransferases assemble C-heteroatom bonds with high selectivity on a large set of substrates including small molecules, peptides, proteins and DNA. Enzymes achieve excellent selectivities in such challenging reactions because their active sites offer a high level of molecular recognition to accurately pre-organize substrates prior to catalysis. A broad application of alkyl transferases in synthesis is, however, hampered by the complexity of the alkyl donors used.
Figure 1 (caption, beginning missing): ... S-adenosyl-l-homocysteine (SAH) is generated as byproduct and can be used by halide methyltransferases to regenerate SAM using iodomethane as methyl source. [12] (C) The research outlined here uses promiscuous halide methyltransferases to generate and recycle non-natural SAM analogs (NSA) from simple haloalkanes. In combination with engineered alkyl transferases, highly selective pyrazole alkylation is envisioned.
Enzyme-catalyzed selective heteroatom alkylation largely depends on alkyl donors with leaving groups that are synthetically difficult to access, unstable and/or lead to low atom economy in the overall reaction (Figure S3). If we were able to combine enzymatic alkylation with the utilization of simple and readily available alkyl donors, solutions for many challenging reactions could be envisioned.
Recently, significant progress towards enzymatic alkylation chemistry using "off the shelf" alkylation reagents has been made, in particular by exploiting SAM-dependent methyltransferases (MTs). MTs use the cosubstrates S-adenosyl-l-methionine (SAM) and a natural carboxy-analog [17] of SAM in highly selective methylation and carboxymethylation reactions. Multiple studies have further demonstrated that MTs are functional with non-natural SAM analogs (NSA) and selectively transfer a huge variety of alkyl groups. [11,18] However, the synthesis of NSA is currently not straightforward (Figure S4). Several elegant strategies for enzymatic synthesis of NSA have been developed. [19-21] These approaches still depend on the laboratory synthesis of methionine analogs as NSA precursors, which limits their application. [11,18,22] Liao and Seebeck have recently shown that SAM can be enzymatically synthesized and recycled (Figure 1B). [12] In this cyclic enzyme cascade, a halide methyltransferase charges S-adenosyl-l-homocysteine (SAH) with iodomethane to generate SAM. A second MT uses SAM to methylate a substrate with high selectivity. At the same time SAH is regenerated and is therefore only required in catalytic concentrations to shuttle the methyl group between both enzymes. Currently, there is very little evidence that this system can be used beyond methylation to synthesize and recycle NSA directly from simple haloalkanes (Figure S4).
Here we report selective N-alkylation of pyrazoles by engineered enzymes in a cyclic two-enzyme cascade (Figure 1C). The method was implemented based on two major findings. First, an identified promiscuous halide methyltransferase can synthesize and recycle NSA using haloalkanes as sole stoichiometric reagents. Second, a computational enzyme library design tool [13] successfully transformed a second promiscuous MT into a pyrazole-alkylating enzyme family in one round of mutagenesis and screening. The enzyme system catalyzed the desired C−N bond formations with unprecedented selectivities using simple haloalkanes as starting materials.

Results and Discussion
Enzyme Engineering to Generate Pyrazole Methyltransferases
Because pyrazoles are extremely rare in nature, [23] pyrazole MTs are currently not known. As a result, we aimed to access this enzyme function by engineering a promiscuous natural MT. In order to identify a suitable starting point for enzyme engineering, we examined several MTs for promiscuity towards pyrazole methylation using SAM as cosubstrate. In particular, we studied the Homo sapiens nicotinamide N-methyltransferase, [24] phenylethanolamine N-methyltransferase [25] and histamine N-methyltransferase [26,27] due to their reported substrate flexibility. Of these three enzymes, only the Homo sapiens nicotinamide N-methyltransferase (NMT) revealed promiscuous activity with 3-methylpyrazole (1) as substrate (Figure 2A). The respective products were formed with 14 % as a mixture of regioisomers (67:33 for 1a:1b). This illustrates that wild-type NMT can hardly discriminate between the two N-atoms of 1.
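As a minimal sketch of how a regioisomer ratio such as 67:33 can be derived from two chromatogram peaks, the following Python snippet converts two peak areas into percentage shares. The function name and the peak areas are hypothetical and chosen only for illustration, and the calculation assumes equal detector response for both isomers, which is not stated here.

```python
from __future__ import annotations


def regioisomer_ratio(area_a: float, area_b: float) -> tuple[float, float]:
    """Return the percentage share of each regioisomer from two peak areas.

    Assumes both isomers give the same detector response (an assumption,
    not a statement about the analytical method used in this work).
    """
    total = area_a + area_b
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return 100.0 * area_a / total, 100.0 * area_b / total


# Hypothetical peak areas, picked to reproduce the 67:33 ratio for 1a:1b.
share_1a, share_1b = regioisomer_ratio(670.0, 330.0)
print(f"1a:1b = {share_1a:.0f}:{share_1b:.0f}")
```

A regioselectivity of > 99 % corresponds in this picture to the minor isomer contributing less than 1 % of the summed peak area.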
We sought to improve the enzyme's promiscuous activity by using computational enzyme library design. [16] While traditional directed evolution protocols optimize promiscuous activity for one substrate towards one chosen product, [28] we aimed to generate a whole panel of transferases with activity for a broad range of substrates, potentially accessing both product isomers with high selectivity. To this end, we used FuncLib, an algorithm recently developed by Sarel Fleishman and co-workers. [13] This protein design method can efficiently calculate the theoretically tolerated sequence space in enzyme active sites. [28] Phylogenetic and atomistic calculations are combined to analyze the stability of multiple active site mutations in silico. We chose 12 active site amino acids of NMT to computationally explore the stability of multiple mutations in the active site. These 12 amino acids build the substrate binding pocket but do not directly interact with the SAM cosubstrate (Figure 2C and Figure S5). The FuncLib algorithm was applied as described in the supporting information.
In short, the phylogenetic analysis and mutational stability calculations at these 12 amino acid positions reduced the active site sequence space to 473,294 possible variants. This pool was then further analyzed in silico by calculating the stability of the designs bearing 3-5 active site mutations. The most stable designs were clustered based on their sequence diversity (cluster criterion: amino acid difference ≥ 3). We decided to test the top 50 sequences, all of which bear 3-5 active site mutations and differ from each other in at least 3 residues (see Figure S6). In this way, large and potentially meaningful steps in the active site sequence space can be experimentally investigated without depending on ultrahigh-throughput screening methods [29] (Figure 2B).
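The diversity criterion can be illustrated with a short Python sketch. It assumes a greedy procedure: designs are visited from most to least stable, and a design is kept only if it differs from every design already kept in at least 3 of the 12 positions. This is one plausible reading of the cluster criterion, not necessarily the procedure FuncLib implements, and the 12-letter strings below are hypothetical placeholders rather than real NMT designs.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def select_diverse(ranked_designs, min_diff=3, top_n=50):
    """Greedily keep designs that differ from all kept ones in >= min_diff positions.

    ranked_designs must be ordered from most to least stable (assumed input).
    """
    selected = []
    for design in ranked_designs:
        if all(hamming(design, kept) >= min_diff for kept in selected):
            selected.append(design)
            if len(selected) == top_n:
                break
    return selected


# Hypothetical 12-residue active-site strings, ordered by predicted stability.
ranked = [
    "CLACAAAAAAAA",
    "CLACGAAAAAAA",  # 1 difference to the first design: skipped
    "HTQHAAAAAAAA",  # 4 differences to the first design: kept
    "HTQMAAAAAAAA",  # 1 difference to the third design: skipped
    "CTIMAAAAAACA",  # 4 differences to each kept design: kept
]
panel = select_diverse(ranked)
print(panel)
```

Selecting in this way spreads the tested designs across sequence space, so that a panel of only 50 variants still samples distinct combinations of active site mutations.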
The enzyme panel (v1-v50) was bought as synthetic DNA, produced in E. coli and tested for methyl transfer activity using a set of six structurally different pyrazoles as substrates (Figure 3). Screening of the library was performed using cell lysate in deep-well plate format with SAM as cosubstrate (Figure S7). The performance of the variants was analyzed using gas chromatography. Even though the designs introduced 3-5 active site mutations simultaneously, > 90 % of the designs were active and converted at least one pyrazole substrate (Figures S8-S13). Depending on the substrate, 10-30 % of the variants showed increased activity and/or selectivity compared to the NMT wild-type enzyme. Next, we purified a selected set of the best enzymes (Figure S14) to characterize the selectivities and activities in detail. The selected pyrazole MTs perform C−N bond formation with very high catalytic control, in some cases with regioselectivities of > 99 % (Figure 3 and Figures S15-S20). Catalyst control could even be demonstrated for 1 and 3,4-dimethylpyrazole (2) as substrates, for which the pyrazole tautomers differ only in the relative position of a small methyl group (see Figure 1A). High regioselectivities and in part regiodivergence were also found for 3-substituted pyrazoles bearing cyclopropyl (3) and (hetero)aromatic groups (4 and 5) as well as for pyrazoles with bulky substituents on the 4-position (6) (see Figure 3 and Figures S15-S20 for details). Please note that NMT wild-type (Figure 3) as well as chemical methylations with standard reagents (Figure S1) generate product mixtures for all the tested pyrazole substrates. [3,30] There are currently no catalysts known that control pyrazole methylation or alkylation with such precision. [4,5]
In addition to regiocontrol, we studied the activities of the selected pyrazole MTs using stoichiometric amounts of SAM as cosubstrate. Because MTs in general and the Homo sapiens nicotinamide N-methyltransferase in particular are known to suffer from inhibition by the SAH by-product (Figure 1B), [31,32] we decided to validate activity based on initial rate determination using uniform substrate and enzyme concentrations (Figure 3). Activity comparison by initial rate determination is most likely not biased by potential SAH inhibition. In general, many designs showed activity enhancements by a factor > 10 compared to the NMT wild-type (Figure 3 and Figures S21-S26). In some cases, very high increases in activity, up to factors of 72 and 118, were found. This is remarkable because the variants performed with excellent selectivities at the same time. Such activity enhancements are outstanding and usually not observed with conventional mutagenesis strategies. [33] Iterative saturation mutagenesis or random mutagenesis often show activity enhancements by a factor of 2-4 per round of evolution. [28,34-37]
Six different enzyme variants (v17, v22, v28, v36, v40, v49) with increased activity and selectivity for the six pyrazole substrates were studied in detail (Figure 3). In these six designs, 7 out of the 12 target active site amino acids were mutated (Figure S6). Three positions (D167, S201, S213) were altered in all of the six variants and two positions (A198, N249) were mutated in ≥ 50 %. Interestingly, these five amino acids are all located next to each other (see top of the active site in Figure 2C) and might generate the binding pocket for the different substituents on the pyrazole ring. These mutable amino acids are part of the loops connecting the 4-5, 5-6 and 6-7 β-strands of the methyltransferase Rossmann fold. [38] The beneficial mutations introduced at these five hot-spot positions are very diverse (D167C/H, A198L/T, S201A/C/I/Q, S213A/C/H/M and N249C/A), and generally beneficial mutations for pyrazole methylation cannot be identified. As recently reported, the multiple mutations introduced by FuncLib often show non-additive behavior. [13] Such epistasis explains large steps in activity and selectivity, but makes it difficult to generalize the mode of action. We believe that the individual set of mutations in each top variant enables selective binding of the pyrazole substrate in a reactive conformation. This is in agreement with literature highlighting that methyltransferases bind their substrates in a near-attack conformation to achieve efficient catalysis. [39]
Figure 2 (caption, beginning missing): ... [13] a computational enzyme library design tool that calculates the sequence space of tolerated active site mutations. An advantage over classical directed evolution approaches such as iterative saturation mutagenesis [14] is that FuncLib allows large and meaningful steps in sequence space by introducing multiple active site mutations simultaneously. Figure 2B is inspired by recently published, excellent reviews. [15,16] (C) Structure of NMT (pdb 2iip): Active site amino acids that have been randomized using FuncLib are shown as colored sticks. Cocrystallized SAH is shown as black sticks and the shape of the substrate binding pocket is illustrated as grey shadow.
Figure 3. (A) Selected variants were characterized by analyzing the initial rates using 2 mM pyrazole, 2 mM SAM and 50 µM enzyme. All enzymatic reactions were performed in triplicate. Please note that all chemical methylations using standard reagents as well as reactions with NMT wild-type generate product mixtures (see Figures S1 and S15-S20). a) Activity increase is calculated by dividing the initial rate of the variant by the initial rate of NMT wild-type. (B) As an example, GC chromatograms for the conversion of 6 with selected variants (red) are shown and compared to wild-type (black) as well as product standards (top). These examples demonstrate the efficiency of FuncLib in converting a promiscuous starting point (NMT wild-type) into highly selective variants with a significant increase in activity in just one round of mutagenesis and screening. GC retention time 6a: 6.4 minutes. GC retention time 6b: 6.9 minutes (see Figure S20).
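The activity increase defined in the caption of Figure 3 (initial rate of the variant divided by the initial rate of NMT wild-type) can be written as a short Python sketch. Averaging the triplicates before taking the ratio is an assumption about the data treatment, and the rate values are hypothetical numbers in arbitrary units, chosen so that the result matches the 72-fold increase mentioned in the text.

```python
from statistics import mean


def fold_increase(variant_rates, wildtype_rates):
    """Ratio of mean initial rates: variant divided by wild-type.

    Both arguments are sequences of replicate initial rates in the same units.
    """
    wt = mean(wildtype_rates)
    if wt <= 0:
        raise ValueError("wild-type rate must be positive")
    return mean(variant_rates) / wt


# Hypothetical triplicate initial rates (arbitrary but identical units).
wildtype = [0.10, 0.11, 0.09]
variant = [7.1, 7.3, 7.2]
increase = fold_increase(variant, wildtype)
print(f"activity increase: {increase:.0f}-fold")
```

Because both rates are measured at identical substrate and enzyme concentrations, the ratio is dimensionless and can be compared across substrates.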

Identification of Promiscuous Halide Methyltransferases
After establishing the selective methylation of pyrazoles, we aimed to expand the system towards selective alkylation. The remarkable SAM cofactor recycling system by Liao and Seebeck [12] (Figure 1B) has the potential to enzymatically synthesize and recycle non-natural SAM analogs (NSA) from readily available haloalkanes as alkyl donors. [40] While this system is currently limited to methylation, [12,41,42] we envisioned expanding its application towards general alkyl transfer. Key in this endeavor is to find promiscuous halide methyltransferases (HMTs) that accept haloalkanes other than iodomethane as substrates. This would enable synthesis and recycling of NSAs directly from SAH (see Figure 1B). As a step in this direction, we report here a fungal HMT from Aspergillus clavatus that accepts haloethanes, -propanes and -butanes as substrates for NSA synthesis. In the following, we will refer to this enzyme as NSA-synthase. The NSA-synthase was identified by studying the substrate scope of six literature-known HMTs, including enzymes of bacterial, fungal and plant origin (Figure S28). The enzymes were bought as synthetic DNA and produced in the SAH-nucleosidase deficient E. coli strain JW0155. [12] Substrate scope screening was performed with cell lysates using SAH and several haloalkanes as substrates (iodomethane, iodoethane, bromoethane, 1-iodopropane, 1-iodobutane and 1-bromobutane, see Figure S28). While most of the tested HMTs were very restrictive, accepting only iodomethane as substrate, the identified NSA-synthase converted haloethanes, -propanes and -butanes. Notably, in addition to iodoalkanes the corresponding bromoalkanes are also substrates of the NSA-synthase. Control experiments without enzyme, SAH or alkyl donor confirmed that the NSA formation is enzyme-catalyzed (Figure S29). As a general trend, we observed that the activity of enzymatic NSA synthesis decreased with increasing alkyl chain length of the haloalkanes (Figure S30).
While the ethyl analog was synthesized with > 80 % product formation, the propyl analog was generated with 12 % and the butyl analog with 3.6 %. After confirming the formation of the NSA by mass spectrometry (Figure S31), we aimed to combine the NSA-synthase with the engineered pyrazole MTs to study enzymatic pyrazole alkylation.

Enzymatic Pyrazole Alkylation Using Simple Haloalkanes
Since NSA formation was most efficient for ethyl and propyl analogs (Figure S30), we focussed our proof-of-concept studies on the enzymatic ethylation and propylation using enzyme v36 and 3-cyclopropylpyrazole (3) as substrate (Figure 3). This enzyme-substrate combination was selected based on the high activity and selectivity in pyrazole methylation (Figure S27). To set up the cyclic enzyme cascade consisting of v36 and the NSA-synthase, we used iodomethane as alkyl donor and studied the productivity of the system using different SAH concentrations. Surprisingly, our first experiments revealed that residual SAH bound to the purified enzymes (v36 and NSA-synthase) was sufficient for efficient methylation of 3. Further addition of SAH (0.05 equiv. with respect to the substrate) did not change the productivity of the system (Figure S32). Next, we examined the formation of the methylated product as a function of iodomethane excess. The highest product formation of 3b (62 %, Figure 4) was achieved using 10 equivalents of iodomethane. However, equimolar concentrations of iodomethane with respect to the substrate 3 reduced the product formation only slightly to 47 % (Figure S33). In all cases 3 was methylated with very high regioselectivity (> 99 %) to generate 1-methyl-5-cyclopropylpyrazole (3b). To demonstrate that enzymatic pyrazole alkylation can be performed on a preparative scale (1.0 mmol), the methylated pyrazole 3b was synthesized using iodomethane as alkyl donor (Figure S34). The product was generated in 37 % isolated yield and with very high regioselectivity (97 %). 55 % of pure starting material was recovered after the reaction.
Finally, we aimed to prove general enzymatic alkyl transfer using iodoethane, bromoethane and 1-iodopropane as starting materials. Even though v36 was originally selected for methyl transfer, 3 was ethylated and propylated using the enzyme system (Figure 4). The enzymatic alkylation of 3 generated 1-ethyl-5-cyclopropylpyrazole (7b) and 1-propyl-5-cyclopropylpyrazole (8b) with very high regioselectivities of > 98 % and > 97 % (Figures S35 and S36). The moderate to low yields achieved in these initial enzyme cascade reactions might originate from several causes and require detailed analysis. In a simple scenario, v36 is efficient in methylation but inefficient in ethylation and propylation of 3. It is believed that SAM-dependent methyltransferases function by binding the substrate and cosubstrate in a reactive conformation, pre-aligning the orbitals for efficient and selective catalysis. [39] A variant chosen for methylation may therefore be unproductive in binding pyrazole 3 in the same reactive conformation in the presence of a larger ethyl or propyl group. Alternatively, the methyltransferase could suffer from product inhibition. Finally, the alkylation of SAH can produce NSA in two epimeric forms, as such sulfonium species have a stereochemically active lone pair. It is currently not clear which epimer is generated by the NSA-synthase, and it is also unknown whether the engineered methyltransferases such as v36 accept both epimers as cosubstrate. These questions will be addressed in future studies.

Conclusion
Our studies highlight a potentially generalizable approach to obtain catalyst control in challenging alkylation reactions. In particular, we show that selective alkylation of pyrazoles can be achieved using simple haloalkanes, engineered enzymes and an expanded cosubstrate pool. An important step was the identification of a promiscuous halide methyltransferase (named NSA-synthase) that enzymatically generates and recycles non-natural analogs of SAM. An additional central part was the engineering of a second promiscuous transferase that is active on pyrazoles as substrates. We believe that this approach, together with important data published elsewhere, [12,42] will open up interesting new avenues for enzyme engineering to enable selective C-heteroatom bond formation in various complex molecules. Many of these reactions are highly desired and have so far eluded catalytic-selective synthesis. [4,5,9] The research outlined here also supports that computational enzyme library design tools such as FuncLib [13] are very effective. Over 90 % of the designs were active, 10-30 % of the variants were more active and/or selective than the wild-type enzyme, and we identified highly selective variants with significant activity improvements at the same time. The strength of FuncLib is to successfully introduce multiple active site mutations simultaneously. This enables large steps in sequence space accompanied by drastic increases in activity and selectivity. FuncLib transformed a promiscuous starting point (NMT wild-type) into a small pyrazole-alkylating enzyme family in just one round of mutagenesis and screening. This artificial enzyme family accepts structurally diverse pyrazoles as substrates, is highly regioselective and in many cases regiodivergent (Figure 3). It will be interesting to see whether computational enzyme library design tools can be combined with machine learning [15] or performed iteratively to further speed up directed enzyme evolution. [43]