Structure‐Guided Modulation of the Catalytic Properties of [2Fe−2S]‐Dependent Dehydratases

Abstract The FeS cluster‐dependent dihydroxyacid dehydratases (DHADs) and sugar acid‐specific dehydratases (DHTs) from the ilvD/EDD superfamily are key enzymes in the bioproduction of a wide variety of chemicals. We analyzed [2Fe−2S]‐dependent dehydratases in silico and in vitro, deduced functionally relevant sequence, structure, and activity relationships within the ilvD/EDD superfamily, and we propose a new classification based on their evolutionary relationships and substrate profiles. In silico simulations and analyses identified several key positions for specificity, which were experimentally investigated with site‐directed and saturation mutagenesis. We thus increased the promiscuity of DHAD from Fontimonas thermophila (FtDHAD), showing >10‐fold improved activity toward D‐gluconate, and shifted the substrate preference of DHT from Paralcaligenes ureilyticus (PuDHT) toward shorter sugar acids (recording a six‐fold improved activity toward the non‐natural substrate D‐glycerate). The successful elucidation of the role of important active site residues of the ilvD/EDD superfamily will further guide developments of this important biocatalyst for industrial applications.


Generation of mutant and production of dehydratases variants
The native sequence (WT) of SsDHAD, FtDHAD, and PuDHT were cloned to pET28 as described in previous studies. All mutants were generated using either standard QuikChange protocol, or twostep PCR (PCR without annealing temperature), or overlap extension PCR. The list of primers is presented in Table S7. The plasmid containing mutant genes were sent for sequencing to confirm the mutation. The correct plasmid was then used to transform E. coli BL21 (DE3). Expression of all variants of FtDHAD and PuDHT was done as previously described. All the variants contain hexahistidine tag in the N-terminus, thus were purified using Ni NTA column and the final buffer was exchanged to 50 mM HEPES pH 8 as described previously. [1] For SsDHAD WT and its respective variants, the expression was performed in autoinduction media at 25 °C as described previously. [2] Due to the instability of SsDHAD in the presence of imidazole, the enzymes were heat purified in the presence of β-mercaptoethanol as described previously. [2] The final buffer was exchanged to 50 mM HEPES pH 8.

Kinetic characterization
All sugar acids were either purchased or synthesized as described previously. [1] Dihydroxy isovalerate (DHIV) was synthesized from pyruvate using two step enzymatic approach (acetolactate synthase from Bacilus subtilis (BsALS), ketolacid reductoisomerase from Meiothermus ruber (MtKARI) and formate dehydrogenase from Candida boidinii (CbFDH) for cofactor recycling). The yield and concentration of DHIV is estimated by HPLC.
Final concentration of each substrate used was 25 mM. The activity was calculated by following the formation of corresponding product over time up to 24 h. SsDHAD and FtDHAD activities were determined at 50 °C, while PuDHT was measured at 30 °C. The lowest measurable activity is 0.034 mU/mg, recorded at the lowest concentration of the standard (0.1 mM) by HPLC after a 24 h reaction with 2.5 mg/ml enzyme (maximum enzyme used).

Screening of dehydratase libraries
Expression of the dehydratases libraries (Ft_P73X, Pu_I73X, Pu_H653X, PuDHT_N652_H653insX) were performed in 96-Deep Well Plates (DWP). In short, E. coli BL21 (DE3) harbouring plasmids expressing respective libraries were grown on LB-agar containing kanamycin (50 µg/ml). Single colony was inoculated in each well of 96-DWP containing autoinduction media with kanamycin (100 µg/ml). The culture was grown into saturation at 25 °C for 36 h by shaking the DWP at 1000 rpm. The culture was harvested by transferring 200 µl of cell culture to 96 U-bottom plate and centrifuging the plate at 4000 xg for 15 min. After the supernatant was decanted, the cell pellet was stored at -80 °C until further use.
The cell was lysed by transferring 200 µl of lysis buffer (50 mM HEPES pH 8, 5 mM MgCl2, and lysozyme 1 mg/ml) and incubating the plate at 1000 rpm, 40 °C for 1 h. The cell debris was pelleted by centrifugation at 4000 xg for 15 min. Activity toward D-glycerate was measured using coupled assay using lactate dehydrogenase (LDH) for porcine heart (purchased from Sigma) and NADH. Activity toward L-threonate and DHIV was measured using coupled assay using the thermostable variant of Lactococcus lactis ketoacid decarboxylase (7M.D) and alcohol dehydrogenase from Bacillus stearothermophilus (BstADH) and NADH. Activity toward D-gluconate was measured by coupled assay using aldolase from Picrophilus torridus (PtKdgA) and LDH and NADH. The oxidation of NADH was monitored at 340 nm. All coupling enzymes were used in excess. Wild type (WT) enzyme of FtDHAD was used as control for Ft_P73X, while PuDHT WT was used for (Pu_I73X, Pu_H653X, PuDHT_N652_H653insX). List of auxiliary enzymes are presented in Table S8.

Computational Biology
Template identification and homology modelling To identify potential templates for the homology modelling, a protein BLAST [3] search was performed against the PDB database on the sequences of the target enzymes (FtDHAD, PuDHT and SsDHAD). This resulted in four possible templates (Table S5). A multiple sequence alignment (MSA) was performed with Clustal Omega [4] using these four template DHADs and the target DHADs, and visualized using ESPript. [5] From the templates within the same DHAD class (as described in the main text) as the target DHAD, the two structures with the highest sequence identity to the target DHAD were selected to serve as template during the homology modeling. Only the chains showing the active loop conformation, i.e. the loop containing the catalytic serine were used. For AtDHAD, this active loop conformation was modelled inside the structure based on the structure of MtDHAD using Modeller release 9.18 [6] before the generation of the homology models. Since we generate homology models of the dimer, template structures with only one monomer were duplicated and manually converted into a dimer structure. The homology models were generated using Modeller; five different models were generated for each target DHAD, and the top-ranked model based on the DOPE score was selected. The binding sites of these homology models were further refined by converting the Mg 2+ coordinating lysine to a carboxylated lysine (KCX), after which the sidechain was placed such that the Mg 2+ ion was coordinated in an octahedral geometry. Furthermore, conserved water molecules were manually placed inside the binding site. The protonation state of the protein residues was predicted with the PROPKA3.0 software package [7,8] at pH 7.5 containing the catalytic serine in its deprotonated form, followed by a visual check. A QMEAN analysis was performed to get a quality estimation of the generated homology models. [9] Both QMEAN4 and QMEAN6 were calculated, where the first is a scoring function consisting of a combination of structural aspects, and the latter additionally contains terms describing the agreement of the structure with sequence-based predictions of secondary structure elements and solvent-accessibility. The QMEAN calculations were performed via the Swiss-Model webserver: https://swissmodel.expasy.org/qmean/

System preparation, parameterization and simulation setup
There are no straightforward parameters for Molecular Mechanics-based simulations of the [2Fe-2S] cluster. However, because of its role in substrate binding and catalysis, proper sampling of this cluster is important. Therefore, we parameterized a bonded model of the [2Fe-2S] cluster in DHADs. We applied a similar strategy as described by Carvalho et al. [10] , but with Fe2 having a vacant coordination site. The parameterization was performed on the MtDHAD structure (PDB: 6ovt), as this structure is applied as template for two of the target DHADs, and is resolved with a high resolution. [11] The two cysteine residues coordinating Fe1 were treated equivalently during the parameterization, as well as the two sulfur ions in the [2Fe-2S] cluster. The geometry of the 'small model' generated by MCPB.py [12] , consisting of the [2Fe-2S] and sidechains of the coordinating cysteine residues, was optimized at the B3LYP/6-31G(d) level of theory using Gaussian09 (revision E.01) [13] , after which the bonded terms were parameterized using the Seminario [14] method on the optimized model. Charges were fitted applying the RESP procedure [15] based on a Merz-Singh-Kohlmann population analysis [16,17] on a model consisting of the [2Fe-2S] cluster and capped coordinating cysteine residues. The ChgModB scheme was applied, meaning that all charges of the backbone heavy atoms were restrained to the values from the ff14SB [18] force field. An additional improper dihedral term was added (force constant: 10 kcal·mol -1 , periodicity: 2, phase: 180 degrees) to the [2Fe-2S] cluster to ensure an in-plane conformation of this cluster. The final parameters are provided in Table S6. The nonbonded parameters of the Mg 2+ ion were set to the values of the IOD set from Li et al. [19] . The substrates were parameterized as follows: Van der Waals parameters of the substrate atoms were taken from the General Amber Force Field [20] (GAFF), and the charges were retrieved from a RESP fit based on a HF/6-31G(d) geometry optimized structure of the substrates. The force field parameters for the deprotonated serine residue were retrieved via the R.E.D. server. [15,[21][22][23] Force field parameters for the deprotonated serine which were not in the ff14SB force field were taken from GAFF.
The protonated structures (see above) were solvated in an octahedral box consisting of TIP3P [24] water applying a buffer region of 12 Å, and counterions were added to neutralize the system using the LeaP module of the Amber18/AmberTools18 software package [25] . A distance restraint between the Mg 2+ ion and Fe2 from the [2Fe-2S] cluster with force constant 100 kcal·mol -1 ·Å -1 was applied to retain a proper binding site geometry during all simulations. Energy minimization was performed with the XMIN method in Sander from the Amber18/AmberTools18 software package applying a 20.0 kcal·mol·Å -2 positional restraint to the protein atoms, gradually bringing the density from 0.8 g·cm -3 to 1.0 g·cm -3 with a step size of 0.02 g·cm -3 by adjusting the box size. At target density, a final minimization step was performed without positional restraints. The system was subsequently heated with the following procedure: 10 ps from 0-5 K, 5-10 K, and 10-20 K respectively, in a NVT ensemble, applying the Langevin thermostat [26] with a collision frequency of 7.0 ps -1 , and a 3.0 kcal·mol -1 ·Å -2 positional restraints on all atoms. Subsequently, the system was further heated applying the positional restraints solely on the substrate and backbone heavy atoms in the following sequence: 50 ps 20-50 K, 100 ps 50-100 K, 100 ps 100-200 K, followed by a 200 ps equilibration at 200 K without positional restraints. Afterwards, the temperature was increased to the target temperature (300 K for PuDHT and MtDHAD, 323 K for FtDHAD and SsDHAD) in 400 ps. Finally, the system was equilibrated for 500 ps in a NPT ensemble at the target temperature, applying the Berendsen barostat [27] with a relaxation time of 1 ps and compressibility of 44·10 -6 bar -1 to keep the pressure constant at 1 bar. Furthermore, periodic boundary conditions were applied, and the SHAKE algorithm [28] was used on all bonds involving hydrogen atoms. A nonbonded cut-off of 12 Å was used, the particle mesh Ewald method [29] was applied for long-range electrostatics, and an integration step of 1 fs was used. After the heatup, a 100-200 ns production Molecular Dynamics (MD) simulation was performed for each system using the pmemd.CUDA MD engine from Amber18, saving the velocities and coordinates of all atoms every 10 ps.

Molecular docking and in silico mutations
Docking simulations were performed with AutoDock 4.2.6 [30] . A 60x60x60 points docking grid with 0.375 Å spacing, centered on the middle between the Mg 2+ ion and Fe2 from the [2Fe-2S] cluster, was generated with AutoGrid. For each docking simulation, 150 poses were generated with a population size of 150, and a maximum of 2.5·10 6 energy evaluations with AutoDock. In the topranked poses obtained using the AutoDock scoring function, the substrates were mostly bound in a catalytically non-competent orientation; these poses were therefore assumed to be incorrect (see detailed description in main text). Since there are no experimental structures of a dehydratase with a bound substrate, the scoring function could not be refined to improve the performance to rank the docked poses. Additionally, the presence of the highly polarized [2Fe-2S] cluster and Mg 2+ ion is also likely to impair the performance of the scoring function in these systems. Therefore, we selected suitable docking poses based on their ability to promote the dehydratation reaction as described in the main text. In practice, the following distances were measured for every pose, and summed up to serve as docking score (in case of the substrate carboxyl oxygen, the distance to the closest oxygen was measured): the distance between a carboxyl oxygen and the hydrogen (NH) from the backbone of Thr262, the distance between a substrate carboxyl oxygen and the OH of the side chain of Thr262, the distance between a substrate carboxyl oxygen and the Mg 2+ ion, the distance between C2-OH and the Mg 2+ ion, the distance between C2-H and the oxygen of Ser552 and the distance between C3-OH and Fe2. Visualization and in silico mutagenesis was performed with PyMol [31] using Dunbrack's backbonedependent rotamer library [32] . Figure S1. Full Multiple Sequence Alignment (MSA) of target-and template DHAD sequences. Enzymes in bold are the enzymes studied here. The numbers above the alignment indicate the alignment numbers, which are used throughout the manuscript. Helices and arrows above the MSA indicate the location of alpha-helices and beta-sheets, respectively, as obtained in the X-Ray structure of MtDHAD (PDB-ID: 6ovt). Residues with red background indicate fully conserved residues at the respective alignment position, and residues in red font indicate a certain degree of conservation within the studied sequences. Blue boxes indicate cross-group conservation. FtDHAD      Tables   Table S2. Quality estimation of homology models.

PuDHT
RlArDHT & CcXylDHT -0.62 -1.05 Table S3. Relative activity of alignment position 554/555 variants. The activity is measured relative to the wild-type. Alignment numbers are used for residue numbering.