Breaking a Strong Amide Bond: Structure and Properties of Dimethylformamidase



Abstract:
Dimethylformamidase (DMFase) breaks down the synthetic solvent N,N-dimethylformamide (DMF), which is used extensively in industry (1). DMF is not known to exist in nature and was first synthesized in 1893. In spite of the recent origin of DMF, certain bacterial species such as Paracoccus, Pseudomonas, and Alcaligenes have evolved pathways to break down DMF and use it as a carbon and nitrogen source for growth (2, 3). The structure of DMFase from Paracoccus and the biochemical studies reported here provide a molecular basis for its stability, substrate specificity and catalysis. The structure reveals a multimeric complex of the α2β2 type or (α2β2)2 type. One of the three domains of the large subunit, and the small subunit, adopt hitherto undescribed folds of as yet unknown evolutionary origin. The active site contains a distinctive mononuclear iron that is coordinated by two tyrosine residues and a glutamic acid residue. The hydrolytic cleavage of the amide bond is catalyzed at the Fe3+ site, with a proximal glutamate probably acting as the base. The change in the quaternary structure is salt dependent, with high salt resulting in the larger oligomeric state. Kinetic characterization reveals an enzyme that shows cooperativity between subunits, and the structure provides clues on the interconnection between the active sites.
Significance Statement: N,N-dimethylformamide (DMF) is a commonly used industrial solvent that was first synthesized in 1893. The properties that make DMF a highly desired solvent also make it a difficult compound to break down. Yet, certain bacteria have evolved to survive in environments polluted by DMF and have enzymes that break down DMF and use it as their carbon and nitrogen source. The molecular structure of the enzyme that breaks down the stable amide bond in these bacteria reveals two new protein folds and a unique mononuclear iron active site. The work reported here provides the structural and biochemical framework to query the evolutionary origins of the protein, as well as to engineer this enzyme for use in bioremediation of a human-made toxic solvent.

INTRODUCTION
Dimethylformamide (DMF) is an organic solvent commonly used in chemical synthesis and in the leather, printing, and petrochemical industries; the extent of its usage is evident from the quantity of its production (4, 5). Its polar nature accords properties similar to water, and its physicochemical features make it versatile (miscible in many organic solvents and water, a high boiling point of 153 °C, etc.) (1, 6-8). It is a known hepatotoxic and ecotoxic agent (9, 10). DMF is remarkably resilient to chemical and photochemical decomposition. Biodegradation often emerges as an efficient process for its removal from the environment (11). In spite of the short history of DMF on earth (introduced in 1893), several microbes that can grow using DMF as the sole carbon source have been described, with most of them belonging to the phylum Proteobacteria (Pseudomonas, Alcaligenes, Ochrobactrum sp. and Paracoccus species) (2, 3, 12-14). Pseudomonas uses an oxidative demethylation pathway to convert DMF into formamide. The product is subsequently converted into ammonia and formate by the enzyme formamidase. Other organisms encode an N,N-dimethylformamidase (DMFase) that catalyzes the formation of formate and dimethylamine (15). DMFases from different organisms have been characterized, and all of them comprise two polypeptide chains, a smaller polypeptide of 15 kDa and a larger polypeptide of 85 kDa. Sequence-based homology searches do not show any similarity to other ubiquitous amidohydrolases, and the nature of the active site has remained a puzzle. Here, we report structures of DMFase determined by electron cryo-microscopy and X-ray crystallography, augmented with its biochemical characterization. Enzyme kinetic measurements with different substrates complement the structural data. Based on the structure, we performed site-directed mutagenesis of specific residues to test their role in metal binding and catalysis. Together, the data allow us to propose a plausible mechanism and a molecular explanation for the narrow substrate specificity of the enzyme.

RESULTS
Electron cryo-microscopy: When purified native or recombinant DMFase was imaged in ice by electron cryo-microscopy (cryoEM), the micrographs unambiguously revealed two populations in solution (Supplementary Data Fig. 1A). A smaller unit measuring ~100 Å in diameter and a larger structure ~150 Å in diameter could be identified. Subsequent 2D classification showed that the smaller unit was a dimer and the larger structure was a dimer of dimers. The dimer of DMFase is made of two α and two β subunits with a total mass of 200 kDa. The dimer of dimers (called tetramer here) has a total mass of 400 kDa.
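The quoted masses follow from the approximate subunit sizes given in the Introduction (about 85 kDa for the large subunit and 15 kDa for the small subunit). A minimal sketch of that arithmetic; the function and variable names are hypothetical and chosen only for illustration:

```python
# Approximate subunit masses in kDa, as quoted in the Introduction.
ALPHA_KDA = 85.0  # large subunit
BETA_KDA = 15.0   # small subunit


def complex_mass_kda(n_alpha: int, n_beta: int) -> float:
    """Nominal mass of a complex with the given numbers of each subunit."""
    return n_alpha * ALPHA_KDA + n_beta * BETA_KDA


dimer_mass = complex_mass_kda(2, 2)     # alpha2-beta2 ("dimer" in the text)
tetramer_mass = complex_mass_kda(4, 4)  # (alpha2-beta2)2 ("tetramer" in the text)
print(dimer_mass, tetramer_mass)
```

These nominal values ignore bound metal and any sequence-level deviation from the rounded subunit masses.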
The initial data we collected had a mixture of populations (Supplementary Data Fig. 1A&B), and the reconstructions resulted in a 3.4 Å map for the dimer and a 3.8 Å map for the tetramer. The two populations observed by cryoEM were intriguing. We noted that the salt concentration in the buffer shifts the oligomer (dimer-tetramer) equilibrium. In the presence of low salt, the enzyme is predominantly in the dimeric form (Supplementary Data Fig. 1C&D), and this equilibrium shifted to the tetramer at higher salt concentration (≥200 mM) (Supplementary Data Fig. 1E&F). Subsequently, we collected two data sets, with no salt and with 200 mM NaCl, yielding reconstructions with an overall resolution of 3 Å (dimer) and 2.8 Å (tetramer), respectively (Fig. 1A, 1C, Supplementary Data Fig. 2, 3). The local resolution plots of both maps showed that much of the reconstruction was resolved between 2.5 and 3.5 Å (Supplementary Data Fig. 2&3). The high quality of the maps allowed de novo tracing of the polypeptide chain of both the large and small subunits (Fig. 1B). We used the dimer map for the initial tracing, and the fit of the model to the map is shown in Fig. 1A and 1B. The model from the dimer was used to obtain the tetrameric model (Fig. 1C&D and Supplementary Data Fig. 3, Table 1). The interface between the two dimers in the tetramer is formed solely by the large subunit (Fig. 1C and 1D).
Crystallography: Subsequently, we also determined the crystal structure of DMFase by molecular replacement using the coordinates from the cryoEM data. The crystals belong to the space group P21 with two tetramers in the asymmetric unit, and the model was refined to a resolution of 2.8 Å (Table 2). The structures determined by cryoEM and X-ray crystallography are similar; the differences observed are minor and largely localized to the loop regions.
Two New Folds: The description of the structure given here is based on the best cryo-EM model. The smallest unit is an αβ dimer (Fig. 2A). The large subunit consists of three different domains (Fig. 2B). Using PDBeFOLD (16), we find that domain I is made of residues 1-73 and 338-383 and adopts an immunoglobulin (IgG)-like fold. Domain II is made of residues 74-337 and adopts the pentraxin fold. Domain III comprises residues 384-761 and has a fold that is not identified by the PDBeFOLD or DALI servers (Fig. 2B). The core of the domain III fold itself can be described as an α/β/α fold, with five parallel β-strands sandwiched between α-helices. Such an arrangement is seen in the ThuA-like family of proteins. These arrangements are classified as part of the larger family of glutamine amidotransferases (17). The five conserved β-strands, plus the three extra β-strands, together form a β-sheet that forms the inside of the sandwich (Fig. 2B). The connecting region between the β-strands forms a sub-domain made of four antiparallel β-strands that cap the structure. Domains I and III show a significant number of interactions, but the interaction of domain II with the other domains of the monomer is somewhat limited (Fig. 2A).
The small subunit of DMFase consists of four α-helices and two β-strands. There are two long helices at the N- and C-termini, with the N-terminal helix wedging into the interface between the two large subunits (Fig. 1C&1D). As a whole entity, the small subunit is also an undescribed fold. However, the two helices alone superpose very well with the two helices of ESCRT-I (18) (PDB ID 2F66). Most of the interactions of the small subunit are with domain III of the large subunit (Fig. 2A). The N-terminal residues of the small subunit interact with domain II and seem to stabilize the tertiary structure by preventing this domain from adopting other orientations with respect to domains I and III (Fig. 2A). The large subunit, when expressed and purified without the small subunit, was not enzymatically active (Table 1). The size exclusion chromatography profile suggests that it most likely exists as a dimer, albeit with a lower Tm (46 °C) than the α2β2 enzyme (Supplementary Data Fig. 4). Thus, the small subunit most likely plays a role in structural stabilization.

A Distinctive Metal Site:
The initial EM map clearly showed a large density around residues Y440, Y399 and E521 that could not be accounted for by amino acids (Fig. 3A). We predicted this to be the active site, which is buried in domain III. The identity of the metal ion as Fe was confirmed by X-ray fluorescence spectroscopy at the synchrotron (Supplementary Data Fig. 5 Fig. 7; Table 1). The absence of Fe in these proteins results in the interface region between the large and small subunits becoming disordered (Supplementary Data Fig. 8). We also mutated residues H519 and S395, which are close to the metal-binding center. Of these, S395A showed activity comparable to wild-type DMFase, but the H519A mutant was catalytically inactive (Supplementary Data Table 1). Thus, despite not coordinating the Fe3+ ion, H519 still plays a role in catalysis.
The coordination of Fe3+ is strained and is close to a square pyramidal geometry with two vacant sites (in the absence of a modeled water). There is density at the active site in both the EM maps and the crystallographic map that could easily accommodate at least one water molecule, and this water has been modeled (Fig. 3A&C). There are also some differences in the orientation of the active site residues between the final refined EM and X-ray models. However, the limited resolution does not allow us to interpret these differences with sufficient confidence. In the X-ray and EM maps, unexplained additional density remains after the addition of Fe and water (Fig. 3A), into which common buffer components such as phosphate or oxalate, as well as DMF, can be modeled. In the absence of other experimental evidence, we have chosen not to over-interpret this density and have left it unmodeled.
There are at most 46 entries in the PDB in which tyrosines act as coordinating ligands to a mononuclear Fe atom. The architecture of the two tyrosines binding to the Fe3+ ion is similar to the one present in the structures of the ferric binding proteins from Neisseria gonorrhoeae (PDB ID 1d9y) and Yersinia enterocolitica (PDB ID 1xvx) (19). Both of these also have a histidine coordinating the metal and are not catalytically active. The architecture with two tyrosine hydroxyls and a glutamic acid side chain carboxylate, as observed here, is unique.
In order to confirm that the chemical environment around the iron atom is strained, we calculated the electronic energy of the iron atom and its surrounding environment using the experimentally observed structure (Fig. 3A) as the starting model. We compared this strain energy to the energy of a relaxed system (Fig. 3D). Electronic energies were calculated using density functional theory (DFT) with the B3LYP functional (20-22) Table 3). The kinetics data are a better fit to the Hill equation (with a Hill Table 6).
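For reference, the Hill equation mentioned above has the standard form v = Vmax [S]^n / (K^n + [S]^n); it reduces to the Michaelis-Menten equation when n = 1, and a Hill coefficient n > 1 indicates positive cooperativity. The sketch below only illustrates the shape of the two models. The parameter values are arbitrary placeholders and are not the fitted values for DMFase, and the function names are hypothetical.

```python
def hill_rate(s: float, vmax: float, k_half: float, n: float) -> float:
    """Hill equation: v = vmax * s**n / (k_half**n + s**n)."""
    if s < 0:
        raise ValueError("substrate concentration must be non-negative")
    return vmax * s**n / (k_half**n + s**n)


def michaelis_menten_rate(s: float, vmax: float, km: float) -> float:
    """Michaelis-Menten equation, i.e. the Hill equation with n = 1."""
    return hill_rate(s, vmax, km, 1.0)


# Placeholder parameters (arbitrary units), NOT values from this study.
VMAX = 1.0
K_HALF = 10.0

for n in (1.0, 2.0):
    rates = [hill_rate(s, VMAX, K_HALF, n) for s in (1.0, 10.0, 100.0)]
    print(f"n = {n}: " + ", ".join(f"{v:.3f}" for v in rates))
```

With n > 1 the rate is lower than the hyperbolic curve below the half-saturation concentration and higher above it, which gives the sigmoidal dependence characteristic of cooperative enzymes.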

DISCUSSION
The structure and experiments reported here provide insights into the residues involved in substrate binding and catalysis. A large cavity is observed at the interface between the large and small subunits, and this leads towards the active site (Fig. 4B). In the EM structures of the active site mutants (Y440A and E521A), the interface between the large and the small subunit becomes disordered, indirectly suggesting a role for the small subunit in the substrate entry pathway (Supplementary Data Fig. 8). The substrate binding pocket in the active site is constricted by bulky side chains consisting mostly of aromatic residues (Fig. 4C). The cavity formed by the hydrophobic residues is large enough to accommodate substitutions other than the dimethyl group on the amine. There is currently no structure of the DMF-bound form of the protein. Given the nature of the reaction, one would presume that the carbonyl center is oriented towards the active site iron, probably in coordination with the metal ion during the course of the enzyme-catalyzed reaction. The hydrophobic pocket may assist in the binding and orientation of the substrate, while the charged residues help in directing the carbonyl group towards the Fe3+.
Interestingly, a phenylalanine (F693) from the adjacent large subunit is part of the substrate binding pocket. This side chain also acts as a constriction, which would explain the selective substrate specificity and why larger amides are not efficiently hydrolyzed (Fig. 4C). Cross-talk between the two active sites in the dimer via the loop on which F693 is present is a distinct possibility. This residue and the loop may also be involved in the cooperativity observed in the enzyme kinetic experiments.
The heat of formation of the peptide bond is 2550 calories/mole and that of an amide bond is 5840 calories/mole (25). While the exact energy of breaking the amide bond in DMF has not been measured, its stability and the properties that make DMF an attractive solvent suggest that this is a very stable bond, and hence its heat of formation may be even higher. When a water molecule binds to Fe3+, the water becomes a better nucleophile that can attack the C-N bond of the formamide. Our first hypothesis was that the binding of the C=O to the Fe3+ would also make the carbon a better electrophile, and that together these effects would mean no other residue need be involved in catalysis. However, the observation of Glu657 close to the active site suggested that it might be the catalytic base. One could think of a mechanism in which the glutamate and the metal together provide the necessary catalytic power to break down the strong amide bond of DMF. We carried out site-directed mutagenesis, and the Glu657Ala enzyme did not show any activity against DMF. Together, these observations suggest a mechanism that involves the mononuclear Fe3+ center and Glu657, with the intermediate oxyanion being stabilized by Asn547. The catalytic cycle would hence involve a water bound to Fe3+ as the ground state, to which the substrate binds. The nucleophilic attack of the activated water on the predisposed amide bond results in the hydrolytic cleavage of the bond. This is followed by the release of dimethylamine. The formate is displaced from the active site by the binding of a new water molecule (Supplementary Data Fig. 18).
In summary, we report here the structure and properties of a newly evolved stubborn amidase. While the evolutionary link to the new folds is not yet apparent, it is tempting to hypothesize that the large size of the enzyme and the properties of the quaternary structure are required not only to provide the enzyme with halostability but also to stabilize the strained Fe3+ site. This strained metal coordination, together with the presence of a nearby glutamate that acts as the catalytic base, provides the necessary reduction in the activation energy for increased catalytic breakdown of the substrate. Further studies to tease out the mechanistic details of this interesting reaction are in progress. Interestingly, the use of enzymes for bioremediation has often been hindered due to lack of stability of the enzyme in the

Thermal shift assay.
Protein unfolding and stability were determined as a function of temperature. A label-free thermal shift assay was performed using the Tycho NT.6 (NanoTemper Technologies). Pre-dialyzed wild-type and mutant protein samples were diluted (~0.5 mg/mL) in the appropriate buffer conditions and run in duplicate in capillary tubes. Intrinsic fluorescence from tryptophan and tyrosine residues was recorded at 330 nm and 350 nm while heating the sample from 35 °C to 95 °C with a ramp rate of 30 °C/min. The ratio of fluorescence (350/330 nm) and the inflection temperature (Ti) were calculated by the Tycho NT.6 software, which provides the Tm (melting point), the temperature at which 50% of the measured protein is unfolded. Measurement results are summarized in Supplementary Data Table 1 and thermographs are shown in Supplementary Data Fig. 6.
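The inflection temperature described above can be illustrated with a short calculation. The instrument software's actual algorithm is not described here, so the sketch below is only an assumption of one simple approach: it forms the 350/330 nm ratio and reports the midpoint of the temperature interval with the steepest change. The fluorescence values are synthetic (a two-state sigmoid with its midpoint set to 60 °C), not measured data, and all names are hypothetical.

```python
import math


def fluorescence_ratio(f350, f330):
    """Element-wise 350 nm / 330 nm intensity ratio."""
    return [a / b for a, b in zip(f350, f330)]


def inflection_temperature(temps, ratio):
    """Midpoint of the temperature interval where the ratio changes fastest."""
    best_slope = 0.0
    best_temp = None
    for i in range(len(temps) - 1):
        slope = (ratio[i + 1] - ratio[i]) / (temps[i + 1] - temps[i])
        if abs(slope) > abs(best_slope):
            best_slope = slope
            best_temp = 0.5 * (temps[i] + temps[i + 1])
    return best_temp


# Synthetic unfolding curve (NOT measured data); midpoint placed at 60 C.
temps = [35.0 + 0.5 * i for i in range(121)]  # 35 C to 95 C in 0.5 C steps
f330 = [1000.0 for _ in temps]
f350 = [1000.0 * (0.8 + 0.3 / (1.0 + math.exp(-(t - 60.0) / 2.0))) for t in temps]

ratio = fluorescence_ratio(f350, f330)
ti = inflection_temperature(temps, ratio)

# Duration of the heating ramp quoted in the text: 35 C to 95 C at 30 C/min.
ramp_minutes = (95.0 - 35.0) / 30.0
print(ti, ramp_minutes)
```

A finite-difference estimate like this is limited by the temperature step, so the reported value can differ from the true midpoint by up to half a step.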

Electron microscopy of DMFase.
Initial data of DMFase at 2-3 mg/ml at 50 mM NaCl were collected using Quantifoil holey carbon grids (R 0.6/1, Au 300 mesh) with blotting and freezing accomplished with a manual plunger in a cold room.
These initial data were collected on a Titan Krios at MRC LMB, Cambridge, with a Falcon 2 detector in integration mode using the EPU software. Images showed that there were two populations, and both were picked and subjected to reference-free 2D classification. Initial models of both populations were generated individually, either with EMAN2 (28) or by stochastic gradient descent within RELION (29), with C2 symmetry imposed. The refinement of the dimer population obtained in integration mode gave a reconstruction of ~7 Å, indicating that the protein is well behaved and that high resolution can be obtained. All the data described here were collected on enzyme obtained by recombinant expression. Enzyme purified from the native source showed similar maps (data not shown).
Subsequently, data were collected with a Falcon 3 detector in counting mode at 1.07 Å sampling. Images were exposed for 60 seconds with a total accumulated dose of ~27 e/Å2, dose-fractionated into 75 frames, with each frame having a dose of ~0.3 e. The frames were grouped into 25 frames, resulting in ~1 e/frame, and Unblur (30) was used for alignment. This initial processing was done in RELION 2.0.
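The per-frame numbers quoted above can be checked with simple arithmetic. The sketch below assumes that grouping into 25 frames means the 75 raw frames were summed in consecutive groups of three; the variable names are illustrative only.

```python
# Acquisition parameters as quoted in the text.
TOTAL_DOSE = 27.0       # e/A^2, total accumulated dose (approximate)
EXPOSURE_SECONDS = 60.0
N_RAW_FRAMES = 75
N_GROUPED_FRAMES = 25   # assumption: consecutive raw frames summed together

dose_per_raw_frame = TOTAL_DOSE / N_RAW_FRAMES              # e/A^2 per raw frame
frames_per_group = N_RAW_FRAMES // N_GROUPED_FRAMES         # raw frames per group
dose_per_group = dose_per_raw_frame * frames_per_group      # e/A^2 per grouped frame
dose_rate = TOTAL_DOSE / EXPOSURE_SECONDS                   # e/A^2 per second

print(f"{dose_per_raw_frame:.2f} e/A^2 per raw frame")
print(f"{frames_per_group} raw frames per group")
print(f"{dose_per_group:.2f} e/A^2 per grouped frame")
print(f"{dose_rate:.2f} e/A^2 per second")
```

Under that assumption the values are consistent with the approximate figures in the text (~0.3 e per raw frame and ~1 e per grouped frame).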
The summed images were then used for automated particle picking with Gautomatch, using templates derived from the previous data collection, and the CTF was estimated with Gctf (31). Particles were extracted with a box size of 320 pixels and subjected to 2D classification, 3D auto-refinement, per-particle motion correction, B-factor weighting and refinement. Further 3D classification was used to improve the quality of the maps by removing bad particles. This resulted in a 3.4 Å map for the dimer and a 3.8 Å map for the tetramer. We noticed that the views of the tetramer were not diverse, perhaps because of the thinner ice that we preferred for the dimer during data collection.
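For orientation, the extraction box can be converted to physical units using the 1.07 Å sampling stated for this data collection and compared with the approximate particle diameters reported in the Results (~100 Å and ~150 Å). The sketch below is only a sanity check with illustrative names; the box-to-particle ratio is not a criterion stated by the authors.

```python
PIXEL_SIZE_A = 1.07  # Angstrom per pixel, as stated for the data collection
BOX_PIXELS = 320     # extraction box size in pixels

box_angstrom = BOX_PIXELS * PIXEL_SIZE_A  # box edge length in Angstrom
nyquist_angstrom = 2.0 * PIXEL_SIZE_A     # finest resolution the sampling supports

# Approximate particle diameters from the Results section.
particle_diameters = {"dimer": 100.0, "tetramer": 150.0}

print(f"box edge: {box_angstrom:.1f} A, Nyquist limit: {nyquist_angstrom:.2f} A")
for name, diameter in particle_diameters.items():
    print(f"{name}: box is {box_angstrom / diameter:.2f}x the particle diameter")
```

The Nyquist limit of about 2.1 Å lies beyond the reported map resolutions, so the pixel size does not limit these reconstructions.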
During this period, we observed that the oligomeric state of the enzyme is affected by salt and subsequently we imaged DMFase in three different salt concentrations (no salt, 200 and 500 mM NaCl).
All the data described here were collected at the National CryoEM Facility in Bangalore with a Falcon 3 detector in counting mode at 1.07 Å sampling. The grids for these data sets were Au 300 mesh, either R0.6/1 or R1.2/1.3, and were prepared with a Vitrobot Mark IV at 100% relative humidity and 18 °C. The grids were blotted for 3.5 seconds. The data sets were processed with RELION 3.0, including whole-frame alignment and dose-weighting with RELION's own algorithm. Particle picking and CTF estimation were performed with Gautomatch and Gctf. Particles were extracted with a box size of 320 pixels and subjected to 2D classification, 3D auto-refinement, per-particle CTF refinement, B-factor weighting with Bayesian polishing and refinement, and subsequent 3D classification. The data sets of the mutants Y440A and E521A were processed similarly. Local resolution of the maps was estimated with ResMap (32). Model building was performed with Coot (33) and the model was refined with phenix.real_space_refine (34). Though there are non-protein densities in the map, we have modeled only one water molecule at the active site, and the rest have not been modeled. Details of the EM data and model quality are presented in Table 1.
(Table 2) and the structure was refined to an R-factor of 0.25 (R-free 0.28). The crystals belonged to space group P21, and a molecular replacement solution was obtained using the tetramer model from the EM data. There are very few interactions between the two tetrameric structures observed in the crystals, suggesting that this is not a higher-order oligomer.
Iterative cycles of model rebuilding and refinement were performed in Coot and phenix.refine, respectively. Complete data collection and refinement statistics are tabulated in Table 2. The buried area at the dimer interface between one αβ unit and the second αβ unit is ~5000 Å2, and there are 21 hydrogen bonds, at least 17 salt bridges and a total of about 500 non-bonded interactions (as calculated with PDBsum (38)). The interface between the two dimers is mostly hydrophobic, with a buried surface area of about 1830 Å2 and fewer than 18 hydrogen bonds.

Synchrotron X-ray fluorescence spectroscopy (syncXRF).
The identity of the metal present in DMFase was validated by XRF of protein crystals. The crystals were flash frozen in liquid N2 after being sequentially washed four times in a cryoprotectant consisting of 9:1 v/v mother liquor and ethylene glycol. XRF scans were taken at the PROXIMA-1 beamline at the SOLEIL synchrotron at 107 K. The X-ray fluorescence emission spectrum of the sample was collected by excitation at the selenium K edge (12,664 eV Fig. 16). The optimum relative activity was measured for DMF at 3.1% v/v, whereas at 16% the enzyme showed half of that activity. In medium with high organic solvent content (>40%), disintegration of the catalytic structure leads to total loss of activity even though residual quaternary structure remains.
Density functional theory calculations for strained iron in DMFase. We used a recently published method for calculating the electronic energies of the iron complex using density functional theory (39), with the B3LYP functional for singlet calculations, the UB3LYP method for doublet calculations, and the 6-311G(d) basis set. We used the 6-311G(d) basis set for all atoms, as the difference in computation time was not significant. We assumed a low spin state of the iron in both the octahedral and tetrahedral calculations. The Fe3+ charge state results in a doublet and the Fe2+ charge state results in a singlet, regardless of the tetrahedral or octahedral symmetry around the iron atom. Moreover, both tyrosine residues were assumed to be deprotonated, along with the glutamic acid residue. Therefore, the Fe3+ case results in a neutral system whereas the Fe2+ case results in an anionic state. The initial coordinates were obtained from the crystal structure, and the beta carbon was replaced with a methyl group so that each residue was severed at a single bond between two sp3 carbons. For each geometry and charge state, two optimizations were performed: a first that was constrained so that only the hydrogens and water atoms were allowed to move (strained state), and a second in which all atoms were allowed to move (relaxed state). For the doublet calculations, where an unrestricted calculation was performed, the <S2> value was calculated before and after the annihilation of the first spin contaminant (Supplementary Data Table 2). The expected value of <S2> for a system lacking any spin contamination is 0.75 for the doublet system. Since the <S2> values after spin annihilation for the two Fe3+ symmetries are 0.7559 and 0.7503 (Supplementary Data Table 2), which are relatively close to 0.75, we conclude that our calculations do not suffer from spin