Using site-saturation mutagenesis to explore mechanism and substrate specificity in thiamin diphosphate-dependent enzymes


  • Forest H. Andrews,

    1. Department of Chemistry and Chemical Biology, Indiana University-Purdue University Indianapolis, IN, USA
  • Michael J. McLeish

    Corresponding author
    1. Department of Chemistry and Chemical Biology, Indiana University-Purdue University Indianapolis, IN, USA
    • Correspondence

      M. J. McLeish, Department of Chemistry and Chemical Biology, Indiana University-Purdue University Indianapolis, 402 North Blackford Street, Indianapolis, IN 46202, USA

      Fax: +1 317 274 4701

      Tel: +1 317 274 6889





For almost 20 years, site-saturation mutagenesis (SSM) has been used to evolve stereoselective enzymes as catalysts for synthetic organic chemistry. Much of this work has focused on enzymes such as lipases and esterases, although the range is rapidly expanding. By contrast, using SSM to study enzyme mechanisms is much less common. Instead, site-directed mutagenesis is more generally employed, with a particular emphasis on alanine variants. In the present review, we provide examples of the growing use of SSM to study not only substrate and reaction selectivity, but also the reaction mechanism of thiamin diphosphate (ThDP)-dependent enzymes. We report that the use of SSM to examine the roles of the catalytic residues of benzoylformate decarboxylase gave rise to results that were at odds with earlier kinetic and structural studies using alanine substitutions and also questioned their conclusions. SSM was also employed to examine the long held tenet that a bulky hydrophobic residue provides a fulcrum by which the V-conformation of the ThDP cofactor is maintained. X-ray structures showed that ThDP stayed in the V-conformation even when the replacement residues were charged or did not contact the cofactor. We also summarize the results obtained when SSM was used to evolve new substrate specificity and/or enantioselectivity in ThDP-dependent enzymes such as benzoylformate decarboxylase, transketolase, 2-succinyl-5-enolpyruvyl-6-hydroxy-3-cyclohexene-1-carboxylate synthase and the E1 component of the 2-oxoglutarate dehydrogenase complex.




BFDC, benzoylformate decarboxylase
CAST, combinational active site saturation
2-oxoglutarate decarboxylase from Escherichia coli
Escherichia coli MenD
EcTK, TK from Escherichia coli
ISM, iterative saturation mutagenesis
MenD, 2-succinyl-5-enolpyruvyl-6-hydroxy-3-cyclohexene-1-carboxylate synthase
PDB, Protein Data Bank
PDC, pyruvate decarboxylase
yeast Saccharomyces cerevisiae TK
SSM, site-saturation mutagenesis
ThDP, thiamin diphosphate




Site-saturation mutagenesis (SSM) is a powerful technique that allows for the evolution of a target by substitution of each of the 20 naturally occurring amino acids at a predetermined position in the protein [1]. SSM is most effective when targeting residues approximately 6 Å or less from the substrate or cofactor [2]. By comparison, the more random evolution techniques, such as error-prone PCR [3] and gene shuffling [4], tend to identify residues more removed (5–15 Å) from the active site [5, 6]. Overall, SSM is capable of altering substrate specificity by changing highly, or even completely, conserved active site residues. Consequently, it has been used extensively to engineer changes in substrate and/or enantiospecificity, to improve thermostability and to introduce new enzyme functionalities.

An important focus in the use of SSM is the minimization of the number of clones required for the successful evolution of a target. Towards this end, two approaches have been developed: iterative saturation mutagenesis (ISM) and combinational active site saturation (CAST) mutagenesis [7, 8]. The former identifies hot spots within the protein structure that will provide the targets for SSM. After screening or selection of the initial libraries, the clones with the most favourable features are utilized as a template for subsequent rounds of mutagenesis at other hot spots. This process continues until the desired outcome has been reached [7]. As with the ISM approach, CAST works under the assumption that multiple residues within the protein must be modified for optimization of the target. However, in the CAST approach, active site hot spots of two to five residues are selected and simultaneously mutated [8]. In some cases, it has been possible to integrate the two techniques so that favourable clones generated by CAST are further enhanced by ISM [9]. Unfortunately, these approaches are both complicated by the fact that the genetic code is redundant. Thus, a primer designed to incorporate nucleotides equally throughout a codon (NNN library) will result in 64 possible codons at one position, and require approximately 200 colonies to be screened to ensure that every amino acid is covered [10]. Using an NNK library, in which the last position is limited to either guanine (G) or thymine (T), will still provide all 20 amino acids but halves the number of colonies that must be screened. Of course, these numbers rise exponentially when more than one position is mutated.
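The screening numbers quoted above can be reproduced with a commonly used oversampling estimate. The sketch below assumes that every codon in the library is equally likely and asks how many clones must be picked for any given variant to have a 95% chance of being sampled. The function name and the 95% threshold are illustrative assumptions and are not taken from ref. [10].

```python
import math

def clones_for_coverage(codons_per_site, sites=1, coverage=0.95):
    """Clones to screen so that a given sequence variant is sampled with
    probability `coverage`, assuming all variants are equally likely."""
    variants = codons_per_site ** sites
    # P(a given variant is missed after n picks) = (1 - 1/variants)**n,
    # which is approximately exp(-n / variants) for large libraries.
    return math.ceil(-variants * math.log(1.0 - coverage))

for name, codons in [("NNN", 64), ("NNK", 32)]:
    one_site = clones_for_coverage(codons)
    two_sites = clones_for_coverage(codons, sites=2)
    print(f"{name}: {one_site} clones for one site, {two_sites} for two sites")
```

Under these assumptions, a single NNN site requires 192 clones and a single NNK site requires 96, in line with the figure of approximately 200 and half that number given above. Randomizing a second site squares the number of variants, which is why the screening effort grows so quickly.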

In an attempt to further minimize redundancy, the NDT library was developed [10]. This library comprises 12 codons that provide 12 amino acids, distributed among hydrophobic (Ile, Leu, Cys, Gly, Val), aromatic (Tyr, Phe, His), polar (Asn, Ser) and charged (Asp, Arg) residues. Although libraries generated from this approach are not truly saturating, the number of clones needing to be screened is greatly reduced. Most recently, the 22c-trick library was developed, in which the NDT, VMA, TGG and ATG primers were mixed in a 12 : 6 : 1 : 1 ratio [11]. This results in an equal distribution of all 20 amino acids from only 20 codons, and provides a means of obtaining a fully optimized library that can be relatively easily screened. Excellent summaries of the various methods and applications can be found in recent reviews [12-14].
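The codon and amino acid counts for these libraries can be checked by expanding each degenerate codon against the standard genetic code. The following is a minimal sketch: the helper names are hypothetical, and the mixed library is modelled as NDT + VMA + ATG + TGG on the assumption that the single-codon primers supply methionine (ATG) and tryptophan (TGG), the two residues not covered by NDT and VMA.

```python
from itertools import product

# IUPAC nucleotide codes needed for the degenerate codons discussed above
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "D": "AGT", "V": "ACG", "M": "AC"}

# Standard genetic code, with codons ordered T, C, A, G at each position
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def expand(degenerate):
    """List every codon encoded by a degenerate triplet such as 'NDT'."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate))]

def summarize(*degenerate_codons):
    """Return (codon count, sorted amino acids encoded, stop codon count)."""
    codons = [c for d in degenerate_codons for c in expand(d)]
    encoded = [CODON_TABLE[c] for c in codons]
    amino_acids = sorted(set(encoded) - {"*"})
    return len(codons), amino_acids, encoded.count("*")

for library in (("NNN",), ("NNK",), ("NDT",), ("NDT", "VMA", "ATG", "TGG")):
    n_codons, amino_acids, stops = summarize(*library)
    print("+".join(library), n_codons, "codons,",
          len(amino_acids), "amino acids,", stops, "stop codon(s)")
```

The 12 : 6 : 1 : 1 mixing ratio corresponds to the number of codons contributed by each primer (12 from NDT, 6 from VMA and one each from the two single-codon primers), so each of the 20 codons is represented equally in the mixture.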

Given that SSM has been said to combine the benefits of directed evolution and rational design [15], it is surprising that it is rarely used as a tool for studying enzyme mechanisms. Mechanistic studies routinely use site-directed mutagenesis, although that approach can be based on a predetermined concept of the mechanism. Consequently, attention is often focused on a limited range of amino acid replacements, with alanine being the most commonly utilized. More recently, it has become clear that an unbiased examination of potential catalytic residues can provide unexpected results. For example, during a study of ‘old yellow enzyme’ from Saccharomyces pasteurianus, it was predicted that replacement of Trp116 with smaller amino acids would improve the ability of the enzyme to accept larger substrates [16]. Instead, when SSM was used to generate all possible replacements for Trp116, it was found that the W116F and W116I variants allowed substrates to bind in an orientation such that the stereochemistry of the products was the opposite to that of the wild-type enzyme [16]. In a more mechanistic study, SSM was used to study the role of a highly conserved cysteine residue (Cys298) in CaHydA, an [Fe-Fe]-hydrogenase from Clostridium acetobutylicum. On the basis of structural similarity, it was anticipated that serine would be a functional replacement but, after SSM, only the C298D variant showed activity. This was attributed to the ionizability of the aspartate and suggested a novel role for Cys298 in a proton transfer pathway [17]. It was noted that the SSM approach is required to identify such an unexpected structural replacement [17]. Similar unexpected findings have also been observed when SSM has been applied to thiamin diphosphate (ThDP)-dependent enzymes.

ThDP-dependent enzymes

ThDP (Fig. 1), the biologically active form of vitamin B1, is used as a cofactor by enzymes involved in a wide variety of metabolic pathways [18]. The role of ThDP in metabolic processes is described in more detail by Bunik et al. (unpublished results). Suffice to say, ThDP-dependent enzymes are required to catalyze a diverse range of reactions. The formation and breakdown of carbon–carbon bonds adjacent to a carbonyl group are the most common reactions, although C–N, C–O and C–S bond synthesis is also catalyzed [18].

Figure 1.

Structure and activation of ThDP.

The chemical structure of ThDP has three components; a six-membered aminopyrimidine ring, a five-membered thiazolium ring and a diphosphate (pyrophosphate) tail (Fig. 1). Unlike cofactors such as NADH, ThDP is a true catalyst and remains bound to the enzyme throughout the catalytic cycle. ThDP itself is able to catalyze many of these reactions in solution, albeit extremely slowly [19]. The mechanism requires formation of the ThDP C2-carbanion or ylide, which subsequently acts as a nucleophile adding to the α-carbonyl of a 2-keto acid substrate [19-23]. The deprotonation of C2 is made possible by the cofactor being held in a V-conformation in which the N4′-imino group is positioned next to the C2 of the thiazolium ring. This thermodynamically unstable conformation [24] appears to be supported by the bulky hydrophobic residue located beneath the two rings that is seen in all X-ray structures of ThDP-dependent enzymes [25-29]. In this manner, the cofactor is set up for a transfer of the proton from the C2 position of the thiazolium ring to the 4′-iminoThDP tautomer [25] (Fig. 1). Additionally, the cofactor is activated by the presence of a conserved glutamic acid residue located adjacent to N1′ of the pyrimidine ring. As shown in Fig. 1, a hydrogen bond to the ionized glutamate facilitates formation of the imino tautomer and, ultimately, the formation of the ylide necessary for enzyme activity [20, 21, 30, 31]. With the exception of glyoxylate carboligase [32, 33], the glutamate is conserved throughout the family [34] and its replacement reduces enzyme activity by more than four orders of magnitude [22].

Based on structural similarity, the ThDP-dependent enzymes have been broadly classified into five groups [34]. The largest of these, the decarboxylase group, contains pyruvate decarboxylase (PDC), which is often described as the archetypal ThDP-dependent enzyme [34]. The PDC catalytic cycle is shown in Fig. 2. As noted earlier, the first step in catalysis is similar in all ThDP-dependent enzymes (i.e. the ThDP ylide attacks the carbonyl carbon of the various substrates; in this instance, pyruvate). This results in a tetrahedral intermediate (2α-lactylThDP) that loses carbon dioxide to form the 2α-carbanion-enamine intermediate, often termed the ‘activated aldehyde’ [35-37]. Again, this intermediate is common to all ThDP enzymes. Decarboxylation requires the enamine to be protonated, forming a second tetrahedral intermediate (hydroxyethylThDP), which breaks down to release acetaldehyde. However, if the enamine acts as a nucleophile towards an acceptor substrate such as an aldehyde, ketone or a 2-keto acid, carboligation occurs (Fig. 2). These carboligation reactions provide chiral α-hydroxy ketones that can be used as versatile building blocks, particularly in the pharmaceutical industry [38, 39]. These applications are discussed in greater detail by Hailes et al. (unpublished results).

Figure 2.

Mechanism of reactions catalyzed by PDC and BFDC.

It is notable that the mechanism contains several steps requiring the assistance of enzymic acid-base catalysts. X-ray structures show that the active sites of the PDCs from Saccharomyces cerevisiae (ScPDC) and Zymomonas mobilis (ZmPDC) both contain a pair of acidic residues. In addition, there is a pair of histidine residues, located next to each other on the polypeptide chain, that forms the ‘HH’ motif common to many ThDP-dependent decarboxylases [40]. A series of site-directed mutagenesis studies on the ionizable residues of ZmPDC confirmed their importance in the catalytic cycle [41-44]. Similar but more detailed studies on ScPDC added a new dimension in that they showed that the two acidic residues, Asp28 and Glu477, were more involved in post-decarboxylation events, whereas the two histidine residues primarily assisted in the steps up to and including decarboxylation [45-47]. There are now several X-ray structures available for the ‘HH’ motif enzymes, and the residues likely to be involved in catalysis, as well as in substrate specificity, are well known [40]. However, attempts to alter their substrate specificity using rational design coupled with site-directed mutagenesis have been generally unsuccessful [40, 48-50].

Benzoylformate decarboxylase

Benzoylformate decarboxylase (BFDC), the penultimate enzyme in the mandelate pathway, catalyzes the non-oxidative decarboxylation of benzoylformate to yield carbon dioxide and benzaldehyde, thereby enabling some microorganisms to utilize R-mandelate as their sole carbon source [51-54]. Similar to PDC, BFDC is a member of the decarboxylase group and the two enzymes catalyze very similar reactions, differing only in specificity for a methyl or phenyl moiety (Fig. 2). Therefore, it was expected that their active sites/catalytic residues would be similar, despite BFDC sharing only 20–25% sequence identity with the PDCs. However, superposition of the X-ray structures of BFDC and PDC showed significant differences (Fig. 3). For example, although both active sites contained two histidine residues situated in similar spatial positions, those in BFDC were located on different subunits in the catalytic dimer [27]. Furthermore, in addition to the conserved glutamate, the PDC active site contained two acidic residues, whereas the BFDC active site contained a single serine residue.

Figure 3.

Comparison of the ionizable residues in the active sites of (A) PDC and (B) BFDC. Images were generated with PyMOL [111] using coordinates from Protein Data Bank (PDB) 2VBI (PDC) and 1BFD (BFDC).

Mechanistic studies on BFDC

The three ionizable residues in the active site of BFDC were identified as Ser26, His70 and His281 [27]. An X-ray structure of BFDC in complex with a substrate analogue inhibitor, R-mandelate, provided a clue regarding the roles of the potential catalytic residues [55]. It showed that the carboxylate of the inhibitor was in an apparent hydrogen bond with Ser26, whereas one of the histidine residues, His70, was positioned to act as a proton donor in the formation of the first tetrahedral intermediate, mandelylThDP (Fig. 2). Mutating each of the ionizable residues to alanine resulted in an approximately 600-fold decrease in kcat/Km for both the S26A and H281A variants, and a 4000-fold decrease in kcat/Km for the H70A variant (Table 1). In addition, consistent with its proposed role in substrate binding, the S26A variant showed a 30-fold increase in Km for benzoylformate and a 100-fold increase in the Ki value for R-mandelate [55].

Table 1. Steady-state kinetic parameters for wild-type and variant BFDC. Values in parentheses are the fold change relative to wild-type. Data are taken from Yep et al. [57].

Variant      Km (mM)             kcat (s−1)            kcat/Km (mM−1·s−1)
Wild-type    0.27 ± 0.02 (1)     320 ± 4 (1)           1180 (1)
S26A         7.8 ± 1.6 (29)      15 ± 1 (21)           1.9 (620)
S26L         1.2 ± 0.1 (4)       132 ± 4 (2)           110 (11)
S26M         0.8 ± 0.08 (3)      26 ± 1 (12)           32 (37)
H70A         1.5 ± 0.3 (6)       0.46 ± 0.08 (690)     0.3 (3950)
H70L         0.38 ± 0.03 (1.4)   14 ± 4 (23)           36 (33)
H70F         0.85 ± 0.05 (3)     4.5 ± 0.6 (70)        5.3 (240)
H281A        1.2 ± 0.3 (4)       2.1 ± 0.19 (150)      1.7 (690)
H281F        0.54 ± 0.3 (5)      65 ± 1 (5)            120 (10)
H281Y        0.49 ± 0.06 (2)     6.9 ± 0.2 (45)        14 (80)
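The fold changes in Table 1 follow directly from the kinetic constants. As a minimal sketch, the snippet below recomputes them for a few entries, using the Km (mM) and kcat (s−1) values from the table; the dictionary and function names are illustrative only.

```python
# Km (mM) and kcat (s^-1) for selected entries of Table 1
TABLE_1 = {
    "Wild-type": (0.27, 320.0),
    "S26A": (7.8, 15.0),
    "S26L": (1.2, 132.0),
    "H70A": (1.5, 0.46),
    "H70L": (0.38, 14.0),
    "H281A": (1.2, 2.1),
    "H281Y": (0.49, 6.9),
}

def fold_changes(variant, reference="Wild-type"):
    """Return (fold increase in Km, fold decrease in kcat,
    fold decrease in kcat/Km) relative to the reference enzyme."""
    km_ref, kcat_ref = TABLE_1[reference]
    km, kcat = TABLE_1[variant]
    km_fold = km / km_ref
    kcat_fold = kcat_ref / kcat
    # The loss in kcat/Km is the product of the two individual effects
    return km_fold, kcat_fold, km_fold * kcat_fold

for name in TABLE_1:
    km_fold, kcat_fold, efficiency_fold = fold_changes(name)
    print(f"{name:10s} Km x{km_fold:.1f}   kcat /{kcat_fold:.1f}   "
          f"kcat/Km /{efficiency_fold:.0f}")
```

Because the tabulated constants are rounded, the recomputed ratios agree with the published values only approximately.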

An earlier study had shown that an alternative substrate, p-nitrobenzoylformate, gave rise to two transients in a rapid-scan UV spectrum [56]. These allowed direct observation of the formation of mandelylThDP, as well as the formation and breakdown of the enamine. As part of that study, individual rate constants were determined for the wild-type BFDC reactions [56]. Similar measurements were now made with the S26A, H70A and H281A variants [55]. The results unambiguously implicated His70 in the predecarboxylation steps and in product release, implying that His70 was likely to act as the general acid and general base, B1H+ and B1, respectively (Fig. 2). Moreover, His281 was likely to be B2H+, the donor in the protonation of the enamine, whereas Ser26 appeared to be involved in all steps in the reaction [55].

Using SSM to study the mechanism of BFDC: the roles of the ionizable residues

The standard assay for the ThDP-enzyme catalyzed decarboxylation of 2-keto acids monitors the depletion of NADH caused by the reduction of the product aldehyde by the coupling enzyme, alcohol dehydrogenase [49]. As part of a project aimed at rapidly identifying BFDC variants able to accept alternative substrates, a microplate version of the assay was developed, along with a saturation mutagenesis protocol [57]. The H281A variant was chosen to validate the protocol because it showed reproducible activity in the microplate screening assay but had kinetic properties that were clearly different to those of the wild-type enzyme (Table 1). This combination of properties allowed us to screen for variants whose activity was restored. Predictably, when SSM was carried out using H281A as the template, several colonies were observed with wild-type BFDC activity. However, unexpectedly, several more were found to have intermediate activity levels. Sequencing identified these as the H281F, H281W, H281Y and H281Q variants. Kinetic analysis of purified enzymes (Table 1) showed that the replacements had only a marginal effect on Km and less than 50-fold reduction in kcat values. Indeed, the kcat value for the H281F variant was reduced only five-fold from that of the wild-type BFDC, which was completely inconsistent with the earlier prediction that His281 was the donor for the protonation of the enamine intermediate [55, 57].

These unforeseen results led to SSM being carried out on Ser26 and His70 and, once again, the results were at odds with the earlier predictions (Table 1). For example, Ser26 could be replaced with methionine with a 12-fold decrease in kcat and only a minor effect on the value of Km. Replacement with leucine had even less effect on kcat. Given that X-ray structures clearly show that Ser26 forms a hydrogen bond with the carboxylate of the substrate [55, 58], it is difficult to rationalize how these bulky hydrophobic residues replace Ser26 without disrupting substrate binding. The H70A variant exhibited the largest changes in catalytic constants, with an almost 4000-fold decrease in kcat/Km and, as with His281, His70 was predicted to play a role in proton transfer. Yet the H70L variant showed only a 20-fold decrease in kcat, with virtually no effect on Km. Although the H70F variant was not as active as H70L, it was still an order of magnitude faster than H70A (Table 1). Regardless of the precise numbers, both phenylalanine and leucine are incompatible with the proton donor-acceptor role previously ascribed to His70 and necessitate a re-evaluation of Fig. 2.

Taken together, the results clearly identify an important role for SSM in the study of enzyme mechanisms. The effects of alanine substitutions are used routinely to infer roles for individual residues [59] and yet the study described above shows that the predicted roles of His70 and His281, in particular, are patently incorrect.

Further mechanistic studies on BFDC: the role of the fulcrum residue

As shown in Fig. 1, the first step in catalysis by all ThDP-dependent enzymes is the deprotonation of the C2 carbon by the 4′-imino group to form the ylide [19, 22, 23]. X-ray structures show that, in all cases, the 4′-imino group is within 3.5 Å of C2 (i.e. well within the requisite distance for deprotonation) [25-29]. Furthermore, they suggest that the V-conformation is assisted by the presence of a ‘fulcrum’ residue located directly below the cofactor. In BFDC, this residue is Leu403, and the V-conformation results in a N4′–C2 distance of 3.1 Å [27]. An early study used site-directed mutagenesis to make selected variants of the analogous residue, Ile415, in ScPDC [29]. The results enabled the prediction that the fundamental features of the V-conformation will be independent of the identity of the residue at the fulcrum position and, furthermore, each enzyme will evolve to use a preferred amino acid [29].

In light of the unexpected results with the putative catalytic residues, it appeared that SSM could be used to provide an unbiased interrogation of the importance of the fulcrum position in BFDC. This proved to be the case because more than half the L403X colonies screened had at least 10% of wild-type activity [60]. In toto, twelve active variants were purified and characterized, with kcat/Km values found to range over three orders of magnitude. The structures of several variants were also determined by X-ray crystallography. The net results of the study are possibly best exemplified by the L403E variant. Glutamate is certainly not hydrophobic, nor is it particularly bulky, yet the kcat value of the L403E variant was only 18-fold lower than that of wild-type BFDC. Furthermore, its structure was typical in that there was very little change in the V-conformation of the cofactor or, indeed, the enzyme as a whole (Fig. 4A). With the exception of the L403F variant (below), the average rmsd of the cofactors was approximately 0.2 Å [60].
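The rmsd values used to compare cofactor positions are straightforward to calculate once two structures share a common frame. The sketch below assumes that the structures have already been superimposed on the protein and that the cofactor atoms are listed in the same order in both. The coordinates are hypothetical and are not taken from any PDB entry.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of
    (x, y, z) atom positions, in the same units as the input."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

# Hypothetical coordinates (in Å) for three cofactor atoms, illustration only
wild_type = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.4, 0.0)]
variant = [(0.1, 0.0, 0.0), (1.5, 0.2, 0.0), (1.5, 1.4, 0.2)]

print(f"cofactor rmsd = {rmsd(wild_type, variant):.2f} Å")
```

No fitting step is included here on purpose: superimposing on the cofactor itself would hide any displacement of the cofactor relative to the protein.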

The L403F variant differed in that the Km value of the substrate increased almost ten-fold, and the cofactor was displaced, particularly at the thiazolium ring, so that the rmsd between the cofactors was 0.98 Å (Fig. 4B). Consistent with the increase in Km shown by L403F, there is also some movement in Thr377, a residue known to be involved in substrate specificity [61]. Overall, the study supported the hypothesis that the V-conformation of ThDP will be observed regardless of the identity of the residue at the fulcrum position. It was also apparent that a hydrophobic residue, although not essential, is necessary for optimal activity.

The utility of SSM for mechanistic studies was highlighted by the fact that BFDC showed good activity with unexpected Leu403 replacements such as glutamate and lysine. Conversely, the Ile415 variants chosen for site-directed mutagenesis of ScPDC did not include any non-intuitive residues [29]. Although BFDC undoubtedly appears to be more tolerant of substitutions than ScPDC, this can only be said with certainty because all possibilities have been sampled [60].

Site-saturation mutagenesis and the interconversion of BFDC and PDC

In addition to their decarboxylase activities, BFDC and PDC also have the ability to carry out stereospecific carboligation reactions, albeit with a different substrate spectrum. BFDC catalyzes the conversion of benzaldehyde and acetaldehyde into R-benzoin and, predominantly, S-2-hydroxypropiophenone (Scheme 1). The products of PDC, on the other hand, are R-phenylacetylcarbinol, an intermediate in the commercial synthesis of ephedrine [62], and S-acetoin (Scheme 1). In light of the industrial interest in stereospecific products, there has been considerable recent work on the use of ThDP-dependent enzymes in organic synthesis. More details are provided in recent reviews [18, 63-65], as well as that by Hailes et al. (unpublished results).

Scheme 1.

Stereospecific carboligation reactions catalyzed by PDC/BFDC.

In an attempt to explore the evolutionary relationship between the two enzymes, as well as to expand the substrate spectrum, an effort was made to interconvert BFDC and PDC [48]. Superposition of the active sites showed that the BFDC residues, Ala460 and Phe464, had direct counterparts in the PDC residues, Ile472 and Ile476. Site-directed mutagenesis was used to prepare single- and double-variants of both enzymes. The PDC I472A and BFDC A460I variants proved to be excellent 2-ketohexanoate decarboxylases, although the interconversion itself was essentially unsuccessful [48].

These experiments indicated that a different approach was required. It was noted earlier that SSM is most effective for residues located within 5–6 Å of the substrate [2] and it appeared that this approach could be used for BFDC. Examination of the active site of the BFDC:R-mandelate complex identified 12 residues within 5 Å of R-mandelate that had structural counterparts in PDC (Fig. 5). A microplate screen was developed to identify BFDC variants with altered substrate specificity [57] and each of the 12 residues was subjected, in turn, to SSM. The variants were screened for activity with a series of 2-keto acid substrates and the results displayed as radar plots [61]. As shown in Fig. 6, such plots allow for the simultaneous assessment of the overall activity of a variant (distance to centre) and its substrate utilization pattern (shape of the plot). The T377L and A460Y variants showed clearly different substrate patterns (Fig. 6A–C) and improved use of pyruvate. A second round of SSM was carried out using Thr377 and Ala460 variants as templates and, regardless of the template, optimal PDC activity was observed with the T377L/A460Y variant. Steady-state kinetic analyses of these variants showed that much of the enhanced PDC activity resulted from improvements in the Km value for pyruvate. Indeed, the T377L/A460Y variant had a Km for pyruvate of 2 mM, which is close to that of ZmPDC (1 mM) [48] and the S0.5 value for ScPDC (1.7 mM) [45]. Consistent with Thr377 and Ala460 not being in contact, the effects of the two mutations on kcat/Km for pyruvate were additive [66, 67] and, overall, the SSM approach led to a four orders of magnitude change in the ratio between pyruvate and benzoylformate utilization [61].
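Additivity of mutational effects is usually judged on a free energy scale: if two substitutions act independently, the change in transition-state binding energy for the double variant equals the sum of the changes for the two single variants, which is equivalent to the fold changes in kcat/Km multiplying. One common way to express this is a double-mutant cycle, sketched below. The kcat/Km values are hypothetical placeholders chosen for illustration; they are not the measured values for the BFDC variants, and the function names are assumptions.

```python
import math

R = 8.314e-3  # gas constant, kJ·mol^-1·K^-1

def ddg(efficiency_variant, efficiency_wild_type, temperature=298.15):
    """Apparent change in transition-state binding energy (kJ/mol) implied
    by a change in kcat/Km; negative means the variant is more efficient."""
    return -R * temperature * math.log(efficiency_variant / efficiency_wild_type)

def coupling_energy(wild_type, single_a, single_b, double, temperature=298.15):
    """Double-mutant cycle: close to zero when the two substitutions
    contribute independently (i.e. their effects are additive)."""
    return (ddg(double, wild_type, temperature)
            - ddg(single_a, wild_type, temperature)
            - ddg(single_b, wild_type, temperature))

# Hypothetical kcat/Km values for pyruvate (mM^-1·s^-1), illustration only
wild_type, variant_a, variant_b, double_variant = 0.01, 0.2, 0.5, 10.0

energy = coupling_energy(wild_type, variant_a, variant_b, double_variant)
print(f"coupling energy = {energy:.2f} kJ/mol")
```

In this constructed example the single variants improve kcat/Km by 20- and 50-fold and the double variant by 1000-fold, so the coupling energy is zero to within rounding. A value that differs markedly from zero would indicate that the two positions interact.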

Figure 4.

Overlay of the structures of wild-type BFDC (green), as well as (A) L403E (orange) and (B) L403F (cyan). The L403E variant shows little change in the position of the cofactor and Thr377. By contrast, the L403F variant, although maintaining the overall V-conformation, shows distortion in the thiazolium ring of the cofactor and movement in Thr377. Generated using data from PDB 1YNO (wild-type), 4JD5 (L403E) and 4GP9 (L403F).

Figure 5.

Residues found within 5 Å of R-mandelate bound by BFDC identified as targets for SSM. Generated using coordinates from PDB 1MCZ.

Figure 6.

Radar plots depicting the activity of single colonies screened using the decarboxylase assay adapted for a microplate reader. The axes denote the activity of whole cell extracts (mAU·min−1) and have been scaled from 0 to 1200 mAU·min−1. Shown here are plots from single colonies containing plasmids encoding (A) WT, (B) A460I, (C) T377L and (D) A460I-T377L variants of BFDC. Reproduced with permission [57].

Combined directed evolution and SSM of BFDC

As noted earlier, BFDC has the ability to carry out stereospecific carbon–carbon bond formation [68], with products including various 2-hydroxy ketones and benzoin derivatives [69-71]. To extend the utility of the reaction, BFDC has been subjected to a combination of directed evolution and site-directed mutagenesis, as well as SSM [72, 73]. This process necessitated the development of a novel screening assay that used the reduction of 2,3,5-triphenyltetrazolium chloride to formazan by 2-hydroxyketones. Formazan has an intense red colour, which enables the reaction to be monitored at 510 nm in 96-well plates [72]. The initial screens identified L476Q and L476P as variants with improved carboligase activity in organic solvents [72], whereas L476Q and M365L/L461S were now able to utilize O-substituted benzaldehyde derivatives as donor substrates [73]. Subsequently, SSM was carried out on Leu476. Ten variants were identified as having up to a five-fold improvement in carboligase activity. Several of these variants also showed improvement in enantioselectivity in 20% dimethylsulfoxide, the condition best suited for synthesis applications [72].


Transketolase (TK) is a metabolic enzyme found in the pentose phosphate pathway in all organisms, as well as in the Calvin cycle in plants [74, 75]. It functions as a highly stereospecific carboligase catalyzing the transfer of the 2-carbon ketol from xylulose 5-phosphate to either ribose 5-phosphate or erythrose 4-phosphate (Scheme 2).

Scheme 2.

(A) The natural reactions catalyzed by TK. (B) The synthetically significant reaction of TK in which HPA is the source of the ketol group.

As with many ThDP-dependent enzymes, the mechanism of TK has been studied in great detail [76-78]. Formation of the ylide and subsequent attack on the carbonyl carbon of the keto sugar generates the first tetrahedral covalent intermediate. Cleavage of the C–C bond adjacent to the carbonyl results in the formation of the first product, an aldose, and the covalent intermediate, α,β-dihydroxyethylThDP. The carbanion of α,β-dihydroxyethylThDP attacks the carbonyl of the acceptor substrate, also an aldose, ultimately resulting in the extension of the acceptor's carbon skeleton by two carbons (Fig. 7).

Figure 7.

Mechanism of the TK-catalyzed conversion of xylulose 5-phosphate to fructose 6-phosphate. For clarity, active site acids and bases are not shown.

Unlike BFDC and PDC, TK catalyzes a reversible reaction. To overcome this limitation, β-hydroxypyruvate (HPA) is often used as the ketol donor. With HPA, the formation of the activated aldehyde intermediate is coupled to the release of carbon dioxide (Scheme 2), making the reaction effectively irreversible and driving the equilibrium towards the product side [79, 80]. The use of HPA as the donor has been critical to the development of TK as an asymmetric catalyst [81, 82].

Although TK exhibits a strong preference for phosphorylated α-hydroxylated aldehydes as acceptor substrates, TK can also use a variety of nonphosphorylated aldehydes as acceptors, albeit at greatly reduced efficiency [83-86]. This innate activity provides an excellent platform for the directed evolution of TK. Recently, SSM was used to evolve TK from Escherichia coli (EcTK) to accept nonphosphorylated substrates [87]. The crystal structures of several TK enzymes in complex with phosphorylated substrates have been solved [78, 88, 89], and so these could be used as a guide. Initially, ten first-shell residues were identified as being (a) located within 4 Å of the erythrose 4-phosphate acceptor when bound to yeast TK [88] and (b) highly conserved across both bacterial and yeast TKs [87, 88]. In addition, of the 52 residues found within 10 Å of ThDP, the ten residues possessing the highest phenotypical variation across putative yeast and bacterial TKs were selected as the second-shell residues [87]. Both sets of residues are shown in Fig. 8. Using HPA and glycolaldehyde as the donor and acceptor substrates, respectively, libraries were first screened for erythrulose formation using a high-throughput HPLC assay [90]. In total, six libraries (A29X, S188X, D259X, R358X, H461X and R520X) provided variants displaying improved activity towards glycolaldehyde [88].
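Selecting first-shell residues by distance from a bound ligand amounts to a simple geometric filter. The following is a minimal sketch of that step, assuming atom coordinates have already been extracted from a structure file. The residue labels and coordinates are hypothetical placeholders and do not correspond to real TK numbering.

```python
import math

def residues_within(protein_atoms, ligand_atoms, cutoff):
    """Return the sorted residue labels that have at least one atom within
    `cutoff` Å of any ligand atom.

    protein_atoms: iterable of (residue_label, (x, y, z))
    ligand_atoms:  iterable of (x, y, z)
    """
    ligand_atoms = list(ligand_atoms)
    hits = set()
    for residue, position in protein_atoms:
        if any(math.dist(position, lig) <= cutoff for lig in ligand_atoms):
            hits.add(residue)
    return sorted(hits)

# Hypothetical coordinates (in Å), illustration only
protein = [("ResA", (1.0, 0.0, 0.0)), ("ResA", (6.0, 0.0, 0.0)),
           ("ResB", (0.0, 3.5, 0.0)), ("ResC", (0.0, 0.0, 8.0))]
ligand = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.5)]

print(residues_within(protein, ligand, cutoff=4.0))   # first-shell candidates
print(residues_within(protein, ligand, cutoff=10.0))  # wider shell
```

A distance filter only produces candidates. In the study described above the candidates were further narrowed by sequence conservation (first shell) or by variability across homologues (second shell), which this sketch does not attempt to model.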

Figure 8.

Active site of E. coli TK (PDB 1QGD). First-shell residues (see text) are shown in cyan, whereas second-shell residues are shown in magenta.

Based on X-ray structures, the side-chains of Arg358, His461 and Arg520 appear to coordinate the phosphate of these substrates [78]. Therefore, it was not unexpected that variants in those positions showed improved activity with glycolaldehyde. On the surface, the improvement shown by variants such as R358P, R358I, R520V and R520P would suggest that substitution of a nonpolar residue for either Arg358 or Arg520 is a defining factor. However, R520G and R520stop variants also showed improvement, whereas, overall, the H461S variant was the most efficient at using glycolaldehyde for erythrulose formation, with an almost five-fold increase in turnover compared to wild-type TK. The R520stop mutation results, inter alia, in the deletion of the C-terminal domain. Although this domain does not affect cofactor binding or formation of the active site [91], it had been described as a potential regulatory element [25]. Could this be another case of SSM providing unexpected results? Taken as a whole, it was concluded that the increased catalytic activity shown by first-shell variants was likely a result of the increased access of substrates to the active site [87].

Screening of the second-shell libraries identified variants of Ala29, Ser188 and Asp259 with an improved ability to utilize glycolaldehyde. More precisely, the A29D, A29E, S188Q, D259A and D259G variants all exhibited a two- to three-fold increase in specific activity under the conditions tested. Although it was possible to rationalize the higher activity of the Ser188 and Asp259 variants, it was noted that Ala29 is in direct contact with the terminal phosphate of ThDP and, conceivably, could influence the catalytic activity of TK through electrostatic effects. It was also noted that none of the substitutions resulting in increased activity towards glycolaldehyde were naturally occurring, even at highly variable sites [87].

To further examine the range of potential nonphosphorylated products, the EcTK libraries were screened against aliphatic, aromatic and heteroaromatic aldehydes using HPA as the ketol donor. Accordingly, a colorimetric screen [92] was employed and 13 variants from seven libraries (H26X, A29X, D259X, R358X, H461X, D469X and R520X) were shown to possess enhanced activity towards propanal. Although most of these mutations had also resulted in improved glycolaldehyde activity, some variants identified from the H26X and D469X libraries actually led to a decrease in glycolaldehyde activity [87, 93]. Both His26 and Asp469 interact with the glycolaldehyde C2 hydroxyl group, which is absent in propanal [89]. The optimal variant for the propanal reaction was D469T, which saw a three-fold decrease in Km for propanal, as well as an 8.5-fold increase in Vmax, compared to wild-type TK. These improved kinetic parameters were attributed to a more productive orientation of propanal in the active site of the D469T variant [93]. Interestingly, the D469Y variant showed a 64-fold preference for propanal over glycolaldehyde. This compares favourably with the approximately nine-fold preference displayed by D469T and suggests there is still room for improvement in selectivity.

Subsequently, a chiral GC assay was used to determine the stereochemistry of 1,3-dihydroxypentan-2-one, the product of the TK-catalyzed HPA/propanal reactions [94]. Although the wild-type enzyme exhibited only modest S-selectivity (58% ee), the D469E variant was highly S-selective (90% ee). Conversely, the H26Y variant was found to be highly R-selective (88% ee). When longer chain aliphatic aldehydes were used as the acceptor, the D469E variant showed even higher S-enantioselectivity. The R-selectivity of the H26Y variant also varied (from 78% to 92% ee) depending on the length of the aliphatic chain [85].
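To put the ee values quoted above into perspective, the following sketch converts between enantiomeric excess and the underlying ratio of enantiomers, using the standard definition ee = (major − minor)/(major + minor). The function names are illustrative only.

```python
def ee_percent(major, minor):
    """Enantiomeric excess (%) from the amounts of major and minor enantiomer."""
    return 100.0 * (major - minor) / (major + minor)

def major_fraction(ee):
    """Percentage of the major enantiomer in a mixture with the given ee (%)."""
    return (100.0 + ee) / 2.0

# ee values quoted in the text for the HPA/propanal reaction.
for label, ee in [("wild-type (S)", 58), ("D469E (S)", 90), ("H26Y (R)", 88)]:
    major = major_fraction(ee)
    print(f"{label}: {ee}% ee = {major:.0f}:{100 - major:.0f}")
```

On this basis, 58% ee corresponds to a 79:21 mixture of enantiomers, whereas 90% ee corresponds to 95:5.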

Wild-type TK was unable to use benzaldehyde as the acceptor, whereas replacement of Asp469 with lysine, threonine or glutamate resulted in variants that were able to use benzaldehyde, as well as other aromatics such as furaldehyde and thienaldehyde, as acceptor substrates [95]. Furthermore, the D469X variants changed enantiospecificity and generated the R-hydroxyketone product [95]. Although unexpected, this is not unprecedented because BFDC is known to switch from being an S-selective to an R-selective carboligase as the size of the acceptor aldehyde increases [71, 96].

So far, the work described was carried out on the E. coli enzyme. More recently, SSM was used to enhance the activity of Saccharomyces cerevisiae TK (ScTK) towards long-chain polyol aldehydes [86]. In the first instance, docking experiments with D-ribose 5-phosphate were carried out using the ScTK crystal structure [97]. Eleven residues were identified as being involved in the binding and stabilization of the substrate. Based, in part, on the previous work of Hibbert et al. [87, 93], five of those residues, His28, His261, Asp475, His479 and Arg526, were selected for SSM. Of the more than 800 colonies screened, only four variants were found to be significantly more active towards polyols. It is notable that three of the four optimal variants, R526N, R526Q and R526Y, resulted from the substitution of Arg526 with a neutral polar residue. By contrast, when SSM was carried out on the equivalent residue in EcTK, Arg520, the substitutions that led to the greatest enhancement towards glycolaldehyde were not by polar residues but, instead, by smaller hydrophobic residues. The reasons for this difference remain to be established. These single ScTK variants were utilized as templates for another round of mutagenesis using primers based on residues adjacent to (or predicted to interact with) Arg526. Here, the majority of the double variants saw a decrease in activity. The R526Q/S525T variant was the exception, demonstrating an almost four-fold improvement in specific activity towards nonphosphorylated polyols.

2-Oxoglutarate decarboxylase

The 2-oxoglutarate dehydrogenase complex catalyzes the rate-limiting step in the citric acid cycle, the conversion of 2-oxoglutarate (2KG) to succinyl-CoA [98]. The complex comprises a ThDP-dependent enzyme, 2-oxoglutarate decarboxylase (E1o), a lipoylated succinyl transferase (E2o) and a dihydrolipoyl dehydrogenase (E3). Recently, the X-ray structure of 2-oxoglutarate decarboxylase from E. coli (EcE1o) was determined [99]. Refinement of the diffraction data revealed the presence of additional electron density, likely corresponding to oxaloacetate, in the active site. Furthermore, the distal carboxylate group of oxaloacetate appeared to be coordinated between His260 and His298, suggesting that these two residues were likely involved in the binding and stabilization of 2KG (Fig. 9). Alignment of putative 2-oxoglutarate decarboxylase sequences indicated that the two histidines were highly conserved, again hinting that those residues are likely required to ensure the correct orientation of the substrate for decarboxylation.

Figure 9.

Active site of EcE1o. Additional electron density within the active site of EcE1o was modelled as oxaloacetate. His298 and His260 appear to coordinate the distal carboxylate group of oxaloacetate. Reproduced with permission [99].

To explore this hypothesis, site-directed mutagenesis was carried out, generating the H260A and H298A variants [99]. Compared to wild-type EcE1o, both demonstrated greatly reduced activity. Additionally, although oxaloacetate is known to act as a competitive inhibitor [100] for wild-type EcE1o (amongst others), no inhibition was observed with either H260A or H298A [99]. The body of evidence led to the suggestion that His260 and His298 were likely to be involved in a hydrogen-bonding network that coordinates the distal carboxylate group of 2KG.

More recently, as part of their efforts to synthesize acyl-CoA analogues, Shim et al. [101] undertook a re-examination of the role of His260 and His298 in 2KG recognition. Accordingly, SSM was carried out on His260 and His298 and three libraries were constructed. The libraries, H260X, H298X and H260X/H298X, were subsequently screened with 2KG, as well as 2-oxovalerate, a non-natural substrate lacking the 5-carboxylate of 2KG [101]. The reaction was monitored in a microplate by the disappearance of the blue colour at 600 nm upon reduction of 2,6-dichlorophenolindophenol [101].

Analysis of the H298X library revealed several variants having increased activity towards 2-oxovalerate. Compared to the wild-type EcE1o reaction with 2-oxovalerate, the H298L, H298V and H298D variants exhibited three-, 13- and 38-fold increases in kcat/Km values, respectively. Intriguingly, replacing His298 with a leucine, a residue incapable of hydrogen bonding, resulted only in a seven-fold decrease in overall catalytic efficiency. This result calls into question the proposed role of His298 in a hydrogen-bonding network [101]. The H260X library, on the other hand, did not produce any variants with improved activity towards 2-oxovalerate. Indeed, only the H260E variant retained any significant level of activity with 2KG. Even then, substituting His260 with glutamate resulted in more than two orders of magnitude increase in Km and, concomitantly, kcat decreased by almost two orders of magnitude. Overall, the substitution resulted in a four orders of magnitude decrease in kcat/Km, which, on the surface, would indicate that this position is critical for the proper orientation of 2KG in the active site of EcE1o. Then again, the H260E/H298N variant, identified from the H260X/H298X library, was the second most active variant with 2KG. This showed only a 17-fold decrease in kcat/Km, and so there is still some doubt about the importance of both residues.
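The way in which the individual changes in kcat and Km combine into the overall change in kcat/Km for H260E can be checked with simple arithmetic. The sketch below uses round 100-fold values as stand-ins for the "more than" and "almost" two orders of magnitude quoted above; they are assumptions for illustration, not the measured values.

```python
import math

def efficiency_fold_change(kcat_fold, km_fold):
    """Fold change in kcat/Km, given the fold changes (variant / wild-type)
    in kcat and in Km."""
    return kcat_fold / km_fold

# Assumed round numbers: kcat falls 100-fold, Km rises 100-fold.
fold = efficiency_fold_change(kcat_fold=1 / 100, km_fold=100)
orders_of_magnitude = math.log10(fold)
print(f"kcat/Km changes by {orders_of_magnitude:.1f} orders of magnitude")
```

Because kcat/Km is a ratio, a decrease in kcat and an increase in Km compound rather than offset one another, which is why two changes of roughly two orders of magnitude each give a loss of about four orders of magnitude in catalytic efficiency.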

2-Succinyl-5-enolpyruvyl-6-hydroxy-3-cyclohexene-1-carboxylate synthase

The ThDP-dependent enzyme, 2-succinyl-5-enolpyruvyl-6-hydroxy-3-cyclohexene-1-carboxylate (SEPHCHC) synthase (MenD), catalyzes the second step in the menaquinone biosynthetic pathway, the conversion of isochorismate to SEPHCHC [102, 103]. Initially, it was assumed that MenD catalyzed the formation of 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate [102]. It is now understood that the formation of the latter is either catalyzed by MenH or occurs via a spontaneous release of pyruvate from SEPHCHC (Fig. 10) [103]. As with the decarboxylases and E1o, MenD initially catalyzes the decarboxylation of the 2-keto acid, 2-KG. However, MenD is unusual in that, in a Stetter-like reaction, the resultant carbanion/enamine adds to the β-carbon of its second substrate, isochorismate [102, 103]. In vitro, MenD can use a variety of aldehydes as acceptor substrates to provide a range of mixed acetoin-like products [104]. However, similar to most ThDP-dependent enzymes, the carboligation reaction catalyzed by MenD is stereospecific, and only the R-configuration of these 1,2-addition products is readily obtainable [40, 104].

Figure 10.

MenD catalyzed formation of SEPHCHC from 2-ketoglutarate and isochorismate. SEPHCHC is subsequently converted to 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate (SHCHC), either spontaneously or in a reaction catalyzed by MenH [(1R, 6R)-2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate synthase].

Studies initially carried out on BFDC identified a structural element, now known as the S-pocket, which allows the antiparallel arrangement of the donor and aldehyde acceptor necessary for S-selectivity [37]. The most important S-pocket residues in E. coli MenD (EcMenD) were identified on the basis of its X-ray structure [105] and docking studies using benzaldehyde [106]. The S-pocket was then enlarged by converting two residues, Ile474 and Phe475 (Fig. 11), to alanine and glycine, respectively. The resultant I474A/F475G variant afforded 5-hydroxy-4-oxo-5-phenylpentanoate from 2-KG and benzaldehyde with an ee of 75% (S) rather than the usual > 99% (R) [106]. Further analysis of the X-ray structure of EcMenD identified three other residues (Ser32, Arg395 and Leu478) that may be influencing enantioselectivity (Fig. 11). SSM using NDT libraries [12] was subsequently carried out at each of these three positions. The I474A/F475G variant was used as the template and the reaction was monitored using the 2,3,5-triphenyltetrazolium chloride assay, as described previously for the carboligation reactions of BFDC [72]. Unexpectedly, from these libraries, only the I474A/F475G/R395Y variant was found to have an improved selectivity, with an ee of 85% (S). Unfortunately, this came at a conversion rate of 7% compared to > 99% for wild-type EcMenD [107]. The conversion rate and stereoselectivity were found to be substituent dependent. For example, when reacted with wild-type EcMenD, 3-methoxybenzaldehyde provided > 99% (R) product at a 99% conversion efficiency. Conversely, reaction with the triple mutant provided 96% (S) product at better than 20% conversion [107].
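The NDT degeneracy mentioned above can be unpacked with a short sketch. It expands a degenerate codon using the IUPAC nucleotide codes (N = A/C/G/T, D = A/G/T) and translates each codon with the standard genetic code. The final function estimates how many clones would need to be screened for a chosen probability of sampling any given codon at least once. That estimate assumes every codon is equally represented in the library, which real libraries only approximate, and the function names are illustrative.

```python
import itertools
import math

# IUPAC nucleotide codes needed for common SSM codon schemes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "D": "AGT", "K": "GT"}

# Standard genetic code, with codons ordered over the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def expand(degenerate_codon):
    """List every concrete codon encoded by a degenerate codon such as 'NDT'."""
    choices = (IUPAC[base] for base in degenerate_codon)
    return ["".join(p) for p in itertools.product(*choices)]

def clones_for_coverage(n_codons, probability=0.95):
    """Clones to screen so that a given codon is sampled at least once with
    the stated probability, assuming all codons are equally likely."""
    return math.ceil(math.log(1 - probability) / math.log(1 - 1 / n_codons))

ndt_codons = expand("NDT")
ndt_amino_acids = sorted({CODON_TABLE[c] for c in ndt_codons})
print(len(ndt_codons), "".join(ndt_amino_acids), clones_for_coverage(len(ndt_codons)))
```

NDT yields 12 codons encoding 12 different amino acids, including tyrosine, and no stop codon. Under the equal-representation assumption, about 35 clones per position would give a 95% chance of sampling any particular codon.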

Figure 11.

Residues identified as comprising the S-pocket of E. coli MenD (PDB 2JLC).

An interesting observation was the rationale for the effect of the R395Y mutation. Generally, it is assumed that improving the ee of the S-enantiomer will be achieved by making the antiparallel orientation of the acceptor substrate (benzaldehyde in this case) energetically more favourable [37]. However, in this instance, it is considered that the S-selectivity of the triple variant is more a result of the destabilization of the parallel orientation of benzaldehyde. Arg395, located on the periphery of the S-pocket, has been suggested to contribute to active site binding and the proper orientation of isochorismate, the native substrate [108, 109]. Evidence for this was provided by the R395A mutation, which resulted in an increase in the Km value for isochorismate [108, 109]. In the non-native carboligation reactions, Arg395 is considered to form a cation-π interaction with benzaldehyde, thereby stabilizing the parallel orientation. Loss of this interaction, coupled with the increased steric hindrance provided by the tyrosine side-chain, likely destabilizes the normal R-pathway, resulting in a variant with excellent S-selectivity [107].


Over the past few years, there has been heightened interest in ThDP-dependent enzymes. Although this is primarily from an organic synthesis perspective, the fundamental inter-relationship between substrate specificity and catalytic mechanism has meant that the latter could not be ignored. In this review, we have endeavoured to provide a detailed summary of how SSM can be (and is being) used in ThDP-dependent enzymes to explore the basic aspects of that relationship. We note that SSM has a surprising capacity to provide unforeseen results. Sometimes, this may be advantageous, such as with the unexpected but welcome changes in enantiospecificity provided by some of the variants described herein. However, it may also require the reassessment of what was considered to be a well-established mechanism. Along those lines, we accept that the difficulty in using site-directed mutagenesis in the absence of structural information has been acknowledged [110]. However, based on the SSM studies with ThDP-dependent enzymes described here, it must also be acknowledged that, even when detailed structural information is available, the results obtained from alanine variants can be misleading.

In summary, we anticipate that the diversity of chemical reactions, as well as reaction and substrate selectivities, will keep ThDP-dependent enzymes at the forefront of the search for novel asymmetric catalysts over the next few years. Given the ready availability of structural information, coupled with the rapid development of medium- to high-throughput screens, we expect that SSM will play an increasingly important role in that search.