Note: After submission of this manuscript, we received a preprint of the following paper reporting that the mutation of His29 to alanine reduces the BAL activity to 5%. Kneen MM, Pogozheva ID, Kenyon GL & McLeish MJ (2005) Exploring the active site of benzaldehyde lyase by modeling and mutagenesis. Biochim Biophys Acta: Proteins and Proteomics, doi:10.1016/j.bbapap.2005.08.025
G. E. Schulz, Institut für Organische Chemie und Biochemie, Albertstr. 21, Freiburg im Breisgau, Germany 79104 Tel: +49 761 203 6058 Fax: +49 761 203 6161 Email: firstname.lastname@example.org
Pseudomonas fluorescens is able to grow on R-benzoin as the sole carbon and energy source because it harbours the enzyme benzaldehyde lyase that cleaves the acyloin linkage using thiamine diphosphate (ThDP) as a cofactor. In the reverse reaction, this lyase catalyses the carboligation of two aldehydes with high substrate and stereospecificity. The enzyme structure was determined by X-ray diffraction at 2.6 Å resolution. A structure-based comparison with other proteins showed that benzaldehyde lyase belongs to a group of closely related ThDP-dependent enzymes. The ThDP cofactors of these enzymes are fixed at their two ends in separate domains, suspending a comparatively mobile thiazolium ring between them. While the residues binding the two ends of ThDP are well conserved, the lining of the active centre pocket around the thiazolium moiety varies greatly within the group. Accounting for the known reaction chemistry, the natural substrate R-benzoin was modelled unambiguously into the active centre of the reported benzaldehyde lyase. Due to its substrate spectrum and stereospecificity, the enzyme extends the synthetic potential for carboligations appreciably.
Thiamine diphosphate (ThDP)-dependent enzymes participate in numerous biosynthetic pathways and catalyse a broad range of reactions mainly involving the cleavage and the formation of C–C bonds. For instance, they catalyse nonoxidative and oxidative decarboxylations of 2-ketoacids, produce 2-hydroxyketones, and transfer activated aldehydes to a variety of acceptors. However, they can also form C–N, C–O and C–S bonds [1,2]. The ThDP-dependent benzaldehyde lyase (BAL, EC 4.1.2.38, suggested systematic name: 2-hydroxy-1,2-diphenylethanone benzaldehyde-lyase, i.e. benzoin benzaldehyde-lyase) catalyses the reversible ligation of two aromatic aldehydes to yield an (R)-2-hydroxyketone (Fig. 1). BAL was discovered by Gonzales and Vicuna, who isolated it from the strain Pseudomonas fluorescens Biovar I, which was found in wood scraps in a cellulose factory. They showed that this strain can grow on lignin-like substrates because the endogenous BAL can cleave the acyloin linkage of R-benzoin and R-anisoin to use these compounds as a carbon and energy source.
BAL is a valuable tool for chemo-enzymatic syntheses because it generates various enantiomerically pure 2-hydroxyketones through aldehyde ligation or by partial decomposition of racemic mixtures [5–8]. The enzyme generates activated aldehydes either via direct aldehyde addition to ThDP or via cleavage of 2-hydroxyketones but is not involved in decarboxylation reactions. BAL accepts a broad spectrum of aromatic donor substrates, among them ortho-substituted benzaldehydes, and processes substituted acetaldehydes resulting in functionalized derivatives of (R)-2-hydroxypropiophenone. The enzyme also ligates two aliphatic aldehydes resulting in highly enantio-enriched acyloins. In all reactions, BAL shows a high stereospecificity for the R-configuration of the acyloin linkage. Starting from the assumption that aldehydes which are not accepted as donor substrates may still be acceptor substrates and vice versa, a biocatalytic system for the asymmetric cross-carboligation of aromatic aldehydes has been developed. Taken together, BAL broadens appreciably the substrate spectrum of the related enzymes benzoylformate decarboxylase (BFD) [12,13] and pyruvate decarboxylase (PDC) [14–16] used for similar syntheses. Here we report the crystal structure of BAL with bound cofactor ThDP at 2.6 Å resolution, suggest the geometry of the reaction and explain the substrate specificity in structural terms.
Results and Discussion
Structure determination and description
BAL is a homotetramer of 4 × 563 amino acid residues corresponding to a molecular mass of 4 × 58 919 Da. Each subunit binds one ThDP molecule using one Mg2+ ion. The obtained crystals belong to spacegroup P3121 with one tetrameric BAL molecule (wild-type plus invisible C-terminal His-tags) per crystallographically asymmetric unit (Table 1). The crystal structure was determined by the incorporation of seleno-L-methionine (SeMet) and subsequent phasing with multiwavelength anomalous diffraction (MAD). Met1 was cleaved off during protein production as indicated by electrospray ionization mass spectrometry (ESI-MS). The complete replacement of the 12 remaining methionines per subunit was demonstrated by ESI-MS, which showed a single peak at a mass 562 ± 5 Da higher than that of the wild-type (expected difference 563 Da).
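The expected mass difference quoted above follows from exchanging sulfur for selenium in each of the 12 remaining methionines. A minimal sketch of that arithmetic is given below; the atomic masses are standard average values assumed for illustration and are not taken from the paper.

```python
# Expected mass increase on complete SeMet substitution (illustrative arithmetic).
# Atomic masses are standard average values, assumed here; they are not from the paper.
MASS_S = 32.06    # sulfur, Da
MASS_SE = 78.97   # selenium, Da

def semet_mass_shift(n_met: int) -> float:
    """Mass gained when n_met methionines are replaced by selenomethionine."""
    return n_met * (MASS_SE - MASS_S)

# 12 methionines remain per subunit after cleavage of Met1.
shift = semet_mass_shift(12)
print(f"expected shift: {shift:.1f} Da")
```

Under these assumptions the computed shift lies within the ± 5 Da window of the measured value.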
Table 1. Data collection statistics. All crystals belong to spacegroup P3121. The unit cell dimensions of the SeMet-labeled crystal were a = b = 150.3 Å and c = 195.8 Å. Those of the wild-type BAL crystal were a = b = 154.7 Å and c = 200.7 Å. The corresponding packing parameters were 2.69 Å3/Da and 2.92 Å3/Da, respectively. Values in parentheses refer to the highest resolution shells, which comprised 2.70–2.58 Å in all data sets.
For the MAD data sets Friedel pairs are treated as independent reflections.
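The packing parameters in the caption of Table 1 can be checked from the cell dimensions. The sketch below assumes six asymmetric units per cell (the number of symmetry operations in P3121) and uses the wild-type tetramer mass for both crystals, neglecting the His-tags and the selenium atoms, so the results are only expected to come out close to the reported values.

```python
import math

def matthews_coefficient(a: float, c: float, mass_per_asu: float, n_asu: int = 6) -> float:
    """Packing parameter V_M in Å^3/Da for a trigonal cell with a = b and gamma = 120 degrees."""
    cell_volume = a * a * c * math.sin(math.radians(120.0))
    return cell_volume / (n_asu * mass_per_asu)

# One tetramer per asymmetric unit; His-tags and Se atoms neglected (assumption).
TETRAMER_MASS = 4 * 58919  # Da

vm_semet = matthews_coefficient(150.3, 195.8, TETRAMER_MASS)
vm_wild_type = matthews_coefficient(154.7, 200.7, TETRAMER_MASS)
print(f"SeMet: {vm_semet:.2f} Å^3/Da, wild-type: {vm_wild_type:.2f} Å^3/Da")
```

The larger value for the wild-type crystal reflects its longer cell axes at the same protein content.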
The SeMet diffraction data contained a good anomalous signal to 3.0 Å resolution. Among the expected 4 × 12 selenium sites, 4 × 11 were found and used for the initial phasing. The 4 × 1 missing SeMet positions were located in the mobile and therefore invisible C-terminal ends. After phase improvement, a model of SeMet-labeled BAL was built and refined to 2.6 Å resolution. This model served as a template for the wild-type BAL structure, which was determined by molecular replacement.
The structure of wild-type BAL was refined to 2.6 Å resolution resulting in a model closely similar to that of SeMet-BAL. It included residues 2–555 of each subunit as well as four molecules of ThDP and four Mg2+ ions. The eight C-terminal residues were disordered in both structures. Data collection and refinement statistics are given in Tables 1 and 2. The crystals of SeMet-labelled BAL and wild-type BAL grew under almost identical conditions and showed the same packing scheme but quite different unit cell axes. Since the B-factors were lower and the refinement results better for the wild-type crystals than for SeMet-labelled crystals, we refer in the following to the wild-type structure (Fig. 2).
Table 2. Refinement statistics. Column: SeMet-labeled BAL peak data set. Rows: resolution range [Å]; structured peptide (all four subunits); average B-factors [Å2] (mainchain/total); r.m.s.d. bond lengths [Å]/angles [°]; Ramachandran angles in favoured region [%]; Ramachandran angles in allowed region [%].
The BAL homotetramer has an overall size of approximately 95 × 95 × 75 Å3. No significant structural differences were found between the four crystallographically independent subunits of the tetramer (Fig. 3). Each subunit consists of the three domains Dom-α, Dom-β and Dom-γ (Fig. 2), named as in previous annotations. All three domains consist of a central six-stranded parallel β-sheet flanked by a varying number of α-helices. Residues involved in binding of the cofactor ThDP are located at the C-terminal ends of the β-strands of Dom-γ (diphosphates and Mg2+) and of Dom-α′ of a neighbouring subunit (pyrimidine moiety). The active centre is defined by the thiazolium ring of ThDP, which sits in a deep pocket opening to the outer surface of the tetramer.
The four subunits A, B, C and D form the two tight dimers A–B and C–D around the molecular axis P (Fig. 3), in which each subunit buries a solvent-accessible surface area of 3270 Å2. The two tight dimers are associated much less tightly around the molecular axes Q and R to form a D2-symmetric homotetramer. These secondary interfaces bury 1790 Å2 per subunit. The tight contact is formed by Dom-α and Dom-γ of subunit A with their counterparts in subunit B. It is stabilized by a large number of hydrogen bonds. The weaker contact results from an association of Dom-α and Dom-β of subunit A with the respective domains of subunit D. It contains only few hydrogen bonds. A large cavity lined by the four Dom-α is located at the centre of the tetramer. It contains a considerable number of crystallographic water molecules and is not connected to the active centre pocket.
To detect possible conformational changes of the tetramer, we compared the wild-type and the SeMet-labelled structures of BAL. A chainfold superposition of the four central Dom-α showed good agreement in these domains but a radial contraction bringing the outer Dom-β and Dom-γ of SeMet-BAL up to 1.4 Å closer to the centre when compared with the wild-type. Moreover, the shrinkage of SeMet-BAL involves a 0.5 Å approach of Dom-γ (fixing the diphosphate of ThDP) towards Dom-α′ (binding the pyrimidine moiety), which may affect the thiazolium ring suspended between the two fixed points. The observed contraction reveals possible domain rearrangements and agrees with the crystal unit cell changes stated in Table 1. According to the crystallization conditions, the contraction seems to be caused by an increase of the PEG 200 concentration from 50% to 55%, which removes water from the protein.
Comparison with related proteins
To find related proteins in the Protein Data Bank, we performed a general search using the program DALI. This search identified a number of closely related structures, all of which were ThDP-dependent enzymes involved in important metabolic pathways. The Z-scores ranged from 39.6 to 29.6, indicating close relationships (Table 3). The related proteins are acetolactate synthase (ALS), acetohydroxy acid synthase (AHAS), indolepyruvate decarboxylase (IPDC), benzoylformate decarboxylase (BFD), carboxyethylarginine synthase (CEAS), pyruvate oxidase (POX) and pyruvate decarboxylase (PDC) [14,15]. The overall sequence identity among these seven enzymes ranges from 19% to 29% with an average of 24%. A comparison of the relative domain positions in the D2-symmetric tetramer showed in general a good equivalence with deviations around 2 Å. All enzymes are especially similar with respect to the tight dimer formed by Dom-α and Dom-γ. Given the high DALI scores of Table 3 and the drastic drop to the next lower score, these enzymes form a separate subset among the ThDP-dependent enzymes, which we name ‘POX group’ after the first established structure.
Table 3. Superposition of BAL with related proteins using DALI.
a In all cases we used subunit A of the PDB file except for BAL where we used subunit D. b BAL contains 563 residues total. c The next lower Z-score was 16.7 indicating that the eight enzymes form a separated, closely related group. The ThDP-dependent transketolase showed a Z-score of 13.1. d PDB file 1OZF of ALS yields the same values. e The PDC structure is that of Zymomonas mobilis.
Within this group the enzymes BFD and PDC are best characterized with respect to their function and therefore most relevant in organic synthesis. A structure-based sequence alignment of BAL with BFD and PDC is shown in Fig. 4. The alignment assigns the residue equivalences at the active centres and it presents the sequence of BAL in relation to its secondary structures. AHAS and POX contain FAD as a further cofactor besides ThDP which, however, plays merely a secondary role (Table 3). The FAD of POX accepts two electrons from the substrate and transfers them to dioxygen, whereas the FAD of AHAS is only required for structural integrity indicating that it is a relic of evolutionary development.
The POX group shows very similar binding locations for ThDP, which also correspond to those of other ThDP-dependent enzymes. In all enzymes ThDP assumes a V-conformation resulting in a close approach between the C2 atom of the thiazolium ring and the N4′ atom of the pyrimidine moiety. A superposition of the cofactors is depicted in Fig. 5, revealing a remarkable conformational similarity. The diphosphates are tightly bound to the polypeptide of Dom-γ using Mg2+ as a mediator. The Mg2+ ion is octahedrally coordinated to the sidechains of Asp448 and Asn475, to the backbone carbonyl of Ser477, to the diphosphate as well as to a water molecule (Fig. 6). This binding motif is present in all ThDP-dependent enzymes. In evolutionary terms the diphosphate-binding site is the most important fixed point of ThDP because it is best conserved, as demonstrated by the diphosphate-binding sequence fingerprint G-D-G-X24-N-N that was detected long before any structure was known. At the other end of ThDP, the pyrimidine is well fixed in Dom-α′ of another subunit: its N1′ atom forms a strong hydrogen bond to a glutamic acid (Glu50 in BAL). A comparison of the relative B-factors along the ThDP molecules shows that the thiazolium ring and the ethylene bridge generally have the highest mobility, which corresponds to the largest positional differences of these parts observed in Fig. 5.
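The sequence fingerprint G-D-G-X24-N-N mentioned above can be expressed as a simple pattern search. The following is a minimal sketch, not the method used in the paper. The test sequence is entirely hypothetical and only mimics the spacing, with the motif placed so that its Asp and second Asn fall on the residue numbers 448 and 475 named in the text.

```python
import re

# Fingerprint as quoted in the text: Gly-Asp-Gly, any 24 residues, then Asn-Asn.
THDP_MOTIF = re.compile(r"GDG.{24}NN")

def find_thdp_motif(sequence: str, first_residue_number: int = 1):
    """Return (Asp number, second Asn number) for each non-overlapping motif match."""
    hits = []
    for match in THDP_MOTIF.finditer(sequence):
        asp = first_residue_number + match.start() + 1       # D follows the first G
        second_asn = first_residue_number + match.end() - 1  # last residue of the match
        hits.append((asp, second_asn))
    return hits

# Hypothetical placeholder sequence (NOT the BAL sequence): filler residues
# around one motif whose first Gly sits at residue 447.
toy_sequence = "A" * 446 + "GDG" + "S" * 24 + "NN" + "A" * 10
print(find_thdp_motif(toy_sequence))
```

A spacer of exactly 24 residues separates the Asp from the second Asn by 27 positions, which is the distance between Asp448 and Asn475.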
Active centre and reaction geometry
While the overall polypeptide architecture as well as the binding mode of ThDP are quite similar within the POX group, the active centre is not well conserved. The active centre pocket of BAL is lined by nonpolar aliphatic and aromatic residues but only a few polar ones. In this respect, BAL is most closely related to BFD. In both crystal structures of BAL a water molecule was identified at a distance of about 3.6 Å from the C2 atom of ThDP. This water molecule forms hydrogen bonds with Gln113 and His29, among which Gln113 is known to play an important role in catalysis.
The structures of all group members show ThDP in the V-conformation that brings the C2 atom of thiazolium in close proximity to the N4′ atom of the pyrimidine moiety. One of the reported structures of ALS contains an inhibitor that fixes the C2 to N4′ approach through a covalent bond as shown in Fig. 5. Moreover, all group members have a glutamic acid forming a short hydrogen bond to the N1′ atom of the pyrimidine ring, which was suggested to induce the 1′,4′-imino tautomer. The actual presence of this tautomer was later demonstrated [26,27]. As shown in Fig. 6, the imino group is hydrogen bonded to the carbonyl of Gly419 so that its lone electron pair points to the C2 atom of ThDP. It is therefore most likely that the catalytic cycle starts by transferring a proton from C2 to the imine. The resulting C2 carbanion may then attack the carbonyl carbon of the substrate, yielding a covalent ThDP-substrate intermediate.
During acyloin cleavage, the next step is supposedly the deprotonation of the hydroxyl by His29 followed by the dissociation of the first aldehyde. The remaining activated aldehyde is then protonated and also released. The protonation is probably performed by the water attached to His29. During acyloin synthesis, on the other hand, the intermediate is an activated aldehyde that is going to attack an acceptor aldehyde suitably positioned in the active centre. Again, His29 is likely to participate in the reaction by forming a hydrogen bond to the oxygen of the acceptor aldehyde, which is eventually converted to a hydroxyl group of the condensation product by deprotonating His29. Proton handling by His29-Nδ1 is facilitated by the contact of the Nε2 atom to the bulk solvent (Note).
All steps seem to involve small displacements of the thiazolium ring, which are possible because this ring is relatively mobile (Fig. 5). It should be noted that ThDP is suspended between Dom-γ and Dom-α′, which in a direct comparison between the wild-type and the SeMet structures of BAL underwent a relative displacement of 0.5 Å. It is therefore conceivable that domain motions affect the positional freedom of thiazolium and thus catalysis. Since such domain displacements may be caused by Brownian motion, it is further possible that the enzymes channel thermal energy into the chemical reaction.
BAL shows a general preference for nonpolar substrates [8,28] and is highly stereospecific with respect to benzoin, cleaving only R-benzoin out of a racemic mixture. Moreover, BAL reacts with benzaldehyde and acetaldehyde to yield (R)-2-hydroxypropiophenone, in contrast to BFD, which uses the same educts to produce the S-enantiomer. In order to explore the geometry of the reaction catalyzed by BAL, R-benzoin was modeled into its active centre (Fig. 7). The resulting model accounts for a nucleophilic attack from the deprotonated C2 atom of the thiazolium ring under the expected Bürgi-Dunitz angle of 103° ± 3° onto the carbonyl carbon of R-benzoin. Fulfilling this restraint, the substrate is uniquely defined with respect to general location and conformation because all alternatives met severe steric obstacles. In contrast to R-benzoin, any model of the S-enantiomer gave rise to major steric clashes, which explains the stereospecificity of BAL. In the resulting R-benzoin model the hydroxyl is located at the position of the water molecule observed in both crystal structures of BAL (Figs 6 and 7) as well as in the crystal structure of BFD. We suggest that this applies to all acyloin cleavage reactions of BAL. During acyloin C–C bond formation, on the other hand, this water site accommodates the oxygen of the acceptor aldehyde.
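The Bürgi-Dunitz restraint used for the model is the angle between the attacking nucleophile, the carbonyl carbon and the carbonyl oxygen. The sketch below shows how such a restraint could be checked for a docked pose. The coordinates are hypothetical and constructed only to illustrate the geometry; they are not taken from the deposited structures, and the function names are illustrative.

```python
import math

def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(x * x for x in v2))
    cosine = max(-1.0, min(1.0, dot / (norm1 * norm2)))  # guard against rounding
    return math.degrees(math.acos(cosine))

def within_burgi_dunitz(nucleophile, carbonyl_c, carbonyl_o, target=103.0, tol=3.0):
    """True if the nucleophile-C=O angle lies within target ± tol degrees."""
    return abs(angle_deg(nucleophile, carbonyl_c, carbonyl_o) - target) <= tol

# Hypothetical geometry: carbonyl C at the origin, O along x (1.23 Å),
# nucleophile (standing in for ThDP C2) 3.0 Å away at 103 degrees from the C=O bond.
carbonyl_c = (0.0, 0.0, 0.0)
carbonyl_o = (1.23, 0.0, 0.0)
theta = math.radians(103.0)
thdp_c2 = (3.0 * math.cos(theta), 3.0 * math.sin(theta), 0.0)

print(round(angle_deg(thdp_c2, carbonyl_c, carbonyl_o), 1))
print(within_burgi_dunitz(thdp_c2, carbonyl_c, carbonyl_o))
```

A pose approaching perpendicular to the C=O bond (90°) would fall outside the ± 3° window and be rejected by this check.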
All residues lining the active centre pocket are depicted in Fig. 7. A comparison with the functionally well-established enzymes BFD and PDC shows almost no conservation (Fig. 4). However, Ala480 and Phe484 are conserved between BAL and BFD. These residues were therefore mutated, resulting in decreased activity, as expected from their location within the active centre. In BAL, the chain around Phe484 is quite mobile with B-factors about 20 Å2 higher than average so that this side chain may close down on a bound substrate, performing an induced-fit motion. Such a side chain displacement is supported by a comparison with BFD, where Phe484 points into the active centre as shown in Fig. 7.
The established structure of BAL invites further efforts to identify the roles of the various catalytic residues through mutational and structural studies combined with enzyme kinetic measurements. This knowledge, together with designed mutations, is likely to expand the range of organic compounds that can be produced enzymatically.
Experimental procedures
Expression and purification
Wild-type BAL with a C-terminal His-tag (Fig. 4) was obtained from Escherichia coli SG13009 cells following a previously described procedure. Cells were grown at 37 °C and expression of BAL was induced with isopropyl-β-D-thiogalactopyranoside (IPTG). After cell lysis, the supernatant was applied to a Ni-chelate column. The enzyme was further purified on a gel permeation column (Superdex 200, Amersham-Pharmacia, Freiburg, Germany) using buffer A (25 mM Hepes pH 6.9, 200 mM NaCl, 2.5 mM MgCl2, 0.1 mM ThDP and 2 mM dithiothreitol). BAL-containing fractions were identified by SDS/PAGE, pooled and adjusted to a concentration of 20 mg·mL−1. The typical yield was 8 mg protein·g−1 cell pellet. SeMet-labelled BAL was obtained by introducing the expression vector into the methionine-auxotrophic E. coli strain B834(DE3). Cells were cultured in LeMaster medium containing 25 mg·L−1 seleno-L-methionine (Acros). Cultivation and purification procedures were the same as for wild-type BAL. The yield of purified SeMet-labelled BAL was about 6 mg·g−1 cell pellet. Full incorporation of SeMet was verified by ESI-MS.
Crystallization and data collection
The purified protein (wild-type and SeMet) was dialyzed for 12 h against buffer B (5 mM Hepes pH 6.9, 10 mM NaCl, 2 mM MgCl2 and 2 mM dithiothreitol). The solution was then adjusted to a concentration of 12 mg·mL−1 and crystallised by the hanging drop vapour diffusion method at 20 °C. The drops consisted of 1.8 µL protein in buffer B, 0.2 µL of an Agarose-LM solution (3%, 37 °C; Hampton Research, Aliso Viejo, CA, USA) and 2 µL buffer C (50% (v/v) PEG 200 for wild-type BAL or 55% (v/v) PEG 200 for SeMet-labelled BAL, 100 mM Mes pH 6.85), which was also used as the reservoir. The crystals appeared after about 3 days and reached maximum sizes of 300 × 80 × 80 µm3 a week later. All crystals belonged to spacegroup P3121. They were flash-frozen in liquid nitrogen without further addition of a cryo-protectant. Data collection of the wild-type crystals was carried out at beamline PX of the Swiss Light Source (Villigen, Switzerland). MAD data were collected from a single SeMet crystal using beamline BW7A at the EMBL outstation (DESY Hamburg). All data were processed and scaled with the program XDS.
Structure determination and refinement
Using the MAD data sets, the positions of the selenium atoms were established with SOLVE. The selenium sites were refined and used for initial phasing with SHARP. Density modification and initial model building were carried out using RESOLVE. The model was manually completed with XFIT and subsequently refined with the Anneal and Minimize options of CNS followed by a restrained refinement with REFMAC. Water molecules were introduced using ARP/wARP. They were confirmed wherever the (2Fo-Fc)-map showed a density above 0.8 σ and the environment allowed the formation of hydrogen bonds. The procedure resulted in about 0.2 water molecules per residue. The refinement was completed with 10 cycles of TLS/REFMAC specifying each subunit of the tetramer as a TLS group. Non-crystallographic symmetry restraints were used throughout the refinement. Subsequently, the structure of wild-type BAL was established by molecular replacement using MOLREP. It was refined in the same way starting from the model of SeMet-labelled BAL. Both structures were evaluated with PROCHECK and RAMPAGE. Model building of R-benzoin in complex with BAL was performed by manually docking the substrate into the active centre, followed by energy minimization using the Anneal and Minimize options of CNS. For the structure similarity search we used DALI. It should be noted that the general search with DALI failed to find the enzymes ALS and IPDC in the Protein Data Bank. Structural superpositions were performed with LSQMAN. Figures were produced with POVScript+ and POV-Ray (http://www.povray.org). The coordinates and structure factors have been deposited in the Protein Data Bank under accession codes 2AG0 and 2AG1.
We thank Martina Pohl for kindly providing us with the gene of the benzaldehyde lyase and for helpful discussions, the teams of the Swiss Light Source (Villigen/CH) and of the EMBL outstation Hamburg for their help with data collection and J. Wörth for the ESI-MS measurements. Further, we thank M. J. McLeish for sending us a preprint of his paper (details in Note). The project was supported by the Deutsche Forschungsgemeinschaft under grants SFB-380 and SFB-388.