Crystal structure of a family GT4 glycosyltransferase from Bacillus anthracis ORF BA1558



The synthesis of glycosidic bonds in living organisms is performed by glycosyltransferase enzymes (GTs) that harness activated sugar donor species. Typically, the activated sugar is a nucleotide- or lipid-phosphate-linked species. The importance of di-, oligo- and polysaccharides and glycoconjugates to living organisms is reflected in the observation that, typically, 1–2% of most genomes are dedicated to the synthesis or degradation of these compounds.1 To date (April 2008), over 34,000 open reading frames (ORFs) that encode GTs have been identified. Only a small number of GT 3D structures are available and, furthermore, it is estimated that at least 95% of these ORFs encode a protein whose function and specificity are not known. To aid functional analysis, Henrissat and coworkers have classified all known GT ORFs into families according to amino-acid sequence similarities;2, 3 there are currently 90 GT families, with details available from the CAZy website. A feature of the family classification is that, because sequence reflects structure, the catalytic apparatus and its organization are highly conserved within a family, and so is the stereochemistry of catalysis.

Bacillus anthracis, the causative agent of anthrax, is known to encode 45 glycosyltransferases with homology to known GTs (statistics for the Ames strain4). The two most populated families, as is typical for bacterial genomes, are families GT2 and GT4, which encode inverting and retaining GTs, respectively (reviewed in Refs. 3 and 5). These two GT families are also the most populated in the GT classification, with over 9000 and 7000 members, respectively. Furthermore, enzymes from these families carry out a vast array of different glycosyltransfer reactions, with an array of different sugar donors and myriad different acceptors. To gain better functional and structural insight into the GT families, perhaps informing a future subfamily analysis, we embarked on the structure determination of the B. anthracis str. Ames family GT4 enzyme designated ORF BA1558 (hereafter designated BaGT4BA1558). A specific goal of the analysis was to study a GT4 member with significant sequence divergence from the GT4 enzymes of known 3D structure. To date, five GT4 structures have been reported: the E. coli lipopolysaccharide α-glucosyltransferase WaaG,6 the Streptomyces viridochromogenes avilamycin eurekanate-precursor glycosyltransferase AviGT4,6 PimA, the Mycobacterium smegmatis α-mannosyltransferase involved in phosphatidylinositol mannoside synthesis,7 MshA, the Corynebacterium glutamicum glycosyltransferase involved in mycothiol biosynthesis (PDB code 3C48, unpublished), and wbaZ-1, a GT of unknown function from Archaeoglobus fulgidus (PDB code 2f9f, Northeast Structural Genomics Consortium, unpublished). BaGT4BA1558 shares no more than 23% sequence identity (with PimA) with any of these GT4s of known 3D structure. Here, we report the 3D structure of BaGT4BA1558, solved using Se-Met SAD methods at 3.1 Å resolution, harnessing the power of 12-fold averaging to allow structural interpretation.


The gene encoding BaGT4BA1558 was amplified by the polymerase chain reaction (PCR) from Bacillus anthracis genomic DNA using primers that correspond to the predicted 5′ and 3′ ends and incorporate 5′ overhangs designed to insert the gene into the pETYSBLIC vector.8 The recombinant plasmid pETYSBLIC-BaGT4BA1558 was introduced into BL21(RIPL) E. coli cells, which were cultured in 0.5 L of Overnight Express Autoinduction System 2 for Se-Met labeling media (Novagen) supplemented with 50 mg L−1 kanamycin, 34 mg L−1 chloramphenicol and 50 mg L−1 streptomycin at 37°C for 8 h. Protein expression was induced overnight at 25°C. Cells were harvested, resuspended in 20 mM HEPES pH 7.2, 400 mM NaCl and lysed by sonication. The resulting supernatant was applied to a 5 mL HisTrap column (GE Healthcare), from which the protein was eluted with an imidazole gradient. Pure protein was buffer exchanged into 20 mM HEPES pH 7.5, 150 mM NaCl, 5 mM DTT and concentrated to 16 mg mL−1 for crystallization. BaGT4BA1558 was crystallized from 0.1 M HEPES pH 7.5, 0.2 M Na2SO4, 14% PEG 3350. Crystals were cryo-protected in the mother liquor with the addition of 25% glycerol and flash frozen prior to data collection.

A single-wavelength anomalous dispersion (SAD) data set was collected at the peak wavelength from a single Se-Met BaGT4BA1558 crystal on ID14.4 at the ESRF to 3.1 Å resolution. The data were processed with MOSFLM and reduced with SCALA from the CCP4 suite.9 The SHELXD program10 found 112 potential Se sites using anomalous difference data to 4.5 Å resolution. Heavy atom phasing and refinement were performed using MLPHARE,9 and the best 62 Se sites were used for phasing the data up to 4.5 Å resolution. RESOLVE11 was used to extend the phases to 3.1 Å, incorporating 12-fold NCS averaging with operators derived from the Se sites. The resultant map was readily interpretable, aided by the Se positions, and was used to build the model with QUANTA (Accelrys, San Diego, CA). The model was subsequently extended to 12 copies in the asymmetric unit, corresponding to three tetramers, using molecular replacement as implemented in MOLREP.12 Subsequent model building used COOT13 with NCS-restrained, simulated annealing/torsional-angle dynamic refinement in CNS.14 Data collection and refinement statistics are given in Table I. Structure figures were drawn with MOLSCRIPT15 and BOBSCRIPT16 and rendered with RASTER3D.16
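Why 12-fold NCS averaging helps at 3.1 Å can be illustrated with a toy calculation. The sketch below is not the RESOLVE procedure: it simply assumes that each of the 12 copies carries the same underlying density plus independent Gaussian noise, in which case averaging reduces the noise by roughly the square root of the number of copies. The function name, the sine-wave "density" and the noise level are all hypothetical choices made for illustration.

```python
import math
import random


def averaged_noise_sd(n_copies, n_points=5000, sigma=1.0, seed=0):
    """Return the SD of the residual noise after averaging n_copies
    independent noisy copies of the same toy density values."""
    rng = random.Random(seed)
    # Toy "true" density, identical in every NCS copy by construction
    # (an assumption; real copies differ slightly and need operators/masks).
    true_density = [math.sin(0.01 * i) for i in range(n_points)]
    residuals = []
    for rho in true_density:
        # Each copy sees the same signal plus independent Gaussian noise.
        copies = [rho + rng.gauss(0.0, sigma) for _ in range(n_copies)]
        averaged = sum(copies) / n_copies
        residuals.append(averaged - rho)
    mean = sum(residuals) / n_points
    return math.sqrt(sum((r - mean) ** 2 for r in residuals) / n_points)


single = averaged_noise_sd(1)
twelve = averaged_noise_sd(12)
ratio = single / twelve
print(f"noise SD, 1 copy:              {single:.3f}")
print(f"noise SD, 12 copies averaged:  {twelve:.3f}")
print(f"improvement: {ratio:.2f} (ideal sqrt(12) = {math.sqrt(12):.2f})")
```

In a real map the errors in the copies are correlated through the shared phases, so the gain from averaging is smaller than this idealized factor.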

Table I. Data Collection and Refinement Statistics

Data processing
 Space group                        P21
 Unit cell lengths (Å)              a = 134.6, b = 204.7, c = 135.2
 Unit cell angles (°)               α = γ = 90, β = 115.5
 Molecules in asymmetric unit       12
 Resolution range (outer shell) (Å) 78.33–3.10 (3.27–3.10)
 Rmerge [a]                         0.108 (0.87)
 〈I/σ(I)〉 [a]                       17.8 (2.1)
 Completeness (%) [a]               98.5 (98.0)
 Redundancy [a]                     7.5 (7.0)
Refinement statistics
 Resolution range (Å)               20.00–3.10
 No. protein atoms                  33,924
 r.m.s.d. bonds (Å)                 0.018
 r.m.s.d. angles (°)                1.905
 r.m.s.d. NCS main-chain (Å)        0.003
 Mean B value, protein atoms (Å2)   91
 Ramachandran statistics            92.3% preferred regions, 7.4% allowed regions, 0.28% outlier (Gly304)

 [a] Numbers in parentheses correspond to the high-resolution outer shell.
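As a plausibility check on the 12 copies per asymmetric unit, the cell parameters in Table I can be turned into a unit cell volume and a rough Matthews coefficient. This is a minimal sketch: the molecular weight is not given in the text and is assumed here to be about 110 Da per residue over 373 residues (ignoring any affinity tag), and the 1.23 factor in the solvent estimate is the conventional approximation for proteins.

```python
import math

# Cell parameters and copy number taken from Table I.
a, b, c = 134.6, 204.7, 135.2      # Å
beta_deg = 115.5                   # monoclinic angle, degrees
copies_per_asu = 12
asu_per_cell = 2                   # P2(1) has two symmetry operators

# ASSUMPTION (not from the paper): ~110 Da per residue, 373 residues.
assumed_mw_da = 373 * 110.0


def monoclinic_volume(a, b, c, beta_deg):
    """Unit cell volume for a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(volume, mw_da, n_molecules):
    """Crystal volume per unit of protein mass, in Å^3 per Da."""
    return volume / (mw_da * n_molecules)


def solvent_fraction(vm):
    """Conventional estimate of solvent content from the Matthews coefficient."""
    return 1.0 - 1.23 / vm


volume = monoclinic_volume(a, b, c, beta_deg)
vm = matthews_coefficient(volume, assumed_mw_da, copies_per_asu * asu_per_cell)
solvent = solvent_fraction(vm)

print(f"unit cell volume: {volume:.3e} Å^3")
print(f"Matthews coefficient: {vm:.2f} Å^3/Da")
print(f"estimated solvent fraction: {solvent:.2f}")
```

Because the molecular weight is only an estimate, the resulting values should be read as order-of-magnitude support for the reported packing rather than as refined numbers.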


The electron density map resulting from phase extension and 12-fold NCS averaging allowed the BaGT4BA1558 model to be traced from Lys2 to Val373, with the exception of several short exposed regions (Ser12-Val13, Phe43-Asn46, Gln61-Val64, and Glu196-Glu198). BaGT4BA1558 adopts the typical GT-B fold of glycosyltransferases,3 which consists of two “Rossmann-like” β/α/β domains separated by a deep crevice in the interdomain region and a kinked C-terminal α-helix that crosses over from the C-terminal domain to contact the N-terminal domain [Fig. 1(a)]. The N-terminal domain (Lys2-Phe174) is composed of a six-stranded twisted β-sheet flanked on both sides by a total of six α-helices, while the C-terminal domain (Ile175-Val373) comprises five β-strands surrounded by four α-helices. The BaGT4BA1558 crystals contain 12 copies of the protein molecule in the asymmetric unit, arranged as three tetramers [Fig. 1(b)]. Each tetramer has approximately 222 point group symmetry with two distinct interfaces: the A–C interface, with a buried surface area of about 1400 Å2, and the B–D interface, with a contact area of about 1420 Å2. There are no significant contacts between subunits A–B or C–D. The intersection of the four subunits in the tetramer is mediated by several H-bonding interactions and salt bridges and forms a large, solvent-exposed central cavity.

Figure 1.

The 3D structure of BaGT4BA1558. (a) Cartoon representation of the structure of BaGT4BA1558 color-ramped from N (blue) to C (red) terminus. (b) BaGT4BA1558 tetramer colored by chain. (c) Fragment of the electron density map obtained after density modification (12-fold NCS averaging) and phase extension to 3.1 Å. (d) Wall-eyed stereo overlay of the active centers of BaGT4BA1558, PimA and WaaG.

As anticipated from the sequence similarity, a structural search performed with the SSM server17 revealed that the closest structural neighbors to BaGT4BA1558 are the family GT-4 representatives PimA (Z-score 9.0, r.m.s.d. 2.13 Å over 295 Cα atoms), AviGT4 (Z-score 9.9, r.m.s.d. 2.17 Å over 273 Cα atoms), and WaaG (Z-score 8.0, r.m.s.d. 2.66 Å over 303 Cα atoms), followed by the family GT-5 Pyrococcus abyssi glycogen synthase (Z-score 6.6, r.m.s.d. 2.85 Å over 324 Cα atoms) and the family GT-20 E. coli trehalose 6-phosphate synthase (Z-score 8.0, r.m.s.d. 2.61 Å over 313 Cα atoms). These latter two hits are also retaining GTs.
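The r.m.s.d. values quoted above measure the average displacement between matched Cα atoms once two structures have been superposed. The following sketch shows only that final step and assumes the superposition and residue matching (done by SSM in the text) have already been carried out; the coordinates are hypothetical and are not taken from any of the structures discussed.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) coordinates that are assumed to be already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical Cα traces (Å): the second is the first shifted by 2 Å along z.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0), (7.6, 0.0, 2.0)]

value = rmsd(reference, shifted)
print(f"r.m.s.d. = {value:.2f} Å over {len(reference)} Cα atoms")
```

Because the value depends on how many atoms are matched, r.m.s.d. figures are only comparable when quoted together with the number of aligned Cα atoms, as they are in the text.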

The function of BaGT4BA1558 remains unknown. A comparison of the active site with other members of the GT-4 family shows that BaGT4BA1558 is likely to undergo large conformational changes upon substrate binding, since the interdomain cavity volume (≈4100 Å3) is nearly double that of the ligand-bound forms of WaaG and PimA (calculations carried out using CASTp18). Despite the overall low sequence identities, residues around the active site are conserved across the various members of the GT-4 family, including BaGT4BA1558 [Fig. 1(d)]. Like PimA and WaaG, BaGT4BA1558 provides a hydrophobic cavity that can sandwich the nucleotide heterocyclic moiety; however, because this nucleotide portion is mainly recognized through main-chain interactions, its identity is difficult to predict. The residues that form a generic signature in retaining GTs for the recognition of the phosphate and part of the sugar portions of the donor ligand are also invariant in BaGT4BA1558. Thus, the carboxylate side chain involved in the ribose exo-cycle oxygen binding (in BaGT4BA1558, Glu290), the lysine residue that interacts with the distal phosphate oxygens (Lys211) and the glutamate side chain that recognizes sugar O3 and O4 atoms (Glu282) are all present in BaGT4BA1558. Structural overlaps of BaGT4BA1558 with WaaG and PimA around the nucleotide binding regions show remarkable conformational convergence around the transferable sugar binding sites, which are mainly composed of main-chain amide atoms. In this region the r.m.s.d. among main-chain atoms falls below 0.5 Å (the overall r.m.s.d. among these structures is >2 Å), suggesting that these enzymes have evolved highly similar structural solutions in order to utilize diverse sugar donor substrates. In the active site, however, BaGT4BA1558 is clearly more similar to PimA than to the other GT-4 enzymes.
The residues around the active site are highly conserved in sequence as well as in structure, including BaGT4BA1558 His120, which is the structural equivalent of PimA His118 and is located on the β-face of the transferable sugar [Fig. 1(d)]. Though the mechanism of retaining GTs remains elusive,5 the conservation of this latter structural feature, also seen in AviGT4 and other retaining GTs, suggests a degree of involvement of this residue in the departure of the nucleotide-phosphate leaving group and/or the stabilization of the positive charge that is expected to develop at the anomeric carbon during reaction. It is envisaged that the BaGT4BA1558 structure will aid and inform the future subfamily analysis required to bridge the ever-widening gap between sequence and known function in the glycobiology domain.


GJD is a Royal Society/Wolfson Research Merit award recipient. The Biotechnology and Biological Sciences Research Council (BBSRC) are thanked for funding.