Crystal structure of the Escherichia coli YcdX protein reveals a trinuclear zinc active site§


  • Alexey Teplyakov,

    1. Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute and the National Institute of Standards and Technology, Rockville, Maryland
  • Galina Obmolova,

    1. Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute and the National Institute of Standards and Technology, Rockville, Maryland
  • Pavel P. Khil,

    1. Genetics and Biochemistry Branch, NIDDK, National Institutes of Health, Bethesda, Maryland
  • Andrew J. Howard,

    1. Center for Synchrotron Radiation Research and Instrumentation, Biological, Chemical and Physical Sciences Department, Illinois Institute of Technology, Chicago, Illinois
  • R. Daniel Camerini-Otero,

    1. Genetics and Biochemistry Branch, NIDDK, National Institutes of Health, Bethesda, Maryland
  • Gary L. Gilliland

    Corresponding author
    1. Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute and the National Institute of Standards and Technology, Rockville, Maryland
    • Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute and the National Institute of Standards and Technology, 9600 Gudelsky Drive, Rockville, MD 20850

  • This article is a US Government work and, as such, is in the public domain in the United States of America.

  • Certain commercial materials, instruments, and equipment are identified in this manuscript to specify the experimental procedure as completely as possible. In no case does such identification imply a recommendation or endorsement by the National Institute of Standards and Technology nor does it imply that the materials, instruments, or equipment identified is necessarily the best available for the purpose.

  • § The accepted SI units of concentration, mol/L, and of unified atomic mass unit, u, have been represented by the symbol M and by the symbol Da, respectively, to conform to the conventions of this journal.


The ycdX gene of Escherichia coli encodes a protein of 27 kDa molecular weight that belongs to the PHP superfamily of proteins with diverse functions.1 This superfamily includes several types of DNA polymerases (namely, the α-subunit of bacterial DNA polymerase III, eukaryotic DNA polymerase β, and the X-family of DNA polymerases), histidinol phosphatases, and a number of uncharacterized protein families. In common for all PHP proteins is the presence of four conserved sequence motifs that contain invariant histidine and aspartate residues implicated in metal ion coordination.1 As part of DNA polymerases, the PHP domain was suggested to hydrolyze pyrophosphate and thereby shift the reaction equilibrium toward nucleotide polymerization.1 However, it cannot be ruled out that the PHP domain possesses a nuclease activity, particularly in the repair polymerases of the X-family. No functional information is available for stand-alone proteins that belong to the PHP superfamily. The YcdX protein is one of them.

The crystal structure determination of YcdX was undertaken as part of a structural genomics effort2 to assist with the functional assignment of the protein. The YcdX protein from E. coli was cloned and expressed, and its crystal structure was determined at 1.6 Å resolution.


Abbreviations: TIM, triosephosphate isomerase; PCR, polymerase chain reaction; APS, Advanced Photon Source; CCD, charge-coupled device; SER-CAT, South East Regional Collaborative Access Team.

Results and Discussion.

YcdX has an unusual topology of a β7α7 barrel compared with the more common β8α8 (TIM) barrel. All β-strands are parallel, and their order is consecutive (Fig. 1). The C-terminal helix α8 caps the barrel on the N-terminal side. The deep cleft at the C-terminal side of the barrel contains three metal binding sites ligated to the imidazole and carboxylate groups of the protein (Fig. 2). In the native structure, one of the sites is occupied by Zn2+ (Zn1 in Fig. 2), although zinc was never added to the media. The nature of the ion was established by measuring the X-ray fluorescence at the Zn absorption edge (9662 eV) at a synchrotron source. The other two sites are probably occupied by Na+, which was present in the cryosolution. Crystallization of YcdX in the presence of zinc acetate results in the fully occupied zinc trinuclear cluster as indicated by the anomalous difference Fourier map.
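For readers who want to relate the edge energy quoted above to an X-ray wavelength, the conversion is λ = hc/E. The following is a minimal sketch only; the constant is the rounded product hc in eV·Å, the function name is illustrative, and the 9662 eV value is simply the one given in the text.

```python
# Minimal sketch: convert an X-ray photon energy (eV) to wavelength (Å).
# HC_EV_ANGSTROM is the rounded product of Planck's constant and the
# speed of light; the function name is an illustrative assumption.
HC_EV_ANGSTROM = 12398.42


def energy_to_wavelength(energy_ev: float) -> float:
    """Return the wavelength in Å for a photon energy given in eV."""
    if energy_ev <= 0:
        raise ValueError("photon energy must be positive")
    return HC_EV_ANGSTROM / energy_ev


# Energy quoted in the text for the Zn absorption edge measurement.
zn_edge_ev = 9662.0
zn_edge_wavelength = energy_to_wavelength(zn_edge_ev)
print(f"{zn_edge_ev:.0f} eV corresponds to about {zn_edge_wavelength:.3f} Å")
```

The result, close to 1.28 Å, is the wavelength region in which anomalous scattering from zinc is expected to be strong.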

Figure 1.

Ribbon representation of the polypeptide fold of YcdX. A disordered loop 162–170 is shown in white. The blue spheres indicate zinc ions in the active site.

Figure 2.

Stereoview of the trinuclear zinc cluster in YcdX. Zn1 is the “high-affinity” site occupied by zinc in the native structure. Smaller spheres represent water molecules. Potential hydrogen bonds are shown as dashed lines.

The observed metal cluster reveals a likely catalytic site, which is in agreement with the canonical location of the active site in β/α-barrel enzymes at the C-edge of the closed β-sheet.3 The amino acid conservation pattern also supports this conclusion. All residues involved in metal coordination are invariant in the YcdX family emphasizing their functional importance.

According to the SCOP database,4 cellulases are the only protein family characterized by a seven-stranded β/α-barrel. However, despite the topological similarity, the structure of the active site in cellulases5–7 is completely different from that observed in YcdX. Therefore, a functional relationship between these proteins seems unlikely.

Zinc is one of the most ubiquitous cofactors in the protein world: hundreds of enzymes use zinc for catalysis.8 Despite all differences in the three-dimensional structures and reactions catalyzed, one common theme for many of these enzymes is the activation of a water molecule coordinated to Zn2+ for a nucleophilic attack on the carbonyl carbon of a peptide bond, or the phosphorus atom of a phosphoester bond.

Only four proteins with known structures have a trinuclear zinc catalytic site. All four (nuclease P1, endonuclease IV, alkaline phosphatase, and phospholipase C) hydrolyze the phosphoester bond. This finding suggests a similar activity for YcdX. Furthermore, endonuclease IV is structurally similar to YcdX because it has a TIM-barrel topology.9

By analogy,9–11 the catalytic mechanism of YcdX may proceed through nucleophilic attack on the susceptible phosphorus atom of the putative substrate by the water molecule bridging Zn2 and Zn3, which is presumably a hydroxide ion.12 The resulting pentacoordinate transition state is stabilized by all three metal ions, with the unesterified phosphate oxygen remaining bound to Zn1, and collapse of the transition state inverts the stereochemistry at the phosphorus.

Based on sequence analysis, Aravind and Koonin1 concluded that the nature and distribution of conserved residues in the PHP superfamily are analogous to those in the urease-pyrimidinase superfamily. Although no specific relationship between the two superfamilies was detected, it was suggested that the PHP proteins may have a similar catalytic mechanism and also some version of the TIM barrel fold. The crystal structure of the YcdX protein confirms this prediction.

The YcdX crystals reveal a protein trimer sitting on the crystallographic threefold axis. In another crystal form obtained by cocrystallization with cadmium, the crystal packing is different, and the entire trimer occupies the asymmetric part of the unit cell (A. Teplyakov, G. Obmolova, G. Gilliland, unpublished data). These observations are in agreement with the gel-filtration data, which indicate the trimer as the predominant oligomeric state of the protein in solution. The crystal structure shows that all active sites are accessible in the trimer. The α-helix α3 of each monomer is at the entrance to the active site of another monomer. Trp51 and Asn55 from this helix point to the active site and may play a role in binding a substrate. A sulfate ion located in the native crystal at this site may mark the electrostatically favorable position for the putative phosphate of the substrate. Asn55 is conserved in the YcdX family although not in other PHP proteins. This feature can serve as an indicator of the trimeric structure of the protein characteristic of the YcdX family.

Analysis of the gene expression levels in E. coli under the conditions of genotoxic stress13 indicates that YcdX may be related to DNA repair. YcdX was among the genes significantly induced in response to the DNA damage caused by mitomycin C, as were many of the genes known to be involved in DNA repair. If the YcdX family is functionally related to DNA polymerases, then YcdX may be a stand-alone nuclease or phosphatase in one of the pathways of DNA repair. Further biochemical and biophysical studies are required to reveal the molecular and cellular functions of the PHP proteins. These studies will be facilitated by the three-dimensional structure of YcdX.

Materials and Methods.

Cloning, expression, and purification.

The ycdX gene was PCR amplified from E. coli MG1655 genomic DNA and subcloned into a pDONR201 plasmid using the Gateway Technology (Invitrogen). For expression, the coding sequence was transferred into a pDEST14 plasmid using site-specific recombination (Invitrogen). The protein was produced in E. coli strain BL21 Star (DE3) (Invitrogen) that was transformed with pDEST14. Cells were grown in LB medium containing 100 μg/mL ampicillin at 37°C to an A600 of 0.6 and induced with 1 mM isopropyl β-D-thiogalactoside for 3 h. The protein was purified by column chromatography in two steps using Source 30Q (Pharmacia) and Butyl-650M (Toyopearl).

Crystallization and structure determination.

YcdX crystals were grown by the hanging-drop vapor diffusion method at room temperature from 0.1 M Tris, pH 8.5, 60% ammonium sulfate, and 3% 2-methyl-2,4-pentanediol. A heavy atom derivative was obtained by cocrystallizing the protein in the presence of 2 mM ethylmercury phosphate. The trinuclear zinc complex was obtained by cocrystallizing the protein in the presence of 10 mM zinc acetate. All crystals are isomorphous and belong to the space group P321 with unit cell parameters a = b = 77.2 Å, c = 80.0 Å. There is one protein molecule in the asymmetric unit. For X-ray data collection, the crystals were soaked for a few seconds in the cryoprotectant solution containing 0.1 M Tris, pH 8.5, 60% ammonium sulfate, and 3.6 M sodium formate, and then flash-frozen in liquid propane.
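As a rough consistency check on the statement that there is one molecule in the asymmetric unit, the Matthews coefficient can be estimated from the cell parameters. The sketch below is not part of the original analysis. It assumes six asymmetric units per cell for space group P321, γ = 120° for the trigonal cell, the 27 kDa molecular weight quoted in the introduction, and the conventional factor of 1.23 Å3/Da for the solvent-content estimate; all function names are illustrative.

```python
import math


def cell_volume(a: float, b: float, c: float, gamma_deg: float) -> float:
    """Volume (Å^3) of a cell with alpha = beta = 90 degrees."""
    return a * b * c * math.sin(math.radians(gamma_deg))


def matthews_coefficient(volume: float, z: int, mw_da: float,
                         molecules_per_asu: int = 1) -> float:
    """Matthews coefficient Vm in Å^3/Da."""
    return volume / (z * molecules_per_asu * mw_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent fraction using the conventional 1.23 factor."""
    return 1.0 - 1.23 / vm


# Cell parameters from the text; Z = 6 is the number of general
# positions in P321 (assumption stated in the lead-in).
volume = cell_volume(77.2, 77.2, 80.0, 120.0)
vm = matthews_coefficient(volume, z=6, mw_da=27000.0)
solvent = solvent_fraction(vm)
print(f"V = {volume:.0f} Å^3, Vm = {vm:.2f} Å^3/Da, solvent = {solvent:.0%}")
```

Under these assumptions Vm comes out near 2.5 Å3/Da with roughly half of the cell occupied by solvent, which is within the range typical for protein crystals and is compatible with a single monomer per asymmetric unit.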

The structure was solved by single isomorphous replacement using the Hg derivative. X-ray diffraction data to 2.15 Å resolution for the native protein and Hg derivative were measured on a Bruker rotating anode X-ray generator with a Mar345 image plate detector. High-resolution (1.6 Å) diffraction data for the native protein as well as 2.3 Å data for the zinc complex were collected on the SER-CAT beamline 22-ID at the APS (Argonne, IL) using a Mar CCD detector. All data (Table I) were processed with HKL2000.14 A single mercury site was located by the Shake-and-Bake method.15 Both isomorphous and anomalous differences were used for phasing using MLPHARE/DM.16 The electron density was of excellent quality, and most of the polypeptide chain was automatically traced with RESOLVE.17 This model contained 200 of 234 residues present in the final model. The atomic model was completed by using O18 and refined with REFMAC.19 The N-terminal methionine and residues 162–171 forming a loop between β6 and α6 are not visible in the electron density.

Table I. X-Ray Data and Refinement Statistics
Data set                             Native-II   Zn        Native-I   Hg
Resolution (Å)                       1.57        2.3       2.15       2.15
No. of unique reflections            38,325      12,596    15,201     28,608a
Completeness (%)                     100         99.2      99.7       99.6
Rsym (Σ|I − <I>|/ΣI)                 0.043       0.041     0.068      0.096
<I/σ> (outer shell)                  2.9         12.8      8.3        3.5
Rcryst (Σ||Fo| − |Fc||/Σ|Fo|)        0.178       0.165
Rfree (5% data)                      0.208       0.225
No. of protein atoms                 1,802       1,802
No. of ion atoms                     9           13
No. of water molecules               335         188
RMSD in bonds (Å)                    0.017       0.017
RMSD in angles (°)                   1.5         1.6
RMSD in main-chain B factors (Å2)    2.4         5.3

  a Anomalous pairs not merged.
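
The two residuals defined in Table I can be written out explicitly. The following minimal sketch evaluates them on small made-up numbers, not on data from this study. The function names and input layout are assumptions for illustration, and processing and refinement programs apply scaling and weighting that are not shown here.

```python
from typing import Dict, List, Sequence, Tuple

HKL = Tuple[int, int, int]


def r_sym(observations: Dict[HKL, List[float]]) -> float:
    """Rsym = sum |I - <I>| / sum I over repeated measurements."""
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def r_cryst(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Rcryst = sum ||Fo| - |Fc|| / sum |Fo| over matched reflections."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Hypothetical toy data, chosen only to exercise the formulas.
toy_observations = {(1, 0, 0): [100.0, 110.0], (0, 1, 0): [50.0, 50.0]}
toy_rsym = r_sym(toy_observations)
toy_rcryst = r_cryst([10.0, 20.0], [9.0, 22.0])
print(f"toy Rsym = {toy_rsym:.3f}, toy Rcryst = {toy_rcryst:.3f}")
```

Rfree is computed with the same expression as Rcryst, but only over the test set of reflections (5% of the data here) that is excluded from refinement.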

The atomic coordinates of the native protein and the zinc complex were deposited in the Protein Data Bank under the accession codes 1M65 and 1M68, respectively.


We thank John Chrzas for providing the beamtime and for help in using the SER-CAT beamline at APS. Use of the Advanced Photon Source was supported by the U.S. Department of Energy, Basic Energy Sciences, Office of Science, under Contract No. W-31-109-Eng-38. This work was also supported in part by an award from the W.M. Keck Foundation.