NMR structure of the Escherichia coli protein YacG: A novel sequence motif in the zinc-finger family of proteins


  • Theresa A. Ramelot,

    1. Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington
  • John R. Cort,

    1. Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington
  • Adelinda A. Yee,

    1. Ontario Cancer Institute and Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
  • Anthony Semesi,

    1. Ontario Cancer Institute and Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
  • Aled M. Edwards,

    1. Ontario Cancer Institute and Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
    2. Northeast Structural Genomics Consortium
  • Cheryl H. Arrowsmith,

    1. Ontario Cancer Institute and Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
    2. Northeast Structural Genomics Consortium
  • Michael A. Kennedy

    Corresponding author
    1. Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington
    2. Northeast Structural Genomics Consortium
    • Battelle, Pacific Northwest National Laboratory, P.O. Box 999/MS K8-98, Richland, WA 99352


We have used nuclear magnetic resonance (NMR) spectroscopy to determine the solution structure of YacG (gi|7466984), a small zinc-binding protein encoded by the Escherichia coli yacG gene. YacG is a conserved hypothetical protein (COG 3024) with homologs in various bacterial species (Fig. 1). A structure similarity search using Dali1 did not reveal any structurally similar proteins (all Z scores < 2). However, the protein has the characteristic features of a zinc finger: two antiparallel β-strands, an α-helix, and a tetrahedral Zn2+-binding site. The consensus spacing of the Cys residues (–C–X2–C–X15–C–X3–C–) is invariant among the YacG homologs but is not found in any other zinc-binding protein of known structure.
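A cysteine-spacing consensus of this kind maps directly onto a regular expression. The sketch below is illustrative only: the helper names are hypothetical, and because the YacG sequence is not reproduced here, the demonstration runs on a synthetic poly-alanine string with cysteines placed at positions 9, 12, 28, and 32 (the YacG ligand positions given in the Results and Discussion). The NF spacing quoted in the Discussion is included for comparison.

```python
import re

# Number of arbitrary residues (X) between consecutive cysteines.
YACG_GAPS = (2, 15, 3)      # -C-X2-C-X15-C-X3-C- (YacG homologs)
GATA1_NF_GAPS = (2, 17, 2)  # -C-X2-C-X17-C-X2-C- (N-terminal finger of GATA-1)


def cys_motif_regex(gaps):
    """Build a regex such as C[A-Z]{2}C[A-Z]{15}C[A-Z]{3}C from a gap tuple."""
    return "C" + "".join(f"[A-Z]{{{n}}}C" for n in gaps)


def find_cys_motif(sequence, gaps):
    """Return 1-based cysteine positions for every (possibly overlapping) match."""
    # A zero-width lookahead lets successive matches overlap.
    pattern = re.compile(f"(?=({cys_motif_regex(gaps)}))")
    hits = []
    for match in pattern.finditer(sequence.upper()):
        pos = match.start(1) + 1  # 1-based position of the first Cys
        positions = [pos]
        for n in gaps:
            pos += n + 1
            positions.append(pos)
        hits.append(positions)
    return hits


# Synthetic sequence (NOT the real YacG sequence): 40 Ala with Cys at 9, 12, 28, 32.
demo = ["A"] * 40
for p in (9, 12, 28, 32):
    demo[p - 1] = "C"
demo = "".join(demo)

print(find_cys_motif(demo, YACG_GAPS))
print(find_cys_motif(demo, GATA1_NF_GAPS))
```

With real homolog sequences, the same function would report which entries carry the YacG spacing and which carry the NF spacing.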

Figure 1.

Sequence alignment of YacG (gi|7466984) with seven other bacterial conserved hypothetical proteins: gi|11354705 (Vibrio cholerae, group O1 strain N16961), gi|1074553 (Haemophilus influenzae, strain Rd KW20), gi|15601954 (Pasteurella multocida), gi|11348207 (Pseudomonas aeruginosa, strain PAO1), gi|11353003 (Neisseria meningitidis, group B strain MC58), gi|11352982 (Neisseria meningitidis, group A strain Z2491), and gi|16126579 (Caulobacter crescentus CB15).

Materials and Methods

The yacG gene was PCR-amplified from genomic DNA and cloned into a pET15b vector (Novagen), which encodes YacG with an N-terminal hexahistidine tag followed by a thrombin cleavage site. Protein expression and purification were performed as described by Yee et al.2 NMR samples of 1–3 mM uniformly 15N/13C-labeled YacG were prepared in 450 mM NaCl, 25 mM Na2HPO4, 10 mM DTT, 20 μM Zn2+, 1 mM benzamidine, 1× inhibitor cocktail (Roche Molecular Biochemicals), and 0.01% NaN3 in 10% (v/v) D2O/H2O at pH 6.5. TCEP could not be used in place of DTT because it caused protein aggregation. A blue cobalt derivative was prepared by expressing YacG in minimal medium supplemented with cobalt nitrate; it was studied in the same NMR buffer without zinc.

NMR spectra were recorded at 25°C on Varian Inova spectrometers operating at 600, 750, and 800 MHz. Chemical shifts were referenced to external DSS. Backbone and side-chain assignments were made from the following triple-resonance experiments recorded on 15N/13C-labeled samples: HNCO, HNCACB, CBCA(CO)NNH,3,4 HNHA,5–7 HCCH-TOCSY,8 HCC-TOCSY-NNH,9,10 and CCC-TOCSY-NNH.9,11,12 NOE restraints were obtained from 3D 15N-edited NOESY-HSQC (τm = 150 ms),3,13 CN-NOESY-HSQC (τm = 120 ms),14 and 4D CC-NOESY-HMQC (in D2O, τm = 120 ms)15 spectra. 15N-HSQC spectra3,13 were recorded in H2O and 2 h after the sample had been lyophilized and redissolved in D2O. In the D2O spectrum, nine cross-peaks from the most slowly exchanging amide protons were observed, including those of C9, T11, C12, K14, F27, C28, C32, and Q33. Steady-state heteronuclear 1H–15N NOE values were measured from two 2D spectra: a reference spectrum recorded with a 5-s delay (NONOE) and a spectrum recorded with a 2-s delay followed by 3 s of 1H saturation (NOE).16
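The steady-state NOE for each residue is the ratio of its peak intensity in the saturated (NOE) spectrum to that in the reference (NONOE) spectrum. The sketch below computes that ratio; propagating the uncertainty from the baseline noise of the two spectra is a common convention and an assumption here, not necessarily the procedure the authors used. The intensities in the demonstration are hypothetical.

```python
import math
import statistics


def het_noe(i_sat, i_ref, noise_sat, noise_ref):
    """Steady-state 1H-15N NOE = I(saturated) / I(reference).

    i_sat: peak intensity in the spectrum with 1H saturation ("NOE")
    i_ref: peak intensity in the reference spectrum ("NONOE")
    noise_sat, noise_ref: RMS baseline noise of each spectrum (for error propagation)
    """
    ratio = i_sat / i_ref
    # Relative errors of the two intensities add in quadrature.
    err = abs(ratio) * math.hypot(noise_sat / i_sat, noise_ref / i_ref)
    return ratio, err


def summarize(ratios):
    """Minimum, maximum, mean, and sample standard deviation of NOE values."""
    return min(ratios), max(ratios), statistics.mean(ratios), statistics.stdev(ratios)


# Hypothetical intensities for one residue (arbitrary units).
ratio, err = het_noe(58.0, 100.0, 2.0, 2.0)
print(f"NOE = {ratio:.2f} +/- {err:.2f}")
```

Applying `summarize` to the ratios for residues 4–40 gives the range and mean ± s.d. form used in the Results and Discussion.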

Spectra were processed with Felix (MSI) and analyzed with Sparky (http://www.cgl.ucsf.edu/home/sparky). Backbone amide (HN) assignments were complete except for M1, S2, and E20. Residues 2–4, 20, and 41–65 showed only intraresidue and sequential NOEs. The 1H, 15N, and 13C chemical shifts have been deposited in the BioMagResBank (accession code 5335).

NOE cross-peaks were classified as short (1.8–2.5 Å), medium (1.8–3.5 Å), or long (1.8–5 Å) distance restraints. Pseudoatom corrections of 1 Å for methyl groups and 2.4 Å for unresolved methyl groups and aromatic protons were added to the upper bounds. Fourteen ϕ dihedral angle restraints were derived from the HNHA experiment. A restraint of −100 ± 80° was applied to two residues for which the intraresidue Hα–HN NOE was clearly weaker than the NOE to the preceding Hα. Thirteen ψ restraints were added where preliminary structure calculations clearly indicated α-helical or β-strand secondary structure. Two hydrogen-bond restraints within the β-sheet were added on the basis of cross-strand NOEs. Of the nine slowly exchanging amide protons, only that of C9 had a clearly defined carbonyl hydrogen-bond acceptor, and this hydrogen bond was also restrained. The Zn2+ ion was incorporated into the calculations through 10 restraints that maintain tetrahedral geometry: four Sγ–Zn distance restraints of 2.3–2.4 Å and six Sγ–Sγ restraints of 3.6–3.85 Å.
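The restraint bookkeeping described above can be sketched in a few lines. The bounds and corrections are the values given in the text; the dictionary keys (such as `"unresolved"`) and the atom-label strings are hypothetical and are not XPLOR syntax. The six Sγ–Sγ restraints are simply every pair of the four ligands, and for reference the helper `ideal_ss_distance` uses the geometric fact that in a regular tetrahedron the edge length equals the center-to-vertex distance times √(8/3).

```python
import math
from itertools import combinations

NOE_LOWER = 1.8                                        # Å, common lower bound
NOE_UPPER = {"short": 2.5, "medium": 3.5, "long": 5.0}  # Å, upper bound per class
PSEUDO_CORRECTION = {None: 0.0, "methyl": 1.0, "unresolved": 2.4}  # Å, added to upper bound


def noe_bounds(noe_class, pseudo=None):
    """Return (lower, upper) in Å for an NOE class plus any pseudoatom correction."""
    return NOE_LOWER, NOE_UPPER[noe_class] + PSEUDO_CORRECTION[pseudo]


def zinc_site_restraints(cys, zn_s=(2.3, 2.4), s_s=(3.6, 3.85)):
    """Four Zn-SG restraints plus six SG-SG restraints (one per pair of ligands)."""
    restraints = [("ZN", f"C{r} SG", *zn_s) for r in cys]
    restraints += [(f"C{a} SG", f"C{b} SG", *s_s) for a, b in combinations(cys, 2)]
    return restraints


def ideal_ss_distance(zn_s):
    """S-S edge of a regular tetrahedron whose vertices lie zn_s from the central Zn."""
    return zn_s * math.sqrt(8.0 / 3.0)


for row in zinc_site_restraints((9, 12, 28, 32)):
    print(row)
print(f"ideal S-S for Zn-S = 2.35 Å: {ideal_ss_distance(2.35):.2f} Å")
```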

A total of 367 distance restraints and 29 dihedral angle restraints were used to calculate 40 structures with the XPLOR (version 3.84)17 routines dg_subembed.inp, dg_full_embed.inp, dgsa.inp, and refine_gentle.inp. The 20 structures with the lowest total energy were deposited in the RCSB Protein Data Bank (PDB) with accession code 1LV3. The backbone traces of this ensemble of 20 lowest-energy structures are shown in Figure 2(a), and the structural statistics are compiled in Table I. Ramachandran analysis was performed with PROCHECK-NMR.18
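The restraint counts in the Materials and Methods can be checked against Table I. Assuming that "restraints per residue" is normalized over residues 4–40 (footnote a of Table I), the totals given in the text reproduce the tabulated value of 10.7; this normalization is our inference, and the helper name is hypothetical.

```python
def restraints_per_residue(n_distance, n_dihedral, first_residue, last_residue):
    """Total restraints divided by the number of residues in an inclusive range."""
    n_residues = last_residue - first_residue + 1
    return (n_distance + n_dihedral) / n_residues


n_dihedral = 14 + 2 + 13    # HNHA phi + two -100 +/- 80 deg restraints + psi
n_hbond_restraints = 3 * 2  # two beta-sheet H-bonds + the C9 H-bond, 2 restraints each
n_zinc = 4 + 6              # Zn-SG + SG-SG

print(n_dihedral)                                                # 29
print(n_hbond_restraints, n_zinc)                                # 6 10
print(round(restraints_per_residue(367, n_dihedral, 4, 40), 1))  # 10.7
```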

Figure 2.

a: Backbone atoms (N, Cα, and C′) of the 20 NMR structures of YacG (residues 4–40), optimally superimposed onto the average coordinates of the backbone atoms of residues 6–17 and 30–37. b: Ribbon diagram of a representative YacG structure (all residues), with the Zn2+-coordinating cysteine residues shown. The first 3 and last 25 residues are unstructured and are drawn in an arbitrary conformation.

Table I. Structural Statistics for the Final Ensemble of 20 YacG Structures

Distance restraints
  Sequential (|i–j| = 1)                          120
  Medium range (1 < |i–j| < 5)                     52
  Long range (|i–j| ≥ 5)                           86
  Hydrogen bonds (2 per hydrogen bond)              6
  Zn restraints                                    10
Dihedral restraints                                29
Restraints per residue (a)                       10.7
Distance restraint violations
  Mean number of violations                20.1 ± 2.4
  Mean RMS violation (Å)                0.011 ± 0.002
Dihedral restraint violations
  Mean number of violations                 0.8 ± 0.7
  Mean RMS violation (°)                  0.11 ± 0.15

RMSD from the average coordinates (Å)   All residues (a)   Secondary structure (b)
  Backbone atoms (N, Cα, C′)             0.46 ± 0.30        0.21 ± 0.11
  All heavy atoms                        1.01 ± 0.47        0.77 ± 0.21

Ramachandran statistics (%)             All residues (a)   Secondary structure (b)
  Residues in most favored regions       75.8               95.3
  Residues in additional allowed regions 16.3                4.7
  Residues in generously allowed regions  3.5                0.0
  Residues in disallowed regions          4.4                0.0

(a) Residues 4–40.
(b) Residues in the β-strands, rubredoxin knuckle, and α-helix: 6–17 and 30–37.
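The RMSD values in Table I are measured against the average coordinates of the superimposed ensemble. The sketch below shows one common way to compute such values with Kabsch (SVD) superposition, assuming NumPy and an (M, N, 3) set of matched atom coordinates, for example the backbone N, Cα, and C′ atoms of residues 6–17 and 30–37 extracted from PDB entry 1LV3. The iterative re-averaging scheme and the function names are assumptions, not the authors' documented procedure.

```python
import numpy as np


def kabsch_fit(mobile, target):
    """Optimally rotate and translate `mobile` (N x 3) onto `target` (N x 3)."""
    mob_c = mobile - mobile.mean(axis=0)
    tgt_c = target - target.mean(axis=0)
    u, _, vt = np.linalg.svd(mob_c.T @ tgt_c)
    # Correct for a possible reflection so the result is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return mob_c @ rot.T + target.mean(axis=0)


def rmsd_to_average(ensemble, n_iter=3):
    """Per-model RMSD (input units) from the average of the superimposed ensemble."""
    models = [np.asarray(m, dtype=float) for m in ensemble]
    ref = models[0]
    for _ in range(n_iter):
        # Fit every model to the current reference, then re-average.
        ref = np.mean([kabsch_fit(m, ref) for m in models], axis=0)
    fitted = [kabsch_fit(m, ref) for m in models]
    return np.array([np.sqrt(((f - ref) ** 2).sum(axis=1).mean()) for f in fitted])
```

Reporting `rmsd.mean()` and `rmsd.std()` of the returned array gives numbers in the mean ± s.d. form used in Table I. Note that the atoms used for fitting and for the RMSD are the same here; superimposing on one subset and measuring over another would need a small modification.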

Results and Discussion

The YacG structures contain two β-strands (residues 6–9 and 14–17) followed by a 12-residue loop and an α-helix (residues 30–37) [Fig. 2(b)]. The zinc is coordinated by four cysteines: C9 and C12 in the turn between the β-strands, C28 immediately before the helix, and C32 near the N-terminus of the helix. The ϕ and ψ torsion angles of residues C9–K14 are typical of a “rubredoxin knuckle,” first identified in the iron-binding domains of rubredoxin19 and since found in many zinc fingers containing the sequence C–X2–C–G–X. YacG has this sequence pattern and also contains the two main-chain hydrogen bonds characteristic of this turn,20 between G13 HN and C9 CO and between C9 HN and K14 CO; both were observed in all 20 structures of the ensemble. Hydrogen bonds from the amide protons of T11, C12, and C28 to the C9 side-chain Sγ were observed in many of the calculated structures (Sγ–N < 3.7 Å and Sγ–HN < 2.8 Å); such hydrogen bonds are also typical of rubredoxin knuckles. S29 may be the N-terminal helix-capping residue: in some structures its side chain is hydrogen bonded to the C32 HN. G38 adopts a left-handed (αL) conformation that terminates the helix at its C-terminus. The heteronuclear 1H–15N NOE values range from 0.40 to 0.80 for residues 4–40 (mean 0.58 ± 0.13), whereas the 25 C-terminal residues have heteronuclear NOE values typical of an unstructured tail (data not shown).
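The side-chain hydrogen bonds above were identified by two distance cutoffs. A minimal sketch of that test, applied across an ensemble, is given below; it uses only the two cutoffs quoted in the text (no angle criterion), and the coordinate triples in the usage line are hypothetical.

```python
import math

MAX_SG_N = 3.7   # Å, SG...N cutoff quoted in the text
MAX_SG_HN = 2.8  # Å, SG...HN cutoff quoted in the text


def is_sg_hbond(sg, n, hn, max_sg_n=MAX_SG_N, max_sg_hn=MAX_SG_HN):
    """True if an amide N-H is within both distance cutoffs of a cysteine SG."""
    return math.dist(sg, n) < max_sg_n and math.dist(sg, hn) < max_sg_hn


def hbond_frequency(models):
    """Fraction of ensemble members, each an (sg, n, hn) triple, meeting both cutoffs."""
    return sum(is_sg_hbond(*m) for m in models) / len(models)


# Two hypothetical ensemble members: the first satisfies both cutoffs, the second does not.
models = [((0, 0, 0), (3.5, 0, 0), (2.5, 0, 0)),
          ((0, 0, 0), (3.8, 0, 0), (2.9, 0, 0))]
print(hbond_frequency(models))
```

Counting the fraction of the 20 structures that pass the test is one way to make "observed in many of the calculated structures" quantitative.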

Evidence for Zn2+ coordination comes from solid-state zinc NMR (A.S. Lipton, personal communication): the spectra of lyophilized YacG indicate tetrahedral geometry and support coordination by four sulfur atoms. Further evidence for a tetrahedral, cysteine-coordinated metal site comes from the UV-visible absorption spectrum (250–800 nm) of the cobalt derivative (data not shown). Co–S charge-transfer bands were observed in the UV region (310 and 360 nm), and Co2+ d–d transitions were observed in the visible region (625, 690, and 750 nm). These spectra suggest C4 cysteinate ligation and a tetrahedral Zn2+ geometry similar to that described for the N-terminal zinc finger of murine GATA-1 (NF).21

Although the Dali search did not reveal structural similarity to any protein of known structure, YacG resembles NF22 in its Zn2+ position and secondary-structure architecture. NF is also a C4 zinc-binding protein with two β-strands, an α-helix, and a rubredoxin knuckle between the strands. The similarities extend to a long loop between the second strand and the helix and to the positions of the cysteines within the tertiary structure. However, the spacing between the last three cysteines differs (NF has the sequence –C–X2–C–X17–C–X2–C–, compared with –C–X2–C–X15–C–X3–C– in YacG). Considering only the secondary structural elements (residues 6–9, 14–17, and 30–37 of YacG), the backbone root-mean-square deviation (RMSD) between YacG and NF is about 2.5 Å. This overall structural similarity suggests that YacG might be involved in transcription, either through protein–protein interactions (as for NF) or through DNA binding [as for the C-terminal zinc finger of GATA-1 (CF)]. However, the specific residues important for these activities are not conserved in YacG, which makes functional conclusions difficult. The major differences between NF and YacG are the loop structure, the length of the helix, and the number of residues between the last two coordinating cysteines; together these produce a different orientation of the helix relative to the β-strands. In YacG, the helix is less parallel to the β-strands, so its C-terminus is angled further away from them.
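The difference in helix orientation could be quantified as the angle between the helix axis and a strand direction. The sketch below is a crude approach, assuming ordered Cα coordinates for residues 30–37 (helix) and 6–9 or 14–17 (strands), for example from PDB entry 1LV3; each axis is approximated by the vector between the centroids of the first and last few Cα atoms, which is a simplification of proper axis fitting. The text does not report such an angle, and the function names are hypothetical.

```python
import math


def centroid(points):
    """Mean of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def element_axis(ca, window):
    """Approximate axis of a secondary-structure element from ordered CA coordinates.

    Uses the vector from the centroid of the first `window` CA atoms to the
    centroid of the last `window`; window=4 (about one helical turn) suits a
    helix, window=1 suits a short strand.
    """
    if 2 * window > len(ca):
        raise ValueError("window too large for this element")
    start, end = centroid(ca[:window]), centroid(ca[-window:])
    return tuple(e - s for s, e in zip(start, end))


def interaxial_angle(u, v):
    """Acute angle in degrees between two axes, ignoring their direction."""
    dot = sum(a * b for a, b in zip(u, v))
    cos_angle = min(1.0, abs(dot) / (math.hypot(*u) * math.hypot(*v)))
    return math.degrees(math.acos(cos_angle))
```

Taking the acute angle makes the result independent of whether a strand runs parallel or antiparallel to the chosen reference direction, which matters for a two-stranded antiparallel sheet.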


The authors thank Anna Khachatryan for help with YacG purification. Acquisition and processing of NMR spectra and structure calculations were performed at the Environmental Molecular Sciences Laboratory (a national scientific user facility sponsored by the U.S. DOE Office of Biological and Environmental Research) located at Pacific Northwest National Laboratory and operated by Battelle for the DOE (contract KP130103). This work was supported by the NIH Protein Structure Initiative through the Northeast Structural Genomics Consortium (grant P50-GM62413). YacG from E. coli is target ET92 of the consortium.