Crystal structure of YbaB from Haemophilus influenzae (HI0442), a protein of unknown function coexpressed with the recombinational DNA repair protein RecR


  • Kap Lim,

    1. Center for Advanced Research In Biotechnology, University of Maryland Biotechnology Institute, Rockville, Maryland
  • Aleksandra Tempczyk,

    1. Center for Advanced Research In Biotechnology, University of Maryland Biotechnology Institute, Rockville, Maryland
  • James F. Parsons,

    1. Center for Advanced Research In Biotechnology, University of Maryland Biotechnology Institute, Rockville, Maryland
  • Nicklas Bonander,

    1. National Institute of Standards and Technology, Gaithersburg, Maryland
  • John Toedt,

    1. Center for Advanced Research In Biotechnology, University of Maryland Biotechnology Institute, Rockville, Maryland
    Current affiliation:
    1. Department of Physical Science, Eastern Connecticut State University, Willimantic, CT.
  • Zvi Kelman,

    1. Center for Advanced Research In Biotechnology, University of Maryland Biotechnology Institute, Rockville, Maryland
  • Andrew Howard,

    1. Advanced Photon Source, Argonne National Laboratory, Argonne, Illinois
    2. Biological, Chemical, and Physical Science Department, Illinois Institute of Technology, Chicago, Illinois
  • Edward Eisenstein,

    1. Center for Advanced Research In Biotechnology, University of Maryland Biotechnology Institute, Rockville, Maryland
    2. Department of Chemistry and Biochemistry, University of Maryland Baltimore County, Baltimore, Maryland
  • Osnat Herzberg

    Corresponding author
    1. Center for Advanced Research In Biotechnology, University of Maryland Biotechnology Institute, Rockville, Maryland
    • Center for Advanced Research in Biotechnology, 9600 Gudelsky Drive, Rockville, MD 20850


Recovery of replication following DNA damage is essential for preserving genome integrity. UV-induced DNA damage halts chromosomal replication, and in bacteria, recovery requires a number of rec proteins, such as RecA, RecF, and RecR, and nucleotide excision repair proteins UvrA or UvrC (see recent reviews1–2). In addition, RecG removes the replication machinery from the blocking lesion, and RecJ and RecQ degrade nascent DNA at blocked replication forks prior to the resumption of DNA synthesis. In the absence of either recF or recR, the nascent DNA degradation is much more extensive than in the presence of these genes.3 Thus, RecF and RecR appear to protect the DNA strands of the replication fork when it is blocked by DNA damage.

The recR gene of Escherichia coli, Bacillus subtilis, and Streptomyces lividans is preceded by a small open reading frame encoding a protein of unknown function (YbaB, termed orf12, orf107, and orf1 in these organisms, respectively), which is cotranscribed with recR.4–6 When the operon was deleted in the S. lividans genome, the bacteria displayed increased sensitivity to DNA-damaging agents compared with the wild-type strain. The wild-type orf1-recR genes complemented the deletion mutant strain, as did recR alone. However, although the B. subtilis orf107-recR complemented the S. lividans orf1-recR deletion mutant, recR from B. subtilis in the absence of orf107 was unable to restore the wild-type phenotype to the Streptomyces deletion mutant.6

In Haemophilus influenzae, the genes coding for YbaB and RecR are also adjacent to one another. They correspond to HI0442 and HI0443, respectively, of the TIGR numbering scheme.7 We have undertaken the crystal structure determination of HI0442 to explore whether the structure provides clues about the function of the protein. The structure defines a new fold, and the structural features suggest that the protein may play a regulatory role in the recovery of DNA replication through a binding surface that competes with DNA for one of the components of the replication fork.

Materials and Methods.

Protein production: Following the sequence described in the TIGR comprehensive microbial resource database, the H. influenzae KW20 HI0442 gene encoding 121 residues was cloned in pET15b (Novagen), containing a thrombin-cleavable His6-tag at the N-terminus. Expression of this construct yielded a mixture of homo- and heterodimers consisting of 121- and 109-residue proteins, as measured by matrix-assisted laser desorption/ionization (MALDI) mass spectrometry (PerSeptive Voyager DE/TOF). The protein mixture yielded crystals of poor quality. MALDI mass spectral analysis of dissolved crystals also displayed the two molecular weight products. Further analysis showed that the sequence in Swiss-Prot (accession number P44711) is 109 residues long, and multiple sequence alignment showed that the shorter version is prevalent in the sequence family. To improve crystal quality, the two protein products were separated, and N-terminal amino acid sequencing confirmed that the 109-residue protein was a truncated version of the 121-residue protein. The truncated version was therefore generated using the QuikChange mutagenesis kit (Stratagene), by deleting 36 bases from the 5′ end of the gene.
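The size of the deletion follows directly from the two residue counts given above. A minimal arithmetic sketch (the helper name is hypothetical and is not part of the published protocol):

```python
# Residue counts (121 and 109) are taken from the text.
# The function name is a hypothetical helper for illustration only.
CODON_LENGTH = 3  # nucleotides per amino acid residue


def bases_to_delete(full_length_residues: int, truncated_residues: int) -> int:
    """Return the number of 5'-end bases that encode the removed N-terminal residues."""
    removed_residues = full_length_residues - truncated_residues
    return removed_residues * CODON_LENGTH


# 121 - 109 = 12 residues removed, i.e. 12 codons
print(bases_to_delete(121, 109))  # 36
```

The result agrees with the 36 bases reported for the deletion.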

Protein expression was achieved in E. coli strain BL21(DE3). The cells were grown at 30°C in Luria–Bertani (LB) medium until the cell density reached A600 = 0.7, when isopropylthio-β-galactoside (IPTG) was added to a final concentration of 1 mM. Cells were disrupted by passage through a French press in lysis buffer containing 50 mM sodium phosphate, pH 8.0, 300 mM NaCl, and 10 mM imidazole. Cell debris was removed by centrifugation, and the extract was applied to a Ni-NTA column (Qiagen), washed with lysis buffer containing 20 mM imidazole, and eluted with lysis buffer containing 250 mM imidazole. The His-tagged protein solution was dialyzed against buffer containing 10 mM N-2-hydroxyethylpiperazine-N′-2-ethanesulfonic acid (HEPES), pH 8.0, 150 mM NaCl, 1 mM ethylenediaminetetraacetic acid (EDTA), and 0.5 mM dithiothreitol (DTT). The His-tag was removed by reaction with α-thrombin (Haematologic Technologies, Inc.) at a 1:1000 (wt/wt) ratio for 3 h at 22°C. Thrombin was removed by passage through a benzamidine agarose column. Uncleaved protein and the His-tag were separated from the cleaved HI0442 by passage through a Ni-NTA column. Protein purity was assessed by sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS-PAGE). The yield was approximately 13 mg/g cell paste. We prepared protein containing selenomethionine (SeMet) in E. coli strain B834(DE3) from minimal media containing 50 mg/L seleno-L-methionine, using procedures described previously,8 and following the same purification protocol as above. Protein yield was approximately 10 mg/g cell paste.

The molecular weight was verified by matrix-assisted laser desorption time-of-flight mass spectrometry (MALDI-TOF), and the oligomeric state of the protein in solution was determined by sedimentation equilibrium.

Structure determination: The SeMet-containing HI0442 was crystallized at room temperature by the vapor diffusion method in hanging or sitting drops. A 13 mg/mL protein solution in buffer containing 50 mM Tris, pH 7.5, 0.1 mM DTT, and 0.1 mM EDTA was mixed with half the volume of mother liquor containing 0.9–1.1 M ammonium phosphate monobasic, pH 3.8, and equilibrated against the mother-liquor reservoir. Rod-shaped crystals appeared within 1 day, and continued to grow for a week to approximately 0.5 × 0.2 × 0.2 mm³. Crystal parameters are provided in Table I. The crystals were coated with a layer of viscous oil (1:1 mixture of Paratone-N [Exxon] and mineral oil), and flash-cooled at 100 K in liquid propane cooled by liquid nitrogen. Multiwavelength anomalous diffraction (MAD) data, exploiting the absorption edge of Se, were collected on the Industrial Macromolecular Crystallography Association–Collaborative Access Team (IMCA-CAT) 17-ID beamline at the Advanced Photon Source (Argonne National Laboratory, Argonne, IL). The beamline was equipped with a MAR charge-coupled device (CCD) detector. Data processing was carried out with the use of the program HKL9 (Table I).
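The starting composition of the crystallization drop can be estimated from the 2:1 mixing ratio described above. The sketch below is simple dilution arithmetic; it assumes ideal mixing, ignores the buffer components, and uses a hypothetical function name.

```python
def initial_drop_concentrations(protein_mg_ml, precipitant_molar,
                                protein_parts=2.0, reservoir_parts=1.0):
    """Return (protein mg/mL, precipitant M) right after mixing, before equilibration.

    The 2:1 default reflects "mixed with half the volume of mother liquor";
    ideal, additive volumes are an assumption of this sketch.
    """
    total_parts = protein_parts + reservoir_parts
    protein = protein_mg_ml * protein_parts / total_parts
    precipitant = precipitant_molar * reservoir_parts / total_parts
    return protein, precipitant


# Both ends of the reported 0.9-1.1 M ammonium phosphate range
for molarity in (0.9, 1.1):
    protein, phosphate = initial_drop_concentrations(13.0, molarity)
    print(f"{protein:.1f} mg/mL protein, {phosphate:.2f} M phosphate")
```

Under these assumptions the drop starts at roughly 8.7 mg/mL protein and 0.30–0.37 M ammonium phosphate, and both concentrate as the drop equilibrates against the reservoir.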

Table I. Data Collection and Phasing Statistics

Space group                                C222₁
Cell dimensions (Å)                        a = 44.39, b = 132.79, c = 36.22
No. of molecules in the asymmetric unit    1

Se MAD data statistics         λ1               λ2               λ3
 Wavelength (Å)                0.9789           0.9790           0.9500
 Resolution (Å)                1.75             1.75             1.75
 No. observed reflections      66,650           73,832           67,528
 No. unique reflections (a)    18,568           19,416           18,687
 Completeness (%) (b)          92.1 (49.8)      96.7 (73.5)      92.0 (45.4)
 Rmerge (c)                    0.055 (0.217)    0.046 (0.201)    0.042 (0.233)
Phasing statistics
 Phasing power (d)             1.34             –                1.67
 Dispersive R (e)              0.91             –                0.81
 Anomalous R (f)               0.65             0.63             0.70
Figure of merit                0.56

  • a: For phasing, Friedel pairs were treated as independent reflections.
  • b: The values in parentheses are for the highest resolution shell, 1.81–1.75 Å.
  • c: Rmerge = Σhkl [(Σj |Ij − <I>|)/Σj |Ij|], for equivalent reflections (Friedel pairs separated).
  • d: Phasing power = Σj |FH|/Σj |E|, where E is the lack of closure error.
  • e: Dispersive R = Σhkl ||FP + FH(calc)| − |FPH||/Σhkl |FPH − FP|, where FP corresponds to the reference data set λ2, and FPH corresponds to data collected at λ1 or λ3.
  • f: Anomalous R = Σhkl |ΔF±(obs) − ΔF±(calc)|/Σhkl |ΔF±(obs)|, where ΔF± is the structure factor difference between Friedel pairs.
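To make the Rmerge definition in Table I concrete, the following sketch computes it for a small, invented set of intensity measurements. It uses the conventional pooled form, in which the numerator and denominator are each summed over all reflections; that reading of the footnote, the function name, and the toy intensities are all assumptions of this example, and it is not the HKL implementation.

```python
from collections import defaultdict


def r_merge(observations):
    """Rmerge = sum_hkl sum_j |I_j - <I>| / sum_hkl sum_j |I_j|.

    observations: iterable of ((h, k, l), intensity) pairs, where repeated
    indices are symmetry-equivalent measurements of the same reflection.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        if len(intensities) < 2:
            continue  # a single measurement says nothing about agreement
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(abs(i) for i in intensities)

    if denominator == 0.0:
        raise ValueError("no multiply measured reflections")
    return numerator / denominator


# Invented measurements: two reflections, observed 2 and 3 times
toy_data = [
    ((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
    ((0, 2, 0), 50.0), ((0, 2, 0), 48.0), ((0, 2, 0), 52.0),
]
print(f"{r_merge(toy_data):.4f}")  # 0.0389
```

For the toy data the deviations sum to 14 and the intensities to 360, giving 14/360 ≈ 0.039, the same order as the overall values in Table I.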

The computer program SHELXD10 was used to determine the Se sites. The 109-residue molecule contains 10 methionines, and SHELXD identified six Se sites. In hindsight, 3 of the SeMet residues (the first and the two final methionines) are disordered, and the side chain of a fourth SeMet exhibits a high crystallographic temperature factor. The remaining 6 SeMet residues correspond to the 6 Se sites identified by SHELXD.

Phase determination was carried out with the CCP4 suite of programs.11 MLPHARE12 was used to refine the positions and occupancies of the Se atoms, and to calculate phases. The phases were further improved by solvent-flattening with the program DM.13 A total of 78 of the 109 residues were traced automatically with the program ARP/wARP.14 Another 13 residues were built manually with the interactive computer graphics program ‘O’.15 The remaining 18 residues were disordered.

Refinement was carried out with the use of the computer program suite CNS,16 following simulated annealing molecular dynamics cycles with alternating cycles of positional and individual temperature factor refinement. Water molecules were added to the model accounting for difference Fourier peaks with density ≥3σ (Table II). Structure analysis was carried out with a set of computer programs: PROCHECK for analysis of geometry,17 MOLSCRIPT18 and RASTER3D19, 20 for depiction of structure, and GRASP21 for depiction of the molecular surface.

Table II. Refinement Statistics

Resolution (Å)                       20.0–1.75
Wavelength (Å)                       0.9789
Unique reflections, F ≥ 2σ(F)        10,681
Completeness (%) (a)                 95.1
Number of protein atoms              703
Number of H2O                        162
Rcryst (b)                           0.185 (0.353)
Rfree (c)                            0.274 (0.331)
RMS deviation from ideal geometry
 Bond length (Å)                     0.020
 Bond angle (°)                      1.8
Ramachandran plot (%)
 Most favored                        91.9
 Generously allowed                  1.6

  • a: The values in parentheses are for the highest resolution shell.
  • b: Rcryst = Σhkl ||Fo| − |Fc||/Σhkl |Fo|, where Fo and Fc are the observed and calculated structure factors, respectively.
  • c: Rfree is computed from 1,093 reflections that were randomly selected and omitted from the refinement.
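The Rcryst and Rfree definitions in Table II share one formula and differ only in which reflections enter the sums. The sketch below illustrates this with invented amplitudes; the function names, the random seed, and the toy numbers are assumptions of the example, and it does not reproduce how CNS selects or handles the test set.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum ||Fo| - |Fc|| / sum |Fo| over the supplied reflections."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def split_free_set(n_reflections, n_free, seed=0):
    """Randomly pick n_free reflection indices to exclude from refinement.

    Returns (work_indices, free_indices). Rcryst is evaluated on the work
    set and Rfree on the free set, using the same r_factor formula.
    """
    rng = random.Random(seed)
    free = set(rng.sample(range(n_reflections), n_free))
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)


# Invented structure factor amplitudes for four reflections
f_obs = [100.0, 50.0, 25.0, 80.0]
f_calc = [90.0, 55.0, 20.0, 85.0]
print(f"{r_factor(f_obs, f_calc):.3f}")  # 0.098

work, free = split_free_set(100, 10)
print(len(work), len(free))  # 90 10
```

Because the free reflections never influence the model, Rfree is normally higher than Rcryst, as seen in Table II (0.274 versus 0.185).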

Nitrocellulose filter binding assay: A nitrocellulose filter binding assay was carried out by incubating various amounts of HI0442 at 37°C for 10 min in 25 μL buffer [20 mM HEPES-NaOH (pH 7.5), 10 mM MgCl2, 2 mM DTT, 100 μg/mL bovine serum albumin (BSA)] containing 50 fmol of 5′ 32P-labeled oligonucleotide (3000–5000 cpm/fmol) in the presence or absence of 1 mM adenosine triphosphate (ATP). The substrates used for the assays were either a single-stranded 110-mer, Hel-4, 5′-CGCGCGGGCTCGTTTTACAACGTCGTGACTGGGCACTTGATCGGTTGGCC(dT)60-3′, or a partial duplex made by annealing a 100-mer, M13-2, 5′-CTGCCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGC(TTTG)15-3′, to an 80-mer, M13-2C, 5′-(GTTT)10GCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAG-3′.22 After incubation, the mixture was filtered through an alkaline-washed nitrocellulose filter (Millipore, HA 0.45 μm)23 and washed with 20 mM Tris-HCl (pH 7.5). The radioactivity adsorbed to the filter was measured by liquid scintillation counting.
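The oligonucleotide lengths and the extent of the duplex can be checked directly from the sequences given above. The sketch assumes that the trailing numbers in (dT)60, (TTTG)15, and (GTTT)10 are repeat counts; the variable and function names are hypothetical.

```python
# Sequences copied from the text; repeat units expanded under the
# assumption that the trailing numbers are repeat counts.
HEL_4 = "CGCGCGGGCTCGTTTTACAACGTCGTGACTGGGCACTTGATCGGTTGGCC" + "T" * 60
M13_2 = "CTGCCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGC" + "TTTG" * 15
M13_2C = "GTTT" * 10 + "GCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAG"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


print(len(HEL_4), len(M13_2), len(M13_2C))  # 110 100 80

# The first 40 nt of M13-2 pair with the last 40 nt of M13-2C
duplex_top = M13_2[:40]
duplex_bottom = M13_2C[40:]
print(reverse_complement(duplex_top) == duplex_bottom)  # True

# The repeat tails contain no complementary bases and stay single-stranded
tail_top = M13_2[40:]      # (TTTG)15, 60 nt
tail_bottom = M13_2C[:40]  # (GTTT)10, 40 nt
print(reverse_complement(tail_bottom) in tail_top)  # False
```

The lengths match the stated 110-, 100-, and 80-mers, and the annealed substrate consists of a 40-bp duplex with two unpaired repeat tails, so the assay samples both single-stranded and double-stranded DNA.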

Results and Discussion.

The structure: The 1.75 Å resolution SeMet-containing HI0442 model comprises residues 7–98 of the 109 residues. Residues 1–6 and 99–109 are disordered, as well as the 3 residues (GlySerHis) that were introduced to enable thrombin cleavage of the His-tag. The side chains of SeMet17, Arg54, Phe77, and Arg82 are modeled with two alternate conformations, each with half occupancy. A total of 162 water molecules are included in the model.

HI0442 forms a tightly associated dimer (through a crystallographic two-fold symmetry axis), consistent with sedimentation equilibrium experiments. Each monomer folds into an α+β structure with the topology α1/β1/β2/β3/α2 [Fig. 1(A)]. The β1-strand directly follows the α1 helix without a connecting loop. A β-bulge occurs at Arg53-Arg54 of β3, and a short helix [labeled α′ in Fig. 1(A)] connects β3 and α2 at the top of the molecule. The fold is novel, as assessed by running DALI24 and VAST.25 We note that Bonneau and colleagues26 predicted this novel fold during the CASP4 experiment27 with the exact topology, and a Cα atom root-mean-square deviation of 5.0 Å.

Figure 1.

Crystal structure of HI0442. (A) Ribbon diagram of the dimer, with α-helices colored in blue and β-strands in yellow. (B) Molecular surface of the dimer's globular domain, highlighting the internal phenylalanine residues that adopt two mutually exclusive alternate conformations. The molecule is rotated about the vertical axis by approximately 90° relative to the orientation shown in (A). For each monomer, the two alternate conformations of Phe77 are shown in red and blue. Conformations of two symmetry-related phenylalanines colored red and blue clash with one another and cannot be adopted concomitantly. The yellow β-strands below Phe77 correspond to the yellow β-sheet shown in (A). The green β-sheet of the second molecule obscures the dimer interior and was removed for clarity. (C) Surface representation of the dimer, depicting the electrostatic potential with a color scale that varies from blue to red, representing positive and negative potential, respectively.

The dimer packing is also unusual. The six β-strands and two α2-helices in the N-terminal halves of the two chains form a barrel (with secondary structure unit order β1/β2/β3/α2/β1′/β2′/β3′/α2′, where the prime sign corresponds to the second molecule), and the two helices of each monomer protrude from the globular region to form a tweezer-like shape. The maximum “tweezer” opening at the tip is approximately 22 Å, and the minimum opening close to the globular region is approximately 15 Å. For comparison, the diameter of a double-stranded DNA is approximately 20 Å.

The dimer interface is rich in hydrophobic residues. Most strikingly, the two symmetry-related Phe77 residues face one another, with their side chains adopting two alternate conformations, each with half occupancy [Fig. 1(B)]. The two conformations are mutually exclusive (i.e., the side chains of the symmetry-related phenylalanines must sample the same conformation at any given time) to avoid clashes. Therefore, the center of the dimer contains empty space corresponding to the side chain conformation that is not sampled. It is tempting to speculate that this provides the dimer with some flexibility, and enables the “tweezer” to open and close.

Charge distribution: The protein surface is enriched with negatively charged residues. In particular, a cluster of negatively charged residues is located at the top of the β6α2 barrel [Fig. 1(C)]. The cluster includes Glu63, Asp64, Asp65, Glu67, Glu70, and Asp71. Of these, the most conserved residues within the YbaB sequence family are Asp/Glu65, Glu/Asp67, and Asp71. The region is duplicated because of the particular mode of dimerization, spanning the entire top of the dimer. Multiple sequence alignment (58 sequences in the nonredundant sequence database at the time of writing) shows that none of the HI0442 sequence family residues is invariant. The 63–72 polypeptide segment is the most conserved, indicating that it is functionally important.

Positively charged residues are less conserved. At the lower part of the globular domain, two pairs of arginine residues—53, 54 and 82, 83—decorate the surface. Although these are not well conserved in the sequence family, there is a tendency for positively charged residues to be located in that vicinity. On α1, two glutamine residues, Gln18 and Gln25, project toward their symmetry-related pair and are mostly conserved, although, in a few sequences, they are replaced by glutamic acid.

Functional implications: Clues about the function of HI0442 cannot be derived based on its fold, because the fold is novel. However, two conclusions emerge from the structural analysis: First, lack of invariant residues, and lack of functional groups arranged to form catalytic machinery, indicate that HI0442 is not an enzyme. Second, the protein lacks a prominent region of positively charged residues indicative of DNA binding.

The ability of HI0442 to bind DNA was examined despite the electrostatic properties of the surface, because recR and HI0442 are cotranscribed, and recR is involved in DNA replication recovery following DNA damage. The nitrocellulose filter-binding assay showed that the protein does not bind DNA in a nonspecific manner. This finding, however, does not exclude the possibility of binding to an unknown specific DNA sequence and/or structure (e.g., Holliday junction).

The striking clustering of negatively charged groups at the top of the HI0442 dimer is expected to be present in all the YbaB family members. Therefore, the cluster is likely to play a functional role. We propose that this region may mimic the surface of DNA, competing for binding to a DNA-binding protein involved in the DNA replication fork repair process. Such activity would be consistent with a regulatory function. Because very little is known about the structure and function of the various components of the rec/uvr system, it is currently impossible to suggest a more detailed function.


We thank John Moult and Eugene Melamud for the use of, and help with, their bioinformatics website. We thank the staff of IMCA-CAT at the Advanced Photon Source for help during data collection. The IMCA-CAT facility is supported by the companies of the Industrial Macromolecular Crystallography Association, through a contract with IIT.