Crystal structure of the hypothetical protein TTHA1013 from Thermus thermophilus HB8

Authors


Introduction.

Genome sequencing projects have identified many hypothetical proteins, and structural genomics projects have determined the structures of many of them. One of the main objectives of structural genomics is to reveal novel protein folds. The open reading frame of TTHA1013, which encodes 73 amino acid residues, was annotated as a hypothetical protein of Thermus thermophilus HB8. A BLAST database search1 revealed that TTHA1013 is similar to two bacterial homologs, with relatively high E values (e−6 for TDE0453 and 3e−6 for RPA4344) [Fig. 1(A)]. The Thermus thermophilus HB27 genome2 also contains a DNA region corresponding to TTHA1013, but this region was not annotated as a coding region. Such a weakly conserved protein may have a novel fold. Indeed, the ProTarget3 program (http://www.protarget.cs.huji.ac.il), which predicts novel folds, predicted that TTHA1013 has a novel fold (p-value = 1). We therefore determined the crystal structure of TTHA1013 at 2.2 Å resolution and found that it does adopt a novel fold.

Figure 1.

(A) Sequence alignment of TTHA1013 and its homologs, constructed using BLAST1 followed by CLUSTAL W.16 TDE0453 is from Treponema denticola ATCC 35405, and RPA4344 is from Rhodopseudomonas palustris CGA009. Strictly conserved residues are boxed in red, and similar residues are shown in red letters. The figure was generated with ESPript 2.0.17 (B) Stereo ribbon representation of the TTHA1013 monomer. The α-helix and β-strands are colored red and green, respectively. (C) Stereo ribbon representation of the TTHA1013 dimer, with the two subunits colored red and blue. These figures were generated with the graphics program CueMol (http://cuemol.sourceforge.jp/en/).

Materials and Methods.

Cloning, expression, and purification.

The open reading frame of TTHA1013 from Thermus thermophilus HB8 was cloned into the pET11a expression vector (Novagen). SeMet-substituted protein was produced in Escherichia coli Rosetta834 (DE3). The cell lysate was heated at 70°C for 10 min, and the soluble fraction was applied to a Resource-PHE column (Amersham Biosciences) pre-equilibrated with 50 mM sodium phosphate buffer (pH 7.0) containing 1.5 M ammonium sulfate. The protein was eluted with a linear gradient of 1.5 to 0 M ammonium sulfate. Fractions containing TTHA1013 were pooled and dialyzed against 20 mM Tris-HCl buffer (pH 8.0). The dialyzed protein was applied to a Resource Q column (Amersham Biosciences) equilibrated with 20 mM Tris-HCl buffer (pH 7.0) and eluted with a linear gradient of 0 to 0.5 M NaCl. The target fractions were pooled and dialyzed against 10 mM sodium phosphate buffer (pH 7.0). Next, the dialyzed protein was applied to a Hydroxyapatite CHT5 column (Bio-Rad) equilibrated with 10 mM sodium phosphate buffer (pH 7.0) and eluted with a linear gradient of 10 to 200 mM sodium phosphate. Fractions containing TTHA1013 were pooled and dialyzed against 20 mM Tris-HCl buffer (pH 8.0) containing 150 mM NaCl. The dialyzed protein was then applied to a HiLoad 16/60 Superdex 75 column (Amersham Biosciences) equilibrated and eluted with 20 mM Tris-HCl buffer (pH 8.0) containing 150 mM NaCl. Finally, the eluted protein was desalted on a HiPrep 26/10 desalting column (Amersham Biosciences) equilibrated and eluted with 20 mM Tris-HCl buffer (pH 8.0). The purified protein was concentrated to 11.7 mg/mL with a Vivaspin 20 concentrator (3,000 MWCO; Sartorius). The yield of SeMet-substituted TTHA1013 was 17.5 mg from 15.0 g of wet cells.
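The hydrophobic-interaction, anion-exchange, and hydroxyapatite steps above each elute the protein with a linear gradient between two buffer concentrations. As a minimal sketch (the function name and the fractional-position parameterization are our own illustration, not part of the original protocol), the concentration delivered at any point of such a gradient is a simple linear interpolation between the start and end values:

```python
def gradient_concentration(start: float, end: float, fraction: float) -> float:
    """Concentration at a given position of a linear gradient.

    `fraction` is the fraction of the gradient volume already delivered
    (0.0 = start of the gradient, 1.0 = end). Units follow `start`/`end`.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return start + (end - start) * fraction


# Gradients taken from the protocol above: (start, end, unit).
GRADIENTS = {
    "Resource-PHE (ammonium sulfate)": (1.5, 0.0, "M"),
    "Resource Q (NaCl)": (0.0, 0.5, "M"),
    "CHT5 (sodium phosphate)": (10.0, 200.0, "mM"),
}

if __name__ == "__main__":
    for column, (start, end, unit) in GRADIENTS.items():
        midpoint = gradient_concentration(start, end, 0.5)
        print(f"{column}: {midpoint:g} {unit} at mid-gradient")
```

This ignores the column dead volume and mixer delay, so it only approximates the salt concentration at which a peak actually elutes.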

Crystallization and data collection.

Crystals of SeMet-substituted TTHA1013 were grown at 20°C by the hanging-drop vapor diffusion method. The mother liquor was 100 mM sodium acetate buffer (pH 4.6) containing 200 mM ammonium sulfate and 10% polyethylene glycol 4000. Drops consisted of 1 μL of protein solution mixed with an equal volume of mother liquor. The crystal was cryo-cooled to −173°C in mother liquor supplemented with 30% glycerol. The crystal belongs to space group P6₄22, with unit-cell dimensions a = b = 51.90 Å and c = 117.73 Å, and contains one molecule per asymmetric unit.
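The statement that the asymmetric unit holds one molecule can be checked with the Matthews coefficient, V_M = V_cell / (Z · M), where Z is the number of molecules in the unit cell (12 asymmetric units in P6₄22, times one molecule each) and M is the molecular mass. The sketch below computes the hexagonal cell volume from the reported cell dimensions. The molecular mass is an assumption here (roughly 110 Da per residue for the 73-residue chain); the exact value should be taken from the sequence. The solvent-content estimate uses the common approximation 1 − 1.23/V_M.

```python
import math


def hexagonal_cell_volume(a: float, c: float) -> float:
    """Unit-cell volume (Å^3) for a hexagonal cell with a = b and gamma = 120°."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews_coefficient(cell_volume: float, n_asu: int, mols_per_asu: int,
                         mass_da: float) -> float:
    """V_M in Å^3/Da: cell volume divided by the total protein mass in the cell."""
    return cell_volume / (n_asu * mols_per_asu * mass_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent content from V_M (protein density ~1.35 g/cm^3)."""
    return 1.0 - 1.23 / vm


if __name__ == "__main__":
    volume = hexagonal_cell_volume(51.90, 117.73)  # cell dimensions from the text
    # ASSUMPTION: ~110 Da per residue x 73 residues; replace with the
    # sequence-derived mass (each SeMet adds ~47 Da relative to Met).
    assumed_mass = 73 * 110.0
    vm = matthews_coefficient(volume, n_asu=12, mols_per_asu=1, mass_da=assumed_mass)
    print(f"V_cell = {volume:.0f} Å^3, V_M = {vm:.2f} Å^3/Da, "
          f"solvent ≈ {solvent_fraction(vm):.0%}")
```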

Structure determination and refinement.

X-ray diffraction data from the SeMet-derivative crystal were collected on beamline BL26B1 at SPring-8 (Harima, Japan) with a Jupiter210 CCD detector (Rigaku/MSC). Diffraction data were processed with the HKL2000 program suite.4 The crystal structure was solved by the multiwavelength anomalous dispersion (MAD) method5 using the programs SOLVE6 and RESOLVE.7 Refinement was carried out with the program CNS.8 The model was inspected and modified with the program TURBO-FRODO,9 and its stereochemistry was assessed with the program PROCHECK.10 Solvent-accessible surface areas were calculated with the program AREAIMOL,10 and the molecular surface was generated with the program GRASP.11 The final model comprises 71 amino acid residues and 38 water molecules in the asymmetric unit; the N- and C-terminal residues were not visible because of disorder and were excluded from the final model. Data collection, MAD phasing, and refinement statistics are summarized in Table I. The coordinates are available in the Protein Data Bank under accession code 1WV8.
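MAD phasing exploits the change in the anomalous scattering of selenium near its K absorption edge, which is why the peak and edge wavelengths in Table I lie so close together. As a small illustration (not part of the original analysis), the conversion E = hc/λ, with hc ≈ 12.398 keV·Å, places the peak and edge wavelengths at about 12.66 keV, at the Se K edge (about 12.66 keV), and the remote wavelength above it. The edge value used below is an approximate tabulated figure, not a measured fluorescence scan.

```python
# hc expressed in keV·Å (rounded CODATA-derived value).
HC_KEV_ANGSTROM = 12.398419

# Approximate tabulated Se K absorption edge (keV); assumption for illustration.
SE_K_EDGE_KEV = 12.658

# Wavelengths (Å) as listed in Table I.
WAVELENGTHS = {"peak": 0.9791, "edge": 0.9793, "remote": 0.9637, "high resolution": 1.0}


def wavelength_to_energy_kev(wavelength_angstrom: float) -> float:
    """Photon energy (keV) for an X-ray wavelength given in Å."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


if __name__ == "__main__":
    for name, lam in WAVELENGTHS.items():
        energy = wavelength_to_energy_kev(lam)
        print(f"{name:>15}: {lam:.4f} Å -> {energy:.4f} keV "
              f"({energy - SE_K_EDGE_KEV:+.4f} keV from the Se K edge)")
```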

Table I. Summary of Data Collection and Refinement Statistics

                                SeMet
                                Peakᵃ     Edgeᵃ     Remoteᵃ   High resolution
Data collection
  Wavelength (Å)                0.9791    0.9793    0.9637    1
  Resolution (Å)                50–2.4    50–2.4    50–2.4    50–2.2
  Unique reflections            6,926     6,979     6,980     5,253
  Completeness (%), all data    99.8      99.6      99.6      99.1
  Completeness (%), last shell  100       97.8      97.8      99
  Rsymᵇ (%), all data           4.5       4.5       4.6       3.7
  Rsymᵇ (%), last shell         20.7      21.5      21.5      21.1
  Redundancy, all data          15.6      15.6      15.6      12.3
  I/σ(I), all data              18        18.4      17.9      27.7
Phasing
  FOM (SOLVE)ᶜ                  0.52
Refinement
  Rwork (%)                                                   25.9
  Rfreeᵈ (%)                                                  27.3
  RMSD bond lengths (Å)                                       0.006
  RMSD bond angles (°)                                        0.88

ᵃ Diffraction data were processed keeping Bijvoet pairs separate.
ᵇ Rsym = Σ|I_i − I_avg| / ΣI_i, where I_i is an individual intensity measurement and I_avg is the mean intensity of the symmetry-related measurements of that reflection.
ᶜ Figure of merit after SOLVE phasing.
ᵈ Rfree is calculated for 5% of randomly selected reflections excluded from refinement.
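The footnoted statistics in Table I can be made concrete with a short sketch. The code below is an illustration under simplifying assumptions (intensities are already grouped by unique reflection, amplitudes are on a common scale, and all names are our own rather than those of HKL2000 or CNS). It computes Rsym as defined in the table footnote, flags a random 5% test set as used for Rfree, and evaluates the conventional crystallographic R factor, R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|, separately on the working and test sets.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence, Set, Tuple

HKL = Tuple[int, int, int]


def r_sym(measurements: Dict[HKL, List[float]]) -> float:
    """Rsym = sum |I_i - I_avg| / sum I_i over all measurements of all reflections."""
    numerator = 0.0
    denominator = 0.0
    for intensities in measurements.values():
        i_avg = mean(intensities)
        numerator += sum(abs(i - i_avg) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def select_free_set(reflections: Sequence[HKL], fraction: float = 0.05,
                    seed: int = 0) -> Set[HKL]:
    """Randomly flag a fraction of unique reflections as the Rfree test set."""
    rng = random.Random(seed)
    n_free = round(fraction * len(reflections))
    return set(rng.sample(list(reflections), n_free))


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum ||Fobs| - |Fcalc|| / sum |Fobs|."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(abs(fo) for fo in f_obs)


def r_work_r_free(f_obs: Dict[HKL, float], f_calc: Dict[HKL, float],
                  free_set: Set[HKL]) -> Tuple[float, float]:
    """Rwork over the working reflections, Rfree over the excluded test set."""
    work = [h for h in f_obs if h not in free_set]
    test = [h for h in f_obs if h in free_set]
    r_work = r_factor([f_obs[h] for h in work], [f_calc[h] for h in work])
    r_free = r_factor([f_obs[h] for h in test], [f_calc[h] for h in test])
    return r_work, r_free


if __name__ == "__main__":
    toy = {(1, 0, 0): [10.0, 12.0], (0, 1, 0): [5.0, 5.0, 5.0]}
    print(f"Rsym = {r_sym(toy):.3f}")  # 2/37 for this toy data
```

Because the test reflections never enter refinement, Rfree tracks overfitting in a way Rwork cannot, which is why the two values are reported separately.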

Results and Discussion.

The structure of TTHA1013 is shown in Figure 1(B). The fold consists of one α-helix and four β-strands, with the following secondary structure elements: β1, residues 3–11; β2, residues 16–20; β3, residues 30–31; α1, residues 33–52; and β4, residues 61–69. TTHA1013 interacts with a symmetry-related molecule (molecules A and B) to form a dimer, burying a surface area of ∼1200 Å² per monomer, which is 23% of the total surface area [Fig. 1(C)]. The dimer interface is formed by the β4-strands of both subunits, so that the strands of the two subunits form a continuous eight-stranded antiparallel β-sheet. The TTHA1013 structure was compared with previously determined structures in the PDB using the DALI server12 (http://www.ebi.ac.uk/dali/), and no structure showed significant similarity (Z score > 4.0). The highest DALI Z score was only 3.5 (Thermotoga maritima cell division protein FtsA; PDB ID 1E4F), obtained for FtsA residues 205–224, 228–235, 305–320, 329–334, and 339–342. Likewise, the SSM server13 (http://www.ebi.ac.uk/msd-srv/ssm/), an interactive service for comparing protein structures in 3D, found no significant similarity (P score > 3.0) to any other protein structure; the highest P score was only 1.1 (Pseudomonas sp. 4-oxalocrotonate tautomerase; PDB ID 1OTF). The CE server14 (http://cl.sdsc.edu/) gave a similar result, with a highest Z score of only 4.1 (Pyrococcus kodakaraensis O⁶-methylguanine-DNA methyltransferase; PDB ID 1MGT). Thus, the structure represents a novel fold, as predicted by ProTarget.15 Of the 73 amino acid residues in TTHA1013, 14 are negatively charged, and these residues are highly conserved in the homologs [Fig. 1(A)]. The conserved residues and the electrostatic potential on the molecular surface are shown in Figure 2(A) and 2(B), respectively.
On the molecular surface, the distribution of the conserved residues correlates well with that of the negative charges, suggesting that these conserved surface regions contribute to properties common to this protein family. In contrast to the negatively charged residues, three of the five positively charged residues (Arg2, Lys5, and Arg63) of molecules A and B cluster together in one region of the molecular surface [Fig. 2(B)]. These positively charged residues are not conserved in the other homologs [Fig. 1(A)]. This region might therefore bind to a negatively charged region of a partner protein or to nucleic acids, a putative function that would be unique to TTHA1013.
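The buried-surface figure quoted above is conventionally derived from the solvent-accessible surface areas (here computed with AREAIMOL) of the isolated subunits and of the dimer. Taken together, the reported ∼1200 Å² and 23% imply a monomer accessible surface of roughly 5,200 Å² (1200/0.23). A minimal sketch of the arithmetic follows; the input areas in the example are hypothetical round numbers chosen only to show the calculation, not values measured for TTHA1013.

```python
def buried_area_per_monomer(asa_a: float, asa_b: float, asa_dimer: float) -> float:
    """Surface area (Å^2) buried per subunit on dimer formation."""
    return (asa_a + asa_b - asa_dimer) / 2.0


def buried_fraction(asa_monomer: float, buried_per_monomer: float) -> float:
    """Buried area as a fraction of the isolated monomer's accessible surface."""
    return buried_per_monomer / asa_monomer


if __name__ == "__main__":
    # HYPOTHETICAL inputs (Å^2), for illustration only.
    asa_a = asa_b = 5200.0
    asa_dimer = 8000.0
    buried = buried_area_per_monomer(asa_a, asa_b, asa_dimer)
    print(f"buried per monomer = {buried:.0f} Å^2 "
          f"({buried_fraction(asa_a, buried):.0%} of the monomer surface)")
```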

Figure 2.

Conserved residues and electrostatic potential of the TTHA1013 dimer mapped onto its molecular surface. (A) Distribution of conserved residues in the TTHA1013 dimer. Based on the alignment in Figure 1(A), strictly conserved and similar residues are colored orange and yellow, respectively. (B) Electrostatic surface potential of the TTHA1013 dimer, calculated with GRASP.11 The color scale ranges from −20 (red) to +20 (blue) k_BT/e.

Acknowledgements

We thank M. Yamamoto for data collection at BL26B1 of SPring-8, and T. Kobayashi, R. Fukunaga, R. Ishii, and T. Sengoku for many helpful discussions.
