Proteins from hyperthermophilic organisms are frequently selected as targets of structure determination for the purpose of understanding protein functions from a structural viewpoint. We cloned the gene of PH0734.1, which is a hypothetical protein with unknown function, from Pyrococcus horikoshii OT31 and overexpressed its protein product in E. coli to investigate its structure and function.
PH0734.1 is a protein consisting of 172 residues with a molecular mass of 19,601 Da. Analysis of its amino acid sequence showed that PH0734.1 is divided into two domains: an N-terminal DUF1947 domain found in archaeal hypothetical proteins (Pfam database,2 accession number: PF09183.1) and a C-terminal PUA (PseudoUridine synthase and Archaeosine transglycosylase) domain. Although the PUA domain is characterized as an RNA-binding domain and is observed in various RNA modification enzymes such as archaeosine tRNA-guanine transglycosylase (ArcTGT)3, 4 and pseudouridine 55 synthase,5 the structure and function of PH0734.1 remain unknown.
To elucidate the structural basis of its function, we determined the three-dimensional structure of PH0734.1 at 1.73 Å resolution. The crystal structure revealed that although the PUA domain of PH0734.1 is highly similar to previously determined protein structures, the conformation of the DUF1947 domain is novel, with a unique lysine cluster formed at its N-terminal region. DUF1947 may modulate the binding target of the PUA domain using its characteristic electropositive surface.
Abbreviations: ArcTGT, archaeosine tRNA-guanine transglycosylase; PUA, PseudoUridine synthase and Archaeosine transglycosylase.
Cloning, expression, and purification
The PH0734.1 gene was amplified from genomic DNA of P. horikoshii by PCR. The primers were designed against the regions containing the start and stop codons of the PH0734.1 gene, with an Nde I site added at the 5′ end of the primer containing the start codon and a Bam HI site added at the 5′ end of the primer containing the stop codon. After digestion with Nde I and Bam HI, the amplified fragment was cloned into pET-28a(+), a T7 polymerase-based expression vector for E. coli. The plasmid was transformed into E. coli Rosetta(DE3) pLysS for protein expression.
The E. coli transformants were cultivated at 25°C in LB medium until the optical density at 600 nm reached 0.6. Protein expression was induced by the addition of IPTG to a final concentration of 0.2 mM, and cultivation at 25°C was continued for 16 h. E. coli cells were collected by centrifugation at 5,000 × g for 10 min. The cells were resuspended in 20 mM Tris-HCl (pH 7.0), 150 mM NaCl, and 15% glycerol, and then lysed by sonication. After centrifugation at 40,000 × g for 30 min, the supernatant was heat-treated at 80°C for 30 min and centrifuged again at 40,000 × g for 30 min. The resulting supernatant was purified on Ni-NTA agarose (QIAGEN). His-tagged PH0734.1 was eluted with a buffer solution containing 20 mM Tris-HCl (pH 7.0), 150 mM NaCl, 5% glycerol, and 250 mM imidazole. The elution fraction was treated with thrombin to remove the N-terminal His-tag of PH0734.1 and with benzonase to digest contaminating DNA and RNA. PH0734.1 was further purified on a Resource S 6 mL cation exchange chromatography column (GE Healthcare). The protein was eluted with a 120-mL linear gradient of 0–1 M NaCl in a 20 mM Tris-HCl (pH 7.0) buffer solution. Purified protein was dialyzed against 5 mM Tris-HCl (pH 7.0) and concentrated to 12 mg/mL for crystallization.
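As a small illustration of the salt gradient used in the cation exchange step, the following Python sketch computes the nominal NaCl concentration at a given point in a linear gradient and the gradient length in column volumes. The function name is ours, and the sketch assumes an ideal linear gradient with no delay volume; it is not part of the original protocol.

```python
def nacl_at_volume(volume_ml, gradient_ml=120.0, start_m=0.0, end_m=1.0):
    """Nominal NaCl concentration (M) after `volume_ml` of a linear gradient."""
    if not 0.0 <= volume_ml <= gradient_ml:
        raise ValueError("volume must lie within the gradient")
    fraction = volume_ml / gradient_ml
    return start_m + fraction * (end_m - start_m)


COLUMN_VOLUME_ML = 6.0  # Resource S 6 mL column, as stated in the text
gradient_in_cv = 120.0 / COLUMN_VOLUME_ML  # gradient length in column volumes

print(f"gradient length: {gradient_in_cv:.0f} column volumes")
print(f"nominal NaCl at 30 mL: {nacl_at_volume(30.0):.2f} M")
```

The elution volume of 30 mL is an arbitrary example; the text does not report the salt concentration at which PH0734.1 eluted.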
Crystallization and data collection
Crystallization experiments of PH0734.1 were performed by the sitting drop vapor diffusion method at 20°C. Crystallization drops were made by mixing 1 μL of the protein solution [12 mg/mL in 5 mM Tris-HCl (pH 7.0)] with 1 μL of a variety of reservoir solutions. Crystals of PH0734.1 were obtained with a reservoir solution containing 100 mM CAPS (pH 10.5) and 30% (v/v) PEG400. Typical crystals appeared within 3 days.
X-ray diffraction data of PH0734.1 were collected at the AR-NW12 beamline in Photon Factory (Tsukuba, Japan). All X-ray diffraction data measurements were carried out under cryogenic conditions (95 K). The crystal of PH0734.1 diffracted to a resolution of 1.73 Å. The diffraction data were integrated and scaled with the program HKL2000.6 The crystal belonged to the space group P3₂21, with unit-cell parameters of a = b = 52.92 Å and c = 133.31 Å. Evaluation of the Matthews coefficient7 indicated that the crystal contained one protein molecule per asymmetric unit (V_M = 2.48 Å³/Da). The data collection statistics are summarized in Table I.
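The Matthews coefficient estimate can be sketched in Python as follows. This is a minimal illustration, not the authors' calculation: the function names are ours, the value of six symmetry operators is the number of general positions in this trigonal space group, and the 1.23 Å³/Da factor is the commonly used approximation for converting the coefficient into a solvent fraction.

```python
import math


def hexagonal_cell_volume(a, c):
    """Volume (Å^3) of a trigonal/hexagonal cell with a = b and gamma = 120 deg."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews_coefficient(cell_volume, mass_da, n_sym_ops, n_mol_per_asu=1):
    """V_M (Å^3/Da): cell volume divided by the total protein mass in the cell."""
    return cell_volume / (n_sym_ops * n_mol_per_asu * mass_da)


def solvent_fraction(v_m):
    """Approximate solvent fraction from V_M (1.23 Å^3/Da convention)."""
    return 1.0 - 1.23 / v_m


volume = hexagonal_cell_volume(52.92, 133.31)
# Assumptions: 6 general positions for the space group, and the 19,601 Da
# sequence mass quoted in the text as the mass of one molecule.
v_m = matthews_coefficient(volume, mass_da=19601.0, n_sym_ops=6)
print(f"V_M = {v_m:.2f} Å^3/Da, solvent fraction = {solvent_fraction(v_m):.2f}")
```

The result depends directly on the molecular mass supplied. The mass the authors used for their estimate is not stated in the text, so this sketch should not be expected to reproduce the reported value exactly.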
Table I. Data Collection, Phasing, and Refinement Statistics of PH0734.1
Values in parentheses are for the highest resolution shell.
Wavelength (Å)
Unit cell (Å): a = b = 52.92, c = 133.31
Number of observed reflections
Number of unique reflections
Number of reflections in the Rfree dataset
Number of nonhydrogen atoms
RMSD bond length (Å)
RMSD bond angle (deg.)
Ramachandran plot (%)
The crystal structure of PH0734.1 was determined by the molecular replacement method. Molecular replacement was performed with the program MOLREP8 in CCP49 following a homologous structure search using the program MrBUMP.10 The best search model for the molecular replacement was the C-terminal region of the hypothetical protein Ta1423 from Thermoplasma acidophilum (1Q7H, residues: 69–153), which is predicted to be a PUA domain. The initial model was automatically rebuilt and refined using the program ARP/wARP.11 After automated model building, several cycles of manual model rebuilding and refinement were performed using the programs XtalView12 and Refmac5.13 Water molecules were picked from the Fo − Fc map on the basis of peak height and distance criteria. The geometry of the final structure was evaluated with the programs PROCHECK14 and Rampage.15 The coordinates of PH0734.1 have been deposited in the Protein Data Bank (PDB) with the accession number 3D79.
Structural analysis was carried out using the following programs: Dali16 for the search of similar structures in the database, SURFACE (CCP4)9 for the calculation of protein surface area, DaliLite17 for the superposition of molecules, APBS18 for the calculation of the electrostatic potential of the protein surface, ESPript19 for the preparation of alignment figures, and PyMOL (http://pymol.sourceforge.net/) for the depiction of structures.
RESULTS AND DISCUSSION
We collected the diffraction data of PH0734.1 to a resolution of 1.73 Å because the values of Rmerge and R/Rfree deteriorated markedly when higher-resolution data were included, even though the value of I/σI in the outermost shell was high. The structure of PH0734.1 was determined by the molecular replacement method and has good stereochemistry. The final model contains one protein molecule and 182 ordered water molecules. Because the electron density was poor, we could not build the first four residues (1–4) or the last residue (172). The R and Rfree values of the final model were 20.9% and 22.9%, respectively. In the Ramachandran plot, 97.6% of the residues were in the favored region, and the remaining residues were in the allowed region. The refinement statistics are summarized in Table I.
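For readers unfamiliar with the R and Rfree statistics quoted above, the following Python sketch shows the conventional definition, R = Σ||Fo| − |Fc|| / Σ|Fo|, applied separately to a working set and a held-out test set of reflections. The function name and the amplitudes are hypothetical and for illustration only; they are not data from this study, and the scale factor between observed and calculated amplitudes that refinement programs apply is omitted for brevity.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R: sum(||Fo| - |Fc||) / sum(|Fo|)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty sequences of equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Hypothetical structure-factor amplitudes (not from this study).
work_obs, work_calc = [100.0, 80.0, 60.0, 40.0], [90.0, 85.0, 55.0, 45.0]
free_obs, free_calc = [50.0, 30.0], [40.0, 35.0]

r_work = r_factor(work_obs, work_calc)  # reflections used in refinement
r_free = r_factor(free_obs, free_calc)  # reflections excluded from refinement
print(f"R = {r_work:.3f}, Rfree = {r_free:.3f}")
```

Because the test-set reflections are never used in refinement, Rfree serves as a cross-validation check; a small gap between R and Rfree, as reported for PH0734.1, suggests the model is not overfitted.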
Overall structure of PH0734.1
The overall structure of PH0734.1 is shown in Figure 1(A). PH0734.1 is composed of 11 β strands, six α helices, and three 3₁₀ helices with the topology β1-α1-α2-β2-β3-β4-β5-α3-3₁₀1-β6-3₁₀2-α4-β7-3₁₀3-β8-β9-β10-α5-β11-α6. The structure is composed of two domains: an N-terminal DUF1947 domain (residues 5–70) and a C-terminal PUA domain (residues 71–162). In the DUF1947 domain, the first five β strands (β1–β5) form an antiparallel β sheet that faces two α helices (α1–α2) on one side. In the PUA domain, the last six β strands (β6–β11) form a barrel-like mixed β sheet surrounded by six helices (α3–α5, 3₁₀1–3₁₀3). The C-terminal residues of PH0734.1 (163–172) form an α helix (α6) and interact with the DUF1947 domain. The two domains are tightly connected by hydrophobic and electrostatic interactions, with an approximate buried surface area of 6157 Å².
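The element counts stated above can be checked against the topology string with a short Python sketch. The labels "310_1" to "310_3" are our own plain-text notation for the three 3₁₀ helices, and the function name is hypothetical.

```python
from collections import Counter

# Topology as given in the text; "310_n" is an illustrative plain-text label
# for the n-th 3(10) helix, not the authors' notation.
TOPOLOGY = (
    "β1-α1-α2-β2-β3-β4-β5-α3-310_1-β6-310_2-α4-β7-310_3-"
    "β8-β9-β10-α5-β11-α6"
)


def classify(element):
    """Map a secondary-structure element label to its class."""
    if element.startswith("β"):
        return "beta strand"
    if element.startswith("α"):
        return "alpha helix"
    if element.startswith("310_"):
        return "3-10 helix"
    raise ValueError(f"unrecognized element: {element}")


elements = TOPOLOGY.split("-")
counts = Counter(classify(e) for e in elements)
for kind, n in sorted(counts.items()):
    print(f"{kind}: {n}")
```

Splitting on the hyphen works here only because none of the labels contains a hyphen; a different labeling scheme would need a different tokenizer.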
Comparison with other proteins
A database search using the Dali server16 revealed that only two protein structures were similar to the structure of PH0734.1. The closest structure was that of Ta1423, a protein of unknown function from Thermoplasma acidophilum (PDB code: 1Q7H, Z-score = 16.3, r.m.s.d. = 2.5 Å, sequence identity = 37%). In addition, domains C2 and C3 of archaeosine tRNA-guanine transglycosylase (ArcTGT) from Pyrococcus horikoshii (PDB code: 1IQ8) showed high similarity (Z-score = 14.9, r.m.s.d. = 2.7 Å, sequence identity = 33%). When these proteins were compared, the structure of the DUF1947 domain of PH0734.1 was quite different from the corresponding regions of the other proteins, although the structures of the PUA domains showed high similarity [Fig. 1(B)]. The Dali search found no proteins structurally similar to the DUF1947 domain of PH0734.1 (Z > 6.0), indicating that this domain has a novel conformation.
In the structure of ArcTGT, domains C2 and C3 (corresponding to the DUF1947 and PUA domains of PH0734.1, respectively) bind the characteristic λ-form tRNA,4 although the residues important for tRNA recognition are poorly conserved in PH0734.1. The DUF1947 domain of PH0734.1 possesses a unique lysine cluster around the N-terminal region of helix α1 and loop α2-β2. Lys11, Lys12, and Lys15 of helix α1 and Lys36 and Lys37 of loop α2-β2 form a highly electropositive protein surface [Fig. 1(C)]. Although an electropositive surface also exists on the corresponding domains of Ta1423 and ArcTGT, neither the electropositive residues nor the locations of the electropositive surfaces are conserved among the three proteins. This finding suggests that, although PH0734.1 probably interacts with an electronegative macromolecule (as ArcTGT does with tRNA) through the electropositive surfaces of its DUF1947 and PUA domains, its binding target differs from those of Ta1423 and ArcTGT. DUF1947 may modulate the binding target of the PUA domain using its characteristic electropositive surface.
The synchrotron-radiation experiments were performed at the AR-NW12 beamline in the Photon Factory (Proposal No. 2003S2-002).