Crystal structure of hypothetical protein PH0642 from Pyrococcus horikoshii at 1.6Å resolution


  • Institution at which this work was performed: Division of Biological Sciences, Graduate School of Science, Hokkaido University.


The hypothetical protein PH0642 of Pyrococcus horikoshii (molecular weight 29.9 kDa) shows sequence similarity to proteins of the nitrilase superfamily. Its amino acid sequence has 31% identity with that of the N-carbamyl-D-amino acid amidohydrolase from Agrobacterium sp. strain KNK712, which belongs to the nitrilase superfamily. On the basis of global and structure-based sequence analyses, the members of the nitrilase superfamily are classified into 13 branches according to variations in their consensus sequences.1 Structure-based sequence analysis placed PH0642 in the 13th branch, which contains uncharacterized nitrilase-related proteins. Because the cellular function of PH0642 is unknown, the aim of this study was to predict its molecular function by structural comparison with its structural neighbors and by sequence comparison with the members of the nitrilase superfamily.

Here, we report the crystal structure of PH0642, determined by the multiwavelength anomalous dispersion (MAD) method.

Materials and Methods.

Cloning, overexpression, purification, and crystallization.

The gene encoding the hypothetical protein PH0642 was amplified by PCR using Pfu Turbo DNA Polymerase with primers carrying Nde I and Sal I restriction sites. The primer sequences were 5′-GGGAATTCCATATGGTAAAGGTTGGCTACATTC-3′ (containing the Nde I site, CATATG) and 5′-ACGCGTCGACTCACCTGAAGTAATATTCCTCCC-3′ (containing the Sal I site, GTCGAC). The amplified DNA was digested with the restriction endonucleases Nde I and Sal I and inserted into the Nde I-Sal I site of the expression plasmid pET-22b(+).
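The primer design can be checked with a few lines of Python. This is only an illustrative sketch: the primer sequences are copied from the text, CATATG and GTCGAC are the standard Nde I and Sal I recognition sequences, and the helper names (`site_position`, `reverse_complement`) are hypothetical. It locates each site and shows that the forward primer carries an ATG start codon inside the Nde I site, while the reverse primer encodes a stop codon immediately next to the Sal I site.

```python
# Primer sequences as given in the text (5' -> 3').
FORWARD = "GGGAATTCCATATGGTAAAGGTTGGCTACATTC"
REVERSE = "ACGCGTCGACTCACCTGAAGTAATATTCCTCCC"

# Standard recognition sequences of the two restriction enzymes.
SITES = {"NdeI": "CATATG", "SalI": "GTCGAC"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def site_position(primer, site):
    """Return the 0-based index of the site in the primer, or -1 if absent."""
    return primer.find(site)


nde_pos = site_position(FORWARD, SITES["NdeI"])
sal_pos = site_position(REVERSE, SITES["SalI"])

# The last three bases of CATATG form the start codon of the insert.
start_codon = FORWARD[nde_pos + 3:nde_pos + 6]

# The triplet following the Sal I site, read on the coding strand.
stop_codon = reverse_complement(REVERSE[sal_pos + 6:sal_pos + 9])

print("Nde I site at index", nde_pos, "start codon", start_codon)
print("Sal I site at index", sal_pos, "stop codon", stop_codon)
```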

Escherichia coli strain B834(DE3) was transformed with the constructed vector pET-22b(+)/PH0642. The cells were grown at 37°C in 2 L of LB medium containing 50 μg/mL ampicillin. The expression of PH0642 was induced with 1 mM IPTG. The cells were harvested by centrifugation at 4,000g for 15 min at 4°C and resuspended in STE buffer (50 mM Tris-HCl, pH 8.0, 1 mM EDTA, 50 mM NaCl). The cells were disrupted with a French press. The homogenate was clarified by centrifugation at 40,000g for 30 min at 4°C. The supernatant of the cell extract was incubated for 30 min at 70°C. The protein solution was filtered through a 0.22-μm filter and applied to a HiTrap Q-XL column (Amersham Biosciences Corp., Arlington Heights, IL), which had been equilibrated with buffer A (20 mM Tris-HCl, pH 8.0, 1 mM EDTA). After washing with buffer A, the bound protein was eluted with a linear gradient of NaCl (0.05–1.0 M) in 100 mL of buffer. The fractions containing PH0642 were pooled, concentrated to 10 mL, and loaded onto a HiLoad 26/60 Superdex200pg column (Amersham Biosciences Corp.), which had been equilibrated with buffer B (20 mM Tris-HCl, pH 8.0, 1 mM EDTA, 200 mM NaCl). The protein eluted as a single peak. The fractions containing PH0642 were pooled, dialyzed against buffer C (20 mM Tris-HCl, pH 8.0), and concentrated by ultrafiltration using Apollo (Orbital Biosciences) to a final concentration of 10 mg/mL. The purity of the protein was analyzed by MALDI-TOF mass spectrometry using a Voyager DE-Pro (Applied Biosystems, Foster City, CA).

All crystallization experiments were performed using the hanging-drop vapor diffusion method in CrystalClear strips (Hampton Research) at 293 K. The initial crystallization trials were carried out using reservoirs consisting of 100 μL of Hampton Research Crystal Screen and Wizard (Emerald BioStructures) solutions and drops containing 1 μL of the reservoir solution and 1 μL of protein solution. Further trials optimized these conditions, and improved crystals were obtained with 0.1 M Na-Acetate, pH 4.1, and 45% MPD.

Crystallization of the selenomethionyl-substituted derivative and MAD data collection.

Selenomethionyl recombinant PH0642 was prepared from methionine auxotroph E. coli cells BL21-CodonPlus(DE3)-RIL-X (Stratagene, La Jolla, CA) transformed with pET-22b(+)/PH0642 plasmid. Se-Met PH0642 crystals were grown by the hanging-drop vapor diffusion method from a solution containing 10 mg/mL protein, 50 mM Na-Acetate, pH 4.1, and 22.5% MPD equilibrated against 100 mM Na-Acetate, pH 4.1, and 45% MPD. A 10 μL drop consisting of equal volumes of the reservoir solution and of a protein solution was kept at 20°C.

MAD data from recombinant selenomethionyl PH0642 were collected from a single crystal using a Quantum4R detector at beamline BL18B of the Photon Factory, Japan. The crystal was flash-frozen and kept at 100 K during data collection. Two energies near the selenium absorption edge were chosen on the basis of the fluorescence spectrum: 12.666 keV (λ = 0.9789 Å) and 12.663 keV (λ = 0.9791 Å), to give the maximum f″ and the minimum f′, respectively. The third energy, 12.654 keV (λ = 0.9798 Å), was selected as a remote point. The Se-Met crystal of PH0642 diffracted to 1.6 Å resolution and belonged to space group P2₁ with unit cell parameters a = 75.0 Å, b = 89.0 Å, c = 77.9 Å, and β = 96.1°. Four molecules of PH0642 were in the asymmetric unit (VM = 2.2 Å³/Da). The MAD diffraction data were integrated and scaled using the MOSFLM (ref. 2) and SCALA (ref. 3) programs.
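The wavelengths and the Matthews coefficient quoted above follow from simple arithmetic, sketched below in Python. The function names are hypothetical, and the inputs that do not appear in this paragraph are stated assumptions: the conversion constant hc ≈ 12.39842 keV·Å, two symmetry operations per unit cell for space group P2₁, and the molecular weight of 29.9 kDa given in the introduction.

```python
import math

HC_KEV_ANGSTROM = 12.39842  # Planck constant times speed of light, in keV*Angstrom


def wavelength_from_energy(energy_kev):
    """Convert a photon energy in keV to a wavelength in Angstrom."""
    return HC_KEV_ANGSTROM / energy_kev


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume of a monoclinic unit cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(volume, n_symops, n_mol_asu, mw_da):
    """V_M = cell volume / (molecules per cell * molecular weight)."""
    return volume / (n_symops * n_mol_asu * mw_da)


# Energies reported in the text (keV).
for energy in (12.666, 12.663, 12.654):
    print(f"{energy} keV -> {wavelength_from_energy(energy):.4f} A")

# Cell parameters reported in the text; P2(1) has two symmetry operations.
volume = monoclinic_cell_volume(75.0, 89.0, 77.9, 96.1)
vm = matthews_coefficient(volume, n_symops=2, n_mol_asu=4, mw_da=29900.0)
print(f"V = {volume:.0f} A^3, V_M = {vm:.2f} A^3/Da")
```

With these inputs, the calculation gives wavelengths that round to the values quoted in the text and a Matthews coefficient that rounds to 2.2 Å³/Da.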

Structure determination and refinement.

The structure of PH0642 was determined using the MAD method. Twenty-five selenium sites out of a total of 28 were located using the SOLVE program.4 The best experimental phases for the 25 selenium sites were calculated by SHARP.5 After phasing, density modification with solvent flattening and automatic model-building were achieved with the program RESOLVE.6 A model consisting of 77% of the structure of PH0642 was built automatically. Regions not constructed were manually built using O.7

Molecular dynamics refinement was performed and water molecules were located automatically using the CNS program.8 During refinement, 10% of the reflection data were set aside for the calculation of the free R-factor to monitor the refinement, and a default bulk solvent model was used with maximum likelihood targets. The final model consisted of 262 × 4 residues, 787 water molecules, and 4 acetate molecules with an Rcryst of 15.2% and an Rfree of 19.2%. The model quality was checked using PROCHECK.9 The atomic coordinates of PH0642 have been deposited in the Protein Data Bank as 1J31.
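As a minimal illustration of the statistics monitored during refinement, the sketch below computes an R-factor following the definition given in the Table I footnote (the sum of |Fobs − Fcal| divided by the sum of Fobs) and sets aside a random 10% of reflection indices as a free set. The amplitudes are hypothetical numbers rather than data from this study, and the random selection is only an assumption about how a test set might be chosen; it does not reproduce the procedure used by CNS.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum(|Fobs - Fcalc|) / sum(Fobs) over the supplied reflections."""
    numerator = sum(abs(o - c) for o, c in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def split_work_free(n_reflections, free_fraction=0.10, seed=0):
    """Randomly split reflection indices into a working set and a free set."""
    indices = list(range(n_reflections))
    random.Random(seed).shuffle(indices)
    n_free = round(n_reflections * free_fraction)
    return sorted(indices[n_free:]), sorted(indices[:n_free])


# Hypothetical structure factor amplitudes, for illustration only.
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 66.0, 36.0]
print(f"R = {r_factor(f_obs, f_calc):.3f}")

work, free = split_work_free(20)
print(len(work), "working reflections,", len(free), "free reflections")
```

Rcryst is evaluated on the working set that drives refinement, and Rfree on the reflections that were set aside, which is why Rfree is the less biased of the two indicators.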


The structure of PH0642 was determined to 1.6Å resolution by the MAD method using selenomethionine-substituted proteins. Details on the data collection, phasing, and refinement are summarized in Table I. The final model contains all residues of PH0642.

Table I. X-ray Data Collection and Refinement Statistics
  • Values within parentheses are for the highest resolution shell (1.69–1.60 Å).
  • Values within parentheses are for the highest resolution shell (2.42–2.30 Å).
  • $R_{\mathrm{meas}} = \sum_h [m/(m-1)]^{1/2} \sum_j |\langle I \rangle_h - I_{hj}| / \sum_h \sum_j I_{hj}$, where $\langle I \rangle_h$ is the mean intensity of symmetry-equivalent reflections.
  • $R_{\mathrm{lambda}} = \sum \sum_j |F_{\lambda_j} - F_{\lambda_0}| / \sum_j |F_{\lambda_0}|$, where $F_{\lambda_j}$ is the structure factor amplitude of the data collected at $\lambda_j$, and $F_{\lambda_0}$ is the structure factor amplitude collected at 0.9798 Å.
  • $R_{\mathrm{cryst}} = \sum |F_{\mathrm{obs}} - F_{\mathrm{cal}}| / \sum F_{\mathrm{obs}}$, where $F_{\mathrm{obs}}$ and $F_{\mathrm{cal}}$ are the observed and calculated structure factor amplitudes.
  • Rfree was calculated as for Rcryst, using only an unrefined subset of the reflection data (10%).

| | | | |
|---|---|---|---|
| X-ray data collection | | | |
| Wavelength (Å) | 0.9789 | 0.9791 | 0.9798 |
| Resolution (Å) | 23.3–1.6 | 23.3–1.6 | 23.3–1.6 |
| Observed reflections | 513,314 (74,153) | 510,836 (73,521) | 601,635 (83,595) |
| Independent reflections | 133,408 (19,425) | 133,056 (19,390) | 133,189 (19,379) |
| Completeness (%) | 99.6 (99.6) | 99.6 (99.7) | 99.6 (99.6) |
| Multiplicity | 3.8 (3.8) | 3.8 (3.8) | 4.5 (4.3) |
| Averaged I/σ(I) | 6.8 (1.9) | 6.5 (2.5) | 6.4 (1.6) |
| Rmeas | 0.088 (0.349) | 0.088 (0.352) | 0.096 (0.437) |
| Rlambda | 0.045 (0.089) | 0.047 (0.088) | |
| Refinement data | | | |
| Rcryst (%) | | | 15.2 |
| Rfree (%) | | | 19.2 |
| Number of protein atoms | | | 8412 |
| Number of water molecules | | | 787 |
| Number of other molecules | | | 4 |
| RMSD bond lengths (Å) | | | 0.015 |
| RMSD bond angles (°) | | | 1.76 |
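The multiplicity values in Table I can be cross-checked against the reflection counts, since multiplicity is the number of observed reflections divided by the number of independent reflections. The Python sketch below does this for each wavelength; the counts are copied from the table, each pair being (overall, highest resolution shell), and the dictionary layout and function name are hypothetical.

```python
# Reflection counts and reported multiplicities from Table I,
# keyed by wavelength in Angstrom; tuples are (overall, outer shell).
TABLE_I = {
    0.9789: {"observed": (513314, 74153), "independent": (133408, 19425),
             "multiplicity": (3.8, 3.8)},
    0.9791: {"observed": (510836, 73521), "independent": (133056, 19390),
             "multiplicity": (3.8, 3.8)},
    0.9798: {"observed": (601635, 83595), "independent": (133189, 19379),
             "multiplicity": (4.5, 4.3)},
}


def multiplicity(observed, independent):
    """Average number of measurements per independent reflection."""
    return observed / independent


for wavelength, row in TABLE_I.items():
    computed = tuple(
        round(multiplicity(obs, ind), 1)
        for obs, ind in zip(row["observed"], row["independent"])
    )
    status = "consistent" if computed == row["multiplicity"] else "mismatch"
    print(wavelength, computed, row["multiplicity"], status)
```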

PH0642 was a dimer in the crystal, and gel filtration chromatography indicated that it is also a dimer in solution. The overall fold of the monomer consists of a four-layered α-β-β-α sandwich core [Fig. 1(A)]. The strands of the two β sheets are arranged in the order β11-β1-β2-β3-β4-β5 and β6-β7-β8-β9-β10. The three central strands of each β sheet (β1-β2-β3 and β7-β8-β9) are parallel, and the other strands are arranged in an antiparallel configuration. The interface between the two β sheets is tightly packed with hydrophobic residues. The two β sheets are in turn sandwiched between two bundles of α-helices, consisting of helices α1, α2, and α3 and helices α5, α6, and α7, respectively.

Figure 1.

A: Ribbon diagram of the monomeric structure of PH0642 from Pyrococcus horikoshii. Strands are shown in blue and helices in green. The surface represents the solvent-accessible surface of a monomer (1.4 Å probe radius). The acetate molecule bound to PH0642 is shown in CPK representation. B: Ribbon diagram of the dimeric structure of PH0642. The dimer forms an eight-layered α-β-β-α-α-β-β-α sandwich structure. A non-crystallographic diad axis is normal to the plane of the diagram and passes between the two monomers. C: Sequence alignment of PH0642 homologs identified in a PSI-BLAST (ref. 17) search. PAB1449: Pyrococcus abyssi (pir|C75051); E95106: Streptococcus pneumoniae (pir|E95106); G83608: Pseudomonas aeruginosa (pir|G83608); F75263: Deinococcus radiodurans (pir|F75263); AB0115: Yersinia pestis (pir|AB0115). The secondary structure elements of PH0642 are shown above the aligned sequences. Residues highlighted in orange are completely conserved, and those in yellow are conservative substitutions. The residues that belong to the catalytic triad are marked by black arrowheads. D: Close-up view of the putative active site pocket. The electron density of the bound acetate molecule is shown as a mesh. This map was calculated at 1.6 Å resolution and contoured at 3.0σ. The main chain atoms from N171 to A177 are shown as sticks. The two water molecules considered to be involved in the hydrolysis reaction are shown as red spheres.

The dimer, whose monomers are related by a non-crystallographic diad axis, forms an eight-layered α-β-β-α-α-β-β-α sandwich structure [Fig. 1(B)]. The C-terminal helix (α9) of each monomer contacts the surface of the other monomer via hydrogen bonds, and the α7 helices of the two monomers interact with each other via hydrogen bonds.

A DALI structure similarity search10 revealed that PH0642 is similar to N-carbamyl-D-amino acid amidohydrolase (DCase) from Agrobacterium sp. strain KNK712 (PDB ID, 1ERZ; Z score, 28.4; rmsd, 2.1 Å for 235 Cα atoms),11 the putative CN hydrolase from Saccharomyces cerevisiae (PDB ID, 1F89; Z score, 32.7; rmsd, 1.8 Å for 247 Cα atoms),12 and the N-terminal domain of the Nit-fragile histidine triad fusion protein (NitFhit protein) from Caenorhabditis elegans (PDB ID, 1EMS; Z score, 29.7; rmsd, 2.1 Å for 235 Cα atoms).13 According to the SCOP structural classification, all of these structural neighbors belong to the Nitrilase/N-carbamoyl-D-amino acid amidohydrolase superfamily. This superfamily also contains the N-carbamyl-D-amino acid amidohydrolase from Agrobacterium radiobacter (PDB ID, 1FO6; Z score, 33.2; rmsd, 1.9 Å for 258 Cα atoms).14


The nitrilase superfamily members are non-peptide carbon-nitrogen hydrolases, which catalyze a wide variety of carbon-nitrogen hydrolysis reactions. All of these reactions involve attack on the cyano or carbonyl carbon by a conserved cysteine residue.15, 16 The SH group of this cysteine residue forms an acyl-enzyme with the substrate as a reaction intermediate. This acyl-enzyme is then hydrolyzed by water molecules. The cysteine, glutamate, and lysine residues are involved in the hydrolysis of the carbon-nitrogen bond. These residues are conserved among all members of the nitrilase superfamily and are called the Glu-Lys-Cys catalytic triad.1

In PH0642, the conserved Glu-Lys-Cys catalytic triad consists of Glu42, Lys113, and Cys146 [Fig. 1(C)]. The spatial arrangement of these conserved residues is similar to that in the three structural neighbors. In the crystal structure of PH0642, these residues are located around a deep pocket that is accessible from the molecular surface.

The reaction mechanism of PH0642 might be similar to the one reported previously.15 In the first step of the reaction, Cys146 may attack the cyano or carbonyl carbon of the substrate as a nucleophile. After this nucleophilic attack, the substrate and Cys146 form a tetrahedral intermediate, giving rise to the acyl-enzyme. This acyl-enzyme is then hydrolyzed by water molecules (possibly WAT213). WAT213, which is considered to hydrolyze the acyl-enzyme, forms hydrogen bonds with the Sγ atom of Cys146 (3.02 Å) and the Nζ atom of Lys113 (2.76 Å). Thus, the Sγ atom of Cys146 and the Nζ atom of Lys113 interact via a hydrogen bond network. The distance between the Sγ atom of Cys146 and the Oε2 atom of Glu42 is 3.25 Å, suggesting that deprotonation of the SH group of Cys146 by Glu42 activates Cys146, and that the activated Cys146 then attacks as a nucleophile.
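Interatomic distances such as those quoted above are Euclidean distances between atomic coordinates, usually judged against a distance cutoff. The Python sketch below shows the calculation and applies a cutoff to the three distances reported in the text. The 3.5 Å cutoff is an assumed, commonly used threshold rather than a value from this study, the function and label names are hypothetical, and the coordinates in the usage line are made up for illustration.

```python
import math

H_BOND_CUTOFF = 3.5  # Angstrom; an assumed threshold, not taken from the text


def distance(p, q):
    """Euclidean distance between two points given as (x, y, z) tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def within_cutoff(d, cutoff=H_BOND_CUTOFF):
    """True if a donor-acceptor distance is compatible with a hydrogen bond."""
    return d <= cutoff


# Distances reported in the text (Angstrom).
REPORTED = {
    ("Cys146 S-gamma", "WAT213 O"): 3.02,
    ("Lys113 side-chain N", "WAT213 O"): 2.76,
    ("Cys146 S-gamma", "Glu42 O-epsilon-2"): 3.25,
}

for (atom_a, atom_b), d in REPORTED.items():
    print(f"{atom_a} ... {atom_b}: {d:.2f} A, within cutoff: {within_cutoff(d)}")

# Usage with made-up coordinates (not taken from the deposited model).
print(distance((0.0, 0.0, 0.0), (1.0, 2.0, 2.0)))
```

A distance criterion alone does not establish a hydrogen bond; donor-acceptor geometry and protonation state also matter, so this check is only a first filter.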

An acetate molecule from the crystallization buffer was located at the bottom of the putative active site pocket [Fig. 1(D)]. The Oε1 atom of this acetate forms a hydrogen bond with the indole ring N1 atom of Trp149 (2.80 Å), and also with the main chain N atom of Ala177 via a water molecule. The Oε2 atom forms hydrogen bonds with the main chain N atoms of Val173 (2.72 Å) and Met174 (2.73 Å).


We thank S. Wakatsuki, M. Suzuki, and N. Igarashi for their kind help with the data collection on beamline BL18B of the Photon Factory, Japan. We also thank T. Nakai and H. Nanba of Kaneka Corporation, Japan, and T. Nagasawa of Gifu University, Japan, for their kind help with the enzymatic activity assay. This work was supported by a research grant from the National Project on Protein Structural and Functional Analyses of the Ministry of Education, Culture, Sports, Science, and Technology of Japan.