Crystal structure of the putative RNA methyltransferase PH1948 from Pyrococcus horikoshii, in complex with the copurified S-adenosyl-L-homocysteine



Methyltransferases (MTases) constitute a large family of enzymes that methylate the carbon, oxygen, or nitrogen atoms of their substrates, including DNA, RNA, proteins, phospholipids, and small molecules, thereby exerting important functions.1, 2 These functions include biosynthesis, detoxification, signal transduction, protein sorting and repair, chromatin regulation, and gene silencing.3 The structures of more than 50 MTases have been solved by X-ray crystallography and NMR. Of these structures, seven belong to the RNA MTases4 that act on RNA, and thus are directly related to gene transcription and expression. For example, methylation of the 5′-terminal cap plays an important role in messenger RNA (mRNA) export from the nucleus, efficient translation, and protection of the integrity of mRNA.5

The classical biochemical approach to identifying enzymes is the fractionation of cell extracts and purification of enzymes based on activity. This approach is clearly limited to enzymes with rather high specific activity and known substrates. For RNA MTases, the lack of appropriate substrates, instability or low activity of the enzymes, and possible redundancy of their functions in vivo have hampered their identification and functional characterization. Fourteen methylations of 23S ribosomal RNA (rRNA) are known in Escherichia coli, but few of the methylating activities have been described to the level of partial purification, and to date only one has been well characterized.6 Structural analysis of RNA MTases, either free or in a complex with the target RNA, will assist in identifying the biochemical and cellular functions of the unknown RNA MTases.3

Pyrococcus horikoshii protein PH1948 is a putative RNA MTase. It was classified as belonging to Cluster of Orthologous Groups 2263 (COGs2263), the 15 members of which are found only in Archaea and cover all the Archaea species in the COG database. A National Center for Biotechnology Information (NCBI) conserved domain search indicated that PH1948 contains a highly conserved domain found in large sets of MTases, such as cytosine-C5 DNA MTases, rRNA adenine dimethylases, ribosomal protein L11 MTases, O-MTases, and so on. The substrate and methyl donor for the methylation of COGs2263 members have not been characterized. Here, we report the crystal structure of PH1948 in complex with the copurified binding factor, S-adenosyl-L-homocysteine (SAH), at 2.2 Å resolution. A structural comparison with the RNA MTase ErmC′ is also discussed.

Materials and Methods.

Recombinant PH1948 protein was expressed in E. coli strain B834 (DE3) by addition of 1 mM isopropylthio-β-D-galactoside (IPTG) to Luria–Bertani (LB) broth (310 K) at OD600 (optical density) of about 0.6. After induction for 5 h, the cells were harvested and disrupted with a French Press. Highly pure PH1948 protein was prepared in three steps: The target protein was captured with a HiTrap SP column (Amersham Biosciences Inc., Piscataway, NJ), then purified further with HiLoad 26/60 Superdex 75pg (Amersham Biosciences) chromatography, and finally passed through a Resource S column (Amersham Biosciences). The target peak of the last chromatography was collected and dialyzed overnight against 10 mM Tris-HCl buffer (pH 9.0), then concentrated to 5 mg mL−1. PH1948 protein was confirmed by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry. For production of selenomethionine (Se-Met)-substituted PH1948 (Se-Met PH1948), cells were cultured in minimal medium containing Se-Met. The procedure for purification of Se-Met PH1948 was the same as that of the native protein.

The crystallization conditions for PH1948 consisted of 0.1 M N-2-hydroxyethylpiperazine-N′-2-ethanesulfonic acid (HEPES)-Na, pH 7.2, 28% polyethylene glycol (PEG400), and 0.1 M CaCl2. Crystals of Se-Met PH1948 with dimensions of 0.35 × 0.25 × 0.1 mm were obtained under the same conditions at 20°C within 2 weeks. Two diffraction data sets [single-wavelength anomalous diffraction (SAD) data and refinement data] were collected at 100 K at BL44B2 of SPring-8 (Hyogo, Japan) and BL5A of PF (Tsukuba, Japan), respectively. All data were processed using the program HKL2000.7 The structure was determined at a resolution of 2.7 Å by the selenomethionyl SAD method.8 The sites of 8 Se atoms were found and used for phase calculation by SOLVE.9 After phase improvement with noncrystallographic symmetry (NCS)-averaging by density modification (DM),10 the initial model was built automatically to 85.5% by ARP/wARP.11 The refinement at 2.2 Å resolution was carried out semiautomatically by LAFIRE12 with CNS,13 and NCS restraints were applied to the main-chain throughout refinement.

Results and Discussion.

The final model of PH1948 includes 799 of 828 residues constituting four protein molecules (one molecule contains 207 residues), four cofactors, and 312 water molecules in an asymmetric unit [Protein Data Bank (PDB) code: 1WY7]. Owing to poor electron density, the following residues could not be built: N-terminal residues 1–3, loop residues 181–185, and C-terminal residues 205–207 of molecule A; residue 1, loop residues 183–187, and C-terminal residues 205–207 of molecule B; loop residues 181–186 of molecule C; and C-terminal residues 205–207 of molecule D. Except for the N-terminus and C-terminus (residues 1–3 and 205–207) and the flexible loop (179–190), the structures of the four copies in the asymmetric unit are very similar, with an average root-mean-square deviation (RMSD) of 0.105 Å for the main-chain.
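The residue count quoted above can be checked from the unbuilt ranges listed for each molecule. The following is a minimal Python sketch that uses only the numbers stated in the text; the names `CHAIN_LENGTH`, `UNBUILT`, and `count_missing` are illustrative and not part of any crystallographic software.

```python
# Residues per chain and the unbuilt ranges (inclusive) reported for each molecule.
CHAIN_LENGTH = 207
UNBUILT = {
    "A": [(1, 3), (181, 185), (205, 207)],
    "B": [(1, 1), (183, 187), (205, 207)],
    "C": [(181, 186)],
    "D": [(205, 207)],
}


def count_missing(ranges):
    """Number of residues covered by a list of inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in ranges)


missing_per_chain = {chain: count_missing(r) for chain, r in UNBUILT.items()}
total_residues = CHAIN_LENGTH * len(UNBUILT)
built_residues = total_residues - sum(missing_per_chain.values())

print(missing_per_chain)                     # {'A': 11, 'B': 9, 'C': 6, 'D': 3}
print(built_residues, "of", total_residues)  # 799 of 828
```

The 29 unbuilt residues account exactly for the difference between 828 and 799.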

Figure 1(a) shows a monomer (Molecule D) of the protein PH1948–cofactor complex. PH1948 has a single domain composed of seven α-helices and seven β-strands (an MTase fold and an N-terminal α-helix, αN). The seven-stranded β-sheet is assembled in the order β3-β2-β1-β4-β5-β7-β6, with β7 antiparallel to the other six parallel strands. With the β-strands flanked by three α-helices on one side (αZ, αA, αB) and three α-helices on the other (αC–αE), the core structure is formed with the typical features of the S-adenosyl-L-methionine (SAM)-dependent MTase fold.2

Figure 1.

PH1948 in complex with SAH and the omit map of SAH. (a) The protein is shown as a ribbon representation, and SAH as a stick model. The secondary structural elements are labeled β1–β7, αA–αE, and αZ, consistent with standard nomenclature among MTases except for an additional helix, αN, in the N-terminus. (b) A ball-and-stick model of SAH superimposed with the omit map. The map was calculated with coefficients Fo − Fc, omitting SAH, and contoured at 2.5σ.

As the concentration of the predominant methyl donor, SAM, in E. coli was reported to be 300–500 μM, copurification and crystallization with bound SAM or SAH have been observed for some MTases.2, 14, 15 In the present structure of PH1948, the Fo − Fc electron density map clearly indicated that each of the four molecules in the asymmetric unit contained one SAH molecule, although no cofactors were added for purification and crystallization [Fig. 1(b)]. The presence of SAH was also confirmed by refining PH1948 with SAH and SAM, respectively. The bound SAH was presumably derived from methyl transfer of SAM during the growth of the E. coli cells, and the MTase PH1948 must have retained the cofactor throughout the experimental procedures.

The interactions of PH1948 with SAH are shown in Figure 2(a). Motif 2, a glycine-rich region between β1 and αA, Motif 3, an acidic loop between β2 and αB with the sequence EVD, and Motif 4 (DV) prior to αC interact directly with SAH in a mode common to the SAM-dependent MTases2, 3 (common motifs I–III in MTases). Motif 1, located in the loop between αN and αZ, is highly conserved in the members of COGs2263. The O atom of Glu26 and NE of Gln27 interact with the sulfur and the O2′ hydroxyl of SAH. Motif 5, with the highly conserved sequence NPPF following the strand β4, is the catalytic site for members of the PH1948 family. Apart from Pro121 making van der Waals contact with the adenine ring, there is no direct interaction between Motif 5 and SAH. Remarkably, the binding of SAH was enhanced by van der Waals contacts with Leu25, Val80, Val106, and Phe133. In particular, the ring of the conserved residue Phe133 is stacked on the adenine ring at a distance of about 3.4 Å. Phe18, Val62, and Leu63 are also involved in formation of the binding pocket. With these interactions, SAH is held deeply within the binding pocket of PH1948, which results in copurification of the complex of PH1948 and SAH.

Figure 2.

Protein PH1948–SAH complex and structural comparison with ErmC′. (a) Stick model showing the interaction of PH1948 and SAH. SAH is shown in green and the important protein residues in gray, labeled with the single-letter abbreviations. Oxygen atoms and water are shown in red, nitrogen atoms in blue, and sulfur atoms in brown. The interactions among protein, SAH, and water are indicated by dashed lines. (b) Structural comparison of PH1948 with the N-terminal catalytic domain of ErmC′. The two structures are shown as ribbons, with the similar fold in gray, and varying parts in red (PH1948) and blue (ErmC′). The residues in the catalytic site are shown as sticks, and the cofactor SAH as lines. The sequences of the peptide segments possibly involved in substrate binding are compared: identity (*), strong similarity (:), and weak similarity (.), respectively.

The results of DALI16 searches indicated that the three-dimensional (3D) structure of PH1948 is very similar to the putative RNA MTase, Rv2118c (PDB17 code: 1I9G), the putative RNA (guanine-N2) MTase, Mj0882 (PDB18 code: 1DUS), and the rRNA (adenine-N6) MTase ErmC′ (PDB19 code: 1QAN) with Z-scores of 18.4, 17.5, and 14.7, respectively. Among these RNA MTases, ErmC′ is relatively well characterized, although the structure with bound substrate has not been determined.19, 20 Figure 2(b) shows that both proteins bind SAH in similar pockets, and the active sites superpose well on the consensus N[P/I]P[Y/F] except for the orientations of the aromatic rings of Phe122 and Tyr104. The aromatic ring can be rotated easily to make a face-to-face π-stacking interaction with the target base upon substrate binding.18 The motif [D/N/S][P/I]P[Y/F] is a very common mode of substrate binding employed by N-MTases and is not nucleotide-specific but is used for nitrogens conjugated to a planar system (e.g., an amide moiety after nucleophilic attack).4 In the active pocket of ErmC′, the loop (158–171) connecting β6 and β7 plays an important role in binding and orienting the target adenine, and the N-terminal loop (10–12) is involved in binding the cofactor and fitting the orientation of the cofactor and target base.19, 20 In the case of PH1948, similar residues are present at the corresponding positions (residues 179–191 and 26–28; Fig. 2(b), bottom). The two loops of PH1948 may serve the same function as those of ErmC′. In particular, residues 26–28 are completely conserved among PH1948 family members, suggesting that this region is likely related to the substrate specificity, and the side-chain of Tyr28 would undergo a conformational change to fit the target upon binding.
ErmC′ has an additional C-terminal domain to bind RNA.19 Unlike in ErmC′, the long, antiparallel β6 and β7 strands in PH1948 protrude from the seven-stranded sheet, and the large loop 179–191 is twisted, with its top deviating away from loop 14–32, resulting in the formation of an active crevice. Combined with the N-terminal helices αN and αZ, a groove filled with positively charged residues is formed, which is predicted to bind RNA (electrostatic potential surface not shown). Upon RNA binding, the flexible and conserved loop 179–191 of PH1948 may undergo significant motion to bind and orient the target RNA base, similar to the motion of ErmC′ loop 158–171.20 Our structural findings and comparison with the structure of rRNA (adenine-N6) MTase ErmC′ demonstrated that PH1948 is a SAM-dependent nitrogen-MTase, with a positively charged groove in its N-terminus as the binding site for the most likely RNA substrate.
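The consensus motifs discussed above, N[P/I]P[Y/F] and [D/N/S][P/I]P[Y/F], map directly onto regular-expression character classes, which makes it easy to scan a protein sequence for candidate catalytic tetrapeptides. The sketch below is illustrative only: the helper `find_motifs` is a hypothetical name, and the test fragments are invented peptides rather than the real PH1948 or ErmC′ sequences. Only the tetrapeptide NPPF (Motif 5 of PH1948) comes from the text; NIPY and DPPY are simply other tetrapeptides permitted by the consensus patterns.

```python
import re

# Consensus motifs from the text, translated from bracket/slash notation
# into regular-expression character classes.
ACTIVE_SITE = re.compile(r"N[PI]P[YF]")        # N[P/I]P[Y/F]
N_MTASE_MOTIF = re.compile(r"[DNS][PI]P[YF]")  # [D/N/S][P/I]P[Y/F]


def find_motifs(sequence, pattern):
    """Return (1-based start position, matched tetrapeptide) for each hit."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(sequence)]


# Hypothetical fragments for illustration; NOT real protein sequences.
fragment_1 = "GVIGNPPFGSQ"  # contains NPPF
fragment_2 = "KIFGNIPYNIS"  # contains NIPY
fragment_3 = "AADPPYTT"     # contains DPPY, which fits only the broader motif

print(find_motifs(fragment_1, ACTIVE_SITE))    # [(5, 'NPPF')]
print(find_motifs(fragment_2, ACTIVE_SITE))    # [(5, 'NIPY')]
print(find_motifs(fragment_3, ACTIVE_SITE))    # []
print(find_motifs(fragment_3, N_MTASE_MOTIF))  # [(3, 'DPPY')]
```

A pattern hit only flags a candidate site; whether the tetrapeptide sits in a catalytic pocket still has to be judged from the structure.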

Table I. Data Collection and Refinement Statistics
  • Values in parentheses refer to the highest resolution shell.

  • a $R_{\mathrm{merge}} = \sum_h \sum_j |\langle I_h \rangle - I_{hj}| / \sum_h \sum_j I_{hj}$, where $\langle I_h \rangle$ is the mean intensity of symmetry-equivalent reflections.

| Beamline | BL44B2 (SPring-8) | NW12 (PF) |
| --- | --- | --- |
| Wavelength (Å) | 0.9793 | 1.00 |
| Space group | C2 | C2 |
| Unit cell parameters (Å, °) | a = 205.5, b = 43.3, c = 118.5, β = 92.2 | a = 207.0, b = 43.1, c = 118.2, β = 92.1 |
| Resolution (Å) | 39.86–2.7 (2.8–2.7) | 39.86–2.20 (2.28–2.20) |
| Unique reflections | 28,286 | 53,514 |
| Completeness (%) | 96.9 (78.4) | 99.3 (94.9) |
| Redundancy | 6.5 (5.1) | 5.0 (4.1) |
| I/σ(I) | 13.1 (2.5) | 25.2 (5.6) |
| Rmerge a (%) | 8.6 (34.9) | 4.6 (21.8) |
| R/R-free (%) | | 24.2/26.8 |
| RMSD bond length/angle (Å/°) | | 0.011/1.4 |
| Average B factor (Å²) | | |
| Protein molecules | | 45.1 |
| Water molecules | | 37.6 |
| Others | | 44.4 |
| Ramachandran plot (%) | | |
| Most favored regions | | 91.2 |
| Additionally allowed regions | | 8.7 |
| Generously allowed regions | | 0.1 |
| Disallowed regions | | 0 |
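
The Rmerge definition in the table footnote can be illustrated with a short calculation. The Python sketch below is a minimal example and not the procedure used by HKL2000: the function name `r_merge` is hypothetical, each observation is assumed to be already indexed to its symmetry-unique reflection, and the intensities are made up for illustration rather than taken from this study.

```python
from collections import defaultdict


def r_merge(observations):
    """Compute Rmerge from (hkl, intensity) pairs.

    Rmerge = sum_h sum_j |<I_h> - I_hj| / sum_h sum_j I_hj,
    where <I_h> is the mean intensity of all observations of reflection h.
    Assumes each hkl key is already reduced to its symmetry-unique index.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        mean_i = sum(intensities) / len(intensities)
        numerator += sum(abs(mean_i - i) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


# Made-up intensities for illustration only; not data from this study.
toy = [
    ((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
    ((0, 2, 0), 50.0), ((0, 2, 0), 50.0), ((0, 2, 0), 56.0),
]

print(f"{100 * r_merge(toy):.1f}%")  # 4.9%
```

For the toy data, the summed absolute deviations are 10 + 8 = 18 and the summed intensities are 210 + 156 = 366, giving 18/366, or about 4.9%.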


We thank Ms. A. Morita and R. Ogawa for help in protein expression and purification. We also thank the staff of beamline BL44B2, SPring-8, and beamline NW12, Photon Factory, Japan, for their kind help with data collection.