Crystal structure of phosphoribosylformylglycinamidine synthase II (smPurL) from Thermotoga maritima at 2.15 Å resolution



The TM1246 gene (smPurL) of Thermotoga maritima encodes a phosphoribosylformylglycinamidine synthase II (FGAM; EC with a molecular weight of 65,834 Da (residues 1–603) and a calculated isoelectric point of 5.38. This enzyme is part of the de novo purine biosynthesis subsystem, where it forms a complex with purS (TM1244) and purQ (TM1245) to form formylglycinamide ribonucleotide amidotransferase (FGAR-AT). FGAR-AT catalyzes the adenosine 5′-triphosphate (ATP)-dependent conversion of FGAR and glutamine to formylglycinamidine ribonucleotide (FGAM), adenosine 5′-diphosphate (ADP), Pi, and glutamate in the fourth step of the purine biosynthetic pathway (EC–3 In Gram-positive bacteria, archaebacteria, and the Gram-negative T. maritima, FGAR-AT is a complex of three proteins: PurS, PurL (designated as smPurL), and PurQ. In eukaryotes and other Gram-negative bacteria, FGAR-AT is a multidomain protein with a molecular mass of about 140 kDa (designated as lgPurL). Based on iterative sequence similarity searches, it has been proposed that smPurL together with lgPurL, aminoimidazole ribonucleotide synthetase (PurM), Ni-Fe hydrogenase maturation protein (HypE), selenophosphate synthetase (SelD), and thiamine monophosphate kinase (ThiL) form a new structural superfamily of ATP-dependent enzymes.4 The structures of FGAR-AT (lgPurL) from Salmonella typhimurium [Protein Data Bank (PDB) 1t3t], PurM protein from Escherichia coli (PDB 1cli), and ThiL from Aquifex aeolicus (PDB 1vqv) are known.4–6 Herein, we report the crystal structure of TM1246 (smPurL), determined using the semiautomated, high-throughput pipeline of the Joint Center for Structural Genomics (JCSG).7

The crystal structure of TM1246 [Fig. 1(A)] was determined to 2.15 Å resolution using the multi-wavelength anomalous dispersion (MAD) method. Data collection, model, and refinement statistics are summarized in Table I. The final model includes one monomer (residues 2–186, 203–603), one chloride ion, and 221 water molecules in the asymmetric unit. The Matthews' coefficient (Vm)8 for TM1246 is 2.14 Å3/Da and the estimated solvent content is 42.0%. A main-chain torsion angle analysis by the program MolProbity9 shows that 97% and 100% of the residues are in the favored and allowed regions of the Ramachandran plot, respectively.
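The Matthews coefficient and solvent content quoted above can be roughly reproduced from the unit cell parameters listed in Table I. The sketch below is only illustrative. It assumes four asymmetric units per P212121 cell, the 65,834 Da mass of the native sequence (no correction for the tag or for selenomethionine), and the conventional factor of 1.23 for the protein partial specific volume. The function names are hypothetical and do not come from any crystallographic package. Under these assumptions the computed Vm comes out slightly below the reported 2.14 Å3/Da, and the solvent content is close to the reported 42.0%.

```python
def matthews_coefficient(a, b, c, n_asu, mw_da):
    """Return Vm in cubic angstroms per dalton for an orthogonal cell.

    a, b, c : cell edges in angstroms (all cell angles assumed to be 90 degrees)
    n_asu   : number of asymmetric units in the cell (assumed 4 for P212121)
    mw_da   : molecular weight of the asymmetric-unit contents in daltons
    """
    cell_volume = a * b * c
    return cell_volume / (n_asu * mw_da)


def solvent_fraction(vm):
    """Approximate solvent fraction using the conventional 1.23 factor."""
    return 1.0 - 1.23 / vm


# Cell edges from Table I; molecular weight from the introductory paragraph.
vm = matthews_coefficient(59.78, 72.69, 128.42, n_asu=4, mw_da=65834.0)
solvent = solvent_fraction(vm)
print(f"Vm = {vm:.2f} A^3/Da, solvent content = {100 * solvent:.1f}%")
```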

Figure 1.

Crystal structure of smPurL from T. maritima. A: Stereo ribbon diagram of TM1246 monomer color-coded from N-terminus (blue) to C-terminus (red) showing the domain organization. Helices H1–H24 and β-strands (β1–β25) are indicated. The disordered region is depicted by a dashed line with the start and end residues labeled. B: Diagram showing the secondary structure elements of TM1246 superimposed on its primary sequence. The α-helices, 310-helices, β-strands, β-bulges, and γ-turns are indicated. The four β-sheets are indicated by a red A, B, C, and E. β-Hairpins are depicted as red loops. Disordered regions are depicted by a dashed line with the corresponding sequence shown below.

Table I. Summary of Crystal Parameters, Data Collection, and Refinement Statistics for TM1246 (PDB: 1vk3)
  • a Highest resolution shell.

  • ESU = estimated overall coordinate error.20

  • Rsym = Σ|Ii − 〈Ii〉|/Σ|Ii|, where Ii is the scaled intensity of the ith measurement and 〈Ii〉 is the mean intensity for that reflection.

  • Rcryst = Σ||Fobs| − |Fcalc||/Σ|Fobs|, where Fcalc and Fobs are the calculated and observed structure factor amplitudes, respectively.

  • Rfree = as for Rcryst, but for 5.0% of the total reflections chosen at random and omitted from refinement.

Space group                          P212121
Unit cell parameters                 a = 59.78 Å, b = 72.69 Å, c = 128.42 Å

Data collection                      λ0 MADSe        λ1 MADSe        λ2 MADSe
Wavelength (Å)                       0.9796          0.9793          1.0035
Resolution range (Å)                 63.26–2.15      30.18–2.20      30.18–2.20
No. of observations                  91,420          93,064          89,540
No. of reflections                   28,272          27,791          27,151
Completeness (%)                     90.8 (56.8)a    95.4 (72.7)a    93.2 (64.4)a
Mean I/σ(I)                          10.2 (1.3)a     8.9 (1.4)a      10.3 (1.6)a
Rsym on I                            0.09 (0.55)a    0.11 (0.49)a    0.09 (0.46)a
Highest resolution shell (Å)         2.21–2.15       2.26–2.20       2.26–2.20

Model and refinement statistics
Resolution range (Å)                 63.26–2.15      Data set used in refinement    λ0 MADSe
No. of reflections (total)           28,237          Cutoff criteria                |F| > 0
No. of reflections (test)            1,368           Rcryst                         0.186
Completeness (% total)               90.6            Rfree                          0.254
Deviation from ideal geometry (rms)
 Bond length                         0.012 Å
 Bond angle                          1.32°
Average isotropic B-value protein    24.9 Å2
Average isotropic B-value ions       51.1 Å2
Average isotropic B-value water      32.4 Å2
ESU based on R value                 0.31 Å
Protein residues/atoms               586/4,444
Solvent molecules                    221
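The Rsym and Rcryst definitions given with Table I translate directly into code. The following is a minimal sketch of those two formulas in Python. The function names and the toy intensities and amplitudes are hypothetical and serve only to show the arithmetic; they are not data from the TM1246 experiment, and real processing programs apply additional scaling and weighting that is not shown here.

```python
def r_sym(measurements):
    """Rsym = sum|Ii - <I>| / sum|Ii| over all measurements of all reflections.

    measurements maps a reflection index (h, k, l) to the list of scaled
    intensities recorded for that reflection.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in measurements.values():
        mean_i = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(abs(i) for i in intensities)
    return numerator / denominator


def r_cryst(f_obs, f_calc):
    """Rcryst = sum||Fobs| - |Fcalc|| / sum|Fobs| over paired amplitudes."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Hypothetical toy data, for illustration only.
toy_intensities = {(1, 0, 0): [10.0, 12.0], (0, 1, 0): [5.0, 5.0]}
toy_f_obs = [100.0, 50.0, 25.0]
toy_f_calc = [90.0, 55.0, 25.0]

toy_r_sym = r_sym(toy_intensities)
toy_r_cryst = r_cryst(toy_f_obs, toy_f_calc)
print(f"Rsym = {toy_r_sym:.4f}, Rcryst = {toy_r_cryst:.4f}")
```

Rfree uses the same expression as Rcryst, evaluated only on the reflections that were set aside from refinement.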

The TM1246 monomer contains 25 β-strands (β1–β25) in four β-sheets (A, B, C, E), one β-hairpin (D), 18 α-helices (H1–H7, H10–H11, H13, H15–H17, H19–H21, H23–H24), and seven 310-helices (H8–H9, H12, H14, H18, H22–H23′) [Fig. 1(A,B)]. The total β-strand, α-helical, and 310-helical content is 27.6, 31.7, and 3.1%, respectively. TM1246 comprises four α+β domains [Figs. 1(A) and 2(A)]. The first (A1: 2–166) and third domains (A2: 362–507) are related by pseudo-twofold symmetry and pack against each other to form the central structural unit of TM1246 [Fig. 2(A)]. These domains adopt a two-layered α+β fold whose core is composed of a mixed four-stranded β-sheet (strand order 1423) and two α-helices arranged in a βαβαββ motif. These domains resemble the N-terminal domain of PurM and belong to the “Bacillus chorismate mutase-like” fold.10 The other domains of smPurL (B1: 167–345 and B2: 508–603) have a curved β-sheet and adopt a three-layered α+β fold composed of a four-stranded antiparallel β-sheet and three α-helices arranged in a βαβαβαβ motif. This fold has some resemblance to the ferredoxin fold and has been classified in the SCOP database10 under the “PurM C-terminal domain-like” fold. A linker peptide (residues 346–361) connects the two PurM-like units [Figs. 1(A) and 2(A)]. The domains of smPurL are likely a result of a tandem duplication of an ancestral PurM-like subunit and are arranged like the PurM dimer (A1B1-A2B2) [Figs. 1(A) and 2(A)].
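The domain boundaries described above can be captured in a small lookup table, which is convenient when mapping residues of interest onto domains. The sketch below is illustrative only. It uses the residue ranges quoted in the text, together with the disordered segment 187–202 reported for the final model; the names DOMAINS, domain_of, and is_ordered are hypothetical.

```python
# Residue ranges (inclusive) as given in the text.
DOMAINS = {
    "A1": (2, 166),
    "B1": (167, 345),
    "linker": (346, 361),
    "A2": (362, 507),
    "B2": (508, 603),
}

# Segment with no interpretable electron density in the final model.
DISORDERED = (187, 202)


def domain_of(residue):
    """Return the domain name for a residue number, or None if unmodeled."""
    for name, (start, end) in DOMAINS.items():
        if start <= residue <= end:
            return name
    return None


def is_ordered(residue):
    """True if the residue falls in a domain and outside the disordered loop."""
    in_loop = DISORDERED[0] <= residue <= DISORDERED[1]
    return domain_of(residue) is not None and not in_loop


# Count modeled residues over the full-length sequence (1-603).
n_ordered = sum(1 for r in range(1, 604) if is_ordered(r))
print(domain_of(425), n_ordered)
```

The count of ordered residues obtained this way agrees with the 586 protein residues listed in Table I.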

Figure 2.

Domain arrangement and structural alignment of smPurL and lgPurL. A: Domain arrangement of PurM, ThiL, smPurL, and lgPurL proteins. PurM and ThiL are homodimers. smPurL (gray) and the central domain of lgPurL (blue) have the same domain arrangement as a PurM dimer. lgPurL has two additional domains as compared with smPurL. The N-terminal domain homologous to the PurS protein is colored green and the C-terminal glutaminase domain is colored red. B: Stereo ribbon diagram of a superposition of smPurL (gray) and residues 183–960 of lgPurL from S. typhimurium (blue). The lgPurL N-terminal domain of unknown function and the C-terminal glutaminase domain are colored green and red, respectively. These extra domains of lgPurL correspond to the PurS (TM1244) and PurQ (TM1245) proteins in T. maritima. The structures were aligned using the DALI server.

The crystallographic packing of the TM1246 structure, as well as analytical size exclusion chromatography in combination with static light scattering, indicates that a monomer is the biologically relevant form. A search performed with the coordinates of TM1246 using the DALI server11 showed structural similarity to residues 183–960 of formylglycinamide synthetase from S. typhimurium (PDB: 1t3t) (Z = 28.6).6 The root-mean-square deviation (RMSD) for this structural alignment is 2.8 Å over 553 aligned Cα atoms with 21% sequence identity [Fig. 2(B)]. smPurL is also homologous to the structures of PurM and ThiL.4–6, 12 The structural alignment of PurM from E. coli4 (PDB: 1cli) with the N-terminal domains of smPurL superimposes 229 Cα atoms with an RMSD of 3.2 Å, whereas the ThiL protein from Aquifex aeolicus (PDB: 1vqv) can be aligned over 210 Cα atoms with an RMSD of 3.6 Å. Because TM1246 represents the first structure of a smPurL, it is instructive to compare it with the lgPurL structure. Although the overall folds of TM1246 and the FGAM synthetase domain of lgPurL are very similar, two large insertions are found in the lgPurL structure [Fig. 2(A,B)]. These insertions, which occur in A1 and B2, lie in close structural proximity, creating a new surface on one side of the enzyme [Fig. 2(A,B)]. The very compact structure of the T. maritima enzyme suggests that it represents a minimal version of the FGAM synthetase domain.
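The RMSD and sequence identity values quoted above are computed over pairs of aligned Cα atoms. As a minimal sketch of those two quantities, the Python below assumes that the two coordinate sets have already been superposed and paired (for example by an alignment server) and does not perform any superposition itself. The function names and the toy coordinates and sequences are hypothetical and are not taken from the 1vk3 or 1t3t entries.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, pre-superposed coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def percent_identity(aligned_a, aligned_b):
    """Percentage of aligned positions carrying the same residue type."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y)
    return 100.0 * matches / len(aligned_a)


# Hypothetical toy data: three C-alpha positions, second set shifted 1 A along z.
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
toy_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
toy_rmsd = rmsd(toy_a, toy_b)
toy_identity = percent_identity("GDAK", "GEAR")
print(toy_rmsd, toy_identity)
```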

The active site of the smPurL from T. maritima was inferred from sequence and structural comparison to the lgPurL from S. typhimurium.6 The putative active site is located in the cleft formed by the “Bacillus chorismate mutase-like” fold and the succeeding “PurM C-terminal domain-like” fold [Figs. 2(B) and 3(A)]. Unlike lgPurL, which binds two sulfate ions in its putative active site, the smPurL structure does not contain any bound ligands. The secondary structural elements around the active site of smPurL superimpose very well with the lgPurL structure. Seven of the nine residues that interact with the sulfate ions in the active site region of lgPurL are conserved in smPurL and adopt similar side-chain conformations [Fig. 3(A)].

Figure 3.

A: Stereo diagram of a close-up of the putative active site of TM1246 superimposed on the lgPurL structure shown in ribbon representation. The sulfate ions are from the lgPurL structure. The start and end regions of the glycine-rich loop in both the structures are labeled in red. B: Stereo diagram of a close-up of the lgPurL ADP-binding site superimposed on the smPurL structure shown in ribbon representation. The ADP moiety and Mg2+ ions are from the lgPurL structure. In A and B, residues are numbered according to TM1246 structure (PDB 1vk3) and the equivalent residues of lgPurL (PDB 1t3t) are shown in parentheses.

The lgPurL structure contains an auxiliary ADP-binding site that is related to the active site by pseudo-twofold symmetry [Figs. 2(B) and 3(B)]. The sequence and structural conservation at this region is less pronounced than for the putative active site. Recent biochemical studies on the Bacillus subtilis smPurL have indicated that ADP is required for the assembly of the PurSLQ complex.5 However, in the TM1246 structure, no ADP or bound anions are observed at this site. Of the five highly conserved residues of the lgPurL protein family that are involved in interactions with the ADP moiety and the Mg2+ ions (K649, E718, N722, D884, and D887 of PDB 1t3t), only E425 (E718 in lgPurL) is conserved in TM1246 [Fig. 3(B)]. None of the residues that are involved in forming the hydrophobic pocket in lgPurL are conserved in TM1246, although these regions are structurally similar. The structural differences occur mainly around the regions that accommodate the base ring and the sugar moiety of ADP.

smPurL contains a glycine-rich loop that is structurally disordered (residues 187–202) and is located close to the active site [Fig. 3(A)]. Interestingly, the equivalent region of lgPurL (448–466) is also disordered and is positioned to cover the active site upon binding of the ATP moiety. It is likely that this loop will become ordered in the ATP-bound form of the enzyme.

As noted before, the lgPurL structure has two extra domains as compared with the smPurL structure: an N-terminal domain of unknown function [Fig. 2(A,B), colored green] and a C-terminal glutaminase domain [Fig. 2(A,B), colored red]. The structural study of lgPurL revealed that the N-terminal domain of unknown function is structurally homologous to a dimer of PurS.6 It is likely that the PurS and PurQ domains of T. maritima (TM1244 and TM1245) occupy similar positions to the additional N- and C-terminal domains of lgPurL [Fig. 2(A,B)].

smPurL is the last remaining enzyme in the purine biosynthetic pathway to have its structure determined. The smPurL family contains hundreds of sequence homologs. Models for TM1246 homologs can be accessed at

The crystal structure of TM1246 represents a smPurL protein. The information reported herein, in combination with the structure of lgPurL and further biochemical and biophysical studies, will yield valuable insights regarding the role of this protein in purine biosynthesis.

Materials and Methods.

Protein production and crystallization.

Phosphoribosylformylglycinamidine synthase II from T. maritima (TIGR: TM1246, Swissprot: Q9X0X3) was amplified by polymerase chain reaction (PCR) from genomic DNA using PfuTurbo (Stratagene) and primer pairs encoding the predicted 5′- and 3′-ends. The PCR product was cloned into plasmid pMH2T7, which encodes an expression and purification tag (MGSDKIHHHHHH) at the amino terminus of the full-length protein. The cloning junctions were confirmed by sequencing. Protein expression was performed in a selenomethionine-containing medium using the E. coli methionine auxotrophic strain DL41. Lysozyme was added to the culture at the end of fermentation to a final concentration of 250 μg/mL. Bacteria were lysed by sonication after a freeze/thaw procedure in Lysis Buffer [50 mM Tris pH 7.9, 50 mM NaCl, 10 mM imidazole, 0.25 mM Tris(2-carboxyethyl)phosphine hydrochloride (TCEP)], and the cell debris was pelleted by centrifugation at 3,400 × g for 60 min. The soluble fraction was applied to nickel-chelating resin (GE Healthcare) pre-equilibrated with Lysis Buffer. The resin was washed with Wash Buffer [50 mM potassium phosphate pH 7.8, 300 mM NaCl, 40 mM imidazole, 10% (v/v) glycerol, 0.25 mM TCEP], and the target protein was eluted with Elution Buffer [20 mM Tris pH 7.9, 300 mM imidazole, 10% (v/v) glycerol, 0.25 mM TCEP]. The eluate was buffer exchanged with Buffer Q [20 mM Tris pH 7.9, 5% (v/v) glycerol, 0.25 mM TCEP] containing 50 mM NaCl and was applied to a RESOURCE Q column (GE Healthcare) pre-equilibrated with the same buffer. The target protein was eluted using a linear gradient of 50–500 mM NaCl in Buffer Q. The appropriate fractions were pooled, buffer exchanged with Crystallization Buffer [20 mM Tris pH 7.9, 150 mM NaCl, 0.25 mM TCEP], and concentrated for crystallization assays to 15 mg/mL by centrifugal ultrafiltration (Millipore).
Molecular weight and oligomeric state of the target protein were determined using a 1.0 × 30 cm Superdex 200 column (GE Healthcare) in combination with static light scattering (Wyatt Technology). The mobile phase consisted of 20 mM Tris pH 7.9, 150 mM NaCl. The protein was crystallized using the nanodroplet vapor diffusion method13 with standard JCSG crystallization protocols.7 The crystallization reagent contained 30% polyethylene glycol (PEG)-200, 7% PEG-4000, 0.1 M Tris pH 8.0. No additional cryoprotectant was required. The crystals were indexed in orthorhombic space group P212121 (Table I).

Data collection.

MAD data were collected at the Advanced Light Source (ALS, Berkeley, CA) on beamline 8.2.1 at wavelengths corresponding to the low energy remote (λ1) and the peak (λ2) of a selenium MAD experiment. In addition, a second crystal was used to collect a 2.15 Å high-resolution dataset (λ0) on beamline 8.2.2. The datasets were collected at 100 K using ADSC CCD detectors. Data were integrated and reduced using Mosflm14 and then scaled with the program SCALA from the CCP4 suite.15 Data statistics are summarized in Table I.
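Because MAD wavelengths are chosen relative to an absorption edge, it can be helpful to express them as photon energies. The sketch below is a simple illustration that converts the three wavelengths listed in Table I using E = hc/λ with hc ≈ 12398.42 eV·Å. The dictionary keys mirror the column labels of Table I, and the function name is hypothetical.

```python
# Product of Planck's constant and the speed of light, in eV * angstrom.
HC_EV_ANGSTROM = 12398.42


def wavelength_to_energy_ev(wavelength_angstrom):
    """Convert an X-ray wavelength in angstroms to photon energy in eV."""
    return HC_EV_ANGSTROM / wavelength_angstrom


# Wavelengths as listed in Table I.
WAVELENGTHS = {"lambda0": 0.9796, "lambda1": 0.9793, "lambda2": 1.0035}

energies = {name: wavelength_to_energy_ev(w) for name, w in WAVELENGTHS.items()}
for name, energy in energies.items():
    print(f"{name}: {WAVELENGTHS[name]:.4f} A -> {energy:.1f} eV")
```

A longer wavelength corresponds to a lower photon energy, so the 1.0035 Å data set is the lowest in energy of the three.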

Structure solution and refinement.

The initial structure was determined with the 2.20 Å selenium MAD data (λ1 and λ2) using the CCP4 suite15 and SOLVE/RESOLVE.16 Model building and refinement were performed on the high-resolution data set (λ0) using O17 and REFMAC5.15 Refinement statistics are summarized in Table I. The final model includes one protein monomer, one chloride ion, and 221 water molecules in the asymmetric unit. No electron density was observed for residues 1, 187–202, and the expression and purification tag.
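Several derived quantities follow from the counts in Table I by simple arithmetic: the average multiplicity of each data set (observations per unique reflection), the fraction of reflections reserved for the Rfree test set, and the gap between Rfree and Rcryst. The sketch below is only an illustration of that arithmetic. The variable and function names are hypothetical, and the numbers are copied from Table I.

```python
# (number of observations, number of unique reflections) from Table I.
DATASETS = {
    "lambda0": (91420, 28272),
    "lambda1": (93064, 27791),
    "lambda2": (89540, 27151),
}


def multiplicity(n_observations, n_unique):
    """Average number of measurements per unique reflection."""
    return n_observations / n_unique


mult = {name: multiplicity(n_obs, n_uni) for name, (n_obs, n_uni) in DATASETS.items()}

# Refinement statistics from Table I.
n_total_reflections = 28237
n_test_reflections = 1368
r_cryst_value = 0.186
r_free_value = 0.254

test_fraction = n_test_reflections / n_total_reflections
r_gap = r_free_value - r_cryst_value

for name, value in mult.items():
    print(f"{name}: multiplicity = {value:.2f}")
print(f"test set = {100 * test_fraction:.2f}% of reflections, Rfree - Rcryst = {r_gap:.3f}")
```

The test-set fraction computed this way is close to the nominal 5.0% stated in the Table I footnote.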

Validation and deposition.

Analysis of the stereochemical quality of the model was performed using AutoDepInputTool, MolProbity,9 SFcheck 4.0,18 and WHAT IF 5.0.19 Figure 1(B) was adapted from an analysis using PDBsum, and all others were prepared with PyMOL (DeLano Scientific). Atomic coordinates and experimental structure factors of TM1246 have been deposited in the PDB and are accessible under the code 1vk3.


This work was supported by the National Institutes of Health Protein Structure Initiative grants P50 GM62411 and U54 GM074898 awarded by the National Institute of General Medical Sciences. Portions of this research were performed at the Stanford Synchrotron Radiation Laboratory (SSRL) and the Advanced Light Source (ALS). The SSRL is a national user facility operated by Stanford University on behalf of the United States Department of Energy, Office of Basic Energy Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of Energy, Office of Biological and Environmental Research, and by the National Institutes of Health (National Center for Research Resources, Biomedical Technology Program, and the National Institute of General Medical Sciences). The ALS is supported by the Director, Office of Science, Office of Basic Energy Sciences, Materials Sciences Division, of the United States Department of Energy under contract no. DE-AC03-76SF00098 at Lawrence Berkeley National Laboratory.