Crystal structure of a novel Thermotoga maritima enzyme (TM1112) from the cupin family at 1.83 Å resolution


The TM1112 gene of Thermotoga maritima encodes a conserved hypothetical protein with a molecular weight of 10,626 Da (residues 1–89) and a calculated isoelectric point of 5.5. Currently, no functional annotation has been made for this protein, but fold recognition methods, such as the Fold and Function Assignment System (FFAS),1 recognized significant sequence similarity to the family of cupins.2 Here, we report the crystal structure of TM1112 that was determined using the semiautomated high-throughput pipeline of the Joint Center for Structural Genomics (JCSG).3

The structure of TM1112 [Fig. 1(A)] was determined to 1.83 Å resolution by the molecular replacement (MR) method using the TM1112 NMR structure as the search model (PDB: 1lkn). Data collection, model, and refinement statistics are summarized in Table I. The final model includes two protein monomers (residues 2–89), two unknown ligands (UNL), and 332 water molecules. The Matthews coefficient (Vm) for TM1112 is 2.48 Å³/Da and the estimated solvent content is 50.0%. The Ramachandran plot, produced by Procheck 3.4,4 shows that 96.8% of the residues are in the most favored regions and 3.2% in additional allowed regions.
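The relationship between the unit cell, the Matthews coefficient, and the solvent content quoted above can be sketched as follows. This is a minimal illustration, not the authors' calculation: the function names are hypothetical, the constant 1.23 Å³/Da is the conventional approximation for the solvent estimate, and the molecular weight used by the authors for Vm is not stated here, so it is left as a parameter.

```python
import math

def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume (Å^3) of a monoclinic cell (alpha = gamma = 90°): V = a*b*c*sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))

def matthews_coefficient(cell_volume, n_molecules, mol_weight_da):
    """Vm (Å^3/Da) = cell volume / (molecules per cell * molecular weight in Da)."""
    return cell_volume / (n_molecules * mol_weight_da)

def solvent_fraction(vm):
    """Approximate solvent fraction from Vm, using the conventional 1.23 Å^3/Da constant."""
    return 1.0 - 1.23 / vm

# Cell parameters as listed in Table I (space group P21).
volume = monoclinic_cell_volume(39.63, 74.73, 42.21, 91.02)
print(f"cell volume: {volume:.0f} Å^3")

# Solvent estimate from the reported Vm; the approximation may differ
# slightly from the value reported in the text.
print(f"solvent fraction at Vm = 2.48 Å^3/Da: {solvent_fraction(2.48):.3f}")
```

In P21 with two monomers per asymmetric unit, `n_molecules` would be 4; the result of `matthews_coefficient` depends on whether the mass of the expression tag is included.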

Figure 1.

Crystal structure of TM1112. A: Ribbon diagram of Thermotoga maritima TM1112 color coded from N-terminus (blue) to C-terminus (red) showing the domain organization viewed along (left) and normal (right) to the barrel axis. Helices (H1, H2), β-sheets (A and A′), and β-strands (β1–β7) are indicated. B: Diagram showing the secondary structure elements in TM1112 superimposed on its primary sequence. The β-sheets are indicated by a red A or A′ and the β-hairpin is depicted as red loops. Residues adjacent to the unknown ligand (UNL) molecule are marked with red dots (also see Fig. 2).

Figure 2.

A: The proposed active site of TM1112 is depicted with the unknown ligand molecule (UNL) bound to Lys84 and its coordinating residues (Trp24, Trp33, Glu39, Cys41, Tyr35, and Trp76) in ball-and-stick representation. B: Close-up view of the active site with a 2Fo–Fc map around Lys84, the covalently bound UNL, and Cys41 contoured at 1σ (marine blue). The atoms are indicated as follows: carbon (grey), oxygen (red), nitrogen (blue), sulfur (yellow), and UNL (pink). Potential covalent bonds for the UNL ligand are represented as dashed pink lines, but these remain speculative until the ligand is identified.

Table I. Summary of Crystal Parameters, Data Collection, and Refinement Statistics for TM1112 (PDB: 1o5u)

Space group                       P21
Unit cell parameters              a = 39.63 Å, b = 74.73 Å, c = 42.21 Å, α = γ = 90°, β = 91.02°

Data collection                   λ0
Wavelength (Å)                    0.9700
Resolution range (Å)              27.98–1.83
Number of observations            53,783
Number of unique reflections      21,322
Completeness (%)                  98.2 (84.6)a
Mean I/σ(I)                       18.0 (3.1)a
Rmeas on I                        0.045 (0.307)a
Sigma cutoff                      0.0
Highest resolution shell (Å)      1.86–1.83a

Model and refinement statistics
Resolution range (Å)              27.98–1.83      Data set used in refinement     λ0
Number of reflections (total)     21,303          Cutoff criteria                 |F| > 0
Number of reflections (test)      1,071           Rcryst                          0.170
Completeness (% total)            98.1            Rfree                           0.222

Stereochemical parameters
Restraints (RMS observed)
 Bond length                      0.017 Å
 Bond angle                       1.65°
Average isotropic B-value         33.5 Å²
ESU based on free R value         0.12 Å
Protein residues/atoms            176/1508
Ligand/solvent molecules          2/332

  • a: highest resolution shell.

  • ESU = Estimated overall coordinate error.12, 17

  • $R_{\mathrm{meas}} = \left[\sum_h w \sum_i \left|\langle I_h \rangle - I_{h,i}\right|\right] / \sum_h \sum_i \left|I_{h,i}\right|$, where $w = [n_h/(n_h - 1)]^{1/2}$ and $\langle I_h \rangle = \left[\sum_i^{n_h} I_{h,i}\right]/n_h$. This is the multiplicity-weighted Rsym.18

  • $R_{\mathrm{cryst}} = \sum \left|\,|F_{\mathrm{obs}}| - |F_{\mathrm{calc}}|\,\right| / \sum |F_{\mathrm{obs}}|$, where $F_{\mathrm{calc}}$ and $F_{\mathrm{obs}}$ are the calculated and observed structure factor amplitudes, respectively.

  • Rfree = as for Rcryst, but for 5.0% of the total reflections chosen at random and omitted from refinement.
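
The R-factor definitions given in the Table I footnotes can be expressed directly in code. The sketch below is illustrative only: the function names and the toy input data are assumptions, and data-reduction programs differ in details such as how singly measured reflections are treated (here they are skipped, because the weight is undefined for a single measurement).

```python
import math

def r_meas(observations):
    """Multiplicity-weighted merging R factor (Rmeas).

    `observations` maps each unique reflection h to the list of its
    measured intensities I_{h,i}.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue  # weight sqrt(n/(n-1)) is undefined for n = 1
        mean_i = sum(intensities) / n
        weight = math.sqrt(n / (n - 1))
        numerator += weight * sum(abs(mean_i - i) for i in intensities)
        denominator += sum(abs(i) for i in intensities)
    return numerator / denominator

def r_factor(f_obs, f_calc):
    """Rcryst over paired amplitudes; Rfree is the same sum over the test set."""
    diff = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return diff / sum(abs(o) for o in f_obs)

# Toy data, not taken from the TM1112 data set.
toy = {(1, 0, 0): [100.0, 110.0], (0, 1, 0): [50.0, 50.0]}
print(f"Rmeas = {r_meas(toy):.4f}")
print(f"R     = {r_factor([10.0, 20.0], [9.0, 22.0]):.4f}")
```

Rfree is obtained by applying `r_factor` only to the reflections withheld from refinement; in Table I the test set is 1,071 of 21,303 reflections, which is about 5.0%.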

The TM1112 monomer is composed of seven β-strands (β1–β7), one α-helix (H1), and one short 310-helix (H2). The total β-strand content is 59.1%. The TM1112 structure is characterized by an antiparallel β-sheet that forms a jelly roll β-sandwich with a topology that is reminiscent of the cupin barrel fold2 [Fig. 1(A)]. The seven-stranded β-sheet (β1–β7) can be viewed as composed of two connected β-sheets, A with 16472 topology and A′ with 3745 topology, fused together via two strongly bent β-strands β4 and β7 [Fig. 1(A)]. Because of this variation, the Structural Classification of Proteins database (SCOP)5 classified TM1112 as a new subfamily of RmlC-like cupins. The root-mean-square deviation (RMSD) between the crystal structure and the averaged NMR structure (PDB: 1lkn) of TM1112 is 1.3 Å over 88 aligned residues. Both structures indicate that a monomer is the biologically-relevant form of TM1112.
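
The RMSD quoted above is a root-mean-square distance over paired atoms. A minimal sketch of that final step is shown here, assuming the two coordinate sets have already been optimally superposed and paired residue by residue; the superposition itself, the choice of atoms, and the program used by the authors are not specified in the text, and the function name is hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD (in the input units) between two equally long lists of (x, y, z) tuples.

    Assumes the coordinate sets are already superposed and that element k of
    one list corresponds to element k of the other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy coordinates: every atom is displaced by 1 Å along z.
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))
```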

An alignment of the TM1112 sequence with homologous cupin-like sequences, derived from a FFAS1 search, identifies a cluster of strictly conserved residues (Trp24, Trp33, Glu39, Cys41, Tyr35, Trp76, and Lys84) located in the center of the β-barrel [Fig. 2(A)]. The conservation of these side-chains within a groove in the center of the β-barrel points to this groove as the likely location of the TM1112 active site. SigmaA-weighted OMIT maps show additional compact density contiguous with the side-chain amino group of Lys84 [Fig. 2(B)]. Connecting density also suggests a hydrogen bond (distance 2.75 Å) between Lys84 and the adjacent sulfhydryl group of Cys41. Despite extensive model building and database searching, the density could not be unambiguously interpreted and was, therefore, modeled as an unknown ligand (UNL) consisting of five atoms covalently bound to Lys84, which suggests a catalytic role for Lys84 and Cys41. The apparent covalent nature of this adduct suggests either a post-translational modification or interaction with an unknown substrate. However, we were unable to identify a similar active site configuration in the PDB, which indicates that TM1112 represents a functionally novel enzyme from the cupin family. Clearly, further work is needed to define the enzymatic activity and mechanism of these cupins.

A structural similarity search, performed with the coordinates of TM1112 using the DALI server,6 indicated that the closest structural homologue is quercetin 2,3-dioxygenase, an RmlC-like cupin from Aspergillus japonicus (PDB: 1juh),7 with an RMSD of 2.4 Å over 82 aligned residues with 12% sequence identity. Another structural homologue is the N-terminal domain of the Catabolite Gene Activator Protein (CAP) from Escherichia coli (PDB: 2cgp),8 where the RMSD is 2.7 Å over 79 aligned residues with 11% sequence identity.
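
Sequence identity over aligned residues, as quoted above, can be computed from a pairwise alignment as sketched below. This is an illustration under stated assumptions: the function name is hypothetical, identity is counted only over columns where both sequences have a residue, and alignment programs such as DALI may use a different denominator.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences contribute a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)

# Toy alignment, not taken from the TM1112 comparisons.
print(percent_identity("AC-DE", "ACGDF"))
```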

According to FFAS,1 TM1112 has four distant homologues in the Thermotoga maritima proteome: TM1010 with 10% sequence identity, TM1287 (14%), TM1459 (10%), and TM0656 (14%). Sequence similarity searches with the TM1112 sequence against the non-redundant protein sequence database (NCBI) revealed more than one hundred homologues in prokaryotes and eukaryotes, all of which are designated as conserved hypothetical proteins. This new cupin sub-family comprises single-domain proteins like TM1112, as well as multi-domain proteins. Models for TM1112 homologues can be accessed at

The crystal structure reported here represents a novel enzyme from the cupin family that was determined by MR using the TM1112 NMR structure as a template. The information reported here, in combination with further biochemical and biophysical studies, will yield valuable insights into the functional determinants of this protein and the thermostability of these organisms.

Materials and Methods:

Protein production and crystallization:

TM1112 (TIGR: TM1112; Swissprot: Q9X0J6) was amplified by polymerase chain reaction (PCR) from Thermotoga maritima strain MSB8 genomic DNA using PfuTurbo (Stratagene) and primer pairs encoding the predicted 5′- and 3′-ends of TM1112. The PCR product was cloned into plasmid pMH1, which encodes an expression and purification tag (MGSDKIHHHHHH) at the amino terminus of the full-length protein. The cloning junctions were confirmed by sequencing. Protein expression was performed in a modified Terrific Broth [24 g/liter yeast extract, 12 g/liter tryptone, 1% (v/v) glycerol, 50 mM 3-(N-Morpholino)propanesulfonic acid (MOPS), pH 7.6] using the E. coli strain GeneHogs® (Invitrogen). Lysozyme was added to the culture at the end of fermentation to a final concentration of 1 mg/ml. Bacteria were lysed by sonication after a freeze-thaw procedure in Lysis Buffer [50 mM Tris, pH 7.9, 50 mM NaCl, 1 mM MgCl2, 5 mM 2-Mercaptoethanol, 3 mM DL-methionine, 2.5 U/ml Benzonase® (Sigma)], and cell debris pelleted by centrifugation at 3400 × g for 60 min. The soluble fraction was applied to a nickel-resin (Amersham Biosciences) pre-equilibrated with Equilibration Buffer (50 mM potassium phosphate, pH 7.8, 0.25 mM Tris(2-carboxyethyl)phosphine hydrochloride (TCEP), 10% (v/v) glycerol, 400 mM NaCl, 100 mM KCl, 20 mM imidazole, 3 mM DL-methionine). The nickel-resin was washed with Equilibration Buffer, and the protein eluted with Elution Buffer (20 mM Tris, pH 7.9, 10% (v/v) glycerol, 0.25 mM TCEP, 200 mM imidazole, 3 mM DL-methionine). The eluate was buffer exchanged into Crystallization Buffer (20 mM Tris, pH 7.9, 150 mM NaCl, 0.25 mM TCEP) and concentrated for crystallization assays to 19 mg/ml by centrifugal ultrafiltration (Millipore). The protein was crystallized using the nanodroplet vapor diffusion method9 using standard JCSG crystallization protocols.3 Crystals grew in Hampton Crystal Screen Cryo #31 [25.5% polyethylene glycol (PEG) 4000, 15% glycerol, and 0.17 M (NH4)2SO4]. 
The crystals were indexed in the monoclinic space group P21 (Table I).

Data collection:

Native diffraction data were collected on beamline 9-1 at the Stanford Synchrotron Radiation Laboratory (SSRL, Stanford, USA) using the BLU-ICE10 data collection environment (Table I). The dataset was collected at 100 K using a Quantum 315 CCD detector. Data were integrated and reduced using Mosflm11 and then scaled with the program SCALA from the CCP4 suite.12 Data statistics are summarized in Table I.

Structure solution and refinement:

The structure was determined by molecular replacement using the program MOLREP from the CCP4 suite.12 The ten models from the NMR structure of TM1112 (PDB: 1lkn), solved by the Northeast Structural Genomics Consortium,13 were each used as search models. The correct solution could only be obtained with model number 9 and gave an Rfree = 0.48 and Rcryst = 0.46 after initial rigid body and restrained refinement in REFMAC5.12 Structure refinement was performed using TLS refinement in REFMAC5,12 O,14 and Xfit.15 Refinement statistics are summarized in Table I. The final model includes two protein monomers (residues 2–89), two unknown ligands (UNL), and 332 water molecules in the asymmetric unit. No electron density was observed for the expression or purification tag.
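
Testing each member of an NMR ensemble as a separate search model requires splitting the multi-model PDB file into individual models. The sketch below shows one way to do this using the MODEL and ENDMDL records of the PDB format; it is not the authors' procedure, the function name is hypothetical, and the sample records are invented for illustration.

```python
def split_models(pdb_text):
    """Return {model_number: [coordinate lines]} from a multi-model PDB file.

    Models are delimited by MODEL / ENDMDL records; only ATOM, HETATM, and TER
    lines found inside a model are kept.
    """
    models = {}
    current = None
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = int(line[6:].split()[0])
            models[current] = []
        elif record == "ENDMDL":
            current = None
        elif current is not None and record in ("ATOM", "HETATM", "TER"):
            models[current].append(line)
    return models

# Invented two-model example with one atom per model.
sample = "\n".join([
    "MODEL        1",
    "ATOM      1  N   MET A   1       0.000   0.000   0.000  1.00  0.00           N",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   MET A   1       1.000   0.000   0.000  1.00  0.00           N",
    "ENDMDL",
])
for number, lines in split_models(sample).items():
    print(number, len(lines))
```

Each entry could then be written to its own file and passed to the molecular replacement program in turn, keeping whichever model yields a solution.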

Validation and deposition:

Analysis of the stereochemical quality of the models was accomplished using Procheck 3.4,4 SFcheck 4.0,12 and WHAT IF 5.0.16 Figure 1(B) was adapted from an analysis using PDBsum, and all others were prepared with PyMOL (DeLano Scientific). Atomic coordinates of the final model and experimental structure factors of TM1112 have been deposited with the PDB and are accessible under the code 1o5u.


This work was supported by NIH Protein Structure Initiative grant P50-GM 62411 from the National Institute of General Medical Sciences. Portions of this research were carried out at the Stanford Synchrotron Radiation Laboratory, a national user facility operated by Stanford University on behalf of the U.S. Department of Energy, Office of Basic Energy Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of Energy, Office of Biological and Environmental Research, and by the National Institutes of Health (National Center for Research Resources, Biomedical Technology Program, and the National Institute of General Medical Sciences).