The TM1553 gene of Thermotoga maritima encodes a lipoprotein with a molecular weight of 39,409 Da (residues 1–352) and a calculated isoelectric point of 5.6. Sequence analysis reveals that TM1553 belongs to a predominantly prokaryotic family of proteins that are homologous to the ApbE family of periplasmic lipoproteins. ApbE is involved in thiamine (vitamin B1) biosynthesis and has been proposed to carry out the conversion of aminoimidazole ribotide (AIR) to 4-amino-5-hydroxymethyl-2-methyl pyrimidine (HMP).1, 2 Although the precise biochemical function of ApbE is not known, mutagenesis studies have indicated that ApbE is important for Fe-S cluster metabolism.3, 4 The exact role played by ApbE in either of these activities remains unclear.
Herein, we report the crystal structure of TM1553, the first structural representative of the ApbE family, which was determined using the semiautomated, high-throughput pipeline of the Joint Center for Structural Genomics (JCSG).5
Materials and Methods.
Protein production and crystallization.
The TM1553 gene (TIGR: TM1553; Swiss-Prot: Q9X1N9) was amplified by polymerase chain reaction (PCR) from genomic DNA using Pfu Turbo and primer pairs encoding the predicted 5′- and 3′-ends (residues 40–352; cloned without the N-terminal transmembrane helix, residues 1–39). The PCR product was cloned into plasmid pMH4, which encodes an expression and purification tag (MGSDKIHHHHHH) at the amino terminus of the protein. The cloning junctions were confirmed by DNA sequencing. Protein expression was performed in a selenomethionine-containing medium using the Escherichia coli methionine auxotrophic strain DL41. Lysozyme was added to the culture at the end of fermentation to a final concentration of 250 μg/mL. Bacteria were lysed by sonication after a freeze/thaw procedure in Lysis Buffer [50 mM Tris pH 8.0, 50 mM NaCl, 10 mM imidazole, 1 mM Tris(2-carboxyethyl)phosphine hydrochloride (TCEP)], and the cell debris was pelleted by centrifugation at 32,500 × g for 30 min. The soluble fraction was applied to a nickel-chelating resin (GE Healthcare) preequilibrated with Lysis Buffer. The resin was washed with Wash Buffer [50 mM Tris pH 8.0, 300 mM NaCl, 40 mM imidazole, 10% (v/v) glycerol, 1 mM TCEP], and the target protein was eluted with Elution Buffer [20 mM Tris pH 8.0, 300 mM imidazole, 10% (v/v) glycerol, 1 mM TCEP]. The eluate was diluted 10-fold with Buffer Q [20 mM Tris pH 7.9, 50 mM NaCl, 5% (v/v) glycerol, 1 mM TCEP] and applied to a RESOURCE Q column (GE Healthcare) preequilibrated with the same buffer. The flow-through fraction, which contained the target protein, was buffer exchanged into Crystallization Buffer [20 mM Tris pH 7.9, 150 mM NaCl, 1 mM TCEP] and concentrated to 14 mg/mL for crystallization assays by centrifugal ultrafiltration (Millipore). Molecular weight and oligomeric state of TM1553 were determined using a 1.0 × 30 cm Superdex 200 column (GE Healthcare) in combination with static light scattering (Wyatt Technology).
The mobile phase consisted of 20 mM Tris pH 8.0, 150 mM NaCl, and 0.02% (w/v) sodium azide. The protein was crystallized using the nanodroplet vapor diffusion method6 with standard JCSG crystallization protocols.5 The crystallization reagent contained 10% (v/v) 2-methyl-2,4-pentanediol (MPD) and 0.1 M citrate pH 5.0. Twenty percent (v/v) MPD (final concentration) was added as a cryoprotectant. The crystals were indexed in the orthorhombic space group P2₁2₁2₁ (Table I).
Table I. Summary of Crystal Parameters, Data Collection, and Refinement Statistics for TM1553 (PDB: 1vrm)
Multiwavelength anomalous diffraction (MAD) data were collected at the Advanced Light Source (ALS, Berkeley, CA) on beamline 8.2.1 at wavelengths corresponding to the inflection point (λ1) and low-energy remote (λ2) of a MAD experiment. The datasets were collected at 100 K using an ADSC CCD detector. The MAD data were integrated and reduced using Mosflm9 and then scaled with the program SCALA from the CCP4 suite.7 Data statistics are summarized in Table I.
Structure solution and refinement.
The structure was determined with 1.58 Å selenium MAD data using the CCP4 suite7 and SOLVE/RESOLVE.10 Automatic model building was performed with iterative ARP/wARP runs.11 Model completion and refinement were performed with dataset λ1 using O12 and REFMAC5.13 Refinement statistics are summarized in Table I.
Validation and deposition.
Analysis of the stereochemical quality of the model was accomplished using AutoDepInputTool,14 MolProbity,15 SFcheck 4.0,16 and WHATIF 5.0.17 Protein quaternary structure analysis used the PQS server.18 Figures were prepared with PyMOL (DeLano Scientific). Atomic coordinates and experimental structure factors for TM1553 at 1.58 Å resolution have been deposited in the Protein Data Bank (PDB) and are accessible under the code 1vrm.
Sequence similarity search.
The PSI-BLAST19 program was used to detect homologs of DUF375 proteins in the NCBI nonredundant protein sequence database (March 8, 2005; 2,354,365 sequences; 800,120,167 total letters). Initially, a PSI-BLAST search (inclusion threshold 0.001) was performed until profile convergence, using as a query one of the DUF375 family members, hypothetical protein MTH727 from Methanothermobacter thermautotrophicus (gi|2621816). Subsequently, collected sequences were subjected to further transitive PSI-BLAST searches to identify other distantly related proteins.
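The transitive search strategy described above can be viewed as a breadth-first traversal over search hits. The following is a minimal sketch under stated assumptions, not the procedure the authors scripted: `run_search` is a hypothetical callable standing in for one converged PSI-BLAST run that returns the identifiers of all hits passing the inclusion threshold, and the hit table is invented purely for illustration.

```python
from collections import deque
from typing import Callable, Iterable, Set


def transitive_search(seed: str,
                      run_search: Callable[[str], Iterable[str]]) -> Set[str]:
    """Collect every sequence reachable from `seed` by repeated searches.

    `run_search` is a hypothetical stand-in for one converged PSI-BLAST run;
    it must return the identifiers of hits below the inclusion threshold.
    """
    found = {seed}
    queue = deque([seed])
    while queue:
        query = queue.popleft()
        for hit in run_search(query):
            if hit not in found:      # each sequence is used as a query once
                found.add(hit)
                queue.append(hit)
    return found


# Invented toy hit table (not real search results).
TOY_HITS = {
    "seed": ["a", "b"],
    "a": ["seed", "c"],
    "b": [],
    "c": ["d"],
    "d": [],
    "unrelated": ["seed"],
}


def toy_search(query: str) -> Iterable[str]:
    """Look up the invented hits for a query."""
    return TOY_HITS.get(query, [])


family = transitive_search("seed", toy_search)
print(sorted(family))  # prints ['a', 'b', 'c', 'd', 'seed']
```

In the toy table, "d" is reached only through "a" and "c", which is the kind of distant relationship a single search from the seed would miss.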
Additional searches were performed with the meta profile alignment method Meta-BASIC,20 which is available via the GRDB system (http://basic.bioinfo.pl/meta). Meta-BASIC combines the use of sequence profiles and secondary structure predictions (meta profiles) for the query sequence and given protein families with various scoring systems, and meta profile alignment algorithms to detect distant similarity between proteins, even if the structure of the reference protein is not known. Specifically, the consensus sequence of DUF375 was compared with all 7,418 PfamA families21 and with 10,128 proteins (clustered at 90% sequence identity) extracted from the PDB.22
Results and Discussion.
The crystal structure of TM1553 [Fig. 1(A)] was determined to 1.58 Å resolution using the MAD method. Data collection, model, and refinement statistics are summarized in Table I. The final model includes a monomer (residues 44–352), three MPD molecules, an unknown ligand (UNL), and 404 water molecules in the asymmetric unit. No electron density was observed for residues 40–43 or for the expression and purification tag. The Matthews' coefficient (Vm)23 for TM1553 is 2.37 Å³/Da, and the estimated solvent content is 47.8%. The Ramachandran plot produced by MolProbity15 shows that 98.3% and 100.0% of the residues are in favored and allowed regions, respectively.
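The relationship between the Matthews coefficient and the solvent content can be sketched as follows. This is an illustrative calculation, not the program used for Table I: it assumes the conventional constant of 1.23 Å³/Da (corresponding to a typical protein partial specific volume of about 0.74 cm³/g), and the function names are hypothetical. With Vm = 2.37 Å³/Da it gives about 48.1%, close to the reported 47.8%; a difference of this size is expected if a slightly different protein density is assumed.

```python
def matthews_coefficient(cell_volume: float, n_molecules: int,
                         mol_weight: float) -> float:
    """Vm in Å^3/Da: unit-cell volume divided by total protein mass in the cell.

    n_molecules is the number of molecules per unit cell (for example, four
    for one monomer per asymmetric unit in an orthorhombic P2(1)2(1)2(1) cell).
    """
    if n_molecules <= 0 or mol_weight <= 0:
        raise ValueError("n_molecules and mol_weight must be positive")
    return cell_volume / (n_molecules * mol_weight)


def solvent_fraction(vm: float, protein_constant: float = 1.23) -> float:
    """Fractional solvent content estimated from Vm.

    protein_constant = 1.23 Å^3/Da is an assumed, conventional value;
    it is not taken from the paper.
    """
    if vm <= 0:
        raise ValueError("vm must be positive")
    return 1.0 - protein_constant / vm


print(f"{solvent_fraction(2.37):.1%}")  # prints 48.1%
```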
TM1553 is composed of 15 β-strands (β1–β15), nine α-helices (H1–H4, H6, H8, H10–H12), and four 3₁₀-helices (H5, H7, H9, H11′) [Fig. 1(A,B)]. The total β-sheet, α-helical, and 3₁₀-helical content is 26.2, 32.4, and 3.6%, respectively.
The TM1553 monomer contains three structural domains. The N-terminal (40–87, 191–226) and C-terminal (254–352) domains are extremely similar in topology and adopt a fold described previously as the tunneling fold (T-fold).24, 25 The T-fold core is composed of a four-stranded, antiparallel β-sheet (strand order 1234) with a pair of antiparallel α-helices placed between the second and third β-strands (ββααββ) [Figs. 1(A) and 2(A)]. This fold is present in a uricase and in a group of tetrahydrobiopterin biosynthesis enzymes whose known members form large, tunnel-shaped, homo-oligomeric barrels of different sizes [Fig. 2(A,B)]. When the two T-fold domains of TM1553 are structurally aligned, the root-mean-square deviation (RMSD) between 65 Cα atoms from each domain is 2.6 Å, indicating that TM1553 may have arisen from an ancestral gene duplication event. Although this alignment indicates only 8% sequence identity, it reveals a strong conservation of hydrophobic and polar residues between the domains. The two T-fold domains are connected by a short linker region (227–253) composed of a 3₁₀-helix (H9) and a β-hairpin (β8–β9). In addition to the T-fold domains, TM1553 contains a predominantly helical domain (88–190) that is inserted between α-helices H1 and H8 of the N-terminal T-fold domain [Fig. 1(A)]. This domain is composed of three short β-strands, four α-helices, and two 3₁₀-helices. A DALI26 structure similarity search of this domain did not return any significant hits, although it has a remote structural resemblance to members of the SCOP 3-helical bundle fold.27
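For readers unfamiliar with the RMSD values quoted in structural comparisons, the quantity itself is straightforward to compute once two sets of Cα atoms have been paired and superposed. The sketch below shows only that final step; it assumes the coordinates are already optimally superposed (the superposition step, for example by the Kabsch algorithm, is omitted), and the coordinates are invented rather than taken from 1vrm.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root-mean-square deviation between paired, pre-superposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two atoms of this pair.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Invented coordinates: the second set is the first shifted by 2 Å along y.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
set_b = [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
print(rmsd(set_a, set_b))  # prints 2.0
```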
So far, the structures of GTP cyclohydrolase I, 6-pyruvoyl tetrahydropterin synthase, 7,8-dihydroneopterin aldolase, 7,8-dihydroneopterin triphosphate epimerase, and uricase are known to adopt the T-fold. With the exception of uricase, whose monomeric unit also contains a tandem duplication of the T-fold unit, all these structures are homo-oligomers of a single T-fold domain. In the uricase and TM1553 structures, the duplicated T-fold domains are arranged in tandem so that β-strand 7 of the N-terminal β-sheet hydrogen bonds with β-strand 10 of the C-terminal β-sheet to form an eight-stranded, antiparallel β-sheet [Fig. 1(A)]. TM1553 and uricase (PDB 1r51) can be structurally aligned with an RMSD of 4.0 Å and a sequence identity of 8% over 141 Cα atoms [Fig. 2(C)]. This structural superposition reveals a similar overall fold in both structures, with the α-helices present on the concave surface of the curved, eight-stranded β-sheet. The T-fold domains of uricase contain long, N-terminal β-strands that are involved in oligomerization. TM1553, however, contains shorter β-strands as compared with uricase, and lacks the corresponding extensions of the β-strands that are involved in oligomerization. Therefore, it is unlikely that TM1553 would form an oligomeric complex, like the other members of the T-fold.
Indeed, analysis of the crystallographic packing of TM1553 using the PQS server18 indicates that a monomer is the biologically relevant form [Fig. 1(A)]. This finding is also consistent with results from analytical size exclusion chromatography in combination with static light scattering. This observation is noteworthy because all previously determined structures that adopt the T-fold form large homo-oligomeric assemblies. In fact, it has been suggested that the T-fold may not be stable in isolation and must be associated with identical subunits to form barrel-shaped, homo-oligomeric complexes in order to be functional.24 TM1553 represents the first structure of a stable and functional enzyme that possesses T-fold domains, but does not oligomerize.
Another interesting difference between the current members of the T-fold family and TM1553 is the location of the active site. All known members of the T-fold for which structures are available possess multiple symmetry-related active sites in their tunnel-shaped, homo-oligomeric complexes. Typically, each active site is located at the interface between T-fold monomers. In uricase, which contains a tandem duplication of the T-fold domain, the functional active site is also formed at the interface between two monomers upon oligomerization [Fig. 2(B)]. Surprisingly, TM1553 possesses a putative active site that is contained within the monomeric unit and is located at the interface between the two T-fold units and the helical insert domain [Fig. 1(A)].
An analysis of the TM1553 structure using the CastP server28 reveals a deep cavity of 1,300 Å³ at this interface. The presence of additional electron density in this cavity that corresponds to a noncovalently bound small molecule, as well as sequence conservation between TM1553 homologs in this region, points to its role as the active site of this protein. Based on the electron density, different purines, nucleosides, and nucleotides were modeled, and adenosine monophosphate proved to be the best fit. However, mass spectrometry failed to reveal the precise chemical identity of the ligand, and the “ligand” electron density was not sufficiently resolved to model any known small molecule with confidence. Therefore, it was modeled as a UNL. This UNL is coordinated by the side-chains of Phe130, Val134, Leu138, Asp188, Asp221, Ala258, Ser260, Glu264, His276, Ile277, Pro280, Asp303, Ser306, and Thr307, the main-chain of Ala129, Asp131, Gly190, Gly191, Thr259, and Leu278, and seven water molecules [Fig. 2(D)]. Of these, Asp188, Thr259, Ser260, His276, Pro280, Asp303, and Thr307 are particularly well conserved among homologs of TM1553, suggesting that they are essential for the enzyme's function.
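The statement that the ligand site lies at the interface of all three domains can be checked by mapping the listed contact residues onto the domain boundaries given earlier in the text. The sketch below is a simple bookkeeping aid rather than part of the published analysis; the residue numbers and ranges are copied from the text, while the names `DOMAINS`, `LIGAND_CONTACTS`, and `domain_of` are hypothetical.

```python
from collections import Counter

# Domain boundaries as stated in the text (inclusive residue ranges).
DOMAINS = [
    ("N-terminal T-fold", 40, 87),
    ("helical insert", 88, 190),
    ("N-terminal T-fold", 191, 226),
    ("linker", 227, 253),
    ("C-terminal T-fold", 254, 352),
]

# Residues reported to coordinate the unknown ligand (UNL).
LIGAND_CONTACTS = {
    "side chain": [130, 134, 138, 188, 221, 258, 260, 264,
                   276, 277, 280, 303, 306, 307],
    "main chain": [129, 131, 190, 191, 259, 278],
}


def domain_of(resnum: int) -> str:
    """Return the name of the domain containing a residue number."""
    for name, start, end in DOMAINS:
        if start <= resnum <= end:
            return name
    raise ValueError(f"residue {resnum} is outside the construct (40-352)")


counts = Counter(
    domain_of(res)
    for residues in LIGAND_CONTACTS.values()
    for res in residues
)
for name, n in sorted(counts.items()):
    print(f"{name}: {n}")
```

Of the 20 contact residues, 11 fall in the C-terminal T-fold domain, 7 in the helical insert, and 2 in the N-terminal T-fold domain, and none in the linker, consistent with a site formed where the three domains meet.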
A DALI26 structural similarity search using the complete protein as a query did not find any significant hits to other protein structures. A search using the program ProSMoS (Grishin NV, unpublished) revealed similarities between the N- and C-terminal domains of TM1553 and members of the T-fold. A subsequent DALI26 search using the individual T-fold domains of TM1553 as a query then found links to members of the T-fold of SCOP.
Pfam domains of unknown function (DUFs) are clusters of related protein sequences for which no fold or functional assignment could be made based on similarities to other functionally annotated protein sequences in the Pfam database.21 The crystal structure of TM1553 provides a structural template for DUF375, which encompasses many uncharacterized proteins from bacterial and archaeal species, including sequences belonging to an uncharacterized cluster of orthologs, COG2122. The evolutionary relationship between the ApbE (PF02424 in the Pfam database) and DUF375 (PF04040 in the Pfam database) families was indicated by the meta profile alignment method, Meta-BASIC, with a Z-score of about 25 (n.b. predictions with Z-score >12 have <5% probability of being incorrect).20 In addition, a PSI-BLAST19 search against the NCBI nonredundant protein sequence database (E-value threshold 0.001) initiated with one of the DUF375 family members, hypothetical protein MMP1236 from Methanococcus maripaludis (gi|45358799), found the putative thiamine biosynthesis lipoprotein ApbE (gi|29377698) from Enterococcus faecalis with an E-value of 4e-04 in the second iteration. In the next iteration, about 200 ApbE family proteins were detected with statistically significant E-values, including COG1477 (membrane-associated lipoprotein involved in thiamine biosynthesis), as well as TM1553, the only ApbE protein for which a crystal structure has been determined. A global multiple sequence alignment of DUF375 and representative ApbE family sequences was generated using the PCMA program29 and then adjusted manually according to the TM1553 structure (gi|15644301, PDB 1vrm) (Fig. 3). The alignment reveals a strong conservation of hydrophobic residues and a highly conserved ligand binding site, indicating that DUF375, which belongs to the superfamily of ApbE-like proteins, is likely to possess its active site at a position similar to that observed in the ApbE structure.
The TM1553 crystal structure reported herein represents the first structure determined for the ApbE-like protein superfamily. The structure includes an unknown ligand that suggests a putative active site location. TM1553 reveals unexpected similarities to enzymes from the tetrahydrobiopterin biosynthesis pathway and offers a structural template for the so far uncharacterized DUF375 family in addition to the ApbE-like proteins. The evolutionary relationship between TM1553 and the enzymes from the tetrahydrobiopterin biosynthesis pathway, however, remains unclear. The information presented herein, in combination with further biochemical and biophysical studies, should yield valuable insights into the functional role of this enzyme.
This work was supported by NIH Protein Structure Initiative grants from the National Institute of General Medical Sciences (www.nigms.nih.gov). Portions of this research were performed at the Stanford Synchrotron Radiation Laboratory (SSRL) and the Advanced Light Source (ALS). The SSRL is a national user facility operated by Stanford University on behalf of the U.S. Department of Energy, Office of Basic Energy Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of Energy, Office of Biological and Environmental Research, and by the National Institutes of Health (National Center for Research Resources, Biomedical Technology Program, and the National Institute of General Medical Sciences). The ALS is supported by the Director, Office of Science, Office of Basic Energy Sciences, Materials Sciences Division, of the U.S. Department of Energy under Contract No. DE-AC03-76SF00098 at Lawrence Berkeley National Laboratory.