Transcriptional regulators play a crucial role in the adaptation of microorganisms to diverse environmental challenges.1–3 Most microbial transcriptional regulators contain an effector binding regulatory domain and a DNA-binding domain that interacts with a specific operator DNA to either prevent (transcriptional repressors) or stimulate (transcriptional activators) transcription of a nearby gene(s).4 Prokaryotic transcriptional regulators have been classified into a number of families based on amino acid sequence similarity and domain architecture.4–8
The tetracycline repressor (TetR) family of proteins exhibits a high degree of sequence similarity at the N-terminal DNA-binding domain (∼50 amino acids), which adopts a helix-turn-helix (HTH) motif. In contrast, the regulatory domain is more variable, possibly reflecting the need to specifically accommodate different effectors.4, 9 TM1030 from Thermotoga maritima, a hyperthermophilic bacterium that typically thrives in high temperature ecosystems, is a 200 amino acid protein with a molecular weight of 24 kDa and an isoelectric point of 6.25. The N-terminal DNA-binding domain of TM1030 shows sequence similarity to members of the TetR family, but no significant similarity is found for the regulatory C-terminal region (∼150 amino acids). Here, we present the crystal structure of a ligand-bound form of TM1030, which was determined to 2.3 Å resolution, using the semiautomated, high-throughput pipeline of the Joint Center for Structural Genomics (JCSG)10 as part of the National Institute of General Medical Sciences (NIGMS)-funded Protein Structure Initiative (PSI).
Materials and Methods.
Protein production and crystallization:
The TM1030 gene (GenBank: AAD36107.1, GI: 4981571, Swiss-Prot: Q9X0C0) from Thermotoga maritima was amplified by polymerase chain reaction (PCR) from genomic DNA using PfuTurbo (Stratagene) and primers corresponding to the predicted 5′- and 3′-ends. The PCR product was cloned into plasmid pMH1, which encodes an expression and purification tag (MGSDKIHHHHHH) at the amino terminus of the full-length protein. The TM1030 gene uses an alternate start codon (GUG) that results in a valine at position 1 when expressed as a fusion with the expression and purification tag. The cloning junctions were confirmed by DNA sequencing. Protein expression was performed in a selenomethionine-containing medium using the Escherichia coli methionine auxotrophic strain DL41. At the end of fermentation, lysozyme was added to the culture to a final concentration of 250 μg/mL, and the cells were harvested. After one freeze/thaw cycle, the cells were sonicated in Lysis Buffer [50 mM Tris pH 8.0, 50 mM NaCl, 10 mM imidazole, 0.25 mM Tris(2-carboxyethyl)phosphine hydrochloride (TCEP)], and the lysate was clarified by centrifugation at 32,500g for 30 min. The soluble fraction was passed over nickel-chelating resin (GE Healthcare) pre-equilibrated with Lysis Buffer, the resin was washed with Wash Buffer [50 mM potassium phosphate, pH 7.8, 300 mM NaCl, 40 mM imidazole, 10% (v/v) glycerol, 0.25 mM TCEP], and the protein was eluted with Elution Buffer [20 mM Tris pH 8.0, 300 mM imidazole, 10% (v/v) glycerol, 0.25 mM TCEP]. The eluate was diluted ten-fold with Buffer Q [20 mM Tris pH 7.9, 50 mM NaCl, 5% (v/v) glycerol, 0.25 mM TCEP] and applied to a RESOURCE Q column (GE Healthcare) pre-equilibrated with the same buffer. The flow-through fraction, which contained TM1030, was further purified on a Superdex 200 column (GE Healthcare), with isocratic elution in Crystallization Buffer [20 mM Tris pH 7.9, 150 mM NaCl, 0.25 mM TCEP].
The protein was concentrated for crystallization assays to 15 mg/mL by centrifugal ultrafiltration (Millipore) and crystallized using the nanodroplet vapor diffusion method11 with standard JCSG crystallization protocols.10 The crystallization reagent contained 30% (w/v) polyethylene glycol (PEG) 8000, 0.2 M Mg(NO₃)₂, and 0.1 M citrate pH 4.5. Ethylene glycol was added as a cryoprotectant to a final concentration of 5% (v/v). Initial screening for diffraction was carried out using the Stanford Automated Mounting system (SAM)12 at the Stanford Synchrotron Radiation Laboratory (SSRL, Stanford, CA). The crystals were indexed in monoclinic space group P2₁ (Table I). The molecular weight and oligomeric state of TM1030 were determined using a 1 cm × 30 cm Superdex 200 column (GE Healthcare) in combination with static light scattering (Wyatt Technology). The mobile phase consisted of 20 mM Tris pH 8.0, 150 mM NaCl, and 0.02% (w/v) sodium azide.
Table I. Summary of Crystal Parameters, Data Collection, and Refinement Statistics for TM1030 (PDB accession code: 1zkg)
Values for the highest resolution shell are given in parentheses. ESU, estimated overall coordinate error14,15; Rsym = Σ|Ii−〈Ii〉|/Σ|Ii|, where Ii is the scaled intensity of the ith measurement and 〈Ii〉 is the mean intensity for that reflection. Rcryst = Σ| |Fobs|−|Fcalc| |/Σ|Fobs|, where Fcalc and Fobs are the calculated and observed structure factor amplitudes, respectively. Rfree is calculated in the same way as Rcryst, but for 5.2% of the total reflections, chosen at random and omitted from refinement.
Typically, the number of unique reflections used in refinement is slightly less than the total number that were integrated and scaled. Reflections are excluded because of systematic absences, negative intensities, and rounding errors in the resolution limits and cell parameters. The removal of systematically absent reflections also affects the percent completeness calculation.
Unit cell parameters: a = 50.52 Å, b = 87.11 Å, c = 59.16 Å, β = 110.73°
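The R-factor definitions given in the Table I footnote can be illustrated with a short Python sketch. This is not the code used by SCALA or REFMAC5; the input layout (a dictionary of repeated intensity measurements per reflection, and paired lists of structure factor amplitudes) and the toy numbers are assumptions made only for illustration.

```python
def r_sym(measurements):
    """Rsym = sum|I_i - <I_i>| / sum|I_i| over all measurements.

    `measurements` maps a reflection index (h, k, l) to the list of scaled
    intensities recorded for that reflection (hypothetical input layout).
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in measurements.values():
        mean_i = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(abs(i) for i in intensities)
    return numerator / denominator


def r_factor(f_obs, f_calc):
    """Rcryst = sum| |Fobs| - |Fcalc| | / sum|Fobs|.

    Passing only the reflections held out of refinement gives Rfree.
    """
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


if __name__ == "__main__":
    # Toy values, not taken from the TM1030 data set.
    toy = {(1, 0, 0): [100.0, 110.0], (0, 1, 0): [50.0, 50.0]}
    print("Rsym   =", r_sym(toy))
    print("Rcryst =", r_factor([100.0, 50.0], [90.0, 55.0]))
```

Rfree uses the same expression as Rcryst; the only difference is that it is evaluated on the randomly chosen reflections excluded from refinement.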
Data collection, structure solution, and refinement:
Multi-wavelength anomalous diffraction (MAD) data sets were collected at 100 K using a charge-coupled device detector (ADSC Q315) on SSRL beamline 11-1 using the BLU-ICE13 data collection environment (Table I). Data were collected at wavelengths corresponding to the high energy remote (λ1) and inflection (λ2) of a selenium MAD experiment. Data were indexed and reduced with Mosflm16 and scaled using SCALA from the CCP4 suite.14 Diffraction data statistics are summarized in Table I. The selenium substructure was solved using SOLVE.17 Refinement of the Se sites resulted in a mean figure of merit of 0.39 to a resolution of 2.5 Å. Phase extension to 2.3 Å was performed using RESOLVE,17 with a solvent content of 0.5 and a starting two-fold noncrystallographic symmetry (NCS) matrix derived from the substructure solution. Automatic model building was performed with RESOLVE, resulting in a dimer model containing 288 residues (72%), with 89 (22%) of the side chains fitted. This initial model was rebuilt using iterative ARP/wARP runs,18 which built 354 residues (88%), with 345 residues docked into the sequence (86%). Model completion and refinement were performed with the remote (λ1) data set using COOT19 and REFMAC5.20 Refinement statistics are summarized in Table I.
Validation and deposition:
Analysis of the stereochemical quality of the structure was accomplished using AutoDepInputTool,21 MolProbity,15 SFcheck 4.0,14 and WHATIF 5.0.22 Protein quaternary structure analysis was performed using the PQS server.23 Figure 1 was adapted from an analysis using PDBsum,24 and all other figures were prepared with PyMOL (DeLano Scientific). Atomic coordinates and experimental structure factors for TM1030 at 2.3 Å resolution have been deposited in the PDB and are accessible under the code 1zkg.
Results and Discussion.
The crystal structure of TM1030 (Fig. 1) was determined to 2.3 Å resolution using the MAD method (Table I). The asymmetric unit includes two TM1030 subunits, two unknown ligands (UNLs), and 56 water molecules. Electron density was not observed for residues from the expression and purification tag in either subunit, or for residue Val 1 of subunit B. The Matthews' coefficient (Vm)25 for TM1030 is 2.6 Å³/Da, and the estimated solvent content is 52.2%. The Ramachandran plot,26 as produced by MolProbity,27 shows that 97.2% and 99.8% of the main chain torsion angles are in the favored and allowed regions, respectively. The only outlier is the surface-exposed residue R74 (subunit A), which is poorly defined in the electron density map.
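As a rough check on the Matthews' coefficient and solvent content, the calculation can be sketched in Python from the unit cell parameters in Table I. The assumptions here are ours: four molecules per unit cell (two subunits per asymmetric unit times the two symmetry-related positions of the monoclinic space group), the rounded molecular weight of 24 kDa quoted in the introduction, and the conventional factor of 1.23 Å³/Da for the protein volume term. Because the molecular weight is rounded, the result is close to, but slightly below, the reported value.

```python
import math

# Conventional factor relating Vm to the protein volume fraction (assumption:
# typical protein partial specific volume, as in the standard Matthews estimate).
PROTEIN_VOLUME_TERM = 1.23  # Å^3/Da


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume of a monoclinic unit cell in Å^3: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(cell_volume, n_molecules, mol_weight_da):
    """Vm in Å^3/Da: cell volume divided by the total protein mass in the cell."""
    return cell_volume / (n_molecules * mol_weight_da)


def solvent_fraction(vm):
    """Estimated solvent fraction: 1 - 1.23 / Vm."""
    return 1.0 - PROTEIN_VOLUME_TERM / vm


if __name__ == "__main__":
    volume = monoclinic_cell_volume(50.52, 87.11, 59.16, 110.73)
    # 2 subunits per asymmetric unit x 2 asymmetric units per cell = 4 molecules.
    vm = matthews_coefficient(volume, n_molecules=4, mol_weight_da=24000.0)
    print(f"V = {volume:.0f} Å^3, Vm = {vm:.2f} Å^3/Da, "
          f"solvent = {100 * solvent_fraction(vm):.1f}%")
```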
TM1030 is an all-helical protein, composed of 10 α-helices (H1-H7, H7A-H9) and a 3₁₀-helix (H6A), and adopts a two-domain architecture similar to TetR (Fig. 1). The N-terminal DNA-binding domain is composed of the first three α-helices. The H2 and H3 α-helices of this domain form a canonical HTH motif. The regulatory domain is made up of an antiparallel helical bundle (H4-H5 and H7-H9) and helix H6, which is packed nearly orthogonal to the long axis of this helical bundle. A DALI28 search revealed structural similarity to several microbial transcriptional regulators. The top 13 hits in the search belong to the TetR family and include proteins from Pseudomonas aeruginosa (PDB accession codes: 2gen, 2fbq, 2fd5), Salmonella typhimurium (1t33), Staphylococcus aureus (1jty), Mycobacterium tuberculosis (1t56), Rhodococcus sp. (2gfn, 2g7g), Bacillus cereus (1sgm, 2fx0, 1zk8, 2fq4), and Streptomyces coelicolor (1ui5). The Z-scores for the structural alignments of TM1030 with these top hits are in the range of 14.1–7.4, and the corresponding RMSDs are in the range of 3.0–6.8 Å, where at least 75% of the Cα atoms (of the total 200 amino acids) were included. Notably, these structures share less than 21% sequence identity with TM1030, and nine of these structures were determined at PSI-funded Structural Genomics (SG) centers.
The N-terminal DNA-binding domain of TM1030 displays remarkable structural similarity to the homologous domains in all TetR-like proteins, while differences are much greater in the C-terminal regulatory domain. In particular, the relative orientation of the α-helices in the regulatory domain that mediates homodimerization4 differs significantly among members of the TetR family. A pair of α-helices (H8 and H9) in the TM1030 regulatory domain is involved in mediating most of the inter-subunit interaction. The inter-subunit interactions are mostly hydrophobic (V145, I149, F153, W156, F157, F161, V164, V189, M190, I193, and L194) and are further stabilized by four salt-bridges [D144(A) – R192(B), K152(A) – E186(B), D144(B) – R192(A), and K152(B) – E186(A)], and three hydrogen bonds [E163(A) – E163(B), K196(A) – T199(B), and K196(B) – T199(A)]. An analysis using size exclusion chromatography coupled with static light scattering supports the assignment of TM1030 as a dimer in solution. Furthermore, the biologically relevant homodimerization in the TetR family is mediated by similar helix-to-helix contacts,4, 29–31 suggesting that the dimer observed for TM1030 is functionally relevant.
The Midwest Center for Structural Genomics (MCSG) has also determined the crystal structure of a TM1030 construct to 2.0 Å resolution (PDB code 1z77). A structural superposition revealed a significant global conformational difference between the two structures (Fig. 2), despite very similar crystallization conditions (Table II). Two modes of structural alignment were explored, using as anchors either the conserved N-terminal DNA-binding domain or the C-terminal homodimerization-mediating α-helices (H8 and H9). The corresponding RMSD values for the N-terminal and C-terminal based structural alignments for all 200 Cα atoms of TM1030 (JCSG, subunits A and B) with TM1030 (MCSG) are in the range of 3.3–5.0 and 5.1–6.4 Å, respectively. In spite of the large structural difference, the global fold and individual structural elements are mostly retained in both structures, with noticeable differences confined to the lengths of α-helices H1, H3, H4, and H6A. Moreover, the inter-subunit interactions mediated by α-helices H8 and H9 in TM1030 (JCSG) are largely retained in TM1030 (MCSG), even though the biological dimer in TM1030 (JCSG) is formed from subunits related by two-fold NCS, while the TM1030 (MCSG) subunits are related by exact crystallographic symmetry. The calculated total buried surface area between the monomers in TM1030 (JCSG) and TM1030 (MCSG) is also quite comparable [1497 Å² (JCSG) vs. 1401 Å² (MCSG)]. In addition, the residues involved in inter-subunit interactions are largely unperturbed in these structures, suggesting that the conformational changes between the two TM1030 structures are unlikely to be caused by inter-subunit or crystal-packing interactions. When the structural superpositions are restricted to individual domains of TM1030 [Fig. 2(B,C)], regions of large conformational differences are mostly confined to the regulatory domain (RMSD of 1.75 Å for 153 Cα atoms) rather than the DNA-binding domain (RMSD of 0.5 Å for 47 Cα atoms).
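The RMSD values quoted above are root-mean-square deviations over matched Cα atoms after superposition. A minimal Python sketch of the final step is shown here, assuming the two coordinate sets have already been superposed on the chosen anchor (the superposition itself, and the program used for it, are not specified in the text). The coordinates in the example are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    The lists are assumed to be matched atom-for-atom and already superposed;
    no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Hypothetical Cα positions (Å), not taken from 1zkg or 1z77.
    model_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    model_2 = [(0.0, 0.0, 0.0), (3.8, 0.0, 2.0)]
    print(f"RMSD = {rmsd(model_1, model_2):.2f} Å")
```

Restricting the input lists to the Cα atoms of one domain gives the per-domain values, while anchoring the superposition on one domain and evaluating over all 200 Cα atoms gives the global values.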
Table II. Crystallization Conditions Used for TM1030 by JCSG and MCSG
Buffer: 0.05 M Na citrate pH 4.5 (MCSG); 0.2 M citrate pH 4.5 (JCSG)
Precipitant: 30% PEG 2000 monomethylether (MCSG); 30% PEG 8000 (JCSG)
Space group: P2₁2₁2 (MCSG); P2₁ (JCSG)
Unit cell (Å, °): a = 56.0, b = 65.7, c = 55.7 (MCSG); a = 50.5, b = 87.1, c = 59.2, β = 110.7 (JCSG)
While searching for a plausible basis for the conformational differences in the regulatory domain, we identified a ∼12 Å deep cavity in each of the TM1030 subunits that is located within the helical bundle of the regulatory domain [Fig. 3(A)]. The binding pocket, whose total volume is approximately 2000 Å³, is predominantly lined by hydrophobic residues. The TM1030 cavity has a 10–17 Å wide opening formed by residues from α-helices H6A, H7, H7A, and H8 [Figs. 1(A) and 3(A)], which is likely to serve as an entrance to this putative binding pocket. The location of the cavity is in proximity to the ligand-binding pocket of TetR, but the volume of the cavity and the nature of the residues lining the cavity in the two proteins are quite different. Interestingly, each TM1030 (JCSG) subunit contains a semi-circular region of positive electron density in both the omit Fo − Fc and 2Fo − Fc maps in the ligand-binding pocket, indicative of a bound ligand [Fig. 3(B,C)]. However, such density was not observed in the TM1030 (MCSG) structure. The residues within 4 Å of this electron density in TM1030 (JCSG) are T59, L62, and F66 from α-helix H4; W85, I86, and K89 from α-helix H5; S124, Q125, and F128 from α-helix H7 and helix H7A; and F158, F161, E162, and Y165 from α-helix H8. The bound ligand is surrounded by hydrophobic, polar, and electrostatic groups, including a cluster of aromatic rings. No biologically relevant ligand that could fit such electron density was added during the protein purification or crystallization of TM1030 (JCSG). Consideration of the shape and length of the density suggests it might represent a lipid molecule acquired in vivo within the E. coli host cells used for heterologous expression. However, the electron density is not consistent with any lipid with a head group or a carboxylate, such as palmitic acid. On the other hand, the density can be modeled by a relatively short fragment of PEG, in particular heptaethylene glycol [Fig. 3(D)], which probably originates from the crystallization solution (see e.g. Koepke et al., 2003; Zhu et al., 2006).32, 33 Nevertheless, as the precise identity of the ligand molecule has yet to be determined, we modeled and deposited it in the PDB as a UNL. The absence of a ligand in TM1030 (MCSG), together with the difference in conformation, strongly suggests that this structure represents the apo-form of TM1030. A DNA-binding model for TM1030 based on the TetR-tetO complex shows that the two DNA-binding domains from apo-TM1030 fit into the major groove of dsDNA, whereas the ligand-bound form of TM1030 is not in a favorable conformation to bind dsDNA [Fig. 3(E)], suggesting that TM1030 is most likely a transcription repressor.
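The list of residues within 4 Å of the ligand density is the result of a simple distance-cutoff search. A minimal Python sketch of such a search is given here. It is not the procedure used by the authors, whose software for this step is not stated; the function name, the input layout, and the toy coordinates are hypothetical.

```python
import math


def residues_within(ligand_atoms, protein_atoms, cutoff=4.0):
    """Return sorted labels of residues with any atom within `cutoff` Å of the ligand.

    ligand_atoms:  list of (x, y, z) tuples for the ligand (or UNL) atoms.
    protein_atoms: list of (residue_label, (x, y, z)) tuples, one per protein atom.
    """
    close = set()
    for label, position in protein_atoms:
        if any(math.dist(position, atom) <= cutoff for atom in ligand_atoms):
            close.add(label)
    return sorted(close)


if __name__ == "__main__":
    # Toy coordinates (Å) chosen for illustration; not taken from 1zkg.
    ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    protein = [
        ("F66", (3.0, 0.0, 0.0)),
        ("W85", (0.0, 3.9, 0.0)),
        ("R74", (10.0, 0.0, 0.0)),
    ]
    print(residues_within(ligand, protein))
```

For a real structure, the atom lists would be read from the deposited coordinate file, with the UNL atoms used as the ligand set.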
One of the fundamental means by which bacteria adapt to varying environmental conditions is based on the regulation of gene expression at the transcriptional level.1–3, 34 Structural information regarding these transcriptional regulators is crucial to our understanding of how transcriptional regulatory networks control the microbial responses to different environmental challenges, including multidrug resistance, solvent tolerance, stress response, and pathogenesis. The efforts of the PSI-funded SG centers have resulted so far in the determination of nine TetR-like protein structures. Although these TetR-like structures share a high degree of overall fold similarity, their structures, particularly those of the regulatory domains, are very divergent and cannot be readily predicted. The ability to create novel binding sites for various effectors within the regulatory domain of proteins is perhaps driven by mutations that have little effect on the overall three-dimensional structure, but exert a large effect on the plasticity of effector binding sites. A detailed structural analysis, including identification of a biologically relevant ligand for TM1030, will offer insights into the structural basis of its transcription repressor function.
Portions of this research were carried out at the Stanford Synchrotron Radiation Laboratory (SSRL), the Advanced Light Source (ALS), and the Advanced Photon Source (APS). The SSRL is a national user facility operated by Stanford University on behalf of the U.S. Department of Energy, Office of Basic Energy Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of Energy, Office of Biological and Environmental Research, and by the National Institutes of Health (National Center for Research Resources, Biomedical Technology Program, and the National Institute of General Medical Sciences). The ALS is supported by the Director, Office of Science, Office of Basic Energy Sciences, Materials Sciences Division, of the U.S. Department of Energy under Contract No. DE-AC03-76SF00098 at Lawrence Berkeley National Laboratory. Use of the Argonne National Laboratory Structural Biology Center beamlines at the APS was supported by the U.S. Department of Energy, Office of Biological and Environmental Research, under Contract No. W-31-109-ENG-38.