Solution NMR structure of the cold-shock protein from the hyperthermophilic bacterium Thermotoga maritima


H. R. Kalbitzer, Institut für Biophysik und physikalische Biochemie, Universität Regensburg, D-93040 Regensburg, Germany. Fax: + 49 941 9432479, Tel.: + 49 941 9432594, E-mail:


Cold-shock proteins (Csps) are a subgroup of the cold-induced proteins preferentially expressed in bacteria and other organisms on reduction of the growth temperature below the physiological temperature. They are related to the cold-shock domain found in eukaryotes and are some of the most conserved proteins known. Their exact function is still not known, but translational regulation, possibly via RNA chaperoning, has been discussed. Here we present the structure of a hyperthermophilic member of the Csp family. The NMR solution structure of TmCsp from Thermotoga maritima, the hyperthermophilic member of this class of proteins, was solved on the basis of 1015 conformational constraints. It contains five β strands combined in two antiparallel β sheets making up a β barrel structure, in which β strands 1–4 are arranged in a Greek-key topology. The side chain of R2, which is exclusively found in thermophilic members of the Csp family, probably participates in a peripheral ion cluster involving residues D20, R2, E47 and K63, suggesting that the thermostability of TmCsp is based on the peripheral ion cluster around the side chain of R2.


Abbreviations: Bc, Bacillus caldolyticus; Bs, Bacillus subtilis; Bst, Bacillus stearothermophilus; Tm, Thermotoga maritima; Ec, Escherichia coli; Csp, cold-shock protein; HSQC, heteronuclear single quantum coherence.

The cold-shock response in micro-organisms is a transient phenomenon affecting the growth rate of the cell and the saturation of fatty acids as well as the rate of synthesis of DNA, RNA, and protein at temperatures significantly lower than the normal physiological temperature [1]. For most proteins, the expression under cold shock is dramatically decreased, whereas for a few it is increased. Of the latter, one group of small acidic proteins shows an extremely high induction level and high affinity to single stranded nucleic acids [2]. Because of their high sequence homology, they are classified together in the family of cold-shock proteins (Csps) (reviewed in [3–5]). The biological function is still not known, but translational regulation, possibly via RNA chaperoning, is the possibility currently favored [6].

The first hyperthermophilic member of this family has been cloned from Thermotoga maritima (TmCsp); its sequence and physicochemical characteristics were recently reported [7]. TmCsp is a small globular protein of 66 amino acids with a molecular mass of 7474 Da. It exhibits extreme intrinsic stability making it the most thermostable Csp known at present [7].

The Csps with known 3D structure are Greek-key β barrel proteins and belong to the ‘OB-fold family’[8], which includes the known structures of CspA from Escherichia coli[9–11], CspB from Bacillus subtilis[12,13], and CspB from Bacillus caldolyticus[14], as well as several oligonucleotide-binding and/or oligosaccharide-binding proteins. The structural analysis of CspA from E. coli revealed putative RNP1 and RNP2 sequence motifs commonly found in ssRNA-binding proteins [15,16]. These structural features suggest that all Csps may function as single stranded nucleic acid-binding proteins [9–14] and possibly have a regulatory role. It has been shown that CspA activates transcription of the hns and gyrA genes encoding H-NS proteins and subunit A of DNA gyrase, which are also cold-inducible [17,18]. The single stranded nucleic acid interaction has been verified by chemical-shift perturbation analysis of complexes between CspA and ssDNA [10], ssRNA-binding gel-shift assays [19,20], tryptophan fluorescence-quenching studies [21], and site-directed mutagenesis of the ssDNA-binding function of E. coli CspA [21] and B. subtilis CspB [22]. For EcCspA, the binding specificity has not yet been determined; for ssDNA the sequences CCAAT and ATTGG seem to be preferred while ssRNA is bound unspecifically. A thorough analysis of the interaction of BsCspB with ssDNA templates revealed that BsCspB preferentially binds to polypyrimidine but not polypurine ssDNA templates [2]. Thymine-based ssDNA templates bind with high-affinity and salt-independently, whereas binding of cytosine-based ssDNA templates is strongly salt-dependent, indicating that a large electrostatic component is involved in the interactions [2]. On binding, each BsCspB seems to cover a stretch of 6–7 thymine bases on T-based ssDNA [2]. 
The binding of BsCspB to T-based ssDNA templates is enthalpically driven, indicating the possible involvement of interactions between aromatic side chains of the protein and the thymine bases [2]. The precise role of the single stranded nucleic acid-binding function in the cell biology of the cold-shock response is still not fully understood (reviewed in [1,23,24]).

Regarding their relative intrinsic stabilities against guanidinium hydrochloride and temperature, bacterial Csps were found to reflect the optimum growth temperatures of their mesophilic, thermophilic, and hyperthermophilic origin. The corresponding melting points are: Tm(BsCspB) ≈ 325 K, Tm(EcCspA) ≈ 333 K, Tm(BcCspB) ≈ 345 K, Tm(TmCsp) ≈ 360 K [25–28]. In this respect, the recently published crystal structure of BcCspB [14] showed that the location, rather than the number of charged residues, is important for stability. Given the close sequence similarity of Csps, what is the mechanism by which TmCsp gains its higher stability compared with its mesophilic and moderately thermophilic counterparts? A prerequisite for understanding its thermostability is the determination of its structure.

Materials and methods

Protein purification

Expression of unlabeled protein was performed as described [7]. For the expression of 15N isotopically enriched protein, E. coli BL21 (DE3) cells containing the plasmid coding for TmCsp were grown in M9 minimal medium containing 0.5 g·L−1 15NH4Cl and 1 g·L−1 glucose at 310 K (37 °C) in the presence of 500 µg·mL−1 ampicillin to a D550 of 1. Expression was induced by adding 1 mM isopropyl β-D-thiogalactopyranoside, and bacterial growth was continued at 299 K (26 °C) for about 16 h. Purification of labeled protein was performed as described [7]: to remove the bulk of the E. coli proteins without significant coprecipitation of TmCsp, the cell-free extract was diluted fivefold and then heated to 80 °C for 30 min. Pure TmCsp was obtained after hydrophobic interaction chromatography at pH 8 and size exclusion chromatography. The final yields were between 5 and 20 mg pure 15N-enriched TmCsp per liter of cell culture. NMR samples of unlabeled and 15N-labeled TmCsp contained 1.5 mM protein in 50 mM sodium phosphate buffer (pH 6.5), 20 mM NaCl, and 0.2 mM EDTA (sodium salt) in 1H2O/2H2O (92%/8%) or 2H2O.

NMR spectroscopy

The NMR experiments were carried out on Bruker DMX 500-MHz, Bruker DRX 600-MHz and Bruker DMX 800-MHz spectrometers. 1H-1H 2D NOESY spectra were recorded at 500 and 800 MHz as described by Jeener et al. [29] with mixing times of 80, 100, and 150 ms at temperatures of 288 K and 303 K. At 800 MHz, 1H-1H 2D TOCSY spectra were recorded as described by Braunschweiler and Ernst [30] using an MLEV-17 decoupling sequence [31] of 60 ms for isotropic mixing at temperatures of 288 K, 303 K and 318 K, and 1H-1H 2D DQF-COSY spectra were recorded at 303 K. Phase-sensitive detection in ω1 was obtained by the TPPI method [32]. For echo suppression, a pair of orthogonal spin lock pulses of 5 ms and 2.5 ms was applied before the low-power water suppression pulse. 15N-resolved 3D NOESY-heteronuclear single-quantum coherence (HSQC) (mixing time 150 ms) and TOCSY-HSQC experiments (mixing time 80 ms) using TPPI in t1, echo/anti-echo gradient selection in t2 [33], and 15N GARP decoupling in t3 were carried out on a Bruker DMX-500 operating at a proton frequency of 500 MHz and a 15N frequency of 50.68 MHz at 303 K. Proton resonances of the side chains were assigned from the 2D TOCSY spectrum recorded with 60 ms mixing time at 800 MHz both in 1H2O and 2H2O and from the 15N-resolved TOCSY-HSQC spectrum recorded with 80 ms mixing time at 500 MHz in 1H2O. In addition, sequence-specific resonance assignments were carried out on the basis of HNCA, CBCA(CO)NH, and HNCO spectra, all recorded on the 600-MHz spectrometer in 1H2O at 303 K. For the spin-system recognition, an HCCH-TOCSY spectrum in 2H2O was recorded at the same magnetic field strength and 303 K. Distance restraints were obtained from the two homonuclear 2D NOESY spectra with 80 ms mixing time at 800 MHz in 1H2O and 2H2O and from the 15N-resolved NOESY-HSQC with 150 ms mixing time at 500 MHz in 1H2O.
The NOE evaluation from the 3D-NOESY spectra was iteratively refined using NOE back-calculation methods based on the complete relaxation matrix formalism [34,35]. Dihedral φ angle restraints were based on the 3JHNHα coupling constants measured in high-resolution DQF-COSY experiments (digital resolution of 8192 complex points in t2). Slowly exchanging amide protons were identified from a series of 1H-15N HSQC spectra recorded in 2H2O buffer.

The proton chemical shifts were referenced to sodium-2,2-dimethyl-2-silapentane-5-sulfonate (DSS) used as internal reference. The 15N chemical shifts were indirectly referenced to DSS [36] using the frequency ratio given by Wishart et al. [37]. Spectral analysis, peak picking, volume integration, and relaxation matrix back calculation were performed using the program aurelia[38].
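The indirect referencing step amounts to a single multiplication, which the following minimal sketch illustrates. It assumes the commonly tabulated 15N/1H frequency ratio for DSS-based referencing (0.101329118), which should be checked against [37]; the function names and the example DSS frequency of 500.13 MHz are hypothetical and not taken from the experiments described here.

```python
# Indirect 15N referencing: the 15N frequency that defines 0 ppm is obtained by
# scaling the observed 1H frequency of DSS (0 ppm) by a fixed frequency ratio.

XI_15N = 0.101329118  # assumed 15N/1H ratio for DSS-based referencing; verify against [37]


def n15_zero_ppm_mhz(dss_proton_mhz, xi=XI_15N):
    """Return the 15N frequency (MHz) that corresponds to 0 ppm on the 15N axis."""
    return dss_proton_mhz * xi


def shift_ppm(observed_mhz, zero_mhz):
    """Chemical shift in ppm relative to the zero-ppm frequency."""
    return (observed_mhz - zero_mhz) / zero_mhz * 1e6


# Hypothetical example: DSS methyl signal observed at 500.13 MHz.
zero_15n = n15_zero_ppm_mhz(500.13)
print(f"15N zero-ppm frequency: {zero_15n:.4f} MHz")
```

With a proton frequency near 500 MHz this gives a 15N reference frequency close to the 50.68 MHz quoted for the DMX-500.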

Structure calculations

Approximate interproton distances were obtained from the NOE cross-peaks in the 2D 1H-1H NOESY spectra, with cross-peaks classified as very strong (up to 0.2 nm), strong (up to 0.25 nm), medium (up to 0.35 nm), weak (up to 0.45 nm), and very weak (up to 0.55 nm) and upper bound distance restraints set to 0.35 nm, 0.40 nm, 0.50 nm, and 0.60 nm, respectively, as described elsewhere [39]. In the case of spectral overlap, NOE distance restraints were derived from the 3D 15N-resolved NOESY-HSQC spectra, referencing the NOE intensities to a known distance of 0.23 nm and setting an upper bound distance restraint of an additional 40%. The minimum sum of the van der Waals radii (0.18 nm) was used as the lower bound distance restraint.
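The calibration against a reference distance can be sketched as follows. The sketch assumes the isolated spin-pair approximation, in which the NOE intensity scales with r^-6; this is a simplification chosen for illustration and is not the relaxation matrix back-calculation used for the refinement. The function names and example intensities are hypothetical; the reference distance (0.23 nm), the 40% margin, and the 0.18 nm lower bound are taken from the text.

```python
def noe_distance_nm(intensity, ref_intensity, ref_distance_nm=0.23):
    """Distance estimate from an NOE intensity, assuming intensity ~ r**-6."""
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance_nm * (ref_intensity / intensity) ** (1.0 / 6.0)


def noe_restraint_nm(intensity, ref_intensity, ref_distance_nm=0.23,
                     upper_margin=0.40, lower_bound_nm=0.18):
    """Return (lower, upper) distance bounds in nm for one cross-peak.

    The upper bound is the calibrated distance plus an additional 40%;
    the lower bound is the sum of the van der Waals radii.
    """
    r = noe_distance_nm(intensity, ref_intensity, ref_distance_nm)
    return lower_bound_nm, r * (1.0 + upper_margin)


# Hypothetical cross-peak with 1/64 of the reference intensity:
# the r**-6 relation doubles the distance estimate.
print(noe_restraint_nm(intensity=1.0, ref_intensity=64.0))
```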

The amide hydrogen-bond donors were identified as slowly exchanging amide protons, and acceptors were identified from the constraints imposed by interstrand NOEs. In the final structure calculations, the protein backbone hydrogen-bond restraints were included, using upper limits of 0.19 nm for NH-O and 0.29 nm for N-O distances.

J-coupling constants were determined by 2D DQF-COSY experiments at 500 MHz and 303 K in 1H2O, using sine-bell window apodization. Peak-to-peak separation was determined with the program aurelia [38], and J-coupling was taken as the average distance in the ω2 direction of the multiplet components. The backbone torsion angle φ was determined using the Karplus relation, and an estimated error of 20% was added in the structure calculation.
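The conversion from a coupling constant to candidate φ values can be illustrated with a short sketch. The Karplus coefficients used here (A = 6.51, B = −1.76, C = 1.60 Hz) are one widely used parametrization and are an assumption, as the text does not state which coefficients were applied. Treating the 20% error as a tolerance on J is likewise an assumed reading, and the function names are hypothetical.

```python
import math

# Assumed Karplus coefficients (Hz) for 3J(HN,Halpha); other parametrizations exist.
A, B, C = 6.51, -1.76, 1.60


def karplus_j(phi_deg, a=A, b=B, c=C):
    """3J(HN,Halpha) in Hz predicted for a backbone torsion angle phi (degrees)."""
    theta = math.radians(phi_deg - 60.0)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c


def phi_candidates(j_obs, rel_error=0.20, step=1.0):
    """Phi values (degrees) whose predicted J lies within j_obs * (1 +/- rel_error)."""
    lo, hi = j_obs * (1.0 - rel_error), j_obs * (1.0 + rel_error)
    n = int(round(360.0 / step))
    grid = [-180.0 + i * step for i in range(n)]
    return [phi for phi in grid if lo <= karplus_j(phi) <= hi]


# A large coupling is compatible with extended (beta-strand) phi values near -120 degrees.
print(round(karplus_j(-120.0), 2))
print(-120.0 in phi_candidates(9.5))
```

Because the Karplus curve is multivalued, a single J value generally maps to several φ ranges, which is why such restraints are usually applied only where the coupling is clearly large or clearly small.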

Calculations were performed using the program xplor, version 3.851 [40] and the dynamic simulated annealing protocol for extended strand starting structures [41]. In the structure calculation, 1015 conformational NMR constraints were included (Table 1). The initial structure was energy minimized with 1000 cycles of Powell minimization. High-temperature dynamics were run for 30 ps at an initial temperature of 1000 K. The system was then slowly cooled to a temperature of 100 K in 50-K steps over a period of 20 ps. At 100 K, a final stage of 2000 steps of Powell minimization was performed to yield the final simulated annealing structures. The stereochemical quality of the structures was examined using the program prochecknmr[42].
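The cooling stage described above can be written out as an explicit temperature schedule. This is a minimal sketch of the schedule only, not of the xplor protocol itself. It assumes that both end-point temperatures are included and that the 20 ps are divided equally among the temperature stages, neither of which is stated in the text; the function name is hypothetical.

```python
def cooling_schedule(t_start=1000, t_end=100, t_step=50, total_ps=20.0):
    """Return a list of (temperature_K, duration_ps) stages for stepwise cooling."""
    if t_step <= 0 or t_start < t_end or (t_start - t_end) % t_step:
        raise ValueError("temperature range must be a whole number of steps")
    temps = list(range(t_start, t_end - 1, -t_step))
    per_stage = total_ps / len(temps)  # assumption: equal time at each temperature
    return [(t, per_stage) for t in temps]


schedule = cooling_schedule()
print(len(schedule), schedule[0][0], schedule[-1][0])
```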

Table 1. Structural statistics.

Constraint                                  Number
 Intraresidual (i, i)                          358
 Sequential (i, i + 1)                         283
  Backbone–side chain                          153
  Side chain–side chain                          5
 Intermediate range (i, i + n; n ≤ 5)           47
  Backbone–backbone                              8
  Backbone–side chain                           29
  Side chain–side chain                         10
 Long range (i, j; j > i + 5)                  245
  Backbone–backbone                             89
  Backbone–side chain                           98
  Side chain–side chain                         58
φ-Angle constraints                             28
Hydrogen bonds                                  54

Structural statistics for the 21 lowest-energy structures (from 50 calculated)
Etotal                                      5140.6 ± 258.8 kJ·mol−1
ENOE                                        2269.2 ± 190.6 kJ·mol−1
NOE violations > 0.05 nm                    6.95 ± 1.65
Edihed                                      190.0 ± 29.7 kJ·mol−1
Ebond                                       70.4 ± 15.1 kJ·mol−1
Eangle                                      1200.4 ± 82.9 kJ·mol−1
EvdW                                        952.9 ± 66.6 kJ·mol−1
Eimp                                        231.5 ± 23.9 kJ·mol−1
Rmsd to the mean structure (a)
 Amino acid 1–66 (whole protein)            0.094 nm (0.151 nm)
 Core regions                               0.025 nm (0.087 nm)

a All backbone atoms; values in parentheses, all non-hydrogen atoms.
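As a consistency check, the constraint counts in Table 1 can be summed and compared with the total of 1015 conformational constraints quoted in the text. The sketch below uses only numbers transcribed from the table; the variable names are illustrative.

```python
# Constraint counts transcribed from Table 1.
noe = {
    "intraresidual": 358,
    "sequential": 283,
    "intermediate_range": 47,
    "long_range": 245,
}
phi_angle = 28
hydrogen_bond = 54

total_noe = sum(noe.values())
total = total_noe + phi_angle + hydrogen_bond

# Sub-categories listed under the intermediate- and long-range rows
# (backbone-backbone, backbone-side chain, side chain-side chain).
intermediate_parts = [8, 29, 10]
long_range_parts = [89, 98, 58]

print(total_noe, total)  # prints: 933 1015
```

The NOE categories add up to 933 distance constraints, and together with the φ-angle and hydrogen-bond constraints they reproduce the stated total.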

Analysis of backbone torsion angles

We used the program cyclist [43] to calculate the mean of the backbone torsion angles in our ensemble of 21 structures. For each single structure, the backbone angles were calculated using the program aurelia[38]. To compare the backbone torsion angles of TmCsp with the other known structures of Csps, the alignment of BsCspB, EcCspA, BcCspB and TmCsp displayed in Fig. 1 was used to adjust the positions of the amino acids. The gaps between the different amino-acid sequences were filled manually with x for the torsion angle values to keep them unprocessed at this position [44]. In the case of the NMR structure of EcCspA [11], for which an ensemble of 16 structures has been deposited in the Protein Data Bank (PDB), we calculated the mean of the backbone torsion angles of these.
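Averaging torsion angles over an ensemble requires circular statistics, because an arithmetic mean fails for angles near ±180°. The following sketch shows one standard way to compute a circular mean and an angular standard deviation. It illustrates the general technique and is not a description of how cyclist [43] works internally; the function names are hypothetical.

```python
import math


def circular_mean_deg(angles):
    """Mean direction of angles given in degrees, in the range -180 to 180."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))


def angular_std_deg(angles):
    """Angular standard deviation in degrees, sqrt(-2 ln R), where R is the
    mean resultant length of the unit vectors."""
    n = len(angles)
    s = sum(math.sin(math.radians(a)) for a in angles) / n
    c = sum(math.cos(math.radians(a)) for a in angles) / n
    r = min(1.0, math.hypot(s, c))
    if r == 0.0:
        return float("inf")  # angles are spread uniformly; no preferred direction
    return math.degrees(math.sqrt(-2.0 * math.log(r)))


# Two angles on either side of the +/-180 degree boundary average to 180,
# whereas the arithmetic mean would give 0.
print(abs(round(circular_mean_deg([170.0, -170.0]), 6)))
```

A large angular standard deviation at a given residue flags a torsion angle that differs between conformers of the ensemble.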

Figure 1.

Sequence alignment of 30 prokaryotic Csp sequences. BACST, B. stearothermophilus; LISMO, Listeria monocytogenes; BACSU, B. subtilis; BACCE, B. cereus; LACPL, Lactobacillus plantarum; PSEAE, Pseudomonas aeruginosa; STRCO, Streptomyces coelicolor; SALTY, Salmonella typhimurium; STIAU, Stigmatella aurantiaca; PSEFR, Pseudomonas fragi; ECOLI, E. coli; ARTGO, Arthrobacter globiformis; MYCTU, Mycobacterium tuberculosis; STRCL, S. clavuligerus; HAEIN, Haemophilus influenzae. Residues conserved more than 80% are colour coded in red, residues conserved more than 60% are colour coded in yellow, and residues conserved more than 40% are colour coded in light blue. The pairwise identity of TmCsp with BcCspB is 63%, TmCsp with BsCspB is 64%, TmCsp with EcCspA is 53%, BcCspB with BsCspB is 82%, BcCspB with EcCspA is 54% and BsCspB with EcCspA is 57%.
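Pairwise identities of the kind quoted in the legend can be computed from two rows of an alignment as sketched below. The sketch assumes that identity is counted over columns in which neither sequence has a gap; the legend does not state which denominator was used, so the result may differ slightly from the published percentages. The function name and the short example sequences are hypothetical and are not taken from Fig. 1.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Hypothetical aligned fragments: 7 ungapped columns, 6 identical.
print(round(percent_identity("MKGTV-KW", "MRGTVNKW"), 1))
```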

BioMagResBank and PDB accession numbers

A full list of 1H, 15N and 13C chemical shifts has been deposited in the BioMagResBank under accession number 4895. The co-ordinates of TmCsp have been deposited with the PDB under accession number 1G6P.


Assignment of the NMR resonance lines

Figure 2 shows the 2D 15N-1H HSQC spectrum of TmCsp at pH 6.5 and 303 K. As shown by MS, recombinant TmCsp expressed in E. coli contains 66 amino acids, with methionine as N-terminal amino acid; this is also found in the natural protein [7]. All expected backbone amide cross-peaks for residues R2–E66 (except for P57 which contains only an imino group) are observed. As usual, the cross-peak of the N-terminal amino group is not visible because the protons exchange too rapidly with the bulk water. There are 58 well-resolved backbone 15N-1HN signals and two totally overlapping backbone 15N-1HN signals, which are assigned to residues K63/Q51 and F48/V46, and two partially overlapping backbone 15N-1HN signals assigned to H28/V65 and E47/F8 in the 2D 15N-1HN HSQC spectrum. The NεH resonance of the side chain of R2 is folded into the spectrum. The NMR samples contained a minor amount of Csp with the N-terminal methionine formylated, as shown directly by NMR spectroscopy and independently confirmed by MS. This leads to a couple of doubled signals in the NMR spectra which could be sequentially assigned (see e.g. the signal of the amide proton of formylated M1, labeled as form. M1 in Fig. 2). The sequential assignment was performed using a set of HNCA and CBCACONH experiments. The assigned primary structure was identified in the 2D spectra (2D 1H-1H TOCSY, and 2D 1H-1H NOESY) in 1H2O and 2H2O and 3D spectra (15N-1H TOCSY-HSQC and 15N-1H NOESY-HSQC) in 1H2O as well as in the HCCH-TOCSY spectra. All proton, nitrogen, carbon α, carbon β, and carbonyl resonances could be assigned (see Materials and methods). The secondary-structure elements and the long-range NOEs extracted from the 2D and 3D NOESY are depicted in Fig. 3.

Figure 2.

2D 15N-1H HSQC spectrum of TmCsp at pH 6.5 and 303 K. Backbone amide 15N-1HN correlation peaks are labeled using the one-letter code followed by the sequence position number. Side-chain amide 15N-1H correlation peak pairs are connected with solid lines and indicated with an asterisk (*). The side-chain indole 15Nε1H correlation peaks of W7 and W29 are also labeled. The signal of the side-chain amide of R2 is folded into the spectrum (R2*). Because the N-terminal methionine is formylated in a small population of the protein (form. M1), a few signals that belong to the N-terminally formylated subspecies are doubled.

Figure 3.

Summary of sequential and intermediate-range NOE connectivities, amide hydrogen lifetimes, 3JNHα coupling constants, and deviations of 1Hα, 13Cα and 13C′ chemical shifts from random-coil values. The NOE correlations were determined from 2D NOESY and 3D 15N-resolved NOESY-HSQC spectra, all recorded at pH 6.5 and 303 K (30 °C). Sequential NOE intensities d(i,i + 1) between amino acid i and i + 1 are represented by the bar heights as strong, medium, or weak. For medium-range NOEs, lines denote the pairs of protons correlated. The lifetimes NHex of slowly exchanging backbone amides are indicated by the height of the bars: high intensity, not exchanged after 17 h; medium intensity, exchanged after 6 h; weak intensity, exchanged after 30 min. Values of the conformation-dependent secondary shifts Δδ1Hα, Δδ13Cα and Δδ13C′ are plotted with solid bars. Secondary-structure elements deduced from the NOESY data are indicated below the NOE data: β strands by arrows, an α-helical turn by a circle, and unstructured regions by lines.

Characterization of secondary-structure elements in TmCsp

Secondary-structure elements of TmCsp were identified using a combination of data, including medium-range and long-range interstrand NOEs, 3JHN-Hα coupling constants, amide 1H–2H exchange rates, and deviations of backbone atom shifts from random-coil values derived from tetrapeptides [45] (ΔδHα, Δδ13Cα and Δδ13C′ values, where $\Delta\delta\mathrm{H}^{\alpha} = \delta\mathrm{H}^{\alpha}_{\mathrm{obs}} - \delta\mathrm{H}^{\alpha}_{\mathrm{rc}}$ and $\Delta\delta{}^{13}\mathrm{C}^{\alpha,\prime} = \delta{}^{13}\mathrm{C}^{\alpha,\prime}_{\mathrm{obs}} - \delta{}^{13}\mathrm{C}^{\alpha,\prime}_{\mathrm{rc}}$, with obs and rc denoting the observed and random-coil shifts). From these data (Fig. 3), five β strands corresponding to segments containing the residues R2–D9, G13–K19, D24–W29, G43–Q51, and Q58–V65 can be identified. Interstrand hydrogen bonds were determined on the basis of the interstrand backbone–backbone NOEs and the locations of slowly exchanging amide protons. From the interstrand NOEs the five β strands can be arranged into an antiparallel β pleated sheet (Fig. 4), in which β strands 1–4 form a Greek-key topology. Strands 1 and 5 contain β bulges at positions 6 and 61, respectively. In general, the amide 1H–2H exchange rates for TmCsp at pH 6.5 and 303 K were found to be rather low. In particular, the amide protons of residues G3–W7, Y14–T18, V25, V27, Q44, and V46–E49 did not exchange after 17 h at 303 K, suggesting high stability of the β sheet. Only the amide protons of the residues at the termini of the protein (R2, E66), near the bulge position in β1 (F8) and β5 (H61) and in the loop regions S10–K12, D20–D24, S30, A31, E33–L40, E42, G43, I50, and E52–P57, exchanged so rapidly that they were not detectable in the HSQC spectrum after the sample had been dissolved in D2O (that is, after 20 min).

Figure 4.

Schematic diagram of the β sheet topology of TmCsp. Arrows denote unambiguous interstrand NOEs. Dashed lines represent hydrogen bonds derived from the hydrogen exchange and NOE data. In the lower left corner, arrows indicate the direction and location of the five strands of antiparallel β sheet.

3D structure of TmCsp

The 3D structure of TmCsp was calculated with the molecular dynamics program xplor as described in Materials and methods. The constraints used for the simulated annealing approach and the structural statistics are summarized in Table 1. Figure 5 shows the distribution of the NOE-derived distance restraints along the amino-acid sequence of the protein with an average number of 13.9 distance constraints per residue.

Figure 5.

Plot of the number of 1H-1H NOE constraints as a function of the position of amino acid i in the sequence. Heavily shaded bar, intraresidual NOEs d(i,i); solid bar, sequential NOEs d(i,i + 1); open bar, intermediate-range NOEs d(i,i + n; 1 < n ≤ 5); lightly shaded bars, long-range NOEs d(i, i + n;n > 5).

The overall solution structure of TmCsp is well defined by our NMR data as shown in Fig. 6A. It depicts superposition of the backbone atoms of a family of the 21 lowest-energy conformers of TmCsp generated with xplor. Large structural differences are only found for a part of the long surface loop between β strands 3 and 4, ranging from E35 to T39. As Fig. 5 shows, only a relatively small number of NOEs were observed in this part of the structure, indicating that this part of the structure may be mobile or disordered in solution. The structures calculated by xplor had the tendency to cluster into two groups with different conformations for loop L3; however, it was not possible to verify unambiguously the existence of two local conformational states on the basis of NOE patterns. As the use of a closely related molecular dynamics program (CNS) with the same restraint files as used for the xplor calculations could not reproduce two well-separated conformational states, at present the occurrence of the two states can be considered consistent with the experimental data but not proven. Backbone rmsds are 0.094 nm for all residues to the mean structure (Table 1). The ordered regions within the β sheets are much better defined and have a backbone rmsd of 0.025 nm to the mean structure. Apart from residues in poorly defined regions of the protein, all φ and ψ values occur in low-energy regions of the Ramachandran plot. Figure 6B shows a representative conformer of the family of TmCsp structures indicating the secondary-structure elements identified by the program molmol[46].
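The rmsd to the mean structure quoted above can be computed as in the following sketch. It assumes that the conformers have already been superimposed, so no fitting step is included, and it averages the per-conformer rmsd values over the ensemble. Whether the published figures were obtained in exactly this way is not stated in the text, and all function names and coordinates are hypothetical.

```python
import math


def mean_structure(ensemble):
    """Average coordinates over conformers; ensemble[k][i] is an (x, y, z) tuple."""
    n_conf = len(ensemble)
    n_atoms = len(ensemble[0])
    return [
        tuple(sum(conf[i][d] for conf in ensemble) / n_conf for d in range(3))
        for i in range(n_atoms)
    ]


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized coordinate lists."""
    sq = sum((a[d] - b[d]) ** 2 for a, b in zip(coords_a, coords_b) for d in range(3))
    return math.sqrt(sq / len(coords_a))


def ensemble_rmsd_to_mean(ensemble):
    """Mean of the per-conformer rmsd values to the average structure."""
    mean = mean_structure(ensemble)
    values = [rmsd(conf, mean) for conf in ensemble]
    return sum(values) / len(values)


# Hypothetical two-atom, two-conformer ensemble (coordinates in nm),
# offset from each other by 0.2 nm along x.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.2, 0.0, 0.0), (1.2, 0.0, 0.0)],
]
print(round(ensemble_rmsd_to_mean(toy), 3))
```

Restricting the atom list to the β-sheet residues rather than all 66 residues corresponds to the distinction between the core and whole-protein values.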

Figure 6.

Stereoviews of TmCsp. (A) Stereoview of a superposition of the backbone atoms (N, Cα, and C′) of the 21 accepted structures of TmCsp. The corresponding rmsd value is 0.094 nm. (B) Stereoview of a representative ribbon diagram of TmCsp. The secondary-structure elements identified by the program molmol are indicated. (C) Stereoview of the solution NMR structure of TmCsp (ribbon diagram, side view rotated). Location of a possible peripheral ion cluster. The side-chain atoms of the basic residues K63 and R2 are shown in blue, and the side-chain atoms of the acidic residues E47 and D20 are shown in red.

A characteristic and useful parameter for the description and characterization of protein structures determined by X-ray crystallography or NMR spectroscopy is the backbone torsion angle. We used the program cyclist [43] to visualize the differences between the known structures of Csps and our ensemble of 21 structures. Figure 7A shows a comparison of TmCsp with the NMR-derived structure of EcCspA [11], Fig. 7B with the NMR-derived structure of BsCspB [13], Fig. 7C with the crystal structure of EcCspA [9], Fig. 7D with the crystal structure of BsCspB [12], and Fig. 7E with the crystal structure of BcCspB [14]. Generally, the backbone torsion angles in the β strand regions of our TmCsp structure are similar to those observed in the β strand region of the other Csp structures. In the loops, especially the long loop L3, there is great variability between the backbone torsion angles of all the Csp structures. The angular standard deviation of backbone dihedral angle φ of L40 is rather large in TmCsp, indicating that the value of this angle is a main determinant for the possible existence of two states of L3.

Figure 7.

Comparison of the backbone torsion angles φ with those in different Csps. The mean values of the φ angles of the TmCsp (21 conformers) are represented as bars; the standard error is indicated. The backbone torsion angles of the NMR structures of (A) EcCspA (16 conformers [11]) and (B) BsCspB [13] and the crystal structures of (C) EcCspA [9], (D) BsCspB [12], and (E) BcCspB [14] are indicated by black diamonds. The numbers indicate the position of amino acids in the TmCsp sequence. Secondary-structure elements deduced from the NOESY data are indicated below: β strands by arrows, an α-helical turn by a circle, and unstructured regions by lines.


Solution structure of TmCsp

From the NMR structure determination we conclude that TmCsp forms a closed β barrel consisting of five β strands: β1, R2–D9; β2, G13–K19; β3, D24–W29; β4, G43–Q51; β5, Q58–V65. The first four, β1–β4, show a Greek-key fold. β1 and β5 contain β bulges including the residues V5–K6–W7–F8 in β1 and A59–A60–H61–V62 in β5. They break β1 into β1′ and β1′′, and β5 into β5′ and β5′′. As a result, two β sheet surfaces are formed, β sheet 1 by strands β1′′–β2–β3–β5′, and β sheet 2 by strands β1′–β4–β5′′ (in EcCspA similar sheets are defined by strands β1′′–β2–β3 and β1′–β4–β5′–β5′′ [11]). The core region of TmCsp showed very slowly exchanging backbone amide protons; the amide protons of residues G3–W7, Y14–T18, V25, V27, Q44, and V46–E49 did not exchange at all in 17 h at 303 K. This indicates that these amide groups are involved in strong hydrogen bonds well shielded against the bulk water. In line with this observation, the structurally important hydrophobic residues V5, I17, V25, V27, V46, and F48 are located in these regions. Their side chains point to the interior of the protein and form a hydrophobic core together with I50 and V62. The β barrel is stabilized by the hydrophobic interactions between the two antiparallel β sheets and the hydrogen-bond network pattern within the β sheets. β strands 1 and 2 as well as 2 and 3 are connected by very short (three and four residues in length) surface loops L1 and L2, respectively (L1, S10–K12; L2, D20–G23). Both loops form very tight turns. β strands 3 and 4 are connected by a very long surface loop (L3) 13 residues in length (S30–E42), which contains at the beginning a very short α-helical turn at positions S30, A31, I32 and E33. Loop L4 between β strands 4 and 5 is also located on the surface of the protein and encompasses six residues (E52–P57).

One important structural feature of TmCsp is the large number of charged residues; this implies a relatively high probability of ion-pair formation, which could contribute to the high stability of the protein [47]. A possible peripheral ion cluster of positively and negatively charged residues includes the side chain of R2: D20–R2–E47–K63 (Fig. 6C). In all our structures, these side chains are spatially very close to each other (i.e. the distance from the nitrogen atom of one guanidinium group of R2 to the γ-carbon atom of D20 is 6.15 ± 0.09 Å, the distance from the nitrogen atom of the other guanidinium group of R2 to the δ-carbon atom of E47 is 6.37 ± 0.13 Å, and the distance from the δ-carbon atom of E47 to the ζ-nitrogen atom of K63 is 5.46 ± 1.46 Å), which allows the formation of a peripheral ion cluster. The arginine side chain plays an important role in the geometry of this ion cluster and is located in the center. It is only conserved in thermophilic or hyperthermophilic members of the Csp family: BstCspB, BcCspB, and TmCsp. In BcCspB, the arginine corresponding to R2 of TmCsp is at position 3 in the amino-acid sequence. R3 was shown to be very close to the glutamate side chain of E46 in BcCspB, indicating a strong localized ionic interaction between the two side chains [14].
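Distances of this kind, reported as mean ± standard deviation over the ensemble, can be derived from atomic coordinates as sketched below. The sketch assumes the sample standard deviation; the text does not state which estimator was used. The function names and the example coordinates are hypothetical, and only the conversion example uses a value from the text.

```python
import math
from statistics import mean, stdev


def pair_distance_stats(coords_a, coords_b):
    """Mean and sample standard deviation of the distance between two atoms.

    coords_a[k] and coords_b[k] are the (x, y, z) positions of the two atoms
    in conformer k; at least two conformers are required.
    """
    distances = [math.dist(a, b) for a, b in zip(coords_a, coords_b)]
    return mean(distances), stdev(distances)


def angstrom_to_nm(value):
    """Convert a length from angstrom to nm (1 nm = 10 angstrom)."""
    return value / 10.0


# Hypothetical coordinates (angstrom) of two atoms in three conformers.
atom_a = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
atom_b = [(3.0, 4.0, 0.0), (6.0, 8.0, 0.0), (0.0, 0.0, 5.0)]
m, s = pair_distance_stats(atom_a, atom_b)
print(round(m, 2), round(s, 2))

# The R2-D20 distance quoted in the text, expressed in the nm units used elsewhere.
print(angstrom_to_nm(6.15))
```

A small standard deviation relative to the mean, as for the R2–D20 and R2–E47 distances, indicates that the contact is present in essentially all conformers, whereas the larger spread of the E47–K63 distance indicates a less well-defined contact.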

In a site-directed mutagenesis study, Perl et al. [48] showed that, in BcCspB, only the two side chains of R3 and L66 were important for thermostability. Changing the two corresponding positions in the mesophilic protein BsCspB (E66L and E3R) resulted in a thermostable protein similar to BcCspB, and reverting these two residues in BcCspB (L66E and R3E) to the BsCspB sequence decreased the thermostability to the value for the mesophilic protein BsCspB [48]. In TmCsp, a valine residue (V65) is found at the position corresponding to L66 in BcCspB; it forms a hydrophobic patch with V45, a hydrophobic residue unique to TmCsp and Csp from Streptomyces clavuligerus. By analogy with BcCspB, the thermostability of TmCsp is likely to be based on the arginine R2 and the valine V65. In addition, the side chains of E49 and H61, as well as K6 and D24, which are also close together (i.e. 5.47 ± 2.47 Å for the δ-carbon atom of E49 to the ε2-nitrogen atom of H61, and 3.98 ± 0.19 Å for the ζ-nitrogen atom of K6 to the γ-carbon atom of D24), could form single peripheral ion pairs. The H61N mutation had no effect on protein stability in TmCsp [49]. H61 therefore does not appear to contribute to a stable ion cluster, which is reflected in the high standard deviation of 50% for the distance between H61 and E49 in our family of 21 structures. In contrast, the mutation of D9 to asparagine had a considerable destabilizing effect [49], which can be attributed to the involvement of D9 in β sheet 1, formed by strands β1′′–β2–β3–β5′.

Csps are thought to bind to nucleic acids. The corresponding binding regions include the RNA-binding motifs RNP1 and RNP2 commonly found in ssRNA-binding proteins [15,16]. As seen in the NMR structure, in TmCsp the side chains of the residues involved in the RNP1 (G13–Y14–G15–F16–I17) and RNP2 motif (V25–F26–V27–H28–W29) are located on the same side of the surface of the protein, supporting the hypothetical complex formation. Chemical-shift perturbation analysis of complexes between CspA from E. coli and ssDNA [9,10] showed that residues K10, W11, K16, F18, F20, F31, H33, F34, and K60 were involved in nucleic acid binding in CspA. In TmCsp, these residues correspond to K6, W7, K12, Y14, F16, F26, H28, W29, and K55. Of these, again K12 and K55 are located in loop regions on the surface of the protein; their interactions may be complemented by two further nearby lysine residues (K11 and K54).

Another property typical of Csps is the presence of a considerable number of solvent-exposed aromatic side chains. In TmCsp, these are W7, Y14, F16, F26, W29, and F37, five forming a cluster on one side of the protein surface (W7, Y14, F16, F26, and W29). Mutational analysis of solvent-exposed aromatic side chains in BsCspB showed that the mutation F38A has no effect on protein stability whereas F15A, F17A, and F27A decreased the stability dramatically [50]. In TmCsp, the corresponding residues are F37, Y14, F16, and F26. In our NMR-derived structure, residues Y14, F16, and F26 form a hydrophobic patch within the presumed nucleic acid-binding surface area, and are likely to be important for both the function and stability of the protein. F37 is located in a different region and does not seem to be essential for either function or stability. In all, the results derived from the mutagenesis in BsCspB [50] are in good agreement with the NMR-derived 3D structure of TmCsp. This is in line with the extreme phylogenetic conservation of the Csp family [6].

Comparison of the TmCsp structure with the NMR-derived structures of EcCspA and BsCspB and the crystal structure of BcCspB

A comparison of the surface charge distribution between the NMR-derived structures of EcCspA [10,11] and BsCspB [13] and TmCsp is shown in Fig. 8. TmCsp, the hyperthermophilic protein, shows a larger number of positive charges on the surface compared with its mesophilic counterparts. The high temperature leads to a decrease in the affinity for nucleic acids; this effect can be compensated for by a higher number of charges involved in the interaction. In EcCspA, the hydrophobic core is formed by the eight residues V9, I21, V30, V32, V51, F53, I55, and V67 [11]. In TmCsp, the corresponding residues (V5, I17, V25, V27, V46, F48, I50, and V62) are also present. One important feature of the EcCspA, BsCspB, BcCspB and TmCsp structures is the large number of solvent-exposed aromatic residues. As shown in Fig. 1, these aromatic residues are highly conserved in the family of Csps. In the case of TmCsp, the RNP1 and RNP2 RNA-binding motifs, K–G–Y–G–F–I and V–F–V–H–W, in β strands 2 and 3 differ slightly from those found in EcCspA and BsCspB: K–G–F–G–F–I (RNP1) and V–F–V–H–F (RNP2). It is worth mentioning that the tyrosine residue is only found in RNP1 from thermophilic micro-organisms and that tryptophan is only found in RNP2 from T. maritima. All these residues (including W7 and W29) are exposed as potential candidates for specific interactions with single stranded nucleic acids. As mentioned above, TmCsp contains two more basic residues in this region. The differences in the binding surface between the hyperthermophilic and mesophilic proteins may be essential for the specific regulatory functions of the proteins in their different natural environments.

Figure 8.

Comparison of the surface charge distribution of the solution NMR structures of EcCspA [10,11] and BsCspB [13] with the surface charge distribution of the solution NMR structure of TmCsp. Basic amino acids are shown in blue, acidic amino acids in red, and hydrophobic amino acids in white. Surface charges were calculated in MOLMOL [46].

A recently published crystal structure of BcCspB, a thermophilic member of the Csp family, showed a strong localized ionic interaction between Nη1 of the arginine side chain R3 and Oε1 of the glutamate side chain E46 [14]. The corresponding residues in TmCsp are R2 and V45 (Fig. 1). Our NMR-derived structure shows the possibility of a peripheral ion cluster involving residues D20–R2–E47–K63. The arginine side chain exclusively found in thermophilic and hyperthermophilic Csps seems to play a major role in the thermostability of these proteins. In BcCspB, the arginine residue located in strand β1 seems to interact with the side chain of E46, the first residue of strand β4 [14]. In contrast, the arginine side chain in strand β1 of TmCsp seems to interact with the glutamate E47 located in the center of strand β4. This may allow the involvement of more charged residues in localized ionic interactions, forming a peripheral ion cluster centered on the arginine side chain (D20–R2–E47–K63).


Given the close similarity of the known 3D structures of Csps, what is the mechanism by which TmCsp gains its higher stability compared with its mesophilic and moderately thermophilic counterparts? The multiple alignment of 30 prokaryotic Csp sequences (Fig. 1) with TmCsp allows the following conclusions. Regarding aromatic residues, two phenylalanine residues, which are highly conserved in other Csps, are replaced by tyrosine and tryptophan. The former is unique to Csps from thermophiles, while the latter is found only in TmCsp. As we pointed out, they are located in the single stranded nucleic acid-binding site and thus may account for the specific regulatory roles of the proteins. In general, the number of aromatic residues is high: five Phe, one Tyr, two Trp. At neutral pH, the number of charged residues is increased to more than one-third of the total number of amino acids: 24 in TmCsp compared with 19, 17, and 16 in BsCspB, BcCspB, and EcCspA, respectively. TmCsp has the highest Lys content; only four of its 11 Lys residues are conserved. The single Arg is present only in BcCspB, BstCspB, and TmCsp; this is apparently a characteristic of thermophiles. In BcCspB, the single Arg3 was shown to confer thermostability [48] by interacting with Glu46 [14]. In TmCsp, this Arg seems to participate not only in a single ion pair but in a peripheral ion cluster together with Asp20, Glu47 and Lys63. With respect to hydrophobic residues, Val45 is unique to TmCsp and the Csp from S. clavuligerus; in most other sequences, polar residues prevail at this position. In TmCsp, it forms a hydrophobic patch with Val65; the corresponding side chain in BcCspB (Leu66) has been shown to be important for thermostability [48].
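The charged-residue counts quoted above can be compared directly. The short Python sketch below is an illustration only: the counts are taken from the text, while the helper names (`CHARGED`, `excess_over`, `max_length_for_fraction`) are hypothetical. The second helper does not use any chain length from the paper; it only derives the upper bound on chain length implied by the statement that 24 charged residues exceed one-third of the total.

```python
# Charged-residue counts at neutral pH, as quoted in the text.
CHARGED = {"TmCsp": 24, "BsCspB": 19, "BcCspB": 17, "EcCspA": 16}


def excess_over(reference: str, counts: dict[str, int]) -> dict[str, int]:
    """Charged residues in `reference` minus the count in each other protein."""
    ref = counts[reference]
    return {name: ref - n for name, n in counts.items() if name != reference}


def max_length_for_fraction(count: int, numerator: int = 1, denominator: int = 3) -> int:
    """Largest chain length L for which count / L is strictly greater than
    numerator / denominator (integer arithmetic, no rounding error)."""
    # count / L > num / den  <=>  L < count * den / num
    return (count * denominator - 1) // numerator


if __name__ == "__main__":
    print(excess_over("TmCsp", CHARGED))
    print(max_length_for_fraction(CHARGED["TmCsp"]))
```

Under these numbers, TmCsp carries 5, 7, and 8 more charged residues than BsCspB, BcCspB, and EcCspA, respectively, and the "more than one-third" statement holds for any chain shorter than 72 residues.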

Bearing in mind that the stability of proteins is a cumulative effect of small increments [51], we conclude that TmCsp increases its thermal stability through a single peripheral ion cluster around the side chain of R2, together with increased hydrophobic stacking of side chains at the surface.


We are grateful to the Deutsche Forschungsgemeinschaft (Ja 78/34) and the European Union (Biomed Program) for supporting this work. M. G. acknowledges support by the Peter and Traudl Engelhorn Stiftung. We thank E. Hochmuth and D. Deutzmann for the mass spectrometry.