A novel type of carbohydrate–protein linkage region in the tyrosine-bound S-layer glycan of Thermoanaerobacterium thermosaccharolyticum D120-70


P. Messner, Zentrum für Ultrastrukturforschung, Universität für Bodenkultur Wien, Gregor-Mendel-Str. 33, A-1180 Wien, Austria. Fax: + 43 1 478 91 12, Tel.: + 43 1 476 54, E-mail: crs@edu1.boku.ac.at


The surface-layer (S-layer) protein of Thermoanaerobacterium thermosaccharolyticum D120-70 contains glycosidically linked glycan chains with the repeating unit structure →4)[α-d-Galp-(1→2)]-α-l-Rhap-(1→3)[β-d-Glcp-(1→6)]-β-d-Manp-(1→4)-α-l-Rhap-(1→3)-α-d-Glcp-(1→. After proteolytic degradation of the S-layer glycoprotein, three glycopeptide pools were isolated, which were analyzed for their carbohydrate and amino-acid compositions. In all three pools, tyrosine was identified as the amino-acid constituent, and the carbohydrate compositions corresponded to the above structure. Native polysaccharide PAGE showed the specific heterogeneity of each pool. For examination of the carbohydrate–protein linkage region, the S-layer glycan chain was partially hydrolyzed with trifluoroacetic acid. 1D and 2D NMR spectroscopy, including a novel diffusion-edited difference experiment, showed the O-glycosidic linkage region β-d-glucopyranose→O-tyrosine. No evidence was found of additional sugars originating from a putative core region between the glycan repeating units and the S-layer polypeptide. For the determination of chain-length variability in the S-layer glycan, the different glycopeptide pools were investigated by matrix-assisted laser desorption ionization-time of flight mass spectrometry, revealing that the degree of polymerization of the S-layer glycan repeats varied between three and 10. All masses were assigned to multiples of the repeating units plus the peptide portion. This result implies that no core structure is present and thus supports the data from the NMR spectroscopy analyses. This is the first observation of a bacterial S-layer glycan without a core region connecting the carbohydrate moiety with the polypeptide portion.


Abbreviations: HMBC, heteronuclear multiple bond correlation spectroscopy; HPAEC/PED, high-performance anion-exchange chromatography with pulsed electrochemical detection; HSQC, heteronuclear single quantum coherence; MALDI-TOF, matrix-assisted laser desorption ionization-time of flight; LED-DIFF, longitudinal eddy current delay difference spectroscopy; S-layer, crystalline bacterial cell surface layer.

Recent taxonomic analyses have led to a new classification scheme for bacteria belonging to the genus Clostridium [1,2]. The thermophilic clostridia now include the novel genera Thermoanaerobacter (cluster V) and Thermoanaerobacterium (cluster VII). Two decades ago, Sleytr & Thorne [3] investigated the crystalline cell surface layers (S-layers) of Thermoanaerobacter (formerly Clostridium) thermohydrosulfuricus L111-69 and Thermoanaerobacterium (formerly Clostridium) thermosaccharolyticum D120-70 by freeze-etching and chemical analyses (reviewed in [4]). Freeze-etched preparations showed the presence of a hexagonally arranged S-layer lattice on strain L111-69, whereas strain D120-70 was covered by an S-layer with square lattice symmetry [5]. The S-layers of both organisms were found to be glycosylated and were thus the first bacterial glycoproteins to be described [3]. Detailed structural analyses of several S-layer glycoproteins from different Tb. thermohydrosulfuricus strains showed the occurrence of novel O-glycosidic linkages via the hydroxyamino acid tyrosine [6–9]. In addition, core structures consisting of three α1,3-linked l-rhamnose residues were observed between the glycan repeating units and the S-layer polypeptide [8,9]. These data suggested a tripartite architecture for S-layer glycoproteins from the domain Bacteria, which has been found to exist in all bacterial S-layer glycoproteins investigated so far (reviewed in [10,11]). This comprises the S-layer glycan chain constructed of O-antigen-like repeats, a core region, and the polypeptide part, and thus resembles the structure of lipopolysaccharides of Gram-negative bacteria [12].

As differences between the S-layer glycoproteins of the taxonomically closely related strains T. thermosaccharolyticum and Tb. thermohydrosulfuricus have already been observed at the electron-microscopic level in freeze-etched preparations, we examined to what extent these differences were also reflected in the chemical composition and structural organization of these S-layer glycoproteins. The structures of the S-layer glycan repeating units of Tb. thermohydrosulfuricus L111-69 [8,13] and T. thermosaccharolyticum strains D120-70 [14] and E207-71 [15] have previously been elucidated. Until now, among the aforementioned organisms, structural information on the core region of the S-layer glycoprotein glycan has been available only for Tb. thermohydrosulfuricus L111-69 [8].

In this paper, we report the complete structure of the carbohydrate–protein linkage region of the S-layer glycoprotein of T. thermosaccharolyticum D120-70 and demonstrate the absence of a core region in this cell surface glycoconjugate. A detailed knowledge of the complete structure of the S-layer glycoprotein glycan of T. thermosaccharolyticum D120-70 is an absolute requirement for any future application of this glycoprotein in nanotechnology and biomimetics (reviewed in [16]).

Materials and methods

Growth of bacteria

The closely related strains Thermoanaerobacterium (formerly Clostridium) thermosaccharolyticum D120-70 and E207-71 were obtained from F. Hollaus (Österreichisches Zuckerforschungs-Institut, Tulln, Austria). To verify their taxonomic affiliations, pure cultures were maintained under anaerobic conditions on semisolid TYG agar [0.8% Bacto tryptone (Difco), 0.1% yeast extract (Oxoid), 0.6% glucose, 1% Bacto agar (Difco), 0.02% sodium sulfite, 0.01% sodium thiosulfate·5H2O, 0.001% FeSO4·7H2O, and 0.0001% resazurine] in the Gas-Pak system (Becton-Dickinson). For continuous culture in a 14-L Biostat C fermenter (Braun) at 60 °C, the pH of the TYG broth (FeSO4·7H2O and Bacto agar were omitted) was regulated at 5.8–6.2 by addition of NaOH. The fermentation was carried out at an average dilution rate, D, of 0.10 h−1 [15].


Light microscopy and electron microscopy were performed according to published procedures [17].

Determination of 16S rRNA gene sequence

Genomic DNA extraction, PCR-mediated amplification of the 16S rRNA gene, and purification of PCR products were carried out using procedures described previously [18]. Purified PCR products were sequenced using the Taq Dye Deoxy Terminator Cycle Sequencing Kit (Applied Biosystems) as directed in the manufacturer's protocol. The Applied Biosystems 310 DNA Genetic Analyzer was used for the electrophoresis of the sequence reaction products.

Comparison of 16S rRNA gene sequences

The ae2 editor [19] was used to align the partial 16S rRNA gene sequences of T. thermosaccharolyticum strains D120-70 and E207-71 against the 16S rRNA gene sequences of the validly described species of the genera of thermophilic anaerobes within the low G + C Gram-positive phylum, available from the public databases. The strain designations and nucleotide sequence accession numbers of the analyzed sequences are as follows: Clostridium thermoamylolyticum DSM 2335 (X76743), Thermoanaerobacter ethanolicus JW-200T (L09162), Thermoanaerobacterium aotearoense JW/SL-NZ613T (X93359), Thermoanaerobacterium saccharolyticum DSM 7060T (L09169), Thermoanaerobacterium thermosaccharolyticum ATCC 7956T (M59119), Thermoanaerobacterium thermosulfurigenes E-100-69T (L09161), Thermoanaerobacterium xylanolyticum DSM 7097T (L09172), Moorella thermoautotrophica JW701//5T (X58354), and Caldicellulosiruptor saccharolyticus Tp8T.6331T (L09178).

The 16S rRNA gene sequence of T. thermosaccharolyticum strain D120-70 has been deposited in the GenBank database under the accession number AF247003. Pairwise similarity values were calculated using the ae2 editor [19].
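As an illustration of what a pairwise similarity value expresses, the following minimal Python sketch computes the percentage of identical positions between two aligned sequences. It is not the ae2 editor's algorithm, whose details are not given here; the gap handling (columns with a gap in either sequence are skipped) and the toy sequences are assumptions made for the example only.

```python
def pairwise_similarity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are excluded from the comparison
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical toy fragments, not real 16S rRNA gene data.
toy_a = "ACGTTGCA-GGCTA"
toy_b = "ACGTTGCATGGATA"
print(f"{pairwise_similarity(toy_a, toy_b):.1f}% similarity")
```

For real data, the two strings would be rows taken from the finished multiple alignment, so that equal indices refer to homologous positions.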

Analytical methods

High-performance anion-exchange chromatography with pulsed electrochemical detection (HPAEC/PED), amino-acid analysis and SDS/PAGE were performed according to published procedures [15]. N-Terminal sequencing of glycopeptides followed standard protocols [8].

Preparation of truncated S-layer glycopeptides

The isolation and purification of both S-layer glycoproteins and S-layer glycopeptides essentially followed published methods [14]. Briefly, the glycopeptide mixture obtained after proteolytic degradation of the S-layer material of T. thermosaccharolyticum D120-70 with Pronase E (Sigma), which was eluted in the void volume of a BioGel P-4 (BioRad) column (1.5 × 120 cm), was subjected to cation-exchange chromatography on Dowex 50 W-X8 (H+ form) to remove coeluted peptide fragments. After further separation of the glycopeptides by BioGel P-30 (BioRad) chromatography (column dimensions: 1.5 × 120 cm), the final purification step of the appropriate pool was performed by chromatofocusing using the Polybuffer™ system (Amersham Pharmacia Biotech). The material (225 mg) was applied to a column (1 cm × 50 cm) of PBE 94™ exchanger gel, equilibrated with 25 mM Tris/HCl, pH 9.0, and eluted with 9 bed vol. Polybuffer 74™ (1 : 10 dilution), pH 4.5, developing a pH gradient between 9.0 and 5.8. Fractions of volume 2 mL were collected and the elution profile was recorded on-line by UV at 280 nm as well as by colorimetric determination of neutral sugars by the orcinol assay [20]. After desalting of the individual pools (CFI, CFII, CFIII) on a BioGel P-2 (BioRad) column (1.0 cm × 120 cm), their homogeneity was checked by analytical RP(C18)-HPLC as described elsewhere [21]. Subsequently, pool CFI was chromatographed on a semipreparative RP(18)-HPLC column. The appropriate material was collected and either used directly for NMR studies or subjected to partial hydrolysis with trifluoroacetic acid for truncation of the polymeric S-layer glycan chains. In a typical truncation experiment, 1.5 mg of the purified S-layer glycopeptide pool CFI of T. thermosaccharolyticum D120-70 was treated with 200 µL preheated 25% trifluoroacetic acid at 80 °C for 7 min. The reaction was stopped by immediately cooling the reaction mixture on ice, diluting it fivefold with distilled water and drying the sample under an atmosphere of nitrogen. The acid was completely removed by three washes with distilled water. The truncated forms of S-layer glycopeptides from 10 reactions were combined and applied to the RP(18)-HPLC column referred to above. The major hydrolytic fraction was isolated and analyzed by NMR.

Periodate oxidation and Smith-type hydrolysis

NMR data were supported by analysis of chromatofocused, periodate-oxidized and Smith-type-hydrolyzed glycopeptides. The protocol followed a three-step procedure which is described in detail elsewhere [14]. Briefly, it includes oxidation of glycopeptide material (pool CFII), solubilized in 0.1 M sodium acetate, pH 4.5, with 0.1 M sodium meta-periodate, reduction with sodium borohydride and hydrolysis with 0.5 M trifluoroacetic acid. The reaction products were finally recovered after chromatography on a BioGel P-2 column (1.0 cm × 120 cm) with 0.1 mM NaCl as eluent.

Separation of glycoforms

The chromatofocused glycopeptide pools CFI, CFII, and CFIII of T. thermosaccharolyticum D120-70 were analyzed with regard to their content of different glycoforms, i.e. identical peptide portions exhibiting differences in glycan chain length, by native polysaccharide PAGE [22,23]. Glycopeptide material equivalent to 20 µg carbohydrate was dissolved in a mixture containing 10 µL Tris/borate/EDTA buffer (89 mM Tris, 89 mM boric acid, 2 mM EDTA), pH 8.3, and 1 µL 2 M sucrose in Tris/borate/EDTA buffer (sample buffer) and loaded on a polyacrylamide/Tris/borate/EDTA gel consisting of a 5% stacking and a 12.5–20% resolving gel. Separation was performed at 4 °C, after pre-electrophoresis at 200 V for 60 min, for a duration of 4 kVh at a constant voltage of 250 V using the continuous, non-denaturing McLellan buffer system, pH 8.7 (50 mM Tris, 25 mM boric acid) [24]. Bands corresponding to individual glycoforms were visualized after fixation with Alcian blue and oxidation with periodate in a silver-staining reaction [25–27]. For preparative isolation of glycoforms, 10 mg of pool CFIII, solubilized in 500 µL sample buffer, was applied to continuous native polysaccharide PAGE and eluted from the gel for 20 min at a constant current of 250 mA with McLellan buffer, pH 8.7, using the Whole Gel Eluter™ (BioRad) gel elution apparatus. Individual fractions were collected, lyophilized and desalted by gel-permeation chromatography using a Sephadex G-10 (Amersham Pharmacia Biotech) column (1.0 cm × 45 cm) connected to an FPLC system at a flow rate of 2 mL·min−1 (Amersham Pharmacia Biotech). The elution profile was recorded simultaneously with a refractive index detector and a UV detector at 220 nm. Fractions of interest were pooled, lyophilized and stored at −20 °C until use.

Matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) MS

The different glycopeptide pools of T. thermosaccharolyticum D120-70 (CFI, CFII, CFIII) and selected glycoforms from pool CFIII obtained after gel elution were further characterized by MALDI-TOF MS. For mass determination, aliquots of diluted samples were applied to a sample plate and air-dried. Matrix solution (2% 2,5-dihydroxybenzoic acid in water containing 30% acetonitrile) was then added and the samples were immediately dried under mild vacuum. Mass spectra were acquired on a Dynamo mass spectrometer (Thermo BioAnalysis, Santa Fe, New Mexico). The instrument was operated in the positive ion mode with a dynamic extraction setting of 0.3 and calibrated with a partial dextran hydrolysate [28].

NMR spectroscopy

All NMR spectra were recorded on a Bruker Avance DRX 600 NMR spectrometer using a 5-mm inverse triple resonance (1H, 13C, broad-band) probe with triple axis gradient coils at frequencies for 1H at 600.13 MHz and for 13C at 150.90 MHz. All samples (4.3 mg intact glycopeptide pool CFI; 1 mg of the partially hydrolyzed fraction of glycopeptide pool CFI; 6.7 mg of Smith degradation product from glycopeptide pool CFII) were dissolved in 0.6 mL D2O and measured at a temperature of 300 K, the chemical shifts and coupling constants for which are given in Table 3. In addition, some data were recorded at 280 K to use small chemical-shift deviations for analyzing heavily overlapping signals.

Table 3. 1H and 13C NMR chemical shift data (in p.p.m.).

| Sugar residue | 1 | 2 | 3 | 4 | 5 | 6 | 6′ |
|---|---|---|---|---|---|---|---|
| A b α-d-Galp-(1→ | 5.120 | 3.840 | 3.966 | 4.013 | 4.248 | 3.733 | |
| | (171.0) c | | | | | | |
| A′ α-d-Galp-(1→ | 5.082 | 3.840 | 3.942 | 4.022 | 4.265 | 3.733 | |
| B →4)-α-l-Rhap-(1→ | 5.124 | 4.086 | 4.097 | 3.618 | 4.111 | 1.378 | |
| 2 | 94.48 | 77.15 | 69.19 | 81.29 | 69.19 | 17.56 | |
| B′ α-l-Rhap-(1→ | 5.124 | 4.080 | 3.958 | 3.538 | 3.973 | 1.284 | |
| 2 | 94.48 | 77.04 | 69.68 | 72.71 | 69.20 | 17.20 | |
| C →4)-α-l-Rhap-(1→ | 5.063 | 4.042 | 3.932 | 3.694 | 4.085 | 1.319 | |
| C′′ →4)-α-l-Rhap-(1→ | 5.166 | 4.083 | 3.958 | 3.708 | 4.102 | 1.3350 | |
| D →3)-α-d-Glcp-(1→ | 5.062 | 3.665 | 3.749 | 3.481 | 4.065 | 3.823 | |
| D′′ →3)-β-d-Glcp-(1→ | 5.134 | 3.685 | 3.704 | 3.571 | 3.639 | 3.926 | |
| E →3)-β-d-Manp-(1→ | 4.896 | 4.304 | 3.733 | 3.730 | 3.566 | 4.227 | |
| 6 | 101.16 | 67.53 | 77.74 | 65.64 | 76.02 | | |
| F β-d-Glcp-(1→ | 4.540 | 3.315 | 3.501 | 3.383 | 3.457 | 3.926 | |
| Tyrosine | α | β | β′ | γ | δ | ε | |
| G O)-l-Tyr | 3.977 | 3.247 | 3.100 | | 7.284 | 7.125 | |
| | 56.63 | 36.21 | | 130.42 | 131.46 | 117.78 | 156.74 |

| Sugar residue | 1 | 2 | 3 | 4 | 5 | 6 | |
|---|---|---|---|---|---|---|---|
| d′′ β-d-Glcp-(1→ | 5.123 | 3.564 | 3.603 | 3.493 | 3.626 | 3.921 | |
| | (7.9) d | | | | | | |
| | 100.77 | 73.67 | 76.28 | 70.14 | 76.74 | | |
| Tyrosine | α | β | β′ | γ | δ | ε | |
| | 56.69 | 36.30 | | 130.36 | 131.59 | 117.80 | 156.46 |

a The chemical shift data in the first row are for 1H relative to external acetone (δ = 2.225 p.p.m.), and in the second row for 13C relative to external dioxane (δ = 67.40 p.p.m.); all data were acquired at 300 K.
b For the arrangement of the sugar residues see Fig. 7, where A′ and B′ are located at the terminal nonreducing site, and C′′ and D′′ at the terminal reducing site of the glycan; d′′ and g are for the β-d-Glcp-(1→O)-l-Tyr fragment after partial hydrolysis of the glycan.
c Numbers in parentheses are 1JH,C coupling constants in Hz.
d Numbers in parentheses are JH,H coupling constants in Hz.

The parameters for the experiments normally used for the assignment in carbohydrate NMR (gradient-selected DQF-COSY, TOCSY, NOESY, gradient-selected HSQC and HMBC) are given elsewhere [29]. In order to complete the assignment, HSQC-TOCSY (12.5 kHz spin-lock field) and HSQC-NOESY (300 ms mixing time) [30] spectra were recorded using pulsed-field gradient selection and the echo/anti-echo time proportional phase incrementation procedure for pure absorption mode data. In addition, these experiments were also performed as 13C band-selective variants by applying 1-ms 13C inversion band-selective, uniform response, pure phase (I-BURP) [31] pulses on the rhamnose C6 methyl signals or on the anomeric carbons.

For the diffusion-edited experiments, performed on the major hydrolysis fraction originating from the partially hydrolyzed glycopeptide pool CFI, a longitudinal eddy current delay sequence with 1-ms sinusoidal bipolar gradient pulse pairs [32] was used, with the diffusion delay set to 150 ms. Two spectra were recorded, one with 5% and the second with 50% gradient amplitude. The resulting two 1H-NMR spectra were subtracted by scaling the signal intensity of the more slowly diffusing molecules to equal height, obtaining the resonances for the faster moving parts as difference spectrum.

All processing was performed off-line on Silicon Graphics workstations using the Bruker software XWIN-NMR 2.6. The two-dimensional data were zero filled, doubling the data points in the direct dimension, and in the indirect dimension data points were extended two or four times by forward linear prediction using 64 coefficients. In both dimensions, the data were multiplied with a 90°-shifted squared sine window function and the spectra were phase-corrected to absorption mode, except the HMBC spectra, which were calculated in magnitude mode. For the analyses the program azara 2.0 (provided by W. Boucher and the Department of Biochemistry, University of Cambridge, UK), and for spin system simulations the program xsim 970501 (provided by K. Marat, University of Manitoba, Winnipeg, MB, Canada) were also used on SGI workstations.
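The apodization and zero-filling steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the XWIN-NMR implementation: the window is taken to be sin²(π/2 + (π/2)·k/N), one common definition of a squared sine bell shifted by 90°, the linear-prediction step is omitted, and the function names and the toy signal are hypothetical.

```python
import cmath
import math


def shifted_sine_squared(n_points):
    """Squared sine bell shifted by 90 degrees: 1 at the first point, falling towards 0."""
    return [
        math.sin(math.pi / 2 + (math.pi / 2) * k / n_points) ** 2
        for k in range(n_points)
    ]


def apodize_and_zero_fill(fid):
    """Multiply a time-domain signal by the window, then double its length with zeros."""
    window = shifted_sine_squared(len(fid))
    weighted = [w * s for w, s in zip(window, fid)]
    return weighted + [0j] * len(fid)


# Hypothetical toy signal: one decaying complex exponential, 128 points.
toy_fid = [cmath.exp(2j * math.pi * 0.1 * k) * math.exp(-k / 64) for k in range(128)]
processed = apodize_and_zero_fill(toy_fid)
print(len(toy_fid), "->", len(processed), "points")
```

Because the window starts at 1 and decays smoothly, it suppresses truncation artifacts at the end of the signal, and the appended zeros only interpolate the spectrum after Fourier transformation.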


Results

Morphological characterization and 16S rRNA gene sequence comparison

Light-microscopic examination of sporulating cells of T. thermosaccharolyticum D120-70 and E207-71 showed the typical appearance of drumstick-like cells with terminal spores, indicative of organisms belonging to the large group of clostridia [33] (not shown). To distinguish unambiguously the bacteria from other thermophilic anaerobes and to assess their taxonomic affiliations, a partial 16S rRNA gene sequence comparison was performed. The partial 16S rRNA gene sequences determined for T. thermosaccharolyticum strains D120-70 and E207-71, comprising 500 nucleotide positions at the 5′ end of the molecules, were found to be identical. An almost complete 16S rRNA gene sequence was determined for strain D120-70, comprising 1452 nucleotides (> 95% of the Escherichia coli sequence [34]). The 16S rRNA gene sequence similarity values of the sequence of T. thermosaccharolyticum strain D120-70 to other thermophilic anaerobe species of the low G + C Gram-positive phylum are in the range 93.0% (Caldicellulosiruptor saccharolyticus) to 99.0% (Thermoanaerobacterium thermosaccharolyticum). These values indicate that strain D120-70 is a member of the genus Thermoanaerobacterium and shows a high degree of relatedness to Thermoanaerobacterium thermosaccharolyticum. It could be considered that strain D120-70 represents a new species of the genus Thermoanaerobacterium given that the 16S rRNA gene sequence similarity between Thermoanaerobacterium thermosulfurigenes and Thermoanaerobacterium xylanolyticum is 99.3%. Further characterization and DNA–DNA homology studies are required to determine the species status of strain D120-70. Until such data are available, it is assigned as a strain of Thermoanaerobacterium thermosaccharolyticum.

Electron microscopy

The complete coverage of the cell surface of T. thermosaccharolyticum D120-70 with a squarely arranged S-layer lattice has previously been demonstrated [2,13]. The S-layer consists of identical glycoprotein monomers with an apparent molecular mass in the range 80 000–170 000 as determined by SDS/PAGE (not shown), and its morphological units possess a centre-to-centre spacing of ≈ 11 nm.

Characterization of S-layer glycoforms

In the present study, we characterized the complete structure of the S-layer glycoprotein glycan of T. thermosaccharolyticum D120-70 including its glycosidic linkage to the S-layer polypeptide. After thorough proteolytic degradation of the S-layer glycoprotein, which has a total carbohydrate content of ≈ 7%, with Pronase E and purification of the reaction mixture by several gel-permeation steps, including chromatography on BioGel P-4 and P-30 columns, and cation-exchange chromatography followed by chromatofocusing in the pH range 9.0–5.8, three glycopeptide pools, designated CFI, CFII and CFIII, were obtained (Fig. 1). Final purification of these pools was achieved by RP(18)-HPLC. Carbohydrate analysis by HPAEC/PED revealed in each pool mannose, rhamnose, galactose and glucose in the approximate molar proportions 1 : 2 : 1 : 2 as constituents of the S-layer glycan chain. The variability of the peptide part of these glycopeptides, which was also reflected by differences in the isoelectric points, as estimated from the elution interval on the PBE 94™ exchanger column, was verified by amino-acid analysis. Pool CFI contained exclusively the linkage amino acid tyrosine, whereas in pools CFII and CFIII, threonine, serine, alanine, proline, and aspartic acid were found in addition to tyrosine. Whereas CFI and CFIII contained a single peptide portion each, CFII consisted of two glycopeptide species which could be completely separated by RP(18)-HPLC. Sequence analysis indicated the presence of the glycosylated amino acid tyrosine at the N-terminus of each peptide. From sequencing data, it can be concluded that there are at least three different glycosylation sites on the intact S-layer polypeptide of T. thermosaccharolyticum D120-70 (Table 1).

Figure 1.

Chromatofocusing of the glycopeptide mixture after digestion of the S-layer glycoprotein of T. thermosaccharolyticum D120-70 with Pronase E. The bars indicate pooled fractions.

Table 1. Characterization of the glycopeptide pools (CFI, CFII, CFIII) derived after chromatofocusing of the glycopeptide mixture from the S-layer glycoprotein of T. thermosaccharolyticum D120-70. The occurrence is calculated as molar ratio of individual glycopeptide species, with an average S-layer glycan chain length of seven repeating units in pools CFI and CFIII, and five in pool CFII, respectively.
Pool | pI interval | Sequence | Occurrence

According to the results of HPAEC/PED analyses together with amino-acid data, the glycan chains of the chromatofocused glycopeptides of pools CFI and CFIII have an average length of seven repeating units; the degree of polymerization of the repeats from the two glycopeptide species of pool CFII was determined to be five. More accurate information on chain-length variability was obtained by MALDI-TOF MS analysis. The mass distribution of the pools CFI, CFII and CFIII was in accordance with the pattern obtained by native polysaccharide PAGE (Fig. 2), indicating the existence of individual glycoforms in all glycopeptide pools. Because of its well-resolved band pattern on the gel, pool CFIII was chosen for detailed analysis. MALDI-TOF MS analysis of CFIII showed the linkage of three to 10 hexasaccharide repeats to the tyrosine–aspartic acid peptide portion, with the most abundant molecular ions corresponding to five repeating units (Fig. 3). These data were confirmed by MALDI-TOF mass spectra of the individual glycoforms of CFIII, obtained by preparative gel elution after separation of CFIII on the polyacrylamide gel (not shown). In the other chromatofocusing pools, less chain-length variability was observed. The MALDI-TOF mass spectrum of pool CFI showed five glycoforms, and the spectra of the two glycopeptide species from pool CFII only three. All masses were assigned to multiples of the hexasaccharide repeating unit plus the peptide portion with mass deviations lower than 0.05% (Table 2). This assignment is unambiguous evidence that the S-layer glycoprotein from T. thermosaccharolyticum D120-70 contains no core structure connecting the O-antigen-like S-layer glycan chain with the S-layer polypeptide.

Figure 2.

Native polysaccharide PAGE of the three glycopeptide pools (CFI, CFII, CFIII) from the S-layer glycoprotein of T. thermosaccharolyticum D120-70 on combined Alcian blue and silver staining.

Figure 3.

Positive-ion MALDI-TOF mass spectrum of glycopeptide pool CFIII from T. thermosaccharolyticum D120-70. Values in parentheses denote the number of hexasaccharide repeats of the different glycoforms.

Table 2. MALDI-TOF MS data of the glycopeptide pools CFI, CFII and CFIII from T. thermosaccharolyticum D120-70 indicating the presence of different glycoforms. Values correspond to [M + Na]+ molecular ions. Masses are calculated on the basis of 959.2 for the hexasaccharide repeating unit of the S-layer glycan chain.

| Pool (peptide sequence) | Number of repeats | | | Mass deviation (%) |
|---|---|---|---|---|
| CFI (Tyr) | 5 | 4909.2 | 4907.2 | 0.020 |
| CFII (Tyr-Thr) | 4 | 4069.1 | 4068.0 | 0.011 |
| CFII (Tyr-Ser-Pro-Ala) | 4 | 4223.2 | 4221.4 | 0.019 |
| CFIII (Tyr-Asx) | 3 | 3141.9 | 3141.8 | 0.001 |
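The statement that all mass deviations lie below 0.05% can be checked against the two mass values listed for each glycoform in Table 2. The sketch below is only an illustration: it uses a plain relative difference, referred to the first listed mass, which is an assumption. How the deviation column of the table was computed is not stated, so the sketch does not try to reproduce that column; the names `mass_1` and `mass_2` are placeholders for the two mass columns.

```python
# (pool, peptide, number of repeats, mass_1, mass_2) as listed in Table 2
TABLE_2_ROWS = [
    ("CFI", "Tyr", 5, 4909.2, 4907.2),
    ("CFII", "Tyr-Thr", 4, 4069.1, 4068.0),
    ("CFII", "Tyr-Ser-Pro-Ala", 4, 4223.2, 4221.4),
    ("CFIII", "Tyr-Asx", 3, 3141.9, 3141.8),
]


def relative_difference_percent(mass_1, mass_2):
    """Absolute difference of two masses as a percentage of the first one."""
    return 100.0 * abs(mass_1 - mass_2) / mass_1


for pool, peptide, repeats, mass_1, mass_2 in TABLE_2_ROWS:
    diff = relative_difference_percent(mass_1, mass_2)
    status = "below" if diff < 0.05 else "not below"
    print(f"{pool} ({peptide}), {repeats} repeats: {diff:.3f}% ({status} 0.05%)")
```

With this definition, every listed glycoform stays under the 0.05% bound quoted in the text.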

NMR studies

The S-layer glycopeptide pool CFI from T. thermosaccharolyticum D120-70 was analyzed by applying a series of 1D and 2D NMR techniques, which confirmed that the structure of the repeating unit consists of six sugars in the arrangement described [14]. The published data were completed by gradient-selected proton-detected 1H–13C correlation experiments, showing all the linkage information from the appropriate cross-peaks in the HMBC spectra. They confirmed the anomeric configuration through one-bond 1H–13C coupling constants derived from HSQC spectra without 13C decoupling during the acquisition, and gave the complete set of 13C chemical-shift data (Table 3). In addition, a Smith-degradation experiment yielded the same trisaccharide-1-deoxyerythritol glycoside with identical 1H-NMR and 13C-NMR data as previously published [14]. The low-intensity signals for the terminal α-l-Rhap and α-d-Galp units at the non-reducing end of the glycan were elucidated from combined heteronuclear experiments such as HSQC-TOCSY and HSQC-NOESY, also in a 13C band-selective manner. Quantification of the 1H-NMR spectra by comparing the integral values of the anomeric protons relative to the aromatic protons of tyrosine gave a ratio of 7 : 1, which is in agreement with the average value of seven repeats for the five glycoforms of glycopeptide pool CFI from T. thermosaccharolyticum D120-70, found by MALDI-TOF MS analysis (see Table 2, pool CFI).

Structure elucidation of the glycosidic carbohydrate–protein linkage region was started from a 1H-13C HMBC cross-peak over three bonds from the phenolic OH carbon of the amino acid to an anomeric proton of the directly linked sugar. Further analysis was restricted because of overlapping signals of four anomeric protons and two anomeric carbons at the chemical-shift regions for this residue which followed tyrosine (Fig. 4). Comparison of the results with data for tyrosine-bound glycans from related species, e.g. from the S-layer glycoprotein glycan of Tb. (formerly Clostridium) thermohydrosulfuricus S102-70 [6], indicated that the linkage sugar could be a glucose residue. However, the two other glucoses present in the repeating unit prevented unambiguous assignment for this residue in the intact glycan. Moreover, they obscured any information on linkage from this glucose residue to the following sugar, suggesting that the question of whether this or other sugars belong to a putative core region cannot be answered.

Figure 4.

1H–13C HSQC NMR spectrum of the intact glycopeptide pool CFI, showing carbohydrate core signals and the anomeric region (inset).

Therefore, a truncated form of the glycopeptide of pool CFI was prepared by treatment with 25% trifluoroacetic acid. The major hydrolytic fraction showed, in comparison with the intact glycopeptide material, a significant shift in retention time on the RP(18)-HPLC column, indicating truncation of the carbohydrate moiety (Fig. 5). The 1H-NMR spectrum showed the aromatic protons as well as the β-CH2 protons for the amino acid and at least one repeating unit (Fig. 6, upper trace), but the continuing analysis led to the conclusion that this hydrolysis pool contained more components.

Figure 5.

RP(C18)-HPLC profile of glycopeptide pool CFI from T. thermosaccharolyticum D120-70 before and after truncation of the glycan chain. Line (a) represents the intact material, line (b) the fragments obtained after treatment with 25% trifluoroacetic acid. The arrow indicates the shift in retention time of the major hydrolysis fraction, which was further investigated by NMR. The material was eluted from the RP-HPLC column (Supersphere 100; 4 µm; 8 × 125 mm; Merck) with a water/acetonitrile gradient, containing 0.1% trifluoroacetic acid, at a flow rate of 3 mL·min−1.

Figure 6.

NMR spectra of the main fraction from the partially hydrolyzed glycopeptide pool CFI. Upper trace: 1H-NMR spectrum. Lower trace: LED-DIFF spectrum showing the β-d-Glcp-(1→O)-l-Tyr fragment.

To decide, in a fast NMR experiment, whether different fragments of varying size were present, a diffusion-edited NMR approach was applied. Most of the published applications in this field, such as diffusion-ordered spectroscopy (DOSY) [35] and diffusion-encoded spectroscopy (DECODES) [36], improved DECODES and heteronuclear diffusion-encoded spectroscopy (HETDECODES) [37], depend crucially on the determination of the diffusion coefficients. As only the identification and assignment of carbohydrate NMR signals and not the numeric values of the diffusion coefficients were of concern, only two fast 1H-NMR spectra using a bipolar gradient pulse pair-longitudinal eddy current delay sequence (for details see Materials and methods) were acquired. On the one hand, this approach saved experiment time and, on the other, it circumvented the problem of accurate calibration of the pulsed field gradient strength. In the two experiments, the gradient amplitude of 5% in the first was changed to 50% in the second. The first data set served as a reference spectrum, showing only minor diffusion effects, from which the second one, which showed significant changes in the signal intensities, was subtracted. By adjusting the vertical scaling of the resonance lines that showed relatively weak intensity reduction, and therefore belonged to more slowly diffusing molecules, to equal height, the resulting difference spectrum showed signals for the faster moving molecules, which are components of lower molecular mass and of the solvent. As an additional advantage of this longitudinal eddy current delay difference (LED-DIFF) procedure, the receiver gain did not need to be kept the same for every experiment, as is required in the above-mentioned diffusion-ordered experiments. It can be set to an optimized value for each spectrum, therefore giving better signal-to-noise ratios, as the overall signal intensity will decrease when the gradient amplitudes are increased.
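The scale-and-subtract logic of the LED-DIFF procedure can be illustrated with a small numerical sketch. It assumes that signal attenuation follows the simple Stejskal-Tanner expression exp(−D(γgδ)²(Δ − δ/3)), which is not stated in the text, and it ignores the corrections for sinusoidal and bipolar gradient pulses. The maximum gradient strength `G_MAX`, the effective pulse length of 2 ms, and both diffusion coefficients are hypothetical values chosen for the example; only the diffusion delay (150 ms) and the 5% and 50% amplitudes come from the description above.

```python
import math

GAMMA_1H = 2.675e8  # 1H gyromagnetic ratio in rad s^-1 T^-1


def attenuation(diff_coeff, gradient, delta=2.0e-3, big_delta=0.150):
    """Assumed Stejskal-Tanner factor exp(-D * (gamma*g*delta)^2 * (Delta - delta/3))."""
    q = GAMMA_1H * gradient * delta
    return math.exp(-diff_coeff * q * q * (big_delta - delta / 3.0))


def led_diff(species, g_low, g_high, reference):
    """Low-gradient intensities minus high-gradient intensities, the latter scaled
    so that the reference (slowly diffusing) signal has equal height in both."""
    low = {name: i0 * attenuation(d, g_low) for name, (i0, d) in species.items()}
    high = {name: i0 * attenuation(d, g_high) for name, (i0, d) in species.items()}
    scale = low[reference] / high[reference]
    return {name: low[name] - scale * high[name] for name in species}


G_MAX = 0.5  # T/m, hypothetical maximum gradient strength of the probe
species = {
    # name: (unattenuated intensity, diffusion coefficient in m^2/s), both hypothetical
    "large fragment": (1.0, 1.5e-10),
    "small fragment": (1.0, 4.0e-10),
}
difference = led_diff(species, 0.05 * G_MAX, 0.50 * G_MAX, "large fragment")
for name, value in difference.items():
    print(f"{name}: residual intensity {value:.3f}")
```

In this toy case the signal of the slowly diffusing component cancels, while the faster one survives with positive intensity, which is the behaviour exploited to isolate the small fragment. Because only intensity ratios enter the scaling step, no absolute gradient calibration is needed for the subtraction itself.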

When this LED-DIFF experiment was applied to the major fraction of the partial hydrolysate of glycopeptide pool CFI (Fig. 5), the resulting difference spectrum showed only signals for a small molecule consisting of a tyrosine and a sugar residue, the structure of which was elucidated to be β-d-Glcp-(1→O)-l-Tyr (Fig. 6, lower trace). All 1H-NMR parameters, derived from the LED-DIFF experiment, and the 13C chemical shifts, derived from an HSQC spectrum of the hydrolyzed glycan, of this fragment (given in Table 3) match the published NMR data [38,39].

By comparing the different 2D spectra of the intact glycan with those of the degraded glycan chain of pool CFI, the appropriate signals of the β-d-Glcp-(1→O)-l-Tyr portion of the non-hydrolyzed glycan, which are essential for obtaining information on the linkage to the following sugar residue, could be identified. From an HMBC cross-peak over three bonds from carbon C3 of the β-glucose to an anomeric proton, the adjacent monosaccharide was assigned as an α-l-Rhap residue, 1→4 linked to a β-Manp residue. This sugar sequence was also confirmed from an appropriate HMBC cross-peak. Therefore, these monosaccharides are already constituents of the first repeating unit.

Based on the results from 1D and 2D NMR analyses, in combination with the MALDI-TOF MS evidence, the linkage region of the S-layer glycan from T. thermosaccharolyticum D120-70 was established as a β-d-glucose residue, O-glycosidically linked to tyrosine. This β-d-glucose residue is a constituent of the first repeating unit of the S-layer glycan chain, but its anomeric configuration differs from that of the corresponding α-glucose residues in all other repeating units of the backbone structure (Fig. 7).

Figure 7. Schematic representation of the complete S-layer glycoprotein glycan of T. thermosaccharolyticum D120-70. The degree of polymerization of repeating units (structure in square brackets) varies between three and 10.


In this paper, the complete glycan structure of the S-layer glycoprotein glycan from T. thermosaccharolyticum D120-70 is described, focusing on its glycosidic linkage to the S-layer polypeptide (Fig. 7). In addition, glycosylation sequences were identified by N-terminal sequencing of the glycopeptides obtained on proteolytic degradation of the S-layer glycoprotein. From these data, we propose a model for the assembly of the S-layer glycan chains of T. thermosaccharolyticum D120-70.

The combination of different separation techniques was crucial in the identification of all S-layer glycopeptide species from T. thermosaccharolyticum D120-70. From the proteolytic digest of the S-layer glycoprotein, coeluting cell wall-associated components were removed by gel-permeation chromatography, on the basis of differences in molecular size. Remaining contaminating peptide fragments were bound to a cation-exchanger resin and thus could be removed from the glycosylated fraction, which was recovered in the flow-through of the column. Further splitting of the glycopeptide pool was achieved by chromatofocusing, on the basis of differences in the isoelectric points of the individual glycopeptide species. The results clearly show that not only the peptide moiety but, to a lesser extent, also the uncharged carbohydrate portion of the T. thermosaccharolyticum D120-70 S-layer glycan influences the elution behavior of the glycopeptides. Finally, RP(18)-HPLC is a useful means of separating S-layer glycopeptides that differ only in a single amino-acid constituent, but otherwise possess identical glycan compositions. Sequence analysis of the individual S-layer glycopeptide species from T. thermosaccharolyticum D120-70 revealed that one intact S-layer glycoprotein monomer carries at least three different glycosylation sites. This result is in accordance with data derived from other S-layer glycoproteins, which have between two and six glycosylation sites (reviewed in [11]). The sequences Tyr* (the asterisk designates the glycosylated tyrosine residue), Tyr*-Thr and Tyr*-Ser-Pro-Ala have a calculated pI of 5.52; the pI of the Tyr*-Asp dipeptide is 4.30 (pool CFIII, Table 2), which explains its longer retention time on the PBE™ exchanger column. From the comparison with other S-layer proteins containing tyrosine-linked glycan chains, e.g. Tb. thermohydrosulfuricus L111-69, which has the glycosylation sequences Tyr*, Tyr*-Pro, Tyr*-Pro-Val, Gln-Tyr* [8], and T. thermosaccharolyticum E207-71, with the glycosylation sequences Tyr*, Tyr*-Asn-Pro, Tyr*-Asp-Gly-Asn-Ser [15], it becomes evident that all isolated glycopeptides have a pI value of ≈ 5.5. It remains to be investigated whether this finding has any implications for the function as a potential glycosylation site. To define a consensus sequence for a glycosidic linkage via the hydroxyamino acid tyrosine, as is the case for eukaryotic Asn-linked N-glycans exhibiting the sequon Asn-X-Ser/Thr [40], and to predict secondary-structure requirements, further glycosylation sequences will have to be analyzed.
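The pI comparison above can be followed with a minimal sketch that finds the pH of zero net charge by bisection. The pKa values in it are illustrative assumptions, not the constants used for the values quoted in the text, so the numbers it produces are not expected to reproduce the published figures exactly. The sketch also assumes that the glycosylated tyrosine side chain carries no ionizable group, because its phenolic oxygen is engaged in the O-glycosidic bond.

```python
# Illustrative pKa values (assumptions of this sketch, not the authors' table).
N_TERMINUS_PKA = 7.5
C_TERMINUS_PKA = 3.55
ASP_SIDE_CHAIN_PKA = 4.05

def net_charge(ph, basic_pkas, acidic_pkas):
    """Henderson-Hasselbalch net charge of a peptide at a given pH."""
    positive = sum(1.0 / (1.0 + 10.0 ** (ph - pk)) for pk in basic_pkas)
    negative = sum(1.0 / (1.0 + 10.0 ** (pk - ph)) for pk in acidic_pkas)
    return positive - negative

def isoelectric_point(basic_pkas, acidic_pkas, tolerance=1e-4):
    """Bisection on pH 0 to 14; net charge falls monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(mid, basic_pkas, acidic_pkas) > 0.0:
            low = mid      # still positively charged: pI lies above mid
        else:
            high = mid
    return (low + high) / 2.0

# Tyr*, Tyr*-Thr and Tyr*-Ser-Pro-Ala: only the two termini ionize here.
pi_neutral = isoelectric_point([N_TERMINUS_PKA], [C_TERMINUS_PKA])

# Tyr*-Asp: one additional acidic side chain.
pi_tyr_asp = isoelectric_point([N_TERMINUS_PKA],
                               [C_TERMINUS_PKA, ASP_SIDE_CHAIN_PKA])

print(f"peptides without charged side chains: pI = {pi_neutral:.2f}")
print(f"Tyr*-Asp:                             pI = {pi_tyr_asp:.2f}")
```

With any such pKa table, peptides whose only ionizable groups are the termini share one pI regardless of length, whereas the additional carboxyl group of Asp shifts the pI to a more acidic value, which is the trend invoked to explain the chromatofocusing behavior.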

With T. thermosaccharolyticum D120-70, the chain-length variability of an S-layer glycoprotein glycan was fully evaluated for the first time. For the identification of individual glycoforms, i.e. identical peptide portions exhibiting differences in the degree of polymerization of the hexasaccharide repeats constituting the S-layer glycan chain, native polysaccharide PAGE was used as the separation method. The electrophoretic pattern (Fig. 2) reflects glycoforms that differ in chain length by single hexasaccharide units. The peptide portion has a subtle, but not negligible, influence on the separation on the gel.
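The mass spacing expected for glycoforms that differ by single hexasaccharide units can be estimated with a short calculation. The sketch below is only an illustration under stated assumptions: it uses average residue masses computed from standard atomic weights, takes a single tyrosine as the peptide portion, and ignores ionization adducts and any other modification, so it does not reproduce measured MALDI-TOF values.

```python
# Average residue masses in Da, from standard atomic weights.
HEXOSE_RESIDUE = 162.141       # C6H10O5: Glc, Gal, Man
DEOXYHEXOSE_RESIDUE = 146.142  # C6H10O4: Rha
TYROSINE = 181.191             # C9H11NO3, free amino acid

# Repeating unit: Gal, Glc, Man, Glc (hexoses) plus two Rha (deoxyhexoses).
REPEAT_COMPOSITION = {"hexose": 4, "deoxyhexose": 2}

def repeat_mass(composition=REPEAT_COMPOSITION):
    """Average mass of one hexasaccharide repeating unit."""
    return (composition["hexose"] * HEXOSE_RESIDUE
            + composition["deoxyhexose"] * DEOXYHEXOSE_RESIDUE)

def glycopeptide_mass(n_repeats, peptide_mass=TYROSINE):
    """Neutral mass of n repeats O-linked to the peptide portion.

    The water lost on forming the sugar-tyrosine bond cancels the water
    that completes the chain ends, so the residue masses simply add to
    the mass of the free peptide.
    """
    return n_repeats * repeat_mass() + peptide_mass

# Degrees of polymerization three to 10, as reported for this glycan.
ladder = {n: glycopeptide_mass(n) for n in range(3, 11)}
for n, mass in ladder.items():
    print(f"n = {n:2d}  neutral average mass = {mass:9.2f} Da")
```

In this simplified picture, a core-less glycan gives a ladder in which every mass is a whole multiple of the repeat mass plus the peptide mass, with no constant offset attributable to additional core sugars.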

The detection of discrete bands on the gel, corresponding to glycopeptide species differing in single hexasaccharide repeats, is in accordance with the proposed biosynthetic concept for S-layer glycoprotein glycans [11,12]. This includes the stepwise assembly of the S-layer glycan chain by the addition of single repeating units on a nucleoside diphosphate carrier in the cytoplasm. Although not yet demonstrated experimentally, the separate independent biosynthesis of a putative core region is assumed. The activated biosynthetic intermediate should be subsequently transferred to a lipid carrier and transported across the cytoplasmic membrane. On its outside, the transfer of the S-layer glycan chains to certain tyrosine residues on the S-layer polypeptide should occur to complete the S-layer glycoprotein (compare with [41]). We do not yet know whether the formation and glycosidation of the core region takes place on the S-layer glycan chain or on the S-layer polypeptide. Interestingly, no interlinking core region was observed in the S-layer glycoprotein of T. thermosaccharolyticum D120-70. Instead, the anomeric configurations of the glucose residues at the reducing end of every repeating unit and that of the glucose involved in the O-glycosidic linkage are different. Whereas all glucoses in the backbone of the glycan chains are in the α-configuration, the tyrosine-linked glucose residue is in the β-configuration. This implies that glycosyl transferases act there with different specificities, enabling sequential assembly of the S-layer glycan chains. Similar observations have recently been made for the reducing-end sugars of the lipopolysaccharide O-antigens of Pseudomonas aeruginosa [42] and Salmonella enterica [43]. In both cases, only speculations about the reasons for the anomeric inversion were presented. With respect to the polymerization of the S-layer glycan repeats from T. thermosaccharolyticum D120-70, detailed NMR analyses did not give any evidence for 3-O-methylation of the terminating rhamnose residue, which has been suggested to be a signal for the termination of glycosylation reactions [44]. This indicates that the polymerization is controlled by a different mechanism in this organism. As it lacks a core region connecting the S-layer glycan and the polypeptide portion, the structure of the S-layer glycoprotein from T. thermosaccharolyticum D120-70 is novel among glycosylated S-layers of the domain Bacteria and is presumably a strain-specific adaptation of this organism to the environment. It further indicates that, even in closely related strains, such as Tb. thermohydrosulfuricus and T. thermosaccharolyticum, different architectural concepts have evolved for their S-layer glycans. The S-layer glycoprotein glycans of Tb. thermohydrosulfuricus L111-69 [8] and T. thermosaccharolyticum D120-70 [14] are both O-glycosidically bound to tyrosine, but, whereas in the S-layer glycan of Tb. thermohydrosulfuricus L111-69 a →3)-α-l-Rha-(1→3)-α-l-Rha-(1→3)-α-l-Rha-(1→3)-β-d-Gal-(1→ core oligosaccharide is present [9], T. thermosaccharolyticum D120-70 possesses a core-less S-layer glycan (this paper). Core regions have so far also been described for the S-layer glycoproteins of Paenibacillus alvei CCM 2051 [9] and Aneurinibacillus thermoaerophilus GS4-97 [21] and DSM 10155 [45]. Interestingly, all cores referred to are composed of rhamnose residues.

The structures of all S-layer glycoprotein glycans elucidated so far have been summarized recently [11]. The compositional differences reflect the complexity of the glycosylation of bacterial S-layer proteins. As S-layers represent the outermost cell surface structure of these organisms, they are likely to play a pivotal role in the diversification of surface structures for the survival of the bacteria in a competitive habitat. Further biochemical and genetic investigations will be required to unravel the details of the biosynthesis of their S-layer glycoprotein glycans (reviewed in [11]).


We express our thanks to Friedrich Altmann for performing the MALDI-TOF MS experiments, Karola Vorauer for N-terminal sequencing, and Sonja Zayni for help with carbohydrate and amino-acid analyses. This work was supported by the Austrian Science Fund, project P12966-MOB (to P.M.) and the Austrian Federal Ministry for Education, Science and Culture.