Correspondence: Lei Wang, TEDA School of Biological Sciences and Biotechnology, Nankai University, 23 Hongda Street, TEDA, Tianjin 300457, China. Tel.: +86 22 66229588; fax: +86 22 66229596; e-mail: email@example.com
This review covers the O antigens of the 46 serotypes of Shigella, but those of most Shigella flexneri are variants of one basic structure, leaving 34 distinct Shigella O antigens to review, together with their gene clusters. Several of the structures and gene clusters are reported for the first time and this is the first such group for which structures and DNA sequences have been determined for all O antigens. Shigella strains are in effect Escherichia coli with a specific mode of pathogenicity, and 18 of the 34 O antigens are also found in traditional E. coli. Three are very similar to E. coli O antigens and 13 are unique to Shigella strains. The O antigen of Shigella sonnei is quite atypical for E. coli and is thought to have transferred from Plesiomonas. The other 12 O antigens unique to Shigella strains have structures that are typical of E. coli, but there are considerably more anomalies in their gene clusters, probably reflecting recent modification of the structures. Having the complete set of structures and genes opens the way for experimental studies on the role of this diversity in pathogenicity.
The O antigen, which contains many repeats of an oligosaccharide unit (O unit), is part of the lipopolysaccharide present in the outer membrane of Gram-negative bacteria. The O antigen exhibits variation in the types of sugar present, their arrangement within the O unit and the linkages within and between O units, making lipopolysaccharide one of the most variable cell constituents. The variability of the O antigen provides the major basis for serotyping schemes for many Gram-negative bacteria, and the O antigen is the only antigen used for typing Shigella, which are important pathogens, as they lack the H and K antigens also used in Escherichia coli.
The O antigen is exposed on the cell surface. It is highly immunogenic and also used as a receptor by some bacteriophages, which may both contribute to maintenance of diversity by intermittent selection against specific O antigen forms (Reeves & Wang, 2002). O antigen diversity is also thought to be important in allowing the various clones to present variations in surface structures that may offer selective advantage in their specific niche. It has been estimated that a selective advantage of only 0.1% for one O antigen over another in a given niche is more than sufficient to maintain different alleles in different clones (Reeves, 1992). Some O antigen forms are disproportionately represented in pathogenic clones, which also indicates that specific O antigens contribute to adaptation to that niche.
O antigen is also an important virulence factor, and loss of O antigen makes many pathogens serum sensitive or otherwise seriously impaired in virulence. In the case of E. coli O1, O7 and O18, there is direct evidence that the O antigen differences account for differences in the nature of pathogenicity (Pluschke et al., 1983; Achtman & Pluschke, 1986), and it has also been shown that the virulence of Shigella flexneri is reduced if the O antigen is changed (Gemski et al., 1972). In Francisella tularensis, the complete loss of O antigen leads to serum sensitivity, impaired intracellular replication, and severely attenuated virulence in the mouse model (Raynaud et al., 2007). In Yersinia enterocolitica O8, it has been shown that the presence of the O antigen and the proper distribution of the O antigen chain lengths are required for the full virulence, and mutation of the O antigen may influence the proper function or expression of virulence factors (Bengoechea et al., 2004). Finally in S. flexneri serotype 5, the glucosylation of the O antigen was suggested to promote bacterial invasion and evasion of innate immunity (West et al., 2005).
The above are among the few studies that have been undertaken of the role of O antigen specificity; they indicate that it is important for the full function and virulence of individual serotypes, but clearly much more work needs to be done. The mere presence of the large number of O antigen specificities in a range of species tells us that there is selection for this diversity.
There are three distinct processes for synthesis and translocation of O antigen, which are the Wzx/Wzy dependent process, the ATP-binding cassette (ABC) transporter dependent process, and the synthase dependent process (Bronner et al., 1994; Keenleyside & Whitefield, 1996; Daniels et al., 1998; Linton & Higgins, 1998; Samuel & Reeves, 2003). Most E. coli and all the Shigella forms use the Wzx/Wzy process in which the O unit, which usually contains two to eight residues from a broad range of sugars, is synthesized by transfer of a sugar phosphate and then sequential transfer of other sugars from the respective sugar nucleotides to the carrier lipid, undecaprenyl phosphate (UndP). These O units are flipped across the membrane while retaining attachment to UndP, and then polymerized to form polysaccharide chains, which are transferred to the independently synthesized core-lipid A to form lipopolysaccharide (Mulford & Osborn, 1983; McGrath & Osborn, 1991; Reeves & Wang, 2002). The genes involved in the biosynthesis of O antigen are generally found on the chromosome as an O antigen gene cluster, and genetic variation in the gene cluster is the major basis for the diversity of O antigen forms. Genes involved in O antigen synthesis are classified into three main classes: (1) nucleotide sugar synthesis genes; (2) genes for transfer of sugars; (3) O unit processing genes encoding flippase (Wzx) and polymerase (Wzy).
Genes for sugar transferases, wzx and wzy genes, are usually specific to each individual O antigen gene cluster, and have the potential to be used in PCR-based assays for rapid identification and detection of relevant strains. Conventional O-serotyping of E. coli and Shigella strains by agglutination reactions using antisera is laborious, time consuming and not practical for analysis of large numbers of specimens. O serotyping also suffers from serological cross-reactions giving equivocal results and from the presence of rough strains, which do not produce O antigen and are untypable. A DNA-based typing method based on O antigen specific genes can overcome these problems and has been shown to be reliable, rapid and sensitive for detecting relevant strains from clinical, food and environmental samples (Wang & Reeves, 1998; DebRoy et al., 2004, 2005; Feng et al., 2004d; Li et al., 2006; Han et al., 2007). We have developed a DNA microarray targeting O-serotype-specific genes to detect all distinct O antigen forms of Shigella, which can be used as a better alternative to the traditional serotyping procedure (unpublished data).
Shigella is a well-known human pathogen that causes diseases such as diarrhea and bacillary dysentery (shigellosis) (Wachsmuth & Morris, 1989). Shigella are facultative intracellular pathogens that colonise the intestinal mucosa and continue to threaten public health mainly in less developed countries with conditions of poor sanitation. A low infectious dose (10 cells) (Dupont et al., 1989) allows the disease to be spread effectively. A shigellosis survey between 2000 and 2004 in six Asian countries (Bangladesh, China, Pakistan, Indonesia, Vietnam and Thailand) showed that the overall annual incidence of treated shigellosis was 2.1 per 1000 per year in all ages and 13.2 per 1000 per year in children under 5 years old (von Seidlein et al., 2006), which was much higher than that in industrialized countries. For example, the estimated shigellosis incidence in all ages was 3.7 per 100 000 per year in the United States in 1999 (Gupta et al., 2004), and 3.2 per 100 000 per year in the Netherlands between 1996 and 2000 (van Pelt et al., 2003).
Escherichia coli and Shigella have long been known to be closely related, but in the 1940s, Shigella strains were separated from E. coli and put into their own genus and subgrouped into four species: Shigella boydii, Shigella dysenteriae, S. flexneri and Shigella sonnei, also known as Shigella subgroups A, B, C, and D, respectively. The first three species are typed into multiple serotypes, based on antigenic variation in their O antigens. However, we now know from analysis of housekeeping gene sequences that most Shigella serotypes fall into three clusters within E. coli (Pupo et al., 2000). There is very little sequence variation within each cluster, suggesting a relatively recent origin. Clusters 1 and 2 include 19 and eight serotypes, respectively, due to the presence of different, generally unrelated, O antigens, indicating that these clusters have undergone diversification by variation in the O antigen. Cluster 3 includes 12 flexneri O antigens that share a common basic structure and differ only in the distribution of four possible glucose side-branch residues and an O-acetyl residue, which are all attached by enzymes encoded by prophage genes. In cluster 3, diversification has been largely due to gain of phages conferring immunodominant side-branch residues, although the cluster also includes S. boydii type 12, which has an unrelated O antigen.
There are five Shigella serovars that constitute independent lineages within E. coli, so the overall picture is of E. coli as a diverse species with several pathogenic forms, of which most Shigella strains fall into three clusters. S. dysenteriae types 1, 8 and 10, S. boydii 13 (but see below) and S. sonnei, are isolated clones within E. coli. It is long overdue for Shigella to be incorporated into E. coli, but as it has not yet been done, we retain here the Shigella nomenclature. We believe that eventually a simpler system will be developed, but mostly refer here to the serovars as B/D/F (B1 for S. boydii type 1, etc), with SS for the single O antigen in S. sonnei.
B13 has been reported to be markedly divergent from other Shigella and was proposed to belong to another species (Ewing et al., 1958; Manolov, 1959; Brenner et al., 1973, 1982). This was confirmed by the Pupo et al. study in which it again fell outside of E. coli (Pupo et al., 2000). However, it has recently been shown that B13 strains fall into two groups, one of which is outside of E. coli but is within the species Escherichia albertii, while the other is within E. coli (Hyma et al., 2005). The strain that we used is not within E. coli (Pupo et al., 2000), and is presumably in the group related to E. albertii, but for the purpose of this review we assume that the organization of the E. coli and E. albertii B13 gene clusters is the same, as this applies to gene clusters for structures common to the more distantly related E. coli and Salmonella enterica. We note from work of the Whittam Lab that the E. coli B13 strains are not in any of the three clusters of Shigella strains but quite closely related to them (Hyma et al., 2005).
This review covers the structure of all 34 Shigella O antigens and their gene clusters, marking the first time that structure and DNA sequence data have been determined for all O antigens of such a group. In the past 2 years, we have sequenced 27 Shigella O antigen gene clusters, 18 of which are reported here for the first time, and determined 12 Shigella O antigen chemical structures, seven of which are reported here for the first time. We also reexamined the structure for those O antigens for which older methods were used, and revised the structure for nine of them. In addition, we sequenced a number of E. coli O antigen gene clusters (22 reported in this paper) and found that of the 34 distinct O antigens in Shigella forms of E. coli, 21 are either identical or closely related to an E. coli O antigen. We now have a much better understanding of the genetic basis and origins of O antigen diversity in Shigella, which can be used as a model for analyzing the antigen diversity of other bacteria, and should also be useful for development of vaccines and rapid DNA-based detection systems for Shigella.
Table 1. Structures of the O antigens of Shigella and related Escherichia coli strains*
Shigella type (E. coli serogroup)
O antigen structure
For abbreviations of monosaccharides and non-sugar constituents, see Table 2. The reference to the article, in which the correct Shigella O polysaccharide structure was reported for the first time, is indicated in bold face.
† One of two possible structures of the biological O unit.
‡ The structure of the biological O unit was established.
The E. coli O polysaccharide lacks O-acetyl groups.
The E. coli O polysaccharide lacks O-acetyl groups from the Fuc residue.
In the E. coli O polysaccharide, the degree of O-acetylation is c. 60%.
The structures of B1, B3, B15, D11, D12 and D13 have not been reported earlier and were studied for the first time in this work. A comparison, using the ‘fingerprint’ method, of the 13C-NMR spectra of their O antigens with published data of E. coli O149, O167, O112ab, O29, O152 and O150, respectively (Adeyeye et al., 1988; Linnerborg et al., 1997; Olsson et al., 2005; Perepelov et al., 2006a, b, 2008a), showed that the Shigella and E. coli structures are pairwise identical. An exception is the O antigen of D11, whose O unit, as opposed to its E. coli counterpart, contains an O-acetyl group in a nonstoichiometric amount. In this case the 13C-NMR spectrum of the O-deacetylated O antigen was analyzed and the site of O-acetylation was determined by comparison of two-dimensional H-detected 1H,13C heteronuclear single-quantum coherence (HSQC) spectra of the initial and O-deacetylated O antigens from D11 based on a deshielding effect of the O-acetyl group on 1H- and 13C-NMR chemical shifts.
Only a partial O antigen structure has been reported for D6 (Dmitriev et al., 1975c), and it was reinvestigated in this work. Surprisingly, D6 was found to lack a long-chain O antigen, and therefore the structural studies were performed with an oligosaccharide composed of the single O unit linked to the lipopolysaccharide core, which was isolated by mild acid degradation of the D6 lipopolysaccharide.
As a result, it was shown that the O unit of D6 has the same structure as that of the E. coli O130 O antigen (Perepelov et al., 2007b), including the presence of a glycerophosphate group and the configurations of all glycosidic linkages. However, the galactose residue is 4-monosubstituted in the D6 oligosaccharide but 3,4-disubstituted in the interior O units of the E. coli O130 O antigen. This followed from a higher-field position at δH/δC 4.20/70.5 of the Gal H3/C3 cross-peak in the 1H,13C HSQC spectrum of the D6 oligosaccharide as compared with its position at δH/δC 4.41/79.2 in the spectrum of the O130 O antigen, where this cross-peak is shifted downfield due to an effect of glycosylation at O3. This finding shows that the D6 strain has only one O unit, and defines the structure of the biological O unit of D6 as shown in Table 1, a structure in agreement with our expectation that the first sugar would be the GalNAc residue.
Finally, the 1H,13C HSQC spectrum of the D6 oligosaccharide contained a cross-peak for an O-acetyl group at δH/δC 2.11/21.8, and c. 55% of the H3/C3 cross-peak of the lateral phosphorylated GalNAc residue was shifted markedly downfield from δH/δC 3.89/72.1 to 5.04/73.7. This displacement was due to a deshielding effect of the O-acetyl group and indicated partial O-acetylation of this GalNAc residue at position 3.
General features of Shigella O antigens
Most O antigens are heteropolysaccharides built up of linear or branched tri- to hexasaccharide O units. Apart from SS, the sugar composition of the Shigella O antigens is rather typical of enteric bacteria (Table 2). Each O antigen contains either or both of N-acetyl-d-glucosamine (d-GlcNAc) and N-acetyl-d-galactosamine (d-GalNAc). Other typical sugar components are l-rhamnose (l-Rha), d-glucose (d-Glc), d-galactose (d-Gal), d-glucuronic acid (d-GlcA), d-galacturonic acid (d-GalA) and d-mannose (d-Man). In addition, d-ribose (d-Rib) is present in three O antigens and some other sugars, including l-fucose (l-Fuc), N-acetyl-d-mannosamine (d-ManNAc), N-acetyl-l-quinovosamine (l-QuiNAc), N-acetyl-l-fucosamine (l-FucNAc), 4-amino-4-deoxy-d-quinovose (d-Qui4N), l-iduronic acid (l-IdoA) and N-acetyl-d-galactosaminuronic acid (d-GalNAcA), occur in one O antigen each. The most unusual sugar is a higher acidic diamino sugar, a derivative of 5,7-diamino-3,5,7,9-tetradeoxy-l-glycero-l-manno-non-2-ulosonic (pseudaminic) acid, found in B7. Most sugars exist in the pyranose form, whereas Gal occurs in the furanose form in some O antigens and Rib in all O antigens where present.
Table 2. Composition of the Shigella O antigens
Frequency of occurrence
Monosaccharides are present in the pyranose form unless stated otherwise.
Pyruvic acid (2-oxopropanoic acid), (R)- or (S)-acetal
The O unit of D7 contains two d-GalNAcA residues, one of which is in the amide form. In B3 and B8, the d-GalA residue is amidated with l-alanine and 2-amino-2-deoxyglycerol, respectively. Some unusual amino sugars bear acyl groups other than acetyl, such as N-acetylglycyl on Qui4N in D7 or (R)-3-hydroxybutanoyl on pseudaminic acid in B7. O-acetylation of one or more sugars occurs in the O antigens of about half of the Shigella type strains.
All Shigella O antigens, except for D1, B18 and F1–F5, are acidic due to the presence of either a sugar acid (hexuronic or pseudaminic acid) or a noncarbohydrate acidic component, such as lactic acid ethers (glycolactilic acids), pyruvic acid acetals or alanine. In E. coli, acidic O antigens are less common, and most of them are O antigens shared with Shigella. The bias in Shigella towards the presence of acid groups has been noted before (Knirel & Kochetkov, 1994) and is confirmed by this analysis, in which we find the frequency of GlcA in particular to be much higher in the Shigella O antigens (7.55%) than for the typical E. coli serovars (2.08%). There are four phosphorylated O antigens, of which D12 and B13 have the glycosyl phosphate linkage in the main chain. D11 has a glycerol moiety in the main chain with a phosphodiester linkage as in teichoic acids, while D6 has glycerol-2-phosphate as a side group.
The O antigen of SS possesses an exceptional structure and composition. It has a disaccharide O unit containing N-acetyl-l-altrosaminuronic acid (l-AltNAcA) and 4-amino-4-deoxy-N-acetyl-d-fucosamine (FucNAc4N) residues, the latter being the first in the O unit. Neither of the two sugars is found in E. coli or is at all common elsewhere.
O antigens of F1–5 are distinguished by close structural similarity between types. They have the same basic O unit structure and the distinctions result from phage-encoded postpolymerization modifications (glucosylation and/or O-acetylation). Otherwise similarities among O antigen structures are uncommon in Shigella, but two closely related pairs of O antigens exist. These are B10 and B6, differing in the presence or absence of d-Rib; and D2 and B15, which differ in the presence or absence of pyruvic acid acetal and replacement of d-Gal with l-IdoA. In addition, slightly different O antigen structures, owing to replacement of d-Glc with d-GlcNAc, have been reported for two strains classified as D8. However, many Shigella O antigen structures are either identical or closely related to those of non-Shigella E. coli strains (Table 1). For most structurally related O antigens, the corresponding serological identity or relatedness has been demonstrated (Ewing, 1986).
We suggest that it is time to face up to the fact that in E. coli we have five series of names for O antigens, i.e. that for typical E. coli and four for the Shigella ‘species’. To simplify our discussion of individual structures, we will refer to shared structures by a combined name, to reflect the reality that there is only one structure. Thus the O antigen found in E. coli O149 and B1 is referred to as the O149/B1 structure, etc.
General features of Shigella O antigen gene clusters
Escherichia coli O antigen gene clusters with the majority Wzx/Wzy synthesis process are located between the galF and gnd genes, and the same applies to all of the Shigella forms except for that of SS, which is located on a plasmid (Shepherd et al., 2000). The sizes of the Shigella O antigen gene clusters range from 9081 bp (B14) to 17 769 bp (D5).
In considering the nature of variation in Shigella O antigens, we will discuss the O antigen of SS last as it is atypical in many ways. In this section and ‘Anomalies in the Shigella O antigen gene clusters–The O antigen gene clusters that are unique to Shigella’ we will discuss the 33 O antigen gene clusters that map between galF and gnd, as is usual in E. coli. Their properties are in the general range for E. coli, and in particular the GC% content for most genes is about 30%, similar to that for E. coli O antigen gene clusters, and significantly lower than that of E. coli/Shigella genomes (c. 50%). Some sugar biosynthesis genes, such as rmlBDAC, gmd, fcl, manC and manB, have a relatively high GC% content (about 30–50%) in both Shigella and typical E. coli serovars. A few manB genes have an even higher GC% content, and have sequences suggesting that they have been derived from the manB gene in the adjacent gene cluster for colanic acid (generally present in E. coli), as reported previously (Jensen & Reeves, 2001). These include the manB genes of B6 and B10 (53.4%), D5 (54.6%), and B12 (55.2%).
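The GC% comparisons above rest on a simple per-gene calculation, which can be sketched as follows. This is only a minimal illustration in Python: the function name `gc_percent` and the two short toy sequences are hypothetical and are not taken from any of the gene clusters discussed in this review.

```python
def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    # Count only unambiguous bases so that N or gap characters do not skew the result.
    counted = [base for base in seq if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Toy sequences (invented, for illustration only).
low_gc_gene = "ATGAAATTTATTGGTAATATTCCTTTAAAAGATTAA"
high_gc_gene = "ATGGCGCTGACCGGCAAGCTGCCGGAAGCGTAA"

print(round(gc_percent(low_gc_gene), 1))
print(round(gc_percent(high_gc_gene), 1))
```

Applied gene by gene across a cluster, a calculation of this kind would flag genes such as the manB genes noted above, whose GC% stands well apart from that of their neighbours.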
Of the 148 E. coli O antigens for which we know the gene cluster sequence, 140 use the Wzy/Wzx dependent pathway as shown by presence of the two genes, and the other eight (O8, O9, O52, O89, O95, O99, O101, O162) use the ABC transporter dependent pathway (Kido et al., 1995; Feng et al., 2004a) (authors' unpublished data). In addition, the structure data indicated that the O20ab, O20ac and O97 antigens are also synthesized by the ABC transporter dependent pathway (Stenutz et al., 2006). Escherichia coli O antigens with the Wzy/Wzx dependent pathway other than O45, and all Shigella O antigens other than SS include at least one GalNAc or GlcNAc residue and in all cases the first sugar of the O unit is thought to be one of them, being transferred onto an undecaprenyl carrier by WecA, encoded by a gene in the enterobacterial common antigen (ECA) gene cluster (Alexander & Valvano, 1994; Rick et al., 1994; Yao & Valvano, 1994). This has also been shown for D1 and D6, B17 and F1–5. In most other cases, the exact O unit structure is defined by the occurrence of only one residue of this pair of sugars, whereas in D2, D4, D8, B1, B3, B8, B13 and B15, the structure is ambiguous owing to the presence in the O unit of two or more such residues. None of the Shigella gene clusters with wzx and wzy genes has a gene for the transfer of the first sugar [these initial transferases fall into two groups, both easily recognised by a pattern of transmembrane segments (Valvano, 2003; Lehrer et al., 2007)], suggesting that WecA is used as the initial transferase.
Most ORFs of the Shigella serotypes were assigned functions based on their similarity to sequences in available databases, although most glycosyltransferases cannot be associated with a specific linkage. They are summarized in supplementary Table S1. In all cases the gene clusters include genes that can account for all functions expected for synthesis of the O antigens. Gene names were given according to the bacterial polysaccharide gene nomenclature (BPGN) system (http://www.microbio.usyd.edu.au/BPGD/default.htm) (Fig. 1).
Anomalies in the Shigella O antigen gene clusters
Although Shigella O antigens are generally typical of those found in other E. coli, there are several atypical features in the gene clusters as described below. A total of 17 anomalies are found in the sequences of the 33 O antigen gene clusters (SS is excluded). Comparison with the situation in other E. coli (49 anomalies are found in 148 E. coli O antigen gene clusters) reveals that Shigella O antigen gene clusters have a higher proportion of anomalies (Table 3), which provides support to the notion that some Shigella O antigen gene clusters have been assembled recently or have undergone adaptive modifications in a newly occupied niche. Some anomalies are found to mediate the formation of Shigella O antigens, which may represent an essential aspect in the origins of Shigella O antigen diversity.
Table 3. Anomalies in Shigella and Escherichia coli O antigen gene clusters
148 E. coli O antigens Number
33 Shigella O antigens (excluding SS) Number (O antigen from)
3 (B6, B12, B13)
4 (B3, B18, D9, D10)
rmlC in unusual location
2 (B2, B9)
Gene in the reverse orientation
3 (B6, B10, B11)
2 (B8, F2a)
3 (D1, D6, D10)
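The totals quoted for Shigella (17 anomalies in 33 gene clusters) and for other E. coli (49 in 148) can be checked with a short tally. The sketch below is illustrative only: the serotype lists are those given in Table 3, but several category labels are inferred from the section text rather than read from the table, and the variable names are hypothetical.

```python
# Serotypes listed per anomaly category in Table 3 (SS excluded).
# Category labels are partly inferred from the text; treat them as assumptions.
shigella_anomalies = {
    "IS element": ["B6", "B12", "B13"],
    "H-repeat remnant": ["B3", "B18", "D9", "D10"],
    "rmlC in unusual location": ["B2", "B9"],
    "wzx in reverse orientation": ["B6", "B10", "B11"],
    "large noncoding region": ["B8", "F2a"],
    "gene affected by mutation": ["D1", "D6", "D10"],
}

shigella_total = sum(len(types) for types in shigella_anomalies.values())
affected_clusters = set().union(*shigella_anomalies.values())

shigella_rate = shigella_total / 33   # anomalies per Shigella gene cluster
ecoli_rate = 49 / 148                 # anomalies per E. coli gene cluster

print(shigella_total, len(affected_clusters))            # 17 15
print(round(shigella_rate, 2), round(ecoli_rate, 2))     # 0.52 0.33
```

Because B6 and D10 each appear under two categories, the 17 anomalies fall in 15 distinct gene clusters, so the figures are best read as anomalies per gene cluster rather than as the fraction of clusters affected.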
Insertion sequence (IS) and H-repeat elements
IS elements are known to play an important role in evolution of the bacterial genome (Bennett, 2004). Insertion of IS elements can result in the inactivation of genes, and the combination of two or more IS elements can also result in the mobilization of large portions of DNA. The H-repeat is found downstream of the core of several Rhs elements (such as RhsB, RhsC and RhsE in E. coli K12), displaying features of typical insertion sequences. No transposition activity has yet been detected (Zhao et al., 1993; Mahillon & Chandler, 1998), although an H-repeat was proposed to have mediated the intraspecific recombination event which produced the Salmonella group D2 O antigen gene cluster (Xiang et al., 1994). IS elements and/or H-repeats are found in seven Shigella O antigen gene clusters, and it is likely that some of them (in B6 and D9) were directly involved in the formation of the new O antigen forms.
In B6, there is an IS629 element (97.9% identical to IS629 of SS (Matsutani et al., 1987)), that interrupts wbaM, a ribosyltransferase gene (Wang et al., 2001; Senchenkova et al., 2005). This is the only difference between the B6 and B10 O antigen gene clusters, which accounts for the structural difference. The d-Ribf is on a short side branch of B10 and appears to be immunodominant as no cross reaction was reported between B6 and B10 (Ewing, 1986). The structure and sequence differences are such that the B6 and B10 O antigens would normally be treated as variants of a single O antigen form.
In B12, there is an IS630 element (98.6% identical to IS630 in the SS O antigen gene cluster) between rmlD and rmlA. The IS630 transposase gene is in the opposite orientation to the O antigen genes, but the insertion evidently does not block expression: most of the gene cluster lies downstream of the insertion, and l-Rha is present in the O antigen, which indicates that rmlA (the first gene after the IS630 insert) must be expressed.
In B13, a remnant of IS1004 (54% identical to IS1004 of Vibrio cholerae) was found upstream (positions 812–1450) of the wzx gene (Feng et al., 2004b). IS1004 is found mainly in V. cholerae clones (Bik et al., 1996).
In B3, there is a remnant H-repeat unit between positions 1070 and 1718 [80% identical to the RhsE H-repeat of E. coli K12 (Zhao et al., 1993)]. The remnant in B3 had a 650-bp deletion and a number of other mutations.
A remnant H-repeat unit (positions 4768–5822) is found between rmlC and wzx in B18 (94% identical to the RhsB H-repeat of E. coli K12; note that the RhsB and RhsE H-repeat sequences of E. coli K12 are 97% identical). The inverted repeat sequence at the 5′ end was almost complete but that at the 3′ end was lost.
In D10, a remnant H-repeat unit is present between positions 10 042 and 11 333 (96% identical to the H-repeat from RhsB of E. coli K12), and with both inverted repeats retained.
There are three remnant H-repeats in D9. One (92% identical to the H-repeat from RhsE of E. coli K12) is upstream of wffO (positions 71–1349). The others (the first is 93% identical to the H-repeat from RhsB of E. coli K12, and the other is 92% identical to a putative H-repeat sequence in E. coli O157:H7) are located downstream of wffO (positions 2610–3248; and 3249–4507). wffO in D9 was probably introduced into the O antigen gene cluster together with the H-repeat.
The deletions, point mutations and the loss of the terminal inverted-repeat sequences in some of the H-repeat remnants discussed above indicate that these elements have been associated with Shigella O antigen gene clusters for a long time since last undergoing transposition.
rmlC gene in unusual location
l-Rha is widely distributed in O antigens of bacteria, and the four rml genes involved in biosynthesis are usually grouped together and easily identified in a range of species. In E. coli and Salmonella, the genes are generally in the order rmlB, rmlD, rmlA, rmlC at the 5′ end of the O antigen gene cluster. In E. coli, rmlB, rmlD and rmlA genes are conserved and have many characteristics of housekeeping genes (Li & Reeves, 2000; Wang et al., 2001), while the rmlC gene is more varied and O antigen specific. It has been concluded that in Salmonella recombination in the 3′ end of the rml gene set plays a role in mediating the transfer of the central serotype-specific genes, which are located downstream of the rmlC gene (Li & Reeves, 2000).
l-Rha is present in 15 Shigella O antigens, and the typical rml gene order was found in 13 of them, the exceptions being B2 and B9, in which the rmlC gene was located seven and four genes, respectively, downstream of the rmlBDA genes. It is likely that the two separated rmlC genes were recently introduced into the O antigen gene clusters by recombination together with the seven or four genes upstream of them. This is supported by the phylogenetic tree of available rmlC genes of E. coli and Shigella, showing that the rmlC genes of B2 and B9 are separated from most of the others.
wzx gene in the reverse orientation
The 33 gene clusters have all genes in the same transcriptional direction from galF to gnd, except for the wzx genes in B6, B10 and B11, which are transcribed in the opposite direction. These wzx genes are located at the 3′ end of the O antigen gene clusters, so they must have a promoter at that end of the gene cluster. The B6 and B10 gene clusters are very similar, and the (identical) wzx genes, which are part of the shared organization, are 67% identical to that of B11 (Senchenkova et al., 2005). The O antigen gene clusters of B10 and B11 are also related, and the ribosyltransferase genes of B10 and B11 (wbaM and wbsZ), the last genes in the normal orientation and adjacent to wzx, also share 67% identity. Because the two ribosyltransferases are responsible for different linkages (Senchenkova et al., 2005), this region may have diverged some time ago.
Three Shigella serotypes have genes affected by mutation.
D1 is the most virulent serotype of Shigella. The O antigen structure is unique but was found to be a simple variation of the O148 structure, and is proposed to have arisen from O148 by mutation of a glucosyltransferase gene and its replacement functionally by a galactosyltransferase gene on a plasmid. The D1 wbbG glucosyltransferase gene has a deletion of 22 bases apparently due to recombination between 8 bp direct repeats present in O148 (Feng et al., 2007). Deletions are often formed by recombination between repeated sequences and that seems to be the case here as D1 has only one copy and is missing the 14 bases between the two repeats. A galactosyltransferase gene wbbP located on a 9 kb plasmid pHW400 is responsible for the transfer of galactose to make a major antigenic epitope of the O antigen (Gohmann et al., 1994), which may confer a selective advantage in the Shigella intracellular environment.
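The arithmetic of the D1 deletion (two 8 bp direct repeats separated by 14 bases, with one repeat copy and the intervening bases lost, giving 22 bases in total) can be illustrated with a toy model. The sketch below is a hedged illustration only: the function `delete_between_direct_repeats` and all sequences are invented for the example and do not reproduce the actual wbbG sequence.

```python
def delete_between_direct_repeats(seq: str, repeat: str) -> str:
    """Collapse the first two copies of `repeat` into one, removing the spacer between them."""
    first = seq.find(repeat)
    if first == -1:
        raise ValueError("repeat not found")
    second = seq.find(repeat, first + len(repeat))
    if second == -1:
        raise ValueError("second copy of repeat not found")
    # Keep everything up to and including the first copy, then resume after the second copy.
    return seq[: first + len(repeat)] + seq[second + len(repeat):]


# Invented toy sequence: flank + 8 bp repeat + 14 bp spacer + 8 bp repeat + flank.
repeat = "GATTACAG"         # 8 bp (hypothetical)
spacer = "TTCCGGAATCTAGC"   # 14 bp (hypothetical)
parent = "ATGCC" + repeat + spacer + repeat + "TTGAA"
derived = delete_between_direct_repeats(parent, repeat)

print(len(parent) - len(derived))   # 22
print(derived.count(repeat))        # 1
```

The derived sequence retains a single copy of the repeat and is 22 bases shorter than the parent, matching the pattern described for D1 relative to O148.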
In D10, there are several remnants of a manC gene (manC*) located between wffT and an H-repeat, and a remnant of manB gene (manB*) between the H-repeat and wffU. ManB and ManC are the two enzymes responsible for the synthesis of GDP-d-Man, and as there is d-Man in the structure, it is probable that the manC and manB genes in the colanic acid gene cluster, which is just upstream of galF in E. coli, are responsible for the synthesis of d-Man in the D10 O antigen. Most likely the H-repeat disrupted the gene, but the organism survived by derepression of the manC and manB genes of the colanic acid gene cluster. If so the homologues in the O antigen gene cluster would no longer be required, and the substantial degradation observed suggests that this is a long-standing situation.
In the case of D6 and O130 the O units are identical, but the D6 lipopolysaccharide contains only one O unit attached to core. Their O antigen gene clusters also share a high level of identity, but compared to O130, the wzy and wffH′ genes of D6 are in one ORF due to a one-base deletion at the 3′ end of wzy, which also changed three Wzy amino acid residues. It appears that the wzy gene in D6 is not functional. However, the type strain that we studied has been maintained in laboratory culture for a long time, and may not be typical of D6.
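How a single-base deletion can merge two adjacent genes into one ORF can be shown with a small Python sketch. The toy sequences and the helper name `orf_from` are assumptions made for illustration and bear no relation to the real wzy or wffH′ sequences; the only point is that removing one base just before a stop codon shifts the reading frame, so that the stop is read through and translation continues into the downstream gene.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_from(seq: str, start: int) -> str:
    """Return the ORF from `start` through the first in-frame stop codon."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[start:i + 3]
    raise ValueError("no in-frame stop codon found")


# Toy genes (placeholders): GENE2 lies in a different frame from GENE1.
GENE1 = "ATGGCTGCAGGTCTGAAATAA"   # ATG GCT GCA GGT CTG AAA TAA
GENE2 = "ATGCCGGATTTCTAG"         # ATG CCG GAT TTC TAG
wild_type = GENE1 + "C" + GENE2   # one intergenic base separates the genes

# In the wild type the two genes are read as separate ORFs.
wt_first = orf_from(wild_type, 0)
wt_second = orf_from(wild_type, len(GENE1) + 1)

# Delete the last base before the stop codon of GENE1 (index 17).
mutant = wild_type[:17] + wild_type[18:]
fused = orf_from(mutant, 0)

print(len(wt_first), len(wt_second))   # lengths of the two separate ORFs
print(len(fused))                      # length of the single fused ORF
print(fused.endswith(GENE2))           # the fused ORF runs through GENE2
```

In this toy case the codons between the deletion point and the downstream start codon are altered before the fused ORF continues into the second gene, which parallels the altered C-terminal residues described above.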
Noncoding regions in bacteria usually indicate the remnants of genetic reorganization. There is a 1931 bp noncoding region between wzy and gne in the B8 O antigen gene cluster and a 3265 bp region with no functional gene between wzy and gnd in F2a, both with no good hits in a BLAST search. Also, the noncoding regions at the ends of the B8 O antigen gene cluster between galF and wfdP (808 bp) and between wfdV and gnd (849 bp) are larger than usual in E. coli and Shigella.
Pathways for biosynthesis of sugars in Shigella O antigens
Twenty different sugars were found in the Shigella O antigen structures (Table 2), and the biosynthesis pathways for the 19 hexoses and derivatives are presented in Fig. 2. ADP-d-Ribf, the only pentose, is available from the NAD salvage pathway (Hillyard et al., 1981). Six of the 20 sugars (d-Ribf, d-Glc, d-Galp, d-GlcNAc, d-GlcA and d-GalA) are found in other structures generally present in E. coli and Shigella, and the genes for their biosynthesis are not in the O antigen gene clusters, but elsewhere in the genome. Most of the genes for biosynthesis of the other 14 sugars are located in the O antigen gene clusters, and the pathways for 12 of them have been described before. The pathways for the other two sugars and three glycolactilic acids (sugar ethers with lactic acid) are first reported in this review and are discussed below. The five pathways are for a 7-N-(3-hydroxybutanoyl) derivative of 5,7-diamino-3,5,7,9-tetradeoxy-l-glycero-l-manno-non-2-ulosonic (pseudaminic) acid, l-iduronic acid, 1-carboxyethyl ethers of d-glucose and l-rhamnose (glucolactilic and rhamnolactilic acids), and 2-acetamido-4-O-[(S)-1-carboxyethyl]-2-deoxy-d-glucose.
Derivative of pseudaminic acid in B7
5-Acetamido-3,5,7,9-tetradeoxy-7-[(R)-3-hydroxybutanoylamino]-l-glycero-l-manno-non-2-ulosonic acid (Pse5Ac7(3OHBut)), which is a di-N-acyl derivative of pseudaminic acid (Pse), was found in the B7 O antigen. Pse is a higher acidic diamino sugar, and its derivatives are constituents of some bacterial polysaccharides, including the O antigens of B7 and Pseudomonas aeruginosa O7–O9 and the K antigens of Sinorhizobium meliloti Rm41 and Sinorhizobium fredii HH103, among others (Knirel et al., 2003). The O antigens of B7 and P. aeruginosa O9 are serologically related, but their O units are different and have no common or structurally similar elements other than a Pse derivative. Therefore, pseudaminic acid, which most likely occupies the nonreducing end of the polysaccharide chain, contributes significantly to the immunospecificity of the bacteria and serves as an immunodeterminant.
The CMP-Pse5Ac7Ac biosynthesis pathway has been identified in Campylobacter jejuni and Helicobacter pylori, both of which have six enzymes (PseB, PseC, PseF, PseG, PseH and PseI) (Schoenhofen et al., 2006a). Orf1, Orf2, Orf3 and Orf6 in B7 are homologs of enzymes involved in the identified CMP-Pse5Ac7Ac biosynthesis pathway, but the identity is not high. We also found that B7 orfs 1–6 were similar to genes in the O antigen gene clusters of P. aeruginosa O7–O9 and the K antigen gene cluster of S. meliloti Rm41, and were in the same order as in these gene clusters. Therefore, we propose that orfs 1–6 in B7 are responsible for the biosynthesis of CMP-Pse5Ac7(3OHBut), and a putative pathway is discussed below and shown in Fig. 3. We can be reasonably confident, as discussed at the end of this section, that the six genes are involved in the Pse5Ac7(3OHBut) pathway and have named them psb1-6. The genes are given numbers 1–6 rather than letters A–F, pending final resolution of the functions, when they will be named psbA-F in function order.
Step 1. Psb1 shared 58% identity and 74% similarity to PseB of C. jejuni, the first enzyme in the CMP-Pse5Ac7Ac pathway, which exhibits C6 dehydratase as well as a newly identified C5 epimerase activity that are responsible for the production of both UDP-2-acetamido-2,6-dideoxy-β-l-arabino-hex-4-ulose and UDP-2-acetamido-2,6-dideoxy-α-d-xylo-hex-4-ulose from UDP-d-GlcNAc (Schoenhofen et al., 2006b).
Step 2. Psb2 shared 40% identity and 60% similarity to PseC of C. jejuni. PseC is an aminotransferase, which utilizes only the UDP-2-acetamido-2,6-dideoxy-β-l-arabino-hex-4-ulose as substrate producing UDP-2-acetamido-4-amino-2,4,6-trideoxy-β-l-altropyranose (or UDP-4-amino-4,6-dideoxy-β-l-AltNAc), a precursor in the CMP-Pse5Ac7Ac biosynthesis pathway (Schoenhofen et al., 2006b). Therefore, we propose that psb1 is a C6 dehydratase/C5 epimerase gene, and psb2 is an aminotransferase gene, that together are responsible for the synthesis of UDP-4-amino-4,6-dideoxy-β-l-AltNAc from UDP-d-GlcNAc.
Step 3. Psb5 belonged to an acetyltransferase family (PF00583, E-value=5 × e−08). We propose that psb5 encodes an N-(3-hydroxybutanoyl) transferase, responsible for the 4-N-(3-hydroxybutanoylation) of UDP-4-amino-4,6-dideoxy-β-l-AltNAc to produce UDP-2-acetamido-2,4,6-trideoxy-4-[(R)-3-hydroxybutanoylamino]-β-l-altropyranose. Psb5 and PseH have no similarity and carry out different reactions such that the two pathways diverge here.
Step 4. As discussed below, the two pathways later converge and we propose that the UDP moiety is cleaved off in the Pse5Ac7(3OHBut) pathway as in the Pse5Ac7Ac pathway. In B7 there is no good homologue of PseG, but Psb4 shared 36–39% identity to the corresponding genes (the fourth genes in the six-gene-sets, that are probably responsible for the synthesis of derivatives of Pse) in the O antigen gene clusters P. aeruginosa O7–O9 and K antigen gene cluster of S. meliloti Rm41. We propose that psb4 encodes the nucleotidase responsible for converting UDP-2-acetamido-2,4,6-trideoxy-4-[(R)-3-hydroxybutanoylamino]-β-l-altropyranose to the free monosaccharide.
Step 5. Psb6 shared 45% identity and 61% similarity to PseI of C. jejuni, which catalyzes the condensation of phosphoenolpyruvate with 2,4-diacetamido-2,4,6-trideoxy-l-altrose to form Pse5Ac7Ac (Chou et al., 2005). We propose that Psb6 has a similar function, and is responsible for the conversion of 2-acetamido-2,4,6-trideoxy-4-[(R)-3-hydroxybutanoylamino]-l-altrose to Pse5Ac7(3OHBut).
Step 6. Psb3 shared 48% identity to PseF of C. jejuni and belonged to the Cytidylyltransferase family (PF02348, E-value=8.9 × e−37). PseF is responsible for the synthesis of CMP-Pse5Ac7Ac from Pse5Ac7Ac (Schoenhofen et al., 2006a). We propose that Psb3 is a cytidylyltransferase, which activates Pse5Ac7(3OHBut) to CMP-Pse5Ac7(3OHBut).
The Pse5Ac7Ac pathway and the proposed Pse5Ac7(3OHBut) pathway share the first two enzymes, and the last two enzymes carry out the same operations on slightly different substrates. The third step determines the difference between the end products, which is the group attached to C7. This is an unusual situation with similarity to the d-d-heptose and l-d-heptose pathways, which also start in the same way and then diverge, with some later steps carried out by the same enzymes in both pathways (Kneidinger et al., 2001b).
We can be reasonably confident on the first two steps of the Pse5Ac7(3OHBut) pathway as the homology is good and the reactions are the same as in C. jejuni. The fact that the six genes of the proposed Pse5Ac7(3OHBut) pathway are found together in the same order in two other unrelated species gives us confidence that all are involved in a common pathway and justifies naming them psb1-6, pending final resolution of the functions.
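The six proposed steps can be summarized as a small lookup structure. The following Python sketch only restates the assignments and identity values given in Steps 1–6; the class and field names are hypothetical, and a missing C. jejuni counterpart is recorded as `None` for the two steps where no good homologue was reported.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PathwayStep:
    step: int
    b7_protein: str
    proposed_activity: str
    cj_homologue: Optional[str]      # C. jejuni counterpart, if any was found
    identity_percent: Optional[int]  # identity to that counterpart, as stated in the text


# Proposed CMP-Pse5Ac7(3OHBut) pathway in B7, in step order.
PROPOSED_PATHWAY = [
    PathwayStep(1, "Psb1", "C6 dehydratase/C5 epimerase", "PseB", 58),
    PathwayStep(2, "Psb2", "aminotransferase", "PseC", 40),
    PathwayStep(3, "Psb5", "N-(3-hydroxybutanoyl) transferase", None, None),
    PathwayStep(4, "Psb4", "nucleotidase (removal of UDP)", None, None),
    PathwayStep(5, "Psb6", "condensation with phosphoenolpyruvate", "PseI", 45),
    PathwayStep(6, "Psb3", "cytidylyltransferase", "PseF", 48),
]

# Steps supported by homology to the C. jejuni Pse5Ac7Ac enzymes.
steps_with_homologue = [s.step for s in PROPOSED_PATHWAY if s.cj_homologue]

# Gene names listed in pathway order rather than gene-cluster order.
gene_order_by_step = [s.b7_protein.lower() for s in PROPOSED_PATHWAY]

for s in PROPOSED_PATHWAY:
    support = (f"{s.identity_percent}% identity to {s.cj_homologue}"
               if s.cj_homologue else "no C. jejuni homologue reported")
    print(f"Step {s.step}: {s.b7_protein} ({s.proposed_activity}); {support}")
```

Listing the genes in step order makes clear why the provisional numbering (psb1-6) does not follow the reaction order, which is the reason given above for deferring the psbA-F names.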
l-Iduronic acid in B15
l-IdoA was found in the O antigen of B15. l-IdoA differs from d-GlcA only in the stereochemistry of the C-5 carbon. IdoA in heparin and heparan sulphate is synthesized by 5-epimerization of d-GlcA at the polymer level, which is crucial for many biological functions of heparin and heparan sulphate in mammals (Crawford et al., 2001; Li et al., 2001). The B15 gene cluster contains only two putative sugar pathway genes: a gne gene, as expected because there is a GalNAc in the structure, and gae. Gae belongs to the d-glucuronyl C5-epimerase C-terminus family (PF06662, E-value=1.4 × e−24), and is 32% identical to a d-glucuronyl C5 epimerase of Danio rerio (Ghiselli & Farber, 2005), which carries out the polymer-level reaction discussed above. We propose that Gae is the UDP-GlcA C-5 epimerase. There is no signal sequence present, so the enzyme is presumably cytoplasmic, and acts either on UDP-d-GlcA as the substrate, as would be expected in bacteria, or possibly on GlcA in the O unit before translocation across the membrane. The ugd gene for synthesis of UDP-d-GlcA from UDP-d-Glc is located downstream of gnd near the O antigen gene cluster in the 14 E. coli and Shigella genomes released to date.
4-O-[(R)-1-Carboxyethyl]-d-glucose (glucolactilic acid) is present in the O antigens of B17 and D3, 3-O-[(R)-1-carboxyethyl]-l-rhamnose (rhamnolactilic acid) is present in the O antigen of D5, and 2-acetamido-4-O-[(S)-1-carboxyethyl]-2-deoxy-d-glucose is present in the O antigen of D13.
The B17, D3, D5 and D13 gene clusters have two genes remaining after assigning functions to all other genes, that have products with potential to be enzymes for synthesis of the glycolactilic acids. For the first three O antigens, the two genes are homologous to murA and murB, responsible for synthesis of UDP-N-acetylmuramate, a lactic acid derivative of GlcNAc found in peptidoglycan. Orf3, Orf3, Orf7 and Orf7 of B17, D3, D5 and D13 respectively belong to the Polysaccharide pyruvyl transferase family (PF04230, E-values 6.3 × e−4, 7.2 × e−7, 4 × e−5 and 0.013 respectively) as does MurA, while Orf2, Orf2 and Orf8 of B17, D3 and D5 respectively, and also MurB, belong to the FrhB_FdhB_C family (PF04432, E-values 4.3 × e−7, 6.2 × e−6, 4.8 × e−8 respectively), that includes the C terminus of F420 hydrogenase/dehydrogenase β subunits. The genes in the three gene clusters are consistent with synthesis of the glycolactilic acids by a pathway parallel to that of UDP-N-acetylmuramate, which is synthesized in a two-step process by the enzymes UDP-N-acetylglucosamine enolpyruvyl transferase (MurA) and UDP-N-acetylenolpyruvylglucosamine reductase (MurB). The first step in the pathway is the transfer of an enolpyruvate residue from phosphoenolpyruvate to position 3 of UDP-d-GlcNAc, which is catalysed by MurA, followed by a MurB-catalysed reduction of the enolpyruvate moiety to d-lactate, yielding UDP-N-acetylmuramate (Gunetileke & Anwar, 1968; Taku et al., 1970). Based on these data, we propose that the first group are genes for enolpyruvate transfer from phosphoenolpyruvate to UDP-d-Glc or dTDP-l-Rha, while the second group are responsible for reduction of the attached enolpyruvate to lactic acid. D13 has Orf7 in the Polysaccharide pyruvyl transferase family, but Orf5 belongs to a different reductase family, the Nitroreductase family (PF00881, E-value=3.9 × e−9). We propose that Orf7 is the transferase and Orf5 is the dehydrogenase. 
The reduction step creates a chiral centre, which for B17, D3 and D5 is in the R form and for D13 is in the S form. It is presumably for this reason that the dehydrogenases are from different families.
The pyruvyl transferases and dehydrogenases in B17 and D3, both with (R)-lac-4-β-d-Glcp, have identity levels of 33% and 25%, respectively. As expected, the D13 enzyme for the reduction reaction has no similarity to those of B17 and D3, as D13 forms the (S) isomer whereas the others form the (R) isomer. The D5 pyruvyltransferase that transfers enolpyruvate to position 3 of dTDP-l-Rha has little or no similarity to the others adding enolpyruvate to position 4 of UDP-Glc or UDP-GlcNAc, but the reductase making the (R) isomer of the Rha-linked lactate has 19% and 16% identity to those making the Glc-linked lactate of B17 and D3, respectively. The patterns of similarity are consistent with our proposals, but all have lower identity levels than usual for genes for the same pathway within a species. This would be expected if the reactions occurred after transfer of the sugar to the growing oligosaccharide, but this is not likely as all substitutions are stoichiometric, whereas O-acetylation, which is thought generally to occur after the sugar is added to the oligosaccharide, is often nonstoichiometric. We have named the enolpyruvate transferase genes lat and the reductase genes lar. Both are provisional names but do serve to group genes with similar functions. It will require experimental data to establish the details of the pathways and the degree of specificity. As for the psb genes discussed above, we have used numbers in place of letters for the different genes pending further analysis.
We have shown earlier that mutations in lat and lar of B17 do not block polymerization but lead to a different pattern of O-antigen chain lengths (Senchenkova et al., 2006), indicating that the addition of the glucolactilic acid is important for the later processing steps.
O antigen gene clusters shared by E. coli and Shigella
As discussed above E. coli and Shigella are in effect one species. Many Shigella O antigens cross react serologically with E. coli O antigens, and to facilitate the comparisons we have sequenced the relevant E. coli O antigen gene clusters to determine the extent of genetic similarity. The comparisons revealed that 18 Shigella type strains share their O antigen gene cluster with one found in a typical E. coli, and 3 Shigella type strains had O antigen gene clusters that are closely related to an E. coli gene cluster. The data are summarized in Table 4 and some examples (see Fig. 4) are discussed below.
Table 4. Summary of Shigella and Escherichia coli sharing the identical or closely related O antigen structures and gene clusters†
There is serological cross reaction between F2a and O13, and F4b and F5 are serologically identical to O135 and O129, respectively (Ewing, 1986), and the O antigen gene clusters of O13, O129 and O135 are identical to that of the flexneri group (Morona et al., 1994), except as noted below. The gene clusters have only two glycosyltransferase genes for three linkages, so presumably one is bifunctional, most probably generating the two l-Rhap-α (1→3) linkages. There is also a c. 3.2-kb segment downstream of wzy thought not to include any functional genes. This is supported by the observation that O135 has a deletion of 2.16 kb in this region (Fig. 4a). The antigenic diversity of flexneri serotypes is due to addition of glucosyl and/or O-acetyl groups on a common backbone structure, by transferases encoded by genes in temperate bacteriophages (Allison & Verma, 2000). Therefore, we suggest that the O135 and O129 structures are the same as those of F4b and F5, respectively, and that the same converting bacteriophages are present in the genomes, and also that the O13 structure has the same four-sugar main chain but the modification may be different from those of the flexneri serotypes. It is also interesting that the nonfunctional region includes an Orf with about 45% identity to GtrA of F2a and others. GtrA is one of the genes involved in the addition of side branch glucose residues, and its presence indicates that at some time at least part of the pathway was present in the chromosome.
The D13 structure resembles that of the backbone for this group, but has an additional GlcNAc residue. Orf9 and Orf10 of D13 are 48% identical to WbgF and WbgG, respectively, of F2a, confirming the allocation of these genes to the three rhamnosyltransferase steps, and the particular biological repeat unit shown in Table 1 (there are two GlcNAc residues, so the repeat unit would otherwise be ambiguous).
D3, O124 and O164
D3 and O124 are serologically identical (Ewing, 1986), and the O antigen structures are also identical (Table 1). Their O antigen gene clusters have the same nine genes with identity levels ranging from 99.4% to 100% (Fig. 4b). We also found that the O antigen of O164 is similar, with the only difference being the presence of a glucose residue in place of glucolactilic acid. The role of wfeP in glucolactilic acid synthesis in O124/D3 is supported by the observation that there is a frameshift mutation in the O164 wfeP gene of an otherwise near-identical gene cluster. The other significant difference is the insertion of an IS element into the 5′ end of the wzx gene of O164. The IS element (from positions 1109 to 2550) of O164 shared 99.7% identity to ISEc11 (Prosseda et al., 2006), which is a new insertion sequence of the IS1111 family. The insertion site of the IS was between bases 25 and 28 of the wzx gene, leading to loss of 11 amino acid residues at the 5′ end of the Wzx protein, but its function must be retained or complemented.
D1 and O148
D1 is the most virulent serotype of Shigella and attracts special attention for frequency of epidemics, the severity of symptoms, high attack rate, high case-fatality rate, and complications during infection (Bennish et al., 1990). In a previous study, we determined the O antigen structure of O148, which differs from that of D1 by having a glucose residue in place of a galactose residue (Feng et al., 2007). The O antigen gene clusters of the two strains have the same organization and a high level of DNA identity (89.8–99.5%) (Fig. 4c), except that in D1 wbbG is interrupted by a deletion, and a galactosyltransferase gene wbbP, located on a plasmid, is responsible for the transfer of galactose to make a major antigenic epitope of the D1 O antigen. It seems clear that the D1 O antigen gene cluster is derived from the O148 gene cluster by loss of the proposed glucosyltransferase gene wbbG and gain of a plasmid-borne galactosyltransferase gene (Feng et al., 2007), which may give the D1 O antigen a selective advantage in the intracellular environment. No serological cross-reactivity between D1 and O148 has been reported.
D6 and O130
As mentioned above, the D6 and O130 gene clusters are near identical with eight genes, and the only difference is that the wzy and wffH′ genes in D6 are in one ORF due to a single base deletion, which apparently blocks Wzy function (Fig. 4d). We suggest that the D6 form should be treated as a Wzy deficient form of O130, as it would be if found in a non-Shigella strain.
D9 and O40
The O40 structure differs from the D9 structure only in lacking the pyruvate acetal group (Table 1). The two O antigen gene clusters contain six genes. The O40 gene cluster differs from the D9 gene cluster in having an IS3 element inserted between wzx and wffR (Fig. 4e). The IS3 element, which is 1258 bp in length, shared 99.6% identity to that identified earlier (Timmerman & Tu, 1985), and two 39-bp inverted repeats at each end are still recognizable. The insertion of the IS3 element into the O40 O antigen gene cluster led to the loss of 12 amino acid residues at the 5′ end of wffR, which is a putative pyruvyltransferase gene. It appears that the gene is not expressed and this accounts for the lack of the pyruvate acetal. Again, the difference is such that the D9 and O40 forms would be treated as variants of O40 if in the same species. However, there does not seem to be any report of serological cross reaction.
B15, D2 and O112
Ewing showed that the B15 O antigen was serologically identical to that of O112ab, while the D2 O antigen was serologically identical to that of O112ac (Ewing, 1986). O112ab and O112ac are serologically related, with the common component ‘a’ significant enough for them to have been treated as variants of one serogroup. We found that B15 and O112ab have the same O antigen structure and gene cluster, but they are different from those of D2 and O112ac (Table 1 and Fig. 1), which are also identical, with the exception that the O112ac O antigen is devoid of O-acetyl groups. The O112ac/D2 O antigen differs from the O112ab/B15 O antigen in having a d-Gal residue in place of l-IdoA in the main chain and in having a pyruvate group. However, there is no similarity between the O antigen gene clusters of B15 and D2, and it is surprising that these two closely related O antigen structures are encoded by gene clusters without homology.
The O antigen gene clusters that are unique to Shigella
In this study, we found that 12 O antigen gene clusters (excluding SS) are unique to Shigella, and not related to other E. coli O antigen gene clusters. The data are summarized in Table 5 and examples are discussed below.
It has been reported that the O antigens of B2 and O87 are serologically closely related (Ewing, 1953), but there is no structure available for O87. However we found that the two gene clusters share no similarity (supplementary Table S2 and supplementary Fig. S1). The O antigen structures of B12 and O7 are very similar (Table 1), and serological cross reaction also has been reported (Valvano & Marolda, 1991), but again the O antigen gene cluster of B12 has no obvious similarity to that of O7 (Marolda et al., 1999), except for the sugar biosynthesis genes. The same situation is also found for D4/O168 and O159 (supplementary Table S3 and supplementary Fig. S1).
Two D8 antigen structures have been reported (Perepelov et al., 2008d). That in the strain studied by us recently is slightly different from the revised D8 O antigen structure in the strain studied in Russia earlier (replacement of one d-GlcNAc residue with d-Glc) (Table 1). Although we cannot sequence the O antigen gene cluster of the D8 strain studied earlier as export of Shigella strains from Russia is not possible, we suggest there is some difference in the relevant glycosyltransferase gene of two D8 strains.
O antigen gene cluster of SS
SS differs from the other Shigella strains in having a major deletion in the O antigen gene cluster between galF and gnd, and a functional gene cluster on a plasmid (Shepherd et al., 2000). The O unit contains l-AltNAcA and d-FucNAc4N, and an identical O antigen is found in Plesiomonas shigelloides. The high level of sequence similarity establishes that the SS O antigen is derived from P. shigelloides. Uniquely for Shigella, the O antigen does not include GlcNAc or GalNAc, but the FucNAc4N residue is attached to the core, showing that it is the initial sugar of the O unit (Gamian & Romanowska, 1982). The gene cluster has an initial transferase gene that must be responsible for the transfer (Xu et al., 2002). An IS630 element (99.4% identical to another IS630 of SS (Matsutani et al., 1987)) is found between wzy and wbgV in the SS O antigen gene cluster. IS630 is commonly found in SS (Tenzen et al., 1990; Yang et al., 2005). It was concluded that the SS O antigen gene cluster was most likely transferred to E. coli (in the broad sense) on the plasmid, and the IS630 element must have inserted during or after transfer of the plasmid from P. shigelloides (Shepherd et al., 2000). The insertion is between two genes in the middle of the operon and must have little if any effect on expression, as the downstream genes have essential roles in O antigen synthesis.
This review covers the structure and DNA sequence data for all Shigella O antigen forms, highlighting aspects common to all or most, and areas of diversity. The review also elucidates the relationships between the O antigens of Shigella and E. coli, which is important for our understanding of the rapid expansion of Shigella O antigen diversity.
Shigella and enteroinvasive E. coli (EIEC) have the same mode of pathogenesis, which appears to be human specific as neither has been reported in natural infections of other species. However, although EIEC and Shigella strains of E. coli have the same basic mode of pathogenesis, they cluster differently within E. coli on phylogenetic analysis (Lan et al., 2004). Shigella are the major cause of bacterial diarrhea in terms of disease severity and deaths, mostly of children in underdeveloped countries. The different forms are distinguished by their O antigens, and anti-O immunity gives good protection against that serovar where it has been tested.
Shigella and E. coli strains are now generally thought to be sufficiently similar to be placed in the same species, and analysis of sequence variation showed that most Shigella serotypes fall into three clusters within E. coli, with an additional five outliers in E. coli. The ancestors of each of the three clusters may or may not have had one of the O antigens now found in that cluster, but each of the other O antigens must have been transferred by recombination to give the diversity observed. Cluster 3 is different, as apart from B12, their O antigens are variants of a single basic structure, presumed to be shared by O13, O135 and O129, which have the same gene cluster, and that basic structure is treated here as a single Shigella O antigen form, the variation being due to phage encoded additions.
Earlier serological work had shown that of the 34 O antigen forms that we have treated as distinct, 13 overlap known E. coli O antigens, leaving 21 unique to Shigella clones (Ewing, 1986; Parolis et al., 1997). However, now that we have structures and sequences for all 34 Shigella O antigens and likewise for the E. coli O antigens thought to be identical to or to cross react at high level with Shigella O antigens, we can say that 21 of the 34 Shigella O antigens and associated gene clusters are identical to, or closely related to, an E. coli form. Most Shigella strains with O antigens identical to or related to forms found in typical E. coli are in clusters 1 and 2. It seems that these clusters comprise groups of strains that have diversified antigenically mainly by changing O antigens by homologous recombination, while cluster 3 has diversified antigenically by gaining phage-encoded modifications of the basic O antigen structure, also found in typical E. coli. It is interesting that 15 of the 19 O antigen forms found in cluster 1 have been found in non-Shigella E. coli strains, but only four of the eight in cluster 2, perhaps indicating a difference in the way the two clusters have diversified.
SS, D1, D8, D10, and B13 fell outside of the three major clusters. Perhaps they are recent lineages that have not had time to diversify in the manner of the cluster 1, 2 and 3 lineages, or their O antigens confer special properties that make them an integral part of the adaptations of these lineages.
It is worth mentioning that insertion elements can play a role in the diversification of O antigens, resulting in the inactivation of genes (in B6 and O40) or introduction of new genes into the O antigen gene cluster (in D9), both of which may lead to the formation of new O antigen forms. Gene mutation (in D1, D6, and O164) may also lead to O antigen divergence. Such events are more common in Shigella strains, as discussed in ‘Anomalies in the Shigella O antigen gene clusters’.
It is striking that with the exception of SS, O antigens unique to Shigella strains also fit into the general pattern for E. coli, although they have a higher proportion (eight of 12) of anomalies than those of other E. coli (45 of 148) or O antigens found in both E. coli and Shigella (seven of 21). All but the SS O antigen gene cluster are flanked by galF and gnd and must either be in E. coli, probably as an unidentified serovar, or perhaps transferred from a closely related species. In this regard it should be noted that the O antigens of S. enterica and Citrobacter freundii are also between galF and gnd, which would allow transfer by homologous recombination, and the same may apply to other related genera.
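The three anomaly counts quoted above are easier to compare as percentages. The short Python sketch below simply converts the stated counts into percentages; the dictionary keys and variable names are labels chosen here for illustration, and no statistical test is implied.

```python
# (gene clusters with anomalies, gene clusters examined), as quoted in the text.
ANOMALY_COUNTS = {
    "unique to Shigella (excluding SS)": (8, 12),
    "other E. coli": (45, 148),
    "found in both E. coli and Shigella": (7, 21),
}

# Percentage of gene clusters with anomalies in each group, to one decimal place.
anomaly_percent = {
    group: f"{100.0 * with_anomaly / total:.1f}"
    for group, (with_anomaly, total) in ANOMALY_COUNTS.items()
}

for group, pct in anomaly_percent.items():
    print(f"{group}: {pct}%")
```

On these counts, roughly two-thirds of the gene clusters unique to Shigella carry anomalies, compared with about one-third in each of the other two groups.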
The second striking feature of Shigella O antigens is that those of the three Shigella serovars (F1–5, D1 and SS) that are seen as the major threat are all special in some way. The O antigen of SS, which as discussed above has originated from P. shigelloides, has a novel initiating sugar not found elsewhere in E. coli and a second sugar that is also only found in SS and P. shigelloides O17. The O antigens of D1 and flexneri are not in themselves atypical for E. coli, but the genetics indicates recent reorganization which is unusual. D1 is derived from an E. coli antigen (O148) but the change involved the recruitment of a new transferase gene, which is also on a plasmid, again a unique situation in E. coli. It also seems to have conferred a novel immunogenicity as no cross reaction has been reported. Finally the O antigens of F1–5 have diversified by addition of immunodominant side-branch glucose residues or O-acetyl groups. The diversification of flexneri to give such a wide range of antigens with limited cross protection is quite unusual, but it seems that three E. coli O antigens are also part of the flexneri group. None of the three Shigella serovars are in the two major clusters of Shigella strains (Pupo et al., 2000), as SS and D1 are two of the outliers in the tree, and as cluster 3 contains only B12 in addition to the 12 flexneri strains that we now treat as having one basic O antigen structure, F1–5 are also an outlier with respect to the vast majority of Shigella strains.
The observation that the major pathogenic serovars of Shigella have atypical O antigens, and the earlier finding (Pupo et al., 2000) that they are not closely related to the bulk of Shigella strains, which are in clusters 1 and 2, raised the question as to what extent the three are representative of Shigella, as most of the experimental work on pathogenesis has been done on D1, SS and F1–5, much of the latter on F2a. While it is known that all Shigella strains carry the virulence plasmid, such details as infectious dose and severity of disease do not seem to have been well studied outside of the three major serovars. Indeed, the often repeated statement that Shigella has a low infectious dose is based on a study involving only D1, SS and F2a (Dupont et al., 1989). The three major serovars have O antigens that appear to have been modified from E. coli antigens (D1 and F1–5) or recruited from outside of the species (SS). It may be that the serovars with more typical E. coli O antigens are reported less often because they are not as pathogenic as the big three, and perhaps should be treated differently in reporting the occurrence of Shigella. While we have not been able to find data on variation in severity of disease etc. for most serovars, the significance of the three recognized as a major health problem is borne out by a recent report from WHO on needs and directions for Shigella vaccines (WHO, 2006), which discussed only D1, SS and flexneri.
The finding that D2 and B15 are not related, despite being identical to variants of O112, adds another O antigen to the Shigella list, bringing the total number of O antigens for E. coli in the broad sense to 190, comprising 166 E. coli O antigens (Lior, 1994), 16 O antigens unique to Shigella, three of them related to E. coli O antigens, and eight new E. coli O antigens (from O174 to O181) (Scheutz et al., 2004). The finding that so many of the Shigella O antigens are also present in E. coli does raise issues related to nomenclature that need to be addressed. It is well known that most Shigella strains are really E. coli and it is sensible to have a common name for O antigen structures present in both. In the case of Salmonella and Arizona strains, when Arizona was merged into Salmonella, the common O antigens were given the Salmonella name and those not seen in other Salmonella strains, at least at that time, were given new Salmonella numbers. This has not been done for Shigella and we propose as stated above that until such a scheme is in place, the common structure or gene cluster can be referred to as O124/D3 for example.
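The counts used in this review can be cross-checked with a few lines of arithmetic. The Python sketch below only adds up figures stated in the text (18 shared forms, 3 closely related forms, 12 forms unique to Shigella excluding SS, and SS itself); the variable names are hypothetical labels.

```python
# Distinct Shigella O antigen forms, by relationship to E. coli forms.
SHIGELLA_FORMS = {
    "identical to an E. coli form": 18,
    "closely related to an E. coli form": 3,
    "unique to Shigella (excluding SS)": 12,
    "SS": 1,
}
total_shigella_forms = sum(SHIGELLA_FORMS.values())

# Forms that are not identical to any E. coli form still need their own entry.
not_identical_to_e_coli = (total_shigella_forms
                           - SHIGELLA_FORMS["identical to an E. coli form"])

# Total for E. coli in the broad sense.
new_e_coli_forms = len(range(174, 181 + 1))   # O174 to O181 inclusive
grand_total = 166 + not_identical_to_e_coli + new_e_coli_forms

print(total_shigella_forms)      # distinct Shigella forms
print(not_identical_to_e_coli)   # forms counted as unique to Shigella in the total
print(grand_total)               # O antigens for E. coli in the broad sense
```

The 16 forms counted as unique to Shigella in the total are therefore the 13 with no E. coli counterpart plus the three that are closely related to, but not identical to, an E. coli form.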
As discussed above, having all of the structures and genes allows us to see the variation in perspective, particularly that of the three major pathogenic forms. It also allows experimental work on the role of the O antigen in virulence, as we now have the sequence information needed to replace any Shigella O antigen with any other, or indeed to make strains that express variants of an O antigen. There are many model systems to use in such work. Of course, there will be new serotypes, but it should be possible to keep up with the sequencing and structure work.
Vaccines consisting of carbohydrates coupled to a protein carrier have proven to be effective for the prevention of invasive bacterial disease. The exploration of glycoconjugate vaccines has been hindered largely by the technical difficulties in the chemical preparation of oligosaccharides. The availability of structure and genetic data for all Shigella O antigens will be useful for efficient synthesis of the oligosaccharides, and could pave the way for the development of a well-structured glycoconjugate vaccine for Shigella.
We also suggest that consideration be given to integration of the E. coli and Shigella serotyping schemes. For the 18 O antigens identical in the Shigella and E. coli schemes, there are two sera where one should suffice. Also, if a pool of sera for the O antigens unique to Shigella strains were used on all untypable E. coli, we should soon learn whether any had a Shigella O antigen not yet reported in non-Shigella E. coli strains. Perhaps most important, as molecular methods are developed for O typing, the E. coli and Shigella schemes should be integrated, with the O antigens common to both included only once.
This work was supported by the Tianjin Municipal Special Fund for Science and Technology Innovation Grant 05FZZDSH00800, the National Natural Science Foundation of China (NSFC) Key Programs Grants 30530010 and 20536040, NSFC General Program Grant 30670038, the Chinese National Science Fund for Distinguished Young Scholars (30788001), the National 863 Program of China Grants 2006AA020703 and 2006AA06Z409, the Russian Foundation for Basic Research (projects 05-04-48992 to A.V.P. and 05-04-39015 to Y.A.K.), a grant from the Russian Science Support Foundation to A.V.P., and the Council on Grants at the President of the Russian Federation for Support of Young Russian Scientists (project MK-157.2007.4 to A.V.P.).